E
Modern Epidemiologic Approaches to Interaction: Applications to the Study of Genetic Interactions

Sharon Schwartz, Ph.D.*

INTRODUCTION

Epidemiology attempts to discern the causes of disease through an analysis of the patterns of exposure/disease relationships that are brought into view by our study designs. The types of designs and methods that are developed are largely influenced by the health challenges that the population faces as well as any methodologic and technological constraints.

Current epidemiologic methods were sparked by the rise of chronic diseases that did not fit well within the causal models underlying infectious disease epidemiology. Infectious disease models, based on the Henle-Koch principles, reserved the term “cause” for factors that were both necessary and sufficient for disease occurrence. Although this assumption did not apply strictly to the identified causes of many infectious diseases, this model worked well enough to provide utility over time.

A crisis arose, however, over the study of the relationship between smoking and lung cancer. Although the association between smoking and lung cancer was strong and seemed persuasive, smoking clearly was neither necessary nor sufficient for the development of lung cancer. This led to a paradigmatic crisis that over time resulted in the development of a new framework for the identification of causes, which crystallized as “risk factor epidemiology.” This framework is rooted in the notion that there are

*

Associate Professor of Clinical Epidemiology, Mailman School of Public Health, Columbia University, 722 West 168th Street, Room 720 b New York, NY 10032, sbs5@columbia.edu.



Copyright © National Academy of Sciences. All rights reserved.





multiple pathways to the same disease and that within each pathway there are multiple causes that work in tandem to lead to the disease. These types of causes are often referred to as “risk factors.”

The risk factor framework generally is “egalitarian” in its assumptions about causation; all types of factors that contribute to disease occurrence can be called a cause. There may be some factors that are necessary causes in the sense that the disease never occurs in their absence, but other causes may not be necessary at all. In addition, even necessary causes require the presence of causal partners to lead to disease occurrence. These causal partners also are considered to be causes of the disease. The necessity of a causal partner for disease occurrence is what we mean by “biologic interaction.” Thus, the very definition of a cause in risk factor epidemiology places the issue of interaction front and center. It is assumed that virtually all diseases arise from the interaction of two or more causes.

Despite the centrality of interaction to this causal framework, methodologic advances have focused mainly on the isolation of single causes and the identification of individual risk factors that contribute to disease occurrence in a population. New designs were developed to allow us to see the relationships between exposures¹ and disease in our data that would provide clues to the identification of these causes. Statistical methods were developed to aid in causal inference.

The identification of the causal partners of particular risk factors, the assessment of interaction, was a more complex notion that awaited conceptual clarification and methodological advances. Considerable progress has been made; however, often a lag occurs between the development of new methods and approaches and their application and appearance in the literature.
Thus, the way in which interaction is assessed in epidemiologic studies is only now beginning to reflect these newer methods.

¹This paper uses the term “exposure” to mean any factor that is being examined to see if it is a cause of disease. The term applies to any factor under consideration, whether genetic or environmental.

What follows is a discussion of this newer way of thinking about how to identify “biologic interaction.” I prefer the use of the term “synergy” in this discussion because it is more neutral to the level of organization at which interaction is being described. Although these methods have developed separately from those in the field of genetics, they are fully applicable to the field, and while genes have characteristics that are distinct from many of the risk factors studied in epidemiology, an epidemiologic approach to causation easily and naturally encompasses genes as causes.

However, this application requires a shift in perspective. From a genetic point of view there is a hierarchy of causes, with “the gene” having centrality as the defining cause and all other factors being ancillary to it. Factors that are considered equal causes from an epidemiologic frame are sometimes labeled in genetics in a way that gives them secondary status. One example is the use of the term “phenocopy” to distinguish a case of disease caused in the absence of a putative genetic cause. Another example is the concept of “reduced penetrance.” This term refers to the inexact relationship between a genotype and a phenotype and implies that this slippage is a characteristic of a gene; the gene evidences “reduced penetrance,” or the gene is “fully penetrant.” From an epidemiologic perspective, reduced penetrance is simply a normal characteristic of all causes: the lack of a one-to-one relationship between causes and diseases due to interaction. It is not a characteristic of the gene, but rather a characteristic of the distribution of the causal partners with which the gene works to cause disease. It is the natural state of most causal relationships.

Thus epidemiologic approaches to interaction provide an exciting perspective on genetic concepts that may shed new light on genetic issues. Likewise, the integration of genetic thinking into epidemiology can advance methodology.

I begin this task with a discussion of why the assessment of interaction is so problematic, and then I will discuss the current epidemiologic resolution to the problem. However, to fully understand the solution and its applicability to a genetic context, we need to probe the concept of causation in epidemiology more fully. Although this may seem a bit off topic, it is central to understanding the elements of the new ways of thinking about synergy. Finally, more specific problems of application and design will be addressed.
CURRENT EPIDEMIOLOGIC FRAMEWORK FOR ASSESSING INTERACTION

The Problem

Because the testing of our hypotheses and the assessment of our data rely on statistical tools, we already are most familiar with the concept of statistical interaction. From a statistical perspective, we can say that there is interaction when in the presence of two factors the outcome occurs more frequently than would be expected based on the independent effects of each factor. By independent effect, we mean the effect of one factor in the absence of the other factor. To make this more concrete, we would say that interaction can be identified when among people with both a genetic variant and an environmental exposure the disease rate is higher than would be expected if the genetic factor and environmental exposure each worked independently.

Proportion of Respondents Developing Depression

Severe life event or      Intimacy problems
major difficulty          Yes       No
Yes                       32%       10%
No                        3%        1%

FIGURE E-1 Assessment of interaction: example from Brown and Harris (1978).

Although this definition is clear in statistical terms, it raises the question of “what would be expected.” As it turns out, what would be expected depends on the effect measure or statistical model used to express the relationship between exposures and disease. This can be seen in the data from a study in psychiatric epidemiology that proved to be very enlightening in this regard. Brown and Harris (1978) wanted to test the theory that stressful life events and problems with intimacy interacted in causing depression. They hypothesized that, while stressful life events and intimacy problems each may confer a risk of depression, when they are both present they confer a greater risk than would be expected if each worked through a separate causal pathway. The data derived from a study to test this hypothesis are depicted in Figure E-1.

Brown and Harris interpreted these data as supporting their claim for an interaction between intimacy problems and stressful life events. The risk of depression in those with neither stressful life events nor intimacy problems was 1 percent; among those with only stressful life events it was 10 percent, and among those with only intimacy problems it was 3 percent. The difference in the risk conferred by stressful life events alone was therefore 9 percent (10 percent − 1 percent), and the risk difference conferred by intimacy problems alone was 2 percent (3 percent − 1 percent). If there were no interaction, one would expect that when both factors were present the risk conferred would be 11 percent (9 percent + 2 percent).
However, the data show that the risk when both were present was 32 percent, an increase of 31 percent over the baseline, which is substantially greater than would be expected based on the independent effects of each risk factor. Brown and Harris therefore concluded that these data supported their theory of an interaction between stressful life events and intimacy problems in causing depression.

Tennant and Bebbington (1978) challenged this conclusion. They reanalyzed these data using log linear modeling. This analysis calculated the effects on a different scale by calculating risk ratios. Using this model, life events acting alone increased the risk of depression by a factor of 10 (10 percent = 1 percent × 10). Intimacy problems alone increased the risk by a factor of 3 (3 percent = 1 percent × 3). Therefore, based on this calculus one would expect the co-presence of these risk factors to increase the risk by a factor of 30, to 30 percent (1 percent × 3 × 10), if they were acting independently of each other. This is very close to the 32 percent risk actually found. Thus, Tennant and Bebbington concluded from these same data that there was no support for Brown and Harris’s conclusion.

What was not fully appreciated at the time was that both Brown and Harris and Tennant and Bebbington provided absolutely correct interpretations of the data based on the unarticulated statistical assumptions of their approaches. Brown and Harris, using risk differences to express the effects of risk factors, used a model that implicitly assumed that, absent interaction, risks add in their effects. They used an additive model. Tennant and Bebbington, on the other hand, analyzed the data using a log linear model that implicitly assumed that, absent interaction, risks multiply in their effects. They used a multiplicative model. Thus, based on statistical definitions of interaction, the same data both did and did not support a theory of interaction.

This state of affairs is disconcerting, to say the least. We depend on our data and statistical tools to give us a rough estimate of the state of affairs in the real world, and it is problematic when the answers to our questions differ depending on the statistical model we use to assess our data. To make matters worse, the choice of statistical model often is based on statistical considerations. For example, we usually employ additive models, such as linear regression, when our outcome variables are continuous.
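The disagreement over the Brown and Harris data comes down to which "expected" joint risk is used. The following is a minimal sketch, not either team's actual analysis: it takes the four risks from Figure E-1 and computes the joint risk expected when risk differences add and when risk ratios multiply. The variable names are illustrative. The additive expectation is stated as a risk (the 1 percent baseline plus the 11 percent expected increase), so it comes out at 12 percent.

```python
# Risks of depression from Figure E-1 (Brown and Harris, 1978).
baseline = 0.01        # neither factor present
life_events = 0.10     # severe life event only
intimacy = 0.03        # intimacy problems only
observed_both = 0.32   # both factors present

# Additive model: each factor's risk difference is added to the baseline risk.
expected_additive = baseline + (life_events - baseline) + (intimacy - baseline)

# Multiplicative model: each factor's risk ratio multiplies the baseline risk.
expected_multiplicative = baseline * (life_events / baseline) * (intimacy / baseline)

print(f"expected if risks add:      {expected_additive:.2f}")
print(f"expected if risks multiply: {expected_multiplicative:.2f}")
print(f"observed:                   {observed_both:.2f}")
```

The observed 32 percent lies far above the additive expectation and close to the multiplicative one, which is why the same table supported both readings.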
When our outcomes are dichotomous, as they frequently are in genetic and epidemiologic contexts, we employ logistic regression, because such outcomes violate the statistical assumptions of linear regression models. Although this choice meets statistical requirements, it shifts us to a multiplicative model. Linear regression assumes that risks add in their effects, and thus interaction is indicated by an appreciable deviation from additivity (i.e., sub- or superadditivity). Logistic regression assumes that risks multiply in their effects, and thus interaction is indicated by an appreciable deviation from multiplicativity (i.e., sub- or supermultiplicativity).

The problem is that if both risk factors have an effect, there always will be interaction on at least one of these scales. As illustrated in Figure E-2, additivity implies submultiplicativity, and multiplicativity implies superadditivity. Thus, except in instances of supermultiplicativity (in which both models will index positive interaction) and subadditivity (in which both models index negative interaction), the answer to the question of whether or not there is interaction will depend on the statistical model that we choose.

[Figure: a scale of the joint risk increment running from −10 to 30 for two exposures, E1 (Exposure 1) and E2 (Exposure 2), each with a risk increment of 5. No interaction under perfect additivity corresponds to a joint increment of 10; no interaction under perfect multiplicativity corresponds to a joint increment of 25. Values below 10 are subadditive and values above 10 are superadditive; values below 25 are submultiplicative and values above 25 are supermultiplicative.]

FIGURE E-2 Relationship between additive and multiplicative interaction.

This is very unsettling, because we want our statistical models to represent our concepts rather than having them define our concepts. So, the question would be one of what model best represents the “true” relationship between risk factors. That is, do risk factors really add or multiply in their effects? Darroch (1997), Rothman and Greenland (1998), and others have grappled with this problem. It appears that the additive model with a twist best represents what we mean by interaction. The twist is due to redundancy in causes, as we shall see.

To appreciate this argument, and to assess its applicability to the context of assessing interactions that include genetic factors, a fuller discussion of the causal model on which this assessment is based is necessary. This causal model, the counterfactual or potential outcomes model, developed in philosophy and statistics (Mackie, 1974; Maldonado and Greenland, 2002; Rubin, 2004; Shadish et al., 2002), underlies much of the causal thinking today in epidemiology and allied fields such as history, sociology, and economics. The solution to the interaction problem derives from the application of this causal model to synergy.

The advantage of this approach is obvious. It provides a way to assess what we mean by interaction conceptually and asks what mathematical representations support our concepts, rather than providing a statistical model and then contorting our concepts to fit the requirements of that model. Whether or not you agree that Darroch’s solution is correct, this approach toward the solution seems reasonable.

The Underlying Causal Model

The counterfactual or potential outcomes model underlies many current developments in epidemiologic methods. Although at first blush it sounds intimidating, this way of thinking about causes echoes simple notions that we apply in everyday circumstances. Nonetheless, its articulation has many interesting implications for causal thinking and is an immensely useful tool for grappling with difficult design decisions, and, as we shall see, for assessing the relationship between our conceptual and statistical tools.

From a counterfactual perspective, a cause is any factor without which the disease event would not have occurred, at least not when it did, given that all other conditions are fixed (Greenland and Robins, 1986; Maldonado and Greenland, 2002; Rothman and Greenland, 1998). Note that this defines causation at the level of the individual, with the definition applying to individual disease events.

The counterfactual way of thinking is familiar to all of us when we second guess our actions and think about what would have happened had we taken a different action. We compare what happened to what would have happened had we made a different choice. Similarly, when we try to make a decision about how to act in the future, we often imagine the outcome under alternative sets of actions. We compare what we think would happen under one action with what we think would happen under a different action. We also use this type of thought experiment to conceptually separate co-occurrences that are coincidental from those that are causal.
So, for example, if a teakettle whistles and then the doorbell rings, we do not assign causality to the teakettle’s whistle, because we think that the doorbell would have rung even without the teakettle whistling, assuming all else remained the same. This is the essence of causation from a counterfactual perspective.

Rothman (1976) has developed a heuristic based on this definition of a cause, referred to as causal pies, that provides a useful framework for understanding the implications of this approach. In this heuristic, the causes of each disease event are depicted by a causal pie (a circle), cut into its constituent pieces. Each piece of the pie represents an exposure that contributes to the occurrence of the disease event. When all of the pieces of the pie are present, the disease occurs. Thus, each pie represents a sufficient cause of disease that is composed of component causes, each of which is necessary for the completion of this sufficient cause of disease. For example, as depicted in Figure E-3, there are three posited causal pathways to this disease outcome; individuals can get this disease from sufficient causes 1, 2, or 3. For sufficient cause 1 to occur, an individual must be exposed to components A, B, and C. If any one of the components is missing, the pie will not be complete and disease will not occur, at least not through this mechanism. Thus, each component in the pie is a cause according to the counterfactual definition, because given that all else is fixed (i.e., all of the causal partners are in place), if we remove component A, for example, the outcome would not have occurred.

[Figure: three causal pies labeled Sufficient Cause 1, Sufficient Cause 2, and Sufficient Cause 3, with component causes labeled A through G.]

FIGURE E-3 Rothman’s causal pies.

Thus, from this perspective, biologic interaction is the relationship between two factors in the same causal pie. In more technical language, biologic interaction occurs when one risk factor allows the other to be expressed in a disease outcome. I prefer to refer to this process as synergy (a term also favored in the epidemiologic literature), because two factors may have causal effects when they influence each other on some level of organization other than the biologic. In the Brown and Harris example above, the interaction between stressful life events and intimacy problems in causing depression might be considered “psychologic interaction.” Of course, these psychological factors need to have biologic consequences to cause disease, but the joint effects occur at the psychological level. The counterfactual perspective and Rothman’s causal pies are neutral to the level of organization under discussion.²

²The caveat to this is that an antecedent and a mediator cannot be considered simultaneously, because under that circumstance each component would not be necessary for the pie to form. The pies cannot contain redundant “slices.” There is also an affinity for individual-level variables in the causal pie schema, but it can accommodate levels below and above the individual.
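The causal pie heuristic maps naturally onto sets. The following is a minimal sketch under stated assumptions: sufficient cause 1 contains A, B, and C as in the text, while the membership of the other two pies and all function names are hypothetical, chosen only for illustration.

```python
# Each sufficient cause ("pie") is a set of component causes.
# Sufficient cause 1 = {A, B, C} follows the text; the contents of the
# other two pies are a hypothetical assignment made for illustration.
SUFFICIENT_CAUSES = {
    "sufficient cause 1": frozenset("ABC"),
    "sufficient cause 2": frozenset("DE"),  # hypothetical membership
    "sufficient cause 3": frozenset("FG"),  # hypothetical membership
}


def completed_pies(exposures):
    """Return the names of the pies whose components are all present."""
    present = set(exposures)
    return [name for name, pie in SUFFICIENT_CAUSES.items() if pie <= present]


def gets_disease(exposures):
    """Disease occurs as soon as at least one pie is complete."""
    return bool(completed_pies(exposures))


def is_cause(component, exposures):
    """Counterfactual check: disease occurs with the component present and
    would not occur if that component alone were removed."""
    present = set(exposures)
    with_component = gets_disease(present | {component})
    without_component = gets_disease(present - {component})
    return with_component and not without_component


print(is_cause("A", {"A", "B", "C"}))            # A completes pie 1
print(is_cause("A", {"A", "B"}))                 # partner C is missing
print(is_cause("A", {"A", "B", "C", "D", "E"}))  # pie 2 is complete anyway
```

The three calls illustrate that whether A counts as a cause for a given person depends entirely on which causal partners, and which other complete pies, that person carries.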

Thus, for example, A could be a genetic mutation and B one of the environmental factors that stimulate synthesis of a detrimental gene product, or A and B could be two genes that interact to cause disease. Therefore, several concepts that are distinguished from one another in genetics (e.g., epistasis, gene-environment interaction) would be considered to be the same phenomenon in epidemiology. Note that because all of the components in the same causal pie interact in this way, when we ask about interaction, we always must specify the particular components for which we are assessing interaction.

What becomes apparent from this model is that the effect of an exposure depends on the presence of its causal partners. Thus, A will have an effect if and only if its causal partners B and C are present. In contexts in which the causal partners are ubiquitous, the exposure will have a huge effect, since the conditions that activate it always will be present. In contexts in which the causal partners are absent, the exposure will have no effect. In genetics, the classic example used to illustrate this point is phenylketonuria (PKU). The genetic variant that causes PKU has a huge effect in societies in which phenylalanine is a ubiquitous part of the human diet, but a small effect in those in which it is not. Thus, the effect of the “PKU gene” depends on the prevalence of its causal partners. Indeed, intervention on the causal partner is the way in which we largely prevent the deleterious effects of this genetic variant.

Causal effect = (proportion of exposed people with the disease) − (proportion of these same people who would have gotten the disease without the exposure)

FIGURE E-4 Causal effect (causal contrast).

As noted above, causes are defined for the individual who gets the disease, which makes sense because the disease occurs in the body of the individual.
However, although we use individuals as the units of our analysis, we cannot draw conclusions about the units, but only about the average of the units. Thus, the causal effect (also called the causal contrast) is indexed by the difference between the proportion of exposed people who got the disease at a particular moment in time and the proportion of these same people who would have gotten the disease at that particular moment in time had the exposure not occurred, all things being equal (Mackie, 1974; Rothman and Greenland, 1998), as illustrated in Figure E-4.
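As a minimal sketch of the causal contrast, suppose we could see both potential outcomes for every exposed person, which is never possible with real data. The ten people, their outcomes, and the function name below are all hypothetical.

```python
def causal_contrast(outcome_if_exposed, outcome_if_unexposed):
    """Difference in disease proportions for the same group of people.

    Each list holds one 0/1 disease indicator per person; position i in both
    lists refers to the same person under the two exposure conditions.
    """
    if not outcome_if_exposed or len(outcome_if_exposed) != len(outcome_if_unexposed):
        raise ValueError("both lists must describe the same, non-empty group")
    n = len(outcome_if_exposed)
    return sum(outcome_if_exposed) / n - sum(outcome_if_unexposed) / n


# Hypothetical potential outcomes for ten exposed people (1 = disease).
# The second list is the counterfactual and could never be observed.
if_exposed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
if_unexposed = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

effect = causal_contrast(if_exposed, if_unexposed)
print(f"causal contrast: {effect:.2f}")
```

The contrast is an average over the group: it says that exposure raised the disease proportion by 30 percentage points, without identifying which individuals were affected.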

Estimating Causal Effects from a Counterfactual Perspective

It is apparent that although we can observe the amount of disease that exposed people experience, we cannot observe the amount of disease that they would have experienced during that same period had they not been exposed. We cannot see both the “fact” (the exposure and disease state of a person) and the “counterfactual” (the disease state under the condition of nonexposure). The counterfactual is, by definition, counter to the facts and therefore not visible. This is a reiteration of the central problem in disease etiology: causation is not observable. We can see the co-occurrence of exposures and disease, but causation itself cannot be observed; it can only be inferred.

Since we cannot observe the counterfactual state, we select a group of unexposed people as a substitute, or proxy, for the unobservable counterfactual. This substitute gives us the “correct answer” (i.e., represents the true causal effect) to the extent that it is a good proxy. What we mean by a good proxy is that the disease proportion (i.e., disease risk) in this group of unexposed people represents the disease risk the exposed would have had if they had not been exposed (i.e., the counterfactual risk).

For the unexposed to be a good proxy, the exposed and the unexposed should be equal on all causes of disease other than the exposure of interest. When this occurs, the exposed and unexposed are said to be “exchangeable.” A lack of exchangeability (that is, when the disease risk in the unexposed does not equal that of the exposed had they not been exposed) is what we mean by confounding. When there is confounding, we cannot see whether the exposure had an effect or not. However, assuming exchangeability, or assuming that the unexposed are a good proxy for the counterfactual, the difference in the disease risk between the exposed and unexposed provides an index of the effect of the exposure.
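The role of the proxy can be made concrete with a small arithmetic sketch. All counts below are hypothetical, and the counterfactual risk is written down only for illustration, since it is exactly the quantity that cannot be observed in practice.

```python
def risk(cases, total):
    """Proportion of a group that develops the disease."""
    return cases / total


# Hypothetical groups of 100 people each.
exposed_risk = risk(30, 100)         # observed among the exposed
counterfactual_risk = risk(10, 100)  # the exposed, had they not been exposed
exchangeable_proxy = risk(10, 100)   # unexposed group matching the counterfactual
confounded_proxy = risk(20, 100)     # unexposed group carrying more other causes

true_effect = exposed_risk - counterfactual_risk
estimate_exchangeable = exposed_risk - exchangeable_proxy
estimate_confounded = exposed_risk - confounded_proxy

print(f"true causal effect:          {true_effect:.2f}")
print(f"estimate, exchangeable proxy: {estimate_exchangeable:.2f}")
print(f"estimate, confounded proxy:   {estimate_confounded:.2f}")
print(f"bias from confounding:        {estimate_confounded - true_effect:+.2f}")
```

With the exchangeable proxy the observed risk difference recovers the true effect; with the confounded proxy it is off by exactly the gap between the proxy's risk and the counterfactual risk.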
We will discuss this issue of confounding in a bit more detail in order to more fully understand the implications of the counterfactual approach for interaction. This simpler scenario, in which we are attempting to identify the causal effect of a single exposure, will ease the discussion of the application to the more complex scenario of synergy.

Suppose we have a disease such as depression, whose sufficient causes are depicted in Figure E-5. Our hypothesis is that A (perhaps some genetic variant) is a cause of depression. We assume that A has causal partners, which are unidentified but indicated in this model by B. Note that B is simply a stand-in for all of the factors that must be present for A to have an effect. We also assume that there are other pathways to the disease that do not include A. We will denote all these other causal pathways by a causal pie with X. X is neither a single exposure nor a single causal pathway. Rather, X is a stand-in for all combinations of exposures that lead to disease that do not include A. Another complication is that it is possible for A to prevent disease in some situations. If so, this means that some people have a combination of exposures (depicted by Q) that requires the absence of A to produce the disease.

[Figure: three causal pies labeled Sufficient Cause 1, Sufficient Cause 2, and Sufficient Cause 3: one containing A and B, one containing Q and the absence of A, and one containing X.]

FIGURE E-5 Hypothetical example: causes of depression.

If we consider causation under the counterfactual model, we can imagine what would happen to people with different causal partners if they were exposed to the risk factor under investigation, A in this instance. These potential outcomes are depicted in Figure E-6.

People exposed to X will get the disease whether or not they are exposed to A (i.e., under the counterfactual they will get the disease as well). The exposure does not cause the disease for these people, since even without the exposure they would have gotten it. We label these people Type 1, Doomed.³ The word is a little stronger than the meaning implied. It simply means that during the period under consideration these people will get the disease under study with or without the exposure of interest. Types are also not inherent characteristics of people; rather, they are a categorization of people by the causal partners (i.e., all risk factors other than those under study) to which they have been exposed by the end of the study period.

People with B will get the disease if they are exposed but not if they are not exposed (i.e., under the counterfactual they will not get the disease). We call these people Type 2, Causal Types (i.e., the exposure under investigation is causal for them). When we ask the question, “Is A a cause of disease?” what we really want to know is whether there are any Causal Types in the population.
People with exposure Q will not get the disease if they are exposed, but under the counterfactual, if they were unexposed, they would get the disease [...]

³In this paper I will, in general, use terminology from the original sources to allow easy translation when consulting the original texts. Sometimes the terminology is confusing or can be misinterpreted. In those instances, I will try to clarify the terms, but not invent new ones.

[...] the proportion of diseased people (i.e., the risk) in this exposure cohort. Among those exposed only to Gene A, the Doomed, A Susceptible Types, and Parallel Types contribute to the risk; among those exposed to B only, the Doomed, B Susceptible, and Parallel Types contribute to the risk; and among those exposed to neither A nor B, only the Doomed contribute to the risk.

We can see the risk (the proportion diseased) in each exposure group, for which we provide specific labels. $R_{12}$ is the risk among those exposed to both A and B; $R_1$ is the risk for those exposed only to A; $R_2$ is the risk for those exposed only to B; and $R$ is the baseline risk (i.e., the risk among those exposed to neither A nor B).

We can now translate the proportion diseased (the risk) we observe under each exposure category into the underlying types that contribute to the risk in each exposure category. Using basic mathematical tools, we attempt to isolate Synergistic Types from the others. The closest we can come is the isolation of the balance between Synergistic and Parallel Types:

The proportion of (synergistic − parallel) types in the population = $R_{12} - R_1 - R_2 + R$.⁴

This is the additive model, $(R_{12} - R) - (R_1 - R) - (R_2 - R)$,⁵ which assumes risks add in their effects, with the twist that parallelism makes the relationships somewhat less than additive. Thus, if the risk of disease among those exposed to both factors is more than the sum of the risk differences for each factor alone, there is evidence of Synergistic Types in the population. This is evidence that Gene A and Environmental Factor B work in a synergistic way to cause disease for at least some people. Note, however, that we cannot definitively state what proportion of the disease is due to synergy. We can only say that the proportion of Synergistic Types is greater than the proportion of Parallel Types.
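The cancellation behind this formula can be checked directly. The sketch below assumes that the five types are distributed identically across the four exposure groups and that all five contribute to the risk among those exposed to both A and B; the type proportions and function names are hypothetical. Exact fractions are used so the cancellation is not blurred by floating-point rounding.

```python
from fractions import Fraction


def observed_risks(doomed, a_susceptible, b_susceptible, parallel, synergistic):
    """Risk in each exposure group as the sum of the types that get disease."""
    r12 = doomed + a_susceptible + b_susceptible + parallel + synergistic  # A and B
    r1 = doomed + a_susceptible + parallel   # A only
    r2 = doomed + b_susceptible + parallel   # B only
    r = doomed                               # neither A nor B
    return r12, r1, r2, r


def additive_contrast(r12, r1, r2, r):
    """R12 - R1 - R2 + R, the deviation from additivity."""
    return r12 - r1 - r2 + r


def pct(n):
    return Fraction(n, 100)


# Hypothetical type proportions in the population.
types = dict(doomed=pct(5), a_susceptible=pct(4), b_susceptible=pct(3),
             parallel=pct(2), synergistic=pct(6))
contrast = additive_contrast(*observed_risks(**types))
print(contrast == types["synergistic"] - types["parallel"])

# Equal proportions of Synergistic and Parallel Types give perfect additivity.
balanced = dict(types, synergistic=pct(3), parallel=pct(3))
print(additive_contrast(*observed_risks(**balanced)))
```

The second case shows why a contrast of zero cannot rule out synergy: Synergistic Types are present, yet the contrast vanishes because they are balanced by Parallel Types.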
In addition, perfect additivity is compatible with either no Synergistic Types in the population or a perfect balance of Synergistic and Parallel Types. Just as in the simple case of identifying single causes we only identify the average risk (that is, the preponderance of causal over protective effects of an exposure), so too in the face of parallelism we cannot rule out synergy if we find less than superadditivity, but we do find support if there is superadditivity.

⁴If we take the types that contribute to disease among those exposed to both A and B (the first box in Figure E-9), subtract from them those that contribute in the second box, subtract from them those that contribute in the third box, and then add those in the fourth box, we are left with (synergy − parallel). The Synergistic Types appear in only one box, so they cannot be canceled out, and the Parallel Types occur in three boxes (added once and subtracted twice), so their cancellation leaves one negative Parallel term. All other types cancel out in this formula.

⁵$R_{12} - R_1 - R_2 + R = (R_{12} - R) - (R_1 - R) - (R_2 - R)$. In the absence of synergy, the risk difference for those with both factors ($R_{12} - R$) will simply equal the risk difference for factor A ($R_1 - R$) plus the risk difference for factor B ($R_2 - R$). Thus in the absence of synergy and parallelism, or with a balance of synergy and parallelism, $(R_{12} - R) - (R_1 - R) - (R_2 - R) = 0$.

It is important to note the constraints on this conclusion. First, this analysis makes all of the usual assumptions that apply in the way we currently conduct research; it assumes such things as independence of outcomes between units and no feedback loops. Second, it makes the important assumption that the exposures under consideration express either synergy or antagonism, but not both; it is assumed that a risk factor has only a causal effect or a preventive effect, but not both. How realistic this assumption is depends on the exposures under consideration. In psychology, this assumption is often unrealistic. For example, there may be parenting practices (such as strict discipline) that would be beneficial for children with one type of temperament, but detrimental for children with another. In genetics, the “norm of reaction,” where a genetic factor has positive or negative effects depending on the context (Levins and Lewontin, 1985), could violate this assumption. However, this is simply a recognition that under these circumstances there are too many unknowns for any of our traditional mathematical models to handle. These caveats notwithstanding, Darroch’s argument begins with the conceptual model and then brings us to the mathematical model that represents synergy most closely.

Applications in Practice

The conclusion drawn from these analyses is that synergy is indexed by deviations from additivity. In practice then, how do we estimate synergy using this approach? One method is to calculate an “interaction contrast” (Rothman and Greenland, 1998).
To illustrate how this is done, I will use an example based on the interaction between a serotonin transporter gene polymorphism and life stress in causing depression, as reported from the Dunedin birth cohort (Caspi et al., 2003). The hypothesis was that there is a synergistic relationship between a short “s” allele and multiple stressful life events in causing depression. As illustrated in Figure E-10, the disease prevalence among those with neither the susceptible genotype nor life events was 10 percent; among those with only the susceptible genotype, 10 percent; among those with only life events, 17 percent; and among those with both life events and the susceptible genotype, 33 percent. In this instance the interaction contrast would be .33 − .17 − .10 + .10 = .16. The interaction contrast thus equals the risk among those with both factors (.33), minus the risk among those with life events only (.17), minus the risk among those with the susceptible genotype only (.10), plus the baseline risk (.10). Since the interaction contrast here is greater than zero (.16), it indicates the presence of synergy in this population.

Percentage of Individuals in Each Category Meeting Criteria for Depression

                    “S” Genotype
4+ Life events      Yes       No
Yes                 33%       17%
No                  10%       10%

FIGURE E-10 Estimation of the interaction contrast.

In this example, the risks required for the computation were directly provided by the report. However, in a cohort study we can compute the interaction contrast regardless of the form in which the results are analyzed and presented. Suppose we analyzed the data under a logistic regression model. The baseline odds of disease would be derived from the intercept. The odds ratios from the logistic regression then would be used to obtain the odds of disease under the other conditions. Finally, the odds would be converted to risks (odds = p/(1 − p), so p = odds/(1 + odds)).

When we cannot estimate the baseline risk of disease, as in a case-control study, we can calculate an interaction contrast ratio using the odds ratios computed from a logistic regression analysis. The interaction contrast ratio is the odds ratio for those with both factors, minus the odds ratio for those with one factor, minus the odds ratio for those with the other factor, plus one. For illustration, I computed the odds ratios for the Dunedin study from the prevalence estimates given in Figure E-10. The baseline odds of depression among those with neither the “s” allele nor stressful life events are .11 (.10/(1 − .10)). The odds for those with both factors are .49 (.33/(1 − .33)); for those with only the “s” allele they are .11 (.10/(1 − .10)); and for those with only life events they are .20 (.17/(1 − .17)). Therefore the odds ratios would be 4.4 for those with both factors, 1.8 for life events alone, and 1 for the “s” allele alone. The interaction contrast ratio in this context would be 4.4 − 1.8 − 1 + 1 = 2.6. Since the interaction contrast ratio is greater than 0, this indicates the presence of Synergistic Types in the population.
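The arithmetic in this example can be reproduced in a few lines. The sketch below is illustrative only: the four prevalences are those reported in Figure E-10, while the function and variable names are invented here and are not taken from any statistical package. The last step mimics the cohort-study route described above, in which baseline odds and odds ratios are converted back to risks before the interaction contrast is computed.

```python
def odds_from_risk(p: float) -> float:
    """Odds corresponding to a risk p: p / (1 - p)."""
    return p / (1.0 - p)


def risk_from_odds(o: float) -> float:
    """Risk corresponding to odds o: o / (1 + o)."""
    return o / (1.0 + o)


def interaction_contrast(r_both: float, r_a: float, r_b: float, r_none: float) -> float:
    """Risk with both factors, minus each single-factor risk, plus baseline risk."""
    return r_both - r_a - r_b + r_none


def interaction_contrast_ratio(or_both: float, or_a: float, or_b: float) -> float:
    """Odds ratio for both factors, minus each single-factor odds ratio, plus one."""
    return or_both - or_a - or_b + 1.0


# Prevalences reported in Figure E-10.
risk_both, risk_events, risk_gene, risk_none = 0.33, 0.17, 0.10, 0.10

ic = interaction_contrast(risk_both, risk_events, risk_gene, risk_none)

# Odds ratios relative to the group with neither factor.
base_odds = odds_from_risk(risk_none)
or_both = odds_from_risk(risk_both) / base_odds
or_events = odds_from_risk(risk_events) / base_odds
or_gene = odds_from_risk(risk_gene) / base_odds
icr = interaction_contrast_ratio(or_both, or_events, or_gene)

# Cohort-style route: baseline odds times each odds ratio, converted back to risks.
recovered = [risk_from_odds(base_odds * ratio)
             for ratio in (or_both, or_events, or_gene, 1.0)]
ic_recovered = interaction_contrast(*recovered)

print(f"IC = {ic:.2f}, ICR = {icr:.1f}, IC via odds = {ic_recovered:.2f}")
# prints "IC = 0.16, ICR = 2.6, IC via odds = 0.16"
```

Computed from unrounded odds, the interaction contrast ratio is about 2.59, which agrees with the 2.6 obtained from the rounded odds ratios in the text.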
Several methods have been developed to calculate p values and confidence intervals around these estimates (see, e.g., Assmann et al., 1996; Hosmer and Lemeshow, 1992; Rothman and Greenland, 1998).

Although this is the understanding of synergy that is accepted in the methodologic literature, it has begun to filter down into actual research articles only recently (e.g., Li et al., 2005; Olshan et al., 2001; Rauscher et al., 2003; Shen et al., 2005). It is interesting to note that many of these articles assess gene-environment interactions. However, this model of assessing synergy is applicable to genetic interactions only to the extent that the underlying counterfactual causal model is applicable.

APPLICABILITY TO THE STUDY OF GENETIC INTERACTIONS

Applicability of a Counterfactual Approach to a Genetic Context

The counterfactual approach requires a thought experiment in which we hold everything constant and manipulate the exposure to see what the outcome would be under this new condition. The causal contrast, the index of the true effect of the exposure, is the difference between what was, given the exposure, and what would have been had the exposure been altered but everything else remained constant. Because this thought experiment requires the consideration of an alteration in the exposure and nothing else, the applicability of a counterfactual approach to nonmanipulable exposures has been questioned (e.g., Kaufman and Cooper, 1999). Since genes currently are not easily manipulable, this might call into question the applicability of this approach to the consideration of genetic effects. In a similar vein, some have argued that personal characteristics, such as age, gender, ethnicity, and social class, should not be considered causes because they are not manipulable.

However, others (Shadish et al., 2002; Susser and Schwartz, 2005) argue that the counterfactual can apply to nonmanipulable causes, although their detection is more difficult. Nonmanipulable causes cannot be randomly assigned to rule out the many potential sources of nonexchangeability between the exposed and unexposed groups that cause confounding. Nonetheless, at the least, one can conduct the thought experiment and search for, or design, studies that approximate the thought experiment as closely as possible. In addition, what is nonmanipulable today may, in the future, become manipulable.
The use of animal “knock-out models” clearly indicates the possibility of genetic manipulation and, with increasing knowledge, even when the gene itself is not manipulable, the active ingredient of the gene vis-à-vis the disease (the gene product) may be manipulable.

In the final analysis, it seems that in genetic studies that compare people who do and do not have a particular gene variant, or who do and do not have a proxy for a genetic predisposition (e.g., family history), the comparison makes sense only if some notion of a causal contrast underlies it. The association may not reflect causation because of the nonexchangeability of the exposed and unexposed, but the logic of the methods assumes that, barring such methodological problems, the contrast would imply a causal contrast. Otherwise, why do we use such methods to try to detect causes? The counterfactual approach is merely the clear articulation of the framework that supports the logic that underlies all of our study designs.

Primacy of the Genetic Effect

The egalitarian assumptions regarding causation constitute another possible objection to the application of this approach to a genetic context. As discussed above, from a counterfactual perspective, genes, behaviors, and the external environment share equally in the appellation “cause.” There is no hierarchy of enabling factors and triggers versus the “real cause.” This view is in contrast to genetic approaches that see the gene as the central actor, with all other “causes” playing a supporting role.

However, this should not be a significant impediment to the application of epidemiologic approaches to interaction. There are many possible approaches to its resolution. First, one can impose a hierarchy on this approach by declaring a genetic factor to be a necessary cause of the outcome and by defining the phenotype based on the genetic component. As discussed above, there is nothing in this approach that precludes a cause that is found in every causal pie (i.e., in every causal pathway to disease). Of course, if the genetic factor is known to be a necessary cause, the detection of interaction is simplified. In such an instance, one would look for the main effects of a hypothesized causal partner among those with the genetic factor. But even if the genetic factor is not necessary, one could give it prominence by referring to the causal pies that do not contain the genetic effects as phenocopies. Similarly, in discussing the interaction between a genetic and an environmental cause, one can refer to the environmental factor as triggering a genetic effect. These interpretational preferences would not be inconsistent with a counterfactual approach.

On the other hand, the counterfactual approach also suggests that there may be benefits, under some circumstances, to dismantling the hierarchy.
That is, one can often describe a gene-environment interaction equally well as the interaction between an environmental factor and a genetic vulnerability that allows the environmental factor to be expressed or as an interaction between a genetic factor and an environmental context that allows the gene to be expressed. The counterfactual approach points out the symmetry of interaction.

Application to Study Designs Used to Detect Gene-Environment Interactions

Many of the study designs used to detect gene-environment interactions are indistinguishable from those used to detect interactions between environmental or other nongenetic factors. Cohort studies and case-control studies and their variants are prominent designs in both general and genetic epidemiology (Hunter, 2005). Thus, the statistical models used to analyze the data (linear regression, logistic regression, Cox proportional hazards models, and Poisson regression) are used in both fields. The problems and arguments discussed above therefore apply directly.

There are other study designs, however, that have been developed specifically for the assessment of genetic exposures, for example, familial aggregation studies, twin studies, and the case-only design. The problem of the model dependence of interaction applies to these situations as well. In each instance, the data are analyzed using a model that makes some assumption about how independent effects influence risk and therefore about how interaction is indicated. Even case-only studies, which assess gene-environment interactions without the use of controls, make such an assumption. This design is predicated on a multiplicative model. Thus, case-only studies are also conservative if we think that synergy is best indicated by deviations from additivity (Gatto et al., 2004). Twin studies are perhaps the most problematic for assessing interaction, since the genetic and environmental factors are not measured. Their effects are derived from the pattern of results, whose interpretation often requires assuming the absence of interaction. To the best of my knowledge, the basic problem of the model dependence of measures of synergy is not solved by the use of specific genetic designs.

THE MESSINESS OF REAL-WORLD APPLICATIONS

One of the advantages of the counterfactual approach is that it illuminates a central problem of causal inference: causal inference is uncertain. Causality is an unobservable construct that leaves footprints in the real world that are open to misinterpretation.
It is important to note that the counterfactual approach does not cause these problems, but rather articulates them and thus forces us to confront them. But forewarned is forearmed. Once we recognize the reality of the uncertainty and subjectivity of causal inference, we can think about the factors that exacerbate and mitigate these uncertainties and design our studies and analyses accordingly. This approach also should warn us against demanding more of our data than they can provide and against interpreting our data beyond their inherent limitations.

The assessment of synergy is no exception. Our data can provide us with evidence that is consistent or inconsistent with synergy, but they can never provide definitive evidence for or against it. Each study has its own strengths and weaknesses. The most productive approach is to consider all of the extant evidence, consider our uncertainties about the data, and then design new studies that confront those uncertainties directly. No one study will provide us with an answer, but carefully designed studies that directly confront alternative hypotheses will move us toward greater clarity.

All of the threats to validity that apply to the detection of single causes apply to the detection of synergy. Some become even more salient. I will, therefore, only briefly touch on some of the issues that were specifically raised in the mandate for this paper.

Power

Power, the ability to detect an association of a designated magnitude when it exists in the population, is a problem in all studies, but it is particularly problematic for detecting synergy. Power is based on three factors: how well variables are measured, how large the true effect is in the population, and the sample size. It follows, therefore, that we can increase power by measuring our variables well, looking for effects that are large (or looking for them where they are large), and conducting studies with sufficient numbers of people.

The genomic revolution should improve power because the genetic effect is more clearly and closely measured. When family history of a disease, for example, is used as a proxy measure for a genetic effect, the bias toward the null that derives from measurement error is enormous. A true genetic effect of 50 can look like a genetic effect of 2 or less, depending on the prevalence of the outcome and other factors (Zimmerman, 2003). Thus, measuring actual genetic markers decreases measurement error and increases power.

In detecting gene-environment interactions, accurate measurement of the environmental factor is equally important. The more clearly articulated the hypothesis, the more carefully the measures can be chosen, and the more power there will be to detect an effect. Vague theories about gene-environment interactions will be more likely to lead to poor construction of measurable variables and therefore decreased ability to detect synergistic effects.
However, it should be noted that measurement error can masquerade as interaction as well as mask it. Thus, false positive as well as false negative results can be produced by poor measurement.

Power also can be enhanced by looking for situations or populations in which the interaction is strong. As discussed in earlier sections of this paper, the effect of an exposure depends not only on its biologic effects, but also on the prevalence of its causal partners and the number of sufficient causes in which it is not a partner. Therefore, the same biologic effect will be easier to detect in situations in which the other sufficient causes are rare and the causal partners are common. Along these lines, one suggestion for enhancing power regarding main effects is to look for the effect of an exposure in a group in which the outcome is rare (Rothman and Poole, 1988). This would enhance power because the base rate of disease in the unexposed group would be low. Similarly, to detect specific gene-environment interactions for a particular outcome, looking for populations in which the outcome is less common may help. In these situations, the same biologic effects will produce a larger risk ratio. For instance, an exposure that adds 5 percentage points of risk yields a risk ratio of 6 when the baseline risk is 1 percent, but only 1.25 when the baseline risk is 20 percent.

Power also should be a consideration in the choice of study design. For the same number of people, case-control studies will in general provide more power when the outcome is rare, and cohort studies will provide more power when the exposures are rare. But whatever the choice, sample sizes need to be sufficient. Articulating hypotheses in advance has the added advantage of providing the basis for more accurate power estimates. However, methods for estimating power are less developed for synergy than they are for main effects, although some work has been done on proper power analyses for both additive and multiplicative interaction (e.g., De Gonzalez and Cox, 2005; Greenland, 1983). What is clear is that the detection of interaction requires considerably larger sample sizes than the detection of the exposures’ main effects.

Multiple Comparisons

Multiple comparisons may be a particular problem regarding interaction because researchers are less likely to hypothesize interactions in advance. This raises the concern that we will increase the number of Type I errors in our studies; that is, we will frequently reject the null in error. Some have suggested an adjustment to our alpha levels (e.g., Bonferroni adjustments) to take multiple comparisons into account. To fully address this issue requires a detailed discussion of the meaning of p values and confidence intervals, which is beyond the scope of this paper. I will, however, touch on some issues to consider.
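To make the concern about Type I errors concrete, the following sketch shows how the chance of at least one false rejection grows with the number of comparisons, and what a Bonferroni adjustment does to it. It is an illustration under a simplifying assumption that the paper does not make: the comparisons are treated as statistically independent and every null hypothesis is taken to be true. The function names and the choice of 10 comparisons are hypothetical.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false rejection across n_tests
    independent tests when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test alpha level under a Bonferroni adjustment."""
    return alpha / n_tests


alpha, n_tests = 0.05, 10  # illustrative values only

unadjusted = familywise_error_rate(alpha, n_tests)
adjusted = familywise_error_rate(bonferroni_alpha(alpha, n_tests), n_tests)

print(f"unadjusted: {unadjusted:.3f}, Bonferroni-adjusted: {adjusted:.3f}")
# prints "unadjusted: 0.401, Bonferroni-adjusted: 0.049"
```

The adjustment holds the overall error rate near the nominal level, but only by making each individual test much stricter, which bears on the limited power to detect interaction discussed in the previous section.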
The use of adjustments to the alpha level to correct for multiple comparisons reifies the p value and potentially contributes to a misuse of null hypothesis testing. Null hypothesis testing tells us the probability of our data if the null is true. What we really want to know is the probability that the null is true, given our data. Unfortunately, these two probabilities are not the same. There is a tendency, however, to treat significant results as though they told us the latter rather than the former. In addition, p values do not strictly apply in the context of observational studies because the statistical premises on which they are constructed are often violated in nonexperimental settings.

For both reasons, the use of confidence intervals rather than p values is preferred. Confidence intervals provide a rough estimate of the precision of our data. Wide confidence intervals tell us that our data do not provide much information about the effect. Narrow confidence intervals suggest that our data are more precise. Of course, there may be confounding and other biases reflected in our estimates, but the estimate of the association is more trustworthy. The use of confidence intervals, with a statement of the number of comparisons made, provides more information for the reader to decide how seriously to take the results of a study. But nothing solves the problem of multiple comparisons. Data that are consistent with well-formulated hypotheses that are developed in advance of the study provide better evidence than data that result from studies for which the hypotheses are developed after the fact.

Population Stratification

From an epidemiologic perspective, population stratification is simply confounding; the exposed and unexposed may differ for reasons other than the exposure under study. One advantage of genetic epidemiology is that the confounders of genetic associations are limited, and the more carefully specified and measured the genetic factor, the more limited the potential sources of nonexchangeability. For example, if you measure a genetic effect by a family history of the outcome, the exposed and unexposed may differ on a large number of factors other than the exposure of interest, which is a genetic effect. However, if you measure the exposure as a particular genetic variant or marker, it becomes less likely that there will be nonexchangeability of the exposed and unexposed on other causes of disease beyond what would occur by chance.

Population stratification is simply confounding that arises because the groups with unequal distributions of a particular genetic variant also have unequal distributions of other risk factors for the disease. The problem is exacerbated in case-control studies, in which the selection of cases and controls can create population stratification even when it does not exist in the naturally occurring populations that gave rise to the cases, as is true with most problems of confounding.
In a cohort study, population stratification would be more easily detected and controlled.

To the extent that population stratification is a problem in studies of single exposures, it will be a problem in studies of synergy. Just as studies of single exposures require the exposed and unexposed cohorts to be exchangeable regarding all causes of the disease other than the exposure of interest, so too the assessment of synergy requires that all four exposure cohorts (exposed to both factors, each of the two alone, and neither) be exchangeable regarding all causes of disease other than the two under investigation.

CONCLUSION

Epidemiologic approaches to biologic interaction have benefited from a full articulation of the underlying causal assumptions of risk factor epidemiology. The counterfactual or potential outcomes approach clarifies methodologic principles and provides a guide for methodologic choices. This starting point suggests that risks add in their effects. Therefore, synergy is best indicated by deviations from an additive rather than a multiplicative model, with a twist. I think that this model applies as well to genetic causes as it does to the environmental and behavioral causes that are more frequently examined in traditional epidemiologic contexts. It has the added advantage of clarifying and unifying other genetic constructs, providing a basis for understanding confounding in general and population stratification in particular, and providing a bridge between genetic and risk factor epidemiology. Although this approach has limitations, the transparency of its conceptual basis makes the limitations transparent as well. It brings the limitations inherent in all forms of causal inference to light, making them more amenable to amelioration.

ACKNOWLEDGMENTS

Ann Madsen reviewed and provided helpful comments on this paper. Many of the ideas and examples in this paper derive from Schwartz and Susser (in press), “Causal Explanation Within a Risk Factor Framework,” Chapter 35 in Susser, Schwartz, Morabia, and Bromet, Psychiatric Epidemiology: The Search for Causes of Mental Disorders, Oxford University Press.

REFERENCES

Assmann SF, Hosmer DW, Lemeshow S, Mundt KA (1996). Confidence intervals for measures of interaction. Epidemiology 7:286-290.
Brown GW, Harris T (1978). Social origins of depression: a reply. Psychological Medicine 8:577-588.
Caspi A, Sugden K, Moffitt TE, et al. (2003). Influence of life stress on depression: moderation by a polymorphism in the 5-HTT gene. Science 301:386-389.
Darroch J (1997). Biologic synergism and parallelism. American Journal of Epidemiology 145:661-668.
De Gonzalez AB, Cox DR (2005). Additive and multiplicative models for the joint effect of two risk factors. Biostatistics 6:1-9.
Gatto NM, Campbell UB, Rundle AG, Ahsan H (2004). Further development of the case-only design for assessing gene-environment interaction: evaluation of and adjustment for bias. International Journal of Epidemiology 33(5):1014-1024.
Greenland S (1983). Test for interaction in epidemiologic studies: a review and a study of power. Statistics in Medicine 2:243-251.
Greenland S, Robins JM (1986). Identifiability, exchangeability, and epidemiological confounding. International Journal of Epidemiology 15:413-419.
Hosmer DW, Lemeshow S (1992). Confidence interval estimation of interaction. Epidemiology 3:452-456.
Hunter DJ (2005). Gene-environment interactions in human diseases. Nature Reviews 6:287-298.
Kaufman JS, Cooper RS (1999). Seeking causal explanations in social epidemiology. American Journal of Epidemiology 150:113-120.
Levins R, Lewontin R (1985). The Dialectical Biologist. Cambridge, MA: Harvard University Press.
Li Y, Millikan RC, Bell DA, Cui L, Tse CJ, Newman B, Conway K (2005). Polychlorinated biphenyls, cytochrome P450 1A1 (CYP1A1) polymorphisms, and breast cancer risk among African American women and white women in North Carolina: a population-based case-control study. Breast Cancer Research 7:R12-R18.
Mackie JL (1974). Cement of the Universe. Oxford, England: Clarendon Press.
Maldonado G, Greenland S (2002). Estimating causal effects. International Journal of Epidemiology 31:422-429.
Olshan AF, Weissler MC, Watson MA, Bell DA (2001). Risk of head and neck cancer and the alcohol dehydrogenase 3 genotype. Carcinogenesis 22:57-61.
Rauscher G, Sandler DP, Poole C, Pankow J, Shore D, Bloomfield CD, Olshan AF (2003). Is family history of breast cancer a marker of susceptibility to exposures in the incidence of de novo adult acute leukemia? Cancer Epidemiology, Biomarkers and Prevention 12:289-294.
Rothman KJ (1976). Reviews and commentary: causes. American Journal of Epidemiology 104:587-592.
Rothman KJ, Greenland S (1998). Modern Epidemiology, 2nd ed. Philadelphia, PA: Lippincott-Raven.
Rothman KJ, Poole C (1988). A strengthening programme for weak associations. International Journal of Epidemiology 17:955-959.
Rubin DB (2004). Direct and indirect causal effects via potential outcomes. Scandinavian Journal of Statistics 31:161-170.
Shadish WR, Cook TD, Campbell DT (2002). Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Boston, MA: Houghton Mifflin.
Shen J, Gammon MD, Terry MB, Wang L, Wang Q, Zhang F, Teitelbaum SL, Eng SM, Sagive SK, Gaudet MM, Neugut AI, Santella RM (2005). Polymorphisms in XRCC1 modify the association between polycyclic aromatic hydrocarbon-DNA adducts, cigarette smoking, dietary antioxidants, and breast cancer risk. Cancer Epidemiology, Biomarkers and Prevention 14:336-342.
Susser E, Schwartz S (2005). Are social causes so different from all other causes? A comment on Sander Greenland. Emerging Themes in Epidemiology 2:4.
Tennant C, Bebbington P (1978). The social causation of depression: a critique of the work of Brown and his colleagues. Psychological Medicine 8:565-575.
Zimmerman R (2003). Familial aggregation study designs: causes of discrepancies in case-control and reconstructed cohort effect estimates. Ph.D. Dissertation: Columbia University, AAT 3071403.