Appendix C
Human Study Designs
Broadly, epidemiological studies can be categorized into three types: descriptive studies that focus on the occurrence of disease or health-related states in specific populations or representative samples, analytical studies designed to assess associations or test hypotheses about risk factors or exposures and health outcomes, and experimental trials in which investigators randomly assign exposures to treatment groups. Table C-1 lists the available designs by type of study.
TABLE C-1 Available Designs by Type of Study

| Descriptive | Analytical | Experimental |
| --- | --- | --- |
| Case series | Case-control | Community |
| Cross-sectional | Cohort | Randomized clinical |
| Ecological | | |
Descriptive studies do not formally test hypotheses; rather, they generate hypotheses by characterizing the occurrence of disease in relation to research questions. As such, descriptive designs cannot assess causality. One common descriptive design that offers considerable information on selected outcomes, such as birth defects, is the case series. As the name suggests, this design encompasses a series of cases with the same outcome; there is no comparison group. This type of study can raise suspicion of an association and, in fact, has been instrumental in identifying certain adverse effects (e.g., diethylstilbestrol and vaginal adenocarcinoma). Indeed, much of the available data on adverse outcomes and pharmaceutical compounds comes from case series designs. Cross-sectional studies measure exposures and outcomes at the same point in time. Correlational or ecological studies attempt to correlate an exposure with an outcome at the group or population level. Individual case studies or case series also are used. Most descriptive studies compare disease or health-related endpoints in relationship to a specific exposure or risk factor. Because comparison groups vary with regard to other factors associated with the exposure, further assessment of associations is needed, and causality cannot be determined.
Analytical studies include cohort (prospective and retrospective) and case-control (retrospective) types. Several hybrid designs exist as well, such as those that use retrospective cohorts. Case-control studies might be matched or unmatched in the design phase; matched-cohort studies are relatively rare, despite offering improved efficiency over other designs (Rothman and Greenland 1998). The major distinction between cohort and case-control designs is that cohort studies begin with the exposure and follow individuals to ascertain incident or new cases of disease. In this regard, the investigator has confidence in the temporal ordering between exposure and outcome. Case-control studies, on the other hand, start with disease status and retrospectively ascertain exposure, and so may be subject to biases associated with the retrospective collection of data (e.g., recall bias).
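The cohort/case-control distinction maps onto two different measures of association: because a cohort study follows defined exposure groups, it can estimate disease risk directly and report a relative risk, whereas a case-control study samples on disease status and can estimate only an odds ratio. A minimal sketch, using hypothetical 2×2 counts (the function names and numbers are illustrative, not from the text):

```python
# Contrast between the cohort and case-control measures of association,
# using hypothetical 2x2 counts: a = exposed cases, b = exposed non-cases,
# c = unexposed cases, d = unexposed non-cases.

def relative_risk(a, b, c, d):
    """Cohort measure: risk of disease in the exposed vs. the unexposed."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Case-control measure: odds of exposure in cases vs. controls."""
    return (a * d) / (b * c)

a, b, c, d = 30, 970, 10, 990  # hypothetical counts for a rare outcome
print(round(relative_risk(a, b, c, d), 2))  # 3.0
print(round(odds_ratio(a, b, c, d), 2))     # 3.06
```

For an uncommon outcome, as here, the odds ratio closely approximates the relative risk, which is one reason the case-control design remains useful despite its retrospective nature.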
Experimental designs include randomized clinical (or community) trials and are considered the most scientifically desirable designs available to epidemiologists. These designs ensure the temporal ordering between exposure and outcome and minimize confounding through randomization, thereby maximizing the internal validity of findings; external validity, however, may be limited. Such designs have limited applicability to environmental and occupational epidemiology, given that exposures typically cannot be randomly assigned. Few “natural” experiments occur in which a particular subgroup of the population is exposed while others are not, and even in such instances exposure is not randomized.
Scientifically sound epidemiological studies adhere to the essential elements of the epidemiological method:
1. Formulation of a well-defined research question or study hypothesis suitable for testing.
2. Description of the referent population or representative (probability) sample.
3. Use of a standardized methodology for data collection (exposure, outcome, effect modifiers, confounders).
4. Application of a well-described and appropriate analytical plan.
5. Careful interpretation of the data using an established paradigm for assessing causality.
After the choice of study design has been carefully weighed, the existing literature should be used to ground the hypothesis within a theoretical framework, enhancing biological plausibility in the interpretation of results. Characterization and selection of the study population or representative (probability) sample requires careful consideration. Random-sampling techniques should be used to ensure that each individual in the referent population has an equal chance of being selected; that approach minimizes bias and thereby enhances the validity of findings. Careful attention must be given to inclusion and exclusion criteria, which can render a sample too restrictive and thereby limit external validity (generalizability). The effect of occupational exposures on reproductive and developmental outcomes is of added interest in that employed females might include a higher proportion of sub- or infertile women than is found in the general population. If fertile women leave the work force for child-bearing, bias can be introduced into the study, resulting in the “infertile worker effect” (Joffe 1985).
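The equal-probability selection described above corresponds to a simple random sample drawn without replacement. A minimal sketch, with a hypothetical population and sample size:

```python
import random

# Draw a simple random (probability) sample: each individual in the
# hypothetical referent population has the same chance of selection.
random.seed(1)  # fixed seed so the illustration is reproducible
population = [f"subject_{i}" for i in range(10_000)]
sample = random.sample(population, k=500)  # sampling without replacement
print(len(sample), len(set(sample)))  # 500 500 (no subject selected twice)
```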
Use of a standardized methodology for data collection is critical so that information is gathered uniformly for all study subjects, regardless of exposure or disease status. Information must be collected on exposure, outcome, and effect modifiers or confounders. Several methods are available for ascertaining information on exposures and outcomes, including self-reported data obtained in personal or telephone interviews, self-administered questionnaires, diaries, observation, existing records, actual physical measurements, and collection of biological specimens (Armstrong et al. 1995).
National and state registries provide another source of data that can be used for epidemiological studies that assess the effects of particular exposures. All states maintain records in fetal death and live birth and death registries, and all are population based. Some states have cancer or birth defects registries. With the exception of live birth and death registries, states vary in their mechanisms (active versus passive) and requirements for surveillance. There also is variability in which developmental defects are ascertained and how they are classified. Hence, registry data are not all comparable. Most United States registries do not have exposure data readily available for analysis of outcomes, so this information must be collected retrospectively or surrogate information, such as parental occupation or residence location, must be used. For European countries with centralized health care systems, some prospectively collected exposure data can be linked to other registries, such as birth defect or live birth registries. The increasing frequency of pregnancy termination when prenatal diagnosis detects fetal anomalies could undermine the accuracy of registry data. Hence, the birth prevalence of malformations may be estimated, but the incidence remains unknown.
Pregnancy registries established for postmarketing surveillance of pharmaceutical substances could offer some information for assessing reproductive and developmental toxicity. This data source could have limited validity, however, because only a proportion of affected or exposed women are included in such registries. The representativeness of registry data will be determined in part by the prevalence of the exposure across the population at risk, the voluntary nature of the registry, the type of sponsor (industry, government, university), and reporting and surveillance mechanisms. Similar concerns could affect data obtained by following up women who contact teratogen information services because of concerns about possible exposures.
Registries could offer some preliminary information about the distribution and determinants of a few reproductive and developmental outcomes (fetal death, live births, birth defects), but often additional information on exposures and the precise nature of the adverse outcomes will need to be collected. Registry data are simply not available for most fecundity-related outcomes indicative of male and female reproductive health (conception delay, early pregnancy loss).
The quality of exposure data varies across epidemiological studies, especially those concerned with environmental exposures. Many earlier epidemiological studies relied exclusively on self-reported or proxy measures of exposure (e.g., residence). For example, a study of the effect of air pollution on respiratory health might have compared ambient-air concentrations of pollutants with rates of respiratory disease. Without individual measurements, it is hard to know who was truly exposed (or not exposed). More recently, there is a growing trend toward collecting biological specimens suitable for estimating exposure. If there are inadequate resources for measuring exposures for all study participants, epidemiologists often will stratify subjects by estimated exposure and randomly select subjects for more detailed study of exposure status (with biological specimens). Simulation techniques can be used to evaluate how well the associations from exposure biomarker studies are upheld as theoretical sample sizes are increased.
Other important concerns with respect to exposure assessment include ensuring the temporal ordering of the exposure-to-outcome relationship and assessment of dose-response effects. If effects are interpreted as causal, then the temporal ordering of exposure to outcome must be established. A spectrum of reproductive and developmental outcomes might be possible, depending on critical windows of development. For example, a fetus exposed to thalidomide in the first trimester is at increased risk for phocomelia; the same exposure in the last trimester does not increase that risk. Although evidence of a dose-response relationship is important for assessing causality, often such relationships are lacking. Moreover, a high early pregnancy loss rate (or sterility) among those most heavily exposed might produce an inverse dose-response relationship, making the exposure appear to protect against malformations or other adverse pregnancy outcomes. Consideration of potential fertility bias (Weinberg et al. 1994) is needed. Continuous quality control ensures the validity and reliability of data, and often a proportion of individuals are selected for a formal study of validity (e.g., confirmation with another data source, such as medical records or biomarkers) and reliability (e.g., individuals are queried at different points in time about exposures).
The fourth element of the epidemiological method is a well-developed analytical plan, which might be modified over the course of the study. The plan must be appropriate for the study design and hypothesis under study and must address the type of data collected and scale of measurement, the completeness of the data (e.g., percentage of missing data), the distributions of variables, the appropriateness of the assumptions that underlie statistical techniques for the data set, potential effect modification or confounding, and statistical significance testing for sample data.
If multiple comparisons are made, it might be necessary to adjust for them (e.g., by Bonferroni procedures) to ensure the validity of the findings.
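A Bonferroni adjustment, for example, multiplies each raw p-value by the number of comparisons before testing against the nominal significance level. A small sketch with hypothetical p-values:

```python
# Bonferroni adjustment sketch: each raw p-value from m comparisons is
# multiplied by m (capped at 1.0) and then compared with alpha = 0.05.
# The raw p-values below are hypothetical.
raw_p = [0.003, 0.020, 0.049, 0.300]
m = len(raw_p)
adjusted = [min(p * m, 1.0) for p in raw_p]
significant = [p_adj < 0.05 for p_adj in adjusted]
print(adjusted)     # [0.012, 0.08, 0.196, 1.0]
print(significant)  # [True, False, False, False]
```

Note that a comparison that would be nominally significant on its own (p = 0.049) is no longer significant after adjustment.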
The last step in the epidemiological method is the careful interpretation of findings. All alternative explanations (chance, random error, bias) must be considered carefully and eliminated in assessing causality. Negative findings should receive the same careful consideration as positive findings. To that end, a priori power calculations are extremely useful for establishing the sample size needed to detect an effect and for assessing the risks of Type I and Type II errors.
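As one illustration of an a priori power calculation, the standard normal-approximation formula for comparing two independent proportions gives the required sample size per group. The effect size, alpha, and power below are hypothetical choices for illustration, not values from the text:

```python
from math import sqrt
from statistics import NormalDist

# A priori sample-size sketch for comparing two proportions (two-sided
# test), using the usual normal-approximation formula.
def n_per_group(p1, p2, alpha=0.05, power=0.80):
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # controls Type I error
    z_beta = NormalDist().inv_cdf(power)           # controls Type II error
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return numerator / (p1 - p2) ** 2

# e.g., power to detect an increase in an adverse-outcome rate from 5% to 10%
print(round(n_per_group(0.05, 0.10)))
```

Performing such a calculation before data collection makes explicit how large a study must be for a negative finding to be informative.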
Interpretation of findings requires evaluation of bias (systematic distortion), random error (noise), confounding (distortion produced by a third factor associated with both exposure and outcome), synergism (interaction of two or more causal factors to produce effects greater than the sum of individual effects), and effect modification (direction and strength of an association depending on a third variable) (Jekel et al. 1996). Bias is a major threat to validity that can weaken or distort a true relationship between an exposure and disease or even produce a spurious one. Common sources of bias include the selection of study participants (selection bias), sources of information (information bias), and misclassification of subjects by disease or by exposure status (misclassification bias). If subjects are randomly misclassified, effects will be underestimated (biased toward the null), which might result in erroneously negative findings. Nonrandom or differential misclassification, however, might produce effects that are either over- or underestimated. A few statistical techniques exist for addressing bias (e.g., covariance adjustment and causal modeling), but minimizing bias in the design phase is preferable to post hoc statistical adjustments.
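The bias-toward-the-null behavior of random (nondifferential) misclassification can be seen directly from expected cell counts; the 2×2 counts and the 20% error rate below are hypothetical:

```python
# Deterministic sketch of how random (nondifferential) exposure
# misclassification biases an observed association toward the null.
# 20% of subjects in each cell are assumed to have exposure recorded
# incorrectly, regardless of disease status.
def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

a, b, c, d = 100, 400, 50, 450  # hypothetical true counts
flip = 0.20                     # same error rate in cases and non-cases

# Expected observed counts after misclassification of exposure:
a_obs = a * (1 - flip) + c * flip  # cases recorded as exposed
c_obs = c * (1 - flip) + a * flip  # cases recorded as unexposed
b_obs = b * (1 - flip) + d * flip  # non-cases recorded as exposed
d_obs = d * (1 - flip) + b * flip  # non-cases recorded as unexposed

print(odds_ratio(a, b, c, d))                            # 2.25 (true)
print(round(odds_ratio(a_obs, b_obs, c_obs, d_obs), 2))  # 1.61 (observed)
```

The observed odds ratio (1.61) sits between the true value (2.25) and the null value of 1.0, which is the attenuation the text describes.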
Random error can lead to over- or underestimation of risk but is generally not as severe a problem as bias; moreover, its magnitude can be estimated with statistical techniques. Assessment of confounding, synergism, or effect modification can be accomplished in the analytical phase (by stratification or multivariate modeling), provided sufficient data have been collected on those factors. Restriction or randomization procedures also can be used in the design phase to minimize confounding.
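Stratification as a check for confounding can be sketched by comparing a crude odds ratio with a Mantel-Haenszel summary odds ratio. The counts below are hypothetical and constructed so that the third (stratifying) factor accounts for the entire crude association:

```python
# Assessing confounding by stratification: compare the crude odds ratio
# with a Mantel-Haenszel summary odds ratio computed within strata of a
# third factor. All counts below are hypothetical.
def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

# Per stratum: (exposed cases, exposed non-cases,
#               unexposed cases, unexposed non-cases)
strata = [(50, 50, 10, 10),    # third factor present
          (10, 90, 50, 450)]   # third factor absent

crude = odds_ratio(*(sum(cells) for cells in zip(*strata)))
totals = [sum(s) for s in strata]
or_mh = (sum(a * d / t for (a, b, c, d), t in zip(strata, totals))
         / sum(b * c / t for (a, b, c, d), t in zip(strata, totals)))
print(round(crude, 2), round(or_mh, 2))  # 3.29 1.0
```

Here the crude analysis suggests a strong association, but the stratum-adjusted estimate is null: the apparent effect is produced entirely by the third factor, the signature of confounding.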
Causality can be considered in analytical or experimental epidemiological studies. Doing so involves assessing the statistical association, the temporal relationship between exposure and outcome, and the elimination of other potential explanations, such as chance or bias (Jekel et al. 1996). Several different sets of criteria for assessing causality exist; Hill's criteria (Hill 1965) are cited often. The existing paradigms for assessing causality have been reviewed by Weed (1995) and are illustrated in Table C-2. Even with explicit criteria, however, scientists vary in the degree to which they use the criteria and in how they interpret them (Weed 1997). Formalized strategies for weighing scientific evidence therefore should be considered to assist in the interpretation of available information (Weed 1997).
TABLE C-2 Paradigms for Assessing Causality

| Lilienfeld (1959) | Sartwell (1960) | Surgeon General (1964); Susser (1973) | Hill (1965) | MacMahon and Pugh (1970) |
| --- | --- | --- | --- | --- |
| Consistency | Replication | Consistency | Consistency | Strength of association (including magnitude of association and dose response) |
| Magnitude of effect | Strength of association | Strength of association (including magnitude of effect of dose-response) | Strength of association | |
| Dose-response | Dose-response | | Biological gradient | Temporality |
| Experimentation | Temporality | Temporality | Temporality | Experimentation |
| Biological mechanism | Biological reasonableness | Biological coherence | Experimental evidence | Consonance with existing knowledge |
| | | Specificity | Biological plausibility | Biological mechanism |
| | | | Biological coherence | Consistency |
| | | | Specificity | Exclusion of alternative explanations |
| | | | Analogy | |