Assessing the Human Health Risks of Trichloroethylene: Key Scientific Issues

2 Methodological Considerations in Evaluating the Epidemiologic Literature on Cancer and Exposure to Trichloroethylene

There are numerous epidemiologic investigations available on cancer outcomes and exposure to trichloroethylene. How to consider the findings of multiple studies that differ in design, quality, and outcome has been identified as one of the critical aspects of conducting a hazard characterization of trichloroethylene. In this chapter, the committee provides generic guidance on evaluating epidemiologic studies on trichloroethylene, including guidance on identifying relevant epidemiologic studies, evaluating their strengths and weaknesses, and qualitative methods for evaluating the data (e.g., the Hill guidelines on assessing causality). Quantitative methods for combining and summarizing epidemiologic data (i.e., meta-analytical approaches) are discussed, and a review is provided of two available analyses that used such quantitative approaches to evaluate the data. The chapter provides targeted recommendations for how those quantitative assessments can be improved upon in a new meta-analysis. An example of how this chapter’s guidance should be applied is provided in the committee’s assessment of the epidemiologic literature on kidney cancer presented in Chapter 3; the guidance should also be applied to other outcomes (see Chapters 3-8). An important area of future review will be lymphoid cancers, particularly non-Hodgkin’s lymphoma and childhood leukemia, which were topics the committee was unable to address during the course of its study.

HEALTH OUTCOMES

Epidemiologic studies of etiology are used to answer questions about whether antecedent exposures in populations increase the risk of developing specific health outcomes. A variety of health outcomes associated with trichloroethylene is discussed in Chapters 3 to 8. At least three levels of health outcomes should be considered in assessing the human health risks associated with exposure to trichloroethylene: biomarkers of effects and susceptibility, morbidity, and mortality.

Few known susceptibility biomarkers specific to trichloroethylene have been assessed in humans. In the case of liver toxicity (see Chapter 4), incipient effects on the liver could be measured by changes in liver enzymes in the serum, although significant toxicity would have to be present for these measurements to be useful. Assessment of immune function may have a place in assessing adverse effects of trichloroethylene (see Chapter 8), but this outcome is nonspecific (Iavicoli et al. 2005). Human studies on proteinuria and other early markers of kidney toxicity are important (see Chapter 3). However, none of these potential biomarkers is specific to trichloroethylene.

High occupational or accidental exposure to trichloroethylene can produce toxicity, in particular, liver and central nervous system effects. The public-health review process focuses on more subtle effects resulting from exposures to lower concentrations. These morbidity outcomes can be in the form of cancer and noncancer outcomes. Many nonfatal, noncancer health end points are poorly measured and the few studies are difficult to interpret, mostly because current health monitoring systems are not set up to easily link health outcome data to exposure. On the other hand, cancer incidence is enumerated much more accurately by tumor registries, which usually have high diagnostic accuracy (histologic assessment of tumor location and tumor type).
Alternatively, histologically confirmed cases of cancer (except for nonmelanotic skin cancer) can be identified through records in hospital pathology departments, which may be useful in two ways. First, they provide the cases for case-control studies, the method of choice to assess rare tumors, such as childhood cancers. Second, they match cohorts to tumor registries where the cohort members reside.

Mortality is readily identified from death certificates, which are collected routinely on a jurisdictional basis (e.g., state or province) and collated nationally. This outcome has the advantage of having complete national coverage but diagnostic accuracy is reduced because the attending physicians who fill out the certificates usually do not have the benefit of histologic diagnosis or autopsy findings. Most cohort studies rely on mortality data for risk assessment. It must be recognized that diagnostic accuracy from death certificates varies by the specific diagnosis (Brenner and Gefeller 1993). Disease classification systems are also periodically revised, adding to diagnostic inconsistency (Irons 1992).

The issue of changes in diagnostic coding systems is illustrated for the classification of lymphatic and hematopoietic cancers (the non-Hodgkin’s lymphomas). As noted by the Institute of Medicine (IOM 2003), revisions 7 and earlier of the International Classification of Diseases did not have specific rubrics for some diseases, such as acute leukemia, but did have codes for lymphosarcoma and reticulosarcoma (ICD-200), Hodgkin’s disease (ICD-201), and lymphatic leukemia (ICD-204). Because of the lack of numbers for specific types of tumors, in older cohort studies all lymphatic and hematopoietic neoplasms were grouped together instead of handled as individual types of cancer (such as Hodgkin’s disease) or specific cell types (such as acute lymphocytic leukemia). The amalgamation of these relatively rare cancers would increase the apparent sample size but could result in diluted estimates of effect if the different sites of cancer were not associated in similar ways with the exposures of interest. In addition, before the use of immunophenotyping to distinguish ambiguous diseases, diagnoses of these cancers may have been misclassified; for example, non-Hodgkin’s lymphoma (NHL) may have been misclassified as Hodgkin’s disease (HD) (Irons 1992). Misclassification of specific types of cancer, if unrelated to exposure, would have attenuated estimates of relative risk and reduced statistical power to detect associations. When the outcome was mortality, rather than incidence, misclassification would be greater because of the errors in the coding of underlying causes of death on death certificates (IOM 2003, p. 282). Thus, older studies that combined all lymphatic and hematopoietic neoplasms must be interpreted with care.

Age and gender, two important factors influencing outcome, must be considered when assessing the risks associated with exposure to trichloroethylene. Cancer incidence varies widely by age; for example, children have different leukemia subtypes than adults.
Age likely influences susceptibility to a number of environmental toxic materials both directly and indirectly through behavioral patterns, such as indoor-outdoor times, respiratory ventilation rates, and eating habits. Men and women have obvious differences in disease outcomes epitomized by diseases affecting sex organs. Again there are innate differences as well as differences that might be attributable to behavioral and environmental factors, such as exercise and occupation. For evaluating childhood disease risk, one must consider transmission of risk from the mother or the father. Obvious gender differences are in play again, such as in utero exposure and exposure to toxins in mother’s breast milk. Risk from germinal transmission could apply to either parent but again differences exist between ova and sperm formation, allowing potential differences in transmissible toxic risks. Such issues related to trichloroethylene are presented in Chapters 5 and 9.

DESIGNS OF EPIDEMIOLOGIC STUDIES

The main study designs used in epidemiology to assess etiology are the cohort study and the case-control study; other designs used in epidemiology are case studies, ecologic studies, and cross-sectional studies. The cohort and case-control study designs can provide sufficiently high-quality data to determine whether there are associations between sites of cancer and previous exposure to trichloroethylene. Assessing causality from such associations can then be considered if there has been a suitable exposure assessment and if bias can be eliminated as a reason for observing these associations.

Case studies (or case series) are not useful for estimating exposure-response relationships, because they do not make use of a reference population and therefore do not provide estimates of risk, incidence, or mortality rates. Case studies may be useful for developing hypotheses and may have some relevance for identifying hazards, particularly when a disease is extremely rare (e.g., angiosarcoma and vinyl chloride) and a few cases in a population with a common exposure may suggest an increased risk.

Ecologic studies are used to estimate correlations between rates of cancer in geographically circumscribed populations and exposure measured at the geographic level. It is important to distinguish between “pure” ecologic studies and other types of analytic studies, which have data on an individual level but make use of an exposure variable that is assigned uniformly to all subjects in specific areas. In the latter types of studies, which are not to be classified as ecologic studies, it is assumed that it is valid to assign one level of exposure to all subjects in a geographic area, although there may be some inherent misclassification because not all are exposed uniformly.
If the measurement error is independent of geographic area (e.g., county), then risk estimates will usually be attenuated. For the pure ecologic studies of end points in which there are no individual data and there are other important risk factors, the main methodological issue is bias from uncontrolled confounding (referred to as the “ecologic fallacy” or cross-level bias). This bias may occur because an association observed between variables measured on an aggregate level does not necessarily represent an association at the individual level (see Morgenstern 1998). A quintessential example is found in the literature on radon and lung cancer, where rates of lung cancer in U.S. counties showed a negative association with average concentrations of radon measured in the counties (Cohen and Colditz 1994), whereas the individual case-control and cohort studies show positive exposure-response patterns (NRC 1988). The cross-level bias in this example is likely due to the nonlinear exposure response measured on an individual level and to confounding by smoking (Greenland and Robins 1994). For diseases with only one major risk factor, ecologic studies may provide accurate estimates of risk at the individual level; for example, the original study by Snow (1856) on cholera in London was ecologic and was not subject to the ecologic fallacy because cholera has only one cause.

Cross-sectional studies provide a snapshot of the prevalence, but not incidence, of health conditions in a specific population at one point, or over a short period, in time. The prevalence of cancer in subjects who may or may not have been exposed to trichloroethylene can be compared as prevalence proportions. Incidence rate ratios (or differences), the main etiologic parameters of interest, cannot be estimated if the prevalence of disease is related to the duration of disease. Cross-sectional studies are rarely useful for studying cancer because of this issue and because of possible selection biases in the underlying cohort that provides the sources of the population (e.g., selection of study participants to assess the prevalence of kidney cancer that may be related to duration of cancer and also to exposure).

The cohort study is the principal methodological paradigm describing all analytical epidemiologic study designs in that the other designs differ from the cohort study only in the way subjects are sampled from an explicitly defined or an implicitly defined cohort. Explicitly defined cohorts include, for example, occupational populations for which a roster is established and subjects are followed over time. In this type of study, incidence (mortality) rates are estimable directly from following the population through time, thereby assessing vital status as well as the health outcomes of interest; these rates can be compared by the estimated exposure, adjusting for potential confounding factors. Exposure can be defined at the beginning of follow-up (or earlier, say at the beginning of employment) or reevaluated through time.
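The direct estimation of rates from cohort follow-up can be illustrated with a small calculation. The sketch below is not taken from any study discussed in this chapter: the case counts and person-years are hypothetical, the function name is an assumption, and the confidence interval uses a simple Wald approximation on the log scale that treats case counts as Poisson distributed. It ignores adjustment for confounding factors.

```python
import math

def rate_ratio(cases_exposed, py_exposed, cases_unexposed, py_unexposed, z=1.96):
    """Crude incidence rate ratio with a Wald-type interval on the log scale."""
    rate_exposed = cases_exposed / py_exposed          # cases per person-year
    rate_unexposed = cases_unexposed / py_unexposed
    rr = rate_exposed / rate_unexposed
    # Standard error of log(rate ratio), assuming Poisson case counts
    se_log_rr = math.sqrt(1 / cases_exposed + 1 / cases_unexposed)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical cohort: 30 cases in 20,000 person-years among the exposed,
# 40 cases in 50,000 person-years among the unexposed.
rr, lower, upper = rate_ratio(30, 20_000, 40, 50_000)
print(f"rate ratio = {rr:.2f}, 95% CI = {lower:.2f}-{upper:.2f}")
```

The width of the interval is driven by the number of cases rather than the number of person-years, which is one reason cohort studies of rare cancers tend to yield imprecise estimates.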
In principle, other risk factors can also be assessed so that confounding bias can be eliminated through statistical adjustments. The nested case-control design is used usually to reduce the costs of obtaining information not available on the cohort roster (e.g., smoking information) and incidence density sampling is used to produce odds ratios that are unbiased estimators of the rate ratio (although they usually have larger standard errors than a full cohort analysis). The nested case-control study has major advantages for exposure assessment because only a sample of subjects needs to be assessed. In some cases, the exposure histories of the cohort have been determined by an exposure assessment, and then the nested case-control study is usually used to assess secondary data that may confound the exposure effect. The nested case-control study is not to be confused with population- or hospital-based case-control studies that are used to select subjects from the general population or a subset of the general population; for example, the nested case-control study by Greenland et al. (1994), discussed in Chapter 3, should be classified as a cohort study because the odds ratio, estimated from incidence density sampling, is an unbiased estimate of the hazard ratio.

Implicit cohorts are the basis for case-control studies in the general population, where cases and controls are selected from an underlying population. Often, incidence density sampling is used, as in nested case-control studies, but again the underlying cohort is not enumerated. Statistically, these studies are tremendously powerful, because the number of cases in principle can be maximized by increasing the intake period or using other geographic regions. A main methodological challenge with the case-control study is the definition of the population-based or nonnested control population (Wacholder et al. 1992a,b).

A general weakness of population-based case-control studies is the quality of the exposure information. A wide range of exposures in the general population can be difficult to characterize, and exposures of interest may have low prevalences, leading to low statistical power to detect effects. Frequently, the source of exposure information in these studies comes from interviews or information from secondary sources such as occupation on death certificates. A strength of the study design is that covariates can usually be measured but, like the exposure of interest, there may be misclassification if these occurred in the distant past. To make information from the case group comparable to the control group, special methods for obtaining information are used, including using a control population that also has some pathology (e.g., cancer controls for a study on breast cancer); using independent evaluators to assess exposure based on job descriptions (Siemiatycki et al. 1981, 1987; Stewart and Stewart 1994; Stewart et al. 1998); and defining in advance rules to indicate exposure based on job and industry classifications (referred to as job exposure matrices) (Hoar et al. 1980; Hsieh et al. 1983; Sieber et al. 1991; Bouyer and Hemon 1993; Dosemeci et al. 1994).

Thus, the two main designs most useful for risk assessment are cohort studies and case-control studies.
However, judging the validity of a study solely in terms of type of design may be misleading. It has often been said that the cohort study is superior to the case-control study because data are collected so that the temporal chain in causality is clear and unambiguous. This may be true for prospective cohort studies in which exposure and other important variables are assessed prospectively, but a well-designed case-control study may be as informative as a well-designed retrospective cohort study. For example, in retrospective cohort studies, in which past exposure is inferred from various data sources, exposure misclassification may be as great as in population-based or hospital-based case-control studies. In addition, bias from the misclassification of disease may be introduced in cohort studies in which mortality is used as the end point, particularly for noncancer outcomes, and such studies may be inferior to a well-conducted case-control study in which disease status is confirmed through rigorous means (e.g., in studies of cancer with histologic confirmation). Another issue with cohort studies, unless they are very large, is that the statistical power to detect small or moderate associations is diminished for rare outcomes.

The validity of any study may be difficult or impossible to verify, although some fundamental principles may help guide the way individual studies are evaluated. The U.S. Environmental Protection Agency (EPA) has provided a list of features that need to be evaluated and the committee largely agrees with this list: (1) clear articulation of study objectives or hypothesis; (2) proper selection and characterization of comparison groups (exposed and unexposed groups or case and control groups); (3) adequate characterization of exposure; (4) sufficient length of follow-up for disease occurrence; (5) valid ascertainment of the causes of cancer morbidity and mortality; (6) proper consideration of bias and confounding factors; (7) adequate sample size to detect an effect; (8) clear, well-documented, and appropriate methodology for data collection and analysis; (9) adequate response rate and methodology for handling missing data; and (10) complete and clear documentation of results. No single criterion determines the overall adequacy of a study (EPA 2005a, p. 2-4). To this list can be added the notions of definition of the target population (all inferences are made to this population), selection of subjects (e.g., response rates, attrition rates), and statistical variation in the estimates of association. In the end, the judicious use of relevant epidemiologic studies will determine, using weight-of-the-evidence arguments (inductive reasoning), whether there is an association and, in conjunction with other data, whether the association may be causal.

EXPOSURE ASSESSMENT

A critical component of any epidemiologic study is the method used to assess exposure as well as its accuracy (validity and reliability) (see Smith 2002; Nieuwenhuijsen 2003).
Figure 2-1 shows the basic levels of exposure assignments that may result from an exposure assessment and how they are related. Assignment of exposure is implicitly quantitative. The true underlying exposure intensity distribution on the right is highly skewed, with only a small fraction having high exposures. The fundamental exposure classification is to identify which members of a population are “exposed,” and the term “exposure” may have several definitions (see below). The qualitative judgment about an agent being present in a subject’s environment is based on information about the setting, which can be descriptive about the location, activities, agents that are or might be present, and data on local contamination of air, water, and food.

FIGURE 2-1 Exposure intensity classification approaches. Relative error rates are important for the utility of any of the approaches. There are two types of error: qualitative, for the presence of the agent; and quantitative, for assigning intensity of exposure.

Definitions of Exposed

The most commonly used epidemiologic definitions of exposure are (1) a subject is potentially exposed because he or she spends some time in a setting where the agent is known to be present; (2) there is reasonable probability of exposure to the agent by inhalation, skin contact, or ingestion because of a subject’s activities (e.g., job contact, water ingestion); (3) potentially exposed subjects have at least a minimum amount of the agent present in personal samples (e.g., skin contamination) or biological samples (e.g., blood, urine). Clearly these definitions do not represent the same likelihood or degree of exposure. For example, an accountant who walks through a production area where trichloroethylene is used is potentially exposed, but the degree of contact (intensity) and duration are very limited. Another example is that residents in an area where some wells are contaminated are potentially exposed, but it is unknown if the well they used is contaminated. In this case, it is necessary to know the prevalence of contaminated wells or, ideally, whether the well serving the home was contaminated. Another example comprises measurements from a subset of workers with jobs where trichloroethylene is routinely used, and it is known that they are all likely to have been exposed. Even in areas with high exposures, some workers may have only slight exposure, such as a supervisor who stays in an office most of the day. Care must be taken to recognize the potential for misclassification in different exposure settings.

Epidemiologic Approaches to Population Exposure Assessment

Exposure assessment uses a combination of approaches to answer two questions: (1) is the agent potentially present in the setting (workplace, community, home) and (2) if it is present, what were the intensity and duration of exposures (time profile of exposures)? For the first question, an agent (trichloroethylene) can be unequivocally shown to be present with no indication of the intensity of exposure, such as by identifying that degreasing operations were present and company purchasing records showing that large amounts of trichloroethylene were used. Given that trichloroethylene has been determined to be present, then we need to estimate the intensity of exposure. Intensity can be estimated from measurements, biological monitoring, and exposure modeling. Six components of an exposure assessment determine the answers to the two questions above:

1. Qualitative assessment
   Industry, community, neighborhood
   Use of trichloroethylene, prevalence of exposure
   Coexposures
   Confounders
2. Exposure setting
   Location of exposures, area descriptors, or location of wells or contamination
   Relevant jobs, tasks, or personal activity factors associated with exposure
   Exposure controls (if any)
3. Temporal data
   Data source(s), data quality
   Period covered
   Median duration of exposures
   Median latency
4. Exposure quantification
   Measurement method (precision and accuracy define quality)
   Specificity (trichloroethylene measured, or nonspecific method for solvents)
   Quantity of data (extensive or limited)
   Temporal coverage of exposure data (current data only, or current and past data)
5. Extrapolation methods
   Gaps in current exposure data, such as settings with low exposures, and estimating past periods (engineering-based model, or simplistic assumptions)
   Validation of past estimates (exposure data, or no validation)
6. Dose metric
   Cumulative exposure, average exposure, duration in job, years exposed

As in all exposure assessments, the researcher is limited by the data available and by the resources that can be applied to the task. These components define the quality of the estimates of exposure. If investigators have not given details on these aspects, then it is not possible to fully assess the quality of the data. It is not a requirement that all the data come from the study in question. Useful data often come from hygiene studies of the same industry or from community studies of similar settings. The goal is to form as complete a picture of the exposures as possible.

Information on Settings and Jobs

Information on workplace settings and jobs helps in the assessment of exposure. Factors to consider include description of workplace setting (size, layout, number of sources or tasks with emissions), specific sources of exposure (degreaser tanks [type, dimensions, solvents used, volume or time used, presence of covers, local-exhaust-ventilation controls]), and work tasks (use of degreasers, size of parts cleaned, manual cleaning with rag and bucket, hours per shift or per week cleaning). The three primary types of degreasing and cleaning operations using solvents are listed in Table 2-1 with their approximate dates of use. Use of the vapor degreaser had the highest potential for exposure because vapors can escape from the degreaser, especially if poor work practices are used, such as early removal or too rapid removal that can carry concentrated solvent vapors and liquid out of the tank.
Keeping the degreaser covered when not in use and careful operating procedures can minimize exposures. Dip tanks are the next important source of exposure. Hot dip tanks, where trichloroethylene is heated to close to its boiling point of 87°C, are major sources of vapor that can be as important as vapor degreasers. Cold dip tanks have a lower exposure potential, but they have a large surface area and removal of the pieces can carry solvent out. Small bench-top cleaning operations with a rag or brush and open bucket have the lowest exposure potential. Poor working techniques can distribute solvent across the bench top and to workers’ skin and clothing. Less volatile solvents are generally used in manual cleaning activities. In combination with the vapor source, the size and ventilation of the workroom are the main determinants of exposure intensity.

TABLE 2-1 Years of Solvent Use in Industrial Degreasing and Cleaning Operations

Years | Vapor Degreasers | Cold Dip Tanks | Rag or Brush and Bucket on Bench Top
~1934-1954 | Trichloroethylene (poorly controlled) | Stoddard solvent | Stoddard solvent (general use), alcohols (electronics shop), carbon tetrachloride (instrument shop)
~1955-1968 | Trichloroethylene (poorly controlled, tightened in 1960s) | Trichloroethylene (replaced some Stoddard solvent) | Stoddard solvent, trichloroethylene (replaced some Stoddard solvent), perchloroethylene, 1,1,1-trichloroethane (replaced carbon tetrachloride, alcohols, ketones)
~1969-1978 | Trichloroethylene (better controlled) | Trichloroethylene, Stoddard solvent | Trichloroethylene, perchloroethylene, 1,1,1-trichloroethane, alcohols, ketones, Stoddard solvent
~1979-1990s | 1,1,1-Trichloroethane (replaced trichloroethylene) | 1,1,1-Trichloroethane (replaced trichloroethylene), Stoddard solvent | 1,1,1-Trichloroethane, perchloroethylene, alcohols, ketones, Stoddard solvent

SOURCE: Stewart and Dosemeci 2005.

Ranking by Semiquantitative Estimates of Exposure

Given an indication that some parts of the exposed population may have higher exposure than others, it may be possible to identify ranked subgroups by semiquantitative relative exposure differences, such as high and low or high, medium, and low. However, without knowing the relative toxicity or carcinogenicity of an agent, it is not possible to say what a “high” exposure is that also carries a high risk. Often, it is not possible to say how much more exposure one group has than another, but because of their frequency of contact or proximity to the emission source, a difference may be defined.
If there is certainty that large differences exist, then comparing risks among exposure groups can provide some evidence that a dose-response relationship exists.
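As a rough illustration of how a cumulative-exposure dose metric and a semiquantitative ranking might be derived from job histories, consider the sketch below. Everything in it is hypothetical: the job titles, the intensities in ppm, the durations, and the cut points separating low, medium, and high are invented for illustration and do not come from any exposure assessment cited in this chapter.

```python
from dataclasses import dataclass

@dataclass
class JobPeriod:
    job: str
    years: float
    intensity_ppm: float  # hypothetical average airborne concentration

def cumulative_exposure(history):
    """Sum of intensity x duration over a work history, in ppm-years."""
    return sum(period.years * period.intensity_ppm for period in history)

def rank_exposure(ppm_years, low_cut=50.0, high_cut=500.0):
    """Assign a semiquantitative rank; the cut points are arbitrary assumptions."""
    if ppm_years < low_cut:
        return "low"
    if ppm_years < high_cut:
        return "medium"
    return "high"

# Hypothetical work histories for three workers
workers = {
    "degreaser_operator": [JobPeriod("vapor degreaser", 10, 100.0)],
    "bench_cleaner": [JobPeriod("rag and bucket", 8, 10.0)],
    "office_supervisor": [JobPeriod("office", 15, 1.0)],
}

ranks = {}
for name, history in workers.items():
    total = cumulative_exposure(history)
    ranks[name] = rank_exposure(total)
    print(f"{name}: {total:.0f} ppm-years -> {ranks[name]}")
```

A ranking of this kind is only as good as the intensity estimates behind it. Where intensities cannot be estimated, frequency of contact or proximity to the source may be the only defensible basis for ordering the groups.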
studies. This approach does take into account the size of the studies (which is related to the variance) but does not consider other differences related to the quality and designs of the studies or their exposure assessments. By today’s standards for meta-analysis, simple ballot counting or even weighted averaging of findings generally is not considered an adequate analysis. Statistical methods for meta-analysis are now available to assess the extent of heterogeneity among studies, and for fitting of random effects models to account for heterogeneity when it exists (DerSimonian and Laird 1986). This approach presumes that the main source of variation is statistical, but systematic differences in population exposures and methods for assessing them can also be important. However, in many instances epidemiologic data are too variable to justify combining them no matter how sophisticated the statistical methods. Thus, some authors have suggested that the primary goal of an epidemiologic meta-analysis is more often to identify the source of heterogeneity in the study findings than to produce an overall or summary estimate of the effect (Greenland 1987).

A large number of issues arise when performing a meta-analysis, particularly when such analyses are based on observational data (e.g., epidemiologic) as opposed to experimental data (e.g., clinical trials). Many of the decisions are largely subjective, for example, which studies to include (e.g., cohort, case-control), which results to use from each study (e.g., lagged or unlagged relative risks), and how to treat studies of questionable quality (e.g., eliminating them, using quality scoring). Such decisions should be made carefully because they can affect the outcome of the meta-analysis.
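The inverse-variance weighting and random-effects fitting mentioned above can be sketched in a few lines. This is a minimal illustration, not a reconstruction of any published analysis: the four study results are hypothetical, the function names are assumptions, standard errors are back-calculated from reported 95% confidence intervals on the log scale, and the between-study variance uses the method-of-moments estimator associated with DerSimonian and Laird.

```python
import math

def log_rr_and_se(rr, ci_low, ci_high, z=1.96):
    """Recover log(RR) and its standard error from a reported 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return math.log(rr), se

def pool_relative_risks(studies):
    """Return (fixed-effect RR, random-effects RR, Cochran's Q, tau^2)."""
    pairs = [log_rr_and_se(*study) for study in studies]
    y = [p[0] for p in pairs]
    w = [1 / p[1] ** 2 for p in pairs]  # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    if len(y) > 1:
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        # Method-of-moments between-study variance, truncated at zero
        tau2 = max(0.0, (q - (len(y) - 1)) / c)
    else:
        tau2 = 0.0
    w_star = [1 / (p[1] ** 2 + tau2) for p in pairs]
    pooled_random = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    return math.exp(fixed), math.exp(pooled_random), q, tau2

# Hypothetical studies: (relative risk, lower 95% limit, upper 95% limit)
hypothetical_studies = [
    (2.0, 1.2, 3.3),
    (1.1, 0.8, 1.5),
    (1.5, 0.7, 3.2),
    (0.9, 0.5, 1.6),
]
fixed_rr, random_rr, q, tau2 = pool_relative_risks(hypothetical_studies)
print(f"fixed = {fixed_rr:.2f}, random = {random_rr:.2f}, Q = {q:.2f}, tau^2 = {tau2:.3f}")
```

When the estimated between-study variance is zero the two pooled estimates coincide. When it is large, the random-effects weights become nearly equal across studies, so small studies gain influence. Neither model addresses systematic differences in exposure assessment among the studies.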
In addition, limited attention is usually given to problems arising from the wide variation in the quality and level of detail in exposure assessments for the studies (see discussion of the Wartenberg et al.  analysis below). As a result, even though the goal is to evaluate the relationship between exposure and disease, only the disease dimension is critiqued with any sophistication. Another common issue in meta-analysis, sometimes called the “file drawer” problem or publication bias, refers to the fact that positive studies are more likely to be published than negative ones. Conversely, the possibility also exists that some positive epidemiologic studies might not be published. Statistical methods have been developed to evaluate whether there is evidence for publication bias. These methods rely on constructing a plot of the findings, observed relative risk versus the variance of each of the studies, which is commonly referred to as a funnel plot (Light and Pillemer 1984). One generally expects that the negative studies that do not get published are small and have large variance. Implicit in this type of analysis is the assumption that the effects of differences in methods and exposure assessments are random and small. Thus, if a publication bias exists, one would expect to
see more small studies showing a positive finding than large ones. However, this approach is not a powerful method for detecting publication bias, and there is no way to check the basic assumptions.

Meta-analysis of epidemiologic data remains somewhat controversial, despite advances in the methodology. Some epidemiologists have questioned whether meta-analysis is useful for summarizing epidemiologic data given the inherent problems of combining epidemiologic data (e.g., Shapiro 1994). Other epidemiologists have defended meta-analytic methods, particularly when they are properly applied (e.g., Petitti 1994). Despite these controversies, most epidemiologists have come to view meta-analytic methods as a useful, albeit imperfect, tool for performing a quantitative summary of the epidemiologic evidence. Following is a discussion of some specific issues for performing a meta-analysis of the epidemiologic data on trichloroethylene and cancer.

Specific Meta-Analysis Issues for Trichloroethylene

The committee reviewed two meta-analyses that were performed to examine the association between trichloroethylene exposure and the risk of cancer. The first analysis, by Wartenberg et al. (2000), was heavily relied on in the EPA (2001b) draft risk assessment on trichloroethylene. The second, by Kelsh et al. (2005), was presented to the committee at a meeting on June 9, 2005. Each analysis is discussed below in light of some of the general principles discussed above and in light of some criticisms pertaining to the published analysis by Wartenberg et al. (2000).

Analysis by Wartenberg et al. (2000)

The review by Wartenberg et al. (2000) presents estimates of the relative risk for kidney and renal cell cancers, liver and biliary cancer, non-Hodgkin’s lymphoma, Hodgkin’s disease, cervical cancer, and pancreatic cancer.
The analyses were stratified into three tiers on the basis of the authors’ subjective judgment of the quality of the exposure data in the studies, with the first tier having the highest quality. The Tier I studies had direct information on exposures (biomarkers, job-exposure matrices, job histories), Tier II studies were based on job title, and Tier III studies were of dry-cleaning and laundry workers. A weighted average of the relative risks reported in the cohort studies was estimated where the weights were the inverse of the variance of each study. This analysis provided evidence supporting an association in the Tier I studies for exposure to trichloroethylene and increased risk of kidney cancer (relative risk [RR] = 1.7, 95% confidence interval [CI] = 1.1-2.7), liver cancer (RR = 1.9, 95% CI = 1.0-3.4), and non-Hodgkin’s lymphoma (RR = 1.5,
95% CI = 0.9-2.3). To a lesser extent, the analysis provided evidence for an association between exposure to trichloroethylene and cervical cancer, Hodgkin’s disease, and multiple myeloma.

Letters to the editor (Borak et al. 2000; Boice and McLaughlin 2001; Rhomberg 2002) have criticized this quantitative analysis. The common criticism relates to including the study by Henschler et al. (1995) in the analysis. There were several methodological concerns about that study (see discussion in Chapter 3). One of the main objections was that the cases were originally identified in a cluster investigation, which does have relevance in interpreting the study. Clusters do occur by chance and, if epidemiologic studies are performed in areas or industries with known clusters, they clearly may be biased toward observing an effect. On the other hand, many of what are considered to be known occupational carcinogens (e.g., vinyl chloride, bis[chloromethyl]ether) were originally identified from clusters and subsequent formal studies of the same population confirmed that the cancers were truly in excess. Excluding these types of studies would clearly introduce a negative bias into a meta-analysis. However, the variance of the estimate of the relative risks, as reported by Henschler et al., is underestimated, as it does not account for the fact that the study was based on a nonrandom sample (cluster); in principle, its formal use in a meta-analysis would need to make use of a corrected (inflated) variance. The Henschler et al. study also was unusual in that it reported an extremely large relative risk of kidney cancer (RR = 8.0, CI = 3.4-18.6). Although a corrected variance ideally should be used, the committee is unaware of methods to adjust the variance of a study that is based on a cluster. Some reviewers have minimized the importance of the Henschler et al.
study by referring to it as an “outlier.” However, there is evidence that the exposure concentrations of trichloroethylene might have been higher at the facility studied by Henschler et al. than in other studies (see Chapter 3), providing a plausible explanation for the unusually high relative risks for kidney cancer that were observed. Other methodological concerns with the study have been raised by several authors (see Chapter 3). One way to assess these issues formally in the meta-analysis is to conduct sensitivity analyses by including and excluding the study to determine whether it strongly influences the results of the meta-analysis. One problem with meta-analysis is that each study is analyzed in isolation, such as the one by Henschler et al. (1995). However, a series of studies have been done on the general population in the geographic area where the study was conducted, which appears to have a segment with substantial exposures (Vamvakas et al. 1998; Brauch et al. 1999, 2004; Pesch et al. 2000a; Brüning et al. 2003). Those studies have investigated different segments of the population and different outcomes, and they all found evidence of kidney cancer. If the Henschler et al. (1995) study is considered in that
context, then it is consistent with the nature of the population and the level of occupational exposures inherent in the working population.

The summary relative risk estimates presented by Wartenberg et al. (2000) did not include case-control studies and generally emphasized the Tier I cohort studies. It was suggested that case-control studies are inferior to cohort studies because they generally lack detailed information on exposures. However, this is not always the case (see previous discussion in this chapter) and certainly is not the case for nested case-control studies, such as the study by Greenland et al. (1994) that was excluded from the Wartenberg et al. analysis. Methods are available for including case-control studies with cohort studies in a meta-analysis (see Greenland 1987).

Wartenberg et al. (2000) did not include any formal statistical analysis of the studies for heterogeneity. Testing for and evaluating heterogeneity is standard practice in meta-analysis. It appears that the findings for kidney cancer may not have been homogeneous given the unusually large effect reported by Henschler et al. (1995). Wartenberg et al. (2000, p. 174) recognized the need for further work and recommended that a meta-analysis be conducted that would “try to isolate the factors that help explain the observed risks, as well as to better quantify the risk. One would have to focus carefully on the possible heterogeneity among studies, carefully considering which groups of studies to combine.” In addition, one could conduct meta-regression whereby meta-characteristics of studies are used to help explain the observed heterogeneity (one important factor would be levels of meta-exposure). Finally, a much more detailed assessment could be made of the exposure data used in each of the studies.
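As a rough illustration of the meta-regression idea mentioned above, the following sketch regresses study-level log relative risks on a single study characteristic using inverse-variance weighted least squares. It is a fixed-effect meta-regression under simplifying assumptions: within-study variances are treated as known and no residual between-study variance is modeled. The ordinal "exposure score" covariate and all input values are hypothetical.

```python
import math

def meta_regression(log_rrs, ses, x):
    """Inverse-variance weighted least squares of log(RR) on one covariate.

    Returns (intercept, slope, se_slope). Assumes known within-study
    variances and no residual between-study variance.
    """
    w = [1 / se**2 for se in ses]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw        # weighted means
    y_bar = sum(wi * yi for wi, yi in zip(w, log_rrs)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, x, log_rrs))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1 / sxx)
    return intercept, slope, se_slope

# Hypothetical studies: exposure score 0 = low, 1 = medium, 2 = high.
exposure_score = [0, 0, 1, 1, 2]
log_rrs = [0.05, 0.10, 0.30, 0.35, 0.90]
ses = [0.20, 0.15, 0.20, 0.25, 0.40]

intercept, slope, se_slope = meta_regression(log_rrs, ses, exposure_score)
print(f"RR ratio per unit of exposure score: {math.exp(slope):.2f}")
```

A positive slope would suggest that studies with higher assigned exposure levels tend to report larger relative risks. With only a handful of studies and a subjectively assigned covariate, such a result is exploratory at best.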
For example, studies with very high exposures provide information on a different part of the exposure-risk curve than those with lower exposures. Analysis by Kelsh et al. (2005) Kelsh et al. (2005) performed a meta-analysis of the epidemiologic studies of trichloroethylene, and this information was presented to the committee at a meeting on June 9, 2005. (The results of the study were subsequently published or submitted for publication after the committee had completed its deliberations. Those publications should be consulted for more detail, as the committee only reviewed materials presented to it at its June meeting). The review included several studies that were published after Wartenberg et al. performed their analysis, including cohort studies by Hansen et al. (2001) and Raaschou-Nielsen et al. (2003) and a case-control study of kidney cancer by Pesch et al. (2000a). Several studies used by Wartenberg et al. (2000) were excluded from the meta-analysis. In part, these omissions are explained by Kelsh et al.’s exclusion criteria, which eliminated proportional-mortality-ratio, community, and cross-sectional
studies. Excluding proportional-mortality-ratio and cross-sectional studies might be warranted based on concerns about the quality of these types of studies. However, well-designed proportional-mortality-ratio studies can yield results of similar quality to full cohort studies under certain conditions (Monson 1974; Wong and Decoufle 1982). It is difficult to justify eliminating community studies, because they provide the only information on the effects of exposures from water contamination with trichloroethylene, unless they are purely ecologic. The community studies have more limited exposure assessments but, under some circumstances, they can provide useful information about risks of exposures in the community.

Similar to Wartenberg et al., Kelsh et al. categorized the cohort studies into two groups based on quality concerns. Group I studies had clearly identified exposures to trichloroethylene from biomonitoring, industrial hygiene, chemical inventories, or job-exposure matrices. Group II studies either had limited documentation of exposures to trichloroethylene or had “quality of data limitations.” There was generally good correspondence between those cohort studies that were classified as Tier I and II by Wartenberg et al. and Group I and II by Kelsh et al., with one notable difference: the study by Henschler et al., which Wartenberg et al. classified as being of the highest quality (Tier I) and Kelsh et al. classified as being of lower quality (Group II). This difference points to the subjective nature of these qualitative rankings of studies.

The statistical methods used in the analysis by Kelsh et al. are more consistent with modern methods for meta-analysis than those used by Wartenberg et al. Random effects models were fitted and tests were performed to evaluate the heterogeneity of the meta-results.
Sensitivity analyses were conducted in which one study at a time was deleted from the meta-analysis. Kelsh et al. analyzed case-control studies separately from cohort studies in their analysis, which is similar to what Wartenberg et al. did. As noted previously, case-control and cohort studies may and should, if possible, be combined in a meta-analysis, and the basis for assignments of exposure needs to be carefully assessed to determine the level of misclassification by qualitative and quantitative criteria (as discussed earlier). Kelsh et al. reported an elevated and statistically significant meta-analysis relative risk for kidney cancer (RR = 1.29, 95% CI = 1.06-1.57) among the Group I studies, which was not heterogeneous (P = 0.90). The analysis of the Group II and case-control studies also showed elevated risks, but the findings were heterogeneous and highly dependent on including “outlier” studies. The authors suggested that the positive findings for kidney cancer might be explained by smoking or more intensive health monitoring in worker populations, particularly in U.S. workers. Both explanations appear unlikely to the committee. Smoking is a relatively weak risk factor for
kidney cancer with relative risk estimates for smokers being generally less than 2 (IARC 2004), and the industry studies are unlikely to have had special screening programs for kidney cancer for trichloroethylene-exposed workers.

Kelsh et al. also reported a statistically significant increase in risk in their meta-analyses for liver cancer (RR = 1.32, 95% CI = 1.05-1.66) and non-Hodgkin’s lymphoma (results not reported but the 95% CI presented graphically clearly excludes unity). In both cases, the authors suggested that there was heterogeneity of these findings with European studies showing higher risks of liver cancer and non-Hodgkin’s lymphoma, even though the test for heterogeneity of these findings was not statistically significant (liver, P = 0.34; non-Hodgkin’s lymphoma, P = 0.18). The authors also emphasized the lack of evidence for an exposure-response relationship for these end points and suggested that was a reason for rejecting a causal association. The committee disagrees with this suggestion: lack of an exposure-response relationship is not convincing evidence against a causal interpretation because these studies generally lacked the information for estimating amount of exposure (Stayner et al. 2003). When considered across studies, there is a trend for increasing risk where there is evidence of higher exposure.

Use of the Hill Guidelines to Assess Causality

Reviews of epidemiologic data may be qualitative or quantitative. Qualitative reviews have frequently relied on interpreting findings with the set of guidelines first proposed by Hill (1965). These are often referred to as “criteria” and the committee was specifically asked to comment on the use of these “criteria.” In fact, Hill referred to them as “viewpoints” and he emphasized that none of them was necessary or sufficient.
Thus, it is a mistake to view the “Hill criteria” as a checklist that must be completed before causality is determined. These guidelines generally include consideration of the (1) strength of the association, (2) evidence for an exposure-response relationship, (3) consistency of the findings between studies, (4) biological plausibility of the hypothesis, (5) temporality of the exposure (does it precede disease), (6) specificity of the exposure and disease relationship (is exposure associated with a specific disease), (7) support from analogy, and (8) support from experiments. These guidelines are extremely useful, but most epidemiologists do not consider any one of them to be a necessary condition for causality except for the one stating that exposure must precede the onset of disease. Hill’s guidelines do not directly address problems with the quality of the exposure assignments, but they are implicit in many of his viewpoints. It is not clear why the “specificity of the exposure and disease relationship” is important: single agents can cause more than one outcome (e.g., cigarette smoking), and an outcome can be caused by more than one agent (e.g., smoking).

Exposure-Response Analysis

Information on exposure-response relationships derived from epidemiologic investigations plays a critical role in the hazard identification and dose-response evaluation elements of risk assessment. Strong evidence for an exposure-response relationship is one of the key pieces of evidence that epidemiologists use to make inferences about causality. It is one of the elements of the Hill guidelines for judging causality that have been incorporated into the new EPA (2005a) cancer guidelines for hazard identification.

The absence of evidence for an exposure-response relationship does not often provide a convincing argument against causality. For one thing, if the exposure estimates are inaccurate or imprecise then it is well recognized that there may be a bias in the exposure-response relationship toward the null, and this bias might even eliminate the relationship under certain conditions (Armstrong 1990; Dosemeci et al. 1990; Steenland et al. 1996, 2000). Studies may be negative because there are too few subjects with sufficient exposure to increase overall risk or because controls have unrecognized exposures. Furthermore, there are numerous examples in occupational epidemiology where exposure-response relationships observed in a study flatten or even decrease at the highest exposures (Stayner et al. 2003). The reasons for this are unclear; however, they might be explained by biases in the studies (e.g., the healthy worker survivor effect), by biologic factors (e.g., a saturation of key enzyme pathways at high concentrations), or by misclassification in the highest exposure assignments, which can only be misclassified downward (some highly exposed subjects are assigned to medium exposures).
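The bias toward the null from inaccurate exposure assignment can be made concrete with a small expected-value calculation. The sketch below assumes the simplest case: a cohort with a binary exposure that is misclassified nondifferentially (the same sensitivity and specificity regardless of disease status). That assumption, the function name, and the cohort sizes are illustrative choices and are not taken from any trichloroethylene study.

```python
def observed_rr(true_rr, n_exposed, n_unexposed, sensitivity, specificity):
    """Expected relative risk after nondifferential exposure misclassification.

    Baseline risk cancels out of the ratio, so risks are expressed
    relative to the risk in the truly unexposed (set to 1).
    """
    # People classified as exposed: true positives plus false positives
    cls_exp = sensitivity * n_exposed + (1 - specificity) * n_unexposed
    cases_exp = (sensitivity * n_exposed * true_rr
                 + (1 - specificity) * n_unexposed)
    # People classified as unexposed: false negatives plus true negatives
    cls_unexp = (1 - sensitivity) * n_exposed + specificity * n_unexposed
    cases_unexp = ((1 - sensitivity) * n_exposed * true_rr
                   + specificity * n_unexposed)
    return (cases_exp / cls_exp) / (cases_unexp / cls_unexp)

# Hypothetical cohort: 1,000 truly exposed, 9,000 truly unexposed, true RR = 2.
for sens, spec in [(1.0, 1.0), (0.9, 0.95), (0.7, 0.9)]:
    rr = observed_rr(2.0, 1000, 9000, sens, spec)
    print(f"sensitivity={sens}, specificity={spec}: expected RR = {rr:.2f}")
```

Under these assumptions the expected relative risk moves toward 1 as classification accuracy declines. When exposure is uncommon, imperfect specificity is especially damaging, because false positives from the large unexposed group dilute the group classified as exposed.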
Hertz-Picciotto (1995) has suggested that epidemiologic studies that are suitable for quantitative risk assessment should (1) provide evidence for a moderate-to-strong exposure-response relationship, (2) have strong biases and confounding ruled out, and (3) have exposures linked to individuals. These criteria are perhaps somewhat overly restrictive, and very few epidemiologic studies will meet them all. In practice, it may be helpful to use epidemiologic studies for quantitative risk assessment even when they do not meet all the criteria if for no other reason than to provide a test of the reasonableness of risk estimates derived from toxicologic or other epidemiologic investigations (for testing the validity of other risk assessment models).

As noted earlier in this chapter, the most common limitation of epidemiologic data for quantifying risk is the limited availability of high-quality exposure data to construct exposure-response models. Hardly any epidemiologic studies have actual measurements of individual exposures for each study subject over the entire study time period with the exception of studies of workers in the nuclear power industry. More commonly in occupational studies, as discussed above, a job-exposure matrix may be developed to estimate each individual’s exposure based on job title, work location, or industry. In the absence of measurements of personal exposure or exposure estimates of groups based on job characteristics, residence location, or water supply (see Siemiatycki et al. 1981; Siemiatycki 1991), only broad classifications generally are available for conducting exposure-response modeling for quantitative risk assessment purposes with epidemiologic data. Unfortunately, these types of exposure data were rarely available in the epidemiologic studies of trichloroethylene.

Surrogates for quantitative estimates of exposure such as duration of exposure or exposure category (e.g., high, medium, low) are available for some of the trichloroethylene studies. Crude risk assessment models may be derived from these studies if some assumptions are made about the average exposures of individuals in the study groups. This approach is highly prone to error and should be used only to determine whether there is evidence to suggest a hazard. Where there is considerable misclassification of intensity of exposure, there may be no evidence of an exposure gradient even when there is good evidence of an increased risk for ever-exposed versus never-exposed subjects. If an exposure-response relationship is seen, it is likely that the slope is underestimated. If preventive action is needed, surrogate exposure information can be used when other more appropriate exposure-response information is unavailable.
Occasionally, it may be possible to use biologic measures (e.g., urinary biomarkers) to estimate the exposures for study subjects (see discussion earlier in this chapter). These measurements have the advantage of integrating personal factors, such as metabolism of the compound, and thus may come closer to estimating the true dose resulting from an exposure. Because metabolic enzymes vary across individuals, if the carcinogenic agent is a metabolite, then use of the biomarker will increase the variability in the exposure assignments for the subjects in each exposure group. In the case of trichloroethylene, several studies from Scandinavia used measurements of trichloroacetic acid in urine in national databases of workplace surveillance programs to identify exposed subjects for a cohort study of cancer risk (Tola et al. 1980; Axelson et al. 1994; Anttila et al. 1995; Hansen et al. 2001). Unfortunately, these workers were exposed to other solvents that also produce trichloroacetic acid as a metabolite, so there is an unknown amount of misclassification. In addition, urinary biomarkers were measured in only a fraction of the subjects studied. EPA (2001b) used estimates of exposure to trichloroacetic acid to calculate trichloroethylene exposures based on the reported relationship between trichloroethylene in air and trichloroacetic
acid in urine. Unfortunately, the validity of the use of these data rests on some very weak assumptions. The first is that there is a strong relationship between trichloroethylene in air and trichloroacetic acid in urine, for which there is no strong supporting evidence (summarized by Axelson et al. 1978; also see Chapter 3). Second, the estimates of trichloroacetic acid were not from the entire study time period, and the duration of exposure of study subjects was not known. In addition, assuming that all the exposures were purely trichloroethylene, the estimated trichloroethylene exposures were low (1-10 parts per million for most subjects).

FINDINGS AND RECOMMENDATIONS

An epidemiologic study can provide useful data on risk per unit of exposure only when the exposures have been measured or the intensity can be reasonably inferred on the basis of the exposure circumstances. Studies with small exposure groups with limited or low-intensity exposures provide little useful data on the risk from exposures. The most useful studies for exposure-risk analysis have reasonable numbers of subjects with a wide range of exposures of sufficient duration. The studies by Henschler et al. (1995), Vamvakas et al. (1998), and Brüning et al. (2003) potentially identify the upper end of the exposure-risk relationship for trichloroethylene. The exposures were high and were not confounded by exposure to other chlorinated hydrocarbons. Given the physical constraints of reasonable workplace exposures, Henschler et al. (1995) represented the upper boundary of likely total exposures. However, because of difficulties and uncertainties with making quantitative estimates of exposure, the committee does not believe there are any currently available epidemiologic data suitable for conducting exposure-response modeling for quantifying cancer risks.
Crude approaches such as those used by EPA (2001b) are appropriate for checking the reasonableness of predictions from models based on animal bioassays but are not suitable as a primary means of quantifying risks. Recommendations: Consideration should be given to developing a database that compiles the study designs and results of all potentially relevant epidemiologic studies. Such a database can be used as a tool to conduct formal evaluations of characteristics of studies, conduct meta-regression analyses using study characteristics and results, and develop tables for presentation. Tables that summarize the essential design and exposure characteristics of the epidemiologic studies should be included in risk-assessment documentation.
Epidemiologic studies should be analyzed to discriminate the amount of exposure experienced by the study population and used in classification schemes for meta-analyses (see below). Despite the fact that no single study can provide estimates of exposure-response patterns, the epidemiologic meta-analysis can make use of approximate “meta-levels” of exposure.

An analysis of exposure should evaluate the quality of all exposure assignments and ensure that all relevant exposure data for each population were used. Studies that used the same base populations should be identified because they might be combined and their exposure information shared.

The statistical power of each study should be assessed, given the likely percentage of the study population exposed and, where possible, the intensity of exposures. These findings could be plotted according to the methods of Beaumont and Breslow (1981).

Because there are no studies with good exposure assessments for trichloroethylene, it is important to begin a prospective study on a suitable cohort that could provide the missing quantitative relationship between long-term trichloroethylene exposure and disease risk. Such a study should include an initial retrospective exposure assessment and state-of-the-art prospective exposure assessment, including the latest exposure biomarkers of trichloroethylene metabolites and biomarkers of early effects. It may be necessary to study cohorts outside the United States.

The two meta-analyses reviewed by the committee have limitations. The meta-analysis by Kelsh et al. includes several studies that were not available at the time Wartenberg et al. performed their analysis. The new studies appear to have made the excess risk of kidney cancer more robust in the sense that the excess risk observed in the meta-analysis no longer depended solely on inclusion of the Henschler study. The Kelsh et al.
analysis uses statistical methods that are more consistent with modern methods for meta-analysis than the Wartenberg et al. analysis. Both analyses inappropriately analyze case-control studies and cohort studies separately and used subjective assessments of quality to exclude or categorize studies. Thus, the committee judges that neither the Wartenberg et al. (2000) nor the Kelsh et al. (2005) analyses should be used for hazard characterization purposes in risk assessments for trichloroethylene. Recommendation: The following guidelines should be used to perform a new meta-analysis of the cancer risks associated with trichloroethylene:
Study identification

A thorough search of the literature must be conducted to make sure that no relevant studies are overlooked. An important effort is to identify studies that could have investigated the risks associated with trichloroethylene.

An analysis of publication bias should be conducted.

Study selection

As much as possible, all relevant studies should be included in the analysis. Any exclusion should be clearly explained and should be based on objective criteria (e.g., studies in which it was unclear that the study population was actually exposed [e.g., dry-cleaning workers]).

Weighting and classifying schemes

Subjective quality scoring (e.g., tiers, groups) should not be used in the analysis. Instead, studies should be classified in terms of their characteristics, such as studies in which exposure to trichloroethylene was well documented or based on the study’s design (e.g., cohort, case-control). These study characteristics should be examined as possible reasons for any observed heterogeneity, and meta-regression should be carried out, if deemed feasible.

Analysis

Both case-control and cohort studies should be included and combined unless this introduces substantial heterogeneity into the analysis.

Tests of heterogeneity should be performed for all analyses. If heterogeneity is found to exist then a thorough search should be made to determine whether there are any explanations for the heterogeneity, such as differences in population exposures.

Fixed and random effects models should be fitted to the data. If there is no evidence for heterogeneity, then fixed models may be preferred, but it is still appropriate to report results from random effects models as well.

A sensitivity analysis should be performed in which each study is excluded from the analysis one at a time to determine whether any study significantly influences the findings.
The findings from the meta-analysis should be viewed cautiously if they are highly dependent on the inclusion of one or two studies, and these studies have severe methodological limitations.
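The one-at-a-time sensitivity analysis recommended above can be sketched as follows. This is a minimal illustration that uses fixed-effect inverse-variance pooling and assumes each study reports a relative risk with a 95% confidence interval. The study labels and values are hypothetical, and a fuller analysis would repeat the same loop with a random effects model.

```python
import math

def fixed_effect_rr(studies, z=1.96):
    """Inverse-variance pooled RR from (rr, ci_low, ci_high) tuples."""
    num = den = 0.0
    for rr, lo, hi in studies:
        se = (math.log(hi) - math.log(lo)) / (2 * z)  # SE of log(RR) from CI
        weight = 1 / se**2
        num += weight * math.log(rr)
        den += weight
    return math.exp(num / den)

def leave_one_out(studies):
    """Pooled RR recomputed with each study removed in turn."""
    return [fixed_effect_rr(studies[:i] + studies[i + 1:])
            for i in range(len(studies))]

# Hypothetical inputs, labeled A-D; study C is a deliberately extreme value.
labels = ["A", "B", "C", "D"]
studies = [(1.2, 0.8, 1.8), (1.0, 0.7, 1.4), (8.0, 3.4, 18.6), (1.3, 0.9, 1.9)]

overall = fixed_effect_rr(studies)
print(f"all studies: pooled RR = {overall:.2f}")
for label, rr in zip(labels, leave_one_out(studies)):
    print(f"without {label}: pooled RR = {rr:.2f}")
```

If the pooled estimate shifts materially when a single study is dropped, the summary depends heavily on that study, and its methodological limitations deserve particular scrutiny before the result is relied on.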