
2

The Evaluative Process: Part I. Assessing the Available Data

This chapter and the next are based on a paper, “An evaluative process for assessing human reproductive and developmental toxicity of agents,” published in Reproductive Toxicology (Moore et al. 1995a), which calls for the systematic application of knowledge and judgment to assess agents for reproductive and developmental toxicity in a practical, open, and informative manner. In addition, the U.S. Environmental Protection Agency's (EPA) guidelines for developmental toxicity (1991) and reproductive toxicity (1996a) risk assessment, and several additional sources reviewed in Chapter 1, were used extensively. Several principles and objectives incorporated in the evaluative process are described below, followed by details of the evaluative process. The description of the evaluative process continues in the next chapter, which covers the interpretation of toxicity data, the integration of toxicity and exposure data, and the quantitative assessment steps.

PRINCIPLES AND OBJECTIVES

Use of Data and Judgment

The evaluative process uses both scientific data and scientific judgment. The data required to identify an agent as toxic should








adequately demonstrate adverse effects and dose-response relationships for general toxicological responses and for reproductive and developmental effects. Furthermore, there is a significant need for data that characterize human exposure or provide reasonable estimates based on the pattern of use of the agent. The essence of the evaluative process is that the interpretation of those data should reflect expert judgment rather than passive reliance on a repetitive series of default assumptions. A valuable adjunct to the evaluation of an exposure to an agent is the inclusion of a statement of what is known and the certainty with which it is known. That should lead to the identification of critical data needs that might stimulate investigations to yield useful information that will enhance certainty of judgment and better serve the U.S. Navy.

Weight of Evidence

With a weight-of-evidence approach that considers both toxicity and human exposure information, evaluators can determine whether human or experimental animal data can reasonably be used to predict reproductive or developmental effects in humans under particular conditions of exposure. The approach must distinguish those agents for which there is firm evidence about human risk potential, based on relevant data, from those for which the potential for human effects is uncertain or unlikely. It will aid in setting priorities and developing programs to protect personnel from undue exposure to toxic quantities of agents or from undue costs of unnecessary control measures. Using a weight-of-evidence approach to communicate a judgment about human risk, taking into consideration exposure potential, should diminish reliance on the assumption that reproductive and developmental toxicity observed in animals predicts similar effects in humans.
Because the evaluative process requires a judgment about human risk potential based on the weight of the evidence, its approach and its results will be useful to the Navy. That approach differs from several programs that assess carcinogenic potential, including the International Agency for Research on Cancer (IARC) monographs, which invoke “sufficiency of evidence” determinations for experimental data; the Science Advisory Panel for the California Proposition 65 listing process, which follows a similar procedure in its review of carcinogenicity data; and the report on carcinogens produced by the National Toxicology Program (NTP), which primarily lists the results of experimental animal studies. IARC and NTP clearly state that their deliberations do not represent a complete assessment of human risk potential, but their monographs and lists continue to be misused for that purpose.

Threshold Assumption

Use of a threshold assumption for dose-response relationships at low doses implies that there is an exposure level below which an adverse effect is not expected to occur. The assumption of a threshold has been made historically for the chemical induction of many types of reproductive and developmental effects, as well as for other noncancer health effects. This is in contrast to the case for carcinogens, which historically have been assumed to have no threshold. Recent emphasis on using mechanistic or mode-of-action information to improve the risk assessment process (EPA 1996b) and to harmonize the approaches used for cancer and other types of health effects (Bogdanffy et al., in press) underscores the use of mechanistic information in the weight-of-evidence approach for low-dose extrapolation. For example, some nongenotoxic carcinogens may not have a linear dose-response relationship at low doses (Andersen et al. 2000), whereas some agents that produce reproductive and developmental toxicity may act through a genetic mechanism or an endogenous mechanism that is additive to background, and therefore be more likely to exhibit a linear dose-response relationship at low doses (Gaylor et al. 1988). These types of mechanisms tend to blur the distinction between the default use of a linear low-dose extrapolation for cancer and a threshold assumption for other health effects, which defaults to the application of uncertainty factors to the no-observed-adverse-effect level (NOAEL), lowest-observed-adverse-effect level (LOAEL), or benchmark dose.
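The uncertainty-factor default mentioned above, applied to a NOAEL, LOAEL, or benchmark dose, can be sketched in a few lines. The function name and every dose and factor value below are hypothetical illustrations of the arithmetic, not recommended values.

```python
# Sketch of the default (threshold) approach for noncancer effects:
# divide a point of departure (NOAEL, LOAEL, or benchmark dose) by a
# product of uncertainty factors. All values are hypothetical.

def reference_dose(pod_mg_kg_day, interspecies=10, intraspecies=10,
                   loael_to_noael=1, database=1):
    """Point of departure divided by the product of uncertainty factors."""
    total_uf = interspecies * intraspecies * loael_to_noael * database
    return pod_mg_kg_day / total_uf

# Hypothetical NOAEL of 50 mg/kg-day from an animal study,
# with 10-fold interspecies and 10-fold intraspecies factors:
rfd = reference_dose(50)   # 50 / (10 * 10) = 0.5 mg/kg-day
```

Each additional default factor (e.g., a 10-fold LOAEL-to-NOAEL adjustment when no NOAEL is available) lowers the result another order of magnitude, which is one way the "certainty based on defaults" problem discussed later becomes visible.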
Consideration of mechanistic information (i.e., toxicokinetic and toxicodynamic data) should be a major factor in the weight-of-evidence process and in deciding how to proceed with low-dose extrapolation. In the absence of any mechanistic or mode-of-action information, the default assumption continues to be a threshold or nonlinear dose-response relationship at low doses for health effects other than cancer, but this assumption should continue to be

explored through the development and application of mechanistic data in the risk assessment process.

Narrative Statement

Communicating the results of a weight-of-evidence evaluation is best accomplished through a narrative document. A narrative permits expression of the degree of certainty associated with a judgment about the scientific evidence. The document must use terms that are meaningful to Navy policy officials or decision makers, it must define those terms carefully, and it must use them consistently. The narrative must be clear in explaining the basis of the judgment, the breadth of expert support, the degree to which the judgment reflects the actual information, and the assumptions made in the absence of information.

Certainty

Documents produced under the evaluative process will clearly describe the level of confidence in the evaluative judgment. The need to invoke a series of default assumptions will signify progressively greater degrees of uncertainty. Certainty based on the interpretation of essential data should be distinguished from “certainty based on defaults,” where default assumptions force evaluators to designate an agent as having toxic potential. Conservative default assumptions, based on prudent public health concerns, have a rightful place in the options available to risk assessors and managers. Such assumptions should be used only where absolutely necessary, and always openly. Finally, because the evaluative process adopts an open, candid, narrative form of communication, it minimizes the dissemination of inappropriate or simplistic statements that are commonly misused and that are needlessly alarming.

Use All Relevant, Acceptable Data

Reaching a determination about an agent's potential toxicity to humans is best done by a consideration of all relevant experimental

animal and human data. Decisions to use either published or unpublished data should depend on the quality and completeness of the data set. Unfortunately, publication in the open scientific literature does not in itself qualify data as acceptable for evaluation. Many published articles present data in insufficient detail to allow them to be of use in risk evaluations. Whether data are judged acceptable from the perspective of sound scientific design and interpretation will depend heavily on the actual review of specific studies. Good laboratory practices have been promulgated by the Organization for Economic Cooperation and Development (OECD 1987), the U.S. Food and Drug Administration (FDA 1987), and the U.S. Environmental Protection Agency (EPA 1990). Good laboratory practices can serve as useful guides for assessing the quality and completeness of reported data. Comparing the test design and completeness of data reporting to what is outlined in test guidelines and procedures might be of particular value. Other factors that should be considered include statistical power, analytical approaches, data presentation, and consistency with other results.

Qualities and Limitations of Reproductive and Developmental Toxicity Studies

Developmental toxicity studies typically assess whether structural abnormalities are associated with administration of an agent to a pregnant female during major organogenesis in the developing embryo. General reproductive effects can be assessed through analysis of various types of mating studies. Reproductive and developmental toxicity studies provide much useful information on a chemical's potential to cause adverse reproductive and developmental effects. However, the limitations of each study type should be recognized.
For example, the prenatal developmental toxicity study provides information on the effects of repeated exposure to an agent during the period of major organogenesis, but does not follow animals postnatally to evaluate aspects such as reversibility and repair or organ function. Detailed descriptions of various study types and their qualities and limitations are presented in Appendix D.

Characterizing Data

The evaluative process uses three generic criteria for judging data insufficient:

1. There are no data.
2. The studies are of limited utility as a result of deficiencies in design or execution, or because the data are insufficiently detailed to allow an independent analysis.
3. The available studies are acceptable, but the data are insufficient to reach a definitive conclusion because they do not span a sufficient number of outcomes; the study might, however, offer useful supplemental information.

Data sets that are insufficient for evaluating reproductive or developmental toxicity do not arise solely from studies that are unreliable and therefore unworthy of consideration. Information from in vitro or nontraditional in vivo studies, for example, frequently provides enough experimental evidence to corroborate other evidence for an adverse effect. Alone, however, those studies might not provide enough evidence to be considered sufficient to identify an adverse effect. A judgment that data are insufficient to establish an adverse effect does not mean that they are sufficient to establish lack of an adverse effect. Such a presumption would be erroneous. Sufficiency is a designation with stringent criteria. The criteria required for a sufficient data set are discussed later.

Expert Review Team

Evaluations of exposures to agents should involve a group of experts. The breadth of expertise required is rarely found in one person, and group review ensures that the views held by each member are subjected to the scrutiny and acceptance of scientific peers. The group should include epidemiologists and experts in toxicology and

related areas (e.g., reproductive toxicologists, developmental toxicologists, developmental neurotoxicologists, risk assessors, biostatisticians), as well as experts in human exposure to the chemicals of interest. A rotating core of scientific members who serve for fixed periods on a series of working groups will enhance consistency of reviews. This model is in use by the NTP's Center for the Evaluation of Risks to Human Reproduction, which was established in 1998. Using the evaluative process is resource- and time-intensive; therefore, in some cases, the Navy may want to consult existing sources of information that provide detailed evaluations developed by experts in reproductive and developmental toxicology. The detailed evaluations that are available, as well as other sources of information, are described in Appendix B.

GENERAL DESCRIPTION

The evaluative process recommended by the subcommittee (based on the process described by Moore et al. 1995a) outlines a systematic, sequenced procedure for reviewing data on animal and human reproductive and developmental toxicity, on general toxicological and biological parameters, and on the conditions of use that result in human exposure. The goal is to determine whether exposure to an agent could cause reproductive or developmental toxicity in humans. Expert judgment is applied in a series of steps that reflect the systematic thought sequences used by most experienced risk assessors. Brief summaries that describe each step appear below, followed by more detailed presentations in the rest of this chapter and in Chapter 3.

The section on exposure data discusses the pattern and degree of human exposure to the agent. It primarily considers occupational exposures and develops numerical estimates of exposure from what is known about those uses and exposures.
The section on general toxicological and biological parameters summarizes chemical data and basic toxicity information available on the agent of interest and reviews data on absorption, distribution, metabolism, and excretion in humans and experimental animals. The section on developmental and reproductive toxicity reviews data from human and animal studies. To ensure adequate assessments of both types of data, experts review each type of data independently and prepare synopses of individual studies.

In the step for integration of toxicity and exposure information, the existing data on human and experimental animal developmental and reproductive toxicity are evaluated together for evidence of complementarity or inconsistency. Those evaluations are then assessed in terms of the known data on basic toxicity and pharmacokinetics. The result is an integrated judgment about the relevance of all the data for predicting human risk. If the expert committee members judge that the toxicity data are relevant to humans, the committee undertakes a quantitative evaluation. Finally, the toxicity and exposure data are integrated to characterize risk. When the data reviewed are deficient, the ensuing judgments usually involve a large degree of uncertainty. The identification of critical data needs provides a focus on research that can materially enhance the certainty of future judgments about the agent's potential risk.

A summary reviews the scientific judgments and conclusions formed in the steps above and conveys the level of confidence in the judgment. The summary is written in a narrative style, which the subcommittee considers the best way to present such information to Navy environmental health professionals. The narrative is central to accurate interpretation of the scientific judgments and conclusions about the exposure of interest. Agents present a reproductive or developmental risk to human health only under certain conditions. Single-letter or single-word designations, such as “positive” or “negative,” or labeling a chemical as a “reproductive toxicant,” cannot effectively communicate that critical fact. Nor can essential facts about such parameters as frequency, duration, and route of exposure, susceptible populations, age, and reproductive status be conveyed without some sense of context. For those reasons, the narrative form is crucial. The last step is a listing of references for papers and studies of the agent of interest.
DETAILS OF THE EVALUATIVE PROCESS

The sections below detail the steps of the evaluative process recommended by the subcommittee. Box 2-1 is a sample table of contents from an evaluation of lithium (Moore et al. 1995b) using a similar process.

Box 2-1 Example Table of Contents–Assessment of Lithium

INTRODUCTION
1. Exposure Data
   1.1 Consumer Exposure
   1.2 Environmental Exposure
   1.3 Occupational Exposure
   1.4 Exposure Estimates
2. General Toxicological and Biological Parameters
   2.1 Chemistry
   2.2 Basic Toxicity
   2.3 Pharmacokinetics
3. Reproductive and Developmental Toxicity Data
   3.1 Human Data
      3.1.1 Developmental Toxicity
         3.1.1.1 Register Studies
         3.1.1.2 Prospective Studies
         3.1.1.3 Retrospective Studies
         3.1.1.4 Clinical Case Reports
      3.1.2 Reproductive Toxicity
         3.1.2.1 Developmental Toxicity
         3.1.2.2 Reproductive Toxicity
   3.2 Experimental Animal Toxicity
      3.2.1 Developmental Toxicity
         3.2.1.1 Studies in Mice
         3.2.1.2 Studies in Rats
         3.2.1.3 Studies in Rabbits, Monkeys, and Pigs
      3.2.2 Reproductive Toxicity
         3.2.2.1 Female Reproductive Toxicity
         3.2.2.2 Male Reproductive Toxicity
4. Integration of Toxicity and Exposure Information
   4.1 Interpretation of Toxicity Data
      4.1.1 General Toxicity and Pharmacokinetics Conclusions
      4.1.2 Developmental Toxicity
         4.1.2.1 Conclusions
      4.1.3 Reproductive Toxicity
         4.1.3.1 Female Reproductive Toxicity
         4.1.3.2 Male Reproductive Toxicity
         4.1.3.3 Conclusions
   4.2 Default Assumptions
   4.3 Quantitative Evaluation
      4.3.1 Developmental Toxicity
      4.3.2 Reproductive Toxicity
5. Critical Data Needs
   5.1 Developmental Toxicity
   5.2 Female Reproductive Toxicity
   5.3 Male Reproductive Toxicity
6. Summary
   6.1 Background
   6.2 Human Exposure
   6.3 Toxicology
      6.3.1 Developmental Toxicity
      6.3.2 Reproductive Toxicity
         6.3.2.1 Female Reproductive Toxicity
         6.3.2.2 Male Reproductive Toxicity
   6.4 Quantitative Evaluation
      6.4.1 Developmental Toxicity
      6.4.2 Reproductive Toxicity
   6.5 Certainty of Judgments and Data Needs
      6.5.1 Developmental Toxicity
      6.5.2 Reproductive Toxicity
7. References

Source: Moore et al. (1995b).

Exposure Data

Human exposure data are evaluated to achieve three goals:

1. To identify potentially exposed populations.
2. To identify potential pathways of exposure and to describe the parameters associated with each pattern of use, including route, dose, duration, frequency, timing, age, and number of people potentially exposed.
3. To estimate the range of exposure and thus obtain quantitative estimates of exposures associated with each pattern of use.

Although human exposure data are essential for accurate evaluation of an agent's risk potential, data of sufficient quality and quantity are frequently unavailable. Thus, there is uncertainty in the exposure component of the evaluative process, even as there is in hazard characterization. When toxicity data indicate the potential for an adverse effect, the need to estimate the nature of human exposure becomes imperative. In those instances, exposure estimates can be derived using modeling approaches based on data from other sources, and one or more default assumptions can be used. The greater the number of default assumptions employed, the greater the uncertainty about the accuracy of the expert judgment.

A chemical might have a variety of uses, and the concentration, route, and frequency of exposure can differ for each use. The physical form of the chemical and the presence of other agents also might vary with use. Those factors can dramatically influence both the probability that exposure will lead to absorption into the body and the rate at which absorption occurs. Some uses might lead to indirect exposures, perhaps resulting from deliberate, incidental, or accidental environmental releases of the chemical. Pesticide residues in food are an example of exposure that arises from a deliberate environmental release. Incidental or deliberate releases of pesticides, through normal use, might lead to exposure through drinking water or in respired air.
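As a sketch of how the quantitative estimates called for above are typically assembled, an average-daily-dose calculation combines the exposure parameters just listed (concentration, intake rate, frequency, duration, body weight). The scenario and every number below are hypothetical, chosen only to show the arithmetic.

```python
# Sketch of an average-daily-dose (ADD) estimate:
# ADD (mg/kg-day) = (C * IR * EF * ED) / (BW * AT)
# where C = concentration, IR = intake rate, EF = exposure frequency,
# ED = exposure duration, BW = body weight, AT = averaging time.
# All parameter values below are hypothetical.

def average_daily_dose(c_mg_per_l, intake_l_per_day, days_per_year,
                       exposure_years, body_weight_kg, averaging_years):
    averaging_days = averaging_years * 365
    return (c_mg_per_l * intake_l_per_day * days_per_year *
            exposure_years) / (body_weight_kg * averaging_days)

# Hypothetical drinking-water scenario: 0.01 mg/L in water, 2 L/day,
# 350 days/year for 10 years, 70-kg adult, averaged over 10 years.
add = average_daily_dose(0.01, 2, 350, 10, 70, 10)
```

Each default assumption substituted for a missing parameter (e.g., an assumed intake rate or body weight) widens the uncertainty in the resulting estimate, as the text above notes.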
Some exposures are direct: Examples include consuming a chemical as a drug or using chemicals to mask odors. Although the frequency and intensity of exposure to an agent are typically greatest in occupational

settings, sometimes use of certain products by Navy personnel outside the workplace can lead to episodes of exposure intensity that approach or exceed occupational exposures. Examples include the use of cosmetics and nonprescription drugs, pesticide applications in the home, furniture refinishing, and home remodeling.

Because it is frequently difficult to establish many patterns of agent use directly, exposure can be estimated by indirect modeling (EPA 1992). Indirect assessments use available information on concentrations of chemicals in exposure media, and information about when, where, and how individuals might have contact with the exposure media. Models and a series of exposure factors (e.g., agent concentration, contact duration, contact frequency) are then used to estimate exposure. The models can be deterministic or probabilistic. A deterministic model provides a point estimate of exposure; a probabilistic model considers the range of estimates and provides a probability distribution of exposures. Data sets are rarely complete and, therefore, exposure estimates are developed using various default assumptions combined with the modeling estimates. For example, to estimate the risk posed by pesticides in foods, EPA initially assumes that residues are at tolerance levels and that 100% of a crop has been treated (EPA 1999). To maintain occupational exposure limits, personal exposure monitoring techniques can be used.

Exposure assessments generally focus on a single chemical and a single route of exposure. However, there have been recent efforts to examine multiple pathways of exposure. The current approach is to add the single point estimates for each exposure source to arrive at a sum. Research continues on developing new data and exposure models for estimating multiple-pathway exposures. It might not be necessary to review each exposure parameter on a chemical- and use-specific basis.
Exposure paradigms and values that are in regular use in government agencies or that are recommended by scientific organizations could be adopted. The American Conference of Governmental Industrial Hygienists' (ACGIH) Threshold Limit Values (TLVs) and Biological Exposure Indices (BEIs) (ACGIH 2000), the U.S. National Institute for Occupational Safety and Health's (NIOSH) recommended exposure limits (RELs) (NIOSH 2000), and the U.S. Occupational Safety and Health Administration's (OSHA) permissible exposure limits (PELs) and short-term exposure limits (STELs) (29
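The deterministic and probabilistic exposure models described earlier can be contrasted in a short sketch. The distributions, parameter values, and the simple multiplicative exposure function are all hypothetical, chosen only to show the mechanics of the two approaches.

```python
# Sketch contrasting a deterministic point estimate with a probabilistic
# (Monte Carlo) exposure estimate. Distributions and values are hypothetical.
import random
import statistics

def exposure(conc, contact_rate, frequency):
    # A simple multiplicative exposure model:
    # exposure = concentration * contact rate * contact frequency.
    return conc * contact_rate * frequency

# Deterministic model: single point values yield a single point estimate.
point = exposure(conc=0.5, contact_rate=2.0, frequency=0.8)

# Probabilistic model: sample each factor from a distribution and
# summarize the resulting distribution of exposures.
random.seed(0)
samples = [
    exposure(conc=random.lognormvariate(-0.7, 0.5),
             contact_rate=random.uniform(1.0, 3.0),
             frequency=random.triangular(0.5, 1.0, 0.8))
    for _ in range(10_000)
]
median = statistics.median(samples)
p95 = sorted(samples)[int(0.95 * len(samples))]  # upper-percentile exposure
```

The point estimate answers "how much?"; the sampled distribution also answers "how often is exposure above a given level?", which is why probabilistic models better support the range-of-exposure goal described above.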

A brief discussion of each feature can be found later in this chapter. Underlying the epidemiological method are two assumptions: first, that disease does not occur randomly and, second, that systematic study can identify factors that cause or can prevent disease (Hennekens and Buring 1987). Those design features help to ensure the scientific validity of human data, and they underscore the added strength and utility of epidemiological data that contribute to assessment of human health risk from toxicological hazards.

Weighing the Evidence

All epidemiological studies should be critically evaluated with respect to research design (especially in relation to study purposes), methods, analysis, and interpretation of results. Evaluation requires all aspects of the epidemiological method to be weighed carefully, as shown in Box 2-2. Although some researchers advocate ranking studies by design type, this approach can be overly simplistic because it assumes strict adherence to methodological rigor. In essence, there is no single way to rank studies; design and methodology must be considered simultaneously. For example, a cohort study of limited statistical power should not be weighed more heavily than a well-conducted case-control study. Critical weighing of the available literature is necessary.

Selection of an appropriate control group is an important criterion for assessing case-control studies. The control group should be similar to the case group, save for the presence of disease. Controls can be selected from registries, such as the lists of people kept by departments of motor vehicles or voter registration offices, or they can be drawn from neighborhoods, hospitals, or lists of friends and family, depending on the study's hypothesis. Selection of appropriate controls minimizes selection bias and enhances the validity of case-control studies.
For methodological aspects, greater weight should be assigned to studies that use an entire population or employ probability sampling techniques to develop a random sample. Probability samples help ensure the external validity (generalizability) of study results. There are various types of probability samples: simple, systematic, stratified, cluster, or multistage random sampling. The choice of the sample is predicated on the study's purpose.
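As an illustration of one of the probability-sampling designs just listed, a stratified random sample draws units within each stratum so that the sample preserves the strata proportions of the population. The population, strata, and sampling fraction below are hypothetical.

```python
# Sketch of stratified random sampling: partition the population into
# strata, then draw a simple random sample within each stratum.
# Population, strata, and sampling fraction are hypothetical.
import random

def stratified_sample(population, stratum_of, fraction, rng):
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for units in strata.values():
        k = max(1, round(fraction * len(units)))  # at least one per stratum
        sample.extend(rng.sample(units, k))
    return sample

# Hypothetical population of 300 personnel, stratified by duty site.
rng = random.Random(42)
population = [{"id": i, "site": "ship" if i % 3 else "shore"}
              for i in range(300)]
sample = stratified_sample(population, lambda u: u["site"], 0.10, rng)
```

Because each stratum is sampled at the same fraction, subgroups that would be underrepresented in a simple random draw are guaranteed proportional representation, which supports the external validity discussed above.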

Box 2-2 Weighing Epidemiological Evidence

Design
   Experimental
      Randomized clinical trial
      Community trial
   Observational (analytical)
      Prospective cohort
      Retrospective cohort
      Case-control

Methods
   Population or sampling frame
   Choice of control group (for case-control)
   Choice of study exposure(s)
      Acute vs. chronic
      Continuous vs. intermittent
      Dose, timing
   Choice of study outcome(s)
      Healthy (live birth) vs. adverse
      Unit of analysis (maternal, paternal, parental)
      Lack of independence
   Data sources
   Standardized data collection
      In-person interview
      Telephone interview
      Self-administered questionnaire
      Existing records
   Sample size
   Participation rate

Analysis
   Multivariate analysis (effect modification, interaction, confounding)
   Bivariate analysis
   Univariate analysis

Interpretation
   Statistical significance
   Type I and II errors
   Alternative explanations (chance, error, confounding, bias)
   Causality
      Necessary, sufficient
      Risk factors
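Box 2-2 lists sample size and Type I and II errors among the items to weigh. As a sketch, the conventional choices (two-sided alpha of .05, power of 80%) translate into a minimum sample size per group via a normal approximation for comparing two proportions; the outcome rates below are hypothetical.

```python
# Sketch of an a priori sample-size calculation for comparing two
# proportions, using the conventional two-sided alpha = .05 and
# power = 80% and a standard normal approximation.
# The outcome rates are hypothetical.
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for two-sided alpha
    z_beta = z.inv_cdf(power)            # critical value for desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Hypothetical: detect a rise in an adverse-outcome rate from 5% to 10%.
n = n_per_group(0.05, 0.10)
```

Even a modest effect on a rare outcome demands hundreds of subjects per group, which is why underpowered studies of rare reproductive outcomes are so common and why power deserves explicit weight in the evaluation.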

Epidemiological studies that collect data from several sources are useful because their validity and reliability can be further assessed. Such studies should be given greater weight than those that rely exclusively on self-reported data. The use of biological markers of exposure, susceptibility, or effect in disease is another strategy for maximizing the validity of study results. However, careful interpretation of biomarker data is needed, given our limited understanding of what the findings might actually mean in terms of human health.

Selection of a health outcome for study depends in part upon the exposure of interest and methodological considerations such as the ability to define, measure, and validate adverse outcomes, especially if self-reported. Operational definitions for outcomes can be general or specific in nature (e.g., all birth defects versus spina bifida, respectively), and will affect the type of statistical analysis that can be performed and the interpretation of results. Statistical power may be limited if restrictive operational definitions are used for rare outcomes, or if other important covariates cannot be fully addressed.

Use of a standardized methodology for ascertaining data on exposure, outcome, or other relevant covariates is an essential feature of an epidemiological study that enhances the validity of results. All study participants should be subjected to the same method of data collection. In-person interviews are reported to provide the most reliable self-reported exposure data, followed by telephone and mail survey techniques. Standardized forms for collecting existing data also should be used.

Epidemiological studies of sufficient size to minimize Type I (alpha) and Type II (beta) errors should be weighed more heavily than statistically underpowered studies. Type I error is the incorrect rejection of the null hypothesis: the investigator erroneously concludes that an association exists. Alpha levels, by convention, are typically set at .05, which denotes that a “significant” chance finding can occur 5% of the time. Type II errors occur when the investigator fails to reject the null hypothesis when an association does exist. Beta levels, by convention, are typically set at .20, yielding a study power of 80%: the study detects a true difference 80% of the time. Ad hoc power calculations can provide better insight about the sufficiency of the sample for study purposes. However, many published papers are secondary analyses

with uncertain statistical power. As such, the absence of an effect needs to be weighed in relation to the study's statistical power and interpreted accordingly.

The analytical plan of epidemiological studies should use descriptive and analytical techniques in describing the sample and results. Descriptive statistics, such as frequency distributions, cross-tabulations, and measures of central tendency and variation, can help explain underlying distributions of variables and direct the assessment of the appropriateness of more advanced statistical techniques. Careful weighing of study findings with respect to the design and methods helps to ensure the validity of results.

Greater weight should be assigned to epidemiological studies that have carefully assessed statistical significance. Because a wide variety of statistical tools are available for testing significance, consideration must be given to the design of the study, to the types of data collected, to the sample size, and to the study purpose. Several textbooks provide diagrams to assist in selecting the appropriate statistics (e.g., Hennekens and Buring 1987). Studies that provide confidence intervals, and not just probability values alone, should be assigned more weight. The chance of Type I and II errors should be considered. Alternative explanations for the results should be carefully addressed. Studies that discuss results in relationship to chance findings, random errors, possible confounders, or sources of bias should be weighed more heavily than studies that ignore or incompletely address those issues.

Case Reports and Clinical Series

Almost all exposures that are currently recognized as having unequivocal developmental or reproductive toxicity in humans were initially recognized in case reports and clinical series. This was possible because adverse reproductive exposures typically produce qualitatively distinct patterns of toxicity.
They do not normally affect all reproductive and developmental outcomes indiscriminately. This effect is most apparent with developmental toxicity: exposure to a particular agent during development characteristically causes a distinctive pattern of congenital anomalies that depends on the timing of exposure.
Clinical series can be compelling when they demonstrate the occurrence of a highly characteristic pattern of anomalies in children of women who experienced similar, well-defined exposures at similar times in pregnancy. The association is especially convincing if both the pattern of anomalies and the exposure are rare in other circumstances. For example, the characteristic patterns of congenital anomalies produced by excessive maternal exposure to alcohol, toluene, methylmercury, or polychlorinated biphenyls during pregnancy led to recognition of the developmental toxicity of these substances. The resulting dysmorphic syndromes (fetal alcohol syndrome, toluene embryopathy, congenital Minamata disease, and congenital rice-oil disease, respectively) are not distinguished by the presence of a single distinctive feature. In fact, many of the component features are rather common and can have a variety of causes. When the features occur together, however, they constitute a distinctive pattern of congenital anomalies that is rare except in children born to mothers who were exposed to one of these substances during pregnancy.

In contrast to well-designed cohort and case-control studies, neither case reports nor clinical series can provide reliable quantitative estimates of the risk of adverse outcome in children of women with a toxic exposure during pregnancy. Case reports and clinical series are useful as a means of generating hypotheses that can then be tested with analytical designs. Adverse reproductive outcomes are common in the general population: spontaneous abortion occurs in 15-20% of recognized pregnancies, and approximately 5% of all children have serious congenital anomalies or mental retardation that becomes apparent within the first year of life. The frequency of learning disabilities and behavioral disorders in childhood is even greater.
The coincidental occurrence of an exposure in a pregnant woman and a miscarriage or congenital anomaly in her offspring is therefore common. Chance associations are even more likely if one considers the full range of possible adverse reproductive effects and exposure of either parent for a variable period before conception. The observation of adverse developmental or reproductive outcomes in a few case reports or clinical series is therefore never sufficient by itself to establish the reproductive toxicity of an exposure in humans.
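The hypothesis-testing conventions described earlier (alpha = .05, power = 80%) imply concrete sample-size requirements for the analytical studies that must follow up such case-series signals. A minimal sketch, using the normal approximation for comparing two proportions; the background rate and effect size below are hypothetical, not drawn from any study cited here:

```python
import math

# Standard normal quantiles for the conventional error rates
# (alpha = .05 two-sided, beta = .20).
Z_ALPHA = 1.96   # z for alpha/2 = .025
Z_BETA = 0.8416  # z for beta = .20 (power = 80%)

def required_n_per_group(p_unexposed, p_exposed):
    """Approximate sample size per group needed to detect a
    difference between two proportions at alpha = .05, power = 80%."""
    numerator = (Z_ALPHA + Z_BETA) ** 2 * (
        p_unexposed * (1 - p_unexposed) + p_exposed * (1 - p_exposed)
    )
    return math.ceil(numerator / (p_unexposed - p_exposed) ** 2)

# Hypothetical example: detecting a doubling of a 5% background
# rate of serious congenital anomalies to 10% in an exposed group.
n = required_n_per_group(0.05, 0.10)
print(n)  # 432 per group
```

Calculations of this kind are prospective planning tools; done after the fact with the observed rates, they become the post hoc power assessments discussed above, which explain why a null result from a small cohort carries little evidential weight.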

Assessing Causality in Human Studies

Careful attention must be given to assessing causality. Causation can also be ranked in terms of weight of evidence (Jekel et al. 1996). Greatest weight is given to a sufficient cause, which, when present, always results in disease. Next is a necessary cause, which precedes disease and in whose absence the disease cannot occur. The third and weakest type of causation is a risk factor, which, when present, increases the likelihood of an outcome in exposed versus unexposed individuals. A risk factor is neither a necessary nor a sufficient cause of death or adverse health outcomes. Most observational studies estimate risk factors in assessing causality; necessary and sufficient causes are more often useful in the study of infectious diseases.

If an association is observed between exposure to a specific substance and a particular adverse outcome, the investigator must determine whether it is a chance finding or causal in nature. There are several widely recognized criteria for assessing causality, including temporal relationships, strength of association, dose-response relationships, replication of findings, biological plausibility, consideration of alternative explanations, cessation of exposure, and specificity of association (Gordis 1996). Experimental and observational designs alike can assess temporal relationships; descriptive studies cannot. The strength of an association and dose-response relationships are determined in an observational epidemiological study by assessment of the relative risk (RR) for cohort studies or the odds ratio (OR) for case-control studies. An RR or OR greater than 1.0 denotes an increase in the risk of disease given exposure in comparison with the unexposed; conversely, an RR or OR less than 1.0 denotes a reduction in the risk of disease given exposure. An RR or OR equal to 1.0 denotes no effect. Confidence intervals that exclude 1.0 indicate statistical significance of the risk factor.
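These measures are straightforward to compute from a 2x2 table, and a stratified (Mantel-Haenszel) estimate is one standard way to check whether a crude association survives adjustment for a confounder. A minimal sketch; all counts below are hypothetical:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and 95% CI from a 2x2 table:
    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

def relative_risk(a, b, c, d):
    """Relative risk from a cohort-style 2x2 table."""
    return (a / (a + b)) / (c / (c + d))

def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata of a suspected confounder.
    strata = iterable of (a, b, c, d) tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Hypothetical case-control data: the point estimate (OR = 2.25)
# suggests a doubling of risk, but the 95% CI includes 1.0,
# so the association is not statistically significant.
or_, lo, hi = odds_ratio_ci(20, 80, 10, 90)
print(round(or_, 2), round(lo, 2), round(hi, 2))  # 2.25 0.99 5.09

# Hypothetical confounding: a crude OR of about 6.6 vanishes
# (MH OR = 1.0) once the data are stratified by a third factor.
strata = [(5, 95, 50, 950), (50, 50, 5, 5)]
crude = odds_ratio_ci(55, 145, 55, 955)[0]
print(round(crude, 2), round(mantel_haenszel_or(strata), 2))  # 6.59 1.0
```

The second example illustrates the text's point about confidence intervals: a seemingly large OR can be statistically compatible with no effect, which a bare p-value or point estimate would not reveal as clearly.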
Multivariate modeling or stratification procedures can be used in observational studies to assess confounding, interaction, or effect modification and thereby to help rule out alternative explanations. The existing literature is used to assess the remaining criteria for causality: replication of findings in other populations, biological plausibility to aid in interpreting risk factors, cessation of exposure, specificity of association, and consistency with other knowledge. In attempting to assess causality from the available data, careful consideration must be given to the direction and strength of statistical estimates across studies that use various populations and study designs. Formalized approaches to weighing empirical findings (e.g., meta-analysis) are available and should be considered. The actual interpretation of the empirical evidence must be tempered by study limitations and attention to methodological rigor. If human data are considered along with animal data, the plausibility that the agent causes an adverse outcome can be strengthened. See Appendix C for further discussion of causality.

Evaluating human studies requires careful assessment of all elements of the epidemiological method. This is not an easy process, and it requires an understanding of research methodology and an appreciation of biostatistics.

Experimental Animal Toxicity

Utility and Limitations

The study of chemical exposure in experimental animals is a reasonably efficient and effective means of ascertaining a chemical's toxic potential in humans. Investigators have developed a standard series of animal test procedures that domestic and international regulatory bodies require, for example, for approval to market drugs, pesticides, and, to a lesser degree, other industrial and commercial substances. Data showing adverse effects in such animal reproductive and developmental toxicity studies are assumed to be predictive of a potential human reproductive or developmental effect, although the precise manifestations may not be the same. This assumption is based on comparisons of data from animals and humans for exposures that are known to cause human reproductive and developmental toxicity (Thomas 1981; Nisbet and Karch 1983; Kimmel et al. 1984; Hemminki and Vineis 1985; Meistrich 1986; Working 1988; Francis et al. 1990; Newman et al. 1993).
Animal models are available for all known exposures that cause reproductive toxicity in humans, and in many cases the effects in animals are similar, or of a similar type, to those
observed in humans (Kimmel et al. 1984, 1990; Schardein 1998). The species that show strict concordance with human effects vary (Schardein 1998), but this is likely due to variability in human exposure, the level of exposure, and differences in pharmacokinetics and pharmacodynamics that are to some extent chemical-specific.

Despite the proven utility of animal and other laboratory data, several factors can limit their usefulness. There are close similarities among mammals in such biological processes as embryonic and fetal development, sperm production, and ovulation, but distinct differences and variations also exist among species. Such differences can limit the certainty of predicting that an effect seen in a laboratory species will occur in humans. It is not uncommon, for example, for one animal species to exhibit an adverse effect while a second species either shows no effect or shows effects only at markedly different doses. There are practical constraints on the number of animals that can be studied; this places statistical limits on the certainty of some test results. Poor study design or laboratory practices also can compromise the data. Thus, because experimental animal toxicity data can be misinterpreted, it is imperative that the evaluative process include a review and interpretation of animal data by scientists with appropriate training and experience. The logic that underpins their interpretation of the data should be stated clearly in the evaluation so that other experts can understand the basis for the evaluative judgment.

Adverse Effect

In general, three criteria must be met to support a conclusion that animal data are sufficient to indicate an adverse effect in the species studied under the conditions specified for the experiment:

At least one well-conducted study must show reproductive or developmental toxicity in a mammalian species.
When study data are insufficient, the cause is often improper study design or execution, inadequate doses or duration of exposure, poor survival, or too few animals to achieve adequate statistical power. At present, no nonmammalian or in vitro systems are considered predictive of human responses, and none are accepted by
regulatory agencies for human hazard assessment of reproductive and developmental toxicity.

Studies might be adequately conducted but still insufficient because the endpoints are not clearly related to an adverse effect. Such data should still be cited. For example, only one study might be relevant to reproductive toxicity, and that study might have noted a decrease in the production of progesterone by cultured granulosa cells. Although the study is adequate in every technical respect, the data themselves are insufficient for rendering an assessment of animal hazard, because the relationship of this effect to an adverse effect in vivo cannot be predicted. In such an instance, the evaluators should identify more definitive test data as a critical data requirement for predicting effects on ovarian function and related outcomes in humans.

The data must be interpreted as having biological significance. Although the evaluative process strongly endorses the application of appropriate and rigorous statistical methods, it must be clear that, when a study meets conventional statistical criteria, it also must yield data that reflect an effect that is both biologically plausible and considered adverse. In the occasional instance in which there is statistical but not biological significance, the evaluation must clearly articulate the basis for concluding that the evidence is insufficient to show an adverse effect and discuss the uncertainties associated with the data. A case in point is the evaluation of HFC-134a exposure in a two-generation study by Alexander et al. (1996), in which the parental generation was exposed before mating and during pregnancy and lactation, and F2 offspring were found to have slight but statistically significant delays in physical and reflex development that were not clearly dose related.
Because the F2 generation was never exposed to HFC-134a, directly or indirectly, and because the changes reported represented only a one-half- to one-day delay and were not clearly dose related, they were not considered treatment related (see Appendix A for further discussion).

Dose response. Evidence of a dose-response relationship is an important criterion in the assessment of a toxic exposure in
experimental animal studies. However, traditional dose-response relationships might not always be observed for some endpoints. With increasing dose, for example, a pregnancy might end in fetal loss rather than in the birth of live offspring with malformations.

Typically, the demonstration of no adverse effect requires a larger set of evidence than does the demonstration of an adverse effect. To support a conclusion that a given exposure does not cause developmental toxicity, the available studies must be conducted in at least two mammalian species and must test for a wide variety of pre- and postnatal outcomes (EPA 1991). A minimum data set for a conclusion of no reproductive toxicity would normally consist of at least one two-generation reproductive toxicity study. Additional studies often are warranted, especially when there is prior knowledge of the general toxicity of a given agent or chemical class or of the pharmacological activity of the agent.

Several examples illustrate cases in which the minimum data set described above should not be used as the basis for concluding no adverse effect. The absence of a postnatal functional evaluation renders the developmental toxicity database incomplete without an additional developmental neurotoxicity study (Moore et al. 1995a; EPA 1998c). A standard reproductive study in rats showing no effect on the ability of males to impregnate females should not be taken to support a conclusion that there is no male reproductive toxicity in any species. Unlike humans, rodents produce sperm in numbers that greatly exceed the minimum requirements for fertility; a substantial reduction in sperm production may not compromise fertility in rodents, while a less severe reduction in human males could cause reduced fertility.
If there are data suggesting that the standard experimental animal species are not appropriate for comparison with humans, data from a less commonly used species that is metabolically similar to humans, such as the dog or a nonhuman primate, would be needed to confirm the lack of reproductive and developmental toxicity. As they are performed today, in vitro studies will not by themselves provide sufficient evidence of no adverse effect.

Studies in two species are often available in which pregnant females were exposed during pregnancy and killed just before parturition, permitting full evaluation of adverse effects on the mother and fetus. Such prenatal developmental toxicity studies, designed to determine a substance's potential to cause structural abnormalities, growth deficits, or death, are available in two species for many agents, drugs and pesticides in particular. When such studies demonstrate no adverse effects, one might conclude that the data are sufficient to indicate little or no risk that the agent causes developmental toxicity manifested at birth. However, it is important to recognize the limitations of these studies: if dosing stops at the end of major organogenesis (gestation day 15 in rats and mice, day 19-20 in rabbits), later fetal exposures might result in further growth retardation and in other developmental defects (e.g., those occurring in late-developing reproductive organs, such as hypospadias). In addition, such studies provide no information on the postnatal effects of prenatal exposures, including possible neurobehavioral deficits or other impaired organ-system function. Similarly, the absence of adverse effects in a two-generation reproductive study would not preclude the possibility of significant reproductive toxicity that is not manifested as a fertility problem, unless more detailed evaluations of sperm, estrous cycling, ovarian histology, and endocrine function were included.

Detailed descriptions of animal testing protocols and their qualifications and limitations are given in Appendix D.
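The dose-response criterion discussed above is commonly examined statistically with a trend test across dose groups. A minimal sketch of the Cochran-Armitage trend test; the dose scores and fetal-loss counts below are hypothetical, not from any study cited here:

```python
import math

def cochran_armitage_z(scores, n_per_group, responders):
    """Cochran-Armitage z statistic for trend in proportions.
    scores: numeric dose scores; n_per_group: animals per group;
    responders: affected animals per group."""
    big_n = sum(n_per_group)
    p_bar = sum(responders) / big_n  # pooled response rate
    t = sum(r * x for r, x in zip(responders, scores))
    expected = p_bar * sum(n * x for n, x in zip(n_per_group, scores))
    variance = p_bar * (1 - p_bar) * (
        sum(n * x * x for n, x in zip(n_per_group, scores))
        - sum(n * x for n, x in zip(n_per_group, scores)) ** 2 / big_n
    )
    return (t - expected) / math.sqrt(variance)

def one_sided_p(z):
    """Upper-tail p-value from the standard normal distribution."""
    return 0.5 * (1 - math.erf(z / math.sqrt(2)))

# Hypothetical fetal-loss counts rising with dose:
# control, low, mid, high groups of 20 dams each.
z = cochran_armitage_z([0, 1, 2, 3], [20, 20, 20, 20], [1, 2, 4, 8])
print(round(z, 2), round(one_sided_p(z), 4))  # 2.95 0.0016
```

Note that no single pairwise comparison in this example need be significant on its own; the trend statistic pools the monotonic pattern across dose groups, which is why dose-response evidence strengthens an assessment even when individual group sizes are small.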