Suggested Citation:"Appendix B: Weight-of-Evidence Descriptions from U.S. Environmental Protection Agency Guidelines." National Research Council. 2011. Review of the Environmental Protection Agency's Draft IRIS Assessment of Formaldehyde. Washington, DC: The National Academies Press. doi: 10.17226/13142.

Appendix B
Weight-of-Evidence Descriptions from U.S. Environmental Protection Agency Guidelines

The text in this appendix was excerpted directly from the indicated guidelines of the U.S. Environmental Protection Agency (EPA).

GUIDELINES FOR MUTAGENICITY RISK ASSESSMENT

The evidence for a chemical’s ability to produce mutations and to interact with the germinal target is integrated into a weight-of-evidence judgment that the agent may pose a hazard as a potential human germ-cell mutagen. All information bearing on the subject, whether indicative of potential concern or not, must be evaluated. Whatever evidence may exist from humans must also be factored into the assessment.

All germ-cell stages are important in evaluating chemicals because some chemicals have been shown to be positive in postgonial stages but not in gonia (Russell et al., 1984). When human exposures occur, effects on postgonial stages should be weighted by the relative sensitivity and the duration of the stages. Chemicals may show positive effects for some endpoints and in some test systems, but negative responses in others. Each review must take into account the limitations in the testing and in the types of responses that may exist.

To provide guidance as to the categorization of the weight of evidence, a classification scheme is presented to illustrate, in a simplified sense, the strength of the information bearing on the potential for human germ-cell mutagenicity. It is not possible to illustrate all potential combinations of evidence, and considerable judgment must be exercised in reaching conclusions. In addition, certain responses in tests that do not measure direct mutagenic end points (e.g., SCE induction in mammalian germ cells) may provide a basis for raising the weight of evidence from one category to another. The categories are presented in decreasing order of strength of evidence.

1. Positive data derived from human germ-cell mutagenicity studies, when available, will constitute the highest level of evidence for human mutagenicity.
2. Valid positive results from studies on heritable mutational events (of any kind) in mammalian germ cells.
3. Valid positive results from mammalian germ-cell chromosome aberration studies that do not include an intergeneration test.
4. Sufficient evidence for a chemical's interaction with mammalian germ cells, together with valid positive mutagenicity test results from two assay systems, at least one of which is mammalian (in vitro or in vivo). The positive results may both be for gene mutations or both for chromosome aberrations; if one is for gene mutations and the other for chromosome aberrations, both must be from mammalian systems.
5. Suggestive evidence for a chemical's interaction with mammalian germ cells, together with valid positive mutagenicity evidence from two assay systems as described under 4, above. Alternatively, positive mutagenicity evidence of less strength than defined under 4, above, when combined with sufficient evidence for a chemical's interaction with mammalian germ cells.
6. Positive mutagenicity test results of less strength than defined under 4, combined with suggestive evidence for a chemical's interaction with mammalian germ cells.
7. Although definitive proof of nonmutagenicity is not possible, a chemical could be classified operationally as a nonmutagen for human germ cells if it gives valid negative test results for all endpoints of concern.
8. Inadequate evidence bearing on either mutagenicity or chemical interaction with mammalian germ cells (EPA 1986, Pp 9-10).
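
As an illustration only, the ordering of the eight categories above can be sketched as a lookup that returns the first (strongest) category whose conditions are met. This is not part of the EPA guidelines, which stress that considerable judgment is required. The names `Assay`, `Evidence`, `meets_two_system_rule`, and `weight_of_evidence_category`, and the string labels for evidence strength, are assumptions introduced here; the sketch also assumes the assessor has already judged each study valid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assay:
    """One assay system with a valid positive result (hypothetical structure)."""
    endpoint: str    # "gene_mutation" or "chromosome_aberration"
    mammalian: bool  # True for a mammalian system (in vitro or in vivo)


def meets_two_system_rule(a: Assay, b: Assay) -> bool:
    """Check the two-assay condition described under category 4.

    Same endpoint in both assays: at least one system must be mammalian.
    Mixed endpoints: both systems must be mammalian.
    """
    if a.endpoint == b.endpoint:
        return a.mammalian or b.mammalian
    return a.mammalian and b.mammalian


@dataclass(frozen=True)
class Evidence:
    """Simplified summary of the evidence for one chemical (assumed fields)."""
    human_germ_cell_positive: bool = False
    heritable_mammalian_positive: bool = False
    germ_cell_aberration_positive: bool = False
    interaction: str = "none"    # "sufficient", "suggestive", or "none"
    mutagenicity: str = "none"   # "two_system", "weaker", or "none"
    all_endpoints_negative: bool = False


def weight_of_evidence_category(e: Evidence) -> int:
    """Return the first matching category, 1 (strongest) through 8."""
    if e.human_germ_cell_positive:
        return 1
    if e.heritable_mammalian_positive:
        return 2
    if e.germ_cell_aberration_positive:
        return 3
    if e.interaction == "sufficient" and e.mutagenicity == "two_system":
        return 4
    if (e.interaction == "suggestive" and e.mutagenicity == "two_system") or (
        e.interaction == "sufficient" and e.mutagenicity == "weaker"
    ):
        return 5
    if e.interaction == "suggestive" and e.mutagenicity == "weaker":
        return 6
    if e.all_endpoints_negative:
        return 7
    return 8  # inadequate evidence


example = Evidence(interaction="suggestive", mutagenicity="two_system")
print(weight_of_evidence_category(example))  # 5
```

The sketch deliberately ignores the provision that other responses (for example, SCE induction in mammalian germ cells) may raise the weight of evidence from one category to another; that adjustment is left to the reviewer's judgment.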

METHODS FOR DERIVATION OF INHALATION REFERENCE CONCENTRATIONS AND APPLICATION OF INHALATION DOSIMETRY

The culmination of the hazard identification phase of any risk assessment involves integrating a diverse data collection into a cohesive, biologically plausible toxicity “picture”; that is, to develop the weight of evidence that the chemical poses a hazard to humans. The salient points from each of the laboratory animal and human studies in the entire data base should be summarized as should the analysis devoted to examining the variation or consistency among factors (usually related to the mechanism of action), in order to establish the likely outcome for exposure to this chemical. From this analysis, an appropriate animal model or additional factors pertinent to human extrapolation may be identified.

The utility of a given study is often related to the nature and quality of the other available data. For example, clinical pharmacokinetic studies may validate that the target organ or disease in laboratory animals is likely to be the same effect observed in the exposed human population. However, if a cohort study describing the nature of the dose-response relationship were available, the clinical description would rarely give additional information. An apparent conflict may arise in the analysis when an association is observed in toxicologic but not epidemiologic data, or vice versa. The analysis then should focus on reasons for the apparent difference in order to resolve the discrepancy. For example, the epidemiologic data may have contained other exposures not accounted for, or the laboratory animal species tested may have been inappropriate for the mechanism of action. A framework for approaching data summary is provided in Table 2-6. Table 2-7 provides the specific uses of various types of human data in such an approach. These guidelines have evolved from criteria used to establish causal significance, such as those developed by the American Thoracic Society (1985) to assess the causal significance of an air toxicant and a health effect. The criteria for establishing causal significance can be found in Appendix C. In general, the following factors enhance the weight of evidence on a chemical:

Clear evidence of a dose-response relationship;
Similar effects across sex, strain, species, exposure routes, or in multiple experiments;
Biologically plausible relationship between metabolism data, the postulated mechanism of action, and the effect of concern;
Similar toxicity exhibited by structurally related compounds;
Some correlation between the observed chemical toxicity and human evidence.

The greater the weight of evidence, the greater the confidence in the conclusion derived. Developing improved weight-of-evidence schemes for various noncancer health effect categories has been the focus of efforts by the Agency to improve health risk assessment methodologies (Perlin and McCormack, 1988).

Another difficulty encountered in this summarizing process is that certain studies may produce apparently positive or negative results, yet may be flawed. The flaws may have arisen from inappropriate design or execution in performance (e.g., lack of statistical power or adjustment of dosage during the course of the study to avoid undesirable toxic effects). The treatment of flawed results is critical; although there is something to be learned from every study, the extent that a study should be used is dependent on the nature of the flaw (Society of Toxicology, 1982). A flawed negative study could only provide a false sense of security, whereas a flawed positive study may contribute to some limited understanding. Although there is no substitute for good science, grey areas such as this are ultimately a matter of scientific judgment. The risk assessor will have to decide what is and is not useful within the framework outlined earlier.

Studies meeting the criteria detailed in Sections 2.1.1 and 2.1.2 (epidemiologic, nonepidemiologic data), and experimental studies on laboratory animals that fit into this weight-of-evidence framework are used in the quantitative dose-response assessment discussed in Chapter 4 (EPA 1994, Pp 2-42 to 2-46).

GUIDELINES FOR DEVELOPMENTAL TOXICITY RISK ASSESSMENT

The 1989 Proposed Amendments described important considerations in determining the relative weight of various kinds of data in estimating the risk of developmental toxicity in humans. The intent of the proposed weight-of-evidence (WOE) scheme was that it not be used in isolation, but be used as the first step in the risk assessment process, to be integrated with dose-response information and the exposure assessment.

The WOE scheme was the subject of a considerable number of public comments, and was one of the major concerns of the SAB. The concern of public commentors was that the reference to human developmental toxicity in this scheme suggested that a chemical could be prematurely designated, and perhaps labeled, as causing developmental toxicity in humans prior to the completion of the risk assessment process. The SAB suggested that the intended use of this scheme was not consistent with the use of the term “weight of evidence” in other contexts, since WOE is usually thought of as an evaluation of the total composite of information available to make a judgment about risk. In addition, the SAB Committee proposed that the Agency consider development of a more conceptual approach using decision analytical techniques to predict the relationships among various outcomes.

In the final Guidelines, the terminology used in the WOE scheme has been completely changed and retitled “Characterization of the Health-Related Database.” The intended purpose of the scheme is to provide a framework and criteria for making a decision on whether or not sufficient data are available to conduct a risk assessment. This decision is based on the available data, whether animal or human, and does not necessarily imply human hazard. This decision process is part of, but not the complete, WOE evaluation, which also takes into account the RfD_DT or RfC_DT and the human exposure information, culminating in risk characterization.

The final Guidelines also place strong emphasis on the integration of the dose-response evaluation with hazard information in characterizing the sufficiency of the health-related database. In line with this approach, the Guidelines have been reorganized to combine hazard identification and dose-response evaluation. Finally, the SAB comments on developing a conceptual matrix provide an interesting challenge, but current data indicate that the relationships among endpoints of developmental toxicity are not consistent across chemicals or species. The Agency is currently supporting modeling efforts to further explore the relationship among various developmental toxicity endpoints and the development of biologically based dose-response models that consider multiple effects (EPA 1991, Pp 69-70).

A REVIEW OF THE REFERENCE DOSE AND REFERENCE CONCENTRATION PROCESSES

A weight-of-evidence approach such as that provided in EPA’s RfC Methodology (U.S. EPA, 1994) or in EPA’s proposed guidelines for carcinogen risk assessment (U.S. EPA, 1999a) should be used in assessing the database for an agent. This approach requires a critical evaluation of the entire body of available data for consistency and biological plausibility. Potentially relevant studies should be judged for quality and studies of high quality given much more weight than those of lower quality. When both epidemiological and experimental data are available, similarity of effects between humans and animals is given more weight. If the mechanism or mode of action is well characterized, this information is used in the interpretation of observed effects in either human or animal studies. Weight of evidence is not to be interpreted as simply tallying the number of positive and negative studies, nor does it imply an averaging of the doses or exposures identified in individual studies that may be suitable as points of departure (PODs) for risk assessment. The study or studies used for the POD are identified by an informed and expert evaluation of all the available evidence (EPA 2002b, Pp 4-11 to 4-12).

GUIDELINES FOR CARCINOGEN RISK ASSESSMENT

The cancer guidelines emphasize the importance of weighing all of the evidence in reaching conclusions about the human carcinogenic potential of agents. This is accomplished in a single integrative step after assessing all of the individual lines of evidence, which is in contrast to the step-wise approach in the 1986 cancer guidelines. Evidence considered includes tumor findings, or lack thereof, in humans and laboratory animals; an agent’s chemical and physical properties; its structure-activity relationships (SARs) as compared with other carcinogenic agents; and studies addressing potential carcinogenic processes and mode(s) of action, either in vivo or in vitro. Data from epidemiologic studies are generally preferred for characterizing human cancer hazard and risk. However, all of the information discussed above could provide valuable insights into the possible mode(s) of action and likelihood of human cancer hazard and risk. The cancer guidelines recognize the growing sophistication of research methods, particularly in their ability to reveal the modes of action of carcinogenic agents at cellular and subcellular levels as well as toxicokinetic processes.

Weighing of the evidence includes addressing not only the likelihood of human carcinogenic effects of the agent but also the conditions under which such effects may be expressed, to the extent that these are revealed in the toxicological and other biologically important features of the agent.

The weight of evidence narrative to characterize hazard summarizes the results of the hazard assessment and provides a conclusion with regard to human carcinogenic potential. The narrative explains the kinds of evidence available and how they fit together in drawing conclusions, and it points out significant issues/strengths/limitations of the data and conclusions. Because the narrative also summarizes the mode of action information, it sets the stage for the discussion of the rationale underlying a recommended approach to dose-response assessment.

In order to provide some measure of clarity and consistency in an otherwise free-form, narrative characterization, standard descriptors are used as part of the hazard narrative to express the conclusion regarding the weight of evidence for carcinogenic hazard potential. There are five recommended standard hazard descriptors: “Carcinogenic to Humans,” “Likely to Be Carcinogenic to Humans,” “Suggestive Evidence of Carcinogenic Potential,” “Inadequate Information to Assess Carcinogenic Potential,” and “Not Likely to Be Carcinogenic to Humans.” Each standard descriptor may be applicable to a wide variety of data sets and weights of evidence and is presented only in the context of a weight of evidence narrative. Furthermore, as described in Section 2.5 of these cancer guidelines, more than one conclusion may be reached for an agent (EPA 2005b, Pp 1-11 to 1-12).

The weight of evidence narrative is a short summary (one to two pages) that explains an agent's human carcinogenic potential and the conditions that characterize its expression. It should be sufficiently complete to be able to stand alone, highlighting the key issues and decisions that were the basis for the evaluation of the agent’s potential hazard. It should be sufficiently clear and transparent to be useful to risk managers and non-expert readers. It may be useful to summarize all of the significant components and conclusions in the first paragraph of the narrative and to explain complex issues in more depth in the rest of the narrative.

The weight of the evidence should be presented as a narrative laying out the complexity of information that is essential to understanding the hazard and its dependence on the quality, quantity, and type(s) of data available, as well as the circumstances of exposure or the traits of an exposed population that may be required for expression of cancer. For example, the narrative can clearly state to what extent the determination was based on data from human exposure, from animal experiments, from some combination of the two, or from other data. Similarly, information on mode of action can specify to what extent the data are from in vivo or in vitro exposures or based on similarities to other chemicals. The extent to which an agent’s mode of action occurs only on reaching a minimum dose or a minimum duration should also be presented. A hazard might also be expressed disproportionately in individuals possessing a specific gene; such characterizations may follow from a better understanding of the human genome. Furthermore, route of exposure should be used to qualify a hazard if, for example, an agent is not absorbed by some routes. Similarly, a hazard can be attributable to exposures during a susceptible lifestage on the basis of our understanding of human development.

The weight of evidence narrative should highlight:

the quality and quantity of the data;
all key decisions and the basis for these major decisions; and
any data, analyses, or assumptions that are unusual for or new to EPA.

To capture this complexity, a weight of evidence narrative generally includes

conclusions about human carcinogenic potential (choice of descriptor(s), described below),
a summary of the key evidence supporting these conclusions (for each descriptor used), including information on the type(s) of data (human and/or animal, in vivo and/or in vitro) used to support the conclusion(s),
available information on the epidemiologic or experimental conditions that characterize expression of carcinogenicity (e.g., if carcinogenicity is possible only by one exposure route or only above a certain human exposure level),
a summary of potential modes of action and how they reinforce the conclusions,
indications of any susceptible populations or lifestages, when available, and
a summary of the key default options invoked when the available information is inconclusive.

To provide some measure of clarity and consistency in an otherwise freeform narrative, the weight of evidence descriptors are included in the first sentence of the narrative. Choosing a descriptor is a matter of judgment and cannot be reduced to a formula. Each descriptor may be applicable to a wide variety of potential data sets and weights of evidence. These descriptors and narratives are intended to permit sufficient flexibility to accommodate new scientific understanding and new testing methods as they are developed and accepted by the scientific community and the public. Descriptors represent points along a continuum of evidence; consequently, there are gradations and borderline cases that are clarified by the full narrative. Descriptors, as well as an introductory paragraph, are a short summary of the complete narrative that preserves the complexity that is an essential part of the hazard characterization. Users of these cancer guidelines and of the risk assessments that result from the use of these cancer guidelines should consider the entire range of information included in the narrative rather than focusing simply on the descriptor.

In borderline cases, the narrative explains the case for choosing one descriptor and discusses the arguments for considering but not choosing another. For example, between “suggestive” and “likely” or between “suggestive” and “inadequate,” the explanation clearly communicates the information needed to consider appropriately the agent's carcinogenic potential in subsequent decisions.

Multiple descriptors can be used for a single agent, for example, when carcinogenesis is dose- or route-dependent. For example, if an agent causes point-of-contact tumors by one exposure route but adequate testing is negative by another route, then the agent could be described as likely to be carcinogenic by the first route but not likely to be carcinogenic by the second. Another example is when the mode of action is sufficiently understood to conclude that a key event in tumor development would not occur below a certain dose range. In this case, the agent could be described as likely to be carcinogenic above a certain dose range but not likely to be carcinogenic below that range.
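
The idea of route- or dose-qualified descriptors can be pictured as a small record structure. The sketch below is illustrative only and is not drawn from the EPA guidelines: the class names `Descriptor` and `QualifiedConclusion`, the `summary` method, and the example routes are assumptions. It only records conclusions already reached by an assessor; it does not choose a descriptor.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Descriptor(Enum):
    """The five standard hazard descriptors named in the cancer guidelines."""
    CARCINOGENIC = "Carcinogenic to Humans"
    LIKELY = "Likely to Be Carcinogenic to Humans"
    SUGGESTIVE = "Suggestive Evidence of Carcinogenic Potential"
    INADEQUATE = "Inadequate Information to Assess Carcinogenic Potential"
    NOT_LIKELY = "Not Likely to Be Carcinogenic to Humans"


@dataclass(frozen=True)
class QualifiedConclusion:
    """One descriptor plus the conditions it applies to (hypothetical structure)."""
    descriptor: Descriptor
    route: Optional[str] = None           # e.g. "inhalation route"
    dose_condition: Optional[str] = None  # e.g. "above a certain dose range"

    def summary(self) -> str:
        # Collect whichever qualifiers were supplied, in a fixed order.
        qualifiers = [q for q in (self.route, self.dose_condition) if q]
        if not qualifiers:
            return self.descriptor.value
        return f"{self.descriptor.value} ({'; '.join(qualifiers)})"


# Hypothetical agent: point-of-contact tumors by one route, adequate
# negative testing by another route.
conclusions = [
    QualifiedConclusion(Descriptor.LIKELY, route="inhalation route"),
    QualifiedConclusion(Descriptor.NOT_LIKELY, route="oral route"),
]
for c in conclusions:
    print(c.summary())
```

Keeping the qualifiers next to the descriptor reflects the point that a descriptor is presented only in the context of the narrative and its conditions, not as a standalone label.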

Descriptors can be selected for an agent that has not been tested in a cancer bioassay if sufficient other information, e.g., toxicokinetic and mode of action information, is available to make a strong, convincing, and logical case through scientific inference. For example, if an agent is one of a well-defined class of agents that are understood to operate through a common mode of action and if that agent has the same mode of action, then in the narrative the untested agent would have the same descriptor as the class. Another example is when an untested agent's effects are understood to be caused by a human metabolite, in which case in the narrative the untested agent could have the same descriptor as the metabolite. As new testing methods are developed and used, assessments may increasingly be based on inferences from toxicokinetic and mode of action information in the absence of tumor studies in animals or humans.

When a well-studied agent produces tumors only at a point of initial contact, the descriptor generally applies only to the exposure route producing tumors unless the mode of action is relevant to other routes. The rationale for this conclusion would be explained in the narrative.

When tumors occur at a site other than the point of initial contact, the descriptor generally applies to all exposure routes that have not been adequately tested at sufficient doses. An exception occurs when there is convincing information, e.g., toxicokinetic data that absorption does not occur by another route.

When the response differs qualitatively as well as quantitatively with dose, this information should be part of the characterization of the hazard. In some cases reaching a certain dose range can be a precondition for effects to occur, as when cancer is secondary to another toxic effect that appears only above a certain dose. In other cases exposure duration can be a precondition for hazard if effects occur only after exposure is sustained for a certain duration. These considerations differ from the issues of relative absorption or potency at different dose levels because they may represent a discontinuity in a dose-response function.

When multiple bioassays are inconclusive, mode of action data are likely to hold the key to resolution of the more appropriate descriptor. When bioassays are few, further bioassays to replicate a study's results or to investigate the potential for effects in another sex, strain, or species may be useful.

When there are few pertinent data, the descriptor makes a statement about the database, for example, “Inadequate Information to Assess Carcinogenic Potential,” or a database that provides “Suggestive Evidence of Carcinogenic Potential.” With more information, the descriptor expresses a conclusion about the agent’s carcinogenic potential to humans. If the conclusion is positive, the agent could be described as “Likely to Be Carcinogenic to Humans” or, with strong evidence, “Carcinogenic to Humans.” If the conclusion is negative, the agent could be described as “Not Likely to Be Carcinogenic to Humans.”

Although the term “likely” can have a probabilistic connotation in other contexts, its use as a weight of evidence descriptor does not correspond to a quantifiable probability of whether the chemical is carcinogenic. This is because the data that support cancer assessments generally are not suitable for numerical calculations of the probability that an agent is a carcinogen. Other health agencies have expressed a comparable weight of evidence using terms such as “Reasonably Anticipated to Be a Human Carcinogen” (NTP) or “Probably Carcinogenic to Humans” (International Agency for Research on Cancer).

The following descriptors can be used as an introduction to the weight of evidence narrative. The examples presented in the discussion of the descriptors are illustrative. The examples are neither a checklist nor a limitation for the descriptor. The complete weight of evidence narrative, rather than the descriptor alone, provides the conclusions and the basis for them.

“Carcinogenic to Humans”

This descriptor indicates strong evidence of human carcinogenicity. It covers different combinations of evidence.

This descriptor is appropriate when there is convincing epidemiologic evidence of a causal association between human exposure and cancer.
Exceptionally, this descriptor may be equally appropriate with a lesser weight of epidemiologic evidence that is strengthened by other lines of evidence. It can be used when all of the following conditions are met: (a) there is strong evidence of an association between human exposure and either cancer or the key precursor events of the agent's mode of action but not enough for a causal association, and (b) there is extensive evidence of carcinogenicity in animals, and (c) the mode(s) of carcinogenic action and associated key precursor events have been identified in animals, and (d) there is strong evidence that the key precursor events that precede the cancer response in animals are anticipated to occur in humans and progress to tumors, based on available biological information. In this case, the narrative includes a summary of both the experimental and epidemiologic information on mode of action and also an indication of the relative weight that each source of information carries, e.g., based on human information, based on limited human and extensive animal experiments.

“Likely to Be Carcinogenic to Humans”

This descriptor is appropriate when the weight of the evidence is adequate to demonstrate carcinogenic potential to humans but does not reach the weight of evidence for the descriptor “Carcinogenic to Humans.” Adequate evidence consistent with this descriptor covers a broad spectrum. As stated previously, the use of the term “likely” as a weight of evidence descriptor does not correspond to a quantifiable probability. The examples below are meant to represent the broad range of data combinations that are covered by this descriptor; they are illustrative and provide neither a checklist nor a limitation for the data that might support use of this descriptor. Moreover, additional information, e.g., on mode of action, might change the choice of descriptor for the illustrated examples. Supporting data for this descriptor may include:

an agent demonstrating a plausible (but not definitively causal) association between human exposure and cancer, in most cases with some supporting biological, experimental evidence, though not necessarily carcinogenicity data from animal experiments;
an agent that has tested positive in animal experiments in more than one species, sex, strain, site, or exposure route, with or without evidence of carcinogenicity in humans;
a positive tumor study that raises additional biological concerns beyond that of a statistically significant result, for example, a high degree of malignancy, or an early age at onset;
a rare animal tumor response in a single experiment that is assumed to be relevant to humans; or
a positive tumor study that is strengthened by other lines of evidence, for example, either plausible (but not definitively causal) association between human exposure and cancer or evidence that the agent or an important metabolite causes events generally known to be associated with tumor formation (such as DNA reactivity or effects on cell growth control) likely to be related to the tumor response in this case.

“Suggestive Evidence of Carcinogenic Potential”

This descriptor of the database is appropriate when the weight of evidence is suggestive of carcinogenicity; a concern for potential carcinogenic effects in humans is raised, but the data are judged not sufficient for a stronger conclusion. This descriptor covers a spectrum of evidence associated with varying levels of concern for carcinogenicity, ranging from a positive cancer result in the only study on an agent to a single positive cancer result in an extensive database that includes negative studies in other species. Depending on the extent of the database, additional studies may or may not provide further insights. Some examples include:

a small, and possibly not statistically significant, increase in tumor incidence observed in a single animal or human study that does not reach the weight of evidence for the descriptor "Likely to Be Carcinogenic to Humans." The study generally would not be contradicted by other studies of equal quality in the same population group or experimental system (see discussions of conflicting evidence and differing results, below);
a small increase in a tumor with a high background rate in that sex and strain, when there is some but insufficient evidence that the observed tumors may be due to intrinsic factors that cause background tumors and not due to the agent being assessed. (When there is a high background rate of a specific tumor in animals of a particular sex and strain, then there may be biological factors operating independently of the agent being assessed that could be responsible for the development of the observed tumors.) In this case, the reasons for determining that the tumors are not due to the agent are explained;
evidence of a positive response in a study whose power, design, or conduct limits the ability to draw a confident conclusion (but does not make the study fatally flawed), but where the carcinogenic potential is strengthened by other lines of evidence (such as structure-activity relationships); or
a statistically significant increase at one dose only, but no significant response at the other doses and no overall trend.

“Inadequate Information to Assess Carcinogenic Potential”

This descriptor of the database is appropriate when available data are judged inadequate for applying one of the other descriptors. Additional studies generally would be expected to provide further insights. Some examples include:

little or no pertinent information;
conflicting evidence, that is, some studies provide evidence of carcinogenicity but other studies of equal quality in the same sex and strain are negative. Differing results, that is, positive results in some studies and negative results in one or more different experimental systems, do not constitute conflicting evidence, as the term is used here. Depending on the overall weight of evidence, differing results can be considered either suggestive evidence or likely evidence; or
negative results that are not sufficiently robust for the descriptor, “Not Likely to Be Carcinogenic to Humans.”

“Not Likely to Be Carcinogenic to Humans”

This descriptor is appropriate when the available data are considered robust for deciding that there is no basis for human hazard concern. In some instances, there can be positive results in experimental animals when there is strong, consistent evidence that each mode of action in experimental animals does not operate in humans. In other cases, there can be convincing evidence in both humans and animals that the agent is not carcinogenic. The judgment may be based on data such as:

animal evidence that demonstrates lack of carcinogenic effect in both sexes in well-designed and well-conducted studies in at least two appropriate animal species (in the absence of other animal or human data suggesting a potential for cancer effects),
convincing and extensive experimental evidence showing that the only carcinogenic effects observed in animals are not relevant to humans,
convincing evidence that carcinogenic effects are not likely by a particular exposure route (see Section 2.3), or
convincing evidence that carcinogenic effects are not likely below a defined dose range.

A descriptor of “not likely” applies only to the circumstances supported by the data. For example, an agent may be “Not Likely to Be Carcinogenic” by one route but not necessarily by another. In those cases that have positive animal experiment(s) but the results are judged to be not relevant to humans, the narrative discusses why the results are not relevant.

Multiple Descriptors

More than one descriptor can be used when an agent's effects differ by dose or exposure route. For example, an agent may be “Carcinogenic to Humans” by one exposure route but “Not Likely to Be Carcinogenic” by a route by which it is not absorbed. Also, an agent could be “Likely to Be Carcinogenic” above a specified dose but “Not Likely to Be Carcinogenic” below that dose because a key event in tumor formation does not occur below that dose (EPA 2005b, Pp 2-49 to 2-58).

A FRAMEWORK FOR ASSESSING HEALTH RISKS OF ENVIRONMENTAL EXPOSURES TO CHILDREN

The WOE approach requires a critical evaluation (expert judgment) of all available data for consistency and biological plausibility. Criteria for this assessment are not presented here; rather, considerations important for the WOE are described. The key to WOE conclusions is the provision of a clear justification for decisions. Finally, the extent of the database is summarized, and assumptions made in the assessment are explicitly detailed. Further details about EPA’s WOE approach can be found in the Methods for Derivation of Inhalation Reference Concentrations and Application of Inhalation Dosimetry (U.S. EPA, 1994), Guidelines for Carcinogen Risk Assessment (U.S. EPA, 2005b), and Supplemental Guidance for Assessing Cancer Susceptibility from Early-Life Exposure to Carcinogens (U.S. EPA, 2005c). A Review of the Reference Dose and Reference Concentration Processes (U.S. EPA, 2002b, Section 4.3.2.1.) and Determination of the Appropriate FQPA Safety Factor(s) in Tolerance Assessment (U.S. EPA, 2002c, Section III) provide additional detail on the WOE.

Key themes for the consideration of toxicity data in a WOE assessment, as adapted from Gray et al. (2001), are shown in Figure 4-5. This figure focuses on judging animal studies within a WOE assessment. However, if adequate human studies are available they would be given more weight. The process for evaluating these considerations is described in the following subsections. In this process, the quality of potentially relevant studies is judged, modifiers and interactions are detailed, outcomes across species are compared, TK and TD data are examined and weighed for comparisons across species, and the uncertainties and data gaps are determined. SARs with other chemicals or chemical classes are explored to determine the extent to which these data can inform the assessment via an MOA discussion or reduce uncertainties.

GUIDELINES FOR NEUROTOXICITY RISK ASSESSMENT

The interpretation of data as indicative of a potential neurotoxic effect involves the evaluation of the validity of the database. This approach and these terms have been adapted from the literature on human psychological testing (Sette, 1987; Sette and MacPhail, 1992), where they have long been used to evaluate the level of confidence in different measures of intelligence or other abilities, aptitudes, or feelings. There are four principal questions that should be addressed: whether the effects result from exposure (content validity); whether the effects are adverse or toxicologically significant (construct validity); whether there are correlative measures among behavioral, physiological, neurochemical, and morphological endpoints (concurrent validity); and whether the effects are predictive of what will happen under various conditions (predictive validity). Addressing these issues can provide a useful framework for evaluating either human or animal studies or the weight of evidence for a chemical (Sette, 1987; Sette and MacPhail, 1992). The next sections indicate the extent to which chemically induced changes can be interpreted as providing evidence of neurotoxicity.

The qualitative characterization of neurotoxic hazard can be based on either human or animal data (Anger, 1984; Reiter, 1987; U.S. EPA, 1994). Such data can result from accidental, inappropriate, or controlled experimental exposures. This section describes many of the general and some of the specific characteristics of human studies and reports of neurotoxicity. It then describes some features of animal studies of neuroanatomical, neurochemical, neurophysiological, and behavioral effects relevant to risk assessment. The process of characterizing the sufficiency or insufficiency of neurotoxic effects for risk assessment is described in section 3.3. Additional sources of information relevant to hazard characterization, such as comparisons of molecular structure among compounds and in vitro screening methods, are also discussed.

FIGURE 4-5 Conceptual view of a weight of evidence (WOE) assessment. This figure illustrates the critical considerations within a WOE assessment of toxicity data. Rigor is the degree of proper conduct and analysis of a study; greater weight is generally given to more rigorous studies. Statistical Power is the ability of a study to detect effects of a given magnitude. Corroboration means that specific effects are replicated in similar studies, similar effects are observed under varied conditions, and/or similar effects are observed in multiple laboratories. Reproducibility means that an effect is observed in multiple species by various routes of exposure. Relevance to Humans means that similar effects are observed in humans or in a species taxonomically related to humans or at doses similar to those expected in humans. Plausibility to Humans is the determination of whether a similar metabolism, mechanisms of damage and repair, and molecular target of response could be expected to occur in humans, based on an evaluation of the biologic mechanism of a toxic response in animals. Database Consistency is the extent to which all of the data are similar in outcome and dose (exposure-response) and are operating under a single biologically plausible assumption (mode of action). Source: Adapted from Gray et al. 2001, EPA 2006, Pp 29-30.

The hazard characterization should:

Identify strengths and limitations of the database:
- Epidemiological studies (case reports, cross-sectional, case-control, cohort, or human laboratory exposure studies);
- Animal studies (including structural or neuropathological, neurochemical, neurophysiological, behavioral or neurological, or developmental endpoints).

Evaluate the validity of the database:
- Content validity (effects result from exposure);
- Construct validity (effects are adverse or toxicologically significant);
- Concurrent validity (correlative measures among behavioral, physiological, neurochemical, or morphological endpoints);
- Predictive validity (effects are predictive of what will happen under various conditions).

Identify and describe key toxicological studies.
Describe the type of effects:
- Structural (neuroanatomical alterations);
- Functional (neurochemical, neurophysiological, behavioral alterations).

Describe the nature of the effects (irreversible, reversible, transient, progressive, delayed, residual, or latent).
Describe how much is known about how (through what biological mechanism) the chemical produces adverse effects.
Discuss other health endpoints of concern.
Comment on any nonpositive data in humans or animals.
Discuss the dose-response data (epidemiological or animal) available for further dose-response analysis.
Discuss the route, level, timing, and duration of exposure in studies demonstrating neurotoxicity as compared to expected human exposures.
Summarize the hazard characterization:
- Confidence in conclusions;
- Alternative conclusions also supported by the data;
- Significant data gaps; and
- Highlights of major assumptions.

REFERENCES

American Thoracic Society. 1985. Guidelines as to what constitutes an adverse respiratory health effect, with special reference to epidemiologic studies of air pollution. Am. Rev. Respir. Dis. 131(4):666-668.

Anger, W.K. 1984. Neurobehavioral testing of chemicals: Impact on recommended standards. Neurobehav. Toxicol. Teratol. 6(2):147-153.

EPA (U.S. Environmental Protection Agency). 1986. Guidelines for Mutagenicity Risk Assessment. U.S. Environmental Protection Agency [online]. Available: http://www.epa.gov/osa/mmoaframework/pdfs/MUTAGEN2.PDF [accessed Nov. 19, 2010].

EPA (U.S. Environmental Protection Agency). 1991. Guidelines for Developmental Toxicity Risk Assessment. U.S. Environmental Protection Agency [online]. Available: http://cfpub.epa.gov/ncea/cfm/recordisplay.cfm?deid=23162 [accessed Nov. 19, 2010].

EPA (U.S. Environmental Protection Agency). 1994. Methods for Derivation of Inhalation Reference Concentrations and Application of Inhalation Dosimetry. U.S. Environmental Protection Agency [online]. Available: http://www.epa.gov/raf/publications/methods-derivation-inhalation-ref.htm [accessed Nov. 19, 2010].

EPA (U.S. Environmental Protection Agency). 1998. Guidelines for Neurotoxicity Risk Assessment. U.S. Environmental Protection Agency [online]. Available: http://www.epa.gov/raf/publications/pdfs/NEUROTOX.PDF [accessed Dec. 16, 2010].

EPA (U.S. Environmental Protection Agency). 1999a. Guidelines for Carcinogen Risk Assessment [Review Draft]. NCEA-F-0644. Risk Assessment Forum, U.S. Environmental Protection Agency, Washington, DC. July 1999 [online]. Available: http://cfpub.epa.gov/ncea/cfm/recordisplay.cfm?deid=54932#Download [accessed Mar. 17, 2011].

EPA (U.S. Environmental Protection Agency). 2002b. A Review of the Reference Dose and Reference Concentration Processes. U.S. Environmental Protection Agency [online]. Available: http://www.epa.gov/iris/RFD_FINAL1.pdf [accessed Nov. 19, 2010].

EPA (U.S. Environmental Protection Agency). 2002c. Determination of the Appropriate FQPA Safety Factor(s) in Tolerance Assessment. Office of Pesticide Programs, U.S. Environmental Protection Agency, Washington, DC. February 28, 2002 [online]. Available: http://www.epa.gov/oppfead1/trac/science/determ.pdf [accessed Mar. 17, 2011].

EPA (U.S. Environmental Protection Agency). 2005b. Guidelines for Carcinogen Risk Assessment. U.S. Environmental Protection Agency [online]. Available: http://www.epa.gov/osa/mmoaframework/pdfs/CANCER-GUIDELINES-FINAL-3-25-05%5B1%5D.pdf [accessed Nov. 19, 2010].

EPA (U.S. Environmental Protection Agency). 2005c. Supplemental Guidance for Assessing Cancer Susceptibility from Early-Life Exposure to Carcinogens. EPA/630/R-03/003F. Risk Assessment Forum, U.S. Environmental Protection Agency, Washington, DC. March 2005 [online]. Available: http://www.epa.gov/ttn/atw/childrens_supplement_final.pdf [accessed Mar. 17, 2011].

EPA (U.S. Environmental Protection Agency). 2006. A Framework for Assessing Health Risks of Environmental Exposures to Children. U.S. Environmental Protection Agency [online]. Available: http://cfpub.epa.gov/ncea/cfm/recordisplay.cfm?deid=158363 [accessed Nov. 19, 2010].

Gray, G.M., S.I. Baskin, G. Charnley, J.T. Cohen, L.S. Gold, N.I. Kerkvliet, H.M. Koenig, S.C. Lewis, R.M. McClain, L.R. Rhomberg, J.W. Snyder, and L.B. Weekley. 2001. The Annapolis accords on the use of toxicology in risk assessment and decision-making: An Annapolis Center Workshop report. Toxicol. Mech. Methods 11(3):225-231.

Perlin, S.A., and C. McCormack. 1988. Using weight-of-evidence classification schemes in the assessment of non-cancer health risks. Pp. 482-486 in Proceedings of the 5th National Conference on Hazardous Wastes and Hazardous Materials (HWHM '88), April 19-21, Las Vegas, NV. Springfield, MD: Hazardous Materials Control Research Institute.

Reiter, L.W. 1987. Neurotoxicology in regulation and risk assessment. Dev. Pharmacol. Ther. 10(5):354-368.

Russell, L.B., C.S. Aaron, F. de Serres, W.M. Generoso, K.L. Kannan, M. Shelby, J. Springer, and P. Voytek. 1984. A report of the U.S. Environmental Protection Agency Gene-Tox Program. Evaluation of mutagenicity assays for purposes of genetic risk assessment. Mutat. Res. 134(2-3):143-157.

Sette, W.F. 1987. Complexity of neurotoxicological assessment. Neurotoxicol. Teratol. 9(6):411-416.

Sette, W.F., and R.C. MacPhail. 1992. Qualitative and quantitative issues in assessment of neurotoxic effects. Pp. 345-361 in Neurotoxicology, 2nd Ed., H.A. Tilson and C. Mitchell, eds. Target Organ Toxicity Series. New York: Raven Press.

SOT (Society of Toxicology). 1982. Animal data in hazard evaluation: Paths and pitfalls. Task Force of Past Presidents. Fundam. Appl. Toxicol. 2(3):101-107.