

7
Models, Methods, and Data

Introduction

Health risk assessment is a multifaceted process that relies on an assortment of methods, data, and models. The overall accuracy of a risk assessment hinges on the validity of the various methods and models chosen, which in turn are governed by the scope and quality of data. The degree of confidence that one can place in a risk assessment depends on the reliability of the models chosen and their input parameters (i.e., variables) and on how well the boundaries of uncertainty have been quantified for the input parameters, for the models as a whole, and for the entire risk-assessment process.

Quantitative assessment of data quality, verification of method, and validation of model performance are paramount for securing confidence in their use in risk assessment. Before a data base is used, the validity of its use must be established for its intended application. Such validation generally encompasses both the characterization and documentation of data quality and the procedures used to develop the data. Some characteristics of data quality are overall robustness, the scope of coverage, spatial and temporal representativeness, and the quality-control and quality-assurance protocols implemented during data collection. More specific considerations include the definition and display of the accuracy and precision of measurements, the treatment of missing information, and the identification and analysis of outliers. Those and similar issues are critical in delineating the scope and limitations of a data set for an intended application.

The performance of methods and models, like that of data bases, must be characterized and verified to establish their credibility. Evaluation and validation procedures for a model might include sensitivity testing to identify the parameters having the greatest influence on the output values, as well as assessment of its accuracy, precision, and predictive power. Validation of a model also requires an appropriate data base.

This chapter discusses the evaluation and validation of data and models used in risk assessment. In cases where the assessment of performance or quality has been insufficient, research recommendations are made. Although this chapter considers validation issues sequentially, according to each of the stages in the (modified) Red Book paradigm, our goal here is to make the assessment of data and model quality an iterative, interactive component of the entire risk-assessment and risk-characterization process.

Emission Characterization

As described in Chapter 3, emissions are characterized on the basis of emission factors, material balance, engineering calculations, established Environmental Protection Agency (EPA) protocols, and measurement. In each case, the characterization takes one of three structural forms: a linearly additive process (i.e., emissions = feedstock − [product + accumulations]), a multiplicative model (i.e., emissions = [emission factor] × [process rate]), or an exponential relationship (e.g., emissions = intercept + [emission factor] × [measurement]^exponent).

The additive form is based on the mass-balance concept. An estimate is made by measuring the feedstock and product to determine an equipment-specific or process-specific transfer coefficient, which is then used to estimate emissions to the atmosphere. The measurements available for the additive form are often not sufficiently precise and accurate to yield complete information on inputs and outputs (NRC, 1990a).
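The three structural forms can be sketched as simple functions. This is a minimal illustration; all quantities and coefficients below are invented for the example and are not EPA values.

```python
def additive_emissions(feedstock_lb, product_lb, accumulation_lb):
    """Mass-balance (additive) form: emissions are the unaccounted-for mass."""
    return feedstock_lb - (product_lb + accumulation_lb)

def multiplicative_emissions(emission_factor, process_rate):
    """Emission-factor (multiplicative) form: factor times activity level."""
    return emission_factor * process_rate

def exponential_emissions(intercept, emission_factor, measurement, exponent):
    """Exponential form: intercept plus factor times measurement to a power."""
    return intercept + emission_factor * measurement ** exponent

# Hypothetical plant: 10,000 lb of feedstock yields 9,950 lb of product
# with 30 lb accumulating in equipment, so 20 lb is inferred to be emitted.
mass_balance_estimate = additive_emissions(10_000.0, 9_950.0, 30.0)
```

As the mass-balance sketch suggests, the inferred 20 lb rides on the difference of much larger measured quantities, so even a 1% measurement error (100 lb here) swamps the estimate, which is the point of the ethylene example discussed next.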
For example, an NRC committee (NRC, 1990a) considered a plant that produced 5 million pounds of ethylene per day and used more than 200 monitoring points to report production with a measurement accuracy of 1%, equivalent to 50,000 lb of ethylene per day. The uncertainty in this estimate (50,000 lb) greatly exceeded a separate estimate of emissions, 191 lb, which was calculated by the plant and was confirmed by monitoring of the emission points. Thus, despite the apparently good precision of the production estimates (within 1%), the additive method was not reliable. This appears to be generally true for complicated processes or multiple processing steps.

The other forms are based on exponential and multiplicative models. Each may be deterministic or stochastic. For example, emissions from a well-defined sample of similar sources may be tested to develop an emission factor that is meant to be representative of the whole population of sources. A general difficulty with fits that use these functional forms (linear or one of several nonlinear forms) is that the choice of form may be critical but hard to validate. In addition, it must be assumed that data from the sources tested apply directly to the sources being characterized—that process design and the management and maintenance approaches of the organizations that run them are the same in all cases.

An example of an exponential form of an emission calculation is shown in Figure 7-1, which shows the correlation between screening value (the measurement) and leak rate (the emission rate) for fugitive emissions from a valve. The screening value is determined by measuring the hydrocarbons emitted by a piece of equipment (in this case, a valve in gas service) with an instrument like an OVA (organic-vapor analyzer). The leak rate (i.e., emission) is then determined by reading the value on the y axis corresponding to that screening value. Note that the plot is on a log-log scale, so a "3" on the x axis (a 1,000-ppm screening value) corresponds to a "−3.4" on the y axis, or roughly 0.0004 lb/hr for each valve in gas service at that screening value. The observations here are based on an analysis conducted for 24 synthetic organic chemical manufacturing industry (SOCMI) units representing a cross-section of this industry (EPA, 1981a). As part of this analysis, a six-unit maintenance study (EPA, 1981a) was used to determine the impact of equipment monitoring and maintenance using an OVA instrument on emission reduction. The equation derived for valve emissions in gas service explains only 44% (the square of the correlation coefficient) of the variance in the points shown in Figure 7-1.

FIGURE 7-1 Log10 leak rate vs. log10 OVA reading for valves in gas service. SOURCE: EPA, 1981a.

Similar results were obtained from other possible emission points. The facilities in this SOCMI study could reduce the estimate of their emissions by 29-99% by determining plant-specific emission factors, indicating the difficulty of using industry-wide averages to represent specific plant behavior.

The multiplicative form improves on the emission-factor approach in that it incorporates more features of the process, attempting to accommodate the types of equipment being used, the physical properties of the chemical, and the activity of the equipment as a whole. The deterministic form of the multiplicative model is based on the chemical and physical laws that determine the emission rate. The variables measured—vapor pressure, molecular weight, temperature, etc.—are chemical and physical properties that are related to the emission rate. The multiplicative form thus provides some scientific basis for the estimate beyond simple curve-fitting. However, it has difficulties, because some of the properties are not constant. For example, the ambient air temperature, one factor in determining the emission rate, can vary quite widely within a day. The average temperature for a given period, such as a month, is used for ease of calculation, but this practice introduces some error. EPA might want to consider a more detailed analysis in which the emissions that occur during the period are stratified into groups with smaller variations in variables such as ambient temperature. The emissions in the strata could be estimated and weighted sums calculated to provide a better estimate.

Probably the most accurate procedure is to use none of those "forms" to determine emissions, but rather to sample stack and vent emissions at each source. However, such sampling can be quite expensive, and the costs could overburden owners of small sources.
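The screening-value correlation in Figure 7-1 is a straight line in log-log space, i.e., log10(leak rate) = b0 + b1 · log10(screening value). The sketch below uses hypothetical coefficients chosen so that a 1,000-ppm screening value maps to 10^−3.4 lb/hr, as in the figure; the actual EPA fit and its coefficients are not reproduced here.

```python
import math

B0 = -6.4  # hypothetical intercept (log10 of lb/hr)
B1 = 1.0   # hypothetical slope

def leak_rate_lb_per_hr(screening_ppm):
    """Convert an OVA screening value (ppm) to a leak rate via a log-log fit."""
    return 10.0 ** (B0 + B1 * math.log10(screening_ppm))

rate = leak_rate_lb_per_hr(1_000.0)  # 10**-3.4, roughly 4e-4 lb/hr per valve
```

Because the fit explains only about 44% of the variance, a prediction for any single valve is highly uncertain even if the fitted line is unbiased on average.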
Apart from costs, the primary difficulty with this procedure is that it yields an estimate for one site on one occasion; emissions could later change because of a variety of factors. An alternative to testing is to estimate emissions from monitoring data. Continuous emission monitors (CEMs), which are available for a small number of chemicals, are placed in stacks or near fugitive-emission points to measure the concentration of a chemical being released; concentrations can then be converted to amounts. However, CEMs can be expensive and difficult to maintain, and they may produce incomplete or inaccurate measurements.

When such testing is conducted, however, it may show that other kinds of estimates are seriously in error. For example, a study (Amoco/EPA, 1992) compared emissions estimated primarily from emission factors with those determined during testing. The overall measured estimate of emissions was more than twice as high as the TRI (Toxics Release Inventory) estimate, for a variety of reasons, including identification of new sources, overestimation or underestimation of the importance of some sources, and the lack of a requirement to report some source emissions under a particular regulation.
Evaluation of EPA Practice

EPA has worked diligently to help members of the public who are required to provide emission estimates for regulatory purposes. This 20-year effort has provided documents that are used to estimate air-pollutant emissions throughout the world. However, in some cases, EPA has had to provide emission-estimation factors based on very little information about the process involved; it is then difficult to check the assumption that the process for which the calculation is being used is similar to the process that was tested in the development of the emission factor.

There are two basic difficulties with the way EPA applies its emission-estimation techniques. First, most estimates are made by using the emission factors or by fitting the linear or exponential forms; as discussed previously, the accuracy of emission estimates made with these techniques might not be high. Second, the information is generated in such a way that only point estimates are presented. It is clear from the earlier discussion that there can be substantial uncertainty in the estimates. EPA has extensive files on how the emission factors were determined, and this information presumably contains enough points to generate a distribution of emissions rather than just a point estimate. Yet EPA provides only qualitative ratings of the accuracy of each emission method, and the ratings are based not on the variance in the estimate but just on the number of emission points used to generate the data. If there are enough points to generate an emission factor, it is possible to estimate the distribution of emission factors from which an estimate can be chosen to solve a particular exposure-risk estimation problem. However, the emission factors are given only a "grade" from A (best) to E, reflecting the quality and amount of data on which the estimates are based.
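If the plant-level data behind a factor were retained, a distribution rather than a letter grade could be reported. A sketch with invented measurements (lb emitted per ton of throughput at each tested plant):

```python
import statistics

# Hypothetical plant-level emission factors behind one published factor.
plant_factors = [0.8, 1.1, 0.9, 1.6, 0.7, 1.3, 1.0, 2.2, 0.9, 1.2]

mean_factor = statistics.mean(plant_factors)    # central estimate
stdev_factor = statistics.stdev(plant_factors)  # spread across plants
median_factor = statistics.median(plant_factors)
# Reporting (mean, stdev) or empirical percentiles conveys far more about
# the reliability of the factor than a grade of "A" through "E".
```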
An emission factor based on 10 or more plants would likely get an "A" grade, whereas a factor based on a single observation of questionable quality, or one extrapolated from another factor for a similar process, would probably get a D or E. The grades are subjective and do not consider the variance in the data used to calculate the factors. According to EPA (1988e), the grades should "be used only as approximations, to infer error bounds or confidence intervals about each emission factor. At most, a [grade] should be considered an indicator of the accuracy and precision of a given factor used to estimate emissions from a large number of sources." The uncertainty in the estimates is such that EPA is not comfortable with the A-E system and is developing a new qualitative system to indicate uncertainty. EPA is attempting to generate estimation factors for hazardous air pollutants industry by industry, but it is still hesitant to ascribe any sort of uncertainty to emission factors.

A single disruption in the operation of a plant can increase the release rate for some interval (an hour or a day). An extreme example is the dioxin release from a manufacturing plant in Seveso, Italy. Such disruptions are not incorporated into any of the emission characterizations, except in the few cases where emission monitoring is available. Even in those cases, emissions might be so high that they exceed the maximum reading of a monitor and thereby lead to just a lower bound (if this problem is recognized) or even to a serious underestimate of the actual emission. Furthermore, the frequency and duration of such episodes are unpredictable.

Therefore, EPA should also attempt to make some sort of quantitative estimates of the variability of measured emissions among sources within a category and of the uncertainty in its overall emission estimates for individual sources and the source category as a whole. This issue is discussed in more depth in Chapter 10, but the analysis could involve four kinds of circumstances, as appropriate for a particular source type: routine operation, regular maintenance, upsets and breakdowns, and rare catastrophic failures. EPA could also note the implications of the dynamics of causation of different effects for emission estimation, and the resulting need for estimates of exposure and exposure variability over different averaging times.

The itemization of emissions by chemical constituent also raises problems. Emission-characterization methods often provide only the total amount of VOCs (volatile organic compounds) emitted; the amounts of particular compounds (benzene, toluene, xylene, etc.) within these VOC emissions are often not individually reported. Without emission data on particular compounds, it is impossible to provide the information needed for exposure modeling in the risk-assessment process.

EPA does not appear to be making major strides toward improving the methods used to evaluate emissions. Although EPA is making extensive efforts to distribute the emission factors it has generated, the committee has found insufficient effort either to evaluate the accuracy of the underlying methods used to derive the emission estimates or to portray the uncertainty in the emission factors.
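The four kinds of circumstances suggested above can be combined into an annual estimate by weighting a rate for each operating mode by its duration. All rates and durations below are invented for illustration:

```python
# mode: (hours per year, emission rate in lb/hr) -- hypothetical values
modes = {
    "routine operation":     (8_400.0,     0.5),
    "regular maintenance":   (300.0,       2.0),
    "upsets and breakdowns": (59.0,       25.0),
    "catastrophic failure":  (1.0,     1_000.0),
}

by_mode = {name: hours * rate for name, (hours, rate) in modes.items()}
annual_lb = sum(by_mode.values())
# In this invented case, brief upsets plus a single hour of catastrophic
# failure contribute about a third of the annual total -- exactly the
# contribution that routine-only characterizations miss.
```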
The primary exception is a joint effort of the Chemical Manufacturers Association (CMA) and EPA on fugitive emissions, called the Plant Organization Software System for Emission Estimation, or POSSEE (CMA, 1989). In this case, companies are testing fugitive emissions within plants and collecting data on chemical and physical variables to derive emission estimates based on deterministic models (which use physical and chemical properties), rather than stochastic models. There have also been efforts to increase the scientific justification of estimates of emissions from storage tanks: the American Petroleum Institute has developed data that have been used to develop the estimation method shown in the multiplicative form described above.

The question then arises as to how to approach emission estimates in exposure assessments and risk assessments. The uncertainty in the mass-balance approach (additive form) can be so large that its use should be discouraged for any purpose other than very general screening. It is unlikely that an emission estimate derived with this method would be appropriate for risk assessment. The linear emission-factor approach could be used as a general screening tool in an exposure assessment. As indicated by EPA in response to a question from this committee:

"While emission factor-based estimates can be useful in providing a general picture of emissions across an entire industrial category, use of such factors to provide inputs to a site-specific risk assessment may introduce a great deal of uncertainty into that assessment."

If such an approach is used for an entire industrial category, then at least the uncertainty of each emission factor should be determined. If there is enough information to derive an emission factor, then a probability distribution could be calculated. There may then be disagreement about where on the probability distribution the emission estimate should be chosen, but it is better to make the choice explicitly, as discussed in Chapter 9. The same is true for emissions estimated with the exponential and multiplicative approaches. EPA should include a probability distribution in all its emission estimates.

One way to determine the uncertainty in an emission estimate more easily would be to require each person submitting an emission estimate (for SARA 313 requirements, permitting, etc.) to include an evaluation of the uncertainty in the estimate. EPA could then evaluate the uncertainty in the estimation methods to determine whether the estimation was done properly. Although that might slightly increase the costs of developing submissions, the organization submitting the estimate might benefit from the results. Small sources unable to afford such analysis could instead define a range that is consistent with known or readily determined factors in their operation (e.g., for a dry cleaner, the pounds of clothes cleaned per week and the gallons of solvent purchased each month).

EPA is reviewing, revising, and developing emission-estimation methods for sources of the 189 listed hazardous air pollutants. It is focusing on adding data, rather than evaluating its basic approach—the use of a descriptive model, instead of a model based on processes, for emission estimation.
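Choosing a point on a distribution explicitly, as recommended above, can be sketched with a small Monte Carlo exercise: draw emission factors from an assumed distribution, propagate them through the estimate, and report named percentiles. The lognormal parameters and process rate below are invented for illustration.

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

PROCESS_RATE = 1_000.0  # tons/yr, assumed known exactly for this sketch

def sample_annual_emissions():
    factor = random.lognormvariate(0.0, 0.5)  # hypothetical lb/ton distribution
    return factor * PROCESS_RATE

draws = sorted(sample_annual_emissions() for _ in range(10_000))
median_est = draws[len(draws) // 2]      # a "best estimate" choice
p95_est = draws[int(0.95 * len(draws))]  # a deliberately conservative choice
```

Reporting both values (and the assumptions behind the distribution) makes the conservatism of a risk estimate visible instead of implicit.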
It appears from the examples given above that the uncertainties in emissions can dominate an exposure assessment and that a concerted effort to improve emission estimation could substantially reduce the uncertainty in many risk estimates. Combined industry efforts to improve the techniques used to estimate fugitive emissions on the basis of physical and chemical properties (not just curve-fitting) should be encouraged.

Exposure Assessment

Once an emission characterization is developed, it becomes one of the inputs into an air-quality model to determine the amount of a pollutant in ambient air at a given location. A population-exposure model is then used to determine how much of a pollutant reaches people at that location.

Population

The size of the population that might be exposed to an emission must be determined. Population data have been collected, published, and scrutinized for centuries. Many such data refer to entire populations or subpopulations, so questions of representation and statistical aspects of sampling do not arise in their usual form. Even where sampling is used, a large background of technique and experience allows complex estimation and other kinds of modeling to proceed without the large uncertainties inherent in, for example, extrapolation from high to low doses of toxic agents or from rodents to humans.

Population data are almost always affected to some degree by nonsampling error (bias), but this is well categorized and understood, and it is not a serious problem in the context of risk assessment. For example, terminal-digit preference (e.g., a tendency to report ages that end in zero or five) has been minimal since the attainment of nearly universal literacy and especially since the adoption of birth certification. Attainment of advanced ages (i.e., over 80 years) is still overstated, but this is not quantitatively serious in age estimation for purposes of risk assessment (because EPA still assumes that 70 years is the upper-bound value of the length of a lifetime). Population undercounts in the U.S. census of 1990 averaged about 2.1% and were substantially higher for some subgroups, perhaps up to 30%; however, even 30% uncertainty is smaller than many other sources of error that are encountered in risk assessment. The largest proportionate claim of uncertainty seems to be in the number of homeless persons in the United States; even there, the estimated uncertainty is less than a factor of 10.

Estimation of characteristics in groups or subgroups not examined directly is subject to additional uncertainty. For example, the 1992 population is not directly counted, but standard techniques are used to extrapolate from the census of 1990, which was a nearly complete count of the population.
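Such extrapolation rests on the demographic balancing equation: the new total is the census count plus births, minus deaths, plus net migration over the interval. A sketch with invented county-level counts:

```python
def extrapolate_population(census_count, births, deaths, net_migration):
    """Carry a census count forward: P(t1) = P(t0) + B - D + NM."""
    return census_count + births - deaths + net_migration

# Hypothetical county, 1990 census carried forward to a 1992 estimate:
pop_1992 = extrapolate_population(100_000, births=2_900, deaths=1_800,
                                  net_migration=-400)
```

The births and deaths components are well recorded; as noted below, the migration term is the weak link for states and smaller areas.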
Investigators have found earlier years' estimates to be generally quite accurate, whether the extrapolations were strictly mathematical (e.g., based on linear extrapolation) or demographic (based on accounting for the addition of 3 years between 1990 and 1993, with adjustments for births, deaths, and net migration). The problems are greater for states and smaller areas, because data on migration (including internal migration) are not generally available. Error tends to increase as subgroups get smaller, partly because statistical variability increases (i.e., small sample size leads to less precision in the estimate of the central tendency with any distributed measurement), but also because individual small segments are not as well characterized and understood as larger aggregates and because population data are generally collected according to a single nationwide protocol that allows for little deviation to accommodate special problems. The committee is comfortable with using published population data for nearly all population characteristics and subgroups. Where adjustment to reduce errors is feasible, it should be used; but in the overall context of risk assessment, error in population assessment contributes little to uncertainty.

In some cases, a research study must define and identify its own population without help from official censuses and surveys. An example is a long-term follow-up study of workers employed in a specific manufacturing plant. When such studies are done by skilled epidemiologists, total counts, ages, and other demographic items tend to be accurate to within a factor of 2 or 3. The largest uncertainties are likely to be in the estimation of exposure to some toxic agent; these are often dealt with by the use of rough categories (high, medium, and low exposure) or surrogate measures (e.g., years employed in a plant, rather than magnitude of exposure). Errors in such work are of great concern, but they tend to be peculiar to each study and hence lead to study-specific remedies in design, performance, or analysis. They tend to be smaller than other kinds of uncertainties, but can still be of concern if a putative effect is also small.

As indicated, population data derived from a census and fortified with estimation methods are regarded as accurate and valid, and the uncertainties they introduce into risk assessment are relatively small. There is a need, however, for information on additional population characteristics that are not included in the census. In particular, there is a paucity of activity-pattern information, and population-exposure and personal-exposure models, which use people's activities to estimate exposure to chemicals in air, have not been adequately tested or validated. Only a few small efforts have been undertaken to develop such a data base, namely, EPA's Total Exposure Assessment Methodology (TEAM) program and the California EPA's State Activity Pattern Study. Those programs have acquired information about people's activities that cause the emission of air pollutants or place people in microenvironments containing air pollutants that potentially lead to exposure.
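At its core, the calculation such activity-pattern models perform is a time-weighted average of the concentrations a person encounters in each microenvironment. All concentrations and durations below are invented for illustration:

```python
# (microenvironment, hours spent, pollutant concentration in ug/m3) -- hypothetical
day = [
    ("home",      14.0,  5.0),
    ("commute",    1.5, 40.0),
    ("workplace",  8.0, 12.0),
    ("outdoors",   0.5, 20.0),
]

total_hours = sum(hours for _, hours, _ in day)
time_weighted_avg = sum(hours * conc for _, hours, conc in day) / total_hours
```

Validating such a model requires exactly the kind of activity-pattern data base discussed here: measured personal exposures against which these reconstructed averages can be compared.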
There is a need to develop a national data base on activity patterns that can be used to validate models that estimate personal exposure to airborne toxic chemicals. Accurately described activity patterns, coupled with demographic characteristics (e.g., socioeconomic status), can be used for making a risk assessment and for assessing the environmental equity of risk across socioeconomic groups and races. When exposure-characterization models are developed for use in risk assessment, the bias and uncertainty that they yield in the calculation of exposure estimates should be clearly defined and stated, regardless of whether activity patterns are included. The choice of an appropriate model from an array of possibilities should then be based on (but not necessarily limited to) its quantitative measure of performance, and the rationale for the choice should be included with a statement of the criteria for its selection.

Air-Quality Model Evaluation

Air-quality models are powerful tools for relating pollutant emissions to ambient air quality. Most air-quality models used in assessing exposure to toxic air pollutants have been extensively evaluated with specific data sets, and their underlying mathematical formulations have been critically reviewed. Relative to some of the other models used in risk assessment of air pollutants, air-quality models probably enjoy the longest history of model evaluation, refinement, and re-evaluation. For example, the original Gaussian-plume models were formulated and tested in the 1950s. That does not mean, however, that model evaluation is finished or that it can be dismissed in assessing air-pollutant exposure; in fact, previous studies have shown the benefits of model evaluation in every application.

Evaluation of air-quality models and other components of air-pollutant risk assessment is intended to determine whether a model is accurate enough to provide the details required in a given application and to provide confidence in the results. In air-quality modeling, that is particularly important. A Gaussian-plume model, when used with the input data generally available, might not correctly predict where maximal concentrations will be realized (e.g., because winds at the nearest station, such as an airport, might differ in direction from winds near the source of interest), but it should provide a reasonable estimate of the distribution of pollutant concentrations around the site. That might be sufficient for some applications, but not for others. Model evaluation can also add insight as to whether a tool is "conservative" or the opposite, and it can provide a quantitative estimate of uncertainty.

Of particular concern are the more demanding applications of models, such as in areas of complex terrain (e.g., hills, valleys, mountains, and over water), when deposition is important, and when atmospheric transformation occurs. As discussed below, it is difficult enough to use models in the simple situations for which they were specifically designed. One should always try to ascertain the level of accuracy that can be expected from a given model in a given application. Sufficient studies have been performed on most air-quality models to address that question.
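The Gaussian-plume formulation being evaluated here can be sketched in a few lines: a point source's concentration falls off as Gaussian profiles in the crosswind and vertical directions, with an image term reflecting the plume off the ground. The dispersion parameters are supplied directly (hypothetical values); operational models derive them from atmospheric stability and downwind distance.

```python
import math

def plume_concentration(q, u, sigma_y, sigma_z, y, z, h):
    """Gaussian-plume concentration (g/m3) for emission rate q (g/s), wind
    speed u (m/s), dispersion parameters sigma_y and sigma_z (m), crosswind
    offset y (m), receptor height z (m), and effective stack height h (m)."""
    lateral = math.exp(-y**2 / (2.0 * sigma_y**2))
    vertical = (math.exp(-(z - h)**2 / (2.0 * sigma_z**2))
                + math.exp(-(z + h)**2 / (2.0 * sigma_z**2)))  # ground reflection
    return q / (2.0 * math.pi * u * sigma_y * sigma_z) * lateral * vertical

# Hypothetical case: 100 g/s source, 5 m/s wind, 50 m effective stack height,
# receptor at ground level on the plume centerline.
c_centerline = plume_concentration(100.0, 5.0, sigma_y=80.0, sigma_z=40.0,
                                   y=0.0, z=0.0, h=50.0)
```

Even this simple form shows why input quality dominates: the answer is highly sensitive to sigma_y, sigma_z, and wind direction, which is why misplaced meteorological inputs can shift the predicted location of maximal concentrations.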
Zannetti (1990) reviews evaluations of many air-quality models, including Gaussian-plume models, and evaluation procedures have recently been reviewed for photochemical air-quality models (NRC, 1991a). Similar procedures are applicable to other models. In essence, the models should be pushed to their limits, to define the range in which potential errors in the models themselves or in their inputs still lead to acceptable model performance, and to identify compensatory errors in the models and their inputs (e.g., meteorology, emissions, population distributions, and routes of exposure). That should lead to a quantitative assessment of model uncertainties and key weaknesses. As pointed out in the NRC (1991a) report, model evaluation includes evaluation of input data. The greatest limitation in many cases is in the availability and integrity of the input data; for the most part, many models can give acceptable results when good-quality input data are available.

A key motivation in model evaluation is to achieve a high degree of confidence in the eventual risk assessment. Pollutant-transport model evaluation, as it pertains to estimating air-pollutant emissions, has been somewhat neglected, and such models are used without adequate discussion and analysis. For example, the modeling of emissions from the ASARCO smelter (EPA, 1985b) showed significant bias, but the reasons for the bias and errors were not fully identified.

A major plume-model validation study was mounted in the early 1980s with the support of the Electric Power Research Institute (EPRI); it was the first study of a large coal-fired power plant situated in relatively simple terrain. The study compared three Gaussian-plume models, three first-order closure numerical (stochastic) models, and an experimental second-order closure model, for which ground-level concentrations were obtained with both routine and intensive measurement programs (Bowne and Londergan, 1983). (First-order closure and second-order closure refer to how the effects of turbulence are treated.) The authors conclude that

• The models were poor in predicting the magnitude or location of concentration patterns for a given event.
• The models performed unevenly in estimating peak concentrations as a function of averaging time; none provided good agreement for 1-, 3-, and 24-hour averaging periods.
• The cumulative distribution of hourly concentrations predicted by the models did not match the observed distribution over the full range of concentration values.
• The variation of peak concentration values with atmospheric stability and distance predicted by the Gaussian models did not match the pattern of observed peak values.
• One of the first-order closure models performed better than the Gaussian models in estimating peak concentration as a function of meteorological characteristics, but its predictive capacity was poorer than desirable for detailed risk assessments, and it systematically overpredicted the distance to the maximal concentrations.
• One of the other first-order closure models systematically underpredicted plume impacts, but its predictive capacity was otherwise superior to that of the Gaussian models.
• An experimental second-order closure model did not provide better estimates of ground-level concentrations than the operational models.

Predictions and observed pollutant concentrations often differed by factors of 2-10. It is clear from the study—in which there was no effect of complex terrain, heat islands, or other complicating factors—that the dispersion models had serious deficiencies. Dispersion models have been further developed since then, but they still require improvement, and they warrant evaluation whenever they are applied to new locations or periods. Larger-scale urban air-quality models perform better in predicting concentrations of secondary species, such as ozone, nitrogen dioxide, and formaldehyde, even though the complex chemical reactions might seem to make the task
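The Gaussian-plume form that these operational models share can be illustrated with a minimal sketch. The function name and parameter values below are hypothetical, not those of any model in the EPRI study, and in practice the dispersion coefficients sigma_y and sigma_z would come from stability-dependent empirical curves rather than being supplied directly.

```python
import math

def gaussian_plume_glc(q, u, h, y, sigma_y, sigma_z):
    """Ground-level concentration from a continuous elevated point source.

    q: emission rate (g/s); u: wind speed (m/s); h: effective stack height (m);
    y: crosswind distance (m); sigma_y, sigma_z: dispersion coefficients (m)
    evaluated at the downwind distance of interest. Full ground reflection
    is assumed, so the vertical term is doubled (image source at -h).
    """
    lateral = math.exp(-y ** 2 / (2 * sigma_y ** 2))
    vertical = 2 * math.exp(-h ** 2 / (2 * sigma_z ** 2))
    return q / (2 * math.pi * u * sigma_y * sigma_z) * lateral * vertical

# Hypothetical plume: 100 g/s source, 5 m/s wind, 50 m effective stack height,
# with dispersion coefficients roughly typical of a few kilometers downwind.
centerline = gaussian_plume_glc(100.0, 5.0, 50.0, 0.0, 80.0, 40.0)
```

Because the result depends exponentially on sigma_z and the stability class, modest errors in the dispersion coefficients translate directly into the factor-of-2-10 discrepancies noted above.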

Page 133

assumption—that administered dose and delivered dose are always directly proportional and that the administered dose is therefore an appropriate basis for risk assessment—with direct, accurate information about the delivered or biologically active dose.

Pharmacokinetic models are used to study the quantitative relationship between administered and delivered or biologically active doses. The relationship reflects the spectrum of biological responses to exposure, from physiological responses of a whole organism to biochemical responses within specific cells of a target organ. Pharmacokinetic models explicitly characterize biologic processes and permit accurate predictions of the doses of an agent's active metabolites that reach target tissues in exposed humans. As a consequence, the use of pharmacokinetic models to provide inputs to dose-response models reduces the uncertainty associated with the dose parameter and can result in more accurate estimates of potential cancer risks in humans.

The relationship between administered and delivered doses often differs among individuals; because of such differences, some people might be acutely sensitive and others insensitive to the same administered dose. The relationship can also differ between large and small exposures and between continuous and intermittent exposures, and it can differ among species, some species being more or less efficient than humans in the transport of an administered dose to tissues or in its metabolism to a biologically active or inactive derivative. Those differences can dramatically affect the validity of the predictions of dose-response models; failure to incorporate them into the models contributes to the uncertainty in risk assessment.
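As an illustration of the kind of relationship such models formalize, the sketch below integrates a hypothetical one-compartment model — a deliberately minimal stand-in for the multi-compartment physiological models discussed here. All names and parameter values are invented for the example.

```python
def one_compartment(dose_rate, v_d, k_el, t_end, dt=0.01):
    """Concentration at time t_end for V*dC/dt = dose_rate - k_el*V*C, C(0)=0.

    dose_rate: constant administered-dose rate (mg/h); v_d: volume of
    distribution (L); k_el: first-order elimination constant (1/h).
    Forward-Euler integration; the analytic steady state is
    dose_rate / (k_el * v_d).
    """
    c = 0.0
    for _ in range(int(t_end / dt)):
        c += dt * (dose_rate / v_d - k_el * c)
    return c

# Hypothetical parameters: 10 mg/h infusion, 40 L, elimination 0.1/h.
c_100h = one_compartment(10.0, 40.0, 0.1, 100.0)  # approaches 2.5 mg/L
```

Even in this toy form, the delivered concentration depends on elimination and distribution parameters that differ among individuals and species, which is why the administered dose alone can be a poor surrogate.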
Differences between administered and biologically active doses occur because specialized organ systems intervene to modulate the body's responses to inhaled, ingested, or otherwise absorbed toxic materials. For example, the liver can detoxify materials circulating in the blood by producing enzymes that accelerate chemical reactions breaking the materials down into harmless components (metabolic deactivation, or "detoxification"). Conversely, some substances can be activated by metabolism into more toxic reaction products. Activation and detoxification might occur at the same time and can occur in the same or different organ systems. Furthermore, the rates at which activation and detoxification take place might have natural limits.

Metabolic deactivation might thus be overwhelmed by high exposure concentrations, as seems to be the case with formaldehyde: the biologically active dose and the risk of nasal-tumor development rise rapidly in exposed rats only at high airborne concentrations. The assumption of a simple linear relationship between administered and biologically active doses of formaldehyde is believed by many to result in exaggerated estimates of cancer risk at low exposure concentrations. In contrast, metabolic activation of vinyl chloride occurs more and more slowly with increasing administered dose, because a critical enzyme system becomes overloaded; the biologically active dose and the resulting liver-tumor response increase more and more slowly as the administered dose increases. The assumption of a linear relationship between administered and delivered doses of vinyl chloride could therefore result in underestimation of the cancer risk associated with low doses.

Page 134

These examples illustrate how pharmacokinetic models can reduce the uncertainty in risk estimation by modifying the dose values used in dose-response modeling to reflect the nonlinearity of metabolism. Although most pharmacokinetic models are derived from laboratory-animal data, they provide a biological framework that is useful for extrapolating to human pharmacokinetic behavior. Anatomical and physiological differences among species are well documented and easily scaled by altering model parameters for the species in question. This aspect of pharmacokinetic modeling reduces the uncertainty associated with extrapolating from animal experiments to human cancer risk.

For example, considerable effort has been devoted to the development of pharmacokinetic models for methylene chloride, which is considered a rodent carcinogen. The model was initially developed on the basis of rat data and then scaled to predict human behavior. Predictions in humans were compared with published data and with the results of experiments in human volunteers. The model was shown to predict accurately the pharmacokinetic behavior of inhaled methylene chloride and its metabolite carbon monoxide in both species (Andersen et al., 1991). Use of a particular pharmacokinetic model for methylene chloride in cancer risk assessment reduces human risk estimates for exposure to methylene chloride in drinking water by a factor of 50-210, compared with estimates derived by conventional linear extrapolation and body surface-area conversions (Andersen et al., 1987). Other analyses show different results (Portier and Kaplan, 1989).
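The saturable (Michaelis-Menten) kinetics behind the formaldehyde and vinyl chloride examples can be sketched as follows; the function name and the constants are hypothetical, chosen only to show how proportionality between administered and metabolized dose breaks down.

```python
def fraction_metabolized(administered, vmax, km):
    """Michaelis-Menten metabolic rate per unit of administered dose.

    rate = vmax * c / (km + c): near-linear when c << km (rate/c ~ vmax/km),
    saturating toward vmax when c >> km, so the delivered (metabolized) dose
    is no longer proportional to the administered dose at high exposures.
    """
    rate = vmax * administered / (km + administered)
    return rate / administered

low_dose = fraction_metabolized(0.01, vmax=10.0, km=1.0)    # ~9.9 per unit dose
high_dose = fraction_metabolized(100.0, vmax=10.0, km=1.0)  # ~0.1: saturated
```

Whether saturation of activation (vinyl chloride) or of deactivation (formaldehyde) dominates determines the direction of the bias introduced by a linear dose assumption.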
What pharmacokinetic models for methylene chloride do not predict, however, is whether methylene chloride is a human carcinogen. Thus, although use of the model might improve confidence in dose estimation by replacing the conventional scaling-factor approach, it cannot predict the outcome of exposure in humans.

Another way to reduce uncertainty would be to use pharmacokinetic models to extrapolate between exposure routes. If information on the disposition of an agent were available only as a result of its inhalation in the workplace, for example, and a risk assessment were required for its consumption in drinking water, appropriate models could be constructed to relate the delivered dose after inhalation to that expected after ingestion. To the committee's knowledge, pharmacokinetic models have not yet been used in a risk assessment for such regulatory purposes.

Failure to include pharmacokinetic considerations in dose-response modeling contributes to the overall uncertainty in a risk assessment, but uncertainty is associated with their use as well. This uncertainty comes from several sources.

Page 135

First, uncertainty is associated with the pharmacokinetic model parameters themselves. Parameter values are usually estimated from animal data and can come from a variety of experimental sources and conditions. Quantities can be measured indirectly, they can be measured in vitro, and they can vary among individuals. Different data sets might be available to estimate values of the same parameters. Hattis et al. (1990) evaluated seven pharmacokinetic models of tetrachloroethylene (perchloroethylene) metabolism and found that their predictions varied considerably, primarily because of differences in the choice of data sets used to estimate values of model parameters. Moreover, analogous parameter values are also needed for humans; although some values, such as organ weights, are amenable to direct measurement and do not vary widely among humans, others, such as rate constants for enzymatic detoxification and activation, are both difficult to measure and highly variable.

Second, there is uncertainty in the selection of the appropriate tissue dose to model. For example, information might be available on the blood concentration of an agent, on its concentration in a tissue, or on the concentrations of its metabolites in the tissue. Tissue concentrations of one metabolite might be inappropriate if another metabolite is responsible for the biologic effects. Total tissue concentrations might not accurately reflect the biologically active dose if only one type of cell within the tissue is affected. Choice of an appropriate measure of tissue dose can have a large effect on cancer risk estimates. Farrar et al. (1989) considered three measures of tissue dose for tetrachloroethylene: tetrachloroethylene in liver, tetrachloroethylene metabolites in liver, and tetrachloroethylene in arterial blood. Using EPA's pharmacokinetic model for tetrachloroethylene and cancer bioassay data in mice, they found that human cancer risk estimates varied by a factor of about 10,000, depending on the dose surrogate used.
Interestingly, the estimates bracketed the estimate obtained in the absence of any pharmacokinetic transformation of dose, as shown in Table 7-2. This example illustrates the variation in dose and risk estimates that can be obtained under different assumptions, but it does not help to evaluate the validity of any of the estimates in the absence of knowledge of the biologic mechanism of action of tetrachloroethylene as a rodent carcinogen and in the absence of knowledge of whether it is a human carcinogen.

TABLE 7-2 Risk Estimates Based on EPA's Pharmacokinetic Model for Tetrachloroethylene and Cancer Bioassay Data in Mice

Dose Surrogate                   Risk Estimate^a
Administered dose                5.57 × 10⁻³
Dose to liver                    425 × 10⁻³
Dose of metabolites to liver     0.0195 × 10⁻³
Dose in blood                    126 × 10⁻³

^a Maximum-likelihood estimate.
SOURCE: Adapted from Farrar et al., 1989.

Page 136

Although the dose of metabolites to the liver appears to be the most appropriate choice of dose surrogate, there is a high degree of nonlinearity between this dose and the tumor incidence in mice. The nonlinearity indicates either that this dose surrogate does not represent the actual biologically active dose for the particular sex-species combination analyzed by these authors or that the model does not adequately describe tetrachloroethylene pharmacokinetics.

The science of pharmacokinetics seeks to gain a clear understanding of all the biological processes that affect the disposition of a substance once it enters the body. It includes the study of many active biological processes, such as absorption, distribution, metabolism (whether activation or deactivation), and excretion. Accurate prediction of delivered and biologically active doses requires comprehensive, physiologically based computer models of those linked processes. Because the science of pharmacokinetics aims to replace general assumptions with a more refined model based on the specific relationship between administered and delivered or biologically active doses, its use in risk assessment will help to reduce the uncertainties in the process and the related bias in risk estimation. Advances will come slowly and at considerable cost, because detailed knowledge of the biologically active dose of many materials must be acquired before generalizations can be confidently exploited. Nevertheless, EPA increasingly incorporates pharmacokinetic data into the risk-assessment process, and its use represents one of the clearest opportunities for improving the accuracy of risk assessments.
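The conventional body-surface-area conversion mentioned earlier (the default that pharmacokinetic models aim to replace) can be sketched as follows. The exponent choices — 2/3 for surface-area scaling, 3/4 for allometric scaling — are standard conventions, but the function name and the numerical values are illustrative only.

```python
def human_equivalent_dose(animal_dose, bw_animal, bw_human, exponent=2/3):
    """Scale a mg/kg/day animal dose to humans.

    Total dose is assumed to scale as body_weight**exponent (2/3 for
    body-surface-area scaling, 3/4 for allometric scaling), so the per-kg
    human-equivalent dose is animal_dose * (bw_animal/bw_human)**(1-exponent).
    """
    return animal_dose * (bw_animal / bw_human) ** (1 - exponent)

# A 10 mg/kg/day mouse dose (0.025 kg body weight) scaled to a 70-kg human
# by surface area gives roughly 0.7 mg/kg/day, about a 14-fold reduction.
hed = human_equivalent_dose(10.0, 0.025, 70.0)
```

Such a one-parameter conversion captures gross anatomical scaling but none of the species-specific metabolism that physiologically based models represent explicitly.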
Conclusions

Developing improved methods for assessing the long-term health impacts of chemicals will depend on improved understanding of the underlying science and on more effective coordination, validation, and integration of the relevant environmental, clinical, epidemiologic, and laboratory data, each of which is limited by various kinds of error and uncertainty. Goodman and Wilson (1991) have demonstrated that, for 18 of 22 chemicals studied, there is good agreement between risk estimates based on rodent data and those based on epidemiologic studies. Their quantitative assessment, which can be compared with the qualitative evaluation of the same issue by Ennever et al. (1987), provides stronger evidence that current risk-assessment strategies produce reasonable estimates of human experience for known human carcinogens (Allen et al., 1988).

The reliability of a given health-risk assessment can be determined only by evaluating both the validity of the overall assessment and the validity of its components. Because the validity of a risk assessment depends on how well it predicts health effects in the human population, epidemiologic data are required

Page 137

for testing the predictions. To the extent that the requisite data are not already available, epidemiologic research will be necessary. An example is the study in which the New York Department of Health conducted biological monitoring for arsenic in schoolchildren (New York Department of Health, 1987). The researchers compared their findings with the arsenic concentrations predicted by the risk assessment conducted by EPA. The good agreement between the estimates and the actual urinary arsenic concentrations in the children provided support for the EPA risk model.

The committee believes that substantial research is warranted to validate the methods, models, and data that are used in risk assessment. In some instances the magnitude of uncertainty is not well understood, because information on the accuracy of the prediction process for each model used in risk assessment is insufficient. We also note that the uncertainties tend to vary considerably; for example, uncertainties are relatively low for estimation of population characteristics, compared with those associated with extrapolation from rodents to human beings.

The quality of risk analysis will improve as the quality of its inputs improves. As we learn more about biology, chemistry, physics, and demography, we can make progressively better assessments of the risks involved. Risk assessment evolves continually, with re-evaluation as new models and data become available. In many cases, new information confirms previous assessments; in others, it necessitates changes, sometimes large ones. In either case, public confidence in the process demands that EPA make the best judgments possible. That an estimate of risk is subject to change is not a criticism of the process or of the assessors; rather, it is a natural consequence of increasing knowledge and understanding. Re-evaluating risk assessments and making changes should be expected, embraced, and applauded, rather than criticized.
Findings And Recommendations

The following is a compilation of findings and recommendations related to evaluation of methods, data, and models for risk assessment.

Predictive Accuracy and Uncertainty of Models

Various methods and models are available to EPA and other organizations for conducting emission characterization, exposure assessment, and toxicity assessments. They include those used as default options and their corresponding alternatives, which represent deviations from the defaults. The predictive accuracy and uncertainty of the methods and models used for risk assessment are not clearly understood or fully disclosed in all cases.

• EPA should establish the predictive accuracy and uncertainty of the methods and models and the quality of data used in risk assessment, with the high

Page 138

priority given to those which support the default options. EPA and other organizations should also conduct research on alternative methods and models that might represent deviations from the default options, to the extent that they can be shown, in a clear and convincing manner, to provide superior performance and thus more accurate risk assessments.

Emission Characterization Guidelines

EPA does not have a set of guidelines for emission characterization for use in risk assessment.

• EPA should develop guidelines that require a given quality and amount of emission information relative to a given risk-assessment need.

Uncertainty

EPA does not adequately evaluate the uncertainty in the emission estimates used in risk assessments.

• Because of the wide variety of processes and the differing maintenance of those sources, EPA should develop guidelines for the estimation and reporting of uncertainty in emission estimates; these guidelines may depend on the level of risk assessment.

External Collaboration

EPA has worked with outside parties to design emission-characterization studies that have moved the agency from crude to more refined emission characterization.

• EPA should conduct more collaborative efforts with outside parties to improve the overall risk-assessment process and each step within that process.

Exposure Assessment

Gaussian-Plume Models

In its regulatory practice, EPA has relied on Gaussian-plume models to estimate the concentrations of hazardous pollutants to which people are exposed. However, Gaussian-plume models are crude representations of airborne transport processes; because they are not always accurate, they can lead to either underestimation or overestimation of concentrations. Stochastic Lagrangian and photochemical models exist, and evaluations have shown good agreement with

Page 139

observations. Also, EPA has typically evaluated its Gaussian-plume models for release and dispersion of criteria pollutants from plants with good dispersion characteristics (i.e., high thermal buoyancy, high exit velocity, and tall stacks). EPA has not fully evaluated the Gaussian-plume models for hazardous air pollutants with realistic plant parameters and locations; thus, their potential for underestimation or overestimation has not been fully disclosed.

• EPA should evaluate the existing Gaussian-plume models under more realistic conditions: small distances to the site boundaries, complex terrain, poor plant dispersion characteristics (i.e., low plume buoyancy, low stack exit momentum, and short stacks), and the presence of other structures in the plant vicinity. When there is clear and convincing evidence that the use of Gaussian-plume models leads to underestimation or overestimation of concentrations (e.g., according to monitoring data), EPA should consider incorporating state-of-the-art models, such as stochastic-dispersion models, into its set of concentration-estimation models and include a statement of criteria for their selection and for departure from the default option.

Exposure Models

EPA has not adequately evaluated HEM-II for estimation of exposures, and prior evaluations of exposure models have shown substantial discrepancies between measured and predicted exposures, i.e., underprediction of exposures.

• EPA should undertake a careful evaluation of all its exposure models to demonstrate their predictive accuracy (via pollutant monitoring and assessment of model input and theory) for estimating the distribution of exposures around plants that emit hazardous air pollutants. EPA should particularly ensure that, although exposure estimates should be as accurate as possible, the exposure of the surrounding population is not underestimated.
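Evaluations of dispersion and exposure models against monitoring data conventionally report a few standard statistics. The metric definitions below are conventional in the model-evaluation literature; the function itself and the input values are hypothetical.

```python
def evaluation_stats(observed, predicted):
    """Standard model-evaluation metrics for paired concentrations.

    FB:   fractional bias, 0 for an unbiased model (negative = overprediction)
    NMSE: normalized mean-square error, 0 for perfect agreement
    FAC2: fraction of predictions within a factor of two of observations
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    fb = 2 * (mean_obs - mean_pred) / (mean_obs + mean_pred)
    nmse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / (n * mean_obs * mean_pred)
    fac2 = sum(1 for o, p in zip(observed, predicted) if 0.5 <= p / o <= 2.0) / n
    return fb, nmse, fac2
```

A systematic underprediction of exposures of the kind noted for HEM-II would appear here as a positive FB even when FAC2 looks acceptable, which is why several complementary metrics are reported together.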
Population Data

EPA has not previously used population activity, population mobility, and demographics in modeling exposure to hazardous air pollutants and has not adequately evaluated the effects of assuming that the population of a census enumeration district is all at the location of the district's population center.

• EPA should use population-activity models in exposure assessments when there is reason to believe that the exposure estimate might be inaccurate (e.g., as indicated by monitoring data) if the default option is applied. This is particularly important in the case of potential underestimation of risk. Population mobility and demographics will also play a role in determining risk and lifetime exposures. EPA should conduct further evaluation of the use of both simple methods

Page 140

(e.g., use of the center of the population examined) and more comprehensive tools (e.g., the NEM and SHAPE exposure models).

Human-Exposure Model

EPA uses the Human-Exposure Model (HEM) to evaluate exposure associated with hazardous air-pollutant releases from stationary sources. The model generally uses a standardized EPA Gaussian-plume dispersion model and assumes nonmobile populations residing outdoors at specific locations. The HEM construct will not provide accurate estimates of exposure in specific locations and for specific sources and contaminants where conditions do not match the simplified exposure and dispersion-model assumptions inherent in the standard HEM components.

• EPA should provide a statement on the predictive accuracy and uncertainty associated with the use of the HEM in each exposure assessment. The underlying assumption that the calculated exposure estimate based on the HEM is a conservative one should be reaffirmed; if it cannot be, alternative models whose performance has been clearly demonstrated to be superior should be used in exposure assessment. These alternative models should be adapted to include both transport and personal activity and mobility in an exposure-modeling system, to provide more accurate, scientifically founded, and robust estimates of pollutant exposure distributions (including variability, uncertainty, and demographic information). Consideration may be given to linking these models to geographic information systems to provide both geographic and demographic information for exposure modeling.

EPA generally does not include non-inhalation exposures to hazardous air pollutants (e.g., dermal exposure and bioaccumulation); its procedure can lead to underestimation of exposure. Alternative routes can be an important source of exposure. Modeling systems similar to extensions of HEM have been developed to account for the other pathways.
• EPA should explicitly consider the inclusion of noninhalation pathways, except where there is prevailing evidence that noninhalation routes—such as deposition, bioaccumulation, and soil and water uptake—are negligible.

Assessment of Toxicity

Extrapolation from Animal Data for Carcinogens

EPA uses laboratory-animal tumor-induction data, as well as human data, for predicting the carcinogenicity of chemicals in humans. It is prudent and reasonable to use animal models to predict potential carcinogenicity; however, additional information would enhance the quantitative extrapolation from animal models to human risks.

Page 141

• In the absence of human evidence for or against carcinogenicity, EPA should continue to depend on laboratory-animal data for estimating the carcinogenicity of chemicals. However, laboratory-animal tumor data should not be used as the exclusive evidence to classify chemicals as to their human carcinogenicity if the mechanisms operative in laboratory animals are unlikely to be operative in humans; EPA should develop criteria for determining when this is the case, for validating this assumption, and for gathering additional data when the finding is made that the species tested are irrelevant to humans.

EPA uses data that generally assume that exposure of rats and mice after weaning and until the age of 24 months is the most sensitive and appropriate test system for conservatively predicting carcinogenicity in humans. These protocols miss exposure of animals before they are weaned, including newborns. Furthermore, the sacrifice of animals at the age of 2 years makes it difficult to estimate accurately the health effects of a disease whose incidence increases with age (as does that of cancer).

• EPA should continue to use the results of studies in mice and rats to evaluate the possibility of chemical carcinogenicity in humans. EPA and NTP are encouraged to explore the use of alternative species to test the hypothesis that results obtained in mice and rats are relevant to human carcinogenesis, the use of younger animals when unique sensitivity might exist for specific chemicals, and the age-dependent effects of exposure.

EPA typically extrapolates data from laboratory animals to humans by assuming that the delivered dose is proportional to the administered dose, as a default option. Alternative pharmacokinetic models are used less often to link exposure (applied dose) to effective dose.
• EPA should be encouraged to continue to explore and, when it is scientifically appropriate, incorporate mechanism-based pharmacokinetic models that link exposure and biologically effective dose.

The location of tumor formation in humans is related to route of exposure, chemical properties, and pharmacokinetic and pharmacodynamic factors, including systemic distribution of chemicals throughout the body. Thus, tumors might be found at different sites in humans and laboratory animals exposed to the same chemical. EPA has accepted evidence of carcinogenicity in tissues of laboratory animals as evidence of human carcinogenicity without necessarily assuming correspondence on a tumor-type or tissue-of-origin basis. EPA has extrapolated evidence of tumorigenicity by one route to another route where route-specific characteristics of disposition of the chemical are taken into account. EPA has traditionally treated almost all chemicals that induce cancer in a similar manner, using a linearized multistage nonthreshold model to extrapolate from large exposures and associated measured responses in laboratory animals to small exposures and low estimated rates of cancer in humans.

Page 142

• Pharmacokinetic and pharmacodynamic data and models should be validated, and quantitative extrapolation from animal bioassays to humans should continue to be evaluated and used in risk assessments. EPA should continue to use the linearized multistage model as the default for extrapolating from high to low doses. If information on the mechanism of cancer induction suggests that the slope of the linearized multistage model is not appropriate for extrapolation, this information should be made an explicit part of the risk assessment. If sufficient information is available for an alternative extrapolation, a quantitative estimate should be made. EPA should develop criteria for determining what constitutes sufficient information to support an alternative extrapolation. The evidence for both estimates should be made available to the risk manager.

Extrapolation of Animal Data on Noncarcinogens

EPA uses a semiquantitative NOAEL-uncertainty-factor approach to regulating human exposure to noncarcinogens.

• EPA should develop biologically based quantitative methods for assessing the incidence and likelihood of noncancer effects in an exposed population. These methods should permit the incorporation of information on mechanisms of action, as well as on differences in population and individual characteristics that affect susceptibility. The most sensitive end point of toxicity should continue to be used for establishing the reference dose.

Classification of Evidence of Carcinogenicity

EPA's narrative descriptions of the evidence of carcinogenic hazards are appropriate, but a simple classification scheme is also needed for decision-making purposes. The current EPA classification scheme does not capture information regarding the relevance to humans of animal data, any limitations on the applicability of observations, or any limitations on extrapolation of carcinogenicity outside the range of observation.
The current system might thus understate or overstate the degree of hazard for some substances.

• EPA should provide comprehensive narrative statements regarding the hazards posed by carcinogens, including qualitative descriptions of both 1) the strength of evidence about the risks of a substance and 2) the relevance to humans of the animal models and results, and of the conditions of exposure (route, dose, timing, duration, etc.) under which carcinogenicity was observed, to the conditions under which people are likely to be exposed environmentally. EPA should develop a simple classification scheme that incorporates both of these elements; a scheme similar to that set forth in Table 7-1 is recommended. The agency should seek international agreement on a classification system.
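The NOAEL-uncertainty-factor approach for noncarcinogens described above amounts to a simple division; the sketch below uses hypothetical values, and the choice and size of the uncertainty factors are conventions rather than outputs of the calculation.

```python
def reference_dose(noael, uncertainty_factors):
    """RfD = NOAEL / product of uncertainty factors.

    Typical 10-fold factors account for interspecies extrapolation, human
    variability, subchronic-to-chronic extrapolation, and use of a LOAEL
    in place of a NOAEL.
    """
    product = 1
    for uf in uncertainty_factors:
        product *= uf
    return noael / product

# Hypothetical: NOAEL of 50 mg/kg/day, interspecies and intraspecies factors.
rfd = reference_dose(50.0, [10, 10])  # 0.5 mg/kg/day
```

The semiquantitative character criticized in the finding is visible here: the output carries no statement of incidence or likelihood, only a dose divided by stacked safety factors.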

Page 143

Potency Estimates

EPA uses estimates of a chemical's potency, derived from the slope of the dose-response curve, as a single value in the risk-assessment process.

• EPA should continue to use potency estimates—i.e., unit cancer risk—to estimate an upper bound on the probability of developing cancer as a result of lifetime exposure to one unit of a carcinogen. However, uncertainty about the potency estimate should be described, as recommended in Chapter 9.

Although EPA routinely cites available human evidence, it does not always rigorously compare the quantitative risk-assessment model based on rodent data with available information on molecular mechanisms of carcinogenesis or with available human evidence from epidemiologic studies.

• Because the validity of the overall risk-assessment model depends on how well it predicts health effects in the human population, EPA should acquire additional expertise in areas germane to molecular and mechanistic toxicology. EPA should also acquire additional epidemiologic data to assess the validity of its estimates of risk. These data might be acquired in part by formalizing a relationship with the National Institute for Occupational Safety and Health to facilitate access to data from occupational exposures.
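The linearized multistage model and the unit-risk (potency) calculation referred to above can be sketched as follows. The coefficients are hypothetical; in EPA practice the slope used as the potency is an upper confidence limit from a constrained fit to bioassay data, whereas here it is simply an input.

```python
import math

def multistage_extra_risk(dose, q):
    """Linearized multistage model, P(d) = 1 - exp(-(q0 + q1*d + q2*d**2 + ...)).

    Returns extra risk over background, (P(d) - P(0)) / (1 - P(0)), which
    reduces to ~q1 * d at low doses -- the low-dose linearity that lets the
    q1 coefficient serve as the potency (unit cancer risk).
    """
    poly = sum(qi * dose ** i for i, qi in enumerate(q))
    p_d = 1 - math.exp(-poly)
    p_0 = 1 - math.exp(-q[0])
    return (p_d - p_0) / (1 - p_0)

def upper_bound_risk(q1_star, lifetime_avg_exposure):
    """Upper-bound lifetime risk: potency (unit risk) times lifetime exposure."""
    return q1_star * lifetime_avg_exposure

# Hypothetical fitted coefficients; at a dose of 1e-4, extra risk ~ q1 * dose.
extra = multistage_extra_risk(1e-4, [0.01, 0.5, 0.1])
```

Reducing the fitted curve to the single slope value is what the finding above refers to: the higher-order terms, and the uncertainty about all of them, disappear from the reported number.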