Under the Influence? Drugs and the American Work Force

APPENDIXES
A
Methodological Issues

This appendix is included to achieve two goals. The first is to alert readers unfamiliar with social research methodology to technical terms encountered in this report and to crucial methodological issues that arise in an attempt to collect and evaluate evidence relating to the causes and effects of alcohol and other drug use, the effects of drug-screening programs, and the effects of efforts designed to treat or prevent alcohol and other drug use by the work force. The second is to provide potential researchers in industry and elsewhere with guidelines concerning the strengths and weaknesses of alternative strategies for the collection of evidence. The paucity of good scientific research on many aspects of alcohol and other drug use in the workplace suggests that even generic guidelines may be useful.

SCIENTIFIC JUDGMENT AND RESEARCH DESIGN

The way in which a study is conducted is referred to as the design of the study. The design of a study in large measure determines the extent to which the results contribute compelling evidence for or against a hypothesized causal relation. A stronger (or better) design is one in which an observed association (between drug use and job performance, for example) is less subject to alternative or artifactual explanations. For example, a survey that asks workers about their wages and their drug use and finds the two directly related does not necessarily imply that high wages lead to increased drug use, because the study's design does not rule out other plausible
explanations for the finding. Drug users may exaggerate their self-reported wages, people who use drugs may work harder to pay for them (rather than work less so as to spend more time in drug activities), and so on. In contrast, an experiment that found a reduction in positive drug tests at a large number of randomly selected work sites that implemented a particular prevention policy, compared with smaller or no reduction at other randomly selected work sites, would be subject to few alternative explanations.

The extent to which a scientific hypothesis is considered "established" does not involve proof in the sense found in logic or mathematics, but rather reflects the degree of consensus in the relevant scientific community that a series of studies supporting the hypothesis has been designed so that alternative explanations are unlikely. The definitive single study is an ideal that in some areas does not even admit of realization. Scientific consensus develops instead through a series of studies in which later research avoids the flaws of earlier work but may introduce problems of its own. It is the consistency of findings across studies with different strengths and weaknesses that allows the cumulative results of such studies to provide strong evidence for or against a hypothesis. When research findings are inconsistent, the flaws in individual studies preclude confident judgments.

TYPES OF STUDY DESIGNS

Epidemiologic (Observational, Nonexperimental)

In epidemiologic study designs, researchers systematically observe variables related to human health, but they do not intervene to manipulate exposure to these variables. The potential power of systematic observation is illustrated by astronomy, one of the oldest and most successful of the observational sciences.
There are three major types of epidemiologic designs: case-referent (or case-control) studies, cross-sectional surveys, and prospective panel (or cohort) studies. Case-referent studies are retrospective in that they begin with diseased persons (e.g., drug abusers) and then seek to identify aspects of their personal histories that differentiate them from nondiseased persons (nonabusers) in a control group. The cross-sectional survey is similar to the case-referent design except that all population members are eligible for sampling and the survey process itself separates diseased from nondiseased individuals. The case-referent design is necessary when the prevalence of a disease is so low that sampling from a population will yield too few instances of the disease for reliable analysis. Finally, the prospective panel design involves following a population or a population sample over time to
study the relation between risk factors measured at an earlier point in time and subsequent disease occurrence.

Drug-Use-Related Issues Addressed by Epidemiologic Designs

In this section, we briefly delineate a number of issues frequently encountered in epidemiologic research.

• Incidence of drug use: given a defined population of persons at a particular point in time, what is the probability that a member of that population will use drugs during a subsequent period of time?

• Prevalence of drug use: given a defined population of persons at a particular point in time, what is the probability that a member of the population is a current drug user?

• Risk factors for drug use: a risk factor for drug use is a characteristic of some individuals or subgroups of a population that is associated with an increased risk (relative to others in the population) of becoming a drug user. Whether a consistently observed risk factor is a causal determinant of drug use is difficult to establish with epidemiologic studies. Experimental studies involving randomized change in risk factors are better suited to establishing causation. However, it is often impractical to study causal relations experimentally, and judgments of causality must then be derived from epidemiologic observations.

• Possible consequences of drug use: epidemiologic studies concerned with the consequences of drug use rely on comparisons between current or former drug users and nonusers.

• Treatment effectiveness: the effectiveness of treatment and prevention programs is generally addressed by experiments, but surveys can also address this issue. In particular, regional variation in the type and extent of drug treatment programs can be related to the level and rate of change of drug use in communities, cities, and states.
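The incidence and prevalence questions above differ only in their denominators and reference periods, a distinction a few lines of arithmetic make concrete (the work force counts below are invented for illustration):

```python
def prevalence(current_users, population):
    """Probability that a population member is a current drug user."""
    return current_users / population

def incidence(new_users, at_risk_at_baseline):
    """Probability that a member who was not using drugs at baseline
    begins use during the follow-up period."""
    return new_users / at_risk_at_baseline

# Hypothetical work force of 1,000 with 80 current users at baseline.
print(prevalence(80, 1000))       # 0.08
# Of the 920 baseline nonusers, 46 begin use during the next year.
print(incidence(46, 1000 - 80))   # 0.05
```

Note that the incidence denominator excludes baseline users: only nonusers are at risk of becoming new users during follow-up.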
Strengths and Weaknesses

The major weakness of epidemiologic designs is that observed associations are often causally ambiguous due to factors that confound the relationship. A plausible causal relationship may in fact be due to the joint effects of unmeasured, or poorly measured, "third" variables (spuriousness); to reverse causation; or to measurement artifacts or other explanations that the epidemiologic design does not allow the researchers to rule out. For example, a positive association between self-reported drug use and wages might be due to the joint effect of educational attainment on both variables (spuriousness); to the increased opportunity to purchase drugs that comes with higher income (reverse causation); or to exaggerated reports of wages by drug users
(measurement artifact). Similarly, preemployment drug-screening programs may induce drug users with a keen interest in the job for which they are applying to abstain for a period prior to the drug test. Thus, only drug users not very interested in the job might test positive. In that case, good job performance by those who passed a screening test and poor performance by those who failed (assuming they were hired anyway) would tell us little about the impact of drug use on job performance, since we would be wrong in supposing that those who had tested negative were not using drugs, and it might be lack of motivation that explained the poor performance of those who tested positive. (The results would, however, inform employers about the utility of a drug test as a preemployment screening device for identifying poor employment risks. It would just misinform them about why it was useful.)

There are a number of techniques that researchers can use in an effort to determine whether an observed association in an observational study is due to confounding. Statistical modeling with multivariate equations can be used to adjust for the joint effects of measured third variables; however, such modeling requires accurate measurement of third variables or the use of measurement models that estimate and adjust for well-behaved error in observed variables. Suppose, for example, that drug use is higher in one plant than another. Multivariate adjustment of drug use rates by age and sex may indicate that the difference is due to work force demographics rather than some other plant characteristic. Longitudinal observations can counter confounding due to reverse causation because the temporal sequence of changes in the hypothesized cause and effect variables can be observed.
For example, observing that low self-esteem during the seventh grade predicts drug use in the twelfth grade is more convincing evidence that the trait leads to drug use than an observed association between self-esteem and drug use in the twelfth grade. If all we observed was the twelfth grade association, we could not rule out the possibility that using drugs caused the low self-esteem. Statistical modeling of cross-sectional observations can also address confounding due to reverse causation, but such "simultaneous equation" models require strong assumptions about the absence of other causal effects among variables in the model. Finally, the plausibility of measurement artifacts can be reduced through the use of multiple indicators; for example, drug use could be measured by self-report, peer report, and biochemical methods. If all measures of drug use yield the same conclusion about the relation of drug use with another variable, then the plausibility that each is due to a measurement artifact is low.

The major strength of epidemiologic studies is that they can address important questions that are difficult or expensive to address experimentally. For example, only the short-term consequences of acute drug exposures
can be addressed in experiments with human beings. The long-term consequences of chronic use must be studied epidemiologically in humans or experimentally in animal models. Thus, some of the most important issues concerning drug use among the work force must be addressed with epidemiologic studies that minimize the potential for confounding through the use of longitudinal designs, careful measurement procedures, and appropriate statistical modeling.

Quasi-Experiments: Strengths and Weaknesses

The defining characteristic of a quasi-experiment is the nonrandomized manipulation of the causal (or independent) variable. Such designs in the workplace are likely to involve the assignment of work sites to conditions. For example, if one of a company's two plants begins drug testing and the other does not, the plant that started drug testing may be regarded as the experimental plant and the other as the control. Indeed, even if there were no control plant, the institution of the testing program could itself be treated as a quasi-experiment. There are many different models for quasi-experiments, and they have different strengths and weaknesses. Generally, a quasi-experimental design will be stronger (i.e., in the sense of ruling out other plausible explanations) if outcome data are available over time both before and after the experimental manipulation, if there are many units in the experimental and control conditions, and if within units there are many subjects exposed to the experimental and control conditions. Quasi-experimental designs are the strategies most commonly employed by applied researchers who attempt to evaluate the impact of new organizational programs. With regard to work force drug use, they can be used efficiently to assess the effectiveness of work-site drug use intervention programs (e.g., educational, drug testing) or work-site drug treatment programs.
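The two-plant example above is often analyzed as a difference in differences: compute each plant's change from before to after the testing program began, then subtract the control plant's change from the experimental plant's. This is a sketch under the parallel-trends assumption, with invented positive-test rates:

```python
def difference_in_differences(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Change at the experimental plant minus change at the control plant.
    Assumes the two plants would have followed parallel trends absent
    the intervention."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

# Hypothetical positive-test rates: the testing plant falls from 12% to 7%,
# while the control plant drifts from 11% to 10% over the same period.
effect = difference_in_differences(0.12, 0.07, 0.11, 0.10)
print(round(effect, 2))  # -0.04: a 4-point drop beyond the control's decline
```

The subtraction of the control plant's change is what protects against secular influences that affected both plants, which a simple before-and-after comparison at the testing plant alone would not.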
The major weakness of quasi-experiments is that the assignment of persons or work sites to experimental conditions may be confounded with other factors. For example, work sites eager to implement prevention programs may have other characteristics that will lead to a decline in drug use (even in the absence of the program under evaluation), so comparisons with control sites may be misleading. Or a company may be motivated to implement a prevention program when its workers' drug use seems to have hit crisis proportions. Since exceptionally high rates of drug use may result in part from factors such as a temporary period of high stress, postintervention measurements may reveal improvement that in fact reflects a return to baseline that would have occurred absent the intervention.

A major strength of quasi-experiments is that the potential limitations
associated with variants of these experiments are well known. For example, longitudinal observation and statistical modeling can deal with the possibility that future outcomes are consistent with preintervention trends. So-called deviation from secular trend designs (or regression discontinuity) use a series of preintervention baseline measurements to establish a behavior trend line and then determine whether measures of postintervention behavior differ significantly from the behavior that a simple projection of the trend would have predicted (Dwyer, 1993). A design like this controls for unmeasured factors that affect both the level of and trends in drug use at a particular work site. It also increases power and guards against numerous, but not all, sources of confounding. For example, it does not control for events around the time of the intervention that might explain changes in behavior. A firm that established through drug testing over time that its workers were abusing cocaine and also established a program to combat cocaine use around the time of Len Bias's death would not know whether a steep drop in cocaine use was due to its program or to the publicity that Bias's death engendered about the dangers of cocaine use. This kind of threat could be partially controlled, however, by adding sites that did not institute anticocaine programs at the same time. If Bias's death was not associated with a drop in cocaine use at the control sites, or if it was associated with a significantly smaller drop, this competing historical explanation becomes less plausible.

When a quasi-experimental design includes a number of experimental and control sites and has outcome data from these sites for a substantial period before and after the intervention, it is a particularly strong design for making causal inferences.
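The deviation from secular trend logic can be sketched in a few lines of code: fit a line to the preintervention baseline, project it one period forward, and measure how far the observed postintervention value falls from the projection. The monthly rates below are hypothetical, and a real analysis would also need a standard error for the deviation (Dwyer, 1993):

```python
def fit_trend(baseline):
    """Ordinary least-squares line through equally spaced baseline points."""
    n = len(baseline)
    mean_x = (n - 1) / 2
    mean_y = sum(baseline) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(baseline))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def deviation_from_trend(baseline, observed_post):
    """Observed postintervention value minus the value projected from
    the preintervention trend for the first postintervention period."""
    slope, intercept = fit_trend(baseline)
    projected = intercept + slope * len(baseline)
    return observed_post - projected

# Hypothetical monthly positive-test rates rising steadily before the
# intervention; the trend projects 0.15 for the next month, but 0.09
# is observed after the program begins.
baseline = [0.10, 0.11, 0.12, 0.13, 0.14]
print(round(deviation_from_trend(baseline, 0.09), 2))  # -0.06
```

A simple before-and-after comparison here would understate the effect, since drug use was rising before the intervention; the projection captures that secular trend.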
In some circumstances, the real-world richness of the units studied may even make it superior to the more limited true experiments.

Randomized Trials and Laboratory Experiments: Strengths and Weaknesses

The defining characteristic of a randomized trial or experiment is the random assignment of persons or work sites to experimental and control conditions. Typically, the results of well-designed true experiments provide the strongest evidence to either support or refute a hypothesis. Typical work-site questions that can be addressed with such designs include the effectiveness of work-site drug use prevention programs, of work-site drug abuse treatment programs, and of drug abuse treatment programs for individuals, as well as the effects of drug use on individual performance. The major strength of the randomized design is that it renders the spuriousness and reverse causation explanations of association unlikely alternatives to the hypothesized causal relation under study. This is because, if randomization
is properly carried out, one can have confidence to a known degree of statistical probability that the units receiving the treatment are like the units serving as controls with respect to the myriad unmeasured factors that might affect the behavior of interest. Consider, for example, a field experiment in which, of 500 workers under treatment for drug abuse, 250 are randomly assigned to follow-up treatment and the other 250 are not offered follow-up. If the former group improves more than the latter, one can be confident to a known degree of statistical probability that the improvement was not due to factors, such as a greater desire to keep the job, that distinguished the follow-up group from the other and were also plausibly related to treatment success. One could not have such confidence if the workers chose whether they wanted follow-up treatment or if assignments to follow-up were made on some nonrandom basis, such as by offering follow-up to the day shifts but not to the evening or swing shifts.

Despite their great strength in eliminating threats posed by confounding variables, randomized experiments are not a perfect methodology, nor are they always a feasible or even the best way to search for causal relationships. They are still subject to measurement artifacts, and they can be prohibitively expensive or otherwise impractical. Large-scale experiments can also fail because of an inability to control the exposure of assigned groups to the experimental or the control treatment. In the example we have used, for instance, workers not offered follow-up aid may seek it out on their own, and an effective intervention may appear to be without effect. Also, one may not know what it is about an experimental treatment that is effective, so experimental results may be misleading.
In the drug treatment experiment we posit, workers assigned to follow-up may feel they are especially valued by their employers, or those not assigned to treatment may feel undervalued. These feelings, rather than follow-up or the lack of it, may affect future drug use in these groups, meaning that the experiment is no guide to what would happen if everyone or no one received follow-up treatment.

The possibility of problems like these, however, should not disguise the great strengths of randomized field experiments. The more important problems associated with them are practical. Random assignment to treatment, for example, can be extraordinarily hard to implement. A supervisor told to assign workers to experimental or control groups might, for example, cheat and assign those he thinks are most likely to benefit from the treatment to that condition, or workers may informally trade places if controls to ensure against this are not in place. If deviations like these are known, they can to some extent be corrected for statistically, but some of the power of the randomized experiment is lost.

Cost, too, is a factor. In particular, when treatments are randomized across units, like factories, rather than across individuals, as they may be
when it is feared that morale problems would result if individuals within units were treated differently, only a small number of units may be assigned to the two conditions. Yet the strength of randomization depends on the random assignment of a sufficiently large number of units to weaken substantially the possibility that confounding factors would coincidentally vary with the random assignment. When only a small number of units are randomly assigned to conditions, the design should be considered quasi-experimental, and methods appropriate to quasi-experiments should be utilized (e.g., the deviation from secular trend design).

Laboratory studies can generally randomize individuals to conditions and thus achieve substantial internal validity by using large numbers of participants. The major practical drawback of laboratory research generally lies not in the threat that confounding variables pose for causal inference but rather in limited external validity. Because laboratory conditions are usually quite different from conditions in the world in which people live, use drugs, and work, it is often unclear how far one may generalize from what occurs in the laboratory to what occurs in the world of work. If, for example, smoking two marijuana cigarettes degrades the performance of a 19-year-old student on a speeded addition test, it does not mean that two marijuana cigarettes would adversely affect the performance of a 35-year-old truck driver who has smoked marijuana daily for 10 years.

PRACTICAL IMPORTANCE VERSUS STATISTICAL SIGNIFICANCE

Consumers of research who are unfamiliar with the technical language of statistics often misconstrue the term significant. When a finding is labeled statistically significant, or simply significant, it seems reasonable to conclude that the finding is of practical or scientific importance—but this conclusion may be wrong.
Moreover, the fact that a relationship is highly significant does not mean that it is large in magnitude. All statistical significance means is that it is unlikely (to some specified degree) that a relationship as large as or larger than the one observed could have arisen by chance if no relationship existed between the variables investigated. When large samples are studied, relationships may be statistically significant without having important policy implications. The potential practical importance of a relationship is communicated by the magnitude of the actual difference, assuming the difference is statistically significant and so cannot be plausibly attributed to chance.

But even when a study is of potential practical importance, it still may not provide a sound basis for policy decisions. All of the aspects of the study design must be considered. An observed effect in a study may be statistically significant and sufficiently large to be of practical importance, but if the
design is so flawed that many alternative explanations of the observed difference are plausible, then the study is a weak basis for policy decisions. Finally, even a well-designed study that identifies a substantial, significant relationship does not necessarily mean that an intervention is justified. That depends further on the cost of the intervention, the prevalence of the problem to be prevented or solved, and the costs the problem imposes.

To appreciate the interaction of these various factors, consider a company that is thinking of initiating a drug use prevention program and wisely decides to study the program's likely effects before investing heavily in its implementation. It finds that, among workers exposed to the program, 39 percent are using drugs after 6 months; among workers not exposed to the program, 40 percent are using drugs after 6 months. If there are enough workers in the exposed and unexposed groups, the difference may be statistically significant, suggesting that the program actually affects workers' drug use, but the effect is so weak that instituting the program would not be beneficial unless its costs were truly minuscule, as, for example, a program that consisted of posting a few "Just Say No" signs on company premises. Alternatively, suppose the effects were large—say 20 percent of exposed workers were using drugs after 6 months, compared with 40 percent of unexposed workers—but only 10 workers were in each group. The difference, although large, would not be statistically significant; before instituting the program the company would want to test more workers to be sure that the results were not due to chance factors. If, after exposing more workers to the treatment and control conditions, a large effect still remained, other aspects of the design might still lead the company to question the advisability of relying on the research.
If, for example, the treatment had been delivered to workers with low-stress jobs while control group members had been selected from among workers with high-stress jobs, and no preintervention baselines had been established, the design would not rule out the possibility that it was a job-stress effect rather than a treatment effect that had been identified.

Finally, suppose a well-designed study yielded large, significant differences between the treatment and control groups. Now the company could have confidence that it had identified a successful treatment that could be expected to substantially reduce drug use in its work force. This would still not mean that the company would necessarily want to initiate the program. If the program were expensive, if it reduced only the use of marijuana, and if the company had no reason to believe that marijuana use was impairing the performance of its workers, then the company could reasonably find that the cost of the program was not justified.
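The company's two scenarios can be checked with a standard two-proportion z-test, built here from the normal distribution in the standard library. The group size of 50,000 in the first scenario is an assumption, since the text says only that there are "enough workers":

```python
from math import erf, sqrt

def two_proportion_p_value(p1, n1, p2, n2):
    """Two-sided z-test for the difference between independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = abs(p1 - p2) / se
    normal_cdf = 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF at |z|
    return 2 * (1 - normal_cdf)

# Scenario 1: a 1-point difference (39% vs. 40%) with 50,000 workers per
# group is statistically significant but of little practical importance.
print(two_proportion_p_value(0.39, 50_000, 0.40, 50_000) < 0.05)  # True

# Scenario 2: a 20-point difference (20% vs. 40%) with only 10 workers per
# group is practically important but not statistically significant.
print(two_proportion_p_value(0.20, 10, 0.40, 10) < 0.05)  # False
```

The same difference in proportions can thus be significant or not depending entirely on sample size, which is the distinction the text draws between statistical significance and practical importance.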
RESEARCH DESIGN ISSUES SPECIFIC TO DRUG USE IN THE WORKPLACE

There are a number of problems that, although not unique to studies of drug use in the workplace, are sufficiently common as to merit special mention. First, there are frequent misconceptions regarding the proper unit of analysis and the role of the temporal sequence of events in establishing causality. Moreover, problems of error in measurement are important because the confounding effects of poorly measured variables will not be adequately adjusted when multivariate models are used in epidemiologic and quasi-experimental analyses (Dwyer, 1983). Understanding the characteristics of such errors is crucial in efforts to use multivariate models.

Unit of Analysis

The unit of intervention, and thus of statistical inference, in workplace prevention studies is often whole companies or work sites within companies. Special problems arise in these circumstances. These include a lack of statistical power due to the small number of units studied and the need for multiple baseline observations to increase the power to detect deviations from secular trends. The appropriate statistical model when large social units are randomized to conditions is not always obvious, and in some recent reports from large community prevention trials the models used have been misspecified. The only direct estimate of sampling variability in such studies comes from between-community or between-company differences. Multiple observations over time can be used to reduce error in measures, but they cannot skirt the fundamental "degrees of freedom" problem that is inherent in a small number of experimental units (Dwyer, 1993).

Temporal Sequence and Causality

Researchers interested in how early exposure to some drugs affects the later use of other drugs often use longitudinal data to address issues of temporal sequence and causality.
They may find, for example, that children who use cigarettes or alcohol at one point in time are more likely than those who do not to go on to use marijuana. The observation of such a sequence does not, however, imply that cigarette and alcohol use increase the risk of marijuana use. It may be that the use of all these drugs is influenced by the same set of social and psychological factors and that their association arises from these underlying factors rather than from a causal sequence. Sorting out causal relationships is important. If, for
example, the relationship between early alcohol use and later marijuana use is spurious, removing access to alcohol could conceivably increase the use of marijuana rather than prevent it.

Measurement Errors of Drug Use Indicators

Just determining who is using what drugs in which contexts is difficult. Drug tests are the basic measure of drug use in many studies and in most drug screening and treatment programs. They are powerful methodological tools in that they give consistent readings and can accurately identify the presence of known drugs or their metabolites in tested specimens. It is tempting to interpret the high reproducibility of urine test results for certain drugs as evidence of the validity of those tests. However, the validity of a test is the extent to which it measures what it is intended to measure. What a urinalysis drug test measures is the presence of drugs and/or their metabolites in urine as an indication of recent drug use. As discussed in Chapter 6, when performed according to current professional standards, the sensitivity and specificity of those tests in detecting recent drug use are very high. However, measurement problems do arise when researchers use such test results as a measure of constructs the test was not intended to measure. For example, urine testing does not provide accurate estimates of the prevalence of drug use in specified populations (see Chapter 3), it does not provide a good measure of an individual's drug use involvement (i.e., use, abuse, dependence), and it is not a good measure of impairment (see Chapter 6). Urinalysis test results are not sensitive or specific measures of those latter constructs.
Furthermore, drug testing is not always possible, and even when it is, and even if the sample of those tested is not biased by a need for cooperation, measurement of metabolic residues is problematic when the half-life of residues is shorter or longer than the time period of interest. Moreover, given the low prevalence of positive tests in many populations, it remains plausible that rare and unidentified causes of false positives are operating.

Self-report measures of drug use and abuse are obviously suspect when respondents realize that admitting drug use may have adverse personal consequences, but even when there are no threatened adverse consequences, social norms may affect answers. Thus, several studies have found systematic underreporting of such behaviors as cigarette use among pregnant women, cocaine use among patients at a county hospital, drug use among arrestees, and drug use at work sites. In some instances, consistent bias in self-reports may allow valid comparisons between groups or over time. However, persons in drug use prevention programs may be more likely to underreport than controls, and temporal changes in social norms concerning drug use may change the extent of underreporting or overreporting over time. One
useful source of more information on this subject is a research monograph (Rouse et al., 1985) entitled Self-Report Methods of Estimating Drug Use: Meeting Current Challenges to Validity. Perhaps the most general conclusion that can be supported is that most people appear to be reasonably truthful (within the bounds of capability) under the proper conditions. The "proper conditions," of course, are the key words. When respondents believe they are guaranteed anonymity or confidentiality, when they accept the scientific or practical value of the survey, when they accept the legitimacy of the survey, and when they are not fearful of adverse consequences, then the evidence suggests that they tend to be generally truthful.

More relevant to the present report, there is every reason to be cautious about self-report surveys that are conducted at a work site. Common sense suggests that employees, when asked about their substance use at a workplace, may have concerns about the uses to which the data could be put. Hence, there may be considerable incentive for drug users to underreport their drug-using behaviors.

In addition to problems of intentional underreporting, there are other potential problems with the validity of survey data, including issues of population coverage and response rates. Particularly with respect to the estimation of trends, an important consideration is consistency over time. If response rates or coverage were to change, that could produce spurious changes in apparent prevalence rates. The best protection against all these threats to validity is to be aware of them and to deal with them as forthrightly as possible.
The major point to be made for present purposes is that, when the circumstances allow the respondent to consider the questions reasonable and justified in terms of purpose, and when the respondent can feel reasonably certain that the answers will not be used against him or her, then self-reports can be sufficiently valid for research and policy purposes. When those conditions are not met—which is often the case in work-site-related research—there may well be very substantial underreporting.

Psychological tests designed to assess individual drug use also have serious measurement limitations. As discussed in detail in Chapter 6, psychological tests designed to reveal current or predict future drug use have been shown to have moderate levels of validity. Combining this moderate validity with the relatively low prevalence of drug use in work-site populations means that if psychological tests are used to identify drug users, false positive rates can be expected to be high. Thus, employers are unlikely to find psychological tests to be practical instruments for eliminating drug users from the ranks of employees or from applicant pools.
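Why moderate validity combined with low prevalence produces a high false positive rate can be seen with Bayes' rule. The sensitivity, specificity, and prevalence values below are illustrative assumptions, not figures from this report:

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a person who screens positive actually uses drugs."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# A moderately valid screen (80% sensitivity, 80% specificity) applied to
# a work force in which 5% of workers use drugs: only about 17% of those
# flagged are actual users, so most positives are false positives.
print(round(positive_predictive_value(0.80, 0.80, 0.05), 2))  # 0.17
```

The low prevalence dominates: even a fairly accurate test generates many more false positives from the large nonusing majority than true positives from the small using minority.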
CONCLUSION AND RECOMMENDATION

• The most powerful methodology for evaluating the effectiveness of workplace alcohol and other drug-intervention programs is the randomized field experiment. The implementation of new work-site alcohol and other drug-intervention programs, or significant changes in existing programs, provides propitious occasions for experimental assessment.

Recommendation: To enhance scientific knowledge, organizations instituting new work-site alcohol and other drug-intervention programs should proceed experimentally if possible. Funding agencies should make field experiments a priority and should consider providing start-up aid to private companies that are willing to institute programs experimentally and subject them to independent evaluation.

REFERENCES

Dwyer, J.H.
1983 Statistical Models. New York: Oxford University Press.
1993 Estimating Statistical Power in the "Deviation From Secular Trend" Design. NIDA Research Monograph (in press).

Rouse, B.A., N.J. Kozel, and L.G. Richards, eds.
1985 Self-Report Methods of Estimating Drug Use: Meeting Current Challenges to Validity. NIDA Research Monograph No. 57. Rockville, Md.: National Institute on Drug Abuse.