The Polygraph and Lie Detection

3
The Scientific Basis for Polygraph Testing

Evidence relevant to the validity of polygraph testing can come from two main sources: basic scientific knowledge about the processes the polygraph measures and the factors influencing those processes, and applied research that assesses the criterion validity or accuracy of polygraph tests in particular settings. This chapter considers the first kind of evidence; the second is considered in Chapters 4 and 5.

We begin by discussing the importance of establishing a solid scientific basis, including empirically supported theory, for detection of deception by polygraph testing. We then present the main arguments that have been used to provide theoretical support for polygraph testing and evaluate them in relation to current understanding of human psychological and physiological responses. We also consider arguments based on current knowledge of psychology and physiology that raise questions about the validity of inferences of deception made from polygraph measures. We conclude with an assessment of the strength of the scientific base for polygraph testing.

THE SCIENTIFIC APPROACH

To an investigator interested in practical lie detection, basic science may seem irrelevant. The essential question is whether a technique works in practice: whether it provides information about guilty or deceptive individuals that cannot be obtained from other available techniques. As Chapter 2 makes clear, however, it can be very difficult in field situations to determine scientifically whether or how well the polygraph (or any other technique for the psychophysiological detection of deception) “works.” The appropriate criterion of validity can be slippery; truth is often hard to determine; and it is difficult to disentangle the roles of physiological responses, interrogators’ skill, and examinees’ beliefs in order to make clear attributions of practical results to the validity of the test. Given all these confounding factors in the case evidence, even the most compelling anecdotes from practitioners do not constitute significant scientific evidence.

Evidence of scientific validity is essential to give confidence that a test measures what it is supposed to measure. Such evidence comes in part from scientifically collected data on the diagnostic accuracy of a test with certain examiners and examinees. Evidence of accuracy is critical to test validation because it can demonstrate that the test works well under specific conditions in which it is likely to be applied. Evidence of accuracy is not sufficient, however, to give confidence that a test will work well across all examiners, examinees, and situations, including those in which it has not been applied. This limitation is important whenever a test is used in a situation or on a population of examinees for which accuracy data are not available and especially when scientific knowledge suggests that the test may not perform in the same way in the new situation or with the new population. This limitation of accuracy data is particularly serious for polygraph security screening because the main target populations, such as spies and terrorists, have not been and cannot easily be subjected to systematic testing.

Confidence in polygraph testing, especially for security screening, therefore also requires evidence of its construct validity, which depends, as we have noted, on an explicit and empirically supported theory of the mechanisms that connect test results to the phenomenon they purport to be diagnosing. A test with good construct validity is one that uses methods that are defensible in light of the best theoretical and empirical understanding of those mechanisms, the external factors that may alter the mechanisms and affect test results, and the measurement issues affecting the ability to detect the signal of the phenomenon being measured and exclude extraneous influences. Only to the extent that a diagnostic test meets these construct validity criteria can one have confidence that it will work well in new situations and with different kinds of examinees.

A well-supported theory of the test is also essential to provide confidence that the test will work well in the face of efforts examinees may make to produce a false negative result. Spies and terrorists may be strongly motivated to learn countermeasures to polygraph tests and may develop potential countermeasures that have not been studied. To have confidence that such measures will fail or will be detected requires basic
understanding of the physiological measures used in polygraph testing and of the ways they respond to various intentional activities of examinees. Issues of construct validity such as these are likely to arise in courts operating under Daubert and the Federal Rules of Evidence or under analogous state rules, which require that the admissibility of evidence be judged on the basis of the validity of the underlying scientific methods (see Saxe and Ben-Shakhar, 1999).

For polygraph lie detection, scientific validity rests on the strength of evidence supporting all the inferential links between deception and the test results. Inferences from polygraph tests presume that deception on relevant questions uniquely causes certain psychological states different from those caused by comparison questions, that those states are tied to certain physiological concomitants, that those physiological responses are the ones measured by the polygraph instrument, that polygraph scoring systems reflect the deception-relevant aspects of the physiological responses, and that the interpretation of the polygraph scores is appropriate for making the discrimination between deception and truthfulness. Inferences also presume that factors unrelated to deception do not interfere with this chain of inference so as to create false test results that misdiagnose the deceptive as truthful or vice versa.

A knowledge base to support the scientific validity of polygraph testing is one that adequately addresses those inferences. It would include evidence that answers such questions as the following:

• Are the procedures used to measure the physiological changes said to be associated with deception standardized and scientifically valid?
• Does the act of deception reliably cause identifiable changes in the physiological processes the polygraph measures (e.g., electrodermal, cardiovascular)?
• Is deception the only psychological state that would cause these physiological changes in the context of the polygraph test?
• Does the type of lie (rehearsed, spontaneous) affect the nature of the physiological changes?
• If the correlation between deception and the physiological response is not perfect, what are the mechanisms by which a truthful response can produce a false positive? Considering such mechanisms, how can the test procedure minimize the chances of false positive results?
• If the correlation between deception and the physiological response is not perfect, what are the mechanisms by which a deceptive response could produce a false negative result (i.e., mechanisms that would allow for effective countermeasures)? Considering such mechanisms, how can the test procedure minimize the chances of false negative results?
• Are the mechanisms relating deception to physiological responses universal for all people who might be examined, or do they operate differently in different kinds of people or in different situations? Is it possible that measured physiological responses do not always have the same meaning or that a test that works for some kinds of examinees or situations will fail with others?
• How might the test results be affected by the examinee’s personality or frame of mind? For example, can recent stress change the likelihood that an examinee will be judged deceptive?
• How might expectancies and personal interactions between an examiner and an examinee affect the reliability and validity of the physiological measurements? For example, might a test result have been different if a different examiner had given the test?
• How might the wording or presentation of the relevant or comparison questions affect an examinee’s differential physiological responses? For example, if a test procedure gives the examiner latitude in formulating relevant or comparison questions, might the test results be affected by the particular questions that are used?
• Which theory of psychophysiological detection of deception has the strongest scientific support? Which testing procedures are most consistent with this theory?

These questions are central to developing an approach to the psychophysiological detection of deception that is scientifically justified and that deserves the confidence of decision makers. Although many of the questions are in the realms of basic science in psychology, physiology, and measurement, answering them also has major practical importance.
For example, a well-supported theory of the physiological detection of deception can clarify how much latitude, if any, examiners can be given in question construction without undermining the validity of the test. It may also specify countermeasures by which an examinee can act intentionally to create false readings that lead to misinterpretations of polygraph results and thus can help examiners anticipate their use and develop counterstrategies.

Research focused only on establishing accuracy does not provide an adequate basis for confidence in a test because it inevitably leaves many critical questions unanswered. Consider, for example, some inherent limitations of a standard research approach in which some individuals are asked to lie about a mock crime they have committed and the polygraph is used to distinguish those examinees from others who have only witnessed the mock crime or who have no knowledge of it. If the polygraph performs well in this experiment, one can only
conclude that it “works” for people like the examinees in situations like the mock crime. There would be many unanswered questions, including:

• Would the physiological responses be the same if the crime had been real?
• Would the test procedure perform as well if the deceptive examinees had been coached in ways to make it difficult for examiners to discriminate between their responses to relevant and comparison questions?
• Would the test procedure have performed as well if the examinees had been from different cultural backgrounds?
• Would the test procedure work as well for the people most likely to commit the target infractions as for other people (for example, are there systematic differences between these groups of people that could affect test results)?
• Would a polygraph test procedure that performs well in specific-event investigations perform as well in a screening setting, when the relevant questions must be asked in a generic form?
• Would different examiners who constructed the relevant and comparison questions in slightly different ways have produced equally good results?

Such questions can sometimes be answered by additional research, for instance, using different kinds of examinees or training some of them in countermeasures. But it is never possible to test all the possible kinds of examinees or countermeasures. A solid theoretical and scientific base is also valuable for improving a test because it can identify the most serious threats to the test’s validity and the kinds of experiments that need to be conducted to assess such threats; it can also tell researchers when further experiments are unlikely to turn up any new knowledge. In such ways, a solid scientific base is important for developing confidence in any technique for the psychophysiological detection of deception and critical for any technique that may be used for security screening.

THEORIES OF POLYGRAPH TESTING

Polygraph specialists have engaged in extensive debate about theories of polygraph questioning and responding in the context of a controversy about the validity of comparison question versus concealed information test formats. We are more impressed with the similarities among polygraph testing techniques than with the differences, although some of the differences are important, as we note at appropriate places in this and the following chapters. The most important similarities concern the physiological responses measured by the polygraph instrument, which are essentially the same across test formats. Factors that affect these physiological responses, including many factors unrelated to deception or attempts to conceal knowledge, have similar implications for the validity of all tests that measure those responses.

Polygraph Questioning

Polygraph practice is built on comparing physiological responses to questions that are considered relevant to the investigation at hand, which evoke a lie from someone who is being deceptive, with responses to comparison questions to which the person responds in a presumably known way (e.g., tells the truth or a probable or directed lie). The responses are compared only for one individual because it is recognized that there are individual differences in basal physiological functioning, physiological reactivity, and physiological response hierarchies (for more information, see Davidson and Irwin, 1999; Cacioppo et al., 2000; Kosslyn et al., 2002). Because of individual differences, the absolute magnitude of an individual’s physiological response to a relevant question cannot be a valid indicator of the truthfulness of a response.

According to contemporary theories of polygraph questioning, individuals who are being deceptive or truthful in responding to relevant questions show different patterns of physiological response when their reactions to relevant and comparison questions are compared. In the relevant-irrelevant test format, the theory is that a guilty person, who is deceptive only to the relevant questions, will react more to those questions; in contrast, an innocent person, who is truthful about all questions, will not respond differentially to the relevant questions. In the comparison question format, a guilty person lies both to the relevant and the comparison questions (which are constructed to generate probable or directed lies), while the innocent person lies to the comparison but not the relevant question.
The theory is that the innocent person will show equal or less physiological responsiveness to relevant than comparison questions and that the guilty person will show greater responsiveness to relevant than comparison. In the concealed information format, the theory is that examinees will respond most strongly to questions related to their actual knowledge and experience, so that concealed information will be revealed by a stronger response to questions that touch on that information than to the comparison questions. Examinees without special information to conceal will not respond differentially across questions.

The specific nature of the relevant and comparison questions depends on the purpose and type of test. In specific-incident tests using the relevant-irrelevant format, the relevant question(s) focus on specifics of the target event about which a guilty individual would have to lie to conceal
guilt. The typical comparison questions are very unlikely to yield deceptive responses (e.g., “Is today Friday?”).

Specific-incident polygraph tests using comparison question test formats look like those in the relevant-irrelevant format. The comparison questions are specially formulated during a pretest interview with the intent to make an innocent examinee very concerned about them and either lie with high likelihood (a probable lie comparison question) or lie under instruction (a directed lie comparison question, such as, “During the first 18 years of your life did you ever steal something from someone who trusted you?”). Such comparison questions are often very similar to those used in lie scales or validity scales on personality questionnaires, except that the polygraph examiner is usually given latitude in choosing questions, so that different examinees may be asked different comparison questions at the same point in the test. The comparison questions tend to be more generic than the relevant questions in that they do not refer to a specific event known to the examiner.

Concealed knowledge specific-incident tests ask about specific details of the target event that the examinee would be unlikely to know unless present at the scene (e.g., “Was the victim wearing a red dress? A yellow dress? A blue dress?”). The relevant questions are those that note accurate details; the comparison questions present false details of the same aspect of the event. If the stimuli that produce the strongest responses consistently correspond to actual details of the incident, the respondent is judged to have concealed information about the incident.

In employee and preemployment screening tests, the relevant questions focus on generic acts, plans, associations, or behaviors (e.g., “Have you engaged in an act of sabotage?”) because the examiner does not know of a specific event.
Comparison questions are typically also generic, but unrelated to the target event, and may in fact be the same questions used in specific-incident testing using the comparison question format. The concealed information format cannot be used if the examiner lacks specific knowledge that can be used in formulating relevant questions.

Psychophysiological Responses

Polygraph testing is based on the presumptions that deception and truthfulness reliably elicit different psychological states across examinees and that physiological reactions differ reliably across examinees as a function of those psychological states. Comparison questions are designed to produce known truthful or deceptive responses and therefore to produce physiological responses that can be compared with responses to relevant questions to detect deception or truthfulness. To have a well-supported theory of psychophysiological detection of deception, it is therefore necessary to identify the relevant psychological states and to understand how those states are linked to characteristics of the test questions intended to create the states and to the physiological responses the states are said to produce.

Marston (1917), Larson (1922), and Landis and Gullette (1925) all found elevated autonomic (blood pressure) responses when individuals engaged in deception. Marston (1917) described the underlying psychological state as fear; other writers have conceived it as arousal or excitement. The idea that fear or arousal is closely associated with deception provides the broad underlying rationale for the relevant-irrelevant test format. Subsequent research has confirmed that the polygraph instrument measures physiological reactions that may be associated with an examinee’s stress, fear, guilt, anger, excitement, or anxiety about detection or with an examinee’s orienting response to information (see below) that is especially relevant to some forbidden act.

The comparison question test and related formats are presumed to establish a context such that an examinee who is innocent of the acts identified in the relevant questions will be at least as concerned and reactive, if not more so, in relation to lying on the comparison questions as about giving truthful answers to the relevant questions. In contrast, the examinee guilty of some forbidden acts is assumed to be more fearful, anxious, or stressed about being detected for lying (and, therefore, more reactive) to the relevant questions than the comparison questions.

Several theoretical accounts have been offered to lend support to these assumptions. Although there is evidence bearing on some of the propositions underlying some of these theories, none of them has been subjected to detailed investigation in the polygraph context.

Conflict Theory

According to the theory of conflict (Davis, 1961), two incompatible reaction tendencies aroused at the same time produce a large physiological reaction that is greater than the reaction to either alone. A life of answering questions straightforwardly would create one reaction tendency, and the circumstances that would motivate an examinee to deny the truth would create an incompatible reaction tendency. The assumption underlying variants of the comparison question technique is that a stronger reaction tendency (and, hence, greater reaction tendency incompatibility) will be aroused in response to relevant than control questions in guilty individuals than in others. Ben-Shakhar (1977) noted that the conflict hypothesis has trouble accounting for responses that are seen even when participants do not respond verbally to questions (e.g.,
Gustafson and Orne, 1965; Kugelmass, Lieblich, and Bergman, 1967). Moreover, a conflict between an examinee and examiner, for instance, about persistent questioning of a response to a relevant question or an expectation of being falsely accused, could in theory also create especially large and repeatable responses to relevant questions even in wrongly accused examinees.

Conditioned Response Theory

The conditioned response theory (Davis, 1961) holds that the relevant questions play the role of conditioned stimuli and evoke in deceptive individuals an emotional (and concomitant physiological) response with which lying has been associated during acculturation. A variation of this theory holds that the stimuli associated with a major transgression serve as conditioned stimuli while the act itself (e.g., a homicide), an unconditioned stimulus, elicits a dramatic autonomic response (an unconditioned response) at the time of the transgression and produces single-trial emotional conditioning. Accordingly, the recollection of the act, elicited by the relevant question, acts as a conditioned stimulus for guilty individuals and elicits a minor autonomic response (conditioned emotional response). Innocent individuals, according to this theory, never undergo this conditioning and therefore do not show a conditioned emotional response to stimuli about the target act.

There is substantial evidence that autonomic responses can be classically conditioned (Diven, 1937; Tursky et al., 1976; LeDoux, 1995). If this theory is correct, there are significant possibilities for the polygraph to misinterpret an examinee’s truthfulness because in conditioned response theory, lying is not the only possible elicitor of an autonomic response, and innocent individuals may show a conditioned emotional response triggered by some other feature of the relevant question or the manner in which it is asked.
For example, questions related to traumatic experiences may produce large conditioned physiological responses even if the examinee responds truthfully (consider the psychological state of a victim or an innocent witness asked to recall specifics of a violent crime), while a lie about a trivial matter may elicit a much smaller response. According to this theory, relevant questions might also produce large responses in innocent examinees who have in the past experienced unfounded accusations that were associated with upsetting or punitive consequences that elevated autonomic activity. In such an examinee, a relevant question might serve as a conditioned stimulus for anger or fear similar to that associated with false accusations in the past.

Psychological Set and Related Theories

Psychological set theory (e.g., Barland, 1981) holds that when a person being examined fears punishment or anticipates serious consequences should he or she fail to deceive, such fear or anticipation produces a measurable physiological reaction (e.g., elevation of pulse, respiration, or blood pressure, or electrodermal activity) if the person answers deceptively. A variation on this theory, the threat-of-punishment theory (Davis, 1961), posits that lying is an avoidance reaction with considerably less than 100 percent chance of success, but the only one with any chance of success at all. If a person anticipates there is a good likelihood and serious consequences of being caught in the lie, then the threat of punishment when the person tries to deceive will be associated with a large physiological response. Because the consequences of lying to the comparison questions are thought to be less than lying to the relevant questions, the theory is that lying to relevant questions will be associated with larger physiological responses than lying to control questions.

These theories suggest that the detection of deception will be more robust in real-life situations involving strong emotions and punishment than in innocuous interrogations or laboratory simulations. In another variation of this theory, Gustafson and Orne (1963) suggest that an individual’s motivation to succeed in the detection task will be greater in real-life settings (because the consequences of failing to deceive are grave), and this elevated motivational state will also produce elevated autonomic activation.

This theoretical argument also leaves open significant possibilities for misinterpretation of the polygraph results of certain examinees.
It is plausible, for instance, that a belief that one might be wrongly accused of deceptive answers to relevant questions, or the experience of actually being wrongly accused of a deceptive answer to a relevant question, might produce large and repeatable physiological responses to relevant questions in nondeceptive examinees that mimic the responses of deceptive ones.

The related arousal theory holds that detection occurs because of the differential arousal value of the various stimuli, regardless of whether or not there is associated fear, guilt, or emotion (Ben-Shakhar, Lieblich, and Kugelmass, 1970; Prokasy and Raskin, 1973). The card test illustrates this theory. The card test is an information test in which an examinee selects one item from a set of matched items (e.g., a card from a deck). This item produces a different response from the others, whether the examinee denies special knowledge about any of the items (i.e., lies about the selected item) or claims special knowledge about all of the items (i.e., lies about all but the selected item) (Kugelmass, Lieblich, and Bergman, 1967).

A related theory, Ben-Shakhar’s (1977) dichotomization theory, is built on the concepts of orienting, habituation, and signal value (Sokolov, 1963). According to dichotomization theory, stimuli are represented in terms of one of two categories, relevant and neutral, which habituate independently. A response to a given stimulus is an inverse function of the number of previous presentations of stimuli in its category and is unrelated to the number of previous presentations of stimuli in the other category (Ben-Shakhar, 1977). Dichotomization theory is seen as additive with rather than in competition with other theories. Thus, dichotomization theory emphasizes a “relevance” factor, based on the signal value of the stimulus (Sokolov, 1963), in which stimuli that are personally relevant for historical reasons yield stronger responses than neutral material made relevant in the experimental context.

Orienting Theory

The above theoretical accounts, all of which have been used as justification for the comparison question test format, predict that deceptive individuals will show stronger physiological reactions on relevant than on comparison questions; however, they also predict that truthful examinees, under certain conditions, will show physiological response patterns similar to those expected from deceptive examinees. They thus suggest that comparison question polygraph testing has a significant potential to lead to inferences of deception when none has occurred: that is, they suggest that the polygraph test may not be specific to deception because other psychological states that can result from stimuli arising during the test mimic the physiological signs of deception. The possibility that truthful examinees will occasionally exhibit stronger physiological responses to relevant than control questions based on chance alone also increases the possibility of false alarms.
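The chance-alone point can be made concrete with a small calculation. The sketch below is an idealization and not a model of any actual polygraph scoring system: it assumes that, for a truthful examinee, each relevant/comparison question pair independently shows the larger response to the relevant question with probability 0.5. The function name `chance_false_alarm` and the decision rule (at least a given number of pairs favoring the relevant question) are hypothetical choices made for illustration.

```python
from math import comb


def chance_false_alarm(pairs: int, min_relevant_larger: int) -> float:
    """Probability that at least `min_relevant_larger` of `pairs`
    relevant/comparison pairs favor the relevant question by chance.

    Assumption (illustrative only): for a truthful examinee each pair
    independently favors the relevant question with probability 0.5,
    i.e., no systematic difference and no ties.
    """
    favorable = sum(comb(pairs, k) for k in range(min_relevant_larger, pairs + 1))
    return favorable / 2 ** pairs


if __name__ == "__main__":
    # Hypothetical test lengths and decision rules.
    for pairs, cutoff in [(3, 3), (10, 8)]:
        print(pairs, cutoff, chance_false_alarm(pairs, cutoff))
```

Under these assumptions the chance rate is a floor; any of the mechanisms described above that make truthful examinees systematically more reactive to relevant questions would raise it.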
To address this issue, Lykken (1959, 1998) devised the guilty knowledge test (called here the concealed information test), based in part on orienting theory. The notion of an orienting or “what-is-it” response emerged from Pavlov’s studies of classical conditioning in dogs. Pavlov (1927:12) observed that a dog’s conditioned response to a stimulus would fail to appear if some unexpected event occurred:

    It is this reflex [the orienting response] which brings about the immediate response in men and animals to the slightest changes in the world around them, so that they immediately orientate their appropriate receptor organ in accordance with the perceptible quality in the agent bringing about the change, making a full investigation of it. The biological significance of this reflex is obvious.

[...] for example, the field includes little or no research on the emotional correlates of deception; the psychological determinants of the physiological measures used in the polygraph; the robustness of these measures to demographic differences, individual differences, intra-individual variability, question selection, attempted countermeasures, or social interaction variables in the interview context; or the best ways of measuring and scoring each physiological response for tapping the underlying emotional states to be measured. Because empirical evidence of accuracy does not exist for polygraph testing on important target populations, particularly for security screening, the absence of answers to such theoretical questions leaves important questions open about the likely accuracy of polygraph testing with target populations of interest.

Relationships to Other Scientific Fields

Polygraph research has not been adequately connected to at least two major scientific literatures, other than basic psychophysiology, that are also of direct relevance to improving the psychophysiological detection of deception. One of these is the research on diagnostic testing. As noted in Chapter 2, polygraph researchers and practitioners do not generally conceive of the polygraph as a diagnostic test, nor does most of the field recognize the concept of decision thresholds that is central to the science of diagnostic testing. Researchers and practitioners rarely recognize that the tradeoff between false positives and false negatives can be made as a matter of policy by setting decision thresholds. As a result, practitioners seem to make this tradeoff implicitly, sometimes in the choice of which polygraph testing procedure to use and sometimes, perhaps, in judging the likelihood that a particular examinee will be deceptive.
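The role of a decision threshold can be illustrated with a minimal sketch. The scores below are hypothetical numbers invented for illustration, not data from any study, and the rule "call the examinee deceptive when the score is at or above the threshold" is an assumed convention; the function name `error_rates` is likewise an illustrative choice.

```python
def error_rates(truthful_scores, deceptive_scores, threshold):
    """Return (false_positive_rate, false_negative_rate) when an examinee
    is called deceptive if the score is at or above `threshold`."""
    false_pos = sum(s >= threshold for s in truthful_scores) / len(truthful_scores)
    false_neg = sum(s < threshold for s in deceptive_scores) / len(deceptive_scores)
    return false_pos, false_neg


# Hypothetical test scores for illustration only; not data from any study.
TRUTHFUL = [-4, -2, -1, 0, 1, 2, 3, 5]
DECEPTIVE = [-1, 1, 2, 4, 5, 6, 7, 9]

if __name__ == "__main__":
    for threshold in (0, 2, 4, 6):
        fp, fn = error_rates(TRUTHFUL, DECEPTIVE, threshold)
        print(f"threshold={threshold}: false positives={fp:.3f}, false negatives={fn:.3f}")
```

With the same underlying scores, moving the threshold upward trades false positives for false negatives, which is why the choice of threshold is a policy decision rather than a property of the test.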
Polygraph research also does not consider systematically the possible use of the polygraph as part of a sequence of diagnostic tests, in the manner of medical testing, with tests given in a standard order according to their specificity, their invasiveness, or related characteristics. (This approach to interpreting information from polygraph tests is discussed further in Chapter 7.)

The other field that polygraph research has not for the most part benefited from is the science of psychological measurement. Psychological testing and measurement draws on nearly a century of well-developed research and theory (Nunnally and Bernstein, 1994), which has led to the development of reliable and valid measures of a wide range of abilities, personality characteristics, and other human attributes. There is substantial research dealing with the evaluation of objective tests, personality inventories, interviews, and other assessment methods, and clear standards for assessing and interpreting the reliability, validity, and utility of tests and assessments have been articulated and adopted by test developers and users (see Society for Industrial and Organizational Psychology, 1987; American Psychological Association, 1999). The goal of virtually all evaluations of psychological tests and assessments is to provide evidence about their construct validity. A wide range of methods (e.g., factor analyses, correlations, laboratory experiments) and types of evidence are used in investigating construct validity.

Polygraph research and practice typically have not drawn on established psychometric theory or on current methods for developing and evaluating tests and measures. Some polygraph studies report inter-rater agreement in assessing charts and others report other types of reliability information, but there has been little serious effort to investigate the construct validity of the polygraph. Indeed, as already noted, it is rarely clear exactly what polygraph tests are designed to measure, or how the various pieces of data obtained from polygraph tests are thought to be linked to states or attributes of the examinee, making it difficult to even initiate the process of construct validation (Fiedler et al., in press). Despite several decades of polygraph research and practice, it is still difficult to determine the relationship, if any, between attributes of the examinee (e.g., deceptiveness, use of countermeasures) and the outcomes of a polygraph examination.

There has been substantial progress in the development of psychometric methods and theory in the last 30 years. Cronbach et al. (1972) developed generalizability theory, which provides a framework for assessing measurement methods that involve multiple components or facets (polygraph outcomes might be affected by the types of questions used, by the examiner, by the context in which the examination is carried out, and so forth). Item response theory (for an overview, see Hambleton, Swaminathan, and Rogers, 1991), the method of choice for modern psychometric theory and research, provides detailed information about the relationship between the attribute or construct a test is designed to measure and responses to items and tests. McDonald (1999) has proposed a unified test theory that links traditional psychometric approaches, item response theory, and factor analytic methods. Unfortunately, none of these developments has had a substantial effect on the administration, scoring, interpretation, or evaluation of the polygraph. Modern psychometric methods are rarely if ever cited or recognized in papers and reports dealing with the polygraph, and while some studies do attempt to estimate some aspects of the reliability of polygraph examinations, none focuses on the cornerstone of modern psychometric theory and practice: the assessment of construct validity.

Consequences for Practice

Partly as a consequence of the isolation of polygraph research from related fields, polygraph practice has been very slow to adopt new technologies and methods. For example, some polygraph equipment still displays electrodermal activity as skin resistance rather than conductance, despite the fact that it has been known for decades that the latter gives a more useful measure of electrodermal response (see Fowles, 1986; Dawson, Schell, and Filion, 1990).18 There has been no systematic effort to address the basic question of how best to detect deception in criminal investigation or national security contexts. Such an effort would have led to earlier and more serious investigation of emerging physiological and neurological measurement techniques that might be expected on theoretical grounds to have potential for lie detection, particularly measurements of brain activity. Instead, there appears to be inertia among practitioners about using the familiar equipment and techniques that rely on 1920-era science and a lack of impetus from national security or criminal justice agencies, until quite recently, to develop methods and measures that might have a stronger base in modern psychophysiology and neuroscience.

The field has also failed so far to make the best of knowledge about new and promising methods of data analysis that might do a better job of linking theory to measurement, for example, research on computer-based models for scoring polygraph charts. Early efforts, such as those reported by Kircher and Raskin (1988), focused on statistical discriminant analysis and used general notions (such as latency, rise, and duration) and other measures for each channel, drawing on general constructs that underlie psychophysiological detection of deception in the psychophysiology literature.
But there appears to be limited justification for most specific choices of key parameters used in the formal models, and the operational measures one finds in this work often closely resemble what polygraph examiners claim to do in practice. This work was followed in the 1980s and 1990s by government-funded studies aimed at developing computer-based polygraph scoring systems that take advantage of advances in statistical and machine-learning algorithms capable of making the most of polygraph data (e.g., see Raskin et al., 1988; Raskin, Horowitz, and Kircher, 1989; Olsen et al., 1997). Those studies have not led to significant changes in practice. To the extent that the polygraph instrument measures physiological responses relevant to deception, this approach holds promise, but much of that promise has yet to be realized (see Appendix F). Unfortunately, the most recent and complex studies of this type, conducted at the Applied Physics Laboratory at Johns Hopkins University, appear to have taken a largely atheoretical approach, aiming to build a logistic regression detection algorithm by purely empirical means from a subset of 10,000 features extracted from physiological signals. Those efforts have apparently not built on advances in psychophysiology that might have helped in selecting features with theoretical or empirical rationales for their relevance.

Social Context

The above discussion might easily be read as a broad indictment of polygraph researchers; we do not intend that interpretation. Polygraph research has attracted and continues to attract well-trained and qualified scientists. We believe that the lack of progress in polygraph research is attributable not so much to the researchers as to the social context and structure of the work. Polygraph research has been guided, for the most part, by the perceived needs of law enforcement and national security agencies and the demands of the courts, rather than by basic scientific approaches to research.

In this respect, polygraph research is like many other fields of forensic science. The 1923 decision in Frye v. United States (293 F. 1013) did not support work on validity issues in forensic science because under Frye, courts accepted the judgment of communities of presumed experts. After Frye, the courts did not demand validation research or efforts to find the most scientifically defensible methods for the psychophysiological detection of deception. Not until the 1993 Daubert decision were courts asked to judge the admissibility of expert testimony on the basis of the scientific validity of the expert opinion. That decision brought validity issues to the fore and is likely to increase the demand for solid scientific validation. So far, however, the overall enterprise of forensic science and the subfield of polygraph research have not changed much.

Meanwhile, promising young scientists from a number of relevant fields have not flocked to forensic science to make their careers.
The questions being pursued have seemed far from the cutting edge of the fields in which those scientists were trained and unrelated to the major theoretical issues in those fields. Consequently, advisers in those fields have not steered their best students into forensic science, and a career in the area does not confer academic prestige. Psychophysiology and its relation to polygraph research is a case in point. Polygraph research, which has focused mainly on making incremental improvements in the way 1920s technology is used, would seem particularly unattractive to any young scientist wanting to advance understanding of modern psychology or physiology. As a result, there have been few new ideas for the research on the psychophysiological detection of deception.

Polygraph and related research has been supported primarily by law enforcement and national security agencies whose concerns have been with practical detection of deception, not with advancing science. These concerns are perfectly valid, but they have impeded scientific progress. The fact that polygraph testing combines a diagnostic test and an interrogation practice in an almost inextricable way would be a major concern for any scientist seeking to validate the diagnostic test. The cultures of those parts of the agencies that deal with law enforcement and counterintelligence do not include traditions of scientific peer review, open exchange of information, and open critical debate that are common in scientific work. (The U.S. Department of Defense Polygraph Institute has, in the past few years, shown signs of becoming an exception to this generalization.) The culture of practice in security agencies, combined with the strong belief of practitioners in the utility of the polygraph, has made it easy for those agencies to continue their old practices. Thus, research has until quite recently focused almost exclusively on the polygraph and has been conducted within agencies that are committed to using the polygraph, believe strongly in its utility, and have seen little need to seek alternative techniques.

Our conversations with practitioners at several national security agencies indicate that there is now an openness to finding techniques for the psychophysiological detection of deception that might supplement or replace the polygraph. However, both these conversations and the recent research that these agencies have sponsored on alternatives to the polygraph show a continuing atheoretical approach that does not build on or connect with the relevant scientific research in other fields.

Assessment

Criticisms of the scientific basis of polygraph testing have been raised since the earliest days of the polygraph.
An indication of the state of the field is the fact that the validity questions that scientists raise today include many of the same ones that were first articulated in criticisms of Marston’s original work in 1917:19

My greatest reason for persistent skepticism as to the real use of the test, however, arises from the history of the subject. . . . The net result has been, I think to show that organic changes are an index of activity, of “something doing,” but not of any particular kind of activity . . . but the same results would be caused by so many different circumstances, anything demanding equal activity (intelligence or emotional) that it would be impossible to divide any individual case.

Another assessment remains as true today as when it was written a half century ago (Guertin and Wilhelm, 1954:153): “There has been relatively little theoretical evaluation of the processes underlying the responses to lie detector procedure since lie detection instruments and techniques have been developed empirically in the field.” That assessment was in the introduction to a study that used factor analysis to examine the relationships of ten indices of electrodermal response and reduced them to two factors believed to have different psychological significance: one related to deception and the other to “test fright” and adaptation. Their research goal, as appropriate now as then, was to reveal basic links between psychological and physiological processes and thereby build scientific support for the choice of particular indicators of deception. This style of research, aimed at building a theory of the psychophysiological detection of deception by careful evaluation of empirical associations, has been little pursued. The same can be said of other strategies of theory building that draw on direct measurement of physiological phenomena, the techniques for which have been revolutionized over the past several decades.

Essentially the same criticism was voiced two decades ago by the U.S. Office of Technology Assessment (1983:6):

The basic theory of polygraph testing is only partially developed and researched. . . . A stronger theoretical base is needed for the entire range of polygraph applications. Basic polygraph research should consider the latest research from the fields of psychology, physiology, psychiatry, neuroscience, and medicine; comparison among question techniques; and measures of physiological research.

More intensive efforts to develop the basic science in the 1920s would have produced a more favorable assessment in the 1950s; more intensive efforts in the 1950s would have produced a more favorable assessment in the 1980s; more intensive efforts in the 1980s would have produced a more favorable assessment now.
A research strategy with better grounding in basic science might have led to answers to some of the key validity questions raised by earlier generations of scientists. Polygraph techniques might have been modified to incorporate new knowledge, or the polygraph might have been abandoned in favor of more valid techniques for detecting deception. As we have suggested, the failure to make progress seems to be structural, rather than a failure of individuals. We return to this issue in Chapter 8, where we offer some recommendations for redesigning the research enterprise that might address the structural impediments to progress.

CONCLUSIONS

One cannot have strong confidence in polygraph testing or any other technique for the physiological detection of deception without an adequate theoretical and scientific base. A solid theoretical and scientific base can give confidence about the robustness of a test across examinees and settings and against the threat of countermeasures and can lead to its improvement over time. The evidence and analysis presented in this chapter lead to several conclusions:

The scientific base for polygraph testing is far from what one would like for a test that carries considerable weight in national security decision making. Basic scientific knowledge of psychophysiology offers support for expecting polygraph testing to have some diagnostic value, at least among naive examinees. However, the science indicates that there is only limited correspondence between the physiological responses measured by the polygraph and the attendant psychological brain states believed to be associated with deception; in particular, responses typically taken as indicating deception can have other causes.

The accuracy of polygraph tests can be expected to vary across situations because physiological responses vary systematically across examinees and social contexts in ways that are not yet well understood and that can be very difficult to control. Basic research in social psychophysiology suggests, for example, that the accuracy of polygraph tests may be affected when examiners or examinees are members of socially stigmatized groups and may be diminished when an examiner has incorrect expectations about an examinee’s likely innocence or guilt. In addition, accuracy can be expected to differ between event-specific and screening applications of the same test format because the relevant questions must be asked in generic form in the screening applications.
Accuracy can also be expected to vary because different examiners have different ways to create the desired emotional climate for a polygraph examination, including using different questions, with the result that examinees’ physiological responses may vary with the way the same test is administered. This variation may be random, or it may be a systematic function of the examiner’s expectancies or aspects of the examiner-examinee interaction. In either case, it places limits on the accuracy that can be consistently expected from polygraph testing.

Basic psychophysiology gives reason for concern that effective countermeasures to the polygraph may be possible. All of the physiological indicators measured by the polygraph can be altered by conscious efforts through cognitive or physical means, and all the physiological responses believed to be associated with deception can also have other causes. As a consequence, it is possible that examinees could take conscious actions that create false polygraph readings.

Available knowledge about the physiological responses measured by the polygraph suggests that there are serious upper limits in principle to the diagnostic accuracy of polygraph testing, even with advances in measurement and scoring techniques. Polygraph accuracy may be reaching a point of diminishing returns. There is only limited room to improve the detection of deception from the physiological responses the polygraph measures.

Although the basic science indicates that polygraph testing has inherent limits regarding its potential accuracy, it is possible for a test with such limits to attain sufficient accuracy to be useful in practical situations, and it is possible to improve accuracy within the test’s inherent limits. These possibilities must be examined empirically with regard to particular applications. We examine the evidence on polygraph test performance in Chapters 4 and 5.

The bulk of polygraph research can accurately be characterized as atheoretical. The field includes little or no research on a variety of variables and mechanisms that link deception or other phenomena to the physiological responses measured in polygraph tests.

Research on the polygraph has not progressed over time in the manner of a typical scientific field. Polygraph research has failed to build and refine its theoretical base, has proceeded in relative isolation from related fields of basic science, and has not made use of many conceptual, theoretical, and technological advances in basic science that are relevant to the physiological detection of deception. As a consequence, the field has not accumulated knowledge over time or strengthened its scientific underpinnings in any significant manner.

There has been no serious effort in the U.S. government to develop the scientific base for the psychophysiological detection of deception by the polygraph or any other technique, even though criticisms of the polygraph’s scientific foundation have been raised prominently for decades. The reason for this failure is primarily structural.
Because polygraph and other related research is managed and supported by national security and law enforcement agencies that do not operate in a culture of science to meet their needs for detecting deception and that also believe in and are committed to the polygraph, this research is not structured within these agencies to give basic science its appropriate place in the development of techniques for the physiological detection of deception.

NOTES

1.   Proponents of concealed information tests argue that they rest on a different series of inferential links because the tests do not detect deception and that their admissibility in courts should therefore be judged against different criteria than comparison question tests under the Daubert rule (Ben-Shakhar, Bar-Hillel, and Kremnitzer, 2002). We discuss the different theoretical underpinnings of polygraph testing later in the chapter.
2.   The questions in this section are phrased with the presumption that the polygraph is being used to detect deception. With slightly different phrasing, they can be used to assess the validity of a polygraph test procedure that is being used to detect the examinee’s possession of concealed information.
3.   The relevant-irrelevant test format has not been the subject of sophisticated theory development or of much testing to establish construct validity. Most polygraph researchers now consider the technique fundamentally flawed on a theoretical level (e.g., Raskin and Honts, 2002).
4.   For this point to apply under orienting theory, it is necessary to assume that the orienting response is stronger for the specific issues covered by the relevant questions than for the issues evoked by the more generic comparison questions.
5.   The theories of the relevant-irrelevant and concealed knowledge polygraph techniques are somewhat different on this point. In the relevant-irrelevant test, truthful people are expected to be equally reactive to relevant and irrelevant questions, while guilty people are expected to react more strongly to the relevant questions. In the concealed knowledge test format, people without concealed knowledge will have the same reaction to all the questions in a set, while people with concealed knowledge will show a stronger response to the relevant question, the one that touches on their concealed knowledge.
6.   Some commonly used scoring systems give each physiological response equal weight. These include 7-point systems that compare each polygraph channel for each relevant question against the same channel for the appropriate comparison question and then sum these scores across channels. Other scoring methods, including the global, impressionistic scoring used for the relevant-irrelevant format and the various computerized scoring techniques for comparison question testing, do not treat the channels as having equal weight. Computer scoring systems give numerical weights to different channels (or measures using the channels) according to their value in discriminating truthful from deceptive responses in test samples.
7.   More specifically, arousal theory reflects the following empirical observations (see Cacioppo et al., 1992): (a) the autonomic control of the heart, smooth muscles, and glands is divisible into the sympathetic and parasympathetic systems; (b) postganglionic sympathetic fibers innervate the effector, where their catabolic (energetic) actions are typically mediated directly by the postganglionic release of norepinephrine and indirectly through adrenal medullary catecholamines; and (c) postganglionic parasympathetic fibers innervate specific effectors, where their anabolic (energy-conserving) actions are mediated by the neurotransmitter acetylcholine through muscarinic receptors that are not activated by blood borne catecholamines.
8.   We note that some psychological tests that have been constructed in a purely empirical manner can support fairly confident inferences about psychological processes. Confidence in such tests is based on a solid empirical record demonstrating that the particular test procedures used have consistently yielded accurate inferences with people like those being tested. This argument does not strongly justify polygraph testing for two reasons. One is that available theory raises specific doubts about the validity of inferences of deception with certain populations and in certain situations that have not been resolved by empirical research. These issues are raised later in the chapter; the relevant empirical data are discussed in Chapter 5. The other is that in the case of polygraph security screening, the empirical record necessary for an atheoretical justification of the test does not exist, and is unlikely to be developed, because of the difficulty of building a large database of test results on active spies, saboteurs, or terrorists.
9.   This is the case even when the response reflects a change in the activation of a specific region of cortical tissue (see Sarter, Berntson, and Cacioppo, 1996).
10.   Converging evidence is always important in making inferences using the subtractive method because this method assumes that components or processes can be inserted or deleted without altering other components or processes (e.g., relevant and control questions differ only because the relevant questions have special meaning to deceptive individuals). This may not be true in relevant-irrelevant and comparison question polygraph tests. In concealed information tests, when only those with the information can identify the relevant items, a differential physiological response provides the basis for a stronger inference.
11.   Both terms are equal to P(deception AND physiological activity). Conditional probabilities show what proportion of a restricted sample have a certain property; thus they are ratios. The two conditional probabilities have the same numerator, P(deception AND physiological activity), but different denominators, P(deception) and P(physiological activity). With low base rates of deception and somewhat inaccurate tests, P(deception) can be orders of magnitude smaller than P(physiological activity), and so P(deception given physiological activity) can be orders of magnitude smaller than P(physiological activity given deception).
12.   Tests that are less accurate than DNA matching can have diagnostic value for detecting deception even though they are imperfect. Chapter 7 discusses the policy issues raised by using such tests, either alone or in combination with other sources of information, in security screening and other applications.
13.   If a test is 100 percent specific, the prosecutor’s fallacy is not a fallacy. For example, given the current state of DNA matching, finding blood with DNA that matches the defendant’s on the victim means it is virtually certain that the defendant was there and constitutes strong evidence against the defendant unless the defense has another reasonable explanation of how the blood got there.
14.   Some of these threats to validity can be ruled out if the test design provides adequate standardization or other controls. Efforts to standardize the interview process and the specific relevant and comparison questions across examinations can be helpful in this regard, and there is some such standardization in some tests, such as the Test of Espionage and Sabotage, that are used in federal employee screening programs. In addition, the concealed knowledge test approach rules out the possibility that extraneous factors may elicit differential responses to relevant and comparison questions by innocent examinees because they have no way of knowing which are the relevant questions.
15.   The effect might be different on concealed information tests. Examinees who do not have concealed information would not be able to respond differentially to relevant questions on these tests because they do not have the information needed to recognize those questions. Examinees who have concealed information, however, might respond differentially to relevant questions, with the possible result that the rate of false negative errors would be lower for stigmatized than unstigmatized groups.
16.   According to signal detection theory, it would be appropriate for expectancies about the probability that an examinee is deceptive to be reflected in the decision about what threshold to use for judging a test result to indicate deception (see Green and Swets, 1966). Such changes do not alter the accuracy of the test. We are referring here to a different phenomenon, in which expectancies alter the social interaction in the test and through this interaction, affect the examinee’s physiological responses in ways unrelated to truth or deception. Such phenomena do alter the accuracy of the test.
17.   This problem may be less serious for concealed knowledge tests than for other test formats because innocent examinees in that format cannot discriminate between relevant and comparison questions. The problem is not completely obviated, however, because extraneous psychological phenomena can differentially affect the responses of examinees who have concealed knowledge and of all examinees in the event that the examiner’s knowledge of the identity of the relevant questions is subtly communicated to them.
18.   In some cases, equipment manufacturers will not reveal exactly what is being measured.
19.   Unpublished letter commenting on the work of Marston, dated December 14, 1917, from John F. Shepard to Major Robert M. Yerkes, attached to minutes of the 6th meeting of Committee on Psychology, National Research Council.
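
The relationship between the two conditional probabilities in note 11 can be checked numerically. The following is a minimal sketch, assuming hypothetical values for the base rate of deception, for P(physiological activity given deception), and for the rate at which truthful examinees show the same activity. None of these figures comes from the chapter, and the function name posterior_deception is introduced only for illustration.

```python
def posterior_deception(base_rate, sensitivity, false_positive_rate):
    """Return P(deception given physiological activity) by Bayes' rule.

    base_rate           -- P(deception), hypothetical
    sensitivity         -- P(physiological activity given deception), hypothetical
    false_positive_rate -- P(physiological activity given no deception), hypothetical
    """
    # Shared numerator from note 11: P(deception AND physiological activity).
    joint = base_rate * sensitivity
    # Denominator: P(physiological activity) across all examinees.
    p_activity = joint + (1.0 - base_rate) * false_positive_rate
    return joint / p_activity


if __name__ == "__main__":
    # Illustrative screening scenario: 1 deceptive examinee per 1,000.
    sensitivity = 0.8
    posterior = posterior_deception(base_rate=0.001,
                                    sensitivity=sensitivity,
                                    false_positive_rate=0.1)
    print(f"P(activity given deception) = {sensitivity:.3f}")
    print(f"P(deception given activity) = {posterior:.4f}")
```

With these assumed inputs, P(deception given physiological activity) comes out near 0.008, about two orders of magnitude below the assumed P(physiological activity given deception) of 0.8. Setting false_positive_rate to 0 corresponds to the perfectly specific test of note 13, in which case the function returns 1.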