Enhancing Human Performance: Background Papers, Issues of Theory and Methodology (1988)

Chapter: 3 Human Performance Technologies and Expectancy Effects

...and effect size for each of the four factors, again separately for the expectancy-behavior and behavior-outcome links. For the expectancy-behavior link, the four factors were highly statistically significant and associated with small to medium effect sizes: climate, r=.20; feedback, r=.13; input, r=.26; and output, r=.19. With respect to the behavior-outcome link, again all four factors were statistically significant, but in terms of effect size, feedback did not seem to be very important: climate, r=.36; feedback, r=.07; input, r=.33; and output, r=.20.

Human Performance Technologies and Expectancy Effects

We now turn to a more focused discussion of the possible influence of expectancy effects on research on techniques for the enhancement of human performance. In this next section, we (a) describe paradigmatic examples of each of five research areas concerned with improving human performance, and (b) offer opinions about the extent to which expectancy effects may be influencing research results in these areas. The five areas that will be covered are those targeted for evaluation by the Committee on Techniques for the Enhancement of Human Performance: research on accelerated learning, neurolinguistic programming, mental practice, biofeedback, and parapsychology. One caveat should be emphasized in advance: it is not possible for us to conduct meta-analyses of each of these areas; instead, we will have to rely on a light review of each area and focus on some examples of typical experiments. Consequently, we need to stress that our overall assessment is accurate only to the extent that our samples are representative. Meta-analyses of these domains would be of great value and should be undertaken for any domains for which they are not yet available.

Research on Accelerated Learning

Many techniques for accelerating learning have recently been advanced, techniques that claim to increase the rate or amount of learning by 200-300%. We discuss here one of these methods, the Suggestive-Accelerative Learning and Teaching (SALT) method, and offer our assessment of the extent to which expectancy effects could be responsible for the observed learning gains. The SALT technique, an Americanized version of Lozanov's (1978) Suggestopedia technique, incorporates the use of suggestion techniques and unusual verbal and nonverbal styles of presenting material to accelerate learning. A SALT lesson comprises three primary phases: preliminaries, presentation, and practice.

In the preliminary phase, the students are first led through a series of physical relaxation exercises (e.g., stretching, side bends, head flops, and progressively tensing and relaxing all muscles). Next comes a series of mental relaxation exercises, typically guided imagery exercises such as "imagine that you are lying in a meadow watching white clouds going by." The goal of the relaxation procedures is to overcome any emotional or physical barriers to learning that might have arisen from past negative learning experiences. The last part of the preliminary phase is of particular relevance to expectancy effects, for it involves the explicit induction of positive expectancies for learning. The teacher repeatedly stresses to the class that the SALT technique makes learning easy and fun, and that as long as the students go along with the interesting things the teacher has them do, they will find themselves learning better than they had ever imagined possible. Schuster and Gritton (1985) give an example of the communication of positive expectations:

"Imagine that we have come to the end of today's lesson and you are now taking the short quiz over the material. See yourself looking at the quiz questions; they are easy, you know all the answers! Feel yourself smiling as you write down the answers quickly to all the easy questions. Hear yourself talking to your friends later about how easy learning is in this class..." (p. 191)

Another aspect of this phase is "early pleasant learning restimulation," or inducing a positive attitude toward learning by asking students to remember some prior experience where learning was exciting and fun, for example, learning to ride a bicycle. In this phase of the SALT technique, then, expectancy effects are not an experimental artifact but rather are an explicit part of the experimental manipulation. Note, however, that these expectations are intrapersonal rather than interpersonal; they are the students' self-expectancies for their performance.

The second phase of the SALT process is the presentation of the material. The presentation consists of three sections. First, there is a brief review/preview, providing students with a global impression of the content of the day's lesson. Next comes dramatic presentation of the material. The teacher uses dynamic vocal intonation to present the material; for example, the first sentence is spoken in a normal tone of voice, the second sentence is shouted, the third sentence is whispered, and the cycle is repeated. Lively and engaging classical music (such as Beethoven's "Emperor" Concerto) is played at the same volume as the teacher's voice. At the same time, the teacher instructs students to create vivid images associated with the material. The teacher then repeats the material just presented, but this time in a soft, passive voice with baroque music playing in the background. (For reasons not clearly specified in any of the articles we surveyed, but having something to do with properties of the tonal frequencies, baroque music is supposedly particularly effective.) The goal of the passive review is to increase the students' alpha waves and to induce both hemispheres of the brain to work in tandem, thus allowing the utilization of previously untapped potential.

The third and final phase of a SALT lesson involves the active practice of the material by the students. This can consist of more conventional classroom exercises (e.g., problem sets) or more imaginative activities (e.g., creating skits or writing stories using the new material). Lastly, lessons may conclude with an ungraded quiz. Students generally perform very well on these tests, increasing their confidence, and the fact that the test scores are not seen or recorded by the teacher reduces student apprehension.

We now turn to an evaluation of the research on SALT. Let us begin our review by describing a study with particularly weak methodology. Garcia (1984) wanted to test SALT on a large class of adults learning English as a second language. Rather than randomly assigning students to the experimental and control conditions, though, she instead described the procedures for the two conditions to the 80 subjects and asked them to choose which class they preferred: the traditional teaching control class, or the experimental SALT class! This fatal error in itself renders any conclusions completely suspect: if any difference is obtained between the two conditions, we cannot tell whether it was due to the efficacy of the treatment or to the fact that different kinds of students chose to go into the two sections. It seems entirely plausible that the students who are more receptive to learning would choose to go into the experimental condition. The experimental manipulation in this study included relaxation exercises, positive suggestions, active and passive presentation, and practice. The author was the instructor for both classes and consequently was not blind to the hypotheses or to the experimental condition of the students. (This is a serious problem in terms of expectancy effects that is true of all the studies on SALT, and one we will discuss in more detail later.) The next serious error committed by this author was in the analysis of the results. Because of "the large number of subjects," she selected only eight subjects from each group for analysis. The statistical power afforded by 16 subjects is so low that the author practically guaranteed that she would not obtain significant results. She found that students in the experimental group improved more than the students in the control group, but the improvement was nonsignificant, t(14)=1.40. This t, however, is associated with an effect size of r=.35, a nontrivial effect. Had she used the data from all the subjects, the results would probably have been significant. (However, we cannot trust her t value very much, as the means she reported in the text do not correspond to the values in her table.) In sum, from beginning to end we cannot be confident of Garcia's results. The question of to what extent expectancy effects may be responsible for the results is almost moot.
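As a concrete check on the two claims just made (that t(14)=1.40 corresponds to r=.35, and that n=16 affords very little power), the following sketch reproduces both numbers. This is our own illustration, not part of Garcia's analysis; the power figure is an approximation based on the Fisher z test, assuming a two-tailed alpha of .05.

    import math
    from scipy.stats import norm

    def r_from_t(t, df):
        # Effect size r for a t test: r = sqrt(t^2 / (t^2 + df))
        return math.sqrt(t**2 / (t**2 + df))

    print(round(r_from_t(1.40, 14), 2))  # 0.35, the nontrivial effect noted above

    # Rough post hoc power to detect r = .35 with n = 16 (Fisher z test)
    n, r, alpha = 16, 0.35, 0.05
    z_r = math.atanh(r)                  # Fisher z transform of r
    lam = z_r * math.sqrt(n - 3)         # expected test statistic if r = .35
    z_crit = norm.ppf(1 - alpha / 2)     # critical value, 1.96
    power = norm.sf(z_crit - lam) + norm.cdf(-z_crit - lam)
    print(round(power, 2))               # ~0.26: roughly one chance in four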

Now we will turn to the best example (methodologically speaking) we found of a study on the SALT technique (Gritton & Benitez-Bordon, 1976). In this study, SALT techniques were used by the first author in his 8th-grade science classes (10 sections, 213 students total); two other junior high schools were used as control classes (106 students total). Consequently, neither students nor schools were randomly assigned to condition, again leaving open the possibility that preexisting differences among the classrooms or students could be responsible for any obtained results. The experimental manipulation consisted of using SALT techniques (exercises, relaxation, early pleasant learning restimulation, active and passive presentation of material) on 15 occasions throughout the semester; traditional classroom procedures were followed on the other days. The control classrooms used traditional teaching methods. The same standard text, pretest, and posttest were used in all classrooms. Analysis of the pretest scores showed that the experimental classrooms scored significantly lower than the control rooms. Analysis of covariance, adjusting posttest scores for the pretest, revealed a significant treatment effect, F(1,314)=7.69, r=.155. The adjusted posttest means were 13.55 for the experimental group and 11.21 for the two control groups combined. A contrast comparing the experimental group to the control groups computed on the gain scores (an analysis similar in spirit to the ANCOVA) yielded an even more significant treatment effect, F(1,314)=22.16, r=.257. Therefore, the Gritton and Benitez-Bordon (1976) study, which utilized better controls and analyses, suggests a small to medium positive effect of the SALT technique. However, this study was not without its own flaws. Again, there was no randomization of students to condition, and the author of the study delivered the manipulation and was not blind to the experimental condition of the students. These are characteristics that leave open the possibility of alternative hypotheses, including expectancy effects. Furthermore, the experimental treatment was completely confounded with teacher, so any significant results could be due simply to the characteristics of the different teachers rather than to the SALT technique itself.
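Because an F test with one numerator degree of freedom is simply a squared t test, the same conversion shown earlier recovers the effect sizes reported for this study. Again, this is our own verification sketch, not a computation from the original report:

    import math

    def r_from_f(f, df_error):
        # Effect size r for F(1, df_error): r = sqrt(F / (F + df_error))
        return math.sqrt(f / (f + df_error))

    print(round(r_from_f(7.69, 314), 3))   # 0.155, the ANCOVA treatment effect
    print(round(r_from_f(22.16, 314), 3))  # 0.257, the gain-score contrast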

The remaining empirical articles we have examined tend to fall somewhere in between the two examples described above. We are not able to conduct a thorough review of all these studies, but some description is warranted to convey a better impression of the literature. Table 2 shows in summary form the results of all the empirical articles we had available to us. The second column of the table shows the effect sizes, expressed as the correlation coefficient r, indicating the degree of effectiveness of SALT obtained in the various studies. We estimated these correlations from the data provided by the authors; we corrected statistical errors when they could be identified before computing the effect size. The last two columns of Table 2 show how the effect sizes can be interpreted using the BESD. For example, the r(12)=.38 for the Zeiss (1984) study can be interpreted using the BESD as meaning that receiving SALT is equivalent to increasing student improvement rates from 31% to 69%. Glancing at the effect sizes for all the studies, we see that they range from a low of -.131 (meaning that the result was in the opposite direction) to a high of .672; the mean of the 14 correlations was .29. The mean correlation weighted by sample size was somewhat lower, r=.193.
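The BESD interpretation used in Table 2 is simple to reproduce: a correlation r is displayed as the difference between two "success" rates centered on 50%, namely .50 - r/2 and .50 + r/2. The sketch below, our illustration rather than anything from the paper, also shows how a sample-size-weighted mean correlation can fall below the unweighted mean when larger studies report smaller effects; the three studies and sample sizes in the weighting example are hypothetical.

    def besd(r):
        # Binomial Effect Size Display: success rates of .50 +/- r/2
        return 0.50 - r / 2, 0.50 + r / 2

    lo, hi = besd(0.38)
    print(round(lo, 2), round(hi, 2))  # 0.31 0.69: improvement from 31% to 69%

    def weighted_mean_r(rs, ns):
        # Mean correlation weighted by study sample size
        return sum(r * n for r, n in zip(rs, ns)) / sum(ns)

    # Hypothetical studies: one large study with a small effect pulls the
    # weighted mean (~.16) below the unweighted mean (.35), the same pattern
    # as the SALT literature's r = .193 vs. r = .29.
    rs, ns = [0.60, 0.35, 0.10], [20, 40, 300]
    print(round(sum(rs) / len(rs), 2))
    print(round(weighted_mean_r(rs, ns), 2))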

What can we conclude from this cursory review of the SALT literature? There are two issues to address: the first is the general methodological adequacy of the studies, and the second (of more relevance to the goals of this paper) is the extent to which effects of SALT may actually be due to expectancy effects. In terms of general methodological adequacy, the studies reviewed all possess weaknesses that pose serious threats to the ability to draw causal inferences about the efficacy of SALT. Only a single study randomly assigned subjects to conditions (experimental or control classroom), the most crucial ingredient for causal inference. Consequently, any differences found could have been caused by preexisting differences between the two conditions or by selection bias influencing which students got into which condition. Furthermore, most of the studies used only one classroom per condition. This also sheds doubt on SALT as the causal agent, for any differences could conceivably have been caused by any external change or event occurring in one of the classes, influencing all the students within the class. Research of this kind is more ideally conducted by having many classrooms involved in the project and using classroom (rather than student) as the unit of analysis. Students within a classroom may not be independent in the statistical sense, and it can be misleading to consider them so. An additional weakness of these studies is the small number of teachers used. In many cases, one teacher taught the experimental SALT class, and another teacher taught the control class. As noted earlier, such a design completely confounds treatment with teacher; any obtained differences could be due to SALT or they could be due to other, irrelevant differences between the two teachers. In other studies, there was one teacher who taught both the control and experimental classes. This removes the confound just discussed but introduces other serious problems, primarily of generalizability: when there is only one teacher or experimenter, any results obtained cannot be readily generalized beyond that particular teacher. An improved design would employ several teachers (at least four to ten) and have them teach several classes each.

On the basis of the preceding discussion, we conclude that the empirical evidence on SALT is so methodologically weak that it remains an open question whether SALT is effective, a conclusion that makes asking about interpersonal expectancy effects as a possible rival hypothesis less urgent. Suppose, however, that we pretend that the results of these studies can be trusted. To what extent, then, and in what ways could the beneficial effects of SALT be due to interpersonal expectancy effects? To answer this question, we need to make the distinction between expectancy effects that are exogenous to SALT (i.e., expectancies communicated unintentionally as a consequence of poor experimental design and controls) and expectancy effects that are endogenous to the SALT technique itself (i.e., an intrinsic and intended part of SALT). This distinction is important because different courses of action would be recommended for the two types of effects: for exogenous effects, we would suggest improvements in experimental methods in order to eliminate expectancy effects. For endogenous effects, on the other hand, we would want to acknowledge the role of expectancies and see if we could apply the literature on expectancy effects to the SALT technique to make it even more effective.

There is a very real possibility of exogenous expectancy effects in the SALT research. As noted earlier, the teachers were always aware of the hypotheses and the experimental condition of the students; because they believed in the SALT technique, they undoubtedly expected better performance from the subjects in the SALT condition. These expectations could have been communicated clearly to the students, either overtly or subtly. Given the nature of the SALT technique, it is difficult to conceive of an experimental design in which teachers could be blind to the condition of the students. (That is, we could not conceal from the teachers which style of teaching they were using!) It would also be difficult to keep teachers from guessing the hypotheses being tested. Is there any way, then, that the threat of exogenous expectancy effects could be eliminated? Perhaps one approach would be to use teachers naive to SALT and manipulate expectations for its efficacy. For example, one group of teachers could be given typical instructions indicating that SALT is a promising new teaching method, and other teachers could be told that many studies have shown SALT to be worse than traditional techniques, but that we want to give it one last try. Another approach would be to divide up the teaching responsibilities and have a different teacher (one who did not know whether the students were in the experimental or control group) be in charge of administering the pretests and posttests. A third approach would be to automate as much of the SALT process as possible, for example, creating audiotapes of the warmup exercises or the presentation of the material. None of these approaches solves the problem completely, but they would help.

Clearly, endogenous expectancy effects play a prominent role in SALT in the guise of the positive self-expectancies elicited in the students. Inducing positive expectations for learning is an explicit part of the SALT procedure. In terms of the four-factor theory, the mediation of expectancies in SALT involves primarily the climate and input factors, with climate being by far the most important factor. Teachers using SALT deliberately adopt a warm, friendly interpersonal style; they praise and encourage frequently. Also present are nonverbal behaviors that go into the climate factor, for example, smiles, dynamic voice tone, speech rate, body gestures, and eye contact. With respect to input, the SALT system may increase input because each lesson is presented twice, once in an active manner and once in a passive manner. Looking back to Table 1, we see that most of these behaviors were strongly implicated in the behavior-outcome link of the mediation of expectancy effects. Specifically, positive climate, praise, eye contact, input, gestures, smiles, speech rate, and encourages had combined correlations with improved student outcomes of .399, .124, .325, .332, .310, .291, .480, and .410, respectively. These values are on the whole larger than the magnitude of the effects reported in research on SALT. Given the incorporation of so many of the mediating behaviors in the SALT technique, and given the literature showing the positive impact of these behaviors on student performance, it is possible that the reported effects of SALT could be due entirely to the presence of these mediating behaviors. We could test this possibility conclusively by designing SALT studies in which the presence or absence of the endogenous expectancies is experimentally manipulated. That is, we could have a condition in which the explicit induction of positive expectations during the preliminary phase is deliberately omitted. This condition could also use tape-recorded relaxation exercises and class material to minimize expectancies communicated during the presentation phase. We could then compare the results found in this condition against those found for the regular SALT technique. If the effects for the experimental condition (the one where endogenous expectancies are eliminated) were significantly lower, it would indicate that a substantial portion of the effects due to SALT might be caused by the expectations communicated implicitly or explicitly by the teacher. Such a conclusion would be of great value in planning and implementing programs for accelerating learning, as research could be directed to delineating more precisely the behaviors that communicate positive expectancies and to training teachers in using these behaviors.

Neurolinguistic Programming

Neurolinguistic programming (NLP) was formulated by Bandler and Grinder (1975, 1979) with the aim of improving interpersonal communication, particularly within the counseling context. The basic premise of NLP is that individuals process ongoing events in the world through specific representational systems corresponding to the five senses of sight, sound, touch, taste, and smell. The latter two systems are rarely used, so most research focuses on differences among the visual, auditory, and kinesthetic systems. Individual differences exist in the extent to which people use each of these three systems, and the system that an individual uses most of the time is called the Preferred Representational System (PRS). Communication is hypothesized to be enhanced when both interactants use the same PRS and impeded when interactants' PRS are not matched. The bulk of research being conducted in the area of NLP tests some aspect of this hypothesis.

Before discussing this research, we need to be more explicit about what a PRS is and how a person's PRS is assessed. Representational systems are internal, cognitive systems for modeling the world. They are expressed behaviorally through the use of perceptual predicates. For example, the predicates "I'm in touch with..." and "I feel like..." represent the kinesthetic modality; "My view is..." and "I see..." represent the visual modality; and "It sounds like..." and "I hear..." represent the auditory modality.

There are three primary methods of assessing the PRS, varying in degree of reactivity or obtrusiveness. The first method is to measure the direction of a subject's eye movements in response to questions. If the eyes move upwards in either direction, or remain closed or looking straight ahead, the PRS is visual. If the eyes move to either side or to the lower left, the PRS is auditory. Lastly, if the eyes move to the lower right, the PRS is kinesthetic. The theoretical basis for this assessment method is neurological in nature, involving brain lateralization, and an often overlooked aspect of the entire NLP model is that it holds only for left-hemisphere dominant (e.g., right-handed) individuals. The second method of assessment is more direct and consists of counting the frequencies of the various types of perceptual predicates present in the subject's verbalizations. The modality used most often by the subject is then considered that person's PRS. The third method is the most direct and involves simply explaining the three different PRS modalities and asking subjects which one they consider themselves to be.

The available research on NLP tends to fall into two major categories: (a) studies validating the PRS concept by examining the agreement among the three assessment methods of eye movements, verbalizations, and self-report; and (b) studies examining the effect of PRS matching or mismatching on communication effectiveness. We are more interested in the latter category of studies for the purposes of this paper, as we want to determine whether expectancy effects might be operating when NLP principles are used in applied settings. However, it is also useful to review briefly the first category of studies and to evaluate the PRS concept on more general methodological grounds.

Four studies have investigated the extent of agreement among the three PRS assessment techniques (Birholtz, 1981; Falzett, 1981; Gumm, Walker, & Day, 1982; Owens, 1978). All of these studies showed that there was no agreement among methods at acceptable levels. As an illustration, we will describe the Gumm, Walker, and Day (1982) study in more detail. Fifty right-handed women were given a short interview to assess PRS through verbalization, completed a 24-item questionnaire to assess PRS through self-report, and had their eye movements videotaped while answering 20 questions. There was no agreement between verbalizations and self-report, Cohen's kappa = -.051, p=.30, or between eye movements and self-report, Cohen's kappa = .007, p=.46; the strongest agreement was between eye movements and verbalizations, Cohen's kappa = .103, p=.09, though this result still does not reach standard levels of significance. Also interesting is the fact that the PRS discovered to be the most common changes depending on which assessment method is used; most subjects are identified as having a kinesthetic PRS by the verbalization method, whereas most subjects are identified as having an auditory PRS by the eye movement method.
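Cohen's kappa, the statistic reported in these agreement studies, is the observed rate of agreement corrected for the agreement expected by chance from each method's marginal frequencies. A minimal sketch follows; the ten classifications are hypothetical, not data from Gumm, Walker, and Day (1982), though they mimic the pattern described above (verbalizations dominated by kinesthetic classifications).

    from collections import Counter

    def cohens_kappa(labels_a, labels_b):
        # kappa = (observed agreement - chance agreement) / (1 - chance agreement)
        n = len(labels_a)
        p_obs = sum(a == b for a, b in zip(labels_a, labels_b)) / n
        freq_a, freq_b = Counter(labels_a), Counter(labels_b)
        p_chance = sum((freq_a[c] / n) * (freq_b[c] / n)
                       for c in set(freq_a) | set(freq_b))
        return (p_obs - p_chance) / (1 - p_chance)

    # Hypothetical PRS codes for ten subjects under two assessment methods
    # (v = visual, a = auditory, k = kinesthetic):
    eye_movements  = list("aavkavakvv")
    verbalizations = list("kkvkakkkkv")
    print(round(cohens_kappa(eye_movements, verbalizations), 2))  # ~0.19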

The results of the Gumm, Walker, and Day (1982) study are representative of the other three studies on the issue, and we are left with the conclusion that the PRS assessment methods have not yet demonstrated strong convergent validity. This casts some doubt on the utility of the NLP model and makes the interpretation of results from the experiments on PRS matching, which tend to rely on only one method of assessment, more difficult.

As noted earlier, the second major category of research studies on NLP focuses on the effects of PRS matching on communication. Within this category there are two subcategories: (a) studies in which a subject's PRS is assessed prior to the interaction, and counselors are instructed to match or mismatch the PRS; and (b) studies in which there is no prior assessment of PRS, and counselors are instructed to either match or mismatch the subject's use of perceptual predicates as it occurs during the ongoing interaction. This distinction has important implications for the potential expression of interpersonal expectancy effects, so we will discuss prototypical research examples from each subcategory.

Let us look first at research where the PRS is assessed prior to the interaction and describe a study that yielded the strongest results for PRS matching (Falzett, 1981). In this study, 26 right-handed undergraduates came in for 30-minute interviews conducted by two interviewers. In the first part of the interview, both the experimenter and the interviewer recorded the subject's eye movements in response to a standardized list of questions. After six questions, the interviewer and experimenter conferred about what the PRS of the subject was, and then the interview continued. In the PRS matching condition, the interviewer was to use predicates that matched the subject's PRS exclusively; in the PRS mismatching condition, the interviewer was to use predicates from the two modalities other than the subject's PRS. Following the interview, subjects were administered the Counselor Rating Form Trustworthiness scale, which constituted the dependent variable of subject satisfaction with the interview.

Analyses showed a large, positive effect of PRS matching; subjects in the matching condition felt the counselor was significantly more trustworthy than did subjects in the mismatching condition. However, there are features of this experimental design that leave open the possibility that expectancy effects may be responsible for some of the positive results. The most striking characteristic is that both the interviewers and the experimenter were not blind either to the hypotheses under study or to the experimental condition of the subjects. Because they had assisted in the assessment of subjects' PRS, the interviewers knew whether or not they were matching their subjects' PRS. Also, interviewers had been "instructed in the principles of the PRS model" (Falzett, 1981, p. 306) and consequently expected more positive subject outcomes in the matching condition. Therefore, it is plausible that the interviewers behaved differently toward subjects in the two conditions and that it was those behavioral differences, rather than the PRS matching per se, that led to the positive results. In this particular study, the verbal content of the interview was standardized, so any differences would have to have been in the nonverbal domain. The interviewers, for example, could have spoken in a warmer tone of voice, smiled more, or engaged in greater eye contact with subjects in the PRS matching condition, nonverbal behaviors that are important in mediating expectancy effects. We do not know, of course, whether such differential behavior actually occurred in this study, but it is likely, as nonverbal behavior is very difficult to monitor and control in a standardized manner.

Indeed, there is indirect evidence in this study to suggest that something other than PRS matching, such as expectancy effects, was operating. In addition to the responses to the eye movement questionnaire, the author tabulated subjects' responses during the interview and made an assessment of their PRS using the verbalization method. However, all but three of the subjects were shown to have a kinesthetic PRS under this classification method, in sharp disagreement with the results from the eye movement method, and the author thus decided to ignore the verbalization results. The discrepancy between the two assessment methods for this study, though, does call into question the strong results found for PRS matching.

Now we describe an example from the second subcategory of studies, where the interviewer "tracks" the perceptual predicates used by the subject during the ongoing interaction. It is important to note that these studies, strictly speaking, do not test the NLP model, as they sidestep completely the issue of whether or not the PRS is a meaningful entity. Hammer (1983) had three graduate students in a counseling program interview 63 subjects. The interviewers received a 15-hour training program in identifying and tracking subjects' perceptual predicates, and subjects were randomly assigned to matching and mismatching conditions. The interviews were partially standardized for content by using a list of questions about dorm life as a guide. After the interviews, subjects filled out an Empathy Scale. Prior to analyzing the data, the author checked the audiotapes and threw out those interactions that did not meet his criterion of following predicate matching or mismatching according to the assigned condition. He ended up omitting over 28% of the interactions, however, a sizable amount that restricts the generalizability of his results. Analyses showed that subjects in the matching condition obtained significantly higher empathy scores than did subjects in the mismatching condition, F(1,56)=4.96, p<.05, r=.29. Again, though, we are left with the question of to what extent these results may be due to expectancy effects. In this type of study, it is impossible for the interviewer to be blind to the condition of the subject because the interviewer has to track the subject's conversation and then make the conscious effort of matching or mismatching the perceptual predicates the subject is using. Consequently, interviewers that use ongoing predicate matching may exhibit the same sorts of differential behavior as discussed above. The behaviors most likely to change are those nonverbal cues associated with the "climate" category: tone of voice, gaze, leans, nods, smiles, and so on.

In sum, what can be concluded about the design of these studies and the possibility that expectancy effects might be occurring? Two features of experimental design appear to be the most crucial: standardization of materials and keeping interviewers blind to the experimental condition of the subjects. With respect to standardization, an optimal way of designing a study would be for experimenters to prepare a set of audiotapes or videotapes of interviewers that were identical in verbal content except for the particular perceptual predicates used; these tapes would then be listened to by subjects with kinesthetic, visual, and auditory PRSs. In other words, the same tape would match some subjects' PRS and mismatch other subjects' PRS. The possibility that interviewers may unintentionally behave differently in the matching vs. mismatching conditions, and thus the possibility of interviewer expectancy effects, is thereby eliminated. However, it must be noted that the kind of standardization suggested here involves an inevitable trade-off with experimental realism; watching a prepared videotape is not as natural and realistic as taking part in a one-on-one interview.

The second design feature important in preventing expectancy effects in studies on NLP is keeping interviewers blind to the condition of the subjects. When the system of taped interviews described above is used, there is no actual interaction, so the issue of keeping the interviewer blind to the condition of the subject is irrelevant. However, when actual, ongoing interactions are part of the study, whether or not the interviewer knows the PRS of the subject becomes very important. In the category of studies where the subject's PRS is assessed prior to the interview, the interviewer should not be told what the subject's PRS is. In other words, the interviewer should be told, for example, "use visual predicates"; he or she would not know whether or not the visual predicates were a match or a mismatch of the subject's PRS. As noted earlier, though, it is impossible to keep interviewers blind to condition in the category of studies where the interviewers track the subjects' ongoing use of predicates.

In the section on accelerated learning, we made the distinction between endogenous and exogenous expectancy effects, that is, separating expectancies that are an explicit part of the technique under consideration from expectancies that are communicated unwittingly as a result of poor experimental design. In the NLP area, there are no endogenous expectancy effects; the communication of positive expectancies is not part of the model. Consequently, it becomes desirable to design studies that prevent the communication of differential expectancies. Based on the above considerations, we recommend that in planning research on the NLP construct, more studies should be conducted that use standardized audio- or videotapes, or that assess subjects' PRS prior to the interaction and keep interviewers blind to the subjects' PRS. Studies where the ongoing use of predicates is tracked by the interviewer are too vulnerable to other changes in interviewer behavior; too much is left uncontrolled, making it harder to conclude that PRS matching vs. mismatching is the causal agent. Given these considerations, it is very interesting to note that a recent review of the NLP research shows that the strongest support for PRS matching was obtained in tracking studies (Sharpley, 1984, p. 246). More total PRS matching occurs in the tracking studies, which might account for the stronger results obtained. However, studies that used standardized videotapes or that assessed subjects' PRS prior to the interview do not support the NLP model (Sharpley, 1984). This lack of support leaves open the very real possibility that interpersonal expectancy effects are responsible in part for the positive results found in the predicate tracking studies.

Imagery and Mental Practice

One idea that has held considerable appeal for many years is the hypothesis that mental practice enhances actual performance of physical tasks. For example, Dick Fosbury, the celebrated high jumper, is well known for his insistence on mentally practicing for several minutes before making each jump. But is mental practice truly efficacious? In contrast to the human performance technologies discussed so far, there exists a considerable literature examining the benefits of mental practice. Much of this literature stems from the 1940s through the 1960s, when the idea of mental practice was in vogue, although there has been a revival of research interest in mental practice in the past few years. We now turn to a description of that research, providing a summary of the findings on mental practice and our assessment of how much the findings could be due to interpersonal expectancy effects.

Fortunately for our purposes, Feltz and Landers (1983) conducted a recent and thorough meta-analysis of the literature on mental practice and skill learning, summarizing the results of 60 studies. They gathered all the studies they could find that compared mental practice to either a pretest baseline or to a no-practice control group. They coded various items from each study, including what type of task was used (motor, strength, or cognitive), the number and length of mental practice sessions, subjects' previous experience with the task, and so on. Examples of the three types of tasks are: (a) motor tasks: volleyball serves, dart throwing, and basketball free throws; (b) strength tasks: sit-ups and hand grips; and (c) cognitive tasks: card sorting, maze tracing, and symbol substitution.

The results of the Feltz and Landers (1983) meta-analysis show that the mean effect size for mental practice across the 60 studies was equivalent to a correlation coefficient of .23, a small to medium effect. Within the three types of tasks, however, the mean effect size varied considerably; the effect of mental practice was greatest for cognitive tasks (r=.58), followed by motor tasks (r=.21), and least for strength tasks (r=.10).
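For readers who want to see how study-level correlations are typically combined into mean effect sizes like these, the sketch below averages rs through Fisher's z transformation, weighting each study by n - 3. This is a generic meta-analytic recipe under our own assumptions; Feltz and Landers' exact weighting scheme may differ, and the three studies shown are hypothetical.

    import math

    def mean_r_fisher(rs, ns):
        # Transform r -> z, take a weighted average, transform back z -> r
        zs = [math.atanh(r) for r in rs]
        ws = [n - 3 for n in ns]
        z_bar = sum(z * w for z, w in zip(zs, ws)) / sum(ws)
        return math.tanh(z_bar)

    # Hypothetical cognitive-task studies clustering near the reported r = .58:
    print(round(mean_r_fisher([0.50, 0.62, 0.58], [20, 35, 28]), 2))  # 0.58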

28 possible explanations for these differences in effect size. If, as many researchers propose, mental practice works by allowing subjects to rehearse the symbolic components of a task, then it follows logically that mental practice would work best with tasks that have more symbolic or cognitive components. A related explanation is given by the fact that mental practice of cognitive tasks is closer to the actual practice of the task than is mental practice of a motor task. For example, symbol substitution done mentally is practically identical to symbol substitution done physically; the only difference is the lack of writing down the solutions in mental practice. A third possibility, more relevant to the goals of this paper, is that if interpersonal expectancy effects are operating in this area of research, performance on cognitive tasks would most likely be more susceptible to influence from expectations than would strength tasks. Now that we have outlined some of the major issues and findings of this research area, we turn to a more detailed description of some examples of typical research. As the studies involved share very similar design features and methodology, we will briefly describe each in turn and then discuss more generally the possible influence of expectancy effects. The first study we describe is an older one (Corbin, 1967), conducted by one of the prominent researchers in this area. In this study, 120 male undergraduates were given a juggling pretest and divided into three levels of high, medium, and low skill. Subjects were then randomly assigned within each of these levels to one of four treatment groups: control, mental practice, physical practice, and combined mental and physical practice. Subjects then engaged in the appropriate practice (mental, physical, or both) 30 times each day for 21 days. The control subjects did no rehearsal during this time. After the 21

29 days, all subjects were given a posttest. Results showed there was no significant difference between the mental practice and control group; the physical practice group and the combined group tit significantly better than the control group but were not different from each other. In Mendoza & Wichman (1978), 32 undergraduates were pretested on dart-throwing ability and randomly assigned to one of four groups: control, mental practice only, mental practice plus simulated dart-throwing movements, and physical practice. The control group was told to report back to the lab in a week; the other groups were instructed to come into the lab for fifteen-minute practice sessions twice each day for the week. Subjects in the mental practice groups were told to imagine themselves throwing darts, making the images as vivid as possible, and to correct for any imagined misses. Subjects in the mental plus motor practice group were given the same instructions but also stood up and pretended to throw darts, moving their arms appropriately. Analyses of the posttest minus pretest gain scores showed that the mental practice group scored signif icantly higher than the control group, F(1,28)=15.66, r=.60. There was no difference between the mental practice group and the mental plus motor practice group, and the physical practice group scored signif icantly higher than the two mental practice groups and the control s . The last study we will describe (Gould, Weinberg, & Jackson, 1980) compared different kinds of mental preparation strategies. This study reported two experiments. In the first, 30 undergraduates were tested on an exercise instrument designed to measure leg strength. Each subject was tested under five different instructional sets: attentional focus (concentrating on feelings in the legs), imagery (mental practice), preparatory arousal

30 (emotionally "charging up") , rest control, and counting backwards . The instructions for the imagery set, the condition of interest to us, were as follows: "...close your eyes and picture yourself kicking your leg up as hard and as fast as possible, like you were kicking a football. In addition, visualize yourself setting a new personal best on each trial" (Gould, Weinberg, & Jackson, 1980; p. 331). Subjects were given test trials for each condition in counterbalanced orders. Analyses showed that the preparatory arousal and mental practice conditions were not significantly different from each other, but both were significantly better than the other three conditions. The second experiment, rather than having instructional set as a repeated measure, randomly assigned subjects to one of three conditions: preparatory arousal, mental practice, and a rest control. Otherwise the procedure was the same as in the first experiment. These results showed that subjects in the preparatory arousal condition performed significantly better than subjects in the control condition, r=.40. The difference for the mental practice condition was not statistically significant, but the effect size was moderate and similar to that for the preparatory arousal condition, r=.34. In sum, then, this study found support for both mental practice and preparatory arousal in improving leg strength. Now that we know better what typical research in this area is like, we can ask about the possible influence of interpersonal expectancy effects. As a first considerations we can point out that in all of these studies, experimenters were not blind to the experimental condition of their subjects, a factor that we have stressed before as being crucial in the operation of expectancy effects. If experimenters are expecting better performance in the mental practice condition, and they know which sub jects are using mental

practice, then they might treat those subjects differently. For example, they could exhibit those nonverbal behaviors we have discussed earlier; they could smile more, sound warmer and more encouraging, engage in greater eye contact, appear more interested in the subject, and so on. As before, these differential behaviors on the part of the experimenter might be responsible in part for any differences in performance on the part of subjects in the mental practice condition.

A factor relevant to the above discussion concerns the nature of the control groups used in this area of research. Very often, as in the Corbin (1967) and Mendoza & Wichman (1978) studies, the control group merely consists of subjects who are given a pretest and then told to report back to the lab in a given number of days. The experimental subjects, on the other hand, come back to the lab repeatedly for their practice sessions. This means that subjects in the mental practice group differ from the control subjects in two important respects: (a) they are receiving the benefits of mental practice, relevant to the hypothesis of interest; and (b) they are receiving considerably greater amounts of experimenter time and attention, relevant to the possible mediation of expectancy effects. A better way of designing these studies would be to have a control group that spends the exact same amount of time in the lab and is treated as much as possible like the mental practice group, so that the only difference between the two groups would be that one of them uses mental practice and the other does not.

The distinction drawn earlier in our discussion of the SALT technique between interpersonal expectancies and self-expectancies becomes relevant here. Certainly, the subjects are not themselves blind to their condition; subjects who are using mental practice know they are using it and undoubtedly

have positive expectations about its efficacy. Possibly it is simply doing something different (a kind of Hawthorne effect), accompanied by the subjects' own expectations for success, that brings about the improved performance, rather than the specific act of imagining oneself doing a task. There is evidence in the Gould, Weinberg, & Jackson (1980) study to suggest that this might be the case. As noted above, subjects in the preparatory arousal condition performed as well as, if not better than, subjects in the mental practice condition. The instructions given to subjects in the preparatory arousal condition were "...I would like you to 'emotionally charge-up.' In essence, psych yourself up for maximum performance-- by getting mad, aroused, pumped-up or charged-up" (Gould, Weinberg, & Jackson, 1980, p. 331). These instructions would tend to elicit positive self-expectancies on the part of the subjects, yet they lack the specificity of the mental practice instructions. The fact that both instructional sets resulted in similar levels of improvement, though, suggests that mental practice per se was not essential. The possibility that self-expectancies could be influential in this area of research implies that studies need to be designed so as to include control groups that involve similar levels of self-expectancies but differ in the presence or absence of mental practice. As it stands now, studies have confounded mental practice with positive self-expectancies, and we do not know whether mental practice in itself is efficacious.

In sum, our feeling is that the research on mental practice is fairly well-designed. In terms of the exogenous/endogenous distinction made earlier, there is some possibility of exogenous expectancy effects in that the experimenters are almost always aware of the experimental condition of their subjects. That can easily be rectified in future studies, however. The

possibility of endogenous expectancy effects, in the form of subjects' self-expectancies, is much greater and impossible to eliminate. The best that can be done is to create control groups, as mentioned above, that will permit the direct testing of self-expectancies vs. mental practice. The last point that should be made is to note that although the Feltz & Landers (1983) meta-analysis showed a small, positive effect of mental practice, actual physical practice has been shown in many studies to be much more effective in improving performance (Corbin, 1972). In situations where physical practice is logistically and economically feasible, then, physical practice should be used. But mental practice may still be useful in cases where it is too expensive or impractical to use physical practice.

Biofeedback

The use of biofeedback in medical and therapeutic contexts has enjoyed tremendous growth over the past 20 years. Nonexistent prior to the 1960s, biofeedback was developed in response to psychophysiological experiments showing that operant conditioning could shape responses of the autonomic nervous system. As more and more clinicians began using biofeedback, it was increasingly heralded as a panacea for a variety of stress-related symptoms, such as headaches, back pain, Raynaud's disease, high blood pressure, and teeth grinding. Despite the attention lavished on it, there is considerable doubt and controversy about the efficacy of biofeedback. We will discuss some of the issues involved in that controversy, particularly those relevant to expectancy effects, and describe some examples of typical research.

At the same time that biofeedback was growing in popularity among clinicians, researchers began encountering puzzling failures to replicate basic laboratory demonstrations of biofeedback. In a recent article published

in the American Psychologist, Alan Roberts (1985) describes several of these failures to replicate. For example, he had conducted two studies showing that people can learn to control the skin temperature of their hands. However, later attempts by his lab and other researchers to replicate these findings did not succeed. He then cites a number of other reviews on the efficacy of biofeedback in different domains (e.g., headaches), reviews that all conclude there is no evidence that biofeedback works. For example, Jessup, Neufeld, & Merskey (1979) reviewed 28 studies and found no support for the specific benefit of biofeedback in reducing migraine headaches. Furthermore, this review found that the most promising results were obtained in uncontrolled studies, where the potential for expectancy effects or other experimental artifacts is greater. Roberts concluded that the current status of clinical biofeedback research is "dismal" and that "there is absolutely no convincing evidence that biofeedback is an essential or specific technique for the treatment of any condition" (Roberts, 1985, p. 940).

These reviews create a gloomy impression of the biofeedback literature. However, these were traditional literature reviews, not meta-analyses, and one of the points we have stressed in this paper is that it is crucial to examine the magnitude of experimental effects when evaluating the outcomes of studies rather than concentrating merely on whether or not the study was significant at the p<.05 level. We found only one meta-analysis of biofeedback research in the limited literature review we undertook, and its conclusions were not as gloomy. In this meta-analysis, Sharpley and Rogers (1984) assessed the comparative effectiveness of biofeedback, other forms of relaxation, and control conditions in reducing frontalis EMG levels. For their dependent variable, they subtracted the mean posttest EMG level from the mean pretest

EMG level, and divided by the mean pretest EMG level for each condition in a given study, thereby yielding a measure of the proportion drop in EMG levels. They obtained 60 such measures from 20 different studies. They found that biofeedback was the most effective in reducing frontalis EMG levels, with a mean of .426, followed by other forms of relaxation, with a mean of .332, and control conditions, with a mean of .210. An analysis of variance accompanied by post-hoc comparisons was conducted on the 60 means, and it was concluded that biofeedback was significantly better than control conditions, but there was no significant difference between biofeedback and other forms of relaxation. We computed a linear contrast of the three means, using their data, and found a highly significant linear trend, F(1,57)=12.15, p<.001 (with r=.42 based on studies as the sampling unit and r=.13 with subjects as the sampling unit; this latter estimate is conservative to the extent that there is nonzero variation of studies within type of study), confirming that biofeedback was the most effective, control was the least effective, and relaxation was in the middle. This meta-analysis, then, indicates that biofeedback does indeed work, but its superiority over other forms of relaxation was supported only at p=.07, one-tailed. However, it should be noted that whereas control was the least effective, its mean was nonetheless .21, indicating a hefty 20% decrease in EMG levels. This decrease is presumably due to nonspecific factors such as the placebo effect, habituation, or regression to the mean, factors that we will discuss in detail later (White, Tursky, & Schwartz, 1985).
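As an illustration of how Sharpley and Rogers' dependent measure and a linear contrast of the three condition means can be computed, consider the following sketch (Python; the scores are hypothetical stand-ins, since the 60 actual proportion-drop values are not reproduced here):

```python
import numpy as np

def proportion_drop(pre_mean, post_mean):
    """Sharpley and Rogers' (1984) dependent measure: pre-to-post drop
    in frontalis EMG as a proportion of the pretest level."""
    return (pre_mean - post_mean) / pre_mean

# Hypothetical proportion-drop scores, 20 per condition, centered on the
# means reported in the meta-analysis (purely for illustration).
rng = np.random.default_rng(0)
groups = {
    "control":     rng.normal(0.210, 0.15, 20),
    "relaxation":  rng.normal(0.332, 0.15, 20),
    "biofeedback": rng.normal(0.426, 0.15, 20),
}

# Linear contrast with weights -1, 0, +1 (control < relaxation < biofeedback).
weights = np.array([-1.0, 0.0, 1.0])
means = np.array([g.mean() for g in groups.values()])
ns = np.array([len(g) for g in groups.values()])

# Pooled within-group mean square serves as the ANOVA error term.
ms_error = np.mean([g.var(ddof=1) for g in groups.values()])
contrast = weights @ means
f_contrast = contrast**2 / (ms_error * np.sum(weights**2 / ns))
df_error = sum(ns) - len(ns)  # 60 scores, 3 groups -> 57 df, as in the text
print(f"F(1,{df_error}) = {f_contrast:.2f} for the linear trend")
```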

First, however, let us describe some typical studies carried out in this area. Banner & Meadows (1983) randomly assigned 63 subjects to one of six experimental conditions: (a) EMG feedback; (b) temperature feedback; (c) EMG + temperature feedback; (d) relaxation (subjects given relaxation instructions and hooked up to machines but given no feedback); (e) placebo control (subjects listen to soothing music, hooked up to machines, but given no feedback); and (f) wait list control (subjects receive no treatment). This design is commendable for its careful inclusion of different types of control groups; many studies in this area either do not use control groups at all or simply have a wait list control. The design of the Banner & Meadows (1983) study, though, allows the assessment of the extent to which the benefits of biofeedback may be due to nonspecific factors such as the placebo effect that comes as a consequence of the subject entering into a therapeutic relationship and being connected to very elaborate and scientific-appearing electronic equipment. All subjects (except for the wait list controls) then came into the lab for a series of nine one-hour sessions spaced over a period of three months, following which a posttest was administered to all subjects. Analyses showed, disappointingly, no significant differences among groups on any of the dependent measures: EMG levels, finger temperatures, self-reported tension, or self-reported frequency of problems. Unfortunately, the article did not report means for the six conditions on any of these measures, so we are unable to estimate effect sizes or see whether the results were even in the right direction. However, the F's for treatment condition in all these analyses were always less than 1.0, so it is unlikely that the biofeedback conditions differed significantly from the control conditions in efficacy (maximum possible F<5). Yet there were significant decreases in tension levels over time across the six conditions. The authors conclude that "we are obviously dealing here with a non-specific (placebo) effect" (Banner & Meadows, 1983, p. 191).

Another study (Gauthier, Doyon, Lacroix, & Drolet, 1983) investigated the use of blood volume pulse biofeedback in treating migraine headaches. Twenty-one female migraine patients were assigned to one of three conditions: temporal artery constriction biofeedback, temporal artery dilation biofeedback, and a wait list control group. The treatment consisted of 16 sessions over a period of 8 weeks, and subjects were asked to practice at home. Control subjects were told to report back in two months for treatment. According to theory, only the artery constriction biofeedback group should have been effective, because artery constriction mimics the actions of the pharmacological agents used to treat migraines. Results showed, however, that both treatment groups showed significantly greater improvement than the wait list control on the dependent variables of frequency of headaches (r=.48), number of headache days (r=.49), intensity of headaches (r=.53), and duration of headaches (r=.41). There were no significant differences between the two feedback groups.

Interpretation of these results is a little difficult. If the authors' original theorizing had been correct, then the fact that there was improvement in the artery dilation condition implies the presence of a placebo or expectancy effect. Subjects in this condition had received substantial amounts of time and attention from the experimenters, they had been attached to impressive-looking equipment with flashing lights and meters, and they had been led to believe they were receiving treatment that would help their headaches. All these factors very easily could have elicited positive self-expectancies, thus creating a placebo effect that resulted in marked improvement over the control subjects, who received no special treatment over

the two months. The authors do not prefer this interpretation; instead, they conclude that it is more likely that the psychophysiology of headaches is not yet fully understood and that artery dilation might actually be effective in reducing headaches. The former explanation is highly plausible, however, and has not been ruled out.

What does all the information covered so far indicate about the possible influence of interpersonal expectancy effects? In terms of exogenous expectancy effects, we find that researchers in the biofeedback area, like those in the other areas we have discussed, are most often not blind to the experimental condition of their subjects. This leaves open the possibility of differential experimenter behavior toward experimental and control subjects. Moreover, in biofeedback studies where double-blind procedures are used, results are often negative (e.g., Guglielmi, Roberts, & Patterson, 1982).

Further evidence that the experimenter's behavior toward the subject is absolutely crucial in determining the outcomes of biofeedback studies is given by Taub & School (1978). First, they relay an anecdote about one of their earlier experiments on biofeedback: They discovered that one experimenter who showed an impersonal attitude toward the subjects was able to train only 2 of the 22 subjects. Another experimenter, though, who used the exact same techniques but was more informal and friendly successfully trained 19 of 21 subjects. Astounded by this result, Taub & School (1978) undertook a formal study to investigate this phenomenon. They experimentally manipulated experimenter behavior toward two groups of subjects. In the first, the experimenter adopted an impersonal attitude, e.g., using last names, discouraging conversation, and avoiding eye contact. In the second group, the experimenter adopted a friendly attitude, e.g., using first names, encouraging

friendly conversation, and engaging in frequent eye contact. Results showed that at the end of the training series, the group treated in a friendly manner achieved a mean change in skin temperature of 4.2 degrees, as compared to a mean change of 1.3 degrees for the group treated in an impersonal manner. Taub & School note that this difference is the largest experimental effect they have obtained by the manipulation of any single variable throughout their entire sequence of experiments! They conclude, "It is almost impossible to overemphasize the importance of the experimenter-attitude variable for the success of thermal biofeedback training" (Taub & School, 1978, p. 617). Clearly, then, the potential for exogenous expectancy effects is real.

But, some critics argue (e.g., Steiner & Dince, 1981; Surwit & Keefe, 1983), these effects may not be exogenous at all; that is, if biofeedback is used as an adjunct to therapy, then having a therapist or experimenter who behaves warmly toward the subjects and communicates positive expectations about biofeedback constitutes an integral part of biofeedback. Even if positive experimenter expectations were an essential aspect of biofeedback therapy, however, we would still be interested in learning whether or not the actual machinery of biofeedback training contributed an independent benefit.

Equally clear is the importance of endogenous expectancy effects in biofeedback, taking the form of the placebo effect or subject self-expectations. Placebo effects are as powerful as some of our most sophisticated drugs and treatments, and they work for an unlimited range of problems (e.g., Beecher, 1955, 1961, 1962, 1966; Shapiro, 1960, 1964). As discussed earlier, they are of particular importance in the biofeedback area. Subjects come into the lab or clinic, often suffering from chronic disorders with psychosomatic roots. They are then wired with an elaborate series of

electrodes and connected to impressive-looking equipment with all the dials and meters. It seems likely that subjects believe they are receiving a powerful treatment and should rapidly improve. Indeed, Stroebel & Glueck (1973) speculate that biofeedback may in fact make its greatest contribution as "the ultimate placebo." Again, even though the placebo effect may be integral to how biofeedback works, research still needs to be done to assess the independent contributions of placebo effects and biofeedback in and of itself. This can be accomplished by incorporating into studies placebo control groups such as those mentioned earlier, which involve having subjects come into the lab, receive the same treatment and attention from the experimenters, and be hooked up to biofeedback equipment; the only difference is that the subjects do not receive accurate feedback on their physiological levels.

In conclusion, we can say that despite the enormous amount of research attention paid to biofeedback over the past 20 years, we cannot state conclusively that the effects attributed to biofeedback are actually produced specifically by biofeedback. There are masses of clinical and lab studies showing that biofeedback is effective, but many of these studies suffer from methodological and design flaws, and there are also many failures to replicate. We have also seen that experimenter- and subject self-expectancy effects are pervasive in this area. At this point, what is needed is a series of carefully controlled studies that address these issues. The potential benefits of biofeedback are enormous; the research we have proposed could only add to the promise of biofeedback, as it would increase our understanding of how and why biofeedback works.

Parapsychology

There are two transition points in the recent history of parapsychology. At each point parapsychology advanced to a new level of more rigorous research and scientific respectability, though neither point earned for it full acceptance as a respectable field of scientific inquiry (Boring, 1962; Murphy, 1962, 1963; Truzzi, 1981). The first point was in 1882, when the Society for Psychical Research was founded in London by a group primarily from Cambridge University. Among the distinguished presidents of this Society were William James, Henri Bergson, Lord Rayleigh, and C.D. Broad (Schmeidler, 1968). The second point was in 1927, when William McDougall, newly arrived at Duke University, was joined by J.B. Rhine (Boring, 1950; Schmeidler, 1968). It was Rhine who established the basic procedures of parapsychological research that are still employed today. His best known method required the subject to guess which one of five designs was the "target" stimulus. Since the probability of guessing the correct design was .20, any subject's "psi" ability could be evaluated for statistical significance by comparing the obtained success rate with the .20 expected under the null hypothesis.

Parapsychological investigations cover a wide variety of phenomena, including: telepathy (e.g., guessing a design being viewed by another); clairvoyance (e.g., guessing a design not being viewed by another); precognition (e.g., guessing a design not yet selected); psychokinesis (e.g., trying to influence the fall of a pair of dice); and survival after death (e.g., reincarnation). The first three of these are often referred to generically as ESP (extrasensory perception).
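Rhine's logic can be made concrete with a small sketch of the exact binomial test it implies (Python; the counts shown are hypothetical and purely illustrative):

```python
from math import comb

def binomial_p_value(hits: int, trials: int, p_chance: float = 0.20) -> float:
    """One-tailed exact binomial p: the probability of at least `hits`
    successes in `trials` guesses if only chance (p_chance) is at work."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )

# Hypothetical example: 60 correct guesses in 250 five-choice trials,
# where chance alone would produce 50 hits on average.
print(f"p = {binomial_p_value(60, 250):.3f}")
```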

Because the types of research subsumed under the topic of parapsychology range so widely, and because of the sheer number of parapsychological investigations, we have confined our discussion to a focused domain of parapsychological inquiry: the ganzfeld experiments.

Ganzfeld Experiments

In these experiments subjects typically are asked to guess which of four stimuli had been "transmitted" by an agent or sender, with these guesses made under conditions of sensory restriction (usually white noise and an unpatterned visual field). There were several strong reasons for the selection of this domain of parapsychological research:

1. The domain is of recent origin, so that even the earliest studies managed to avoid some of the older problems found in parapsychological research (Hansen & Lehmann, 1895; Kennedy, 1938, 1939; Moll, 1898; Rosenthal, 1965, 1966; Warner & Raible, 1937).

2. Because of the recency of the research, access to original data was more likely than for some of the older areas (Rao, 1985).

3. The domain is considered an especially promising area of parapsychological inquiry (Hyman, 1985; Rao, 1985).

4. Investigations in this area have been carried out by respected researchers (Hyman, 1985).

5. The area has been the subject of recent sophisticated public debate by eminent investigators and critics of the area (Honorton, 1985; Hyman, 1985; Rao, 1985).

6. As an outgrowth of this debate, two formal meta-analyses of this area have become available (Honorton, 1985; Hyman, 1985).

Meta-analytic results

Five indices of "psi" success have been employed in ganzfeld research (Honorton, 1985). One criticism of research in this area is that some

investigators employed several such indices in their studies and failed to adjust their reported levels of significance for the fact that they had made multiple tests (Hyman, 1985). Since most studies employed a particular one of these five methods, the method of direct hits, Honorton focused his meta-analysis on just those 28 studies (of a total of 42) for which direct hit data were available. The method of direct hits scores a success only when the single correct target is chosen out of a set of t total targets. Thus the probability of success on a single trial is 1/t, with t usually = 4 but sometimes 5 or 6. The other methods, employing some form of partial credit, appear to be more precise in that they use more of the information available. Although they differ in their interpretation of the results, Honorton (1985) and Hyman (1985) agree quite well on the basic quantitative results of the meta-analysis of these 28 studies. This agreement holds both for the estimation of statistical significance (Honorton, 1985, p. 58) and of effect size (Hyman, 1985, p. 13).

Stem-and-Leaf Display

Table 3 shows a stem-and-leaf display of the 28 effect size estimates based on the direct hits studies summarized by Honorton (1985, p. 84). The effect size estimates shown in Table 3 are in units of Cohen's h, which is the difference between (a) the arcsine transformed proportion of direct hits obtained and (b) the arcsine transformed proportion of direct hits expected under the null hypothesis (i.e., 1/t). The advantage of h over the raw difference between proportions is that all h values that are identical are identically detectable, while raw differences that are identical (e.g., .65-.45 and .25-.05) are not equally detectable (Cohen, 1977, p. 181).
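For concreteness, Cohen's h and its advantage over raw proportion differences can be illustrated with a short sketch (Python; illustrative only):

```python
from math import asin, sqrt

def cohens_h(p_obtained: float, p_null: float) -> float:
    """Cohen's h: the difference between arcsine-transformed proportions,
    using the transform phi = 2 * arcsin(sqrt(p)) (Cohen, 1977)."""
    return 2 * asin(sqrt(p_obtained)) - 2 * asin(sqrt(p_null))

# A direct-hit rate of .38 against the 1/4 chance rate gives h of about
# .28, the mean effect size reported for the 28 ganzfeld studies.
print(round(cohens_h(0.38, 0.25), 2))  # -> 0.28

# Equal raw differences are not equally detectable: .65-.45 and .25-.05
# are both .20, yet they yield different values of h.
print(round(cohens_h(0.65, 0.45), 2), round(cohens_h(0.25, 0.05), 2))
```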

The stem-and-leaf display of Cohen's h values is shown on the left, and the display is summarized on the right. Tukey (1977) developed the stem-and-leaf plot as a special form of frequency distribution to facilitate the inspection of a batch of data. Each number in the data batch is made up of one stem and one leaf, but each stem may serve several leaves. Thus, the stem .1 is followed by leaves of 3, 8, 8, representing the numbers .13, .18, .18. The first digit is the stem; the next digit is the leaf. The stem-and-leaf display functions as any other frequency distribution, but the original data are retained precisely.

Distribution. From Table 3 we see that the distribution of effect sizes is unimodal, with the bulk of the results (80%) falling between -.10 and .58. The distribution is nicely symmetrical, with the skewness index (g1=.17) only 24% of that required for significance at p<.05 (Snedecor & Cochran, 1980, pp. 78-79, 492). The tails of the distribution, however, are too long for normality, with kurtosis index g2=2.04, p=.02. Relative to what we would expect from a normal distribution, we have studies that show larger positive and larger negative effect sizes than would be reasonable. Indeed, the two largest positive effect sizes are significant outliers at p<.05, and the largest negative effect size approaches significance, with a Dixon index of .37 compared to one of .40 for the largest positive effect size (Snedecor & Cochran, 1980, pp. 279-280, 490). The total sample of studies is still small; however, if a much larger sample showed the same result, that would be a pattern consistent with the idea that both strong positive results ("psi") and strong negative results ("psi-missing") might be more likely to find their way into print or at least to be more available to a meta-analyst.
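A minimal sketch of how such a display and the accompanying distributional indices might be computed is given below (Python). The batch of h values is hypothetical, since Honorton's 28 actual estimates are not reproduced here, and the moment formulas are the simple descriptive versions rather than Snedecor & Cochran's bias-adjusted ones:

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Tukey-style stem-and-leaf display for two-decimal data. Each value
    is scaled to hundredths, with scaled = 10*stem + leaf, so every
    displayed value is recoverable as (10*stem + leaf)/100."""
    stems = defaultdict(list)
    for v in sorted(values):
        scaled = round(v * 100)
        stems[scaled // 10].append(scaled % 10)
    for stem in sorted(stems):
        leaves = "".join(str(leaf) for leaf in stems[stem])
        print(f"{stem:>4} | {leaves}")

def g1_g2(values):
    """Moment-based skewness (g1) and kurtosis (g2) indices, the
    quantities tested against normality in the text."""
    n = len(values)
    mean = sum(values) / n
    m2, m3, m4 = (sum((v - mean) ** k for v in values) / n for k in (2, 3, 4))
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

# Hypothetical batch of Cohen's h values, for illustration only.
batch = [-0.46, -0.10, 0.00, 0.13, 0.18, 0.18, 0.25, 0.32, 0.41, 0.58, 0.90]
stem_and_leaf(batch)
print("g1 = %.2f, g2 = %.2f" % g1_g2(batch))
```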

Effect size. The bulk of the results (82%) show a positive effect size, where 50% would be expected under the null (p=.0004). The mean effect size, h, of .28 is equivalent to having a direct hit rate of .38 when .25 was expected under the null. The 95% confidence interval suggests the likely range of effect sizes to be from .11 to .45, equivalent to accuracy of guessing rates of .30 to .46 when .25 was expected under the null hypothesis.

Significance testing. The overall probability that obtained accuracy was better than the accuracy expected under the null was a p of 3.37/10^11, associated with a Stouffer Z of 6.60 (Mosteller & Bush, 1954; Rosenthal, 1978a, 1984).

File drawer analysis. A combined p as low as that obtained can be used as a guide to the tolerance level for null results that never found their way into the meta-analytic data base (Rosenthal, 1979, 1984). It has long been believed that studies failing to reach statistical significance may be less likely to be published (Sterling, 1959; Rosenthal, 1966). Thus it may be that there is a residual of nonsignificant studies languishing in the investigators' file drawers. Employing simple calculations, it can be shown that, for the current studies summarized, there would have to be 423 studies with mean p=.50, one-tailed, or Z=0.00, in those file drawers before the overall combined p would become just greater than .05. That many studies unretrieved seems unlikely for this specialized area of parapsychology (Hyman, 1985; Honorton, 1985). Based on experience with meta-analyses in other domains of research (e.g., interpersonal expectancy effects), the mean Z or effect size for nonsignificant studies is not 0.00 but a value pulled strongly from 0.00 toward the mean Z or mean effect size of the obtained studies (Rosenthal & Rubin, 1978).
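The Stouffer combination and the file drawer tolerance just described follow from simple formulas, sketched below (Python; illustrative only):

```python
from math import sqrt

def stouffer_z(z_scores):
    """Stouffer's method: combined Z = (sum of study Z's) / sqrt(k)."""
    return sum(z_scores) / sqrt(len(z_scores))

def failsafe_n(sum_z: float, k: int, z_crit: float = 1.645) -> float:
    """Rosenthal's (1979) file drawer tolerance: the number X of
    unretrieved studies averaging Z = 0 that would pull the combined Z
    down to z_crit. Solves sum_z / sqrt(k + X) = z_crit for X."""
    return (sum_z / z_crit) ** 2 - k

# The 28 direct-hit studies combine to a Stouffer Z of 6.60, so the sum
# of their Z's is 6.60 * sqrt(28); the tolerance for filed-away null
# results is then about 423 studies, as reported in the text.
sum_z = 6.60 * sqrt(28)
print(round(failsafe_n(sum_z, 28)))  # -> 423
```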

Comparison to an Earlier Meta-Analysis

We felt it would be instructive to compare the results of the ganzfeld research meta-analysis by Honorton (1985) to the results of an older and larger meta-analysis of another controversial research domain -- that of interpersonal expectancy effects (Rosenthal & Rubin, 1978). In that analysis, eight areas of expectancy effects were summarized; effect sizes (Cohen's d, roughly equivalent to Cohen's h) ranged from .14 to 1.73, with a grand mean d of .70. Honorton's mean effect size (h=.28) exceeds the mean d of two of the eight areas (reaction time experiments [d=.17] and studies employing laboratory interviews [d=.14]). The earlier meta-analysis displayed the distribution of the Z's associated with the obtained p levels. Table 4 shows a comparison of the two meta-analyses' distributions of Z's. It is interesting to note the high degree of similarity in the distributions of significance levels. The total proportion of significant results is somewhat higher for the ganzfeld studies, but not significantly so, χ²(1)=1.07, N=373, p=.30, φ=.05.
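The test of that difference is an ordinary chi-square on a 2x2 table of study counts. The sketch below (Python) uses illustrative cell counts chosen only to approximate the reported totals of 28 ganzfeld and 345 expectancy studies; the actual Table 4 frequencies are not reproduced here, so the output is close to, but not exactly, the reported χ²(1)=1.07:

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df) for a 2x2 table of counts, with rows =
    meta-analysis (ganzfeld vs. expectancy) and columns = significant
    vs. not significant. Also returns phi = sqrt(chi2 / N)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, (chi2 / n) ** 0.5

# Illustrative counts only: 13 of 28 ganzfeld studies significant vs.
# 125 of 345 expectancy studies significant (N = 373 in total).
chi2, phi = chi_square_2x2(13, 15, 125, 220)
print(f"chi2(1) = {chi2:.2f}, phi = {phi:.2f}")
```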

Interpretation of Meta-Analytic Results

Although the results of the meta-analysis are clear, the meaning of these results is open to various interpretations (Truzzi, 1981). The most obvious interpretation might be that at a very low p, and with a fairly impressive effect size, the ganzfeld psi phenomenon has been demonstrated. However, there are rival hypotheses that will need to be considered, many of them put forward in the recent detailed evaluation of the ganzfeld research area by Hyman (1985).

Procedural Rival Hypotheses

Sensory leakage. A standard rival hypothesis to the hypothesis of ESP is that sensory leakage occurred and that the receiver was knowingly or unknowingly cued by the sender or by an intermediary between the sender and receiver. As early as 1895, Hansen and Lehmann (1895) had described "unconscious whispering" in the laboratory, and Kennedy (1938, 1939) was able to show that senders in telepathy experiments could give auditory cues to their receivers quite unwittingly. Ingenious use of parabolic sound reflectors made this demonstration possible. Moll (1898), Stratton (1921), and Warner and Raible (1937) all gave early warnings on the dangers of unintentional cueing (for summaries see Rosenthal, 1965a, 1966). The subtle kinds of cues described by these early workers were just the kind we have come to look for in searching for cues given off by experimenters that might serve to mediate the experimenter expectancy effects found in laboratory settings (Rosenthal, 1966, 1985).

By their nature, ganzfeld studies tend to minimize problems of sensory cueing. An exception occurs when the subject is asked to choose which of four (or more) stimuli had been "sent" by another person or agent. When the same stimuli held originally by the sender are shown to the receiver, finger smudges or other marks may serve as cues. Honorton has shown, however, that studies controlling for this type of cue yield at least as many significant effects as do the studies not controlling for this type of cue.

Recording errors. A second rival hypothesis has nearly as long a history. Kennedy and Uphoff (1939) and Sheffield and Kaufman (1952) both found biased errors in recording the data of parapsychological experiments. In a meta-analysis of 139,000 recorded observations in 21 studies, it was found that about 1% of all observations were in error and that, of the errors committed, twice as many favored the hypothesis as opposed it (Rosenthal, 1978b). While it is difficult to rule recording error out of ganzfeld studies

(or any other kind of research), their magnitude is such that they could probably have only a small biasing effect on the estimated average effect size (Rosenthal, 1978b, p. 1007).

Intentional error. The very recent history of science has reminded us that while fraud in science is not quite of epidemic proportion, it must be given close attention (Broad & Wade, 1982; Zuckerman, 1977). Fraud in parapsychological research has been a constant concern, a concern found justified by periodic flagrant examples (Rhine, 1975). In the analyses of Hyman (1985) and Honorton (1985), in any case, there appeared to be no relationship between degree of monitoring of participants and the results of the study.

Statistical Rival Hypotheses

File drawer issues. The problem of biased retrieval of studies for any meta-analysis was described earlier. Part of this problem is addressed by the 10-year-old norm of the Parapsychological Association of reporting negative results at its meetings and in its journals (Honorton, 1985). Part of this problem is addressed also by Blackmore, who conducted a survey to retrieve unreported ganzfeld studies. She found that 7 of her total of 19 studies (37%) were judged significant overall by the investigators. This proportion of significant results was not significantly (or appreciably) lower than the proportion of published studies found significant.

A problem that seems to be a special case of the file drawer problem was pointed out by Hyman (1985). That was a possible tendency to report the results of pilot studies along with subsequent significant results when the pilot data were significant. At the same time, it is possible that pilot studies were conducted without promising results, pilot studies that then

found their way into the file drawers. In any case, it is nearly impossible to have an accurate estimate of the number of unretrieved studies or pilot studies actually conducted. Chances seem good, however, that there would be fewer than the 423 results of mean Z=0.00 required to bring the overall combined p to >.05.

Multiple testing. Each ganzfeld study may have more than one dependent variable for scoring a success. If investigators employ these dependent variables sequentially until they find one significant at p<.05, the true p will be higher than .05 (Hyman, 1985). Although a simple Bonferroni procedure can be used to adjust for this problem (e.g., by multiplying the lowest obtained p by the number of dependent variables tested), this adjustment is quite conservative (Rosenthal & Rubin, 1983). The adjustment can be made with greater power if the investigators are willing to order or to rate their dependent variables on a dimension of importance (Rosenthal & Rubin, 1984, 1985). Most useful, however, is a procedure that uses all the data from all the dependent variables, with each one weighted as desired so long as the weighting is done before the data are collected (Rosenthal & Rubin, 1986).
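The simple Bonferroni procedure mentioned above amounts to a one-line computation, sketched here with hypothetical p values (Python; illustrative only):

```python
def bonferroni_adjust(p_values):
    """Simple Bonferroni correction for multiple testing: multiply each
    obtained p by the number of tests performed, capping at 1.0.
    Conservative, as the text notes (Rosenthal & Rubin, 1983)."""
    k = len(p_values)
    return [min(1.0, p * k) for p in p_values]

# Hypothetical example: five success indices tested sequentially; only
# the smallest p looks "significant" until it is adjusted for the five
# tests actually performed.
print(bonferroni_adjust([0.03, 0.20, 0.41, 0.55, 0.80]))
# The smallest adjusted p is 0.15, no longer below .05.
```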

Randomization. Hyman (1985) has noted that the target stimulus may not have been selected in a truly random way from the pool of potential targets. To the extent that this is the case, the p values calculated will be in error. Hyman (1985) and Honorton (1985) disagree over the frequency in this sample of studies of improper randomization. In addition, they disagree over the magnitude of the relationship between inadequate randomization and study outcome. Hyman felt this relationship to be significant and positive; Honorton felt this relationship to be nonsignificant and negative. Since the median significance level of just those 16 studies employing random number tables or generators (Z=.94) was essentially identical to that found for all 28 studies, it seems unlikely that poor randomization procedures were associated with much of an increase in significance level (Honorton, 1985, p. 71).

Statistical errors. Hyman (1985) and Honorton agree that six of the 28 studies contained statistical errors. However, the median effect size of these studies (h=.33) was very similar to the overall median (h=.32), so it seems unlikely that these errors had a major effect on the overall effect size estimate. Omitting these six studies from the analysis decreases the mean h from .28 to .26. Such a drop is equivalent to a drop of the mean accuracy rate from .38 to .37 when .25 is the expected value under the null.

Independence of studies. Because the 28 studies were conducted by only 10 investigators or laboratories, the 28 studies may not be independent in some sense. While under some data analytic assumptions such a lack of independence would have implications for significance testing, it does not in the ganzfeld domain because of the use of trials rather than subjects as the independent sampled unit of analysis. The overall significance level, then, depends on the results of all trials, not the number of studies, or subjects, or investigators (any of which may be viewed as fixed rather than random). However, the lack of independence of the studies could have implications for the estimation of effect sizes if a small proportion of the investigators were responsible for all the nonzero effects. In that case the average of the investigators' obtained effects would be much smaller than the average of the studies' obtained effects. In an extreme example, the median effect size of a sample of studies could be .50 while the median effect size of a sample of investigators could be zero, because very few investigators obtained any nonzero effect. That did not turn out to be the case for the ganzfeld domain.
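The equivalence between values of h and accuracy rates used in the preceding paragraphs (and in the Conclusion below) comes from inverting the arcsine transformation, as the following sketch shows (Python; illustrative only):

```python
from math import asin, sin, sqrt

def hit_rate_from_h(h: float, p_null: float = 0.25) -> float:
    """Invert Cohen's h: given h = 2*arcsin(sqrt(p)) - 2*arcsin(sqrt(p0)),
    recover the accuracy rate p implied by h against the chance rate p0."""
    return sin(asin(sqrt(p_null)) + h / 2) ** 2

for h in (0.28, 0.26, 0.18):
    print(f"h = {h:.2f} -> accuracy of about {hit_rate_from_h(h):.2f}")
# h = .28 and .26 reproduce the accuracy rates of .38 and .37 quoted in
# the text; h = .18, the shrunken estimate offered in the Conclusion,
# implies an accuracy rate of roughly 1/3.
```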

The median effect size (h) was identical (.32) for the 28 studies and the 10 investigators or laboratories. The mean effect sizes, however, did differ somewhat, with a lower mean for labs (.23) than for studies (.28). The proportions of results in the positive direction were very close: .82 for studies and .80 for labs. It is of interest to note that investigators did differ significantly from one another in the magnitude of the effects they obtained, with F(9, 18) = 3.81, p<.01, intraclass r = .63. There was little evidence to suggest, however, that those investigators tending to conduct more studies obtained higher mean effect sizes; the F(1, 18) testing that contrast was 0.38, p=.54, r=.14.

Conclusion

On the basis of our summary and the very valuable meta-analytic evaluations of Honorton (1985) and Hyman (1985), what are we to believe? The situation for the ganzfeld domain seems reasonably clear. We feel it would be implausible to entertain the null hypothesis given the combined p from these 28 studies. Given the various problems or flaws pointed out by Hyman and Honorton, the true effect size is almost surely smaller than the mean h of .28, equivalent to a mean accuracy of 38% when 25% is expected under the null. We are persuaded that the net result of statistical errors was a biased increase in estimated effect size of at least a full percentage point (from 37% to 38%). Furthermore, we are persuaded that file drawer problems are such that some of the smaller effect size results have probably been kept off the market. If pressed to estimate a more accurate effect size, we might think in terms of a shrinkage of h from the obtained value of .28 to perhaps an h of .18. Thus, when the accuracy rate expected under the null is 1/4, we estimate the obtained accuracy rate to be about 1/3.

Situational Taxonomy of Human Performance Technologies

In the previous sections we have reviewed domains of human performance research individually. We now turn to questions regarding these areas of research taken together: How do the areas compare with respect to their overall effect sizes and methodological adequacy in general? What are the important characteristics of these domains in terms of their susceptibility to expectancy effects? What is our best estimate of the "adjusted" or "true" effect size for each of these areas after taking into account the possibility of interpersonal expectancy effects and other methodological weaknesses?

In attempting to answer these questions, we developed a situational taxonomy of the five areas of SALT, NLP, mental practice, biofeedback, and ESP. This situational taxonomy is given in Table 5. The first line shows our estimates of the mean effect size (r) for each area based on our reviews of the literature. Given the diversity of these areas, these effect sizes are remarkably homogeneous, ranging from a low of .13 for biofeedback research to a high of .29 for SALT research. We repeat our caveat, though, that these effect sizes are not the products of exhaustive meta-analyses, and they are accurate estimates only to the extent that our samples of studies are representative of their populations. The next two lines of the table present the number of studies on which our analyses are based and the estimated total number of studies existing on the topic. These figures help in determining the stability of our estimates; we are most confident in our judgments of the ESP ganzfeld literature and least confident in our judgments of the biofeedback literature. It is important to remember that our reviews in some cases were

quite selective: Our discussion of NLP, for example, focused only on those studies that investigated the Preferred Representational System aspect of NLP theory, and our discussion of ESP focused only on studies of the ganzfeld technique that employed the criterion of direct hits.

The second part of Table 5 lists important exogenous factors of the studies, that is, elements of experimental design that are not necessarily part of the technique. The exogenous factors that we identified as being of particular importance are random assignment of subjects to experimental condition (or stimuli to condition in the case of ESP studies), keeping experimenters blind to the experimental condition of the subjects, setting up appropriate control groups (or comparison values in the case of ESP), and the length of experimenter-subject interaction. Of these factors, random assignment and experimenter blindness in particular are the most important in determining the possibility that exogenous expectancy effects could have occurred. Looking at Table 5, we see that the SALT studies do not compare favorably with the other areas with respect to these factors, and that only the ganzfeld ESP studies regularly meet the basic requirements of sound experimental design.

The third section of Table 5 lists relevant endogenous factors, or characteristics that are actually part of the human performance technology. Two endogenous factors seemed especially important: whether or not the subjects' self-expectancies play a major role, and the climate of the experimenter-subject interaction. Self-expectancies are an important part of SALT, mental practice, and biofeedback, and they may be important in ESP studies, as the literature suggests that larger effects are found with subjects who believe that ESP exists (Schmeidler, 1968). The domains characterized by

the warmest experimenter-subject climate, which we have seen to be a major component in the mediation of expectancy effects, are SALT and NLP. Mental practice and ESP studies are characterized by more formal and neutral experimenter-subject relations, and although biofeedback studies often take place in a therapeutic context, the quality of the experimental interaction is nevertheless usually formal and neutral.

The next line of the table presents our overall rating of the methodological quality of the research in these areas. These ratings were arrived at in a subjective manner, based on the factors listed in the table as well as our overall impression of the literatures. The scale employed is arbitrary, with a hypothetical maximum of 25; the absolute values of the quality ratings are less important than are the distances among the domains on this scale. As Table 5 shows, we have given SALT the lowest quality rating, followed by the areas of mental practice and NLP, which are close together in terms of quality; biofeedback and ESP are the two best areas in terms of methodological quality. Interestingly, there is a strong inverse relationship between the rated quality of an area and its mean effect size; the correlation coefficient is r(3)=-.85, p=.03, one-tailed.
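The significance test behind a correlation based on only five domains (hence 3 degrees of freedom) is the usual t test for r, sketched below (Python; illustrative only):

```python
from math import sqrt

def t_from_r(r: float, df: int) -> float:
    """t statistic for testing a Pearson correlation against zero:
    t = r * sqrt(df) / sqrt(1 - r^2), with df = n - 2."""
    return r * sqrt(df) / sqrt(1 - r ** 2)

# Five domains give n = 5, hence r(3): with |r| = .85, t(3) comes out
# near 2.8, which for 3 df is one-tailed significant at about p = .03.
print(round(t_from_r(0.85, 3), 2))
```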

The last line of Table 5 gives our estimate of the "residual" effect sizes for each of the five areas, that is, our judgment of what the "true" effect size for an area would be after adjusting it for any possible bias due to expectancy effects or methodological weaknesses. This adjustment was made on a qualitative basis rather than on the basis of any explicit weighting scheme, although clearly some of the factors listed in Table 5 (e.g., random assignment and experimenter blindness) were more influential in determining the residual effect size than were others (e.g., mean length and climate of interaction). We wish to emphasize that the values of these residual effect sizes are presented for purposes of illustration and should not be interpreted too literally. As can be seen, the degree of adjustment varied across the five domains; the largest drop was for the SALT domain, where the effect size decreased from .29 to .00. The smallest drop was for the biofeedback domain, where the effect size decreased from .13 to .10.

Several interesting relationships among the results of Table 5 are worthy of mention. The zero-order correlation between the original and residual effect size was r=-.104. The correlation between the original effect size and the quality rating was negative, r=-.847; however, the correlation between residual effect size and quality was positive, r=.306. The partial correlation between the original and residual effect size, controlling for the quality rating, was r=.307. Lastly, the partial correlation between the residual effect size and the quality rating, controlling for the original effect size, was r=.413.

The magnitudes of the effect sizes, both original and adjusted, for the five areas are not large. This is not surprising, for the five areas are all controversial, and one hallmark of a controversial area is a small effect size: Sometimes you get a positive result but sometimes you don't. If a research area always yielded large, significant effects there would be no controversy. We feel there are several important implications of the realization that these areas are characterized by small effect sizes. The first is that "small" does not mean "unimportant." Even the smallest (unadjusted) effect size, r=.13 for biofeedback, can be interpreted using the Binomial Effect Size Display (Rosenthal & Rubin, 1982) as an increase in success rates from 44% to 56% for subjects receiving biofeedback therapy. In short, even though the five areas may be associated with small effects, these effects nevertheless can be of substantial practical importance.
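The Binomial Effect Size Display itself involves only the arithmetic sketched below (Python; illustrative only):

```python
def besd(r: float) -> tuple[float, float]:
    """Binomial Effect Size Display (Rosenthal & Rubin, 1982): a
    correlation r is displayed as the difference between success rates
    of .50 - r/2 (control group) and .50 + r/2 (treatment group)."""
    return 0.50 - r / 2, 0.50 + r / 2

control, treated = besd(0.13)
print(f"success rates: {control:.3f} (control) vs. {treated:.3f} (treated)")
# r = .13 corresponds to roughly 44% vs. 56% when rounded, as in the text.
```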

Another implication involves the underlying distributions of these effects in the population. The effect sizes we have reported are means computed across multiple studies. We do not know what the underlying distributions of these effects are in the population. For example, does the mean (unadjusted) effect size r=.14 for the ganzfeld studies mean that ESP is normally distributed in the population, with most people exhibiting it to the tune of r=.14? Or is it the case that most people would show a zero effect and a small number of people would show a large effect, resulting in a mean r=.14? The information needed to decide among these and other alternatives is not available. However, the question of what the distribution of benefit looks like for these technologies is an important one and deserves attention. To discover the nature of these underlying distributions, researchers would need to test a large number of subjects over a long period of time. But this is information worth gathering, because the selection and training of subjects in these human performance technologies might be very different if we thought a given technology more or less affected all people in a normally distributed manner than if it affected only a portion of the population in a skewed manner.

The third important implication concerns the nature of replication. As stated above, these are controversial topics, and they are controversial in part because of the issue of replication failure. As it stands now, most researchers regard a failure to replicate as a study's not reaching the .05 level of significance. We suggest that rather than emphasizing significance levels in the assessment of replications, the focus should be on

the comparability of effect sizes. Thus the question becomes, "Do the studies obtain effect sizes of similar nonzero magnitude?" rather than "Do the studies all obtain statistically significant results?" Defining replication in terms of similarity of effect sizes would obviate arguments over whether a study that obtained a p=.06 was or was not a successful replication (Nelson, Rosenthal, & Rosnow, in press; Rosenthal, in press).

Suggestions for Future Research

Expectancy Control Designs

Throughout this paper, we have offered our opinion on the extent to which interpersonal expectancy effects may be responsible for the results of studies on various human performance technologies. Our approach has been necessarily speculative, as very few of these studies directly addressed the possibility that expectancy effects might be an important cause of the results. We have pointed out factors that lead us to believe that expectancy effects may have been occurring in several cases, but we were not present at the time the studies were conducted, and we do not have videotapes of the sessions. All we can conclude on the basis of the information available to us is that expectancy effects could have happened; we do not know that they did. However, we can make suggestions for designing future studies that would not only assess whether an expectancy effect was present but also would allow the direct comparison of the magnitude of expectancy effects versus the phenomenon of interest. This is accomplished through the use of an expectancy control design (Rosenthal, 1966; Rosenthal & Rosnow, 1984). In this design, experimenter expectancy becomes a second independent variable that is systematically varied along with the variable of theoretical interest. It is easiest to explain this design with a concrete example, and we will use as our
