and effect size for each of the four factors, again separately for the expectancy-behavior and behavior-outcome links. For the expectancy-behavior link, the four factors were highly statistically significant and associated with small to medium effect sizes: climate, r=.20; feedback, r=.13; input, r=.26; and output, r=.19. With respect to the behavior-outcome link, again all four factors were statistically significant, but in terms of effect size, feedback did not seem to be very important: climate, r=.36; feedback, r=.07; input, r=.33; and output, r=.20.

Human Performance Technologies and Expectancy Effects

We now turn to a more focused discussion of the possible influence of expectancy effects on research on techniques for the enhancement of human performance. In this next section, we (a) describe paradigmatic examples of each of five research areas concerned with improving human performance, and (b) offer opinions about the extent to which expectancy effects may be influencing research results in these areas. The five areas that will be covered are those targeted for evaluation by the Committee on Techniques for the Enhancement of Human Performance; these areas are research on accelerated learning, neurolinguistic programming, mental practice, biofeedback, and parapsychology. One caveat should be emphasized in advance: It is not possible for us to conduct meta-analyses of each of these areas; instead, we will have to rely on a light review of each area and focus on some examples of typical experiments. Consequently, we need to stress that our overall assessment is accurate only to the extent that our samples are representative. Meta-analyses of these domains would be of great value and should be undertaken for any domains for which they are not yet available.

Research on Accelerated Learning

Many techniques for accelerating learning have been recently advanced, techniques that claim to increase the rate or amount of learning by 200-300%. We discuss now one of these methods, the Suggestive-Accelerative Learning and Teaching (SALT) method, and offer our assessment of the extent to which expectancy effects could be responsible for the observed learning gains.

The SALT technique, an Americanized version of Lozanov's (1978) Suggestopedia technique, incorporates the use of suggestion techniques and unusual verbal and nonverbal styles of presenting material to accelerate learning. A SALT lesson comprises three primary phases: preliminaries, presentation, and practice. In the preliminary phase, the students are first led through a series of physical relaxation exercises (e.g., stretching, side bends, head flops, and progressively tensing and relaxing all muscles). Next comes a series of mental relaxation exercises, typically guided imagery exercises such as "imagine that you are lying in a meadow watching white clouds going by." The goal of the relaxation procedures is to overcome any emotional or physical barriers to learning that might have arisen from past negative learning experiences.

The last part of the preliminary phase is of particular relevance to expectancy effects, for it involves the explicit induction of positive expectancies for learning. The teacher repeatedly stresses to the class that the SALT technique makes learning easy and fun, and that as long as the students go along with the interesting things the teacher has them do, they will find themselves learning better than they had ever imagined possible. Schuster & Gritton (1985) give an example of the communication of positive
expectations:

"Imagine that we have come to the end of today's lesson and you are now taking the short quiz over the material. See yourself looking at the quiz questions; they are easy, you know all the answers! Feel yourself smiling as you write down the answers quickly to all the easy questions. Hear yourself talking to your friends later about how easy learning is in this class..." (p. 191).

Another aspect of this phase is "early pleasant learning restimulation," or inducing a positive attitude toward learning by asking students to remember some prior experience where learning was exciting and fun, for example, learning to ride a bicycle. In this phase of the SALT technique, then, expectancy effects are not an experimental artifact but rather are an explicit part of the experimental manipulation. Note, however, that these expectations are intrapersonal rather than interpersonal; they are the students' self-expectancies for their performance.

The second phase of the SALT process is the presentation of the material. The presentation consists of three sections: first, there is a brief review/preview, providing students with a global impression of the content of the day's lesson. Next comes dramatic presentation of the material. The teacher uses dynamic vocal intonation to present the material; for example, the first sentence is spoken in a normal tone of voice, the second sentence is shouted, the third sentence is whispered, and the cycle is repeated. Lively and engaging classical music (such as Beethoven's "Emperor" Concerto) is played at the same volume as the teacher's voice. At the same time, the teacher instructs students to create vivid images associated with the material. The teacher then repeats the material just presented, but this time in a soft, passive voice with baroque music playing in the background. (For reasons not clearly specified in any of the articles we surveyed but having something
to do with properties of the tonal frequencies, baroque music is supposedly particularly effective.) The goal of the passive review is to increase the students' alpha waves and to induce both hemispheres of the brain to work in tandem, thus allowing the utilization of previously untapped potential.

The third and final phase of a SALT lesson involves the active practice of the material by the students. This can consist of more conventional classroom exercises (e.g., problem sets) or more imaginative activities (e.g., creating skits or writing stories using the new material). Lastly, lessons may conclude with an ungraded quiz. Students generally perform very well on these tests, increasing their confidence, and the fact that the test scores are not seen or recorded by the teacher reduces student apprehension.

We now turn to an evaluation of the research on SALT. Let us begin our review by describing a study with particularly weak methodology. Garcia (1984) wanted to test SALT on a large class of adults learning English as a second language. Rather than randomly assigning students to the experimental and control conditions, though, she instead described the procedures for the two conditions to the 80 subjects and asked them to choose which class they preferred: the traditional teaching control class or the experimental SALT class! This fatal error in itself renders any conclusions completely suspect: If any difference is obtained between the two conditions, we cannot tell whether it was due to the efficacy of the treatment or to the fact that different kinds of students chose to go into the two sections. It seems entirely plausible that the students who are more receptive to learning would choose to go into the experimental condition. The experimental manipulation in this study included relaxation exercises, positive suggestions, active and passive presentation, and
practice. The author was the instructor for both classes and consequently was not blind to the hypotheses or experimental condition of the students. (This is a serious problem in terms of expectancy effects that is true of all the studies on SALT, and which we will discuss in more detail later.) The next serious error committed by this author was in the analysis of the results. Because of "the large number of subjects," she selected only eight subjects from each group for analysis. The statistical power afforded by 16 subjects is so low that the author practically guaranteed that she would not obtain significant results. She found that students in the experimental group improved more than the students in the control group, but the improvement was nonsignificant, t(14)=1.40. This t, however, is associated with an effect size of r=.35, a nontrivial effect. Had she used the data from all the subjects, the results would probably have been significant. (However, we cannot trust her t value very much, as the means she reported in the text do not correspond to the values in her table.) In sum, from beginning to end we cannot be confident of Garcia's results. The question of to what extent expectancy effects may be responsible for the results is almost moot.
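
To make the conversion concrete: for a t test, the corresponding effect size is r = sqrt(t^2 / (t^2 + df)). A minimal sketch in Python of this standard conversion (the function name and the printed check are ours, added for illustration):

    import math

    def r_from_t(t, df):
        """Effect size r corresponding to a t statistic with df degrees of freedom."""
        return math.sqrt(t**2 / (t**2 + df))

    # Garcia (1984): t(14) = 1.40 corresponds to r of about .35, as noted above.
    print(round(r_from_t(1.40, 14), 2))  # 0.35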

Now we will turn to the best example (methodologically speaking) we found of a study on the SALT technique (Gritton & Benitez-Bordon, 1976). In this study, SALT techniques were used by the first author in his 8th grade science classes (10 sections, 213 students total); two other junior high schools were used as control classes (106 students total). Consequently, neither students nor schools were randomly assigned to condition, again leaving open the possibility that preexisting differences among the classrooms or students could be responsible for any obtained results. The experimental manipulation consisted of using SALT techniques (exercises, relaxation, early pleasant learning restimulation, active and passive presentation of material) on 15 occasions throughout the semester; traditional classroom procedures were followed on the other days. The control classrooms used traditional teaching methods. The same standard text, pretest, and posttest were used in all classrooms. Analysis of the pretest scores showed that the experimental classrooms scored significantly lower than the control classrooms. Analysis of covariance, adjusting posttest scores for the pretest, revealed a significant treatment effect, F(1,314)=7.69, r=.155. The adjusted posttest means were 13.55 for the experimental group and 11.21 for the two control groups combined. A contrast comparing the experimental group to the control groups computed on the gain scores (an analysis similar in spirit to the ANCOVA) yielded an even more significant treatment effect, F(1,314)=22.16, r=.257.

Therefore, the Gritton & Benitez-Bordon (1976) study, which utilized better controls and analyses, suggests a small to medium positive effect of the SALT technique. However, this study was not without its own flaws. Again, there was no randomization of students to condition, and the author of the study delivered the manipulation and was not blind to the experimental condition of the students. These are characteristics that leave open the possibility of alternative hypotheses, including expectancy effects. Furthermore, experimental treatment was completely confounded with teacher, so any significant results could be due simply to the characteristics of the different teachers rather than to the SALT technique itself.
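
The same logic converts the F tests just cited: for an F with 1 df in the numerator, r = sqrt(F / (F + df_error)). A minimal check in Python (function name ours):

    import math

    def r_from_f(f, df_error):
        """Effect size r for an F statistic with 1 numerator degree of freedom."""
        return math.sqrt(f / (f + df_error))

    # Gritton & Benitez-Bordon (1976): both reported effect sizes follow.
    print(round(r_from_f(7.69, 314), 3))   # 0.155 (ANCOVA)
    print(round(r_from_f(22.16, 314), 3))  # 0.257 (contrast on gain scores)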

The remaining empirical articles we have examined tend to fall somewhere in between the two examples described above. We are not able to conduct a thorough review of all these studies, but some description is warranted to convey a better impression of the literature. Table 2 shows in summary form the results of all the empirical articles we had available to us. The second column of the table shows the effect sizes, expressed as the correlation coefficient r, illustrating the degree of effectiveness of SALT obtained in the various studies. We estimated these correlations from the data provided by the authors; we corrected statistical errors when they could be identified before computing the effect size. The last two columns of Table 2 show how the effect sizes can be interpreted using the BESD. For example, the r(12)=.38 for the Zeiss (1984) study can be interpreted using the BESD as meaning that receiving SALT is equivalent to increasing student improvement rates from 31% to 69%. Glancing at the effect sizes for all the studies, we see that they range from a low of -.131 (meaning that the result was in the opposite direction) to a high of .672; the mean of the 14 correlations was .29. The mean correlation weighted by sample size was somewhat lower, r=.193.
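
The BESD itself is simple arithmetic: a correlation r is displayed as the difference between two success rates, .50 - r/2 and .50 + r/2 (Rosenthal & Rubin, 1982). A minimal sketch (function name ours):

    def besd(r):
        """Binomial Effect Size Display: the pair of success rates implied by r."""
        return (0.50 - r / 2, 0.50 + r / 2)

    # Zeiss (1984), r = .38: improvement rates of 31% versus 69%, as cited above.
    low, high = besd(0.38)
    print(round(low, 2), round(high, 2))  # 0.31 0.69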

What can we conclude from this cursory review of the SALT literature? There are two issues to address: the first is the general methodological adequacy of the studies, and the second (of more relevance to the goals of this paper) is the extent to which effects of SALT may actually be due to expectancy effects. In terms of general methodological adequacy, the studies reviewed all possess weaknesses that pose serious threats to the ability to draw causal inferences about the efficacy of SALT. Only a single study randomly assigned subjects to conditions (experimental or control classroom), the most crucial ingredient for causal inference. Consequently, any differences found could have been caused by pre-existing differences between the two conditions or selection bias influencing which students got into which condition. Furthermore, most of the studies used only one classroom per condition. This also sheds doubt on SALT as the causal agent, for any differences could conceivably have been caused by any external change or event occurring in one of the classes, influencing all the students within the class. Research of this kind is more ideally conducted by having many classrooms involved in the project and using the classroom (rather than the student) as the unit of analysis; students within a classroom may not be independent in the statistical sense, and it can be misleading to consider them so, as the sketch below illustrates.
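
To illustrate the unit-of-analysis point, the following sketch aggregates student scores to classroom means and treats the classroom as the sampled unit; the scores here are invented for illustration:

    from statistics import mean

    # Hypothetical posttest scores grouped by classroom (the data are invented).
    salt_rooms = [[14, 12, 15, 13], [11, 13, 12, 14], [15, 14, 16, 13]]
    control_rooms = [[11, 12, 10, 13], [12, 11, 13, 12], [10, 12, 11, 12]]

    # Classroom means become the observations, so n is the number of
    # classrooms, not students, respecting the non-independence of students.
    salt_means = [mean(room) for room in salt_rooms]
    control_means = [mean(room) for room in control_rooms]
    print(salt_means, control_means)  # a t test would compare these 3 + 3 means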

An additional weakness of these studies is the small number of teachers used. In many cases, one teacher taught the experimental SALT class, and another teacher taught the control class. As noted earlier, such a design completely confounds treatment with teacher; any obtained differences could be due to SALT or they could be due to other, irrelevant differences between the two teachers. In other studies, there was one teacher who taught both the control and experimental classes. This removes the confound just discussed but introduces other serious problems, primarily of generalizability: When there is only one teacher or experimenter, any results obtained cannot be readily generalized beyond that particular teacher. An improved design of these studies would employ several teachers (at least four to ten) and have them teach several classes each.

On the basis of the preceding discussion, we conclude that the empirical evidence on SALT is so methodologically weak that it remains an open question as to whether SALT is effective, a conclusion that makes asking about interpersonal expectancy effects as a possible rival hypothesis less urgent. Suppose, however, that we pretend that the results of these studies can be trusted. To what extent, then, and in what ways could the beneficial effects of SALT be due to interpersonal expectancy effects? To answer this question we need to make the distinction between expectancy effects that are exogenous to SALT (i.e., they are expectancies communicated unintentionally as a consequence of poor experimental design and controls) and expectancy effects that are endogenous to the SALT technique itself (i.e., they are an intrinsic and intended part of SALT). This distinction is important because different courses of action would be recommended for the two types of effects: For exogenous effects, we would suggest improvements in experimental methods in order to eliminate expectancy effects. For endogenous effects, on the other hand, we would want to acknowledge the role of expectancies and see if we could apply the literature on expectancy effects to the SALT technique to make it even more effective.

There is a very real possibility of exogenous expectancy effects in the SALT research. As noted earlier, the teachers were always aware of the hypotheses and experimental condition of the students; because they believed in the SALT technique, they undoubtedly expected better performance from the subjects in the SALT condition. These expectations could have been communicated clearly to the students, either overtly or subtly. Given the nature of the SALT technique, it is difficult to conceive of an experimental design in which teachers could be blind to the condition of the students. (That is, we could not conceal from the teachers which style of teaching they were using!) It would also be difficult to keep teachers from guessing the hypotheses that were being tested. Is there any way, then, that the threat of exogenous expectancy effects could be eliminated? Perhaps one approach would be to use teachers naive to SALT and manipulate expectations for its efficacy. For example, one group of teachers could be given typical instructions indicating that SALT is a promising new teaching method, and other teachers
could be told that many studies have shown SALT to be worse than traditional techniques, but that we want to give it one last try. Another approach would be to divide up the teaching responsibilities and have a different teacher (one who did not know whether the students were in the experimental or control group) be in charge of administering the pretests and posttests. A third approach would be to automate as much of the SALT process as possible, for example, creating audiotapes of the warmup exercises or the presentation of the material. None of these approaches solves the problem completely, but they would help.

Clearly, endogenous expectancy effects play a prominent role in SALT in the guise of the positive self-expectancies elicited in the students. Inducing positive expectations for learning is an explicit part of the SALT procedure. In terms of the four-factor theory, the mediation of expectancies in SALT involves primarily the climate and input factors, with climate being by far the most important factor. Teachers using SALT deliberately adopt a warm, friendly interpersonal style; they praise and encourage frequently. Also present are nonverbal behaviors that go into the climate factor, for example, smiles, dynamic voice tone, speech rate, body gestures, and eye contact. With respect to input, the SALT system may increase input because each lesson is presented twice, once in an active manner and once in a passive manner. Looking back to Table 1, we see that most of these behaviors were strongly implicated in the behavior-outcome link of the mediation of expectancy effects. Specifically, positive climate, praise, eye contact, input, gestures, smiles, speech rate, and encourages had combined correlations with improved student outcomes of .399, .124, .325, .332, .310, .291, .480, and .410, respectively. These values are on the whole larger than the magnitude of the
effects reported in research on SALT. Given the incorporation of so many of the mediating behaviors in the SALT technique, and given the literature showing the positive impact of these behaviors on student performance, it is possible that the reported effects of SALT could be due entirely to the presence of these mediating behaviors. We could test this possibility conclusively by designing SALT studies in which the presence or absence of the endogenous expectancies is experimentally manipulated. That is, we could have a condition in which the explicit induction of positive expectations during the preliminary phase is deliberately omitted. This condition could also use tape-recorded relaxation exercises and class material to minimize expectancies communicated during the presentation phase. We could then compare the results found in this condition against those found for the regular SALT technique. If the effects for the experimental condition (the one where endogenous expectancies are eliminated) were significantly lower, it would indicate that a substantial portion of the effects due to SALT might be caused by the expectations communicated implicitly or explicitly by the teacher. Such a conclusion would be of great value in planning and implementing programs for accelerating learning, as research could be directed to delineating more precisely the behaviors that communicate positive expectancies and to training teachers in using these behaviors.

Neurolinguistic Programming

Neurolinguistic programming (NLP) was formulated by Bandler & Grinder (1975, 1979) with the aim of improving interpersonal communication, particularly within the counseling context. The basic premise of NLP is that individuals process ongoing events in the world through specific

[...]

unknowingly cued by the sender or by an intermediary between the sender and receiver. As early as 1895, Hansen and Lehmann (1895) had described "unconscious whispering" in the laboratory, and Kennedy (1938, 1939) was able to show that senders in telepathy experiments could give auditory cues to their receivers quite unwittingly. Ingenious use of parabolic sound reflectors made this demonstration possible. Moll (1898), Stratton (1921), and Warner and Raible (1937) all gave early warnings on the dangers of unintentional cueing (for summaries see Rosenthal, 1965a, 1966). The subtle kinds of cues described by these early workers were just the kind we have come to look for in searching for cues given off by experimenters that might serve to mediate the experimenter expectancy effects found in laboratory settings (Rosenthal, 1966, 1985).

By their nature, ganzfeld studies tend to minimize problems of sensory cueing. An exception occurs when the subject is asked to choose which of four (or more) stimuli had been "sent" by another person or agent. When the same stimuli held originally by the sender are shown to the receiver, finger smudges or other marks may serve as cues. Honorton has shown, however, that studies controlling for this type of cue yield at least as many significant effects as do the studies not controlling for this type of cue.

Recording errors. A second rival hypothesis has nearly as long a history. Kennedy and Uphoff (1939) and Sheffield and Kaufman (1952) both found biased errors of recording the data of parapsychological experiments. In a meta-analysis of 139,000 recorded observations in 21 studies, it was found that about 1% of all observations were in error and that, of the errors committed, twice as many favored the hypothesis as opposed it (Rosenthal, 1978b). While it is difficult to rule recording error out of ganzfeld studies
(or any other kind of research), their magnitude is such that they could probably have only a small biasing effect on the estimated average effect size (Rosenthal, 1978b, p. 1007).

Intentional error. The very recent history of science has reminded us that while fraud in science is not quite of epidemic proportion, it must be given close attention (Broad & Wade, 1982; Zuckerman, 1977). Fraud in parapsychological research has been a constant concern, a concern found justified by periodic flagrant examples (Rhine, 1975). In the analyses of Hyman (1985) and Honorton (1985), in any case, there appeared to be no relationship between degree of monitoring of participants and the results of the study.

Statistical Rival Hypotheses

File drawer issues. The problem of biased retrieval of studies for any meta-analysis was described earlier. Part of this problem is addressed by the 10-year-old norm of the Parapsychological Association of reporting negative results at its meetings and in its journals (Honorton, 1985). Part of this problem is addressed also by Blackmore, who conducted a survey to retrieve unreported ganzfeld studies. She found that 7 of her total of 19 studies (37%) were judged significant overall by the investigators. This proportion of significant results was not significantly (or appreciably) lower than the proportion of published studies found significant.

A problem that seems to be a special case of the file drawer problem was pointed out by Hyman (1985). That was a possible tendency to report the results of pilot studies along with subsequent significant results when the pilot data were significant. At the same time, it is possible that pilot studies were conducted without promising results, pilot studies that then
found their way into the file drawers. In any case, it is nearly impossible to have an accurate estimate of the number of unretrieved studies or pilot studies actually conducted. Chances seem good, however, that there would be fewer than the 423 results of mean Z=0.00 required to bring the overall combined p to >.05.
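
The figure of 423 studies is an instance of the file drawer calculation (Rosenthal, 1979): under the Stouffer method, the combined Z for k studies is (sum of the Z's)/sqrt(k), so the number X of unretrieved mean-Z=0.00 results needed to pull the combined p above .05 solves sum_z / sqrt(k + X) = 1.645. A sketch follows; the Stouffer Z of 6.6 for the 28 studies is our back-calculation from the numbers given here, not a value reported in this passage:

    import math

    def fail_safe_n(sum_z, k, z_crit=1.645):
        """Number of mean-Z=0 studies needed to raise the combined p above .05."""
        return (sum_z / z_crit) ** 2 - k

    # A combined (Stouffer) Z of 6.6 across 28 studies implies
    # sum_z = 6.6 * sqrt(28), giving a fail-safe N of about 423.
    print(round(fail_safe_n(6.6 * math.sqrt(28), 28)))  # 423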

Multiple testing. Each ganzfeld study may have more than one dependent variable for scoring a success. If investigators employ these dependent variables sequentially until they find one significant at p<.05, the true p will be higher than .05 (Hyman, 1985). Although a simple Bonferroni procedure can be used to adjust for this problem (e.g., by multiplying the lowest obtained p by the number of dependent variables tested), this adjustment is quite conservative (Rosenthal & Rubin, 1983). The adjustment can be made with greater power if the investigators are willing to order or to rate their dependent variables on a dimension of importance (Rosenthal & Rubin, 1984, 1985). Most useful, however, is a procedure that uses all the data from all the dependent variables, with each one weighted as desired so long as the weighting is done before the data are collected (Rosenthal & Rubin, 1986).

Randomization. Hyman (1985) has noted that the target stimulus may not have been selected in a truly random way from the pool of potential targets. To the extent that this is the case, the p values calculated will be in error. Hyman (1985) and Honorton (1985) disagree over the frequency in this sample of studies of improper randomization. In addition, they disagree over the magnitude of the relationship between inadequate randomization and study outcome. Hyman felt this relationship to be significant and positive; Honorton felt this relationship to be nonsignificant and negative. Since the median significance level of just those 16 studies employing random number tables or generators (Z=.94) was essentially identical to that found for all 28 studies, it seems unlikely that poor randomization procedures were associated with much of an increase in significance level (Honorton, 1985, p. 71).

Statistical errors. Hyman (1985) and Honorton agree that six of the 28 studies contained statistical errors. However, the median effect size of these studies (h=.33) was very similar to the overall median (h=.32), so it seems unlikely that these errors had a major effect on the overall effect size estimate. Omitting these six studies from the analysis decreases the mean h from .28 to .26. Such a drop is equivalent to a drop of the mean accuracy rate from .38 to .37 when .25 is the expected value under the null.
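
The effect size h used for the ganzfeld studies is Cohen's h, the difference between arcsine-transformed proportions: h = 2*arcsin(sqrt(p1)) - 2*arcsin(sqrt(p2)). A minimal sketch reproducing the correspondences cited above (function name ours):

    import math

    def cohens_h(p1, p2):
        """Cohen's h: difference between arcsine-transformed proportions."""
        return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

    # Accuracy rates against the .25 null: .38 gives h of about .28,
    # and .37 gives about .26, matching the drop described above.
    print(round(cohens_h(0.38, 0.25), 2))  # 0.28
    print(round(cohens_h(0.37, 0.25), 2))  # 0.26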

Independence of studies. Because the 28 studies were conducted by only 10 investigators or laboratories, the 28 studies may not be independent in some sense. While under some data-analytic assumptions such a lack of independence would have implications for significance testing, it does not in the ganzfeld domain because of the use of trials rather than subjects as the independently sampled unit of analysis. The overall significance level, then, depends on the results of all trials, not the number of studies, or subjects, or investigators (any of which may be viewed as fixed rather than random). However, the lack of independence of the studies could have implications for the estimation of effect sizes if a small proportion of the investigators were responsible for all the nonzero effects. In that case the average of the investigators' obtained effects would be much smaller than the average of the studies' obtained effects. In an extreme example, the median effect size of a sample of studies could be .50 while the median effect size of a sample of investigators could be zero, because very few investigators obtained any nonzero effect. That did not turn out to be the case for the ganzfeld domain. The median effect size (h) was identical (.32) for the 28 studies and the 10 investigators or laboratories. The mean effect sizes, however, did differ somewhat, with a lower mean for labs (.23) than for studies (.28). The proportions of results in the positive direction were very close: .82 for studies and .80 for labs. It is of interest to note that investigators did differ significantly from one another in the magnitude of the effects they obtained, with F(9,18)=3.81, p<.01, intraclass r=.63. There was little evidence to suggest, however, that those investigators tending to conduct more studies obtained higher mean effect sizes; the F(1,18) testing that contrast was 0.38, p=.54, r=.14.

Conclusion

On the basis of our summary and the very valuable meta-analytic evaluations of Honorton (1985) and Hyman (1985), what are we to believe? The situation for the ganzfeld domain seems reasonably clear. We feel it would be implausible to entertain the null given the combined p from these 28 studies. Given the various problems or flaws pointed out by Hyman and Honorton, the true effect size is almost surely smaller than the mean h of .28, equivalent to a mean accuracy of 38% when 25% is expected under the null. We are persuaded that the net result of statistical errors was a biased increase in estimated effect size of at least a full percentage point (from 37% to 38%). Furthermore, we are persuaded that file drawer problems are such that some of the smaller effect size results have probably been kept off the market. If pressed to estimate a more accurate effect size, we might think in terms of a shrinkage of h from the obtained value of .28 to perhaps an h of .18. Thus, when the accuracy rate expected under the null is 1/4, we estimate the
obtained accuracy rate to be about 1/3.

Situational Taxonomy of Human Performance Technologies

In the previous sections we have reviewed domains of human performance research individually. We now turn to questions regarding these areas of research taken together: How do the areas compare with respect to their overall effect sizes and methodological adequacy in general? What are the important characteristics of these domains in terms of their susceptibility to expectancy effects? What is our best estimate of the "adjusted" or "true" effect size for each of these areas after taking into account the possibility of interpersonal expectancy effects and other methodological weaknesses?

In attempting to answer these questions, we developed a situational taxonomy of the five areas of SALT, NLP, mental practice, biofeedback, and ESP. This situational taxonomy is given in Table 5. The first line shows our estimates of the mean effect size (r) for each area based on our reviews of the literature. Given the diversity of these areas, these effect sizes are remarkably homogeneous, ranging from a low of .13 for biofeedback research to a high of .29 for SALT research. We repeat our caveat, though, that these effect sizes are not the products of exhaustive meta-analyses, and they are accurate estimates only to the extent that our samples of studies are representative of their populations. The next two lines of the table present the number of studies on which our analyses are based and the estimated total number of studies existing on the topic. These figures help in determining the stability of our estimates; we are most confident in our judgments of the ESP ganzfeld literature and least confident in our judgments of the biofeedback literature. It is important to remember that our reviews in some cases were
quite selective: Our discussion of NLP, for example, focused only on those studies that investigated the Preferred Representational System aspect of NLP theory, and our discussion of ESP focused only on studies of the ganzfeld technique that employed the criterion of direct hits.

The second part of Table 5 lists important exogenous factors of the studies, that is, elements of experimental design that are not necessarily part of the technique. The exogenous factors that we identified as being of particular importance are random assignment of subjects to experimental condition (or stimuli to condition in the case of ESP studies), keeping experimenters blind to the experimental condition of the subjects, setting up appropriate control groups (or comparison values in the case of ESP), and the length of experimenter-subject interaction. Of these factors, random assignment and experimenter blindness in particular are the most important in determining the possibility that exogenous expectancy effects could have occurred. Looking at Table 5, we see that the SALT studies do not compare favorably with the other areas with respect to these factors, and that only the ganzfeld ESP studies regularly meet the basic requirements of sound experimental design.

The third section of Table 5 lists relevant endogenous factors, or characteristics that are actually part of the human performance technology. Two endogenous factors seemed especially important: whether or not the subjects' self-expectancies play a major role, and the climate of the experimenter-subject interaction. Self-expectancies are an important part of SALT, mental practice, and biofeedback, and they may be important in ESP studies, as the literature suggests that larger effects are found with subjects who believe that ESP exists (Schmeidler, 1968). The domains characterized by
the warmest experimenter-subject climate, which we have seen to be a major component in the mediation of expectancy effects, are SALT and NLP. Mental practice and ESP studies are characterized by more formal and neutral experimenter-subject relations, and although biofeedback studies often take place in a therapeutic context, the quality of the experimental interaction is nevertheless usually formal and neutral.

The next line of the table presents our overall rating of the methodological quality of the research in these areas. These ratings were arrived at in a subjective manner, based on the factors listed in the table as well as our overall impression of the literatures. The scale employed is arbitrary, with a hypothetical maximum of 25; the absolute values of the quality ratings are less important than are the distances among the domains on this scale. As Table 5 shows, we have given SALT the lowest quality rating, followed by the areas of mental practice and NLP, which are close together in terms of quality; biofeedback and ESP are the two best areas in terms of methodological quality. Interestingly, there is a strong inverse relationship between the rated quality of an area and its mean effect size; the correlation coefficient is r(3)=-.85, p=.03, one-tailed.

The last line of Table 5 gives our estimate of the "residual" effect sizes for each of the five areas, that is, our judgment of what the "true" effect size for an area would be after adjusting it for any possible bias due to expectancy effects or methodological weaknesses. This adjustment was made on a qualitative basis rather than on the basis of any explicit weighting scheme, although clearly some of the factors listed in Table 5 (e.g., random assignment and experimenter blindness) were more influential in determining the residual effect size than were others (e.g., mean length and climate of
interaction). We wish to emphasize that the values of these residual effect sizes are presented for purposes of illustration and should not be interpreted too literally. As can be seen, the degree of adjustment varied across the five domains; the largest drop was for the SALT domain, where the effect size decreased from .29 to .00. The smallest drop was for the biofeedback domain, where the effect size decreased from .13 to .10.

Several interesting relationships among the results of Table 5 are worthy of mention. The zero-order correlation between the original and residual effect size was r=-.104. The correlation between the original effect size and the quality rating was negative, r=-.847; however, the correlation between residual effect size and quality was positive, r=.306. The partial correlation between the original and residual effect size, controlling for the quality rating, was r=.307. Lastly, the partial correlation between the residual effect size and the quality rating, controlling for the original effect size, was r=.413.
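
These partial correlations follow from the standard formula r_xy.z = (r_xy - r_xz*r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). A minimal sketch verifying both reported values from the three zero-order correlations (function name ours):

    import math

    def partial_r(r_xy, r_xz, r_yz):
        """Partial correlation of x and y, controlling for z."""
        return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

    r_orig_resid, r_orig_qual, r_resid_qual = -0.104, -0.847, 0.306

    print(round(partial_r(r_orig_resid, r_orig_qual, r_resid_qual), 3))  # 0.307
    print(round(partial_r(r_resid_qual, r_orig_resid, r_orig_qual), 3))  # 0.412 (the reported .413, to rounding)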

The magnitudes of the effect sizes, both original and adjusted, for the five areas are not large. This is not surprising, for the five areas are all controversial, and one hallmark of a controversial area is a small effect size: Sometimes you get a positive result, but sometimes you don't. If a research area always yielded large, significant effects, there would be no controversy. We feel there are several important implications of the realization that these areas are characterized by small effect sizes. The first is that "small" does not mean "unimportant." Even the smallest (unadjusted) effect size, r=.13 for biofeedback, can be interpreted using the Binomial Effect Size Display (Rosenthal & Rubin, 1982) as an increase in success rates from 44% to 56% for subjects receiving biofeedback therapy. In short, even though the five areas may be associated with small effects, these effects nevertheless can be of substantial practical importance.

Another implication involves the underlying distributions of these effects in the population. The effect sizes we have reported are means computed across multiple studies. We do not know what the underlying distributions of these effects are in the population. For example, does the mean (unadjusted) effect size r=.14 for the ganzfeld studies mean that ESP is normally distributed in the population, with most people exhibiting it to the tune of r=.14? Or is it the case that most people would show a zero effect and a small number of people would show a large effect, resulting in a mean r=.14? The information needed to decide among these and other alternatives is not available. However, the question of what the distribution of benefit looks like for these technologies is an important one and deserves attention. To discover the nature of these underlying distributions, researchers would need to test a large number of subjects over a long period of time. But this is information worth gathering, because the selection and training of subjects in these human performance technologies might be very different if we thought a given technology more or less affected all people in a normally distributed manner than if it affected only a portion of the population in a skewed manner.

The third important implication concerns the nature of replication. As stated above, these are controversial topics, and they are controversial in part because of the issue of replication failure. As it stands now, most researchers regard a study's failure to reach the .05 level of significance as a failure to replicate. We suggest that rather than emphasizing significance levels in the assessment of replications, the focus should be on the comparability of effect sizes. Thus the question becomes, "Do the studies obtain effect sizes of similar nonzero magnitude?" rather than "Do the studies all obtain statistically significant results?" Defining replication in terms of similarity of effect sizes would obviate arguments over whether a study that obtained a p=.06 was or was not a successful replication (Nelson, Rosenthal, & Rosnow, in press; Rosenthal, in press).
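
One concrete way to ask whether two studies obtain effect sizes of similar magnitude is to compare their correlations through the Fisher z transformation; the test below is our illustration of that general approach, not a procedure described in the text, and the two studies are hypothetical:

    import math

    def compare_rs(r1, n1, r2, n2):
        """Z statistic testing whether two independent correlations differ."""
        z1, z2 = math.atanh(r1), math.atanh(r2)
        return (z1 - z2) / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))

    # Two hypothetical replications: similar nonzero effect sizes give a small
    # Z here, even if one study taken alone fails to reach p < .05.
    print(round(compare_rs(0.35, 20, 0.29, 60), 2))  # 0.24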

Suggestions for Future Research

Expectancy Control Designs

Throughout this paper, we have offered our opinion on the extent to which interpersonal expectancy effects may be responsible for the results of studies on various human performance technologies. Our approach has been necessarily speculative, as very few of these studies directly addressed the possibility that expectancy effects might be an important cause of the results. We have pointed out factors that lead us to believe that expectancy effects may have been occurring in several cases, but we were not present at the time the studies were conducted, and we do not have videotapes of the sessions. All we can conclude on the basis of the information available to us is that expectancy effects could have happened; we do not know that they did.

However, we can make suggestions for designing future studies that would not only assess whether an expectancy effect was present but also would allow the direct comparison of the magnitude of expectancy effects versus the phenomenon of interest. This is accomplished through the use of an expectancy control design (Rosenthal, 1966; Rosenthal & Rosnow, 1984). In this design, experimenter expectancy becomes a second independent variable that is systematically varied along with the variable of theoretical interest. It is easiest to explain this design with a concrete example, and we will use as our