and effect size for each of the four factors, again separately for the
expectancy-behavior and behavior-outcome links. For the expectancy-behavior
link, the four factors were highly statistically significant and associated
with small to medium effect sizes: climate, r=.20; feedback, r=.13; input,
r=.26; and output, r=.19. With respect to the behavior-outcome link, again all
four factors were statistically significant, but in terms of effect size,
feedback did not seem to be very important: climate, r=.36; feedback, r=.07;
input, r=.33; and output, r=.20.
Human Performance Technologies and Expectancy Effects
We now turn to a more focused discussion of the possible influence of
expectancy effects on research on techniques for the enhancement of human
performance. In this next section, we (a) describe paradigmatic examples of
each of five research areas concerned with improving human performance, and
(b) offer opinions about the extent to which expectancy effects may be
influencing research results in these areas. The five areas that will be
covered are those targeted for evaluation by the Committee on Techniques for
the Enhancement of Human Performance; these areas are research on accelerated
learning, neurolinguistic programming, mental practice, biofeedback, and
parapsychology. One caveat should be emphasized in advance: It is not possible
for us to conduct meta-analyses of each of these areas; instead, we will have
to rely on a light review of each area and focus on some examples of typical
experiments. Consequently, we need to stress that our overall assessment is
accurate only to the extent that our samples are representative. Meta-analyses
of these domains would be of great value and should be undertaken for any
domains for which they are not yet available.
Research on Accelerated Learning
Many techniques for accelerating learning have been recently advanced,
techniques that claim to increase the rate or amount of learning by 200-300%.
We now discuss one of these methods, the Suggestive-Accelerative Learning and
Teaching (SALT) method, and offer our assessment of the extent to which
expectancy effects could be responsible for the observed learning gains.
The SALT technique, an Americanized version of Lozanov's (1978)
Suggestopedia technique, incorporates the use of suggestion techniques and
unusual verbal and nonverbal styles of presenting material to accelerate
learning. A SALT lesson comprises three primary phases: preliminaries,
presentation, and practice. In the preliminary phase, the students are first
led through a series of physical relaxation exercises (e.g., stretching,
side bends, head flops, and progressively tensing and relaxing all muscles).
Next comes a series of mental relaxation exercises, typically guided imagery
exercises such as "imagine that you are lying in a meadow watching white
clouds going by." The goal of the relaxation procedures is to overcome any
emotional or physical barriers to learning that might have arisen from past
negative learning experiences.
The last part of the preliminary phase is of particular relevance to
expectancy effects, for it involves the explicit induction of positive
expectancies for learning. The teacher repeatedly stresses to the class that
the SALT technique makes learning easy and fun, and that as long as the
students go along with the interesting things the teacher has them do, they
will find themselves learning better than they had ever imagined possible.
Schuster & Gritton (1985) give an example of the communication of positive
expectations:
"Imagine that we have come to the end of today's lesson and you
are now taking the short quiz over the material. See yourself looking
at the quiz questions; they are easy, you know all the answers! Feel
yourself smiling as you write down the answers quickly to all the
easy questions. Hear yourself talking to your friends later about how
easy learning is in this class..." (p. 191).
Another aspect of this phase is "early pleasant learning restimulation,"
or inducing a positive attitude toward learning by asking students to remember
some prior experience where learning was exciting and fun, for example,
learning to ride a bicycle. In this phase of the SALT technique, then,
expectancy effects are not an experimental artifact but rather are an explicit
part of the experimental manipulation. Note, however, that these expectations
are intrapersonal rather than interpersonal; they are the students'
self-expectancies for their performance.
The second phase of the SALT process is the presentation of the material.
The presentation consists of three sections; first, there is a brief
review/preview, providing students with a global impression of the content of
the day's lesson. Next comes dramatic presentation of the material. The
teacher uses dynamic vocal intonation to present the material; for example,
the first sentence is spoken in normal tone of voice, the second sentence is
shouted, the third sentence is whispered, and the cycle is repeated. Lively
and engaging classical music (such as Beethoven's "Emperor" Concerto) is
played at the same volume as the teacher's voice. At the same time, the
teacher instructs students to create vivid images associated with the
material.
The teacher then repeats the material just presented, but this time in a
soft, passive voice with baroque music playing in the background. (For reasons
not clearly specified in any of the articles we surveyed but having something
to do with properties of the tonal frequencies, baroque music is supposedly
particularly effective.) The goal of the passive review is to increase the
students' alpha waves and to induce both hemispheres of the brain to work in
tandem, thus allowing the utilization of previously untapped potential.
The third and final phase of a SALT lesson involves the active practice
of the material by the students. This can consist of more conventional
classroom exercises (e.g., problem sets) or more imaginative activities (e.g.,
creating skits or writing stories using the new material). Lastly, lessons may
conclude with an ungraded quiz. Students generally perform very well on these
tests, increasing their confidence, and the fact that the test scores are not
seen or recorded by the teacher reduces student apprehension.
We now turn to an evaluation of the research on SALT. Let us begin our
review by describing a study with particularly weak methodology. Garcia (1984)
wanted to test SALT on a large class of adults learning English as a second
language. Rather than randomly assigning students to the experimental and
control conditions, though, she instead described the procedures for the two
conditions to the 80 subjects and asked them to choose which class they
preferred: the traditional teaching control class, or the experimental SALT
class! This fatal error in itself renders any conclusions completely suspect:
If any difference is obtained between the two conditions, we cannot tell
whether it was due to the efficacy of the treatment or to the fact that
different kinds of students chose to go into the two sections. It seems
entirely plausible that the students who are more receptive to learning would
choose to go into the experimental condition.
The experimental manipulation in this study included relaxation
exercises, positive suggestions, active and passive presentation, and
practice. The author was the instructor for both classes and consequently was
not blind to the hypotheses or experimental condition of the students. (This
is a serious problem in terms of expectancy effects that is true of all the
studies on SALT, and which we will discuss in more detail later.) The next
serious error committed by this author was in the analysis of the results.
Because of "the large number of subjects," she selected only eight subjects
from each group for analysis. The statistical power afforded by 16 subjects is
so low that the author practically guaranteed that she would not obtain
significant results. She found that students in the experimental group
improved more than the students in the control group, but the improvement was
nonsignificant, t(14)=1.40. This t, however, is associated with an effect size
of r=.35, a nontrivial effect. Had she used the data from all the subjects,
the results would probably have been significant. (However, we cannot trust
her t value very much, as the means she reported in the text do not correspond
to the values in her table.) In sum, from beginning to end we cannot be
confident of Garcia's results. The question of to what extent expectancy
effects may be responsible for the results is almost moot.
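
The effect size quoted for Garcia's t can be checked with the conventional conversion r = sqrt(t^2 / (t^2 + df)). The text does not state which formula was used, so the following Python sketch is an assumption that happens to reproduce the reported value; the function name r_from_t is ours.

```python
import math

def r_from_t(t: float, df: int) -> float:
    """Convert a t statistic to the effect size r = sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t ** 2 / (t ** 2 + df))

# Garcia (1984): t(14) = 1.40, based on the 16 subjects actually analyzed
r = r_from_t(1.40, 14)
print(round(r, 2))  # 0.35
```

Because r does not depend directly on the number of subjects the way a significance test does, the same r = .35 obtained from all 80 subjects would have produced a much larger t.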
Now we will turn to the best example (methodologically speaking) we found
of a study on the SALT technique (Gritton & Benitez-Bordon, 1976). In this
study, SALT techniques were used by the first author in his 8th grade science
classes (10 sections, 213 students total); two other junior high schools were
used as control classes (106 students total). Consequently, neither students
nor schools were randomly assigned to condition, again leaving open the
possibility that preexisting differences among the classrooms or students
could be responsible for any obtained results. The experimental manipulation
consisted of using SALT techniques (exercises, relaxation, early pleasant
learning restimulation, active and passive presentation of material) on 15
occasions throughout the semester; traditional classroom procedures were
followed on the other days. The control classrooms used traditional teaching
methods. The same standard text, pretest, and posttest were used in all
classrooms. Analysis of the pretest scores showed that the experimental
classrooms scored significantly lower than the control rooms. Analysis of
covariance, adjusting posttest scores for the pretest, revealed a significant
treatment effect, F(1,314)=7.69, r=.155. The adjusted posttest means were
13.55 for the experimental group and 11.21 for the two control groups
combined. A contrast comparing the experimental group to the control groups
computed on the gain scores (an analysis similar in spirit to the ANCOVA)
yielded an even more significant treatment effect, F(1,314)=22.16, r=.257.
Therefore, the Gritton & Benitez-Bordon (1976) study, which utilized better
controls and analyses, suggests a small to medium positive effect of the SALT
technique.
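
The two effect sizes reported for this study are consistent with the usual conversion for an F test with one numerator degree of freedom, r = sqrt(F / (F + df_error)). The sketch below is illustrative only; the helper name r_from_f is an assumption, not something taken from the original study.

```python
import math

def r_from_f(f: float, df_error: int) -> float:
    """Effect size r for an F with 1 numerator df: sqrt(F / (F + df_error))."""
    return math.sqrt(f / (f + df_error))

# ANCOVA on adjusted posttest scores: F(1,314) = 7.69
print(round(r_from_f(7.69, 314), 3))   # 0.155
# Contrast on gain scores: F(1,314) = 22.16
print(round(r_from_f(22.16, 314), 3))  # 0.257
```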
However, this study was not without its own flaws. Again, there was no
randomization of students to condition, and the author of the study delivered
the manipulation and was not blind to the experimental condition of the students.
These are characteristics that leave open the possibility of alternative
hypotheses including expectancy effects. Furthermore, experimental treatment
was completely confounded with teacher, so any significant results could be
due simply to the characteristics of the different teachers rather than to the
SALT technique itself.
The remaining empirical articles we have examined tend to fall somewhere
in between the two examples described above. We are not able to conduct a
thorough review of all these studies, but some description is warranted to
convey a better impression of the literature. Table 2 shows in summary form
the results of all the empirical articles we had available to us. The second
column of the table shows the effect sizes, expressed as the correlation
coefficient r, illustrating the degree of effectiveness of SALT obtained in
the various studies. We estimated these correlations from the data provided by
the authors; we corrected statistical errors when they could be identified
before computing the effect size. The last two columns of Table 2 show how the
effect sizes can be interpreted using the BESD. For example, the r(12)=.38 for
the Zeiss (1984) study can be interpreted using the BESD as meaning that
receiving SALT is equivalent to increasing student improvement rates from 31%
to 69%. Glancing at the effect sizes for all the studies, we see that they
range from a low of -.131 (meaning that the result was in the opposite
direction) to a high of .672; the mean of the 14 correlations was .29. The
mean correlation weighted by sample size was somewhat lower, r=.193.
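
The BESD interpretation given above for the Zeiss (1984) result follows from displaying r as the difference between two "success" rates centered on 50%, that is, .50 - r/2 and .50 + r/2. A minimal Python sketch, with a function name of our own choosing:

```python
def besd(r: float) -> tuple[float, float]:
    """Binomial effect size display: (control rate, treatment rate)."""
    return 0.5 - r / 2, 0.5 + r / 2

# Zeiss (1984): r = .38
control_rate, treatment_rate = besd(0.38)
print(f"{control_rate:.0%} -> {treatment_rate:.0%}")  # 31% -> 69%
```

The difference between the two displayed rates always equals r itself, which is what makes the display a convenient reading of a correlation.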
What can we conclude from this cursory review of the SALT literature?
There are two issues to address: the first is the general methodological
adequacy of the studies, and the second (of more relevance to the goals of
this paper) is the extent to which effects of SALT may actually be due to
expectancy effects. In terms of general methodological adequacy, the studies
reviewed all possess weaknesses that pose serious threats to the ability to
draw causal inferences about the efficacy of SALT. Only a single study
randomly assigned subjects to conditions (experimental or control classroom),
the most crucial ingredient for causal inference. Consequently, any
differences found could have been caused by pre-existing differences between
the two conditions or selection bias influencing which students got into which
condition. Furthermore, most of the studies used only one classroom per
condition. This also casts doubt on SALT as the causal agent, for any
differences could conceivably have been caused by any external change or event
occurring in one of the classes, influencing all the students within the
class. Research of this kind is more ideally conducted by having many
classrooms involved in the project and using classroom (rather than student)
as a unit of analysis. Students within a classroom may not be independent in
the statistical sense, and it can be misleading to consider them so.
An additional weakness of these studies is the small number of teachers
used. In many cases, one teacher taught the experimental SALT class, and
another teacher taught the control class. As noted earlier, such a design
completely confounds treatment with teacher; any obtained differences could be
due to SALT or they could be due to other, irrelevant differences between the
two teachers. In other studies, there was one teacher who taught both the
control and experimental classes. This removes the confound just discussed but
introduces other serious problems, primarily of generalizability: When there
is only one teacher or experimenter, any results obtained cannot be readily
generalized beyond that particular teacher. An improved design of these
studies would employ several teachers (at least four to ten) and have them
teach several classes each.
On the basis of the preceding discussion, we conclude that the empirical
evidence on SALT is so methodologically weak that it remains an open question
as to whether SALT is effective, a conclusion that makes asking about
interpersonal expectancy effects as a possible rival hypothesis less urgent.
Suppose, however, that we pretend that the results of these studies can be
trusted. To what extent, then, and in what ways could the beneficial effects
of SALT be due to interpersonal expectancy effects? To answer this question,
we need to make the distinction between expectancy effects that are exogenous
to SALT (i.e., they are expectancies communicated unintentionally as a
consequence of poor experimental design and controls) and expectancy effects
that are endogenous to the SALT technique itself (i.e., they are an intrinsic
and intended part of SALT). This distinction is important because different
courses of action would be recommended for the two types of effects: For
exogenous effects, we would suggest improvements in experimental methods in
order to eliminate expectancy effects. For endogenous effects, on the other
hand, we would want to acknowledge the role of expectancies and see if we
could apply the literature on expectancy effects to the SALT technique to make
it even more effective.
There is a very real possibility of exogenous expectancy effects in the
SALT research. As noted earlier, the teachers were always aware of the
hypotheses and experimental condition of the students; because they believed
in the SALT technique, they undoubtedly expected better performance from the
subjects in the SALT condition. These expectations could have been
communicated clearly to the students, either overtly or subtly. Given the
nature of the SALT technique, it is difficult to conceive of an experimental
design in which teachers could be blind to the condition of the students.
(That is, we could not conceal from the teachers which style of teaching they
were using.) It would also be difficult to keep teachers from guessing the
hypotheses that were being tested. Is there any way, then, that the threat of
exogenous expectancy effects could be eliminated? Perhaps one approach would
be to use teachers naive to SALT and manipulate expectations for its efficacy.
For example, one group of teachers could be given typical instructions
indicating that SALT is a promising new teaching method, and other teachers
could be told that many studies have shown that SALT was worse than
traditional techniques, but that the investigators want to give it one last
try. Another approach
would be to divide up the teaching responsibilities, and have a different
teacher (one who did not know whether the students were in the experimental or
control group) be in charge of administering the pretests and posttests. A
third approach would be to automate as much of the SALT process as possible,
for example, creating audiotapes of the warmup exercises or the presentation
of the material. None of these approaches solves the problem completely, but
they would help.
Clearly, endogenous expectancy effects play a prominent role in SALT in
the guise of the positive self-expectancies elicited in the students. Inducing
positive expectations for learning is an explicit part of the SALT procedure.
In terms of the four factor theory, the mediation of expectancies in SALT
involves primarily the climate and input factors, with climate being by far
the most important factor. Teachers using SALT deliberately adopt a warm,
friendly interpersonal style; they praise and encourage frequently. Also
present are nonverbal behaviors that go into the climate factor, for example,
smiles, dynamic voice tone, speech rate, body gestures, and eye contact. With
respect to input, the SALT system may increase input because each lesson is
presented twice, once in an active manner and once in a passive manner.
Looking back to Table 1, we see that most of these behaviors were strongly
implicated in the behavior-outcome link of the mediation of expectancy
effects. Specifically, positive climate, praise, eye contact, input, gestures,
smiles, speech rate, and encourages had combined correlations with improved
student outcomes of .399, .124, .325, .332, .310, .291, .480, and .410
respectively. These values are on the whole larger than the magnitude of the
effects reported in research on SALT.
Given the incorporation of so many of the mediating behaviors in the SALT
technique, and given the literature showing the positive impact of these
behaviors on student performance, it is possible that the reported effects of
SALT could be due entirely to the presence of these mediating behaviors. We
could test this possibility conclusively by designing SALT studies where the
presence or absence of the endogenous expectancies is experimentally
manipulated. That is, we could have a condition in which the explicit
induction of positive expectations during the preliminary phase is
deliberately omitted. This condition could also use tape-recorded relaxation
exercises and class material to minimize expectancies communicated during the
presentation phase. We could then compare the results found in this condition
against those found for the regular SALT technique. If the effects for the
experimental condition (the one where endogenous expectancies are eliminated)
were significantly lower, it would indicate that a substantial portion of the
effects due to SALT might be caused by the expectations communicated
implicitly or explicitly by the teacher. Such a conclusion would be of great
value in planning and implementing programs for accelerating learning as
research could be directed to delineating more precisely the behaviors that
communicate positive expectancies and to training teachers in using these
behaviors.
Neurolinguistic Programming
Neurolinguistic programming (NLP) was formulated by Bandler & Grinder
(1975, 1979) with the aim of improving interpersonal communication,
particularly within the counseling context. The basic premise of NLP is that
individuals process ongoing events in the world through specific
[...]
unknowingly cued by the sender or by an intermediary between the sender and
receiver. As early as 1895, Hansen and Lehmann (1895) had described
"unconscious whispering" in the laboratory and Kennedy (1938, 1939) was able
to show that senders in telepathy experiments could give auditory cues to
their receivers quite unwittingly. Ingenious use of parabolic sound reflectors
made this demonstration possible. Moll (1898), Stratton (1921), and Warner and
Raible (1937) all gave early warnings on the dangers of unintentional cueing
(for summaries see Rosenthal, 1965a, 1966). The subtle kinds of cues described
by these early workers were just the kind we have come to look for in
searching for cues given off by experimenters that might serve to mediate the
experimenter expectancy effects found in laboratory settings (Rosenthal, 1966,
1985).
By their nature, ganzfeld studies tend to minimize problems of sensory
cueing. An exception occurs when the subject is asked to choose which of four
(or more) stimuli had been "sent" by another person or agent. When the same
stimuli held originally by the sender are shown to the receiver, finger
smudges or other marks may serve as cues. Honorton has shown, however, that
studies controlling for this type of cue yield at least as many significant
effects as do the studies not controlling for this type of cue.
Recording errors. A second rival hypothesis has nearly as long a
history. Kennedy and Uphoff (1939) and Sheffield and Kaufman (1952) both found
biased errors of recording the data of parapsychological experiments. In a
meta-analysis of 139,000 recorded observations in 21 studies, it was found
that about 1% of all observations were in error and that, of the errors
committed, twice as many favored the hypothesis as opposed it (Rosenthal,
1978b). While it is difficult to rule recording errors out of ganzfeld studies
(or any other kind of research), their magnitude is such that they could
probably have only a small biasing effect on the estimated average effect size
(Rosenthal, 1978b, p. 1007).
Intentional error. The very recent history of science has reminded us
that while fraud in science is not quite of epidemic proportion it must be
given close attention (Broad & Wade, 1982; Zuckerman, 1977). Fraud in
parapsychological research has been a constant concern, a concern found
justified by periodic flagrant examples (Rhine, 1975). In the analyses of
Hyman (1985) and Honorton (1985), in any case, there appeared to be no
relationship between degree of monitoring of participants and the results of
the study.
Statistical Rival Hypotheses
File drawer issues. The problem of biased retrieval of studies for any
meta-analysis was described earlier. Part of this problem is addressed by the
10-year-old norm of the Parapsychological Association of reporting negative
results at its meetings and in its journals (Honorton, 1985). Part of this
problem is addressed also by Blackmore, who conducted a survey to retrieve
unreported ganzfeld studies. She found that 7 of her total of 19 studies (37%)
were judged significant overall by the investigators. This proportion of
significant results was not significantly (or appreciably) lower than the
proportion of published studies found significant.
A problem that seems to be a special case of the file drawer problem was
pointed out by Hyman (1985). That was a possible tendency to report the
results of pilot studies along with subsequent significant results when the
pilot data were significant. At the same time it is possible that pilot
studies were conducted without promising results, pilot studies that then
found their way into the file drawers. In any case, it is nearly impossible to
have an accurate estimate of the number of unretrieved studies or pilot
studies actually conducted. Chances seem good, however, that there would be
fewer than the 423 results of mean Z=0.00 required to bring the overall
combined p to >.05.
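
The logic of this file drawer estimate can be sketched in code. Assuming the combined p comes from the Stouffer method (sum of the Zs divided by the square root of the number of results), adding X results with Z = 0 leaves the sum unchanged but enlarges the denominator. The Z scores below are hypothetical and are not the ganzfeld data, so this sketch illustrates the calculation but does not reproduce the figure of 423; the function names are also ours.

```python
import math
from statistics import NormalDist

def fail_safe_n(z_values, alpha=0.05):
    """Number of added Z = 0 results at which the Stouffer combined
    one-tailed p reaches alpha (more than this pushes p above alpha)."""
    z_crit = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = .05
    return (sum(z_values) / z_crit) ** 2 - len(z_values)

def combined_z(z_values, n_null=0):
    """Stouffer combined Z after adding n_null results with Z = 0."""
    return sum(z_values) / math.sqrt(len(z_values) + n_null)

# Hypothetical Z scores for four retrieved studies (NOT the ganzfeld data)
zs = [2.0, 1.5, 2.5, 1.0]
x = fail_safe_n(zs)
print(math.ceil(x))  # smallest whole number of null results giving p > .05
```

With these made-up values, 14 added null results still leave the combined Z above 1.645, whereas 15 bring it below.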
Multiple testing. Each ganzfeld study may have more than one dependent
variable for scoring a success. If investigators employ these dependent
variables sequentially until they find one significant at p<.05, the true p
will be higher than .05 (Hyman, 1985). Although a simple Bonferroni procedure
can be used to adjust for this problem (e.g., by multiplying the lowest
obtained p by the number of dependent variables tested), this adjustment is
quite conservative (Rosenthal & Rubin, 1983). The adjustment can be made with
greater power if the investigators are willing to order or to rate their
dependent variables on a dimension of importance (Rosenthal & Rubin, 1984,
1985). Most useful, however, is a procedure that uses all the data from all
the dependent variables with each one weighted as desired so long as the
weighting is done before the data are collected (Rosenthal & Rubin, 1986).
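
The simple Bonferroni adjustment described above can be written in a few lines. The p values in this sketch are hypothetical, and the function name is an assumption introduced for illustration; the ordered and weighted procedures of Rosenthal & Rubin are not implemented here.

```python
def bonferroni_min_p(p_values):
    """Simple Bonferroni adjustment: multiply the smallest obtained p
    by the number of dependent variables tested, capped at 1.0."""
    k = len(p_values)
    return min(1.0, min(p_values) * k)

# Hypothetical p values for three dependent variables in one study
p_values = [0.03, 0.20, 0.45]
adjusted = bonferroni_min_p(p_values)
print(round(adjusted, 2))  # 0.09: no longer significant at .05
```

The example shows why the adjustment is conservative: a result that looked significant at p = .03 is no longer so once three dependent variables are taken into account.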
Randomization. Hyman (1985) has noted that the target stimulus may not
have been selected in a truly random way from the pool of potential targets.
To the extent that this is the case the p values calculated will be in error.
Hyman (1985) and Honorton (1985) disagree over the frequency in this sample of
studies of improper randomization. In addition, they disagree over the
magnitude of the relationship between inadequate randomization and study
outcome. Hyman felt this relationship to be significant and positive; Honorton
felt this relationship to be nonsignificant and negative. Since the median
level of just those 16 studies employing random number tables or generators
(Z=.94) was essentially identical to that found for all 28 studies, it seems
unlikely that poor randomization procedures were associated with much of an
increase in significance level (Honorton, 1985, p. 71).
Statistical errors. Hyman (1985) and Honorton agree that six of the 28
studies contained statistical errors. However, the median effect size of these
studies (h=.33) was very similar to the overall median (h=.32) so that it
seems unlikely that these errors had a major effect on the overall effect size
estimate. Omitting these six studies from the analysis decreases the mean h
from .28 to .26. Such a drop is equivalent to a drop of the mean accuracy rate
from .38 to .37 when .25 is the expected value under the null.
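
The correspondence between h and accuracy rates quoted here can be checked if h is taken to be Cohen's arcsine-difference index, h = 2 arcsin(sqrt(p1)) - 2 arcsin(sqrt(p2)). That reading is an assumption on our part, but it reproduces the stated figures. A minimal sketch:

```python
import math

def cohens_h(p_obtained: float, p_null: float) -> float:
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p_obtained)) - 2 * math.asin(math.sqrt(p_null))

# Accuracy of .38 versus .25 expected under the null (all 28 studies)
print(round(cohens_h(0.38, 0.25), 2))  # 0.28
# Accuracy of .37 versus .25 (six studies with statistical errors omitted)
print(round(cohens_h(0.37, 0.25), 2))  # 0.26
```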
Independence of studies. Because the 28 studies were conducted by only
10 investigators or laboratories, the 28 studies may not be independent in
some sense. While under some data analytic assumptions such a lack of
independence would have implications for significance testing, it does not in
the ganzfeld domain because of the use of trials rather than subjects as the
independent sampled unit of analysis. The overall significance level, then,
depends on the results of all trials, not the number of studies, or subjects,
or investigators (any of which may be viewed as fixed rather than random).
However, the lack of independence of the studies could have implications
for the estimation of effect sizes if a small proportion of the investigators
were responsible for all the nonzero effects. In that case the average of the
investigators' obtained effects would be much smaller than the average of the
studies' obtained effects. In an extreme example the median effect size of a
sample of studies could be .50 while the median effect size of a sample of
investigators could be zero because very few investigators obtained any
nonzero effect. That did not turn out to be the case for the ganzfeld domain.
The median effect size (h) was identical (.32) for the 28 studies and the 10
investigators or laboratories. The mean effect sizes, however, did differ
somewhat with a lower mean for labs (.23) than for studies (.28). The
proportions of results in the positive direction were very close: .82 for
studies and .80 for labs.
It is of interest to note that investigators did differ significantly
from one another in the magnitude of the effects they obtained with F(9, 18) =
3.81, p<.01, intra-class r = .63. There was little evidence to suggest,
however, that those investigators tending to conduct more studies obtained
higher mean effect sizes; the F(1, 18) testing that contrast was 0.38, p=.54,
r=.14.
Conclusion
On the basis of our summary and the very valuable meta-analytic
evaluations of Honorton (1985) and Hyman (1985), what are we to believe? The
situation for the ganzfeld domain seems reasonably clear. We feel it would be
implausible to entertain the null given the combined p from these 28 studies.
Given the various problems or flaws pointed out by Hyman and Honorton, the
true effect size is almost surely smaller than the mean h of .28, equivalent to
a mean accuracy of 38% when 25% is expected under the null. We are persuaded
that the net result of statistical errors was a biased increase in estimated
effect size of at least a full percentage point (from 37% to 38%).
Furthermore, we are persuaded that file drawer problems are such that some of
the smaller effect size results have probably been kept off the market. If
pressed to estimate a more accurate effect size we might think in terms of a
shrinkage of h from the obtained value of .28 to perhaps an h of .18. Thus,
when the accuracy rate expected under the null is 1/4, we estimate the
obtained accuracy rate to be about 1/3.
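
The step from a shrunken h of .18 to an accuracy rate of about 1/3 can be verified by inverting the arcsine transformation, again on the assumption that h is Cohen's h. The function name below is ours.

```python
import math

def accuracy_from_h(h: float, p_null: float) -> float:
    """Invert Cohen's h: the proportion whose arcsine-transformed value
    lies h above that of p_null."""
    phi = h + 2 * math.asin(math.sqrt(p_null))
    return math.sin(phi / 2) ** 2

# Obtained mean h of .28 with a null accuracy of 1/4
print(round(accuracy_from_h(0.28, 0.25), 2))  # 0.38
# Shrunken estimate of h = .18
print(round(accuracy_from_h(0.18, 0.25), 2))  # 0.33, i.e., about 1/3
```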
Situational Taxonomy of Human Performance Technologies
In the previous sections we have reviewed domains of human performance
research individually. We now turn to questions regarding these areas of
research taken together: How do the areas compare with respect to their
overall effect sizes and methodological adequacy in general? What are the
important characteristics of these domains in terms of their susceptibility to
expectancy effects? What is our best estimate of the "adjusted" or "true"
effect size for each of these areas after taking into account the possibility
of interpersonal expectancy effects and other methodological weaknesses?
In attempting to answer these questions, we developed a situational
taxonomy of the five areas of SALT, NLP, mental practice, biofeedback, and
ESP. This situational taxonomy is given in Table 5. The first line shows our
estimates of the mean effect size (r) for each area based on our reviews of
the literature. Given the diversity of these areas, these effect sizes are
remarkably homogeneous, ranging from a low of .13 for biofeedback research to
a high of .29 for SALT research. We repeat our caveat, though, that these
effect sizes are not the products of exhaustive meta-analyses, and they are
accurate estimates only to the extent that our samples of studies are
representative of their populations. The next two lines of the table present
the number of studies on which our analyses are based and the estimated total
number of studies existing on the topic. These figures help in determining the
stability of our estimates; we are most confident in our judgments of the ESP
ganzfeld literature and least confident in our judgments of the biofeedback
literature. It is important to remember that our reviews in some cases were
quite selective: Our discussion of NLP, for example, focused only on those
studies that investigated the Preferred Representational System aspect of NLP
theory, and our discussion of ESP focused only on studies of the ganzfeld
technique that employed the criterion of direct hits.
The second part of Table 5 lists important exogenous factors of the
studies, that is, elements of experimental design that are not necessarily
part of the technique. The exogenous factors that we identified as being of
particular importance are random assignment of subjects to experimental
condition (or stimuli to condition in the case of ESP studies), keeping
experimenters blind to the experimental condition of the subjects, setting up
appropriate control groups (or comparison values in the case of ESP), and the
length of experimenter-subject interaction. Of these factors, random
assignment and experimenter blindness in particular are the most important in
determining the possibility that exogenous expectancy effects could have
occurred. Looking at Table 5, we see that the SALT studies do not compare
favorably with the other areas with respect to these factors, and that only
the ganzfeld ESP studies regularly meet the basic requirements of sound
experimental design.
The third section of Table 5 lists relevant endogenous factors, or
characteristics that are actually part of the human performance technology.
Two endogenous factors seemed especially important: whether or not the
subjects' self-expectancies play a major role, and the climate of the
experimenter-subject interaction. Self-expectancies are an important part of
SALT, mental practice, and biofeedback, and they may be important in ESP
studies as the literature suggests that larger effects are found with subjects
who believe that ESP exists (Schmeidler, 1968). The domains characterized by
the warmest experimenter-subject climate, which we have seen to be a major
component in the mediation of expectancy effects, are SALT and NLP. Mental
practice and ESP studies are characterized by more formal and neutral
experimenter-subject relations, and although biofeedback studies often take
place in a therapeutic context, the quality of the experimental interaction is
nevertheless usually formal and neutral.
The next line of the table presents our overall rating of the
methodological quality of the research in these areas. These ratings were
arrived at in a subjective manner, based on the factors listed in the table as
well as our overall impression of the literatures. The scale employed is
arbitrary, with a hypothetical maximum of 25; the absolute values of the
quality ratings are less important than are the distances among the domains on
this scale. As Table 5 shows, we have given SALT the lowest quality rating,
followed by the areas of mental practice and NLP, which are close together in
terms of quality; biofeedback and ESP are the two best areas in terms of
methodological quality. Interestingly, there is a strong inverse relationship
between the rated quality of an area and its mean effect size; the correlation
coefficient is r(3)=-.85, p=.03, one-tailed.
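The significance level attached to this correlation can be checked with a minimal sketch. It assumes the conventional t test for a correlation with df = 3 (five areas minus two) and integrates the t density numerically; the helper names are ours, and this is not necessarily the computation the authors used.

```python
import math

def t_from_r(r, df):
    """t statistic for testing a correlation r against zero, df = N - 2."""
    return r * math.sqrt(df / (1 - r * r))

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def one_tailed_p(t, df, steps=20000):
    """Tail area beyond |t|, using Simpson's rule on [0, |t|] (steps is even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

t = t_from_r(-0.85, 3)
p = one_tailed_p(t, 3)
print(round(t, 2), round(p, 2))
```

With r = -.85 and 3 degrees of freedom, the t statistic is about -2.79 and the one-tailed p is a little over .03, consistent with the value reported above.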
The last line of Table 5 gives our estimate of the "residual" effect
sizes for each of the five areas, that is, our judgment of what the "true"
effect size for an area would be after adjusting it for any possible bias due
to expectancy effects or methodological weaknesses. This adjustment was made
on a qualitative basis rather than on the basis of any explicit weighting
scheme, although clearly some of the factors listed in Table 5 (e.g., random
assignment and experimenter blindness) were more influential in determining
the residual effect size than were others (e.g., mean length and climate of
interaction). We wish to emphasize that the values of these residual effect
sizes are presented for purposes of illustration and should not be interpreted
too literally. As can be seen, the degree of adjustment varied across the five
domains; the largest drop was for the SALT domain, where the effect size
decreased from .29 to .00. The smallest drop was for the biofeedback domain
where the effect size decreased from .13 to .10.
Several interesting relationships among the results of Table 5 are worthy
of mention. The zero-order correlation between the original and residual
effect size was r=-.104. The correlation between the original effect size and
the quality rating was negative, r=-.847; however, the correlation between
residual effect size and quality was positive, r=.306. The partial correlation
between the original and residual effect size controlling for the quality
rating was r=.307. Lastly, the partial correlation between the residual effect
size and the quality rating, controlling for the original effect size, was
r=.413.
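The two partial correlations can be recovered from the three zero-order correlations with the standard first-order partial correlation formula. The sketch below is ours and works only from the rounded values printed in the text, so small discrepancies in the third decimal place are to be expected.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Zero-order correlations reported in the text.
r_orig_resid = -0.104   # original vs. residual effect size
r_orig_qual = -0.847    # original effect size vs. quality rating
r_resid_qual = 0.306    # residual effect size vs. quality rating

# Original vs. residual effect size, controlling for quality.
orig_resid_given_qual = partial_r(r_orig_resid, r_orig_qual, r_resid_qual)
# Residual effect size vs. quality, controlling for original effect size.
resid_qual_given_orig = partial_r(r_resid_qual, r_orig_resid, r_orig_qual)

print(round(orig_resid_given_qual, 3), round(resid_qual_given_orig, 3))
```

From the rounded inputs, the first partial correlation comes out at about .307 and the second at about .41, matching the reported values to within rounding.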
The magnitudes of the effect sizes, both original and adjusted, for the
five areas are not large. This is not surprising, for the five areas are all
controversial, and one hallmark of a controversial area is a small effect
size: Sometimes you get a positive result but sometimes you don't. If a
research area always yielded large, significant effects there would be no
controversy. We feel there are several important implications of the
realization that these areas are characterized by small effect sizes. The
first is that "small" does not mean "unimportant." Even the smallest
(unadjusted) effect size, r=.13 for biofeedback, can be interpreted using the
Binomial Effect Size Display (Rosenthal & Rubin, 1982) as an increase in
success rates from 44% to 56% for subjects receiving biofeedback therapy. In
short, even though the five areas may be associated with small effects, these
effects nevertheless can be of substantial practical importance.
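The Binomial Effect Size Display arithmetic is simple enough to sketch directly. The display re-expresses a correlation r as two success rates centered on 50%, namely .50 - r/2 and .50 + r/2; the function name below is ours.

```python
def besd(r):
    """Binomial Effect Size Display: success rates of 0.5 - r/2 and 0.5 + r/2."""
    return 0.5 - r / 2, 0.5 + r / 2

# Smallest unadjusted effect size in the text: r = .13 for biofeedback.
control, treated = besd(0.13)
print(f"{control:.1%} -> {treated:.1%}")
```

For r = .13 this gives success rates of 43.5% and 56.5%, which the text reports in rounded form as roughly 44% and 56%.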
Another implication involves the underlying distributions of these
effects in the population. The effect sizes we have reported are means
computed across multiple studies. We do not know what the underlying
distributions of these effects are in the population. For example, does the
mean (unadjusted) effect size r=.14 for the ganzfeld studies mean that ESP is
normally distributed in the population, with most people exhibiting it to the
tune of r=.14? Or is it the case that most people would show a zero effect and
a small number of people would show a large effect, resulting in a mean r=.14?
The information needed to decide among these and other alternatives is not
available. However, the question of what the distribution of benefit looks
like for these technologies is an important one and deserves attention. To
discover the nature of these underlying distributions, researchers would need
to test a large number of subjects over a long period of time. But this is
information worth gathering, because the selection and training of subjects in
these human performance technologies might be very different if we thought a
given technology more or less affected all people in a normally distributed
manner than if it affected only a portion of the population in a skewed
manner.
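The point that one mean can hide very different distributions can be made concrete with a purely hypothetical sketch. Every number below is invented for illustration and comes from no study; the two made-up populations share a mean effect of .14 but differ sharply in how many individuals show a large effect.

```python
# Hypothetical populations of 100 people each; all values are invented.
symmetric = [0.04] * 25 + [0.14] * 50 + [0.24] * 25   # most people near .14
skewed = [0.0] * 80 + [0.70] * 20                     # most people at zero

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def share_at_least(values, threshold):
    """Proportion of individuals whose effect is at or above the threshold."""
    return sum(v >= threshold for v in values) / len(values)

for name, effects in (("symmetric", symmetric), ("skewed", skewed)):
    print(name, round(mean(effects), 2), share_at_least(effects, 0.5))
```

Both populations have a mean of .14, yet no one in the symmetric population and one person in five in the skewed population shows an effect of .50 or more, which is the kind of difference that would matter for selecting and training subjects.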
The third important implication concerns the nature of replication. As
stated above, these are controversial topics, and they are controversial in
part because of the issue of replication failure. As it stands now, most
researchers regard a failure to replicate as a study's not reaching the
.05 level of significance. We suggest that rather than emphasizing
significance levels in the assessment of replications, the focus should be on
the comparability of effect sizes. Thus the question becomes, "Do the studies
obtain effect sizes of similar nonzero magnitude?" rather than "Do the studies
all obtain statistically significant results?" Defining replication in terms
of similarity of effect sizes would obviate arguments over whether a study
that obtained a p=.06 was or was not a successful replication (Nelson,
Rosenthal, & Rosnow, in press; Rosenthal, in press).
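One common way to ask whether two studies obtained comparable effect sizes is to test the difference between their correlations after Fisher's r-to-z transformation. The sketch below illustrates that general technique; the two studies are hypothetical, and we do not claim this is the specific procedure the cited authors prescribe.

```python
import math

def fisher_z(r):
    """Fisher's r-to-z transformation."""
    return math.atanh(r)

def compare_correlations(r1, n1, r2, n2):
    """Z test for the difference between two independent correlations."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_tailed

# Hypothetical original study and replication (values invented for illustration).
z, p = compare_correlations(r1=0.25, n1=80, r2=0.20, n2=60)
print(round(z, 2), round(p, 2))
```

In this invented case the two effect sizes do not differ reliably (z is about .30), so the replication would count as successful in the effect-size sense regardless of which side of .05 each study's own p value happened to fall on.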
Suggestions for Future Research
Expectancy Control Designs
Throughout this paper, we have offered our opinion on the extent to which
interpersonal expectancy effects may be responsible for the results of studies
on various human performance technologies. Our approach has been necessarily
speculative, as very few of these studies directly addressed the possibility
that expectancy effects might be an important cause of the results. We have
pointed out factors that lead us to believe that expectancy effects may have
been occurring in several cases, but we were not present at the time the
studies were conducted, and we do not have videotapes of the sessions. All we
can conclude on the basis of the information available to us is that
expectancy effects could have happened; we do not know that they did.
However, we can make suggestions for designing future studies that would
not only assess whether an expectancy effect was present but also would allow
the direct comparison of the magnitude of expectancy effects versus the
phenomenon of interest. This is accomplished through the use of an expectancy
control design (Rosenthal, 1966; Rosenthal & Rosnow, 1984). In this design,
experimenter expectancy becomes a second independent variable that is
systematically varied along with the variable of theoretical interest. It is
easiest to explain this design with a concrete example, and we will use as our