REPORT OF THE PANEL
3
Assessment: Issues and Methods
Most discussions of assessment in the context of special education place-
ment for mildly mentally retarded students focus on proper classification
and the avoidance of misclassification. These issues have been treated
extensively by other panels and professional organizations (e.g., Hobbs,
1975). This panel was convened because of public concern about the
possible misclassification of minority students and about the violations of civil
rights that such misclassification might entail. As we argued in Chapter 1,
however, issues of classification or valid assessment surrounding the
educable mentally retarded (EMR) category are inextricably linked to
issues of instruction. One major reason why misclassification is a policy
concern is that it may lead to inappropriate educational treatments.
Consequently, we focus our discussion of assessment instruments and
procedures on their educational relevance and utility: their usefulness in
identifying students who need and can profit from special forms of instruction
or interventions and their usefulness as guides to the type of instruction or
intervention that is needed.
1 Although our discussion concentrates primarily on the direct contribution of assessment to
classroom instruction, we recognize that other forms of intervention may be appropriate and
necessary for some children before any program of classroom instruction can be effective.
For example, the correction of defective vision or hearing, medical treatment, or even
psychotherapy or family intervention might be needed before a child can function successfully in
the classroom.
Assessment procedures and instruments may have many functions, of
which guiding intervention is only one. They might be used to diagnose
abnormal or debilitating organic conditions, to predict future academic
performance, or in theory even to infer the underlying capacity to learn.
Each of these functions would imply different assumptions about the na-
ture of the instrument being used and about the entity being measured.
Each would raise different scientific controversies. Each could contribute
to intervention; for example, diagnosis could point to treatment, although
there might be some conditions that can be diagnosed but not treated.
The discussion below subordinates these other functions to that of facili-
tating effective educational intervention. For example, much of the debate
surrounding IQ tests has to do with their use in inferring learning po-
tential. Although we sketch the broad outlines of this debate, we base our
conclusions about IQ tests primarily on their utility, or lack of utility, in
helping educators select and design instructional programs.
Our decision to focus on the educational utility of various assessment
devices and procedures, rather than on their role in classification and mis-
classification, is based primarily on the fact that we are analyzing assess-
ment in an educational context, in which it is a means to the end of
improving instruction. Two additional considerations reinforce our decision.
First, as shown in Chapter 2, definitions of EMR originated with a
particular instrument, the IQ test, and have shifted over time. Data on
the prevalence of EMR are confounded with the assessment practices and
instruments used in different states and localities (see Chapter 2 and the
paper by Shonkoff in this volume). It is difficult to discuss cogently the
contribution of different assessment practices to classification and mis-
classification in the face of this confusion and circularity. Furthermore, it
would be fruitless to cover the same ground as the far more extensive
discussions of classification mentioned above.
Many scientific controversies about the validity of assessment tech-
niques, notably the IQ test, are unresolved. To attempt to take sides on
these issues would require a detailed, technical discussion that probably
would neither settle the issues nor lead to useful recommendations for
educational policy and practice.2 Decisions about policy and practice can-
not await the final resolution of scientific debates. By focusing on educa-
tional utility, we hope to provide a framework for approaching these deci-
sions despite the ambiguities in current understanding.
2 For a comprehensive discussion of the issues involved in ability testing generally, see the
report of the National Research Council's Committee on Ability Testing (Wigdor and
Garner, 1982).
This chapter has two major sections. The first section, the bulk of the
chapter, reviews salient issues surrounding the instruments that comprise
a comprehensive battery for assessing a child who has proved unable to
learn normally in the classroom. The section covers IQ tests and other
measures of intellectual functioning, biomedical measures, and measures
of adaptive behavior, that is, the child's ability to meet normal expectations
appropriate to age and setting, with regard to self-help skills, independence,
impulse control, cooperation, and the like. The second section describes
an ideal assessment process in which the comprehensive assessment would
be embedded. The process takes place in two phases. The first phase,
prior to any attempt to find problems or deficiencies in the child, is a
systematic investigation of the learning environment and the instruction
the child receives. The purpose of this phase, which is almost nonexistent
in current practice, is to be certain that the child cannot perform ade-
quately in a well-designed instructional setting. Only after deficiencies in
the environment have been ruled out, by showing that the child fails to
learn under several reasonable programs of instruction, is it legitimate to
expose the child to the risks of stigma and misclassification that are in-
herent in any individual assessment process. The second phase is the com-
prehensive individual assessment itself, which it is hoped would be applied
to significantly fewer children than are affected under the current referral
and placement system.
COMPREHENSIVE INDIVIDUAL ASSESSMENT
The purpose of comprehensive assessment is to locate the source of the
child's difficulties in learning in the classroom. In many ways a compre-
hensive assessment represents an attempt to test, at the individual level,
some of the hypotheses about the causes of deficient classroom function-
ing that were discussed in Chapter 1. The causes may lie in physical mal-
functions, emotional disturbances, deficient social skills (either specific to
the school or encompassing the home as well), lack of relevant academic
preparation, lack of more general cognitive skills, or a basic limitation in
intellectual capacity. The causes may also lie in broader sociocultural fac-
tors of the kind discussed in Chapter 1, such as value systems antithetical
to that of the school. Such factors may be manifested in the child's
behavior in the classroom or during test situations and, to some degree, in
measures of adaptive behavior.
As noted in Chapter 2, broad-based assessment is required under P.L.
94-142, its implementing regulations, and the regulations implementing
Section 504. The regulations require, among other provisions, that assess-
ments go beyond "a single general intelligence quotient" to include mea-
sures of "specific areas of educational need." They prohibit the use of any
single procedure as the sole criterion for placing a child. They require that
tests be selected in a manner designed to reflect a child's aptitude and
achievement, rather than "the child's impaired sensory, manual or speaking
skills." Further, the regulations for P.L. 94-142 require that a child be
assessed in "all areas related to the suspected disability." In practice, as
seen in Chapter 2, compliance with the law is far from complete. Whether
or not other measures are administered, IQ and achievement tests tend to
dominate EMR placement decisions (see Chapter 2 and the paper by Bickel
in this volume).
We therefore begin this section with an examination of the major
controversies surrounding IQ tests, arguing, however, that their relevance
for educational practice is limited. The section also discusses attempts to
develop better measures of intellectual functioning, whether by improving
the IQ test or by developing supplementary or substitute measures. The
section then surveys biomedical measures and measures of adaptive be-
havior. Both types of measure lie outside the intellectual domain, as it is
usually defined; they are essential, however, to understanding the child's
classroom performance and more general capabilities and limitations as
well as to designing appropriate interventions.
IQ TESTING: CONTROVERSIES, IMPLICATIONS, AND ALTERNATIVES3
Of all the elements in the assessment process, standardized tests of "in-
telligence" have been the most controversial. They have been the subject
of protracted litigation, as discussed in Chapter 2. They have been the
focus of acrimonious debate in the academic community.
Three related questions are at the heart of the debate as it is usually
conducted: Are IQ scores4 determined primarily by genes or by the
environment? Are IQ scores valid measures of academic ability? Are IQ
tests culturally biased? These questions, though central to virtually all
discussions of IQ testing, do not neatly divide proponents and opponents
of testing in the schools. There is considerable diversity of opinion within
both camps, and there has been little attempt to spell out the practical
implications of these scientific controversies.
3 Much of the information in this section is based on the paper by Travers in this volume.
4 We recognize that leaders in the field of educational assessment have long recommended
against the use of single IQ scores and have urged the use of multiple instruments and care-
ful consideration of performance profiles across subscales within tests for assessing an indi-
vidual's mental abilities. Our focus on summary scores and use of the term "IQ test" rather
than "test of mental abilities" or the like arises because of data cited in Chapter 2 and else-
where in this report that show that summary scores are often accorded predominant weight
in placement decisions. While the extent of this practice is uncertain, it is an important
source of the controversy surrounding the use of such tests in educational placements.
Our discussion of the three issues bears primarily on widely used, indi-
vidually administered IQ tests, notably the Stanford-Binet and the revised
Wechsler Intelligence Scale for Children (WISC-R). Special issues raised
by group ability testing and by the use of various substitutes for the major
IQ tests are not discussed.
The Nature-Nurture Issue
Of all the questions surrounding IQ testing, the nature-nurture issue is
the one most bitterly debated, although, as we argue below, it has little
relevance for education policy or practice. In recent years the controversy
has centered on the relative contributions of heredity and environment to
the 15-point average difference usually found between the IQ scores of
blacks and whites. Most of the existing scientific evidence bears on the
contribution of genotypic variation to individual differences in measured
(phenotypic) IQ within ethnic groups. For example, Arthur Jensen's con-
troversial article (1969) examined correlations among IQs of persons in
various biological kinship relationships and concluded that about 80 per-
cent of the variation in IQ is genetically determined. Others (e.g., Jencks
et al., 1972) have arrived at substantially lower estimates of heritability;
however, a fairly recent review (Loehlin et al., 1975) offers a figure close to
Jensen's for the heritability of individual differences in IQ within Euro-
pean and American Caucasian populations. The reviewers found less con-
sistent evidence for American black populations; heritability is substantial
for these populations but perhaps somewhat lower than for whites.
Numerous critics have attacked the assumptions, methods, and data
that led Jensen to his high estimate of the heritability of IQ. Among the
many factors cited by the critics are the confounding of genes and en-
vironments, restriction in the range of environments studied, and the in-
appropriateness of the statistical techniques borrowed from population
genetics that were used to estimate heritability.
The most controversial aspect of Jensen's work was his speculation that
the average IQ difference between races in the United States is due partly
to genetic factors. His critics have stressed that group differences in
distributions of a trait can be due mostly or entirely to the environment,
even if the heritability of the trait within groups is high. Loehlin et al. ad-
dressed the issue of between-group differences, primarily by examining
studies relating IQ distributions to indices of racial mixture, such as blood
types, skin color, and direct genealogical information. They concluded
that the data "are consistent with either moderate hereditarian or en-
vironmentalist interpretations" but perhaps "more easily accommodated
in an environmentalist framework (p. 238)." A similar statement could be
made regarding other data, which show that the IQ gap between black
and white children is inversely related to the black child's exposure to
white, middle-class culture and schooling. These include studies of black
families who migrated from the rural South to the urban North, studies of
black children adopted by white parents, studies of the effects of early in-
tervention programs, and studies of sociocultural variations within black
and white populations.
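The critics' statistical point, that high heritability within groups does not by itself identify the source of a difference between groups, can be illustrated with a deliberately artificial sketch. The numbers below are invented and are not drawn from any study cited here. The sketch assumes two hypothetical groups with identical genetic values, a single environmental contribution shared by every member of a group, and a trait score that is simply the sum of the two.

```python
from statistics import mean, pvariance

# Hypothetical genetic values, identical in both groups (an assumption of this sketch).
GENETIC_VALUES = [90, 95, 100, 105, 110]

def make_group(environment_offset):
    """Return (genetic, environmental, score) triples for one hypothetical group."""
    return [(g, environment_offset, g + environment_offset) for g in GENETIC_VALUES]

def within_group_heritability(group):
    """Share of within-group score variance that is due to genetic variance."""
    genetic_var = pvariance([g for g, _, _ in group])
    score_var = pvariance([s for _, _, s in group])
    return genetic_var / score_var

# The two groups differ only in the (invented) environmental offset.
group_a = make_group(environment_offset=0)
group_b = make_group(environment_offset=-15)

score_gap = mean(s for _, _, s in group_a) - mean(s for _, _, s in group_b)
genetic_gap = mean(g for g, _, _ in group_a) - mean(g for g, _, _ in group_b)

print(within_group_heritability(group_a), within_group_heritability(group_b))
print(score_gap, genetic_gap)
```

In this constructed case all of the variation inside each group is genetic, yet the whole of the gap between the group means comes from the environmental offset. Real data are never this clean; the sketch shows only that the two quantities are logically independent.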
In short, scientific controversy continues to exist with respect to the
issue of heredity versus environment. Virtually everyone involved in the
controversy agrees that both genetic and experiential factors influence IQ;
what is at issue is the degree of influence and the mechanisms involved.
The controversy has been carried into the courts, and several major
judicial decisions on testing have reflected the judges' convictions that IQ
tests fail to measure native intelligence (Bersoff, 1979). Yet on closer
examination, we feel that the ultimate, substantive, scientific outcome of the
controversy is less important for education policy and practice than it may
appear, in particular for policies affecting placement of students in EMR
classes.
There is a widespread assumption outside the field of special education
that mental retardation is by definition an innate incapability to learn.
(This belief is clearly reflected in the Larry P. decision; see also E. Smith,
1980.) It follows from this assumption that IQ must measure innate ca-
pacity if it is to be a legitimate index of mental retardation. These views
are not shared, however, by medical and educational professionals con-
cerned with mental retardation (see Goodman, 1977, for a forceful exposi-
tion of this point). Mental retardation is currently defined as a deficit in
functioning and adaptive behavior, which may be due to a wide variety of
factors, experiential as well as organic. This purely functional definition is
motivated by the fact that, within the limits of current knowledge, there
are no differences in prognosis or indicated educational "treatment" that
distinguish organically caused deficits from experientially caused deficits.
That is, children at the same level of functional ability have about the
same expected level of future performance and can be taught most effec-
tively in about the same ways, regardless of whether their deficits have a
known organic cause, such as Down's syndrome (see Chapter 4 for further
discussion of educational treatment). If education practice is independent
of etiology in these clear-cut cases, it is hard to see why practice should be
affected by the heritability of IQ.
It is important to recognize that a wide range of academic performance
can be achieved by children with any given IQ. Even if differences in
academic ability or achievement are in large part genetically caused, proper
instruction can do a great deal to ensure that children develop to their
fullest potential. For example, children with Down's syndrome reportedly
make significant gains under certain programs of instruction (Hayden and
Haring, 1977). Although a teacher, administrator, or policy maker of the
hereditarian persuasion might be pessimistic about the likelihood of
change in underlying intellectual ability, this pessimism would be no justi-
fication for failing to provide conditions that allow each child to learn as
much as possible. Decisions about curricula and teaching methods to be
used with children at different levels of IQ or initial academic
performance, as well as decisions about whether to teach these children separately
or together, can and should be based on the demonstrated pedagogical
effectiveness of the various approaches, not on preconceptions about the
causes of initial differences in performance.
Finally, one's position on the nature-nurture question gives little or no
guidance as to the degree of ethnic imbalance in special education place-
ment that one should be willing to tolerate. As long as there are special
programs for children who lack traditional academic skills, environmen-
talists and hereditarians alike would expect minority children to be over-
represented in such programs, at least for the immediate future.
If children are indeed being stigmatized or denied educational oppor-
tunity because of presumed native incapacity, such practices represent an
inappropriate and unjustified use of IQ scores. The practices should be
discontinued, but their discontinuation does not depend on proof that IQ
has low heritability.
The Issue of Test Validity
Are IQ tests valid measures of "intelligence" or academic ability? Though
often equated or confused with the nature-nurture issue, the issue of
validity is in fact a separate one. Many psychologists think of intelligence
as an ability (or set of abilities) to absorb complex information and grasp
and manipulate abstract concepts, an ability that is developed through the
interaction of genetic endowment and experience. In this view, intelligence
is not native capacity, but it is much more than knowledge of answers to
the specific questions on the Stanford-Binet or the WISC-R. Almost all
children could be taught to answer the specific questions correctly. The
question is how to interpret their performance in the absence of instruc-
tion related directly to the test items.
The validity question thus posed has two parts: the first asks whether
the skills measured by IQ tests are specific or general; the second asks
whether the entity or entities measured by the tests can legitimately be in-
terpreted as "developed ability."
There was a long debate in psychometrics over whether IQ tests measure
"general intelligence" or differentiated abilities: verbal ability,
perceptual ability, quantitative ability, etc. Contemporary opinion holds that
they measure both; there is variation shared by all items, and there are
also clusters of items that are particularly closely related. The overriding
conclusion, however, is that some variation is shared within clusters and
across the whole test. The rather disparate items on different IQ tests
seem to be measuring the same thing or a small number of things, not a
miscellaneous collection of isolated facts and skills. This conclusion is
consistent with the interpretation that tests measure underlying abilities,
which are manifested in the mastery of specific skills and knowledge. It is
equally consistent with the interpretation that the common factor arising
from shared variation across different tests and items is really the degree
of exposure to middle-class culture and schooling.
There is no general resolution to this interpretive issue. All performance
depends on both specific learning and broader abilities. For example, a
child's performance on verbal analogies ("Tables are made of wood;
windows are made of ____") depends on acquired vocabulary and familiarity
with the named objects as well as a more general ability to perceive
relationships. The relative contributions of ability and specific experience
are not fixed properties of the item or test but depend on the ranges of
ability and experience in the population tested. For example, English-
speaking American children of elementary school age would presumably
be familiar with the words in the above example, and their performance
would probably be determined largely by their ability to perceive relation-
ships. However, if children from non-English-speaking families or from
cultures without windows and tables were tested, variations in familiarity
with the vocabulary items would contribute significantly to performance.
Claims about the validity and meaning of test scores, then, are always
population-specific.
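The population-specific character of these claims can be made concrete with a small numeric sketch. It assumes, purely for illustration, that an item score is the sum of a general-ability component and a familiarity component; the function name and all values are hypothetical. The share of score variance attributable to each component then depends on how much that component varies in the group tested.

```python
from itertools import product
from statistics import pvariance

def variance_shares(ability_levels, familiarity_levels):
    """Return (ability_share, familiarity_share) of total score variance.

    Every ability level is crossed with every familiarity level, so the two
    components are uncorrelated and their variances add up exactly.
    """
    pairs = list(product(ability_levels, familiarity_levels))
    total = pvariance([a + f for a, f in pairs])
    ability_var = pvariance([a for a, _ in pairs])
    familiarity_var = pvariance([f for _, f in pairs])
    return ability_var / total, familiarity_var / total

ABILITY = [0, 1, 2, 3, 4]  # same hypothetical spread of ability in both populations

# Population in which every child knows the vocabulary equally well.
uniform_familiarity = variance_shares(ABILITY, [2, 2, 2])
# Population in which familiarity with the vocabulary varies widely.
mixed_familiarity = variance_shares(ABILITY, [0, 2, 4])

print(uniform_familiarity)
print(mixed_familiarity)
```

With identical ability distributions, the same item reflects ability alone in the first population and mostly familiarity in the second, which is the sense in which the meaning of a score depends on who is tested.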
Rather than addressing the interpretive issue directly, most proponents
of testing in the schools place their faith in the empirical phenomenon of
predictive validity. Many studies have shown that IQ scores correlate with
later school grades and scores on standardized achievement tests (see the
paper by Travers in this volume). These validity coefficients (correlations)
clearly do not settle the interpretive question. They are consistent with the
hypothesis that IQ tests measure general academic ability, which is later
manifested in scholastic performance. But, again, they also can be inter-
preted as showing merely that IQ tests, achievement tests, and teacher-
made tests all sample the same domain of acquired skills. The question of
importance, once again, is how these conflicting interpretations bear on
education policy or practice.
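For readers unfamiliar with the term, a validity coefficient of the kind mentioned above is an ordinary Pearson correlation between test scores and a later criterion. The sketch below computes one from invented scores; the variable names and data are hypothetical and stand in for no actual study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented scores for six hypothetical children.
iq_scores = [85, 92, 100, 104, 110, 118]
later_achievement = [40, 55, 52, 60, 71, 68]

validity_coefficient = pearson_r(iq_scores, later_achievement)
print(round(validity_coefficient, 2))
```

The coefficient summarizes how closely the two sets of scores move together. It is silent on why they do so, which is why it cannot settle the interpretive question.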
Critics of testing have argued vehemently that tests are invalid as
measures of children's general ability and are therefore unfair devices to
use for placement. However, few critics have attempted to spell out why
tests would be fair if they did measure ability or why they are unfair if
they measure only acquired skills. Defenders of testing have justified the
use of tests on grounds of predictive validity, apparently believing that
they are fair even if they measure primarily acquired skills. Yet few de-
fenders have spelled out their criteria of fairness either. The argument is
not really about the degree to which IQ tests measure ability versus ac-
quired skills but about the legitimacy of using a test that mixes the two as
a basis for educational programming and placement.
As Messick (1980) points out, when we begin to ask about the legiti-
macy of a particular use of a test, we must consider more than just what
the test measures (validity, in traditional psychometric terms). We must
also ask about the consequences of the intended use. In the context of
educational decision making it is not enough to know that IQ tests predict
future classroom performance, nor would it be enough even to know that
they measure general ability. It is necessary to ask whether IQ tests pro-
vide information that leads to more effective instruction than would other-
wise be possible. Specifically, is it the case that children whose IQs fall in
the EMR range require or profit from special forms of instruction or
special classroom settings? In the language of contemporary education
research, is there an "aptitude-treatment interaction" (Cronbach and
Snow, 1977) such that different instructional methods are effective for
children with low IQs? An affirmative answer to these questions would
constitute a good reason to use IQ scores in programming and placement
decisions. (Of course, there might also be other offsetting considerations.)
If the answers are negative, and we argue in Chapter 4 that they probably
are, then the IQ has limited usefulness5 in educational decision making,
and debates about the meaning of IQ scores are of secondary interest from
practical and policy standpoints.
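The logic of an aptitude-treatment interaction can be sketched with a two-by-two table of mean outcomes. The figures, group labels, and method names below are hypothetical, and a real analysis would also require a test of statistical significance. The interaction contrast asks whether the advantage of one instructional method over another differs between children with low IQs and other children.

```python
def interaction_contrast(means):
    """Difference between the two groups in the advantage of method A over method B.

    `means` maps (group, method) to a mean outcome score. A contrast near zero
    means the methods rank the same way, by the same margin, in both groups.
    """
    low_iq_advantage = means[("low_iq", "method_a")] - means[("low_iq", "method_b")]
    other_advantage = means[("other", "method_a")] - means[("other", "method_b")]
    return low_iq_advantage - other_advantage

# Hypothetical case 1: method A helps both groups equally (no interaction).
parallel = {
    ("low_iq", "method_a"): 50, ("low_iq", "method_b"): 45,
    ("other", "method_a"): 70, ("other", "method_b"): 65,
}
# Hypothetical case 2: method A is better for one group, method B for the other.
crossover = {
    ("low_iq", "method_a"): 55, ("low_iq", "method_b"): 45,
    ("other", "method_a"): 65, ("other", "method_b"): 70,
}

print(interaction_contrast(parallel))
print(interaction_contrast(crossover))
```

Only a pattern like the second case, in which the better method depends on the group, would give IQ scores a role in choosing among instructional programs.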
5 This is not necessarily an argument that IQ testing should be abandoned entirely. There is
at least one use on which professionals with very different interpretations of IQ scores agree:
If a child who is failing in school proves to have an IQ in the normal range, this finding would
point to the need for further diagnostic work, e.g., a search for physical disabilities,
emotional difficulties, or the like. The argument in the text applies to the use of IQ cutoffs at the
low end of the scale in deciding on educational programs and placements.
The Issue of Racial and Cultural Bias
Do IQ tests misrepresent the skills or abilities of minority children and
those from low-income families? Are tests merely the bearers of bad news
about genuine differences in educational potential or academic
functioning, or are they the creators of false differences? To address these
questions it is necessary to clarify some points of definition that have caused
confusion and miscommunication between specialists in psychological
measurement, on one hand, and educators, policy makers, and the public,
on the other.
For many persons outside the field of psychometrics, tests are "biased"
if group differences in test scores can plausibly be attributed to average
differences in environmental advantage enjoyed by children from different
ethnic or socioeconomic groups. From this perspective a test can be biased
even if it captures genuine differences in knowledge, skill, or developed
ability between groups. In effect, bias, cultural causation, and unfairness
become equivalent concepts from this point of view: It seems unfair to
categorize children or allocate educational opportunities on the basis of
performance differences that are culturally caused, and it seems proper to
characterize the instruments that effectuate this unfair categorization as
biased.
For specialists in psychological measurement, questions of bias, fair-
ness, and cultural causation are separate. From the specialist's perspec-
tive, bias is purely a measurement issue: If a test shows the same internal
structure and the same pattern of correlations with other variables across
cultural groups, the test is held to be unbiased, even if different groups
have different performance profiles due to differential opportunity and ex-
perience. Given this conception of bias, it is not inconsistent to argue that
the use of a particular test for a particular purpose may be unfair even if
the test is, in the technical sense, unbiased.
Three potential sources of bias have received the lion's share of atten-
tion in the psychometric literature to date: (1) differences in performance
induced by culturally sensitive features of the test situation, such as the
race or dialect of the tester; (2) differences across cultural groups in the
difficulty of particular items or in other internal features of the pattern of
responses generated by test items; and (3) differences in the predictive
validity of tests for different groups.
Bias in the Test Situation. Aspects of the test situation, aside from the
child's actual skill or ability, that might influence test scores include
familiarity with the particular test or type of test (coaching and practice);
the race and sex of the tester; the language style or dialect of the tester; the
tester's expectations about the child's performance; distortions in scoring;
time pressure or lack thereof; and attitudinal factors such as test anxiety,
achievement motivation, self-esteem, and countercultural motives to avoid
conspicuously good performance.
Cases have been cited in the courts of minority children whose IQs were
low when tested by a school psychologist but increased dramatically when
the children were retested by persons of the same ethnic group under non-
threatening conditions. Most published research, however, finds little
evidence that situational factors affect minority children differentially
(Jensen, 1980: Chapter 12). Some situational factors have significant
overall effects on test scores but show no interactions with ethnicity. For
example, coaching and practice together can boost an individual's IQ
score by about nine points, if the individual is retested after a fairly short
time interval with a test that is similar to the one used for practice. Blacks
and whites profit almost equally from coaching and practice. Thus, the
reported data suggest that familiarization with tests cannot eliminate
much of the IQ difference between the races. Not all of the other situa-
tional factors have significant overall effects on test scores, and none is as
large as the effects of coaching and practice. More important, in no case is
there a large interaction between a situational factor and ethnicity.
Item Bias. One approach to the analysis of item bias, which might be
called "editorial," is to analyze the face content of items on logical or
semantic grounds or on the basis of apparent or presumed connections to
particular subcultural milieux. Judge John F. Grady's recent decision in
Parents in Action on Special Education v. Hannon (1980) provides a
dramatic and socially significant illustration of this approach. Setting
aside a variety of statistical and empirical arguments for and against the
use of tests in placing black children in EMR classes, the judge chose in-
stead to examine test items individually and to decide in each case whether
the item appeared, a priori, to present special difficulties for black
children. This "item analysis" led the judge to accept all but a few items
on the Stanford-Binet and WISC-R and to uphold the use of these tests for
educational placement by the Chicago schools. Others have drawn dia-
metrically opposed conclusions from similar editorial item analyses.
One obvious flaw in this approach is that it places bias in the eye of the
"editor," and different editors disagree. More important is the fact that
judgments about item content (even if there is agreement) are neither
necessary nor sufficient to prove that particular items discriminate against
minority children, in the sense of lowering their test scores. An apparently
innocent item can be disproportionately difficult for minority children
compared with whites, while an item that is problematic on its face can be
equally difficult for all groups.
A more systematic and empirical approach to item bias is to examine
the proportions of minorities and whites who get each item correct; when
an item deviates markedly from the overall profile for any group, that item
[...]
of life, may create global deficits of functioning. Some of these deficits
may have neurological or other physical correlates in the school-age child;
others may not. Shonkoff (in this volume) reviews a variety of genetic, pre-
natal, perinatal, and postnatal conditions that have among their sequelae
global impairment of intellectual functioning. Many of these conditions,
such as maternal malnutrition or lead intoxication, can be prevented;
others, such as phenylketonuria (PKU), can be significantly ameliorated
if detected early. In most cases, however, the damage cannot be corrected
by known physical treatments when the child has reached school age.
Remediation in these cases must address the symptom; that is, it must
take the form of an educational program designed to meet the needs of an
impaired learner. Within the limits of current knowledge there appear to
be no differences between the educational treatments that work best for
children who have global learning difficulties due to physical causes and
those that work for other children with global deficits. Future research
may lead to medical or educational interventions addressing physically
based, global learning problems; if so, identification of long-term physical
causes will become a major function of biomedical assessment in educa-
tional contexts. For now, however, its primary functions are the detection
of physical impairments in mentally normal children and the detection of
neuropsychological conditions that impair intellectual functioning but are
distinct from mental retardation as it is usually conceived.
Another distinction is also important to understanding our view of bio-
medical assessment. Certain assessment procedures can be performed at
relatively low costs; they give a preliminary indication of where a child's
problem may lie. Other procedures are more extensive and require the
services of highly trained professionals and are, therefore, costly. Screen-
ing procedures of the first kind are appropriate to use with all children
who have been referred for learning problems. Detailed diagnostic pro-
cedures of the second kind are appropriate for use in a small number of
carefully targeted cases.
Screening procedures are exemplified by the biomedical portion of Mer-
cer's System of Multicultural Pluralistic Assessment (SOMPA) (Mercer and
Lewis, 1978), a battery of instruments designed for use in comprehensive
educational assessment. SOMPA includes six biomedical measures: the Snel-
len test of visual acuity, a measure of auditory acuity, weight standardized
by height, a set of physical dexterity tasks, a health history inventory, and
the Bender Visual Motor Gestalt Test (a test that requires the child to
copy a set of figures, which is regarded as indicative of perceptual matu-
rity and neurological impairment). None of these measures is sufficient in
itself to precisely pinpoint a disability or to specify the necessary remedia-
tion. Each is capable, however, of identifying a general area of disability,
within which more precise measures can be taken. In some cases the
screening measures may point to widely prevalent problems, for which
more refined diagnosis and remediation are routine; detection of common
visual problems is an obvious example. In other cases the measures may
point to areas of disability for which further diagnostic work may be ex-
tensive and for which remediation may or may not be available.
When a preliminary screening indicates the possible existence of neuro-
logical problems, a variety of specialized cognitive, sensory, and motor
tests come into play. Interpretation of the results, which requires the ser-
vices of a specialist in neuropsychology, rests on a large body of data ac-
cumulated mainly during the last 15 years (Hecaen and Albert, 1978;
Lezak, 1976; Reitan and Davison, 1974). Unlike traditional ability and
intelligence testing, neuropsychological analysis depends on at least four
different uses of testing results: the level of function, pathognomonic signs,
patterns, and disparities between the left and right sides of the body.
Investigations of individuals whose IQs fall in the mildly mentally
retarded range (Matthews, 1974) have shown that their performance is
sometimes strongly suggestive of localized lesions in the brain. Initially, in
the classroom, poor performance may appear to be global in nature,
whereas on closer investigation it may be seen as part of a picture resulting
from selective damage to the nervous system. For example, a child may
demonstrate low verbal ability, which is itself due to lateralized damage
to the speech centers of the brain. Other tests, such as comparison of per-
formances from the two sides of the body, may reveal that the lateralized
damage appears in other areas besides speech and language.
Some performances on tests are pathognomonic; that is, in this context,
diagnostic of cerebral damage. For example, a partial hemiplegia may be
revealed by unusual discrepancies between finger tapping of the left and
right hands. Or abnormalities of the sensory pathways may be revealed by
failures of recognition in tactual performance tests.
The application of neuropsychological analysis is by no means straight-
forward for young children and those whose verbal skills are impaired
(Boll, 1974). Nevertheless, a thorough examination of neuropsychological
integrity, based on knowledge of the structural features of the brain, can
lead to the detection of specific genetic, traumatic, or pathophysiological
conditions (Benson, 1974).
Adaptive Behavior Scales
As noted earlier, the AAMD as well as the federal government and many
states define mental retardation as "significantly subaverage general
intellectual functioning, existing concurrently with deficits in adaptive
behavior, and manifested during the developmental period" (Grossman,
1977:5, emphasis added). The AAMD goes on to define adaptive behavior
as "the effectiveness or degree to which the individual meets the standards
of personal independence and social responsibility expected of his age or
cultural group" (Grossman, 1977:11). This broad definition is consistent
with numerous more specific definitions that have been proposed by
theoreticians and researchers (Coulter and Morrow, 1978, Chapter 1).
Because the definition is so broad, it has given rise to a large number of
instruments (at least 132, according to a review cited in Meyers et al.,
1979) that stress different aspects of adaptation and have different metric
properties. However, as Meyers et al. point out, most of these instruments
share certain general characteristics that sharply distinguish them from
intelligence tests: (1) they focus on behavior rather than thought pro-
cesses; (2) they focus on common or typical behavior rather than on "po-
tential"; that is, they are descriptive rather than necessarily implying the
existence of underlying traits or capacities; and (3) they are based on
reports of informants, usually parents or teachers, rather than on direct
observation of a child's performance.
Most of these instruments have been designed specifically for use with
mentally retarded people and are particularly appropriate for differen-
tiating levels of functioning in individuals who are clearly below the nor-
mal range. However, a few are designed for use in the public school
population and are intended to help discriminate "EMR" from "normal"
children. Our discussion is particularly concerned with the latter in-
struments, of which the most widely used are the AAMD Adaptive Be-
havior Scale-Public School Version (ABS) (Lambert et al., 1975) and the
Adaptive Behavior Inventory for Children (ABIC) (Mercer and Lewis,
1978; Mercer, 1979). The two instruments have much in common, both in
content and purpose, yet they also exhibit some important differences.
Together they illustrate most of the major issues involved in the use of
adaptive behavior scales in the schools.
The AAMD public school scale, which was derived from an earlier
AAMD scale designed for mentally retarded people (Nihira et al., 1969),
has two parts. The first contains 10 competence domains, each with one or
more subscales: independent functioning (eating, toileting, etc.), physical
development, economic activity (budgeting and shopping), language de-
velopment, numbers and time, vocational activity, self-direction (initia-
tive, perseverance, use of leisure time), and responsibility and socializa-
tion (cooperation, considerateness, interaction with others). The second
part contains 12 domains of maladaptive behavior: violence and destruc-
tion, antisocial behavior, rebellion, untrustworthiness, withdrawal, stereo-
typed behavior and odd mannerisms, inappropriate manners, unacceptable
vocalizations, unacceptable or eccentric habits, hyperactivity, psychologi-
cal disturbance, and use of medication. The school version of the ABS is
normally completed by a teacher, although at least one study has shown a
high degree of agreement between parents and teachers in describing
children's behavior with the ABS (Cole, 1976). The ABS school version has
been standardized on a sample of 2,600 children, including normal children
and children identified as EMR, trainable mentally retarded, and educa-
tionally handicapped. The standardization sample included a wide range of
socioeconomic levels and ethnic backgrounds.
The ABIC is part of SOMPA, a comprehensive system for assessment of
children from diverse cultural groups. This instrument includes 242
items, each referring to a specific practical or social skill or behavior. For
example, can the child take a message on the telephone? Does the child
cross the street with the traffic light? Does the child visit friends outside
the neighborhood? Questions are answered by the child's mother or
mother substitute. Most of the items are age graded, over the elementary
school range from five to eleven; gradings are based on data from an ex-
tensive pretest and from the norm sample (described below). Items are
organized into six competence areas or subscales: family, community,
peer relations, nonacademic school roles, earner-consumer, and self-
maintenance. Scores are normalized within each subscale and calibrated
to yield a mean of 50 points and a standard deviation of 15. Subscale
scores are averaged to yield an overall score. The instrument has been
standardized on a sample of almost 2,100, including equal numbers of
black, Hispanic, and white children, spanning a range of socioeconomic
levels.
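The scaling just described can be illustrated with a minimal sketch. It assumes a simple linear (z-score) standardization against norm-sample statistics; the ABIC's own normalization procedure may differ in detail. The function names, the norm means and standard deviations, and the raw scores are all hypothetical and are not taken from SOMPA.

```python
from statistics import mean

# Target metric described in the text: mean 50, standard deviation 15.
TARGET_MEAN = 50.0
TARGET_SD = 15.0


def scaled_score(raw, norm_mean, norm_sd):
    """Linearly rescale a raw subscale score to the mean-50, SD-15 metric.

    Assumption: a plain z-score transformation stands in for the
    instrument's own normalization procedure.
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw - norm_mean) / norm_sd
    return TARGET_MEAN + TARGET_SD * z


def overall_score(subscale_scores):
    """Average the scaled subscale scores to obtain an overall score."""
    return mean(subscale_scores)


# Hypothetical norm-sample (mean, SD) for each of the six subscales.
norms = {
    "family": (40.0, 8.0),
    "community": (36.0, 12.0),
    "peer relations": (30.0, 6.0),
    "nonacademic school roles": (20.0, 5.0),
    "earner-consumer": (15.0, 5.0),
    "self-maintenance": (24.0, 4.0),
}

# Hypothetical raw scores for one child.
raw_scores = {
    "family": 40,
    "community": 30,
    "peer relations": 33,
    "nonacademic school roles": 20,
    "earner-consumer": 10,
    "self-maintenance": 28,
}

scaled = {name: scaled_score(raw_scores[name], *norms[name]) for name in norms}
overall = overall_score(list(scaled.values()))

for name, value in scaled.items():
    print(f"{name}: {value:.1f}")
print(f"overall: {overall:.1f}")
```

With these invented numbers, subscale scores above and below 50 offset one another, so the overall score can mask an uneven profile across subscales.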
It is apparent that there is considerable overlap between the ABS and
ABIC (and other adaptive behavior scales) in the types of behavior
covered. There are differences as well. The ABS is completed by the
teacher and focuses on adaptive behavior within the school. It contains
items with intellectual content of the sort found in IQ tests. In contrast,
the ABIC is completed by the mother and concentrates more exclusively
on practical skills and social behavior exhibited outside the school. It is
not surprising, therefore, that some of the ABS subscales (numbers and
time, economic activity, and language development) correlate about .6
with IQ, whereas other scales show modest correlations, generally below
.2 (Lambert, 1978). The ABIC subscales show uniformly low correlations
with the WISC-R (Mercer, 1979). As Meyers et al. (1979) note, there is a
wide range of variation in correlations with IQ among adaptive behavior
scales generally, depending on, among other factors, item content and the
populations sampled.
Another important characteristic of the ABIC is that subscale scores
and overall scores have almost identical distributions among black, white,
and Hispanic children (Mercer, 1979). There is some evidence that
ethnicity does not affect scores on the ABS within EMR and regular
classes (Lambert, 1978). However, since ethnic proportions probably
differed between EMR and regular classes in the ABS norm sample,
distributions of ABS scores may have differed for the ethnic groups overall.
What are the implications of these characteristics of adaptive behavior
scales for use in educational decision making? First, it is evident that
adaptive behavior scores are not redundant with IQ. The ABIC and most
subscales of the ABS yield information about domains of competence that
are distinct from the cluster of abilities tapped by IQ tests. One implica-
tion of this fact is that adaptive behavior measures cannot simply be
substituted for IQ as measures of general competence. A more important
implication is that the use of adaptive behavior measures in assigning
children to EMR classes, a practice that is mandatory given existing
theoretical and legal definitions of mental retardation, will reduce the
numbers of children assigned to such classes relative to the numbers that
would be assigned on the basis of IQ alone. (This is so because many chil-
dren with low IQs have adequate adaptive behavior scores.) As we saw in
Chapter 2, this outcome has been observed in practice.
The latter implication raises the important question of how children
with low IQs but high adaptive behavior scores will fare in regular classes.
The answer depends in part on how well those classes are designed to
match the pace of instruction to each child's individual needs, an issue to
which we return in Chapter 4. It also depends on how much the social and
practical skills measured by adaptive behavior scales contribute to school
success.
A second potential set of implications concerns the effects of adaptive
behavior scales on ethnic disproportions in special education. Some have
expressed the hope that the use of adaptive behavior measures will reduce
the disproportionate representation of minorities in EMR classes. Logically,
there is no necessity for such an outcome. As Coulter and Morrow (1978)
point out, the use of one measure (adaptive behavior) that shows no ethnic
differences does not affect the ethnic differences in another measure (IQ).
If IQ and an ethnically neutral adaptive behavior measure, such as the
ABIC, were jointly used to place children, the IQ could in effect control
the ethnic composition of the group ultimately assigned to EMR classes,
depending on the decision rules used to combine the measures. However,
there is some evidence, cited in Chapter 2, that the use of adaptive behav-
ior measures does in fact decrease ethnic disproportion in EMR place-
ment.
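The logical point about decision rules can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the two cutoffs, the group labels "A" and "B", and every score are invented, and the conjunctive rule shown is only one of many possible ways to combine the measures.

```python
from collections import Counter

IQ_CUTOFF = 70        # hypothetical cutoff, for illustration only
ADAPTIVE_CUTOFF = 35  # hypothetical cutoff on a mean-50, SD-15 scale

# (group, IQ, adaptive behavior score); all values are invented.
# Low adaptive scores are equally common in both groups (3 of 9 each),
# but low IQ scores are more common in group A.
children = [
    ("A", 65, 30), ("A", 68, 55), ("A", 62, 48),
    ("A", 69, 33), ("A", 66, 52), ("A", 64, 60),
    ("A", 85, 50), ("A", 100, 45), ("A", 90, 32),
    ("B", 67, 31), ("B", 63, 50), ("B", 69, 58),
    ("B", 95, 52), ("B", 105, 47), ("B", 88, 34),
    ("B", 110, 55), ("B", 92, 49), ("B", 99, 33),
]


def placed_by_iq_only(child):
    """Placement rule that looks at IQ alone."""
    _, iq, _ = child
    return iq < IQ_CUTOFF


def placed_by_joint_rule(child):
    """Conjunctive rule: both IQ and adaptive behavior must be low."""
    _, iq, adaptive = child
    return iq < IQ_CUTOFF and adaptive < ADAPTIVE_CUTOFF


def composition(rule):
    """Count placed children in each group under the given rule."""
    return dict(Counter(child[0] for child in children if rule(child)))


iq_only = composition(placed_by_iq_only)
joint = composition(placed_by_joint_rule)
print("IQ only:   ", iq_only)
print("Joint rule:", joint)
```

In this invented example the joint rule cuts placements from nine children to three, yet group A makes up two-thirds of those placed under either rule, because the adaptive measure only filters a pool that IQ has already defined. Different data or a different combining rule could change the composition, which is consistent with the empirical evidence cited in Chapter 2.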
A final set of implications concerns the utility of adaptive behavior data
in designing programs of instruction. As Coulter and Morrow (1978) point
out, the distinction between using adaptive behavior measures as classi-
ficatory devices and using them as guides for programming is a critical
one. Different measures may be appropriate for the two purposes. To
date, the use of adaptive behavior measures in programming has been
confined mainly to individuals whose deficiencies in functioning place
them well below the EMR range. Measures geared to mildly mentally re-
tarded populations have been used primarily for classification. It is easy to
envision possible instructional applications of adaptive behavior scales in
pinpointing areas of relative strength to be built on and areas of particular
weakness to be remedied. Some areas needing remediation might be skills
that are appropriate parts of the regular curriculum, e.g., telling time,
mastering numbers, learning to handle money. Others might be the modi-
fication of practical skills, such as dressing and hygiene, which would not
be part of the curriculum for most children but might well be included in a
program for mentally retarded children. Still others might be the modifi-
cation of maladaptive social behaviors that interfere with learning of any
kind, e.g., destructiveness or withdrawal. However, these potentially prom-
ising applications remain largely unexplored.
COMPREHENSIVE ASSESSMENT IN CONTEXT:
A TWO-PHASE PROCESS
Throughout our discussion of the elements of comprehensive individual
assessment, we argue repeatedly that assessment should be linked to
instruction: that it should discriminate among children who can profit
from different modes of instruction or who require different forms of
intervention before conventional instruction can work. This section sug-
gests an even more fundamental link between assessment and instruction.
The section is premised on the belief that what seem to be individual
failures are often failures of the educational system. Children may do
poorly in class because they have not been taught or managed
appropriately, and this may be disproportionately true of minority children. If
this belief is correct, no assessment of the causes of learning failure would
be complete without a systematic examination of the teaching and learn-
ing environment.
Moreover, there are good reasons to examine the learning environment
before subjecting a child to a comprehensive individual assessment of the
kind described above. Merely to be singled out as a learning failure and
evaluated for placement in a category such as EMR may be distressing to
a child and the child's parents and may affect the subsequent behavior of
teachers and peers toward the child. And even with the most comprehensive
and conscientious of assessments, there is some risk that the child will
be misclassified. Given these risks of emotional damage, stigma, and mis-
classification, protection of the child's rights and interests would seem to
require that possible deficiencies of the learning situation be examined
and ruled out before comprehensive assessment begins.
We conclude that an ideal assessment process would take place in two
phases, beginning with an assessment of the learning environment and
proceeding to a comprehensive assessment of the individual child only
after it has been established that he or she fails to learn in a variety of
classroom settings under a variety of well-conceived instructional strat-
egies.5
Our conclusion is very much in the spirit of P.L. 94-142 and the regula-
tions implementing Section 504 and P.L. 94-142, which stipulate that stu-
dents be placed in special education programs only when "the education
of the person in the regular environment with the use of supplementary
aids and services cannot be achieved satisfactorily" (34 CFR 104.34(a);
see also 20 USC 1412(5)(B), 34 CFR 300.550).6 The main thrust of this
provision has obviously been toward mainstreaming children already
diagnosed as handicapped. However, a neglected implication of the provi-
sion is that there must be a systematic attempt to determine whether
satisfactory progress can be achieved in a regular class. In the case of
children who, under present circumstances, would be referred for possible
placement in EMR classes, we suggest that there is much to be gained by
making this determination without waiting until the label is assigned.
There are no universally established procedures for conducting the kind
of two-phase assessment that we envision, nor is there a fully developed,
widely used technology for conducting an assessment of the instructional
environment. It is, therefore, incumbent on us to suggest the broad out-
lines of a procedure and to point to some directions that development of
technology might take.
5 One exception to the principle that environmental assessment should precede individual
assessment is the case of biomedical screening for high-prevalence problems, such as vision
defects. As suggested earlier, such screening is not stigmatizing and is appropriate for
children who have not experienced classroom failure as well as for those who have.
6 After the split of the U.S. Department of Education from the U.S. Department of Health,
Education, and Welfare, the Code of Federal Regulations was revised to transfer the
education regulations from the Public Welfare Title (Title 45) to an independent title for
education (Title 34). The citations of regulations for Section 504 and P.L. 94-142 in this
report are to their new location in the Code of Federal Regulations.
What kinds of information might be included in an ideal phase-one
assessment? First, there should be some evidence that schools are using
curricula known to be effective for the student populations they serve. Such
evidence might be provided by publishers or independent researchers or,
better yet, by the district's own data. It is important that the data
show not only that the curriculum is effective for students in general but
also that it is effective for the various ethnic, linguistic, and socioeconomic
groups actually served by the school or district in question. Standardized
achievement tests or criterion-referenced performance tests (see below)
might serve as assessment devices.
Second, there should be evidence that the teacher has implemented the
curriculum effectively for the student in question. Such evidence might in-
clude documentation that other children in the class are performing ade-
quately and that the child in question has been adequately exposed to the
curriculum, i.e., has not missed many lessons due to absence, disciplinary
exclusions from class, etc. Such evidence might also include observational
data collected by a school psychologist, educational consultant, or resource
teacher, showing that the child's teacher is providing adequate classroom
management and appropriate instruction in accord with the curriculum;
that he or she is attending to the child in question and providing appropriate
direction, feedback, and reinforcement; and that the child is participating
adequately in the instructional process. Observational data could also be
used to detect and document problems of management and/or misbehavior
that interfere with the effectiveness of the curriculum, e.g., lack of atten-
tion, disruption of class, and the like.
Third, there should be objective evidence that the child has not learned
what was taught. Again, standardized norm-referenced tests or criterion-
referenced tests keyed to the curriculum itself might be used for this pur-
pose. Assessment of the child's progress should, however, be frequent
enough so that problems are detected early and so that the child is not
allowed to spend weeks in the classroom, falling further and further be-
hind, without the teacher noticing.
Finally, and most important, there should be evidence that, when early
problems were detected, systematic efforts were made to locate the source
of the difficulty and to take corrective measures. Again, school psycholo-
gists or specially trained educators could play a role, acting as consultants
to the teacher in suggesting remedial approaches. Under some circum-
stances it might be appropriate to change teachers or curricula, in an at-
tempt to find a better match to the child's needs. Results of such attempts
at improvements should be documented, and only after reasonable efforts
have been exhausted should the child be referred formally for assessment.
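The four kinds of evidence above amount to a gate that must be passed before formal referral. A minimal sketch of that gate follows. The record type and its field names are illustrative shorthand for the four conditions, not an established procedure or instrument.

```python
from dataclasses import dataclass, fields


@dataclass
class PhaseOneEvidence:
    """Hypothetical record of the four kinds of phase-one evidence."""
    curriculum_effective_for_population: bool
    curriculum_implemented_for_child: bool
    child_failed_to_learn_material: bool
    corrective_efforts_documented: bool


def missing_evidence(evidence):
    """Return the names of conditions that have not yet been established."""
    return [f.name for f in fields(evidence) if not getattr(evidence, f.name)]


def ready_for_individual_assessment(evidence):
    """Formal referral is appropriate only when every condition is met."""
    return not missing_evidence(evidence)


# Hypothetical case: corrective efforts have not yet been documented.
case = PhaseOneEvidence(
    curriculum_effective_for_population=True,
    curriculum_implemented_for_child=True,
    child_failed_to_learn_material=True,
    corrective_efforts_documented=False,
)
print(ready_for_individual_assessment(case))
print(missing_evidence(case))
```

Reporting which conditions are still missing, rather than a bare yes or no, keeps attention on what remains to be tried in the classroom before the child is singled out.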
What kinds of instruments are needed to support this two-phase assess-
ment process? Some possible answers have already been suggested. Stan-
dardized achievement tests can play a role in evaluating strong and weak
points in the curriculum as a whole; assuming that sufficiently reliable
tests are selected, they can also be used to assess the performance of in-
dividual children. The growing literature on "effective schools" suggests
that these uses of standardized tests are among the distinguishing charac-
teristics of schools that are particularly effective in teaching minority
children from low-income families (see Chapter 4).
A developing technology that may have promise is criterion-referenced
testing. Criterion-referenced tests are used to measure mastery of specific
domains of subject matter. A child's performance is judged against some
absolute standard; a typical measure might be the number of arithmetic
problems of a specific sort that the child can solve. The child's perfor-
mance is not scaled against that of other children, nor is the test used to
draw inferences about broad intellectual abilities. Many informal, teacher-
made tests are in effect criterion referenced, as are many of the tests in-
cluded in packaged curricula and teachers' manuals accompanying stan-
dard textbooks. Recently, there have been advances in thinking about the
design of such tests (e.g., Martuza, 1977; Harris et al., 1974), and im-
provements in their psychometric properties may be in the offing. Such
tests are of interest in the context of this report because of their close link
to instruction. They can be used at the beginning of an instructional se-
quence to determine whether the child has the prerequisite skills needed
to profit from the instruction, and they can be used at the end of a se-
quence to determine whether the child has absorbed the material or needs
further work to achieve mastery. Thus, they can potentially be used to
evaluate the outcomes of the systematic variations in instruction that are
part of a phase-one assessment.
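The contrast with norm-referenced scoring can be sketched in a few lines. The example below judges one child's results against an absolute standard at the start and end of an instructional sequence. The 0.8 mastery criterion, the function names, and the item results are assumptions chosen for illustration.

```python
def proportion_correct(responses):
    """Fraction of items answered correctly (responses are booleans)."""
    if not responses:
        raise ValueError("at least one response is required")
    return sum(responses) / len(responses)


def has_mastered(responses, criterion=0.8):
    """Compare performance with an absolute standard.

    No other child's score enters the judgment. The default criterion
    of 0.8 is an assumption made for this sketch.
    """
    return proportion_correct(responses) >= criterion


# Hypothetical results for one child on ten arithmetic problems of a
# specific sort, before and after an instructional sequence.
pretest = [True, False, False, True, False, False, True, False, False, False]
posttest = [True, True, True, True, False, True, True, True, True, True]

print(proportion_correct(pretest), has_mastered(pretest))
print(proportion_correct(posttest), has_mastered(posttest))
```

Because the standard is fixed in advance, the same check can be repeated after each variation in instruction to see whether that variation brought the child to mastery.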
Another technology that has some promise is systematic observation in
the classroom. Systems for analyzing and recording behavior in the class-
room have a long history in educational research (Medley and Mitzel,
1963). Most of the instruments used are too costly, time-consuming, and
demanding in terms of observer training to be practical for use in self-
evaluation by schools. However, there have been recent suggestions that
suitably simplified and focused instruments may be useful as diagnostic
devices and guides for the remediation of specific behavior problems (e.g.,
Alessi, 1980; Baker and Tyne, 1980). Observations have also been used by
researchers to measure the implementation of curricula (Stallings, 1977)
and time devoted to academic activities (Rosenshine and Berliner, 1978).
Again, simplified observation systems may be useful for similar purposes
in assessing the quality of learning environments.
None of the above suggestions about procedures and instrumentation is
novel. All have been tried, in varying combinations, in different school
districts. A few large districts have gone far in implementing systematic
procedures of instruction and closely linked assessment; some of these
districts have reported dramatic improvements in students' basic academic
skills (Carnine et al., 1981; Monteiro, 1981) and, by implication, a decline
in the rate of learning failures. These reports encourage us to believe that the
suggestions above are both feasible to implement and potentially effective.
The two-phase assessment process clearly entails new costs: the costs of
training and maintaining staff to conduct evaluations of the learning en-
vironment. The process also entails financial savings, by reducing the
number of children referred for costly, comprehensive assessments and
possibly also the number who must be maintained in costly special
classroom settings.
SUMMARY AND CONCLUSIONS
The discussion in this chapter follows from the premise that the main pur-
pose of assessment in education is to improve instruction and learning.
Children are or should be assessed in order to identify strengths and
weaknesses that necessitate specific forms of remediation or educational
practice. Remediation may take the form of intervention outside the school,
such as medical treatment or family intervention. We believe, however,
that a significant portion of children who experience difficulties in the
classroom can be treated effectively through improved instruction.
These basic assumptions lead to a perspective on assessment and its
contribution to ethnic and sex disproportions in EMR classes that is dif-
ferent from the one with which the study began. A concern with dispro-
portion per se dictates a focus on bias in assessment instruments and a
search for instruments that will reduce disproportion. A concern with in-
structional utility leads to a search for assessment procedures and instru-
ments that will aid in selecting or designing effective programs for all
children. We believe that better assessment and a closer link between as-
sessment and instruction will in fact reduce disproportion, because minor-
ity children have disproportionately been the victims of poor instruction.
We also believe that the problem should be attacked at its roots, which lie
in the presumption that learning problems must imply deficiencies in the
child and in consequent inattention to the role of education itself in
creating and ameliorating these problems.
This viewpoint has led us to urge a greatly increased emphasis on sys-
tematic educational intervention before a child is referred for individual
assessment. When poor instruction has been ruled out as a cause of learning
failure, it becomes appropriate to look for problems within the child or in
the child's environment outside the school, again with an eye toward prob-
lems that can be corrected; this is the purpose of individual assessment.
We believe, and have cited evidence to support our belief, that an
assessment procedure like the one we outlined will significantly reduce the
proportion of children whose failure to learn must be attributed to global
intellectual deficits. The question remains whether it is necessary or useful
to apply the label EMR to this residual group or to separate them from
other children for instructional purposes. The answer, in our view, must
hinge on another question: Do these children require and can they profit
from modes of instruction that are different from those that work best
with other children who have experienced learning difficulties? We turn to
this question in the next chapter.