National Academies Press: OpenBook

Early Childhood Assessment: Why, What, and How (2008)

Chapter: 8 Assessing All Children

Suggested Citation:"8 Assessing All Children." National Research Council. 2008. Early Childhood Assessment: Why, What, and How. Washington, DC: The National Academies Press. doi: 10.17226/12446.

Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

8 Assessing All Children

All children deserve to be served equitably by early care and educational services and, if needed, by intervention services. This requires that there be fair and effective tools to assess their learning and development and identify their needs. In this chapter we address the challenges to assessment posed by groups of children who differ from the majority population in various ways. For all of the groups discussed here, assessment has been problematic.

This chapter has three major sections. In the first section, we review issues around the assessment of young children who are members of ethnic and racial minority groups in the United States and the research that has been done on them, chiefly on black children. The next section deals with assessment of young children whose home language is not English, to whom we refer as English language learners. The final section treats the assessment of young children with disabilities.

Minority Children

Conducting assessments for all children has both benefits and challenges, but when it comes to assessing young children from a cultural, ethnic, or racial minority group, unique concerns apply related to issues of bias. There is a long history of concern related to the potential for, and continued perpetuation of, unfair discriminatory practices and outcomes for minority children. The topic has struck political, legal, and emotional chords, with many in the minority population holding deep-seated skepticism about the positive benefits of assessing their children (Green, 1980; Reynolds, 1983). Some of the features that distinguish minority children in the United States include racial/ethnic background, socioeconomic status (SES), cultural values, dialect/linguistic differences, historical and current discrimination, current geographic isolation, and other characteristics that marginalize a population relative to the majority society.

In this section we provide a brief overview of the concerns about assessment of young minority children and examine the available empirical evidence on potential bias in assessing young children from birth to age 5.

Fairness

The primary concerns about the assessment of this population are fairness and equality across groups. That is, there is concern that assessment tools, by their inherent properties, could contribute to over- or underidentification of children that differs across minority population groups. Since the first assessment tools were developed, there has been long-standing concern that test scores may not necessarily reflect differences in ability or developmental milestones among children and the populations they represent, but rather demonstrate problems in the construction, design, administration, and interpretation of the assessment tests that lead them to be unfair and untrustworthy (Brown, Reynolds, and Whitaker, 1999; Garcia and Pearson, 1994; Gipps, 1999; National Association of Test Directors, 2004; Skiba, Knesting, and Bush, 2002).

Most of what is known about potential bias in assessing minority children is based on school-age children and youth.
Less is known about children younger than age 5 and assessment score differences between whites and blacks (Brooks-Gunn et al., 2003). Children ages 5-14 are the most extensively examined for cultural bias, mostly in intelligence testing, with most of the empirical focus on ages 7-11 (Valencia and Suzuki, 2001).

It is important for us to clarify the many definitions of “unfair” and “untrustworthy” assessment problems that are typically termed “bias,” because they are often confused by researchers and the public alike (Reynolds, Lowe, and Saenz, 1999). There is bias in the sense of being unfair, or “partiality toward a point of view or prejudice,” and there is bias defined as a statistical term: “systematic error in measurement of a psychological attribute as a function of membership in one or another cultural or racial subgroup” (Reynolds, Lowe, and Saenz, 1999, p. 550). Many of the statistical definitions of bias are tied to psychometric validity and reliability theory (discussed in Chapter 7); however, they are often confounded with philosophical definitions of bias related to fairness and views of prejudice (Brown, Reynolds, and Whitaker, 1999).

Types of Biases

Several categories of biases are particularly relevant for minority populations (Reynolds, 1982; Reynolds, Lowe, and Saenz, 1999).

Inappropriate Content and Measuring Different Constructs

Bias may arise when the content of the test is unfamiliar to or inappropriate for minority children, typically as a result of contextual differences (Neisworth and Bagnato, 2004). The assumption is that because tests are designed around the cultural values and practices of middle-class white children, minority children will be at a disadvantage and more likely to perform poorly because of a lack of exposure to, and a mismatch with, content included in the testing situation. A lack of success in an assessment may be due to the fact that the assessment instrument does not reflect the local and cultural experiences of the children taking the test, resulting in flawed examinations and misrepresentation of minority children’s true ability and performance (Hagie, Gallipo, and Svien, 2003).
For example, differences in culture between racial minority and white majority groups in communication patterns, child-rearing practices, daily activities, identities, frames of reference, histories, and environmental niches may influence child development and how development is assessed (Gallimore, Goldenberg, and Weisner, 1993; Hiner, 1989; Ogbu, 1981, 2004; Slaughter-Defoe, 1995; Weisner, 1984, 1998). Hilliard (1976, 2004) has provided several conceptual arguments about the role of contextual factors that differ among racial/ethnic groups, such as reasoning styles, conceptions of time and space, and dependence on and use of nonverbal communication (Castenell and Castenell, 1988). The dominant, majority group members may stigmatize the food, clothing, music, values, behaviors, and language or dialect of minorities as inferior to theirs or inappropriate, creating a collective group of “minorities” as a separate segment of society that is “not like” the majority (Ogbu, 2004).

Variations in ecological circumstances suggest that assessments may be culturally loaded because they reflect the (typically white, majority) developers’ experiences, knowledge, values, and conceptualizations of the developmental domains being examined (intelligence, aggressive behavior, etc.). This can lead to a mismatch between the cultural content of the test and the cultural background of the person being assessed, so that test items do not accurately reflect the developmental experiences of the minority population.

The idea that all children have been exposed to the same constructs that the assessment tries to measure, regardless of different socialization practices, early literacy experiences, and other influences, is a fallacy (Garcia and Pearson, 1994; Green, 1980; Laing and Kamhi, 2003; Valencia and Suzuki, 2001). So, for example, bias may arise on the Peabody Picture Vocabulary Test-III (PPVT-III) because of a lack of familiarity with pointing at pictures to communicate, unfamiliarity with English vocabulary, or a combination of these (Laing and Kamhi, 2003).
Not all children are exposed to the unspoken expectations for communication and behavior in school settings, such as the early exposure to oral and written linguistic experiences of the mainstream. As such, children from cultures with strong oral traditions for learning (e.g., American Indians, Haitian Creoles) may be at risk for biased assessments (Notari-Syverson, Losardo, and Lim, 2003).

Evidence has long suggested that children from many minority racial groups do not, as a group, perform as well as children from the majority white group on school achievement and formal, standardized tests, even controlling for socioeconomic background and proficiency in standard American English (Garcia and Pearson, 1994; Rock and Stenner, 2005). The list of theories related to such disparities is long; however, one reason relevant to this report is that differences in test scores (e.g., between black and white children) may be due to striking disparities in ecological conditions and to instruments that are not designed to be sensitive to those cultural variations. Such contextual variations, if not considered in the assessment instrument design, can lead to systematic biases (Brooks-Gunn et al., 2003). Such a bias may actually perpetuate or increase social inequalities, because a test whose content and measures reflect the values, culture, and experiences of the majority legitimates those inequalities (Gipps, 1999).

Inappropriate Standardization Sample and Methods

Hall (1997) argues that Western psychology tends to operate from an ethnocentric perspective in which research and theories based on the majority white population are assumed to be applicable to all groups. These paradigms are seen as templates to be used on all groups to derive parallel conclusions. As such, the standardization samples of tests are often drawn primarily from white populations, and minorities are often included in numbers insufficient for them to have a significant impact on item selection or to prevent bias. For example, there is a great deal of concern about accurate identification of language disorders among black children using standardized, norm-referenced instruments, because many literacy tests are developed based on mainstream American English and do not recognize dialect differences. The tests have been normed on children from white, middle-class backgrounds (Fagundes et al., 1998; Qi et al., 2003; Washington and Craig, 1992).
Often validity and sampling tests do not include representative samples of nonmainstream English speakers, so the statistical ability to find items that are biased is limited (Green, 1980; Seymour et al., 2003). It may be that the large proportion of minority children who score poorly on some standardized language assessment tools has more to do with the fact that the tests have been normed on children from primarily white, middle-class language backgrounds than with true differences in children’s language abilities (Qi et al., 2003). Minority groups may be underrepresented in standardization samples relative to their proportions in the overall population, or their absolute number may be too small to prevent bias.

Standardized tests based on white middle-class normative data have inevitable bias against children from minority and lower SES groups, providing information on their status in comparison to mainstream children. They do not take into account cultural differences in values, beliefs, attitudes, and cultural influences on assessment content; contextual influences of measuring behavior; or alternative pathways in development (Notari-Syverson et al., 2003, p. 40).

In addition, the fact that a minority group is included in a normative sample does not mean the assessment tool is unbiased and appropriate to use with that group (Stockman, 2000). It is a common misconception that, because a test is “normed,” it is unbiased toward minorities. The norming process, by its nature, leans toward the mainstream culture (Garcia and Pearson, 1994). When test companies draw strict probability samples of the nation, very small numbers of particular minorities are likely to be included, increasing the likelihood that minority group samples will be unrepresentative. Even if a test is criterion-referenced instead of norm-referenced, the performance standards (cutoff scores) by which the children’s performance is evaluated are likely to be based on professional judgments about what typical (that is, mainstream) children know and can do at a particular developmental level (Garcia and Pearson, 1994).

Inappropriate Testing Situation and Examiner Bias

Rarely examined is the assessor’s influence on child assessments and whether assessor familiarity or unfamiliarity exerts a bias against different population groups.
For example, situational factors, such as familiarity with the testing situation, the speed of the test, question-answer communication style, assessor personal characteristics, and the like, may systematically enhance or depress the performance of certain groups differently (Green, 1980, p. 244). Assessor and language bias is present particularly if the assessor speaks only standard English, which may be unfamiliar, intimidating, or confusing to minority children (Graziano, Varca, and Levy, 1982; Sharma, 1986; Skiba, Knesting, and Bush, 2002). For example, a meta-analysis by Fuchs and Fuchs (1986) of 22 empirical studies on assessor effects on intelligence tests for children ages 4-16 suggested that children scored higher when tested by familiar assessors. SES was a key variable: children from low SES backgrounds performed much better with a familiar assessor, whereas high SES children performed similarly across assessor conditions (Fuchs and Fuchs, 1986).

Some researchers have suggested that assessment format and test-taking style can be threatening to some minority populations because of their unusual or foreign format and procedure, leading to direction bias (directions for the test misinterpreted by the child) (Castenell and Castenell, 1988; Fagundes et al., 1998). These characteristics may not be equally present in all test-taking populations. Also, the test-taking style dictated by standardized procedures may influence the performance of children from diverse cultural backgrounds, such that their performance may not represent their true ability because they lack familiarity with the test-taking situation (Qi et al., 2003).

Inequitable Social Consequences

Use of assessments that are not free from bias may result in minority groups being over- or underrepresented in services or educational tracks. Most often the conversation is focused on inappropriate overrepresentation in services (e.g., special education) or on minorities being relegated to inferior programs or services because of test performance (Hilliard, 1991). Historically, test scores have been used to keep black and Hispanic children in segregated schools (Chachkin, 1989).
More recently, excessive reliance on test scores for placement purposes has sent disproportionate numbers of minority children into special education programs and low tracks in middle and high school (Chachkin, 1989; Garcia et al., 1989; Rebell, 1989, cited in Garcia and Pearson, 1994). The opposite is also possible: some children (e.g., Asians) may be overrepresented in advanced programs and high tracks. As Gopaul-McNicol and Armour-Thomas (2002) write: “The challenge for equity in assessment is to ensure that the judgments made about behavior of individuals and groups are accurate and that the decisions made do not intentionally or unintentionally favor some cultural group over another” (p. 10).

Differential Predictive Validity

Ensuring the absence of bias requires that errors in prediction be independent of group membership and that tests predict important outcomes or future behaviors for minority children. Claims have been made that tests do not accurately predict relevant criteria for minorities and that the criteria against which tests are typically correlated, being from the majority culture, are themselves biased against minority group members (Brown, Reynolds, and Whitaker, 1999; Reynolds, Lowe, and Saenz, 1999). The psychometric methods described in Chapter 7 are among those that may be used to detect such bias in existing instruments and to avoid it when developing and norming new instruments.

Empirical Evidence About Potential Bias

In 1983 Reynolds laid out the types of assessment test bias that may occur with minority populations and the need for empirical testing of assessment instruments. Twenty-five years later, this call for empirical research about bias has largely gone unanswered. Empirical evidence does not provide a consistent answer about the potential bias of assessments of minority populations. In addition, most of the work examining test bias has been focused on school-age and adult populations (e.g., intelligence testing, entrance exams, employment tests; Reynolds, 1983). As Reynolds quipped (1983, p. 257), “For only in God may we trust; all others must have data.” What empirical evidence is available about the potential bias of assessments for minority children from birth to age 5? The quick answer: very little.
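To make concrete the requirement, stated under Differential Predictive Validity, that errors in prediction be independent of group membership, the following minimal Python sketch fits one common regression line to test scores and later outcomes and then averages the prediction errors within each group. It is an illustration only, not a method reported by the committee: the data, the group labels "A" and "B", and the function names are all hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for a single predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def mean_residual_by_group(scores, outcomes, groups):
    """Fit one common line, then average the prediction errors within each group."""
    slope, intercept = fit_line(scores, outcomes)
    residuals = {}
    for s, o, g in zip(scores, outcomes, groups):
        residuals.setdefault(g, []).append(o - (intercept + slope * s))
    return {g: mean(r) for g, r in residuals.items()}


# Hypothetical data: at the same test score, group B's later outcomes
# are 5 points higher than group A's.
scores = [10, 20, 30, 40, 10, 20, 30, 40]
outcomes = [20, 30, 40, 50, 25, 35, 45, 55]
groups = ["A"] * 4 + ["B"] * 4

result = mean_residual_by_group(scores, outcomes, groups)
print(result)
```

In this invented example the common line overpredicts outcomes for group A and underpredicts them for group B by the same amount, so the prediction errors depend on group membership. With real data, sampling error, small subgroup sizes, and the possibility that the criterion itself is biased would all need to be considered before drawing any conclusion.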
A Search for Evidence

Despite a wealth of conceptual and theoretical arguments and the need to be cautious using assessments with minority populations (e.g., Hilliard, 1979, 1994, 2004), the availability of published empirical evidence testing potential bias for minority populations, particularly in assessment tools used for children between birth and age 5, is sparse. In our search, we developed a list of commonly used early childhood measures from several comprehensive sources (Child Trends, 2004; National Child Care Information Center, 2005). We used the EBSCO search engine (also called Academic Search Premier) to find empirical studies that examined bias and fairness in assessment for minority children. Search results were filtered on the basis of four criteria: (1) an empirical design, (2) examination of an individually administered assessment tool, (3) testing of minority participants, and (4) a focus on children from birth to age 5. Only studies published in refereed scholarly journals were examined. All studies were assessed by reading the title and abstracts. If the abstract did not provide enough information to judge the article’s match to the established criteria, the full article was reviewed.

Table 8-1 lists the number of empirical articles found on test bias with minority populations by core developmental domains. A total of 64 assessment tools were searched across a number of developmental domains for empirical evidence about potential bias or fairness of the tool with English-speaking, minority populations. In all, 30 empirical articles were found that met the committee’s criteria.

TABLE 8-1 Peer-Reviewed Articles Found on Test Bias with Minority Populations Across Major Developmental Domains

Developmental Domain | Number of Assessment Tools Searched | Number of Bias Testing Articles Found | Assessment Tools with Articles Meeting Committee Criteria
Cognitive | 11 | 16 | Kaufman Assessment Battery for Children (K-ABC) (n = 5); Peabody Individual Achievement Test-Revised (PIAT-R) (n = 2); Stanford-Binet Intelligence Scales, Fourth ed. (SB-IV) (n = 3); Wechsler Preschool and Primary Scale of Intelligence, Third ed. (WPPSI-III) (n = 3); Woodcock-Johnson III (WJ-III) (n = 3)
Language | 15 | 9 | Expressive Vocabulary Test (n = 3); Peabody Picture Vocabulary Test III (n = 5); Preschool language scale (n = 1)
Socioemotional | 21 | 5 | Behavioral Assessment System for Children (n = 1); Bayley Scales of Infant Development (n = 1); Child behavior checklist 1½-5 (n = 1); Attachment Q-set (n = 1); Penn Interactive Peer Play Scale (n = 1)
Approaches to learning | 4 | 0 | 0

In addition to searching for empirical evidence, the committee reviewed several test manuals of child assessment tools, looking at the empirical approaches test developers reported to consider the potential for bias for different ethnic and minority populations. Some findings: (1) There was little reported evidence that the performance of minority children was examined separately from the larger standardization group. (2) Sometimes detailed data from the normative sample of the current assessment tool version are not available. (3) Standardization samples of minority children are small. (4) Race and class may be confounded in the normative sample.

Methodological Issues

In our review of the 30 empirical studies, several key methodological issues emerged that may help explain why there is no unified conclusion about the role of bias in assessment tests for children.

  1. The lack of agreement on the definition of bias. Often it is not clearly specified what type of bias and validity is being tested for, and, if it is, only one type of bias may be addressed. Most of the attention is focused on construct validity and testing for biases related to inappropriate content, followed by biases related to an improper normative sample. Cultural groups may have conceptions or meanings of constructs that are not aligned with what is represented in the assessment (Gopaul-McNicol and Armour-Thomas, 2002). Nor is there a commonly agreed-on use of the term “bias” from a multicultural testing perspective or agreement on how to measure it (Stockman, 2000, p. 351). Psychometric tests alone cannot address all potential issues of construct threats, that is, problems about the validity of the constructs themselves, not just whether they are being assessed equivalently. These include contextual nonequivalence, conceptual nonequivalence, and linguistic nonequivalence.

  2. A related issue is mono-operation of bias and measures of bias. That is, many studies use only a single variable or a single technique to examine bias effects (Cook and Campbell, 1979).

  3. Methods used to empirically test for bias vary widely, from simple comparisons of means and standard deviations with the normative sample, through partial correlations between subgroups and item scores and t-tests, to multiple regression and methodological approaches controlling for potential confounding variables. Depending on what type of bias is being examined, the simple presence or absence of differences in mean scores between two different minority groups does not directly say anything about the fairness of the test (Qi et al., 2003; Reynolds, Lowe, and Saenz, 1999).

  4. Lack of consistent use of psychometric research and theory in testing for bias.
Empirical evidence for potential bias with minority groups may be a result of the type of psychometric property studied and type of statistical method employed (Valencia and Suzuki, 2001). For example, the significant difference in performance between minority samples and the normative sample of an assessment test prompts one to consider whether this is evidence for test bias (Qi et al., 2003). There is no agreement about which psychometric procedures that deal with or test for bias are most effective (Crocker and Algina, 1986, cited in Fagundes et al., 1998). In item analysis, a normal distribution alone does not indicate whether items differed in difficulty in a sequential manner equally for minority and nonminority populations (Qi et al., 2003). For example, if items are placed in order of increasing difficulty based on a white-normed population, it is possible that this sequence is not appropriate for black children (Qi et al., 2003).

5. Examining or testing for content validity or bias tends to focus on individual item bias. Subjective techniques to overcome such bias usually involve panels of experts from diverse backgrounds who say the question is “valid” and statistical techniques that are based on item test differences, and these experts often disagree.

6. Small sample sizes, limited representation of minority groups, and monolithic conceptualization of minority groups. For example, there is often an assumption that all American Indians, Asian Americans, or African Americans represent a similar culture and language (Helms, 1992). Most studies examine only black-white differences. Most existing studies are based on small samples and provide limited power to examine the relationship between various environmental factors and the reliability or validity of test outcomes.

7. The empirical evidence available about bias for minority populations is almost entirely based on African American and Mexican American children (Madhere, 1998; Valencia and Suzuki, 2001).
Given the growing presence of other minority groups, particularly Hispanic and Asian groups, the lack of attention to these groups in bias testing is problematic, and combining various ethnic groups into a single rubric is a serious flaw in the empirical testing of assessment validity and potential bias (Cho, Hudley, and Back, 2002).

8. Few studies examine potential bias with proper control for potential confounding variables. The most obvious omissions are the age and gender of the child. Few studies report gender or consider gender differences in testing for cultural bias. Many fail to report or control for socioeconomic status as well.

9. Most of the research on test bias, particularly cultural bias with minority populations, was conducted in the late 1970s and 1980s, with very few studies in the 1990s or later. Also, the subjects were mostly older children. For example, Valencia and Suzuki’s (2001) review found that 92 percent of empirical, peer-reviewed articles on cultural bias in intelligence tests for children of preschool age or older were conducted in the 1970s and 1980s.

10. Limitation of the type of assessment instruments examined for bias. What is known about cultural bias in assessment instruments is confined mostly to intelligence and cognitive tests, mostly the WISC, WISC-R, and K-ABC. The WISC and WISC-R have now been replaced by the WISC-III, yet this new version has not been examined, so most of what is known about cultural bias in intelligence tests is based on two obsolete instruments (Valencia and Suzuki, 2001). Tests that measure other aspects of child development have not received much attention, yet they are also likely to be culturally influenced, as intellectual and cognitive tests are. An example is culturally defining and measuring dimensions of socioemotional development. Such dimensions as creativity, attention, approaches to learning, and aggression may well be contextually, ecologically, and culturally dependent.

11. Little empirical work has been done on the effects of the assessor, the rater, or the testing situation.
The questions of whether some children systematically perform worse under testing situations, and whether assessor effects operate by increasing the distress or anxiety associated with a testing situation, merit further research attention (Brooks-Gunn et al., 2003). Few empirical tests have examined variations across subjects relative to the race of the assessor or interactions between the race of the assessor and the race of the child (Sharma, 1986).

The lack of currently available empirical evidence exploring test bias in early childhood assessment suggests that the subject has become peripheral among both policy makers and researchers. But, as was stated so clearly at a National Association of Test Directors Symposium in 2003, “those of us who work in testing should not be lulled into a false sense of calm. The issues raised in the earlier go-around have not been fully addressed” (National Association of Test Directors, 2004, p. 7). The issues raised in the policy arena about the fairness of testing, particularly for young children, have not been informed by sufficient systematic information.

English Language Learners

The increasing demand for evaluation, assessment, and accountability in early education comes at a time when the fastest growing population of children in the country consists of those whose home language is not English. This presents several challenges to school systems and practitioners who may be unfamiliar with important concepts, such as second language acquisition, acculturation, and the role of socioeconomic status as they relate to the development, administration, and interpretation of assessments.

Because assessment is key to effective curricular and instructional strategies that promote children’s learning, young English language learners (ELL) have the right to be assessed. Through individual assessments, teachers can personalize instruction, make adjustments to classroom activities, assign children to appropriate program placements, and have more informed communication with parents. System administrators need to know how young English language learners are performing in order to make proper adjustments and policy changes.
However, there is a lack of adequate instruments to use with them, especially considering the hundreds of languages spoken in the United States. Some tests exist in Spanish, but most lack the technical qualities of a high-quality assessment tool. In addition, there is a shortage of bilingual professionals with the skills necessary to evaluate these children, and a shortage as well of conceptual and empirical work systematically linking context with child learning. In this section we discuss these challenges, review important principles associated with high-quality assessments of young English language learners, and discuss further needs in the field so that research and practice work together to see that such principles are implemented.

This section is informed by a paper prepared for the committee by Espinosa (2007).

Several terms are used in the literature to describe children from diverse language backgrounds in the United States. A general term describing children whose native language is other than English, the mainstream societal language in the United States, is “language minority.” This term is applied to nonnative English speakers regardless of their current level of English proficiency. Other common terms are “English language learner” and “limited English proficient.” These two terms are used interchangeably to refer to children whose native language is other than English and whose English proficiency is not yet developed to a point at which they can profit fully from English instruction or communication. In this report, the term “English language learner” is used, rather than “limited English proficient,” as a way of emphasizing children’s learning and progress rather than their limitations. Given the charge of the committee, the focus is particularly on children from birth to age 8, that is, young English language learners.

Young English Language Learners: Who Are They?
Young English language learners have been the fastest growing child population in the country over the past few decades, due primarily to increased rates in both legal and illegal immigration. Currently, one in five children ages 5-17 in the United States has a foreign-born parent (Capps et al., 2005), and many, though not all, of these children learn English as a second language. Whereas the overall child population speaking a non-English native language in the United States rose from 6 percent in 1979 to 14 percent in 1999 (National Clearinghouse for English Language Acquisition, 2006) and the number of language-minority children in K-12 schools has been recently estimated to be over 14 million (August and Shanahan, 2006), the representation of English language learners in U.S. schools has its highest concentration in early education. This is because most ELL children attending U.S. public schools since entry develop oral and academic English proficiency by grade 3. The ELL share of children from prekindergarten to grade 5, for example, rose from 4.7 to 7.4 percent from 1980 to 2000, while the ELL share of children in grades 6 to 12 rose from 3.1 to 5.5 percent over this same time span (Capps et al., 2005).

Assessing the development of young English language learners demands an understanding of who these children are in terms of their linguistic and cognitive development, as well as the social and cultural contexts in which they are raised. The key distinguishing feature of these children is their non-English language background. In addition to linguistic background, other important attributes include their ethnic, immigrant, and socioeconomic histories (Abedi et al., 2000; Capps et al., 2005; Figueroa and Hernandez, 2000; Hernandez, 2006). Although diverse in their origins, ELL children, on average, are more likely than their native English-speaking peers to have an immigrant parent, to live in low-income families, and to be raised in cultural contexts that do not reflect mainstream norms in the United States (Capps et al., 2005; Hernandez, 2006).

Decades of research support the notion that children can competently acquire two or more languages (García, 2005). Currently, among the available theoretical approaches, transfer theory best explains the language development of young children managing two or more languages (Genesee et al., 2006), asserting that certain linguistic skills from the native language transfer to the second.
In like manner, errors or interference in second language production occur when grammatical differences between the two languages are present. In the process of cross-linguistic transfer, it is normal for children to mix (or code-switch) between languages. Mixing vocabulary, syntax, phonology, morphology, and pragmatic rules serves as a way for young bilingual children to enhance meaning. Because language use is context-driven, the bilingual child’s choice of language depends on characteristics of and the particular relationship with the addressee as well as the child’s own attitudinal features.

Young English language learners represent diverse ethnic backgrounds. According to the U.S. Department of Education (2008), in recent years approximately four in five English language learners were from Spanish-speaking homes, followed by Vietnamese (2 percent), Chinese languages (2 percent), Hmong (1.6 percent), Korean (1 percent), and many more native and foreign languages. While a majority of Hispanic English language learners are of Mexican origin (approximately 7 in 10), substantial proportions have origins in Puerto Rico, Central America, South America, Cuba, and the Dominican Republic (Hernandez, 2006). Within and among these groups, ELL children represent diverse social and cultural customs and histories, which are essential to consider thoroughly when assessing their linguistic, cognitive, social, and emotional development in home and school contexts.

Finally, it is important to consider the socioeconomic status of English language learners, including family income as well as the amount of educational capital (i.e., parental education) in the home. In 2000, 68 percent of English language learners in prekindergarten to grade 5 lived in low-income families (defined as family income below 185 percent of the federal poverty level), compared with 36 percent of English-proficient children in the same grades (Capps et al., 2005). Moreover, nearly half of ELL children in elementary school had parents with less than a high school education in 2000, compared with 9 percent of parents of English-proficient children. A quarter of ELL elementary schoolchildren had parents with less than a ninth grade education, compared with 2 percent of parents of English-proficient children (Capps et al., 2005).
Parent education levels are important indices, as they influence language and educational practices in the home and therefore the development of skills valued in U.S. schools.

Assessment Issues

Young English language learners have the right to benefit from the potential advantages of assessment. The current empirical knowledge base and the legal and ethical standards are limited yet sufficient to improve ways in which they are assessed.

Improvements will require commitments from policy makers and practitioners to implement appropriate assessment tools and procedures, to link assessment results to improved practices, and to use trained staff capable of carrying out these tasks. Researchers and scholars can facilitate the improvement of assessment practices by continuing to evaluate implementation strategies in schools and by developing systematic assessments of contextual factors relevant to linguistic and cognitive development. Assessments of contextual processes will be necessary for current assessment strategies, which largely focus on the individual, to improve classroom instruction, curricular content, and therefore children’s learning (Rueda, 2007; Rueda and Yaden, 2006).

Legal and Ethical Precedents

The impetus for appropriate and responsive assessment practices of young English language learners comes from a number of legal requirements and ethical guidelines, which have developed over time. Case law, public law, and ethical codes from professional organizations support the use of sound assessment tools, practices, and test interpretations. A widely cited set of testing standards is Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, and National Council on Measurement in Education, 1999). This volume offers a number of ethical standards for assessing the psychological and educational development of children in schools, including guidelines on test development and application. It includes a chapter on testing children from diverse linguistic backgrounds, which discusses the irrelevance of many psychoeducational tests developed for and normed with monolingual, English-speaking children. Caution is given to parties involved in translating such tests without evaluating construct and content validity and developing norms with new and relevant samples.
It also discusses accommodation recommendations, linguistic and cultural factors important in testing, and important attributes of the tester. Similar, though less detailed, provisions are found in the Professional Conduct Manual of the National Association of School Psychologists (2000).

It has been argued that Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, and National Council on Measurement in Education, 1999) has outpaced present policy, practice, and test development (Figueroa and Hernandez, 2000). However, the Individuals with Disabilities Education Act (IDEA) of 2004 has specific requirements related to the assessment of English language learners. It requires, for example, the involvement of parents or guardians in the assessment process as well as a consideration of the child’s native language in assessment. Unlike ethical guidelines, which often represent professional aspirations and are not necessarily enforceable, public law requires compliance. The Office for Civil Rights (OCR) is given the charge to evaluate compliance with federal law and, when necessary, audit public programs engaged in assessment practices and interpretations of English language learners and other minority children.

Assessment Practice: Use and Misuse

In addition to the concerns surrounding the assessment of all young children, there are central issues inherent in the assessment of young children from non-English language backgrounds. Implementation research suggests that assessment practices with young English language learners continue to lag behind established legal requirements and ethical standards set forth by professional associations (American Educational Research Association, American Psychological Association, and National Council on Measurement in Education, 1999). In part, this is because of a lack of available instruments normed on representative samples of English language learners, inadequate professional development and training, and insufficient research to inform best practice. Such is the case for the assessment of language, cognitive skills, and academic achievement, among other areas. Each of these areas is visited briefly.
Assessment Instruments

Language is the key distinguishing feature of English language learners. Assessments of language in early childhood and elementary school settings are used to identify and place children into programs (including special education), to determine oral English proficiency, to determine first- and second-language vocabulary skills, and to predict literacy performance (Garcia, McKoon, and August, 2006). The Language Assessment Scales (LAS; De Avila and Duncan, 1990; Duncan and De Avila, 1988) and Pre-Language Assessment Scales (Pre-LAS; Duncan and De Avila, 1998) are currently among the most commonly used instruments to measure oral language proficiency. These scales, however, have not been found to predict academic language proficiency in English on their own (Garcia, McKoon, and August, 2006). Available research findings indicate that native language academic and reading performance, combined with oral English proficiency and teachers’ judgments, are better predictors of academic English proficiency (Gutiérrez-Clellen, 1999; Gutiérrez-Clellen and Kreiter, 2003).

Issues of test bias are important to consider when assessing the vocabulary development of young English language learners. The Peabody Picture Vocabulary Test (PPVT; Dunn and Dunn, 2007) and the Test de Vocabulario en Imágenes Peabody (TVIP; Dunn et al., 1986) have been reported to be the most commonly used vocabulary tests in English and Spanish (Garcia, McKoon, and August, 2006). The Woodcock-Johnson (Woodcock, McGrew, and Mather, 2001) and the Preschool Language Scale, Fourth ed. (PLS 4) (Zimmerman, Steiner, and Pond, 2002) are also used. These tests are structured so that increasingly difficult and less frequent words are used to test the child’s vocabulary awareness, relative to the normative sample. However, unlike the English version, the TVIP was not developed using word frequency measures from Spanish, but was simply translated from English. This creates problems when interpreting the Spanish scores, even when the English scores are useful to compare the oral skills of English language learners with those of their English-speaking, monolingual peers.
Many of the other native language assessments used with young English language learners focus on receptive vocabulary or provide a limited view of their development. Because of the limited availability of instruments to test the native language development of young English language learners, including receptive and expressive skills, school personnel are often forced to rely on informal assessments by teachers, aides, or other informants. This can undermine efforts to build a suitable curriculum and recognize a child’s linguistic strengths and weaknesses. Further research is needed to develop psychometrically sound native language assessments for English language learners. This will require the expertise of several disciplines, including linguistics, cognitive psychology, education, and psychometrics.

Cognitive (or intellectual) assessments are also very common in early childhood education settings. Because of the inherent problems in assessing the cognitive skills of English language learners with language-loaded tests like the Wechsler Intelligence Scales for Children, Fourth ed. (WISC-IV; Wechsler, 2003), two options for intellectual assessment have been made available in recent years. One is the emergence of “nonverbal” intelligence tests. The Universal Nonverbal Intelligence Test (UNIT; Bracken and McCallum, 1998), a commonly used nonverbal cognitive test, has received positive reviews (Borghese and Gronau, 2005; Fives and Flanagan, 2002), although it is designed for children from age 5 to about age 18, not for preschoolers. The standardization of the UNIT was conducted with 2,100 children from diverse backgrounds, and the test manual provides normative scores of several subpopulations. The main complaint about the UNIT is that it is difficult to use and requires a great deal of training and practice to administer.

The second option to traditional cognitive measures is intelligence tests developed for specific ELL populations. To date, these tests are available only for Spanish-speaking children, and most are for school-age children. One is the Spanish version of the WISC-IV (Wechsler, 2004). This test was calibrated to the WISC-IV English with a U.S. sample drawn from several areas of origin: Mexico, Cuba, the Dominican Republic, Puerto Rico, Central America, and South America. Some test items were modified to minimize cultural bias across groups. The test is given in Spanish, and children earn credit for answers in either Spanish or English. It is designed for children from ages 6 to 16. Spanish versions of the Woodcock-Johnson-Revised Tests of Cognitive Ability (WJ-R COG, 3; Woodcock and Johnson, 1989), which can be used with children as young as age 2 years, are also available. Further empirical research evaluating the reliability and validity of these instruments is needed.

The instruments and practices used to assess achievement often depend on the purpose of assessment. Assessments for accountability purposes tend to rely on criterion-referenced tests developed by state departments of education (Abedi, Hofstetter, and Lord, 2004; Abedi et al., 2000; National Research Council, 2000). Debates have continued over the past decades regarding the inclusion of English language learners in large-scale child assessment programs. Due to antidiscrimination laws, court cases, and standards-based legislation, there has been a push to include all children in state assessments, including young English language learners. This has led to the use of accommodations (changes in the test process, in the test itself, or in the test response format) to more accurately portray the performance of English language learners and not discriminate against language background (Abedi, Hofstetter, and Lord, 2004). Currently, however, decisions about which accommodations to use, for whom, and under what conditions are based on little empirical evidence.

Assessments of academic achievement are also used to improve children’s learning and identification for special services. For children in early education, these tend to assess early literacy (e.g., sound and letter recognition, sight words) and numeracy (e.g., numbers, shapes, relative size, ordinality) skills. A large variety of tools and practices is used for these purposes, which can be categorized by two general types of performance assessment. First, commercial (mostly norm-referenced) tests are used. Some of the same concerns with regard to normative cognitive assessment are relevant to normative academic assessment. That is, many of the tests have been developed essentially as back-translations or adaptations of existing English language measures, without evaluating their construct and content validity.
Moreover, the normative samples often do not reflect the ethnic, socioeconomic, or linguistic backgrounds of ELL children.

Even when these obstacles are overcome and when bilingual achievement tests have been produced with representative samples, the argument is made that the content of standardized tests does not necessarily predict success in the curriculum. The base case for this argument is that test content often does not reflect classroom content, and that academic outcomes do not inform instructional or curricular interventions per se. For these reasons, a second option for achievement assessment to improve children’s learning and to determine identification for special services, known as curriculum-based measurement, has accumulated evidence and attention over the past few decades (Fuchs, 2004; Rhodes, Ochoa, and Ortiz, 2005). Conceptualized initially as an approach to child progress monitoring (Deno, 1985), curriculum-based measurement tasks are used to assess child performance in the curriculum on a weekly basis. Results are used simultaneously to monitor child progress and to inform instructional or curricular interventions. The slope of scores over time is used to monitor progress and the rate of growth toward a determined goal or standard. The IDEA of 2004 allows curriculum-based measurement approaches to replace traditional testing approaches (i.e., normative testing) of academic achievement to determine special education eligibility for learning disabilities, something the IDEA of 1997 did not allow.

Other areas of child development important to and assessed in early educational settings include socioemotional (or behavioral), motor, and adaptive (or daily living) skills, as well as hearing, vision, and health factors. As mentioned previously, these developmental areas are of interest in early education and pre-K–12 schooling insofar as they impact children’s learning and educational well-being. Some issues have been raised in the research literature regarding assessment instruments and practices used with culturally and linguistically diverse children in these areas as well (Carter, Briggs-Gowan, and Ornstein Davis, 2004; Figueroa and Hernandez, 2000). For example, when Spanish translations of the Behavior Assessment System for Children, Second ed. (BASC-2; Reynolds and Kamphaus, 2003), a set of rating scales measuring the socioemotional development of children, were produced, the test was not standardized with Spanish-speaking populations. Moreover, the construct and content validity of this tool and those like it need to be evaluated in light of cultural differences regarding definitions of behavior appropriateness and abnormality. Optimally, these assessment instruments would be developed in a culturally and linguistically responsive manner, specific to each of the different groups.
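
As an illustration only, the progress-monitoring logic of curriculum-based measurement discussed in this section (a slope of weekly scores compared with the rate of growth needed to reach a goal) can be sketched in a few lines of Python. The weekly scores, baseline, goal, and time frame below are hypothetical values, and the function names are invented for this sketch. An ordinary least-squares slope is assumed here as one common way to summarize growth; the sources cited in this section do not prescribe a particular computation.

```python
from statistics import mean


def ols_slope(scores):
    """Least-squares slope of scores against week index (0, 1, 2, ...).

    Hypothetical helper: returns the average gain per week.
    Requires at least two weekly scores.
    """
    weeks = list(range(len(scores)))
    week_mean = mean(weeks)
    score_mean = mean(scores)
    numerator = sum((w - week_mean) * (s - score_mean)
                    for w, s in zip(weeks, scores))
    denominator = sum((w - week_mean) ** 2 for w in weeks)
    return numerator / denominator


def required_slope(baseline, goal, n_weeks):
    """Weekly gain needed to move from the baseline to the goal in n_weeks."""
    return (goal - baseline) / n_weeks


# Hypothetical data: six weekly probe scores for one child.
weekly_scores = [12, 14, 13, 16, 18, 19]

observed = ols_slope(weekly_scores)
needed = required_slope(baseline=12, goal=30, n_weeks=18)
on_track = observed >= needed

print(f"observed gain per week: {observed:.2f}")
print(f"gain per week needed:   {needed:.2f}")
print("on track toward goal" if on_track else "growth is below the needed rate")
```

In this sketch, an observed slope below the required slope would be the signal to reconsider the instructional or curricular intervention, which mirrors the dual use of results described in this section. How many data points to collect before acting on the comparison is a design decision left open here.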

Professional Development and Training

A number of problems arise when school personnel are engaged in the assessment of young English language learners without the necessary competence, tools, and therefore practices. The literature on disproportional representation of language-minority children in special education programs, for example, has pointed to culturally and linguistically unresponsive referral, assessment, and eligibility determination practices in schools as causes of disproportionality (Coutinho and Oswald, 2000; Rhodes et al., 2005). Moreover, although the research and legal and ethical declarations mandate responsive practice, several studies have documented referral, assessment, and interpretation practices that are below standard. These studies have highlighted language barriers and the low expectations of teachers (McCardle, Mele-McCarthy, and Leos, 2005), questionable intellectual assessment practices (Bainter and Tollefson, 2003), questionable language assessment practices (Ochoa, Galarza, and Amado, 1996; Yzquierdo, Blalock, and Torres-Velasquez, 2004), invalid or irrelevant interpretations (Harry and Klingner, 2006), and inappropriate translation and interpretation practices (National Research Council, 2000; Ochoa et al., 1996; Paredes Scribner, 2002; Santos et al., 2001).

This has several implications for ongoing implementation research in the area of professional development and training for assessing young English language learners. This research will need to focus on strategies to improve staff competencies necessary to work as a part of a professional team, to work with interpreters, and to choose and administer appropriate assessment batteries. Moreover, implementation research should highlight strategies to train practitioners to develop their competence in second language acquisition, acculturation, and the evaluation of educational interventions.
Practice and Research

There is a gap between current assessment practice of young English language learners and what the research and the legal and ethical standards suggest is best practice. It is therefore important that research and practice continue an ongoing dialogue to improve this scenario. Support and necessary funding should be provided by policy makers, institutions of higher education, and other research programs to pursue this course. Researchers can engage assessment scholarship to this end in three ways.

First, the field needs more tests developed and normed especially for young English language learners. This will require a bottom-up approach, meaning that assessment tools, procedures, and factor analytic structures are aligned with the cultural and linguistic characteristics of ELL children, as opposed to top-down approaches in which, for example, test items are simply translated from their original language to the native languages of young English language learners. Norm-based tests should also take into account important characteristics of the children, including their linguistic, ethnic, and socioeconomic histories.

Second, it is time for conceptual and empirical work on child assessment to move beyond the individual level. Most of the discussion in this section reflects the extant literature, which has focused heavily on the assessment of processes and outcomes for the individual: assessing language, cognitive development, academic learning, and so forth. With this knowledge base, teachers and schools are expected to adjust aspects of the environment to improve learning. It has become clear that processes outside the individual, including in the classroom (e.g., teacher-child interactions, peer-to-peer interactions), the home (e.g., frequency of words spoken, number of books), and the school (e.g., language instruction policies), affect learning. The field lacks conceptual frameworks and the measures necessary to move this research forward to systematically improve children’s learning.
Preliminary research on the role of context in learning suggests that variations in environmental factors can increase children’s engagement and participation (Christenson, 2004; Goldenberg, Rueda, and August, 2006), which in turn can lead to increased learning, and that the influence of contextual contingencies on learning outcomes is mediated by children’s motivation to learn (Rueda, 2007; Rueda and Yaden, 2006; Rueda et al., 2001). Conceptual frameworks should account for the multilevel nature of contexts, including the nesting of individuals within classrooms and families, classrooms within schools, and schools within school districts, communities, and institutions. Moreover, the role of culture and the feasibility of cultural congruence across both in-school and out-of-school contexts will be important to this work. Meaningful empirical work in this area will require the convergence of research methods (e.g., multilevel statistics and the mixing of qualitative approaches with quasi-experimental designs) and social science disciplines (e.g., cognitive psychology, educational anthropology, sociology of education).

Finally, more research documenting the current scenario of the assessment of young English language learners across the country is needed. As the population of these young children continues to grow and to disperse to states with historically low representations of ELL children, more work will be needed to evaluate assessment practices in their localities. Both survey research and observational approaches will be needed in this documentation. This work will aid the development of strategies to train professionals with the skills necessary to serve young ELL children.

Principles of Assessment

Given the large and increasing size of the young ELL population in the United States, the current focus on testing and accountability, and the documented deficits in current assessment practices, improvements are critical. Improvements are necessary at all phases of the assessment process, including preassessment and assessment planning, conducting the assessment, analyzing and interpreting the results, reporting the results (in written and oral formats), and determining eligibility and monitoring (implementation issues are discussed in Chapter 9).

Researchers and organizational bodies have offered principles for practitioners engaged in the assessment of young English language learners. Among the most comprehensive is a list from the National Association for the Education of Young Children (2005).
In a supplement to their 2003 position statement on early childhood curriculum, assessment and program evaluation, the NAEYC presents seven detailed recommendations “to increase the probability that all young English language learners will have the benefit of appropriate, effective assessment of their learning and development” (p. 1). The last of these recommendations concerns further needs (i.e., research and practice) in the field, the subject of the following section. Because these recommendations, presented here as principles, were a collaborative effort of a committee composed of more than a dozen researchers in the field, they are quite representative of recommendations found in the literature.

First, screening and assessment instruments and procedures should be used for appropriate purposes. Screening tools should result in needed supports and services and, if necessary, further assessment. Assessments should be used fundamentally to support learning, including language and academic learning. For evaluation and accountability purposes, young English language learners should be included in assessments and provided with appropriate tests and accommodations.

Second, screenings and assessments should be linguistically and culturally appropriate. This means that assessment tools and procedures should be aligned with the cultural and linguistic characteristics of the child. When tests are translated from their original language to the native language of the ELL child, they should be culturally and linguistically validated to verify the relevance of the content (i.e., content validity) and the construct purported to be measured (i.e., construct validity). Moreover, in the case of norm-based tests, the characteristics of children included in the normative sample should reflect the linguistic, ethnic, and socioeconomic characteristics of the child.

Third, the primary purpose of assessment should be to improve instruction. The assessment of child outcomes using appropriate tools and procedures should be linked closely to classroom processes. This means relying on multiple methods and measures, evaluating outcomes over time, and using collaborative assessment teams, including the teacher, who is a critical agent for improved learning and development.
Assessment that systematically informs improved curriculum and instruction is the most useful.

Fourth, caution ought to be used when developing and interpreting standardized formal assessments. As discussed, standardized assessments are used for at least three purposes: to identify disabilities and determine program eligibility, to monitor and improve learning, and to further accountability. It is important that young English language learners are included in large-scale assessments and that these instruments continue to be used to improve educational practices and placements. However, those administering and interpreting these tests ought to use caution. Test development issues, including equivalence, translation, and norming, must be scrutinized, and evidence-based accommodations should be provided during accountability assessments.

Fifth, those administering assessments should have cultural and linguistic competence. This may be the most challenging of the principles. Professional development and training of teachers, school psychologists, speech pathologists, and school administrators constitute a long-term goal that will demand ongoing funding and implementation research. Those assessing young English language learners should be bicultural, bilingual, and knowledgeable about second language acquisition. In many cases, consultants and interpreters are used when the supply of school personnel possessing these qualifications is limited. Implementation research is needed to understand best practices in working with consultants and interpreters through the pre-assessment and assessment planning, conducting the assessment, analyzing and interpreting the results, reporting the results (in written and oral formats), and determining eligibility and monitoring.

Finally, families should play critical roles in the assessment process. Under federal law, parents have the right to be included in the decision-making process regarding the educational placement of their child. Moreover, the educational benefit of the assessment process for a given child is optimal when parents’ wishes are voiced and considered throughout. Although family members should not administer formal assessments, they are encouraged to be involved in selecting, conducting, and providing information to contextualize results.
The process and results of assessment should be explained to parents in a way that is meaningful and easily understandable.

CHILDREN WITH SPECIAL NEEDS

Assessment historically has played a central role in the provision of services to young children with special needs, unlike the general early childhood community, for which assessment has been viewed with suspicion until relatively recently (McConnell, 2000). This diverse population of young children presents numerous challenges related to the validity of assessments, not only because they are young, but also because of their developmental or disability-related needs. The following pages address why young children with special needs are being assessed, the principles that should guide assessment, and some of the unique issues raised by conducting assessments for this population.

The term “young children with special needs” is used to describe children from birth through age 5 years who have diagnosed disabilities, developmental delays, or a condition that puts them at risk for a delay or a disability. Key to understanding the assessment issues in this area is understanding who makes up this population. Many children with special needs receiving services do so through programs supported under the Individuals with Disabilities Education Act, the primary law that provides funding and policy guidance for the education of children with disabilities. The IDEA is basically a grants program of federal funds going to states to serve students with special needs on the condition that the education provided for them is appropriate (National Research Council, 1997).

In 2006, nearly 1 million children with special needs under age 5 received services through programs governed by the IDEA. Specifically, almost 300,000 children under age 3 received early intervention services and more than 700,000 children ages 3 to 5 received special education and related services (https://www.ideadata.org/arc_toc8.asp#partbCC). Children under age 5 with special needs are served under two different sections of IDEA. Children from birth to age 3 receive services under Part C, Infants and Toddlers with Disabilities, whereas children ages 3 through 5 are served under Part B, which addresses special education and related services for children and youth ages 3 through 21.
Infants and toddlers receive services for a variety of developmental problems, with communication problems being the most frequent. A total of 64 percent of children served under age 3 have some kind of developmental delay. Nearly one in five (19 percent) have some kind of a prenatal or perinatal abnormality, and 18 percent have motor problems. Three-fourths of the children identified between ages 2 and 3 receive services for a communication problem. Smaller percentages have problems with movement (18 percent) (Scarborough, Hebbeler, and Spiker, 2006). Nearly half (47 percent) of children ages 3 to 5 are reported to have a primary disability of speech and language impairment, with 35 percent having a primary disability of developmental delay (https://www.ideadata.org/arc_toc8.asp#partbCC).

Assessment Purposes

Young children with special needs are extremely diverse in the nature and extent of their competencies and needs, and this diversity has significant implications for assessment. The purposes of assessment include screening; diagnosis and determination of eligibility for services; program planning; progress monitoring; and research, evaluation, and accountability (McLean, 2004; Neisworth and Bagnato, 2004).

Screening

Screening, the process of identifying children who may need additional assessment, is the type of assessment that first suggests the presence of a possible developmental or physical problem, such as a mild communication delay or a hearing problem. A screening assessment may be focused on multiple areas of development, such as language, cognition, and socioemotional development, or specific body functions, such as vision or hearing. Some children, such as those with severe motor problems, would be unlikely to participate in a general developmental screening assessment intended to identify children at risk for poor development because the presence of a delay or disability is already apparent or documented from birth. A number of assessment measures are available with acceptable levels of sensitivity and specificity, indicating that, if conducted well with well-chosen measures, screening can be an accurate process (Meisels and Atkins-Burnett, 2000).

Diagnosis and Eligibility Determination

Most young children with special needs participate in an assessment for diagnostic purposes and to establish their eligibility for early intervention or early childhood special education services. A diagnostic evaluation is conducted to determine whether the child’s functioning is sufficiently outside the realm of typical development to warrant diagnosis of a disability or a developmental delay. The IDEA requires that children referred for early intervention services be assessed in five areas: physical development, cognitive development, communication development, social or emotional development, and adaptive development. The IDEA requires that children ages 3 and older be assessed in the area of suspected disability, although recommended practice is for a comprehensive assessment in all areas (Neisworth and Bagnato, 2005).

Children under 36 months of age are eligible for early intervention services under the IDEA if they have either a developmental delay or a condition likely to result in a delay if services are not provided (e.g., blindness). The IDEA requires that each state set its own criteria for determination of developmental delay. The criteria used by the states vary greatly (Shackelford, 2006), and some may require assessment precision or other psychometric qualities not available in current instruments. States also have the option to serve children under the IDEA who do not have an established condition but are at risk of developing a developmental delay.

The IDEA eligibility criteria for 3- through 5-year-olds are quite different from those for infants and toddlers, meaning that a child can be eligible for services in one age group and not the other. States are required to serve all children ages 3 through 5 who have one of the 13 IDEA-specified disabilities and who have a demonstrated need for special education or related services. These are the same eligibility criteria that apply to children ages 5 through 21 with the exception of developmental delay, which can be used only with children through age 9.
The 13 IDEA-specified disabilities are specific learning disabilities, speech or language impairment, mental retardation, emotional disturbance, multiple disabilities, hearing impairments, orthopedic impairments, other health impairments, visual impairments, autism, deaf-blindness, traumatic brain injury, and developmental delay.

An alternative approach for eligibility determination, response to intervention (RTI), is being used with school-age children and has potential for younger children. Discussed briefly in Chapter 2, RTI involves a multitiered procedure for identifying children who are experiencing difficulties; however, the application of this approach with younger children has not yet been fully developed (Coleman, Buysse, and Neitzel, 2006; VanDerHayden and Snyder, 2006). With current eligibility assessment procedures, children are identified for special assistance on the basis of poor performance on a norm-referenced assessment. A multitiered model differs from traditional identification practices in that assessment is used first to identify children who are not benefiting from a high-quality program and then to monitor their progress when additional assistance is provided. If the amount of additional service deemed necessary for the child to show progress is beyond the scope of the regular program, then the child could be considered in need of special education (VanDerHayden and Snyder, 2006).

Assessment is central to implementation of a multitiered model, but, unlike current approaches to eligibility, the access to special services does not hinge on the outcome of assessment at a single point in time. Because assessment is ongoing in a multitiered model, children have regular opportunities to receive special services if they need them, or to no longer receive them when they are performing at expected levels. Although a well-researched and well-implemented RTI model in early childhood might be an additional way to identify some children who need additional assistance around learning or behavior challenges (Barnett et al., 2006; Hemmeter, Ostrosky, and Fox, 2006), identification for IDEA services in the near future is likely to continue to rely on more traditional assessment procedures for many children.

Planning for Intervention or Instruction

The provisions of the IDEA require that each eligible child’s education must be determined on an individualized basis and designed to meet his or her unique needs.
The law uses the word “evaluation” to describe the process of determining eligibility for services and the term “assessment” to describe the process of gathering information for planning the child’s program of services (McLean, 2004). The difference is not just a matter of semantics, because the norm-referenced assessments used to determine eligibility do not provide useful information for intervention planning, meaning that another type of assessment must be administered for this purpose (Bailey, 2004; Fewell, 2000; McCormick and Noonan, 2002; McLean, 2005).

For children and families, this means that additional assessments need to be conducted after the diagnostic evaluation substantiates that the child meets the eligibility criteria for services. Criterion-referenced or curriculum-based measures are generally used as part of the assessment process to identify objectives for the child and identify appropriate instructional or intervention strategies to achieve these objectives (Bagnato, 2007; Losardo and Notari-Syverson, 2001). In addition, information about the family’s daily routines and activities, the family’s concerns and priorities, and the child’s special interests is useful in planning (Wolery, 2003), as is information about classroom activities and goals for children in group care and educational settings (Pretti-Frontczak et al., 2007).

Progress Monitoring

The phrase “progress monitoring” is currently used to describe two different kinds of assessment processes for young children with disabilities. The first refers to tracking their progress through a set of objectives using any criterion or curriculum-based tool administered at regular intervals (Pretti-Frontczak et al., 2007; Wolery, 2003). The second involves the use of tools derived from a general outcomes model (Deno, 1997), in which key skills linked to general outcomes are assessed repeatedly over time, allowing for depiction of growth toward identified outcomes (Carta et al., 2002; McConnell, 2000).

Monitoring progress is related to planning the child’s program, and the same assessments can be used for this purpose.
The assessment process helps the teacher, interventionist, or therapist know whether they should continue to address this outcome or set of outcomes with the set of strategies being used or should identify higher level outcomes or new strategies (Pretti-Frontczak et al., 2007; Wolery, 2003). Note that for children making good progress, progress monitoring identifies the need for the teacher to address high-level outcomes. For children not making progress, progress monitoring may indicate the need for alternative intervention approaches to achieve outcomes not being met with current strategies.

Whereas the IDEA has requirements addressing evaluation for eligibility determinations and assessment for program planning, it is silent on the use of ongoing assessment to monitor a child’s progress toward a given set of outcomes. The law requires periodic review and updating of the child’s plan, but it does not address how assessment tools are to be used in this process. The use of ongoing assessment for planning and progress monitoring, however, is considered one of the indicators of a quality program for all young children, including children with disabilities (Division for Early Childhood, 2007; National Association for the Education of Young Children and National Association of Early Childhood Specialists in State Departments of Education, 2003).

Large-Scale Assessment: Research, Evaluation, and Accountability

Studies have examined multiple aspects of the development of young children with disabilities and the factors influencing their development, such as parent interaction or the effectiveness of a particular intervention strategy or curriculum model. A substantial body of research addresses the development of young children with particular kinds of disabilities or delays, for example, visual impairments or autism, and much of that evidence is based on the administration of assessment tools that track children’s development (see, e.g., Hatton et al., 1997; Rodrigue, Morgan, and Geffken, 1991). Similarly, many studies have examined issues of intervention or program effectiveness for young children with special needs by looking at developmental gains on assessment measures (McLean and Cripe, 1997; Spiker and Hopmann, 1997).
The National Early Intervention Longitudinal Study and the Pre-Elementary Education Longitudinal Study are two national policy studies of IDEA services to young children with special needs that examined child outcomes and drew some of their findings from assessments (Hebbeler et al., 2007; Markowitz et al., 2006). Other national studies and evaluations, such as the Early Childhood Longitudinal Study-Kindergarten Cohort and the national evaluation of Early Head Start, have included children with special needs because they were included in the population of children from which the study sample was drawn (Hebbeler and Spiker, 2003).

The diversity of children with special needs, especially with regard to some who have limited response capabilities and lower overall functioning, is highly problematic when it comes to large-scale evaluations designed to look at the entire population of young children for research, evaluation, or accountability purposes. And the assessment of young children with special needs to address state or federal accountability requirements is a relatively recent phenomenon, either for programs specifically for children with special needs or for general early childhood programs in which they are served, such as Head Start or state-operated preschools (Division for Early Childhood, 2007; Harbin, Rous, and McLean, 2005; Hebbeler, Barton, and Mallik, 2008). Beginning in 2008, the U.S. Department of Education is requiring that all states provide data on progress made by young children during their time in IDEA-governed programs. States are employing a variety of approaches to obtain these data, including using a single assessment statewide, several online assessments, a summary process based on team decision making, and multiple sources of information that include formal assessment tools.

Much attention in the last 20 years has focused on making sure that children in special education are included in state K-12 accountability efforts, because previously they were not. The 1997 amendments to the IDEA require that children with disabilities be included in state and district assessment programs and provided with appropriate accommodations. The law also requires that states report their scores on these assessments in the same detail and with the same frequency as the scores of other children (Ysseldyke et al., 1998).
Principles of Assessment

Several aspects of the assessment of young children with disabilities for eligibility and program planning are codified in the IDEA as described above and may be addressed in state laws and regulations as well. In addition, several organizations, including the National Association for the Education of Young Children, the National Association of Early Childhood Specialists in State Departments of Education, and the National Association of School Psychologists, have developed position statements on the assessment of young children (National Association for the Education of Young Children and National Association of Early Childhood Specialists in State Departments of Education, 2003; National Association of School Psychologists, 2005). The principles in these documents apply to all children, including those with special needs. Indeed, some of the principles apply to using assessment to identify children in need of special services. The Division for Early Childhood (DEC) of the Council for Exceptional Children has developed a set of recommended practices specifically addressing the assessment of young children with special needs (Neisworth and Bagnato, 2005). The DEC also has developed a companion document to NAEYC’s position statement on curriculum, assessment, and evaluation that elaborates on these topics for children with disabilities (Division for Early Childhood, 2007).

A common theme across the professional organizations and echoed by many in the field is the importance of using multiple sources of information and never making a decision about a child based on a single assessment (Greenspan and Meisels, 1996; McCune et al., 1990; McLean, 2004; Meisels and Atkins-Burnett, 2000; Wolraich et al., 2005). This recommendation is especially important for children with special needs, whose performance and behavior across settings and situations can be even more variable than those of typically developing children.
A key principle of good assessment is that families of children with special needs should be included in the assessment process (Boone and Crais, 2002; Division for Early Childhood, 2007; Meisels and Atkins-Burnett, 2000; Meisels and Provence, 1989; Neisworth and Bagnato, 2005). Thinking of families as equal and contributing partners in the assessment has numerous implications for how an assessment process is to be carried out. Family members contribute to the assessment process by supporting the child during assessment, validating the findings suggested by other team members, identifying discrepancies between the child’s performance on a formal assessment and what a child usually does, reporting on typical patterns of behavior, and conducting the assessment with team members to ensure the best performance of the child (Division for Early Childhood, 2007; Woods and McCormick, 2002). The variability in the performance of children with special needs across situations requires incorporating information from family members to obtain an accurate picture of the child’s capabilities. Families are reliable reporters of information about their child’s performance, and the validity of the assessment is enhanced by including it (Suen et al., 1993, 1995).

Another principle applicable to all children but of special relevance to children with special needs is the importance of providing them with multiple opportunities to demonstrate their competencies. The setting for the assessment, the child’s relationship with the person conducting the assessment, the ability of the assessor to establish rapport, fatigue, hunger, interest level in the materials, and numerous other factors could result in a severe underestimate of the child’s capabilities (Division for Early Childhood, 2007; Meisels and Atkins-Burnett, 2000). Besides offering multiple assessment opportunities, tapping multiple sources of information (family members’ reports, observation of children in familiar settings) about the child’s functioning helps reduce the chance of underestimating their functioning (McLean, 2004).

Qualities of good early childhood assessment, identified by Neisworth and Bagnato (2005), are that it is useful for its chosen purpose; acceptable to both families and professionals; authentic in that the circumstances and people involved in the assessment are familiar to the child; based on collaboration between families and professionals; reflective of the convergence of multiple sources of information; accommodating of individual differences; sensitive to even small increments of change; and based on tools that have been validated for use with the population of children for whom the assessment is being used.
Five practices addressing the assessment of children with special needs and recommended by the Division for Early Childhood of the Council for Exceptional Children reflect these qualities (Neisworth and Bagnato, 2005):

1. Professionals and families collaborate in planning and implementing assessment.
2. Assessment is individualized and appropriate for the child and family.
3. Assessment provides useful information for intervention.
4. Professionals share information in respectful and useful ways.
5. Professionals meet legal and procedural requirements and meet recommended practices guidelines.

Assessment Challenges

Children with special needs are assessed in large numbers and by a varied array of practitioners, yet little information about actual assessment practices is available. It would be useful to know what tools are being used, how child behaviors are being judged, how eligibility decisions are being reached, to what extent children with special needs are included in accountability assessments, and so on.

The use of norm-referenced standardized assessments for children with special needs creates particular challenges. Standardized assessments require that items be administered the same way to all children, requiring them to show competence on demand, possibly in an unfamiliar setting and at the request of a stranger. The structure and requirements of traditional norm-referenced measures present numerous problems for the assessment of young children in general, but especially for young children with special needs (Bagnato, 2007; Macy, Bricker, and Squires, 2005; McLean, 2004; Meisels and Atkins-Burnett, 2000; Neisworth and Bagnato, 2004). In fact, Bagnato concluded that “conventional testing has no valid or justifiable role in early care and education” (Bagnato and Yeh-Ho, 2006, p. 618). A discussion of some of these problems follows.

One of the problems is based on the extent and number of response demands that the testing situation makes on the child.
Standardized testing often requires verbal fluency, expressive communication, fully functioning sensory systems, as well as comprehension of the assessment cues including the verbal and visual cues being given by the examiners (Bagnato, 2007; Division for Early Childhood, 2007; Meisels and Atkins-Burnett, 2000). Many young children with special needs are not capable of complying with all of the demands of the testing situation.

A national study of eligibility practices of over 250 preschool psychologists with over 7,000 children found that nearly 60 percent of the children would have been untestable if the psychologists had followed standardized procedures. Children could not respond as they were expected to because of lack of language, poor motor skills, poor social skills, and lack of attention and other self-control behaviors (Bagnato and Neisworth, 1995).

One of the basic principles of good assessments is that an assessment must have demonstrated validity for the purposes for which it is used (American Educational Research Association, American Psychological Association, and National Council on Measurement in Education, 1999). Norm-referenced measures are often used with young children to determine eligibility for IDEA services. As explained previously, state definitions for eligibility for early intervention services employ criteria (e.g., percent delay) that necessitate the use of norm-referenced measures. In 1987, a landmark paper examined the test manuals of 27 aptitude and achievement tests and found that publishers provided very little information on the use of the test with children with disabilities (Fuchs et al., 1987).

More recently, Bagnato and colleagues (Bagnato, McKeating-Esterle, and Bartolomasi, 2007; Bagnato et al., 2007a, 2007b) published syntheses of both published and unpublished research on testing and assessment methods for early intervention, with funding from the U.S. Department of Education, Office of Special Education Programs. They concluded that no research has been conducted to support the use of conventional tests for early intervention eligibility. Only three studies have been conducted to support the use of authentic assessment methods and clinical judgment methods for this purpose.
Bailey (2004) suggests that the factor structure used to develop age levels for developmental assessments may not be appropriate for children with developmental delays. He cites a study that found only three factors for children with severe developmental disabilities rather than the five factors reported in the manual (Snyder et al., 1993). Weak or imprecise measurement during eligibility determinations may lead to denial of access to services.

One possible way to mitigate some of the limitations of using norm-referenced assessments for eligibility determinations is through the use of clinical judgment, which, in those states that allow it, can be used instead of or in conjunction with formal assessments. Dunst and Hanby (2004) compared the percentage of children served in the 28 states and the District of Columbia that allow the use of informed clinical opinion with those that do not and found no differences in the percentage of children served, suggesting that professionals in the states that allow for informed clinical opinion may not take advantage of this eligibility determination practice.

Another practice problem associated with standardized norm-referenced assessments is that they do not provide information that is relevant for program planning because the items are chosen for their ability to discriminate among children. In other words, ideal items on norm-referenced tests are passed by half the children and failed by half the children in the norming group. Because norm-referenced tests lack treatment or instructional validity (Bailey, 2004; Botts, Losard, and Notari-Syverson, 2007; Neisworth and Bagnato, 2004), service providers need to give additional assessments to develop intervention plans. One study, which represents a possible new direction for eligibility assessment, examined the use of a curriculum-based measure for eligibility as an alternative to norm-referenced assessment (Macy, Bricker, and Squires, 2005). It found support for the potential of alternative forms of assessment for making eligibility decisions.

All of the problems with using norm-referenced assessment notwithstanding, at least professionals administering traditional tools to young children for diagnostic purposes have the option to select a particular instrument on the basis of the characteristics of the individual child to be tested and should be augmenting that information with information from other sources.
The exam- iner also can modify the assessment procedures to accommodate fatigue or lack of interest. Although such changes in administra- tion violate the standard administration procedures, they may be the only way to get usable information from the assessment (Bagnato and Neisworth, 1995). Often no such option for indi- vidualization exists when children with disabilities are assessed for research, evaluation, or accountability purposes—the other reasons why children with special needs would be administered standardized assessments.

For the aggregated data to be meaningful, all children must be administered the same assessment according to the same guidelines. The issue of aggregating data is somewhat less problematic for researchers or program evaluators studying a homogeneous subpopulation of children with special needs, such as young children with blindness, because the study designers may have the option to select a measure that has been developed and validated with the subpopulation of interest (assuming such measures exist). For large data collections encompassing the entire range of young children with disabilities, the challenges related to instrument selection and administration are substantial, as are the challenges of recruiting assessment administrators and interpreters with the full range of relevant knowledge and experience.

Designers of large-scale data collections may respond to the assessment challenges posed by the diversity of children with special needs by excluding them from either the entire study sample or from one or more of the assessments. Another approach is to include only those children with special needs deemed capable of participating in the general assessments and either exclude or administer an alternate assessment to those who cannot take part in the regular assessment. The Early Childhood Longitudinal Study-Kindergarten Cohort, for example, included all children with special needs, provided a set of accommodations for those who needed them, and included an alternate assessment for children who could not participate in the regular assessment (Hebbeler and Spiker, 2003).

Given that the data in large-scale studies will be aggregated across children and possibly disaggregated by subgroups, it is imperative that accurate conclusions be drawn about the performance of children with special needs. Even though there are no data on the validity of using standardized norm-referenced assessments with children with special needs for this purpose, national and statewide evaluation efforts, including Head Start's National Reporting System, have used such measures with this population for these purposes. Currently, an assessment system developed by the state of California contains the only assessment tools that have been developed explicitly for large-scale data collection with young children, including those with special needs. These observation-based tools are unique because they were designed from the beginning to ensure that young children with disabilities could be included in the data collection (see http://www.draccess.org for more information).

In addition to these general problems, we describe below several challenges of special relevance to the assessment of children with disabilities.

Construct-Irrelevant Skills and the Interrelatedness of Developmental Domains

For a young child to demonstrate competency on even a single item on an assessment requires a combination of skills, yet some of them may not be relevant to the construct being assessed. To the extent that items on an assessment require skills other than the construct being assessed (e.g., problem solving), construct-irrelevant variance exists in the scores. Some examples of this in assessments of young children with special needs are obvious. A child who cannot hear or who has no use of her arms will not be able to point to a picture of a cat when asked. The item requires hearing and pointing as well as knowledge of a cat, even though these are not the skills being tested. The child who cannot point will fail the item, regardless of what he or she knows about cats.

Other occurrences of construct-irrelevant variance may not be so obvious. All assessments that require children to follow and respond to the examiner's directions require some degree of language processing. Even though test developers attempt to address this by keeping instructions simple, all young children are imperfect language processors because they are still learning language. Many young children with special needs have impairments related to communication, meaning their capacity to process language is even less than the restricted capacity of a typical peer.
Unlike deafness, blindness, or a motor impairment, language processing problems may present no visible signs of impact on the assessment process.

Construct-irrelevant variance is a major problem for the assessment of young children because many assessments are organized and scored around domains of development. Domains are constructs created to describe areas of development. They do not exist independently in the child, and therefore measurement tools that assume independence of domains will have some degree of construct-irrelevant variance due to overlap across domains. Ironically, the impact of construct-irrelevant skills is greater for children with disabilities, because their development across domains may be less connected than it is for typically developing children. For example, completing a two-piece puzzle requires both cognitive and motor skills, skills that develop in tandem in typically developing children. The puzzle is challenging for the same-age child with limited motor skills, even though that child may have a very solid understanding of how the pieces fit together.

Functional Outcomes and Domain-Based Assessments

For many years the emphasis in working with young children with special needs has been on identifying and improving functional, rather than domain-based, outcomes. The concept of an appropriate outcome of intervention for a young child with disabilities has evolved over time. One approach used previously by service providers was to write outcomes drawn from domain-based developmental milestones (Bailey and Wolery, 1984). Two examples of milestones as outcomes are "Places round piece in a form board" or "Nests two then three cans." Although some lists of milestones can provide useful skills, milestones do not make good instructional targets for numerous reasons. They are not derived from a theory of development. Many were originally developed because of their ability to differentiate the performance of children of different ages on standardized tests. And the sequence of development for typically developing children may not represent the best sequence for children with disabilities.

A contrasting approach to outcome identification, which is now considered recommended practice, is to develop outcomes that are functional (McWilliam, 2004). Functional outcomes (a) are immediately useful, (b) enable a child to be more independent, (c) allow a child to learn new, more complex skills, (d) allow a child to function in a less restrictive environment, and (e) enable a child to be cared for more easily by the family and others (Wolery, 1989). An example of a functional outcome is "Natalie will be able to sit in her high chair, finger feed herself, and enjoy dinner with her family." Outcomes like this are important because they allow a child to participate more fully in a variety of community settings (Carta and Kong, 2007). Unlike a set of developmental milestones that may have limited utility to a child on a day-to-day basis, functional skills are usable across a variety of settings and situations with a variety of people and materials that are part of the child's daily environment (Bricker, Pretti-Frontczak, and McComas, 1998).

Functional outcomes are at odds with domain-based assessments because they recognize the natural interrelatedness across domains as essential to children's being able to accomplish meaningful tasks in their daily lives. A functional outcomes approach does not try to deconstruct children's knowledge and skills into types of items reflected in many domain-based assessment frameworks; the units of interest are the more complex behaviors that children must master to be able to function successfully in a variety of settings and situations. The International Classification of Functioning, Disability and Health: Children and Youth Version (ICF-CY) (World Health Organization, 2007) is based on an emerging international consensus that characterization of individuals' health and ability or disability should be grounded in functions, activities, and participation, and it provides methods for characterizing these in children.

The emphasis in many assessment tools on discrete skills and their organization into domains can operate as a barrier to recommended practice for practitioners, who are to use the results in partnership with families to identify the child's areas of need and plan interventions addressing meaningful functioning.

Universal Design and Accommodations

Universal design is a relatively new phenomenon that has direct application to assessment design for all children, especially young children with special needs. Ideally, all assessments should be designed in accord with principles of universal design, thereby minimizing the need for accommodations. Universal design has its origins in architectural efforts to design physical environments to be accessible to all. According to the Center for Universal Design (1997), universal design is "the design of products and environments to be usable by all people, to the greatest extent possible, without the need for adaptation or specialized design." Universal design is reflected in the community in sidewalks that have curb cuts, allowing people with wheelchairs to cross streets.

The goal in applying principles of universal design to assessments is to develop assessments that allow for the widest range of participation and allow for valid inferences about performance (Thompson and Thurlow, 2002). Applying the principles of universal design to the development of assessments for accountability for elementary and secondary school-age children, Thompson and Thurlow identified seven elements of universally designed assessments (Table 8-2). Some of the principles, such as maximum readability and maximum legibility, are primarily applicable to assessments in which the child will be reading passages of text, but most of these principles can be applied to early childhood assessment design.

A principle of special relevance for young children is the need for precisely defined constructs. Just as physical environments are to be designed to remove all types of barriers to access and use, assessments are to be designed so that cognitive, sensory, emotional, and physical barriers that are not related to the construct being tested are removed (Thompson, Johnstone, and Thurlow, 2002), which relates to the previous discussion on construct-irrelevant skills. Application of universal design principles is intended to minimize construct-irrelevant variance. Universal design principles are especially relevant for standardized assessments but also apply to criterion-based assessments. For example, objectives for children can be described with regard to "communication" rather than spoken language and "mobility" rather than walking. Many of the assessment tools in use today with young children predate the concept of universal design and thus were not developed to reflect these principles (California's Desired Results System being a notable exception).

Even with the application of universal design principles, the need may remain to develop accommodations to allow some children with special needs to be assessed with a particular instrument and for their scores to accurately reflect their capabilities.

An accommodation is never intended to modify the construct being tested. Accommodations can include modifications in presentation, in response format, in timing, and in setting. They are generally associated with standardized testing, with its stringent administration requirements. Criterion-based measures, which tend to be more observation-based, provide children with many and varied ways to demonstrate competence as part of the assessment procedures, an approach that reduces but may not eliminate the need for accommodations.

TABLE 8-2 Elements of Universally Designed Assessments

Element: Explanation
Inclusive assessment population: Tests designed for state, district, or school accountability must include every student except those in the alternate assessment, and this is reflected in assessment design and field testing procedures.
Precisely defined constructs: The specific constructs tested must be clearly defined so that all construct-irrelevant cognitive, sensory, emotional, and physical barriers can be removed.
Accessible, nonbiased items: Accessibility is built into items from the beginning, and bias review procedures ensure that quality is retained in all items.
Amenable to accommodations: The test design facilitates the use of needed accommodations (e.g., all items can be Brailled).
Simple, clear, and intuitive instructions and procedures: All instructions and procedures are simple, clear, and presented in understandable language.
Maximum readability and comprehensibility: A variety of readability and plain language guidelines are followed (e.g., sentence length and number of difficult words are kept to a minimum) to produce readable and comprehensible text.
Maximum legibility: Characteristics that ensure easy decipherability are applied to text, to tables, figures, and illustrations, and to response formats.

SOURCE: Thompson and Thurlow (2002).

An extensive body of literature has developed in the last 20 years on the use of accommodations of various kinds with various subgroups of school-age children with disabilities, as states moved to include children with disabilities in statewide accountability testing programs (see http://www2.cehd.umn.edu/NCEO/accommodations). There is no corresponding literature for young children, probably because the process of building a system of ongoing large-scale assessment of young children for accountability is only beginning in many states (National Early Childhood Accountability Task Force, 2007), and it is the implementation of large-scale data collection that precipitates the need for accommodations.

Other Assessment Characteristics

Individual assessment tools differ with regard to other features that have implications for their appropriateness for some children with special needs. The tool must have a low enough floor to capture the functioning of children who are at a level that is far below their age peers. Not having enough items low enough for children with severe disabilities can be a problem on a norm-referenced or curriculum-referenced measure. Similarly, the assessment must have sufficient sensitivity to capture small increments of growth for children who will make progress at far slower rates than their peers (Meisels and Atkins-Burnett, 2000). Identifying a tool that has a sufficiently low floor, provides adequate sensitivity, and covers the target age range will be challenging for any large-scale assessment that includes young children with special needs. An assessment developed to be used with 3- through 5-year-olds that includes items only appropriate to that age span will not adequately capture the growth of a 3-year-old who begins the year with the skills of a 2-year-old and finishes with those of a 3-year-old.

One last consideration related to assessing young children with special needs is the extent to which the test's assumptions about how learning and development occur in young children are congruent with how development occurs in the child being assessed. Caution is needed in using assessments with children with special needs that were developed for a typically developing population, and in which children with special needs were not included in the design work or the norming sample (Bailey, 2004).
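The floor problem described in this section (a 3-year-old who begins the year with the skills of a 2-year-old) can be illustrated with a small arithmetic sketch. The model here is an assumption made purely for illustration and is not drawn from any instrument discussed in the chapter: a child's functioning is summarized as a single developmental age in months, and a hypothetical instrument simply cannot register anything below its lowest item or above its highest. The function names and the 36-to-71-month range are likewise hypothetical.

```python
def observed_score(true_age_months, floor_months, ceiling_months):
    """Clamp a child's developmental age to the range the instrument covers.

    Simplifying assumption: skills below the floor all look the same
    to the instrument, as do skills above the ceiling.
    """
    return max(floor_months, min(true_age_months, ceiling_months))


def measured_growth(start_months, end_months, floor_months, ceiling_months):
    """Growth the instrument can register between two assessment points."""
    start = observed_score(start_months, floor_months, ceiling_months)
    end = observed_score(end_months, floor_months, ceiling_months)
    return end - start


# Hypothetical instrument with items only for ages 3 through 5 (36 to 71 months).
FLOOR, CEILING = 36, 71

# The child begins the year functioning at 24 months and ends at 36 months.
true_growth = 36 - 24
registered = measured_growth(24, 36, FLOOR, CEILING)
print(f"true growth: {true_growth} months, registered growth: {registered} months")
```

Under these assumptions the child's 12 months of real progress register as zero, because both assessment points sit at the instrument's floor. Lowering the floor (for example, to 12 months) lets the same growth show up in full, which is the property the text asks of a tool used with children functioning far below their age peers.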

Conclusion

The nearly 1 million young children with special needs are regularly being assessed around the country for different purposes. Although a variety of assessment tools are being used for these purposes, many have not been validated for use with these children. Much more information is needed about assessments and children with special needs, such as what tools are being used by what kind of professionals to make what kind of decisions.

Assessment for eligibility determines whether a young child will have access to services provided under the IDEA. It is unknown to what extent these critical decisions are being made consistent with recommended assessment practices and whether poor assessment practices are leading to inappropriate denial of service.

The increasing call for accountability for programs serving young children, including those with special needs, means that even more assessment will be occurring in the future. Yet the assessment tools available are often insufficiently vetted for use as accountability instruments, and they are difficult to use in standardized ways if children have special needs, and they focus inappropriately on discrete skills rather than functional capacity in daily life. Until more information about assessment use is available and better measures are developed, extreme caution is critical in reaching conclusions about the status and progress of young children with special needs. The potential negative consequences of poor measurement in the newest area of assessment, accountability, are especially serious. Concluding that programs serving young children with special needs are not effective based on flawed assessment data could lead to denying the next generation of children and families the interventions they need. Conversely, good assessment practices can be the key to improving the full range of services for young children with special needs: screening, identification, intervention services, and instruction. Good assessment practices will require investing in new assessment tools and creating systems that ensure practitioners are using the tools in accordance with the well-articulated set of professional standards and recommendations that already exist.
