5 Student Assessment

This chapter addresses the issue of assessing the language proficiency and subject matter knowledge and skills of English-language learners.1

State Of Knowledge

Assessment plays a central role in the education of English-language learners and bilingual children. Teachers generally use assessments to monitor language development in students' first or second language and track the quality of their day-to-day subject matter learning. In addition, assessments are used to place students in special programs and to provide information used for accountability and policy analysis purposes. The research issues related to these roles have much in common.

Several uses of assessment at the classroom and school levels are unique to English-language learners and bilingual children, while others also apply to students generally. Uses unique to English-language learners and bilingual children include the following:

• Identification of children whose English proficiency is limited

1The standards for assessing reading and writing developed by the International Reading Association and the National Council of Teachers of English, as well as those currently in development by Teachers of English to Speakers of Other Languages for assessing English proficiency, are consistent with and supportive of the model of assessment emerging from the review in this chapter.



• Determination of eligibility for placement in specific language programs (e.g., bilingual education or English as a second language [ESL])

• Monitoring of progress in and readiness to exit from special language service programs

Uses of assessment that extend beyond English-language learners include the following:

• Placement in categorically funded education programs, such as special education, gifted and talented, and Title I programs

• Placement in remedial or advanced academic course work

• Monitoring of achievement in compliance with school district and/or state-level assessment programs

• Certification for high school graduation and determination of academic mastery at graduation

In addition, the federal government sponsors a variety of assessments, such as the National Assessment of Educational Progress, to measure the performance and progress of U.S. students. Additional discussion of the National Assessment of Educational Progress and other large-scale assessments in relation to English-language learners is included in Chapter 9.

The remainder of this section begins by looking at issues of validity and reliability associated with student assessment. The next two subsections review uses of assessment that are unique to English-language learners and issues involved in assessing language proficiency. This is followed by two subsections that examine uses of assessment that extend beyond English-language learners and those associated with the assessment of subject matter knowledge. One additional set of assessment issues is then explored: those associated with assessing special populations, including very young second-language learners and English-language learners with disabilities. The chapter ends with a discussion of standards-based reform and its implications for the design and conduct of student assessments.

Validity and Reliability Issues

It is essential that those using any assessment impacting children's education strive to meet standards of validity and reliability (American Educational Research Association, American Psychological Association, and National Council on Measurement in Education, 1985). Validity concerns whether the inferences drawn from assessment outcomes are appropriate to the purposes of the assessment. It encompasses use of an assessment to measure current achievement and ability relative to specific performance criteria, as well as the potential for future achievement, and to investigate the underlying competencies that theory indicates should be tapped by an assessment.

Reliability concerns the accuracy of assessment outcomes in light of the variations in those outcomes that are due to factors irrelevant to what the assessment was intended to measure. Such factors might include characteristics of the individual, the fact that the assessment represents only a sample of a larger universe of assessment items, and inconsistency of the scoring of performance on an assessment (such as a constructed response test) from scorer to scorer and across an individual's scoring of the same assessment. The issue of reliability is made more complex because these factors may interact in ways that are not readily measured for their impact on performance (Cronbach et al., 1995).

The validity and reliability of assessments can be investigated using a wide range of psychometric and statistical procedures, as well as experimental and qualitative studies of assessment performance.

Garcia and Pearson (1994:343-349) examine assessment and diversity across a wide range of subject matters and test types. They highlight potential problems for English-language learners that result from the "mainstream bias" of formal testing, including a norming bias (small numbers of particular minorities included in probability samples, increasing the likelihood that minority group samples are unrepresentative), content bias (test content and procedures reflecting the dominant culture's standards of language function and shared knowledge and behavior), and linguistic and cultural biases (factors that adversely affect the formal test performance of students from diverse linguistic and cultural backgrounds, including timed testing, difficulty with English vocabulary, and the near impossibility of determining what bilingual students know in their two languages).
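
One of the reliability threats named above is that an assessment's items are only a sample of a larger universe of possible items. A common internal-consistency index for that source of error is Cronbach's alpha, which this chapter does not itself discuss. The following is a minimal sketch under that assumption; the function name and the item scores are hypothetical and are not drawn from any study cited here.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Estimate internal consistency from a students-by-items score table.

    scores: list of rows, one row per student, one numeric score per item.
    (Hypothetical helper for illustration only.)
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")

    # Variance of each item across students.
    item_columns = list(zip(*scores))
    item_variance_sum = sum(pvariance(column) for column in item_columns)

    # Variance of each student's total score.
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (n_items / (n_items - 1)) * (1 - item_variance_sum / total_variance)


# Made-up right/wrong (1/0) scores for four students on three items.
example_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(cronbach_alpha(example_scores), 2))  # 0.75
```

Alpha addresses only the item-sampling source of error. It says nothing about scorer inconsistency or about whether the items are valid for English-language learners, which are separate questions raised in this section.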

The ensuing discussion of assessment as applied to English-language learners and bilingual children inherently involves questions about the validity and reliability of assessments and their appropriateness for these children. It is also important to note that assessment practices have social and educational consequences that should be considered in an ongoing program of validity research (Messick, 1988).

Assessment Purposes Unique to English-Language Learners

There are many purposes for assessments of language proficiency, including placing students in special services, monitoring their progress, predicting educational outcomes, and exiting students from special language services. According to four recent surveys, states and local districts use a variety of methods to determine which language-minority students have limited English proficiency, to place these students in special language-related programs, and to monitor the progress of the students in such programs (August and Lara, 1996; Cheung et al., 1994; Fleischman and Hopstock, 1993; Rivera, 1995). These methods include home language surveys, registration and enrollment information, observations, interviews, referrals, grades, and classroom performance and testing (Cheung et al., 1994). However, administration of language proficiency tests in English is the most common method (Fleischman and Hopstock, 1993). Fleischman and Hopstock found that 83 percent of school districts with English-language learners used English-language proficiency testing, either alone or in combination with other techniques, to determine which language-minority students were of limited English proficiency. Similarly, such tests were used by 64 percent of school districts for assigning English-language learners to specific instructional services in schools and by 74 percent of school districts for reclassifying students once they have developed English proficiency.

Achievement tests in English are also frequently used by school districts and schools to help identify English-language learners, assign them to school programs, and reclassify them when English proficient (Fleischman and Hopstock, 1993). Specifically, 52 percent of school districts and schools across the country use such tests to help identify English-language learners, 40 percent use them to help assign students to specific instructional programs within a school, and over 70 percent use them for reclassification purposes (as reported in Zehler et al., 1994).

There is a great deal of variability across school districts in the way assessments are used for the above purposes. This is because many states, while providing guidance to the districts on assessment procedures for students with limited English proficiency, allow them considerable flexibility in choosing assessment methods, assessment instruments (usually from a menu of state-approved instruments), and cutoff scores for these instruments (August and Lara, 1996).2

Issues in Assessing Language Proficiency

Regardless of the modality of testing, many existing English-language proficiency instruments emphasize measurement of a limited range of grammatical and structural skills.
Test items are frequently designed to assess a specific discrete language skill, though some tests and test items involve assessment of a number of discrete skills simultaneously. In part, emphasis on assessment of grammatical and structural control of a language is a legacy from first-language acquisition studies. First-language acquisition research was dominated, especially in the 1970s, by arguments between empiricists and nativists who used morphology and syntax as the primary battleground for framing our scientific understanding of language acquisition (Bialystok and Hakuta, 1994).

2Of the 25 states that have assessment requirements for determining which language-minority students are of limited English proficiency, 22 specify English proficiency tests. Of these 22 states, 8 also specify achievement tests, and 3 specify English proficiency tests and below-average performance based on grades or classwork. When assessment is used for program placement, similar procedures are used. In the other states, it is up to individual districts to set these policies. In some states, native-language proficiency assessments are required (Arizona, Hawaii, Utah, California, Texas, New Jersey) or recommended. The only information regarding methods for reclassifying students from language assistance programs (Cheung and Soloman, 1991) indicates that language tests are the most frequently used method (required in 36 percent of states, recommended in 30 percent), followed by content area tests (required in 34 percent of states, recommended in 11 percent). Other methods recommended for determining program exit include observations and interviews. About one-third of states reported having no state requirement regarding exit criteria.

During the 1970s and 1980s, new models of bilingual language competence emerged from the fields of linguistic pragmatics, interactional sociolinguistics, and cognitive studies of discourse processing. These perspectives, which were better attuned to the language demands faced by language-minority students in everyday settings (Rivera, 1984), examined how children acquire competence in using language to accomplish purposeful functions arising in social interaction (e.g., Wong Fillmore, 1982) and how language practices are tied to ongoing participation in classroom activities, referred to as authentic assessment (e.g., Gutierrez, 1995).

As a consequence of these new models of language competence, Valdez Pierce and O'Malley (1992) recommend assessment procedures for monitoring the language development of language-minority students in the upper elementary and middle grades that reflect tasks typical of the classroom or real-life settings. As examples, they cite oral interviews, story retellings, simulations/situations, directed dialogues, incomplete story/topic prompts, picture cues, teacher observation checklists, and student self-evaluations. They also describe a portfolio assessment framework for monitoring the development of English-language learners. Authentic assessments are both more difficult to administer and less objectively scored than traditional assessments, but they do reflect the important view that language proficiency is multifaceted and varies according to the task demands and content area domain (see Chapter 2).
Widespread implementation of practical assessments based on this viewpoint has been slow to emerge and is an important area for further research.

One promising approach has been developed by Royer and Carlo (1991). They report on the utility of a sentence verification technique test (which basically involves reading or listening to a passage and then marking sentences as to whether they correctly reflect the information in the passage). The authors suggest the passages can be developed locally, based on curricular material familiar to the student. This form of assessment is relatively easy to develop in any language, and the reliability and validity data appear strong.

However, in pursuing new assessments of language proficiency for English-language learners and bilingual children, we should not ignore existing language assessment methods that focus on discrete language skills, even though there are differing beliefs about which components are most critical. For example, evidence exists throughout the cognitive and psycholinguistic research literature that routinization of basic language recognition and production skills is associated with greater fluency in language use at the level of spoken and written discourse (McLaughlin, 1984). Thus the assessment of these skills is a legitimate endeavor, though it is important to recognize that such assessments may have good predictive ability because they are tapping an ability correlated with a variety of language proficiencies, not because they constitute language proficiency.

In summary, the major purpose of English-language proficiency testing has been to determine placement in special language programs, monitor students' progress while in these programs, and decide when students should be exited from these programs. Most measures used not only have been characterized by the measurement of decontextualized skills, but also have set fairly low standards for language proficiency. Ultimately, English-language learners should be held to high standards for both English language and literacy, and should transition from special language measures to full participation in regularly administered assessments of English-language arts.

Assessment Purposes That Extend Beyond English-Language Learners

The assessment policies discussed in this section are related to determining eligibility for federal assistance and monitoring student progress at the state and district levels.

Title I is by far the largest federal program serving English-language learners. Yet past practice in using tests to assess eligibility for such programs raises a number of issues. For example, in documenting district policies, Strang and Carlson (1991) found that many English-language learners were not being served through Title I because districts required students to be English proficient before they could be served. However, those English-language learners who met the English proficiency requirements also scored above the cut-off on English achievement tests used for Title I selection. New Title I assessment policy is currently being discussed because of changes in the law (see Kober and Feuer, 1996).
Those changes provide for the participation of all students, including English-language learners, in assessments to determine whether they are meeting performance standards and for reasonable adaptations of these assessments to accomplish this end. According to the law, English-language learners are to be included in assessments to the extent practicable, in the language and form most likely to yield accurate and reliable information on what they know and can do, including their mastery of skills in target subject matter areas, not just English. The law now further requires that each state plan identify the languages other than English that are present in the participating student population and indicate the languages for which yearly student assessments are not available and are needed. States are required to make every effort to develop such assessments and may request assistance from the Secretary of the Department of Education if linguistically accessible assessment measures are needed (see August et al., 1995).

Assessment is particularly important for purposes of selecting eligible students for services in Title I targeted assistance programs, whereby Title I services are made available to a subset of the students "on the basis of multiple, educationally related, objective criteria established by the local educational agency and supplemented by the school" (Section 1115). The current policy guidance provided by the U.S. Department of Education does not elaborate on how this might be accomplished for English-language learners, and leaves it up to local districts to select those eligible students "most in need of special services." In the absence of adaptations to assessments, including assessments conducted in the native language, as well as methods for determining how English-language learners compare with other students on educational needs, a large proportion of English-language learners may not be served through Title I.

Surveys of state-wide assessment systems (August and Lara, 1996; Rivera, 1995) show that states use a variety of measures to assess student performance, including performance-based assessments and standardized achievement tests, and that states are in various stages of incorporating English-language learners into these assessments. August and Lara (1996) found that only 5 states require English-language learners to take state-wide assessments required of other students;3 36 states exempt English-language learners from such assessments, although 22 of those states require these students to take the assessments after a given period of time (usually 1-3 years). Some states base their assessment decision on the proficiency level of their English-language learners; of these, a few leave it up to local districts to determine which students have enough English proficiency to participate in the state-wide assessments. Finally, some states use multiple criteria to excuse students from state-wide assessments, including number of years in English-speaking classrooms, language proficiency scores, school achievement, and teacher judgment. States use a variety of approaches to assess students that have been exempted from the state-wide assessments.
Hafner (1995) reports that 55 percent of states allow modifications in the administration of at least one of their assessments to incorporate English-language learners. The most common modifications are extra time (20 states), small-group administration (18 states), flexible scheduling (16 states), simplification of directions (14 states), use of dictionaries (13 states), and reading of questions aloud in English (12 states). Other accommodations include assessments in languages other than English, availability of both English and non-English versions of the same assessment items, division of assessments into shorter parts, and administration of the assessment by a person familiar with the children's primary language and culture (Rivera, 1995).

3In 3 of these states, however, English-language learners may be exempted under certain conditions.

Some states also provide guidance to scorers on evaluating the work of English-language learners. Hafner (1995) reports that 10 percent of states give special training on evaluating the work of English-language learners, and 10 percent give directions in their manuals. Some training entails the development of scoring rubrics and procedures for constructed response items that are sensitive to the language and cultural characteristics of English-language learners. The Council of Chief State School Officers recently developed a Scorer's Training Manual (Wong Fillmore and Lara, 1996) to be used by states and local education agencies to aid in the scoring of English-language learners' answers to open-ended mathematics questions. In collaboration with the National Center for Education Statistics and the Educational Testing Service, this manual will be piloted using the work of English-language learners who participated in the 1996 National Assessment of Educational Progress math assessment to see how well it prepares scorers to assess the work of those students accurately.

Clearly, classroom teachers also assess students to determine how well they are grasping coursework and to inform instructional practice (see Chapter 7). Innovations at the classroom level include an assessment process that is multiple referenced and incorporates information about the students in a variety of contexts obtained from a variety of sources through a variety of procedures (Genesee and Hamayan, 1994). Navarette et al. (1990) describe innovative assessment procedures that include unstructured techniques (e.g., writing samples, homework, logs, games, debates, story telling) and structured techniques (e.g., criterion-referenced tests, cloze tests, structured interviews), as well as a combination of the two (portfolios). In addition, students are assessed in their native language to better determine their academic achievement and ensure appropriate coursework (Genesee and Hamayan, 1994). Information on student background characteristics such as literacy in the home, parents' educational backgrounds, and previous educational experiences is collected and provides essential information that helps put the assessment results in context.

Issues in Assessing Subject Matter Knowledge

A central issue in assessing subject matter knowledge is determining what knowledge is intended for assessment. This issue is discussed in detail in the later section on standards-based reform. In the discussion in this section, we assume that the developers of an assessment have decided what to assess and examine the difficulties involved in incorporating English-language learners and bilingual children into assessments intended for their English-proficient peers. As noted in the Standards for Educational and Psychological Testing, every assessment is an assessment of language (American Educational Research Association, American Psychological Association, and National Council on Measurement in Education, 1985). This is even more so given the advent of performance assessments requiring extensive comprehension and production of language.4

4For example, the performance description for mathematical communication, one of seven mathematical performance areas for elementary school children, requires the student to "use appropriate mathematical terms, vocabulary, and language based on prior conceptual work; show ideas in a variety of ways including words, numbers, symbols, pictures, charts, graphs, tables, diagrams, and models; explain clearly and logically solutions to problems, and support solutions with evidence, in both oral and written form; consider purpose and audience when communicating; and comprehend mathematics from reading assignments and from other sources" (New Standards, 1995). Quite clearly, this assessment of mathematical skills is also an assessment of language proficiency.

The English-language proficiency levels of students affect their performance on subject area assessments given in English. For example, Garcia (1991) found that the English reading test performance of Spanish-speaking Hispanic students was adversely affected by their unfamiliarity with vocabulary terms used in the test questions and answer choices. In fact, interview data demonstrate that the presence of unknown vocabulary in the questions and answer choices was the major linguistic factor that adversely affected the Hispanic children's reading performance.5 Alderman (1981) found that the relationship between test scores on Prueba de Aptitud Academica (a Spanish version of the SAT developed for use with native Spanish speakers) and English SAT scores increased with higher English proficiency test scores for native Spanish-speaking high school students. This study indicated that aptitude can be seriously underestimated if the test taker is not proficient in the language in which the test is being given.

Given that the English proficiency level of students affects their performance on assessments administered in English and that recent assessments require high levels of English proficiency, research is needed to develop assessments and assessment procedures appropriate for English-language learners. One strategy under active investigation is the use of native-language assessments. Approximately 75 percent of English-language learners come from Spanish-language backgrounds. For some of these students, it is realistic to develop native-language assessments. However, in doing so, it is desirable to keep in mind the difficulties involved in developing native-language assessments that are equivalent to the English versions.
Such difficulties include problems of regional and dialect differences, nonequivalence of vocabulary difficulty between two languages, problems of incomplete language development and lack of literacy development in students' primary languages, and the extreme difficulty of defining a "bilingual" equating sample (each new definition of a bilingual sample will demand a new statistical equating). Minimally, back-translation should be done to determine equivalent meaning, and ideally, psychometric validation should be undertaken as well.6

The challenge of using native-language assessments or bilingual versions is illustrated by the results of research on developing and administering mathematics test items only in Spanish or in side-by-side Spanish-English format as part of the National Assessment of Educational Progress field test of mathematics items (Anderson et al., 1996). Spanish-language items were translations of English-version items. This research found substantial psychometric discrepancies in students' performance on the same test items across both languages, leading to the conclusion that the Spanish and English versions of many test items were not measuring the same underlying mathematical knowledge. This result may be attributable to a lack of equivalence between original and translated versions of test items and needs further investigation.

5Garcia (1991) also found that the Hispanic students' English reading test performance was adversely affected by their limited prior knowledge of certain test topics, their poor performance on the implicit questions (which required use of background knowledge), and their tendency to interpret the test literally when determining their answers. These findings have implications for the schooling of English-language learners (see Chapters 3 and 7).

6Hambleton and Kanjee (1994) recommend validating the translated version with empirical evidence using item response theory.

Another strategy to make assessments both comprehensible and conceptually appropriate for English-language learners might entail decreasing the English-language load through actual modification of the items or instructions. This would not be a straightforward task, however. While some experts recommend reducing nonessential details and simplifying grammatical structures (Short, 1991), others claim that simplifying the surface linguistic features will not necessarily make the text easier to understand (Saville-Troike, 1991). When Abedi et al. (1995) reduced the linguistic complexity of National Assessment of Educational Progress mathematics test items in English, they reported only a modest and statistically unreliable effect in favor of the modified items for students at lower levels of English proficiency.

Other strategies for incorporating English-language learners into assessments include those mentioned earlier, such as extra time, small-group administration, flexible scheduling, reading of directions aloud, use of dictionaries, and administration of the assessment by a person familiar with the children's primary language and culture (Rivera, 1995).
Additional possibilities include making test instructions more explicit and allowing English-language learners to display their knowledge using alternative forms of representation (e.g., showing math operations on numbers and knowledge of graphing in problem solving). Almost no research has been conducted to determine the effectiveness of these techniques, however.

Another issue in assessment of subject matter knowledge for English-language learners is the errors that result from inaccurate and inconsistent scoring of open-ended or performance-based measures. There is evidence that scorers may pay attention to linguistic features of performance unrelated to the content of the assessment. Thus, scorers may inaccurately assign low scores for performance in which English expression (either oral or written) is weak. This obviously confounds the accuracy of the score enormously.7 Absent training, different scorers probably will rate the same student work very differently.

7Interestingly, Lindholm (1994) found highly significant and positive correlations between standardized scores of Spanish reading achievement and teacher-rated reading rubric scores, as well as between the standardized reading scores and students' ratings of their reading competence, for native English-speaking and native Spanish-speaking students enrolled in a bilingual immersion program.
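
The scorer-to-scorer disagreement described above can be quantified once two scorers have independently rated the same set of responses. One widely used index is Cohen's kappa, which compares the observed rate of exact agreement with the rate expected by chance from each scorer's own distribution of scores. The sketch below is an illustration only, not a procedure taken from the studies cited; the function name and the rubric scores are hypothetical.

```python
from collections import Counter


def cohen_kappa(scores_a, scores_b):
    """Chance-corrected exact agreement between two scorers.

    scores_a, scores_b: rubric scores given by scorer A and scorer B
    to the same responses, in the same order. (Hypothetical helper.)
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("both scorers must rate the same non-empty set")

    n = len(scores_a)

    # Proportion of responses on which the two scorers gave the same score.
    observed = sum(a == b for a, b in zip(scores_a, scores_b)) / n

    # Agreement expected by chance, from each scorer's score frequencies.
    counts_a = Counter(scores_a)
    counts_b = Counter(scores_b)
    expected = sum(counts_a[s] * counts_b[s] for s in counts_a) / (n * n)

    if expected == 1:
        raise ValueError("both scorers used a single identical score; kappa is undefined")

    return (observed - expected) / (1 - expected)


# Made-up 1-4 rubric scores from two scorers on six open-ended responses.
scorer_a = [3, 2, 4, 1, 3, 2]
scorer_b = [3, 2, 3, 1, 2, 2]
print(round(cohen_kappa(scorer_a, scorer_b), 2))  # 0.52
```

Kappa treats every disagreement as equally serious and does not show why scorers disagree. Checking whether low scores track weak English expression rather than content, the concern raised in this section, would need additional evidence such as a review of the scored work itself.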

Page 123

Issues in Assessing Special Populations

Very Young Second-Language Learners

Assessing young children's development in meaningful ways is already surrounded by a great deal of controversy and concern among the preschool education community. As Meisels (1994:210-211) states:

…measurement in preschool is marked by recurrent practical problems of formulation and administration. … Many measurement techniques used with older children are inappropriate for use with children below school age, or even below grade 3. For example, the following methods are extremely unlikely to yield valid information about normative trends in development: paper and pencil questionnaires, lengthy interviews, abstract questions, fatiguing assessment protocols, extremely novel situations or demands, objectively-scored, multiple choice tests, isolated sources of data. None of these methods are consistent with principles of developmentally appropriate assessment.

If none of these practices are appropriate for young children in general, their inappropriateness for children from different linguistic and/or cultural backgrounds can certainly be taken as a given. For these reasons, McLaughlin et al. (1995:7-8) have called for a special set of guidelines to be used in assessing bilingual preschool children. These guidelines include the following:

• Developmental and cultural appropriateness
• Awareness of the child's linguistic background
• An approach that allows children to demonstrate what they can do
• Involvement of parents and family members, teachers, and staff, as well as the child

Using these guidelines, McLaughlin et al. recommend what they call "instructionally embedded assessment," in which teachers make a plan about what, when, and how to assess a child; collect information from a variety of sources, including observations, prompted responses, classroom products, and conversations with family members; develop a portfolio; write narrative summaries; meet with family and staff; and finally, use the information to inform curriculum development. This is a recursive process that begins again once it has been completed for any individual child. An assessment system of this sort is, of course, extremely time-consuming and necessitates reform in several areas, including use of time, professional staff development, accountability, and relationships with parents. It may, however, be the only meaningful way teachers can assess young second-language learners.

Page 128

school, sufficient to achieve the goal of having all children, particularly the economically disadvantaged and English-language learners, meet the state's proficient and advanced levels of performance. To determine whether English-language learners are meeting these standards, assessment results will have to be disaggregated by English proficiency status. Some states, such as Florida, Hawaii, Louisiana, Maine, Ohio, and Washington, report disaggregating data by English proficiency status (August and Lara, 1996). However, research is needed to determine how best to accomplish this in statistically sound ways, especially in light of alternative assessment procedures used with English-language learners.

Because of the difficulties in assessing English-language learners, it may be important to assess their access to necessary resources and conditions, such as adequate and appropriate instruction. However, defining and assessing these conditions is a very difficult task. Although there has been substantial work in defining some conditions, such as content coverage and time for mainstream students (Carroll, 1958; Leinhardt, 1978), the research base for defining the most important and effective resources and conditions for English-language learners is very weak (see Chapter 7). However, many English-language learners find themselves in poor schools and do not have access to the basics of education necessary for success in school. A good start would be to define and assess these essential resources (e.g., textbooks, course offerings, accessibility of information) while continuing research into other aspects of school life, such as effective school-wide and classroom attributes.
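As a small illustration of what disaggregating results by English proficiency status involves, the following sketch computes the percentage of students at or above a cut score within each status group. It is a minimal example under stated assumptions: the records, group labels, cut score, and function name are all hypothetical, and the sketch does not address the statistical soundness questions raised above (for example, small group sizes or results from alternative assessment procedures).

```python
from collections import defaultdict


def percent_proficient_by_status(records, cut_score):
    """Return {status: percent of students scoring at or above cut_score}."""
    totals = defaultdict(int)
    proficient = defaultdict(int)
    for status, score in records:
        totals[status] += 1
        if score >= cut_score:
            proficient[status] += 1
    return {status: 100.0 * proficient[status] / totals[status] for status in totals}


# Hypothetical (status, scale score) pairs for one school.
records = [
    ("English-language learner", 61),
    ("English-language learner", 48),
    ("English-language learner", 72),
    ("English-language learner", 55),
    ("English proficient", 80),
    ("English proficient", 66),
    ("English proficient", 74),
    ("English proficient", 58),
]
summary = percent_proficient_by_status(records, cut_score=60)
for status, percent in summary.items():
    print(f"{status}: {percent:.1f}% at or above the cut score")
```

Reporting the number of students in each group alongside each percentage would help readers judge how much weight a given figure can bear.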
In terms of improving opportunities to learn for English-language learners, another strategy would be to encourage the development and evaluation of methods to help school staff monitor progress in improving schooling through systematic attempts to compare their school's performance against certain quality indicators.9 This notion is further elaborated in Chapter 7.

Research Needs

A relatively small number of student assessment research needs stand out as candidates for highest priority given our existing knowledge base. To address these needs, there must be coordination with the research findings on the linguistic and cognitive development of children (see Chapters 2 and 3).

Issues in Assessing Language Proficiency

5-1. Research is needed on how assessments of children's language proficiency in their primary language and English can be improved so they are consistent with research findings on first- and second-language acquisition and literacy development.

9California, for example, has a Program Quality Review System that relies on peer review. Additional benchmarks could include school-wide and classroom factors that are known to improve the performance of English-language learners.

Page 129

Existing English-language proficiency instruments emphasize measurement of a limited range of frequently discrete language skills, such as grammar and syntax. Assessments of language proficiency need to be broadened to reflect findings from research in such fields as linguistic pragmatics, interactional sociolinguistics, and cognitive studies of discourse processing that better reflect the language demands placed on language-minority children in everyday contexts (although existing methods for assessing discrete skills should not be ignored). New research on language proficiency, building on research on social factors in school learning highlighted in the previous chapter, should attend to issues such as sensitivity to bilingualism as a social phenomenon and should take into account the potential impact of bilingualism on language proficiency assessment (Valdes and Figueroa, 1994). We need to know more about how community language use affects the development of proficiency in two languages (see Chapter 2). Verhoeven (1996), for example, has found that immigrant children may acquire a less-developed knowledge of grammar in their first language as a result of their limited exposure to use of that language in their new communities. Acquiring language in a bilingual community may lead to variations in both the first and second languages that incorporate grammatical, lexical, and idiomatic features from the other language.

5-2. Research is needed on how to use assessments to determine levels of proficiency in different aspects of English required for English-language learners to participate in English-only instruction. What are the measurement issues associated with the determination of these aspects? How do these proficiency requisites vary by subject and grade?
Although many states and local districts have established performance standards for exit from special language assistance programs, these standards have not been validated by tracking student performance in mainstream classrooms. Proficiency requisites may vary by subject since some content areas are more dependent on language than others (for example, reading versus math). They will vary by age since language becomes less contextualized in the upper grades.

Issues in Assessing Subject Matter Knowledge

5-3. Research on the assessment of subject matter knowledge needs to address the following questions. First, how do students' levels of English proficiency affect their performance on subject area assessments given in English? Second, how does verbal facility with a first language affect performance on assessments in the native language? Third, how does the language used for instruction affect performance on assessments in the native language?

Page 130

Research to date (Alderman, 1981) has found that a student's aptitude in a subject can be significantly underestimated if the test is administered in a language in which the student has limited proficiency. Further research is needed to explore this relationship.

With regard to the question of how facility with a first language affects performance on assessments in that language, it is not appropriate to approach this issue as a question of "proficiency" in the native language; such an approach makes neither theoretical nor empirical sense since native speakers acquire proficiency in a first language through their early socialization and additional capacity for proficiency through biological maturation. Instead, the key issues in assessment surround children's familiarity with the kind of language used on an assessment in the first language. For example, we need to investigate how well children understand assessment instructions in their first language, a peculiar usage of language that depends on previous experience with tests of the sort being administered.

Research is also needed on how the language used for instruction affects assessment performance in a primary language. For example, do students with communicative competence in their native language but schooling in English perform better when assessed in English than when assessed in their native language? What effect do native language proficiency, years of schooling in English, and difficulty of subject matter assessed have on their performance?

5-4. Research is needed to develop assessments and assessment procedures that incorporate more English-language learners. Further, research is needed toward developing guidelines for determining when English-language learners are ready to take the same assessments as their English-proficient peers and when versions of the assessment other than the "standard" English version should be administered.
This research should include attention to the language and performance demands of assessments and assessment instructions that are separate from the content and domain under assessment. It should also include investigation of the effect of any modifications on the validity and reliability of the assessment. We need to understand better the interaction between the performance of English-language learners and the nature of the assessment. That is, do certain assessment formats (e.g., multiple choice versus constructed response) make it more difficult or easier for such students to express subject matter knowledge? Criteria are needed as well for determining which English-language learners should take which form of an assessment: an unmodified English version, a native-language version, a modified English version, an English assessment with support, or some other alternative assessment mode (see August and McArthur, 1996).

5-5. Research is needed to address inaccurate and inconsistent scoring of open-ended or performance-based measures of the work of English-language learners.

Page 131

How can errors resulting from such inaccurate and inconsistent scoring be reduced? We need to understand the mechanisms by which the filter of English can influence scorers' accuracy and consistency and ways in which the scoring of English-language learner assessments can be improved.

Standards-Based Reform

5-6. Research needs to address whether it is possible to establish common, standard benchmarks for subject matter knowledge and English proficiency for English-language learners within a valid theoretical framework; what these benchmarks might be; and how the benchmarks for English proficiency might be related to performance standards for English-language arts.

5-7. Research is needed to determine whether in the context of school and district outcomes, English-language learners are making progress toward meeting proficient and advanced levels of performance. How can the outcomes of nonstandard administrations/alternative assessments be incorporated into district- and state-wide accountability systems and reporting requirements?

5-8. Research is needed into how opportunities to learn can be evaluated. Standards-based reform has contributed to a redefinition of the role of assessment that has implications for English-language learners. These policies call for the inclusion of these students in assessments, for assessments that are systematically linked to standards systems at the district and state levels, and for evaluations of programs at the school site and district levels to ensure that students are meeting the standards. Although current policy does not require assessments of school and classroom conditions and resources that make it possible for students to meet new standards, educators concerned with helping language-minority children are interested in assessing these opportunities to learn.

Page 132

Annex: Legislative Context For Standards And Assessment

Legislation passed by Congress in recent years contains several consistent themes regarding student assessment, program evaluation (see Chapter 6), and standards with respect to English-language learners. These themes provide important opportunities for directed research to guide the policy process. Legislative language expressing these themes can be found in Goals 2000 (P.L. 103-227), Title I (Helping Disadvantaged Children Meet High Standards) and Title VII (Bilingual Education Programs) of the Improving America's Schools Act of 1994 (P.L. 103-382), and the Reauthorization of the Office of Educational Research and Improvement (Title IX of P.L. 103-227). The themes might be encapsulated as follows:

• Standards and assessments are to fully include English-language learners.
• Innovative ways of assessing student performance are encouraged, including modifications to existing instruments for English-language learners.
• Programs are to be evaluated with respect to whether they meet "challenging" performance standards, rather than on a normative or comparative basis.
• Evaluations are to be useful for program improvement as well as program accountability.

The following subsections summarize key provisions of the major legislation.

Department of Education Organization Act of 1994

According to Section 216(b)(3) of this act:

The Secretary shall ensure that limited-English-proficient and language-minority students are included in ways that are valid, reliable, and fair under all standards and assessment development conducted or funded by the Department.

Goals 2000

Goals 2000 provides resources to states and communities to develop and implement systemic education reforms aimed at helping all students meet challenging academic and occupational standards.
The law defines "all students" as meaning "students or children from a broad range of backgrounds and circumstances, including among others, students or children with limited English proficiency." The law authorizes grants to states and local education agencies (LEAs) to help defray the cost of developing, field testing, and evaluating assessment systems that are aligned with state content standards.

Page 133

It sets aside a portion of funds for developing assessments in languages other than English. Goals 2000 further authorizes federal grants to state education agencies (SEAs) for the purpose of developing a state plan to improve the quality of education for all students. Development of the state plan is to include establishment of teaching and learning standards and assessments aligned with these standards, as well as strategies for program improvement and accountability.

Title I

The law requires states to develop or adopt a set of high-quality yearly assessments, including assessments in at least reading or language arts and math, to be used as the primary means of determining the yearly performance of each LEA and school served under Title I in enabling all children to meet the state's student performance standards. (If states are using transitional assessments, they must devise a procedure for identifying LEAs and schools for improvement, and this procedure must rely on accurate information about the academic progress of each LEA and school.) The law states that the same assessments must be used to measure the performance of all children. It specifies that assessments must be aligned with challenging content and student performance standards; provide coherent information about student attainment of such standards; be used for purposes for which such assessments are valid and reliable; measure the proficiency of students in the academic subjects in which a state has adopted challenging content and student performance standards; be administered at some time during grades 3 through 6, 6 through 9, and 10 through 12; and involve multiple up-to-date measures of student performance.
The assessments are to provide for the participation of all students; reasonable adaptations and accommodations for students with diverse learning needs; and the inclusion of English-language learners, who are to be assessed, to the extent practicable, in the language and form most likely to yield accurate and reliable information on what they know and can do, so that their mastery of skills in subjects other than English can be determined. Furthermore, the law states that adequate and yearly progress must be defined in a manner that is consistent with guidelines established by the Secretary of Education, resulting in continuous and substantial yearly improvement of each LEA and school; such improvement must be sufficient to achieve the goal of enabling all children served under this part of the legislation to meet the state's proficient and advanced levels of performance, particularly economically disadvantaged students and English-language learners. Moreover, progress must be linked primarily to performance on the assessments carried out under this section of the legislation, while also being established in part through the use of other measures.

Page 134

Title VII

The law clearly indicates the purposes of evaluations for programs funded under Subpart 1 (Bilingual Education Capacity and Demonstration Grants): "(1) for program improvement, (2) to further define the program's goals and objectives, and (3) to determine program effectiveness." Evaluations are to address student achievement using state student performance standards (if any), including data comparing English-language learners and other students on school retention, academic achievement, and gains in English (and where applicable, the non-English language) proficiency. The evaluations are also required to incorporate "program implementation indicators that provide information for informing and improving program management and effectiveness," including information on the curriculum and professional development. In addition, evaluations must describe the relationship of activities funded under Title VII to the overall school program and activities conducted through other sources.

Evaluations have consequences for comprehensive school grants and system-wide improvement grants. These programs are to be terminated if students "are not making adequate progress toward achieving challenging State content standards and challenging State student performance standards," and "in the case of a program to promote dual language facility, such program is not promoting such facility" (Sections 7114(b)(2) and 7115(b)(2)).

Subpart 2 of Title VII authorizes funds for data collection, dissemination, research, and program evaluation through grants, contracts, and cooperative agreements. Current or recent recipients of program grants may conduct longitudinal research to monitor the students. Funds are also made available for activities to promote the adoption and implementation of "programs that demonstrate promise of assisting children and youth of limited English proficiency to meet challenging State standards."

References

Abedi, J., C. Lord, and J. Plummer
1995 Language Background Report. Graduate School of Education, National Center for Research on Evaluation, Standards, and Student Testing. Los Angeles: University of California at Los Angeles.

Alderman, D.
1981 Language proficiency as a moderator variable in testing academic aptitude. Journal of Educational Psychology 74:580-587.

American Educational Research Association, American Psychological Association, and National Council on Measurement in Education
1985 Standards for Educational and Psychological Testing. Washington, DC: American Psychological Association.

Anderson, N.E., F.F. Jenkins, and K.E. Miller
1996 NAEP Inclusion Criteria and Testing Accommodations. Findings from the NAEP 1995 Field Test in Mathematics. Washington, DC: Educational Testing Service.

Page 135

August, Diane, and Julia Lara
1996 Systemic Reform and Limited English Proficient Students. Washington, DC: Council of Chief State School Officers.

August, Diane, Kenji Hakuta, Fernando Olguin, and Delia Pompa
1995 LEP Students and Title I: A Guidebook for Educators. Washington, DC: National Clearinghouse for Bilingual Education.

August, Diane, and Edith McArthur
1996 Proceedings of the Conference on Inclusion Guidelines and Accommodations for Limited English Proficient Students in the National Assessment of Educational Progress (December 5-6, 1994). National Center for Education Statistics, Office of Educational Research and Improvement, U.S. Department of Education, Washington, DC.

Bialystok, E., and K. Hakuta
1994 In Other Words. New York: Basic Books.

Carroll, John B.
1958 Communication theory, linguistics, and psycholinguistics. Review of Educational Research 28(2):79-88.

Cheung, Oona M., and Lisa W. Soloman
1991 Summary of State Practices Concerning the Assessment of and the Data Collection about Limited English Proficient (LEP) Students. Washington, DC: Council of Chief State School Officers.

Cheung, Oona M., Barbara S. Clements, and Y. Carol Mieu
1994 The Feasibility of Collecting Comparable National Statistics about Students with Limited English Proficiency. Washington, DC: Council of Chief State School Officers.

Cloud, N.
1991 Educational assessment. Pp. 219-245 in E.V. Hamayan and J.S. Damico, eds., Limiting Bias in the Assessment of Bilingual Students. Austin, TX: Pro-Ed.

Council of Chief State School Officers
1992 Recommendations for Improving the Assessment and Monitoring of Students with Limited English Proficiency. Washington, DC: Council of Chief State School Officers.

Cronbach, L., R. Linn, R. Brennan, and E. Haertel
1995 Generalizability Analysis for Educational Assessments. Los Angeles: Center for Research on Evaluation, Standards and Student Testing and Center for the Study of Evaluation, University of California.

Durán, Richard P.
1989 Assessment and instruction of at-risk Hispanic students. Exceptional Children 56(2):154-158.

Feuerstein, R.
1979 The Dynamic Assessment of Retarded Performers. Baltimore, MD: University Park Press.

Fleischman, H.L., and P.J. Hopstock
1993 Descriptive Study of Services to Limited English Proficient Students, Volume 1. Summary of Findings and Conclusions. Prepared for Office of the Under Secretary, U.S. Department of Education by Development Associates, Inc., Arlington, VA.

Garcia, G.E.
1991 Factors influencing the English reading test performance of Spanish-speaking Hispanic children. Reading Research Quarterly 26(4):371-392.

Garcia, G.E., and P.D. Pearson
1994 Assessment and diversity. Review of Research in Education 20:337-391.

Genesee, F., and E.V. Hamayan
1994 Classroom-based assessment. In F. Genesee, ed., Educating Second Language Children: The Whole Child, the Whole Curriculum, the Whole Community. New York: Cambridge University Press.

Gutierrez, K.
1995 Unpackaging academic discourse. Discourse Processes 19(1):21-37.

Page 136

Hafner, A.
1995 Assessment Practices: Developing and Modifying Statewide Assessments for LEP Students. Paper presented at the annual conference on Large Scale Assessment sponsored by the Council of Chief State School Officers, June 1995. School of Education, California State University, Los Angeles.

Hambleton, R.K., and A. Kanjee
1994 Enhancing the validity of cross-cultural studies: Improvements in instrument translation methods. In T. Husen and T.N. Postlethwaite, eds., International Encyclopedia of Education (2nd edition). Oxford, UK: Pergamon Press.

Kober, Nancy L., and Michael J. Feuer
1996 Title I Testing and Assessment. Challenging Standards for Disadvantaged Children. Summary of a Workshop. Board on Testing and Assessment, National Research Council. Washington, DC: National Academy Press.

Leinhardt, G.
1978 Educational opportunity: Opportunity to learn. Pp. 15-24, Chapter III in Perspectives in the Instructional Dimensions Study: A Supplemental Report from the National Institute of Education. Washington, DC: National Institute of Education.

Lewis, J.
1991 Innovative approaches in assessment. In R.J. Samuda, S.L. Kong, J. Cummins, J. Pascual-Leone, and J. Lewis, eds., Assessment and Placement of Minority Students. Toronto, Canada: C.J. Hogrefe.

Lindholm, K.
1994 Standardized Achievement Tests vs. Alternative Assessment: Are Results Complementary or Contradictory? Paper presented at the American Educational Research Association, New Orleans, April. School of Education, San Jose State University.

McLaughlin, B.
1984 Second-Language Acquisition in Childhood, 2d ed. Hillsdale, NJ: Erlbaum.

McLaughlin, B., A. Blanchard, and Y. Osanai
1995 Assessing Language Development in Bilingual Preschool Children. NCBE Program Information Guide Series, No. 22. Washington, DC: National Clearinghouse for Bilingual Education.

McLaughlin, M.W., and L.A. Shepard
1995 Improving Education Through Standards-Based Reform. Stanford, CA: The National Academy of Education.

Meisels, S.
1994 Designing meaningful measurements for early childhood. Pp. 202-222 in B. Mallory and R. New, eds., Diversity and Developmentally Appropriate Practices: Challenges for Early Childhood Education. New York: Teachers College Press.

Messick, Cheryl K.
1988 Ins and outs of the acquisition of spatial terms. Topics in Language Disorders 8(2):14-25.

Moss, M., and M. Puma
1995 Prospects: The Congressionally Mandated Study of Educational Growth and Opportunity. First Year Report on Language Minority and Limited English Proficient Students. Prepared for Office of the Under Secretary, U.S. Department of Education by Abt Associates, Inc., Cambridge, MA.

National Council of Teachers of Mathematics
1989 Curriculum and Evaluation Standards for School Mathematics. Reston, VA: National Council of Teachers of Mathematics.

Navarette, C., J. Wilde, C. Nelson, R. Martinez, and G. Hargett
1990 Informal Assessment in Educational Evaluation: Implications for Bilingual Education Programs. Program Information Guide No. 13. Washington, DC: National Clearinghouse for Bilingual Education.

Page 137

New Standards
1995 Performance Standards. English Language Arts, Mathematics, Science, and Applied Learning. Volumes 1, 2, and 3. Consultation Drafts. Washington, DC: National Center on Education and the Economy.

Rivera, Charlene
1984 Communicative Competence Approaches to Language Proficiency Assessment: Research and Application. Multilingual Matters 9. Rosslyn, VA: InterAmerican Research Associates.
1995 How We Can Ensure Equity in Statewide Assessment Programs? Findings from a national survey of assessment directors on statewide assessment policies for LEP students, presented at annual meeting of the National Conference on Large Scale Assessment, June 18, 1995, Phoenix, AZ. The Evaluation Assistance Center East. Washington, DC: George Washington University Institute for Equity and Excellence in Education.

Royer, J., and M. Carlo
1991 Assessing the language acquisition progress of limited English proficient students: Problems and a new alternative. Applied Measurement in Education 4:85-113.

Saville-Troike, Muriel
1991 Teaching and Testing for Academic Achievement: The Role of Language Development. Focus, Occasional Papers in Bilingual Education, No. 4. Washington, DC: National Clearinghouse for Bilingual Education.

Shinn, M.R., and G.A. Tindal
1988 Using student performance data in academics: A pragmatic and defensible approach to non-discriminatory assessment. Pp. 383-407 in R.G. Jones, ed., Psychoeducational Assessment of Minority Group Children: A Casebook. Berkeley, CA: Cobb and Henry.

Short, D.
1991 How to Integrate Language and Content Instruction: A Training Manual. Washington, DC: Center for Applied Linguistics.

Strang, E. William, and Elaine Carlson
1991 Providing Chapter 1 Services to Limited English-Proficient Students. Final Report. Rockville, MD: Westat.

Teachers of English to Speakers of Other Languages (TESOL)
1996 ESL Standards for Pre-K-12 Students. Washington, DC: Center for Applied Linguistics.

Valdes, Guadalupe, and Richard A. Figueroa
1994 Bilingualism and Testing: A Special Case of Bias. Norwood, NJ: Ablex.

Valdez Pierce, L., and J.M. O'Malley
1992 Performance and Portfolio Assessment for Language Minority Students. NCBE Program Information Guide Series. Washington, DC: National Clearinghouse for Bilingual Education.

Verhoeven, L.
1996 Early bilingualism, cognition, and assessment. Pp. 276-291 in M. Milanovic and N. Saville, eds., Performance Testing, Cognition and Assessment. Cambridge, England: Cambridge University Press.

Wong Fillmore, L.
1982 Language minority students and school participation: What kind of English is needed? Journal of Education 164:143-156.

Wong Fillmore, L., and Julia Lara
1996 Summary of the Proposal Setting the Pace for English Learning: Focus on Assessment Tools and Staff Development. Washington, DC: Council of Chief State School Officers.

Zehler, Annette M., Paul J. Hopstock, Howard L. Fleischman, and Cheryl Greniuk
1994 An Examination of Assessment of Limited English Proficient Students. Special Issues Analysis Center, Task Order Report, March 28, 1994. Arlington, VA: Development Associates, Inc.

Page 138

PROGRAM EVALUATION: SUMMARY OF THE STATE OF KNOWLEDGE

The following key points can be drawn from the literature on program evaluation:

• The major national-level program evaluations suffer from design limitations; lack of documentation of study objectives, conceptual details, and procedures followed; poorly articulated goals; lack of fit between goals and research design; and excessive use of elaborate statistical designs to overcome shortcomings in research designs.
• In general, more has been learned from reviews of smaller-scale evaluations, although these, too, have suffered from methodological limitations.
• It is difficult to synthesize the program evaluations of bilingual education because of the extreme politicization of the process. Most consumers of research are not researchers who want to know the truth, but advocates who are convinced of the absolute correctness of their positions.
• The beneficial effects of native-language instruction are clearly evident in programs that are labeled "bilingual education," but they also appear in some programs that are labeled "immersion." There appear to be benefits of programs that are labeled "structured immersion," although a quantitative analysis of such programs is not yet available.
• There is little value in conducting evaluations to determine which type of program is best. The key issue is not finding a program that works for all children and all localities, but rather finding a set of program components that works for the children in the community of interest, given that community's goals, demographics, and resources.
• Five general lessons have been learned from the past 25 years of program evaluation:
  - Higher-quality program evaluations are needed.
  - Local evaluations need to be made more informative.
  - Theory-based interventions need to be created and evaluated.
  - We need to think in terms of program components, not politically motivated labels.
  - A developmental model needs to be created for use in predicting the effects of program components on children in different environments.