6
Implementation and Support

There is no single right way to build a science assessment system. There are no step-by-step instructions for developing systems that are well aligned with standards, that clearly communicate valued standards for teaching and learning, and that provide accurate information for decision making. However, certain steps will be invaluable to states that are planning to develop such a system.

This chapter discusses some of these steps and offers some practical ideas for states to consider. Many of these discussions reflect the input of the working groups with whom the committee consulted extensively. This chapter also addresses two other elements of the system—the reporting of results and professional development—that are important for supporting the assessment system and helping it to function as intended. The chapter concludes with a brief discussion of the uses of technology for designing and implementing an assessment system.

NEEDS ANALYSIS

In designing or modifying a coherent science assessment system, states will want to begin with a needs analysis that includes gathering information about what assessment-based information stakeholders need. The needs assessment should include the opinions of a wide range of stakeholders, including students, teachers, school administrators, school district personnel, state policy makers, parents, and the public, as each requires a different array of assessment-based information. A needs assessment can also make clear when and how assessment



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.

Systems for State Science Assessment

results should be reported and can identify aspects of the system that will need special monitoring to ensure that they are working as intended. Through such an analysis, the state can consider the role of science assessment in the overall education system and how it will interact with the education and assessment systems in other disciplines.

A needs analysis is just as important for a system that is already operating as it is for one that is being developed. Such an analysis can reveal gaps in an existing system, for example, by identifying a need for information that is not being collected. Understanding how the assessment program is perceived and used can guide improvements in the system, highlight future needs, and help states set targets for the allocation of resources. The results of the analyses can be used to develop a continuous improvement plan for science education and assessment, a plan that should guide future modifications to the system.

States may find it useful to ask school districts and schools to conduct parallel needs assessments. The results of these local needs assessments can yield information that state-level analyses might not uncover. School districts and schools can also use local needs assessments to identify important gaps in the information that states provide to them, as well as strategies for filling those gaps.

The Milwaukee school system, for example, which placed a strong emphasis on developing students' reasoning and problem-solving skills, recognized that the state testing program did not provide it with the information it needed about student achievement in this area. The district designed and implemented its own local assessment system to supplement the state testing program, incorporating multiple measures of student achievement that included performance assessments administered in classrooms.
The state assessment provided the district with both norm-referenced and criterion-referenced data that could be used for some purposes, while the local assessments provided information on higher-order thinking and reasoning skills that were not being assessed by the state's paper-and-pencil tests (Webb, 2002).¹

The committee recognizes that, in many instances, the list of needs revealed by a needs analysis will be long, and states may have to set short-, intermediate-, and long-term goals for implementing the fully developed assessment systems they want. States that are not in a position to develop completely new assessment programs can nevertheless begin with small steps toward their goals. For example, a state might start addressing the needs it has identified by including a small number of open-ended tasks in its large-scale assessment program, or by helping schools and districts to develop standardized, classroom-administered performance measures that can shed light on aspects of student achievement that existing tests do not assess.

¹ The Milwaukee example comes from mathematics. The committee notes that a similar system could be developed in science. We include the example to illustrate how local needs can lead to the creation of an assessment system that supplements rather than supplants the state assessment.

EXPERT ADVICE

Developing a coherent assessment system is a complex and multifaceted task that requires a variety of expertise, both technical and content specific. A network of independent yet interacting advisory groups is an invaluable resource, and these groups should be put in place before system design begins or as early in the process as possible.

The committee suggests that the advisory structure include both permanent and ad hoc committees. Permanent committees could be used to generate specific products, such as standards, assessment designs, and state-issued requests for proposals; ad hoc committees could review these products as necessary. To ensure that the permanent committees maintain continuity, states could rotate new members in on a staggered basis. Terms for committee members should be set for no fewer than five years.

One of the advisory groups, sometimes called a Technical Advisory Council (TAC), should advise the state about the technical measurement issues associated with the testing program; other groups should focus on each of the content areas that are part of the state assessment program. Membership of a science content committee should include scientists, science educators, researchers who study science assessment, and individuals with expertise on how people learn science. One or two members of each content-specific group should sit on the TAC to represent the concerns of the discipline. The science content committee will be able to help states evaluate the scientific importance and accuracy of proposed test items before they are included on an assessment, as well as respond to comments about the items that are raised after administration.
TAC members should collectively have expertise in all aspects of test design, development, and implementation, including the assessment of students with special needs. The role of the TAC will vary from state to state and from stage to stage, but it should be able to help the state specify the purpose and use of assessment results, identify potential sources of assessment data (e.g., teacher, portfolio, state test, district test), and evaluate whether the proposed methods will achieve the purposes of assessment in a technically sound manner. Throughout the design process, TAC members should help the state write and review specifications to guide the bidding process for the development of specific components of the system. They should also help the state identify strategies for interpreting assessment data to meet identified purposes. In addition, the TAC could help the state design alignment studies, evaluate their results, and recommend changes to the system to improve or maintain alignment between standards and assessments, as well as across grades and disciplines. It could also help the state monitor and evaluate the overall assessment effort and recommend changes based on evaluation studies.

Appendix A presents some practical tips for working with a TAC. There are two key recommendations: (1) the TAC should work for the state, not the test publisher, although the test publisher should be held responsible for providing the TAC with all the information it needs to carry out its job—including information on possible problems with the tests or the interpretation of results; and (2) the state should have a plan in place to ensure that the advice of the TAC is considered. Some states, for example, require the assessment director to respond in writing to the advice of the TAC and to justify any decision not to follow particular recommendations.

DEVELOPING THE STRUCTURE

An important step in developing and maintaining a science assessment system is the creation of documents that explain the master plan for the system. These documents should specify the purposes of each assessment in the system, the constructs that each will measure, and the ways the results are to be used. They should provide specific guidance as to who will be tested; where each component will be administered and by whom; who is responsible for developing the component; when the assessments will be administered; and how the results will be scored, combined, and reported for specific purposes. These issues are not mere details: they can involve a variety of trade-offs and compromises that balance efficiency, cost-effectiveness, technical quality, and the credibility of results.

In developing these documents, states will need to consider:
• The purposes of the assessment system—how the assessment results will be used at different levels of the education system.
• The resources that are available to support the assessment system.
• Which indicators should be included in the assessment system and how they will be combined and reported for each of the identified purposes.
• Which students should be included in assessments and when, where, and how they should be assessed, given the identified purposes.
• How the effects of the assessment system should be evaluated.
• What mechanisms should be put in place to address problems uncovered by the assessment results and the evaluation of the assessment system.

These documents should be reviewed and updated on a regular basis. The state of Maine has developed a variety of documents that specify the components of its assessment system and has summarized them in a chart to help policy makers and educators understand the interaction among assessment purposes, development, and scoring (see Table 6-1).

TABLE 6-1 Characteristics of Maine's Assessment System

Classroom assessment
  Primary purpose: Informing teaching and learning
  Selected or developed by: Individual teacher
  Scored by: Individual teacher

School or district assessment
  Primary purpose: Informing and monitoring
  Selected or developed by: Groups of teachers and administrators
  Scored by: Groups of teachers (and others)

State assessment
  Primary purpose: Monitoring and evaluating programs to ensure accountability
  Selected or developed by: Groups of administrators, administrators, and/or policy makers
  Scored by: Scorers outside the district

Assessment system
  Primary purpose: Informing teaching, monitoring and evaluating, certification
  Selected or developed by: District assessment leadership
  Scored by: Both internal and external

SOURCE: Maine Department of Education (2003).

Identification of Purposes

As states identify the purposes that the assessment system will serve, they will need to consider what assessment-based data will be needed for each purpose and how those data will be reported. The relationship between the results and the decisions they are meant to inform must be clear. For example, if a state wishes to know about the progress that all students are making toward achieving state standards, then the system should include a large-scale test that is administered to all students and samples broadly from among all of the standards. If a state hopes to provide information that can be used to address individual students' needs, then it will need assessment strategies that permit in-depth assessment of student understanding of a smaller set of knowledge and skills. If both kinds of information are needed, as is the case under the No Child Left Behind Act (NCLB), then both types of assessment will be needed. Each assessment in the system should be accompanied by a clear list of the purposes for which it could be used, that is, the inferences for which it could provide valid evidence.
The specific purpose will guide the selection of measures that can elicit evidence of understanding and dictate the circumstances in which they should be used. It is important to note that the purpose of testing is not the same as the type of test, items, or tasks; no item type or assessment type is unique to a particular purpose. For example, well-designed multiple-choice items can be used for formative purposes, just as open-ended performance tasks can be included on tests that are used for accountability or program evaluation. In fact, the same question could be used quite successfully in assessments designed for different purposes. Although educational measurement experts frequently stress that one test cannot serve all purposes equally well, there need not be a one-to-one correspondence between the number of tests and the number of purposes, provided the state is cognizant of the trade-offs inherent in using an assessment to serve multiple purposes. Evidence that an assessment is valid for one purpose is insufficient to establish the validity of its use for another purpose (American Educational Research Association, American Psychological Association, and National Council on Measurement in Education, 1999, p. 17).

Some evidence exists (Niemi, 1996; Baker, 1997; Baker, Abedi, Linn, and Niemi, 1996) that tests can be designed to yield useful information for various purposes at different levels of the system when the results are reported in different ways. Baker (2003) suggests that system-oriented measures can be turned to instructional improvement purposes in this way. This would be possible if evidence is collected to support the validity of each purpose, or if the different purposes are addressed by aggregating and reporting results in different ways. The Nebraska STARS (Buckendahl, Impara, and Plake, 2002) and the Maine MeCAS² programs have used this approach. In these programs, the results of local assessments whose primary purpose is to support teaching and learning in the classroom are combined with each other and with state-level assessments to support judgments about achievement of the state standards. These judgments are useful to both teachers and policy makers. The programs are built on a strong foundation of professional development that supports teachers in developing technically sound assessments. In each of these states, considerable attention has been paid to establishing the validity of both the classroom and district portions of the assessment system for each intended purpose.
However, concerns about the comparability of information across districts remain, and further research and experience will be needed to determine how well such strategies will work for different purposes.

² Information about the Maine MeCAS program is available from: http://www.state.me.us/education/mea/index.htm.

At What Level the Assessment Is Administered

Another aspect of selecting suitable assessment approaches is deciding where each assessment should be administered to maximize its usefulness and provide results that support the desired inferences. In a system of assessments, with many ways to implement an assessment strategy, decisions should be based both on the construct to be measured and on where the most accurate picture of student learning can be obtained. For example, as discussed in Chapter 3, if a detailed picture of students' abilities to conduct a scientific investigation is needed, this information may be captured best by the teacher while students are actively engaged in inquiry activities. Such an assessment should therefore be administered in the classroom (or wherever the activity takes place).

It is also important to consider how the results will be used. There are trade-offs inherent in any decision about where assessment should take place. For example, while ongoing classroom assessment helps teachers make instructional decisions that can enhance student learning, the results of such assessments may not be easily incorporated into an assessment system used for accountability purposes, because they are not standardized and therefore not easily comparable. By the same token, the results of standardized tests, which are easily absorbed into accountability systems, may not meet the immediate needs of teachers or students. One strategy for meeting both needs is to ask teachers to incorporate into their repertoire of classroom assessment strategies some standardized tasks, which can more easily be used for accountability purposes because their comparability from classroom to classroom can be readily established. Such tasks would supplement, not replace, ongoing formative assessment (see, for example, the description of New York's and Connecticut's assessment of inquiry in Chapter 3). More research is needed on the design and implementation of standardized classroom assessment opportunities. Textbook publishers could assist in this effort by including in their supplementary materials a variety of assessment activities and related scoring rubrics that teachers could implement in the classroom and that could possibly be incorporated into the state's assessment system.

Frequency

Decisions about how frequently to assess depend on how the results are to be used and how stable they need to be over time.
For example, tests given at the end of a school year, while useful for providing a snapshot of what students have learned and for evaluating patterns of errors that could be the target of future instructional interventions, typically do not affect the educational experiences of the students who take them. Assessment strategies designed to support students' ongoing learning must provide feedback in time for students and their teachers to benefit from the information. Tests that are used to determine how students are progressing from one grade to the next may need to be administered only once per year. Large-scale assessments, such as the National Assessment of Educational Progress (NAEP) science assessment, that are designed to paint a broad picture of what students in U.S. schools know and can do in science need to be administered even less often. Assessing more frequently than a particular purpose requires is costly and inefficient; assessing too infrequently can yield inaccurate information, or information that arrives too late to support important decisions.

Responsibility for Test Development

In an assessment system, the responsibility for developing assessments can be distributed across the system, which makes it more difficult for states to maintain coherence unless a plan is in place. Roeber (1996) describes a process that states could use to develop a coherent set of assessment practices that meets the information needs of participants at different levels of the education system (Box 6-1). However, he, like the committee, understands that states vary in many ways and that the model is just one possibility among many. We include it in this report to illustrate how such a system could be developed, not as a model for system design.

WORKING WITH A COMMERCIAL TEST PUBLISHER

In creating a high-quality state testing program, many states will work with a commercial test publisher in some way. As Patz et al. (2005) point out, the way a state views the role of professional test development organizations may depend on the way it views the task of assessing student learning. For example, if a state thinks of assessment primarily as an opportunity to capture the success of efforts to pursue key intellectual goals in the schools, then it may see only a limited role for commercial testing contractors. A state that opts for a technically complex, large-scale assessment is likely to depend more heavily on a testing contractor. A system can easily incorporate both kinds of assessment, so a state may use a contractor for the development of only some components of its assessment system (Education Leaders Council, 2002). State education and testing industry personnel, working under the auspices of the Education Leaders Council, developed a set of standards to guide states as they develop relationships with test publishers.
These standards provide guidance on preplanning, design and response strategies for requests for proposals, administration, scoring, reporting, and appropriate uses of data. Although the committee did not evaluate these standards, we think that they raise important considerations for states and their test development contractors.³ Appendix A contains a checklist for preparing a request for proposals for testing contractors, as well as some practical tips for working with them. These are not intended to serve as standards, but rather to highlight aspects of the working relationship that may need attention and to provide some recommendations for improving the collaboration.

Two of the design team papers described in Chapter 2 provide additional guidance for states in working with test contractors. Patz et al. (2005) discuss some basic elements of project management and suggest a variety of ways that states could work with test development professionals and test publishers in implementing a science assessment program according to NCLB requirements. Popham et al. (2004) include draft language that could be incorporated (with any desired modifications) into a request for proposals.

³ The standards are available at: http://www.accountabilityworks.org/publications/ELC_AW_Model_Contractor_Standards_and_State_Responsibilities_for_State_Testing_Programs.pdf.

BOX 6-1
Developing a Coherent Assessment System: An Illustrative Example

• The state develops a set of content standards in selected areas with local district input. Most school districts adopt the state standards as their own.
• In each area, the state coordination team develops an assessment blueprint describing how the content standards are to be assessed at the state, district, and classroom levels.
• The state selects subjects for statewide assessments to be administered in certain grades. The purpose of the assessments is primarily to hold schools accountable for student performance. Results are reported to parents, teachers, schools, and districts.
• Performance standards are created for each area in which the state has created content standards. These standards ensure that the assessments can be used to judge the performance of students and schools.
• For each area in which the state has developed content standards, the state coordination team also develops a professional development program to ensure that all local educators are able to address the content standards and help students achieve at high levels.
• The state creates the assessments that will be used, with the state coordination team overseeing the work to ensure that the assessments match the content standards and fulfill the purposes of the overall assessment system.
• The state creates other assessments (portfolio assessments, performance events, performance tasks, plus more conventional selected-response and open-ended assessments) for use as “off-grades” throughout the school year. These assessments provide information teachers can use to improve the learning of individual students, as well as group information to improve the instructional program at the school and classroom levels.
• The state sees that the assessments are created, validated, and distributed across the state. As part of this process, the state administers the assessments to a sample of students statewide at each grade level, develops scoring rubrics and training materials for each open-ended or performance measure, and prepares the materials for distribution to school districts.
• Assessments are tried out in a representative set of classrooms around the state, and the results are used in several ways: to refine the assessments themselves, to refine the assessment administration directions, and to revise and expand the scoring rubrics.
• The state provides ongoing information and professional development opportunities to all local school districts.
• Assessment information collected by classroom teachers is summarized at the building level. District and school summaries are added to provide a more complete picture of student achievement.

SOURCE: Adapted from Roeber (1996).

REPORTING ASSESSMENT RESULTS

The reporting of assessment results is frequently taken for granted, but careful attention to reporting is critical both in the design of assessment systems and in the use of assessment-based information. The committee recommends that decisions about reporting be made before any assessment design begins. As we have discussed, information about students' progress is needed at all levels of the system, albeit with varying frequency and in varying degrees of detail. Parents, teachers, school and district administrators, policy makers, the public, and, of course, students themselves need clear, accessible, and timely information and feedback about what is taking place in the classroom (Wainer, 1997). Moreover, in a systems approach, many different kinds of information need to be available, but not all stakeholders need the same information. Thus, questions about how various kinds of results will be combined and reported to different audiences, and about how reporting can support sound, valid interpretations of results, need to be considered very early in the process of system design.

NCLB's requirements for the reporting of assessment results are fairly specific. Results that are aligned with the state's academic achievement standards are to be reported for all tested students and disaggregated by major subgroups. The results also are to include “interpretative, descriptive, and diagnostic reports” for individuals that can be used to “help parents, teachers, and principals to understand and address the specific academic needs of students” (P.L. 107-110).
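As a minimal illustration of what disaggregated reporting involves computationally, the Python sketch below computes the percentage of students at or above a proficiency cut, first for all tested students and then for each subgroup. The record layout, level names, subgroup labels, and function names are hypothetical. For brevity each student is assigned to a single subgroup, whereas in practice a student can belong to several; states also apply minimum group sizes before reporting a subgroup result, a rule this sketch omits.

```python
from collections import defaultdict

# Hypothetical performance levels counted as "at or above Proficient".
PROFICIENT_OR_ABOVE = {"Proficient", "Advanced"}


def percent_proficient(records):
    """Percentage of records at or above Proficient (None if there are none)."""
    if not records:
        return None
    hits = sum(1 for _, _, level in records if level in PROFICIENT_OR_ABOVE)
    return round(100.0 * hits / len(records), 1)


def disaggregate(records):
    """Report the all-students result plus one result per subgroup.

    Each record is a hypothetical (student_id, subgroup, performance_level) tuple.
    """
    by_group = defaultdict(list)
    for record in records:
        by_group[record[1]].append(record)
    report = {"All students": percent_proficient(records)}
    for group in sorted(by_group):
        report[group] = percent_proficient(by_group[group])
    return report


if __name__ == "__main__":
    students = [
        ("s1", "Group A", "Proficient"),
        ("s2", "Group A", "Basic"),
        ("s3", "Group B", "Advanced"),
        ("s4", "Group B", "Below Basic"),
        ("s5", "Group B", "Proficient"),
    ]
    for group, pct in disaggregate(students).items():
        print(f"{group}: {pct}% at or above Proficient")
```

Because every subgroup figure is computed from the same records as the all-students figure, the two views of the results remain consistent with each other.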
Depending on the needs of different groups for assessment-based information, results can be presented in terms of individual standards or clusters of standards, or in terms of learning progressions that have been defined and made clear and available to all. Results can compare the performance of one student or group of students with that of other groups or with established norms. Results can also describe the extent to which students have met established criteria for performance. If reports include descriptions of the skills, knowledge, and abilities targeted by the assessment tasks, users will understand the links between the results and the goals for student learning. When these links are clear, users of the results, whether parents, teachers, or policy makers, can see how they could act on what they have learned about student progress.

The reporting of assessment results can take many forms, from graphical displays to descriptive text, and from a series of numbers to a detailed analysis of what the numbers mean. Some states report assessment results on a standard-by-standard basis; others provide information keyed to learning objectives for a specific class. In many states in Australia, where learning continua serve as the basis for assessment at all levels of the system, progress maps are used to describe student achievement (see Chapter 5).

NCLB requires that “interpretative” material be included in reports. Interpretive material is supporting text that explains, in a way suited to the technical knowledge of the intended audience, the nature and significance of the results. Interpretive material should:
• specify the purposes of the assessment.
• describe the skills, knowledge, and abilities being assessed.
• provide sample test questions and sample student responses keyed to performance levels.
• provide a description of the performance levels.
• describe the skills, knowledge, and abilities that a student or group of students has or has not yet achieved.
• describe how the results should be interpreted and used, with a focus on ways to improve student performance.
• describe common misinterpretations of results.
• indicate the precision of scores or classification levels.

Samples of student work are a useful way of illustrating student achievement. When reports include such samples, users can gain further insight into what it means for a student to be classified at a particular achievement level. Samples can also be used to illustrate the ways in which students need to improve.

Many assessments are designed to generate subscores, that is, detailed results for particular aspects of the domain that has been assessed. Subscores provide an important means of making assessment results more useful.
Providing subscores for traditional paper-and-pencil tests, with or without open-ended items, is relatively straightforward; it depends largely on ensuring that enough tasks measuring the subdomain have been included and that the measurement error for that portion of the assessment has been established. Developing subscores, or perhaps nonnumerical results that address particular aspects of the domain being measured, is also useful for other kinds of measures, and it fits well with the learning progression model we have presented. However, the development of subscores that can support decisions about curricula or be used to diagnose students’ needs relative to state standards is an area that needs further research and development.
Information about the performance of relevant comparison groups can also enhance users’ understanding of individual and group results. Other information, such as information about the quality of education and the opportunities afforded to students or about students’ motivation to perform well, can further enhance the validity of score interpretations. The Internet makes it possible to give users a volume of information that might be impractical for paper-based reports. Information can be presented along with guidance on how it can be used and interpreted, and it can be interactive so that users can focus on the areas of greatest relevance to them.
Users need to understand the degree of uncertainty or measurement error associated with assessment results. This is particularly important when a variety of measures are used in a system, although quantitative measures of error can be less straightforward for newer modes of assessment than for traditional tests. Such information can be conveyed using standard error bands, a graphic display, or statements regarding the probability of misclassification (American Educational Research Association, American Psychological Association, and National Council on Measurement in Education, 1999). Regardless of how this is done, each reported score should be accompanied by an indication of its margin of error or another indicator of the measure’s precision. This information should be supported by text that makes clear how the precision of the scores should be factored into inferences based on the results. Information on how close individual students or groups of students are to attaining a different performance level can also be reported (Goodman and Hambleton, 2003), along with a description of the skills, knowledge, and abilities represented by each performance level.
Finally, while much research has been done on the design of technically sound assessments, there is little research on ways of reporting results that allow for accurate and meaningful interpretations (Hambleton and Slater, 1997; Jaeger, 1998; Goodman and Hambleton, 2003). Research has indicated that users’ preference for a data display and the understandability of a display do not always coincide (Wainer, Hambleton, and Meara, 1999).
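To make the reporting of score precision more concrete, the following minimal Python sketch produces two of the displays mentioned above: a standard error band around a reported scale score and a statement of the probability of misclassification relative to a performance-level cut score. It assumes that measurement error is normally distributed with a known standard error of measurement (SEM), and it treats the observed score as the center of the true-score distribution, a simplification that an operational program might refine. The function names, score, SEM, and cut score are all hypothetical.

```python
from statistics import NormalDist


def error_band(score: float, sem: float, confidence: float = 0.95) -> tuple[float, float]:
    """Symmetric band around an observed score, assuming normally distributed error."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for a 95% band
    return score - z * sem, score + z * sem


def prob_misclassified(score: float, sem: float, cut: float) -> float:
    """Approximate chance that the true score lies on the other side of the cut.

    Simplifying assumption: the true score is normal with mean equal to the
    observed score and standard deviation equal to the SEM.
    """
    true_score = NormalDist(mu=score, sigma=sem)
    if score >= cut:
        # Classified at or above the cut; misclassified if the true score is below it.
        return true_score.cdf(cut)
    # Classified below the cut; misclassified if the true score is at or above it.
    return 1.0 - true_score.cdf(cut)


# Hypothetical example: scale score 512, SEM 12, "proficient" cut score 500.
low, high = error_band(512, 12)
print(f"Score 512, 95% band {low:.1f} to {high:.1f}")
print(f"Chance of misclassification: {prob_misclassified(512, 12, 500):.0%}")
```

For these hypothetical values the band runs from about 488.5 to 535.5 and the misclassification probability is about 16 percent. Because the band straddles the cut score, the accompanying text could explain that the student’s classification is less certain than the single reported score suggests.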
To ensure that reports communicate clearly and effectively to their intended audiences, different formats should be evaluated to determine which are best understood and most likely to be used accurately by those audiences. This can be done through think-aloud studies and focus groups made up of members of the relevant audiences. We encourage the U.S. Department of Education to assist states by supporting research on the design of effective assessment reporting tools, including the use of technology for this purpose. We also encourage education policy organizations and professional societies to create opportunities for this issue to be addressed.

PROFESSIONAL DEVELOPMENT

Teachers and students are in the best position to use assessment results directly to improve learning, and teachers need specific knowledge and skills to make sure that this happens. The committee has called on states not to rely exclusively on large-scale assessments, but to use multiple modes of assessment to obtain the kinds of information needed to understand and effectively monitor students’ science learning. We also call on states to make use of relatively new research findings about the ways in which student learning progresses in the sciences when designing science education systems and science assessments. The demands of such a system on teachers are clearly very great; as a consequence, the states’ responsibility to make sure that teachers are supported in this effort, both while they are training to enter the profession and throughout their careers, is correspondingly great.
The committee concludes that a strong system of professional development is critical to the proper functioning of a science assessment system, as it is to the success of standards-based reform in general. Many reports and articles have described the nature of high-quality professional development (see, for example, National Research Council, 2001a; Putnam and Borko, 2002; Shepard, 2000; Darling-Hammond, 1998; Hawley and Valli, 1999), and this report does not discuss general principles of effective professional development. Rather, we highlight key challenges for professional development that relate to assessment.
Throughout the education system, many individuals and groups, not just teachers, make and influence decisions regarding the use and interpretation of assessment results, and they base their decisions, for good or ill, on their understanding of the assessment process. When that understanding is poor, the consequences can be serious. It is thus very important that all of these individuals and groups, from elected officials at the highest levels to school board members to parents, understand that assessment is integral to the system, not a separate task. These individuals need to have the opportunity, and take the responsibility, to become educated about how assessments work, their goals, and the interpretation of their results.
States rely on both preservice programs (for teachers in training) and in-service programs (for practicing teachers) to provide professional development for their teachers.
In general, neither the preservice teacher preparation programs offered by colleges and universities nor the in-service programs, which typically are controlled by schools and districts, are currently accomplishing all that they could, particularly with regard to assessment. We focused our attention on the kinds of professional development needed to enable teachers and others to use science assessment results to improve student learning outcomes. Just as we see science assessment as an element in a coherent system, we see professional development as an important element for supporting that system.
Whatever form it takes, assessment is a tool that all teachers use every day to obtain information on their students’ learning. For classroom assessment to function as it should in a system, teachers must develop and use means of assessing their students’ learning, incorporate measurement tools developed by themselves and others into their instruction, and prepare students for assessments that will be given outside the classroom setting. Teachers also must absorb and understand the information that all these kinds of assessments can supply, and they have the principal responsibility for using that information to help their students learn and to improve their own instructional strategies.
To accomplish these things, teachers need to understand the principles on which different kinds of assessments are based. Large-scale assessments designed to provide information about many students, for example, are often viewed by teachers as intrusions that bear little relationship to their goals in the classroom, and few teachers are well prepared to make sense of the kinds of results these assessments typically provide. Yet these assessments are the only means of obtaining important information, including data for evaluating the success of educational approaches, for monitoring trends over time, and for certain accountability purposes. Moreover, if the outside assessments that teachers encounter are designed as parts of the coherent system the committee is calling for, they will be consonant with the assessments used at the classroom level and can provide information that teachers can use about students’ progress toward the science standards. While teachers may not be involved in the design or selection of the large-scale assessment instruments their students take, it is important that they understand the purposes the assessments are designed to serve, the kinds of inferences they were designed to support, and the ways in which the results are to be used. They also need to understand the kinds of data that are produced, and they should understand the assessments’ technical properties well enough to put the data in context and link it to other information they have about their students.
Large-scale assessments are just one tool for obtaining information about what students are learning. Teachers already assess their students constantly.
Informally, they gain information through interactions with students, for example, by noting the understandings or misconceptions that underlie students’ comments and questions and by observing the ways they use resources and approach challenges. More formally, they devise activities, quizzes, tests, and the like to find out how and what students have learned. All of these assessment activities require that teachers have, in addition to a deep understanding of the content domain, a foundation of basic knowledge about how to develop tasks that are valid and useful in the classroom, the ways in which student learning develops, and principles of educational measurement.
Teachers need deep understanding of the subject matter they are teaching if they are to develop and use assessment effectively. There is considerable evidence that existing knowledge and beliefs play an important role in how teachers learn to teach, how they teach, and how they think about teaching (Cohen and Ball, 1990; Prawat, 1992; Putnam and Borko, 1997). For example, teachers must understand their discipline deeply to develop assessment opportunities that promote learning and to avoid assessment that encourages rote learning. They need to understand how learning in the subject area develops over time, so that they can assess initial understandings before moving to more complex ideas. Perhaps most important, teachers whose knowledge is incomplete or inaccurate may, through assessment, reinforce incorrect conceptions held by their students.
It is probably unrealistic to assume that in-service professional development opportunities, which schools and districts use for many different purposes, will provide teachers with the skills they need to use and understand assessment effectively. Assessment competence, like competence in any discipline, requires sustained effort and focused instruction accompanied by practice and feedback. The committee therefore calls on colleges and universities that prepare teachers to include in their teacher education curricula courses on educational measurement, both general and specific to science. Because the course requirements for teacher preparation programs are largely set by state licensure requirements, the most effective way to encourage these programs to include educational measurement courses is for states to include in their standards for certification and recertification a provision that teachers demonstrate assessment competence as a condition for licensure.
Much work is needed to make this a reality. Stiggins (1999) found that only 25 states require assessment competence as a criterion for licensure, and Trevisan (2002) found that only 18 states had any requirements related to assessment literacy for school administrators. Trevisan points out that, in 1990, the American Federation of Teachers, the National Council on Measurement in Education, and the National Education Association issued Standards for Teacher Competence in Educational Assessment of Students. Box 6-2 contains the seven standards these organizations developed for teacher assessment literacy.
He calls on states to consider some of these national standards in revising their own licensure requirements. He highlights the work of the state of Washington, which requires all teachers in the state to meet national standards in each field; specifically, teachers are required to meet the requirements of the Interstate New Teacher Assessment and Support Consortium (INTASC), which include indicators for assessment literacy.
Education administrators at all levels of the system require assessment competence for (1) assisting teachers in creating and using assessment effectively; (2) providing leadership in the creation and implementation of building- or district-level assessment policies; (3) using assessment results in their capacity as administrators in making decisions about students, teachers, and instruction; and (4) reporting on assessment results to a variety of stakeholders and constituencies. Box 6-3 includes standards for assessment competence for education administrators that were developed through the collaborative efforts of a number of organizations representing school administrators and the educational measurement community.4
4 In 1990, the American Federation of Teachers, the National Council on Measurement in Education, and the National Education Association published the Standards for Teacher Competence in Educational Assessment of Students. The joint committee recommended those standards as a framework for preservice and in-service training for teachers. The committee also recommended that standards be developed for other categories of educational professionals. The administrator standards synthesized in Box 6-3 are intended to complement the Standards for Teacher Competence.

BOX 6-2
Standards for Teacher Assessment Competence
Teachers should be skilled in:
• choosing assessment methods appropriate for instructional decisions.
• developing assessment methods appropriate for instructional decisions.
• administering, scoring, and interpreting the results of both externally produced and teacher-produced assessment methods.
• using assessment results when making decisions about individual students, planning teaching, developing curriculum, and school improvement.
• developing valid pupil grading procedures, which use pupil assessments.
• communicating assessment results to students, parents, other lay audiences, and other educators.
• recognizing unethical, illegal, and otherwise inappropriate assessment methods and uses of assessment information.
SOURCE: American Federation of Teachers, National Council on Measurement in Education, National Education Association. Available at http://www.lib.muohio.edu/edpsych/stevens_stand.pdf.

While teachers and students can use assessment results directly to improve learning, policy makers and the public use assessment results to allocate resources, set education policy, and advocate for change. All these groups need a better understanding of what assessment results can and cannot tell them about education and student achievement. Several large policy organizations (for example, the National Conference of State Legislatures, the National Association of Secondary School Principals, and the Southern Regional Education Board) have published reports to help their members better understand the uses of assessment results. Similarly, Boston, Rudner, Walker, and Crouch (2003) developed a guide to help education journalists use and report assessment data accurately. The committee urges all who are responsible for using or reporting assessment results to become as informed as possible.

BOX 6-3
Synthesis of Competency Standards in Student Assessment for Education Administrators
Competencies associated with assisting teachers:
• Have a working level of competence in the Standards for Teacher Competence in Educational Assessment of Students.
• Know the appropriate and useful mechanics of constructing various assessments.
Competencies associated with providing leadership in developing and implementing assessment policies:
• Understand and be able to apply basic measurement principles to assessments conducted in school settings.
• Understand the purposes (e.g., description, diagnosis, placement) of different kinds of assessment (e.g., achievement, aptitude, attitude) and the appropriate assessment strategies to obtain the assessment data needed for the intended purpose.
• Understand the need for clear and consistent building- and district-level policies on student assessment.
Competencies needed in using assessments in making decisions and in communicating assessment results:
• Understand and express technical assessment concepts and terminology to others in nontechnical but correct ways.
• Understand and follow ethical and technical guidelines for assessment.
• Reconcile conflicting assessment results appropriately.
• Recognize the importance, appropriateness, and complexity of interpreting assessment results in light of students’ linguistic and cultural backgrounds and other out-of-school factors, and in light of making accommodations for individual differences, including disabilities, to help ensure the validity of assessment results for all students.
• Ensure that assessment and information technology are employed appropriately to conduct student assessment.
• Use available technology appropriately to integrate assessment results and other student data to facilitate students’ learning, instruction, and performance.
• Judge the quality of an assessment strategy or program used for decision making within their jurisdiction.
SOURCE: American Association of School Administrators, National Association of Elementary School Principals, National Association of Secondary School Principals, National Council on Measurement in Education. http://www.unl.edu/buros/bimm/html/article4.html.

INCORPORATING TECHNOLOGY INTO THE SYSTEM5

Technology holds great potential to help push large-scale testing beyond the paper-and-pencil format, to find ways to measure more kinds of performances, and to transform the way assessments are designed, developed, administered, and scored (Bejar, 1996; Bennett, 1998, 2002; Mislevy, Steinberg, and Almond, 2002). However, that promise has not yet been fully realized in most state testing programs. Multimedia environments offer opportunities to present students with complex, lifelike situations in which they can pursue sustained investigations, visualize abstract concepts, or work with large, complex data sets. Yet technology-based assessment is generally used only in technology-based learning environments that have a significant technological infrastructure in place, so the application of such assessment approaches has been limited (Quellmalz and Haertel, 2004). More research in this area is critical if technology is to be incorporated into state assessment programs more broadly.
Several groups of researchers are beginning to make progress in this area. For example, Mislevy, Steinberg, Breyer, and Almond (2002) are developing a technology-supported assessment design system through the Principled Assessment Designs in Inquiry (PADI) project (Mislevy et al., 2003). PADI is a system for developing reusable assessment task templates, organized around schemas of inquiry that are based on research from cognitive psychology and science education. The completed system is to have multiple components, including generally stated rubrics for evaluating evidence of inquiry skills, an organized set of assessment development resources, and a collection of schemas, exemplar templates, and assessment tasks.
Currently, however, in our review of the use of technology in assessment, we found that most states are using technology primarily for the following purposes: administering assessments; organizing, managing, and analyzing student assessment data; making items, performance tasks, rubrics, and complete tests available to teachers; and scoring and reporting assessment data to various stakeholder groups. Although we found examples of schools and districts that incorporate technology into their instructional and formative assessment activities at a local level, such use has not, for the most part, spread to state assessment programs.
5 For a more in-depth discussion of some of these issues, the committee refers the reader to a paper prepared for the committee by Edys Quellmalz and Geneva Haertel of SRI International. The paper, “Use of Technology-Supported Tools for Large-Scale Science Assessment: Implications for Assessment Practice and Policy at the State Level” (2004), covers a range of topics related to technology and science assessment.

Online Administration

The Education Week report Technology Counts (2003) describes how 12 states and the District of Columbia are administering computer-based assessments to students. The report’s authors expect that more states will follow suit as testing requirements increase and budgets tighten. It is noteworthy that only four of those states were conducting science assessment online, and only one was including open-ended questions. It is also interesting to note, however, that in 2004 Maine made its innovative multiformat science assessment available on laptop computers.
Economics seems to be a primary motivator for the increase in computer-administered assessment. Neuberger (2004) reported that Oregon recovered the cost of developing an online version of its state test within one year. Quellmalz and Haertel (2004) report that vendors estimate that computer-administered tests save half to three-quarters of the administrative costs of paper-and-pencil versions. An added advantage of computer-administered assessments is the potential for immediate feedback, which students and teachers can use more effectively than results from external assessments that must be sent away for scoring. However, until computer-delivered large-scale assessment includes opportunities to measure complex thinking and conceptual understanding, its usefulness as a feedback mechanism will be limited.

Scoring

Technology supporting the scoring of responses has been evolving rapidly and has been greatly improved by advances in semantic analysis and computer-based scoring of written text. While a number of commercial products are available to support automatic essay scoring, methods for scoring shorter constructed responses are still being refined.
One strategy that has been shown to have a positive effect on both teachers’ assessment competence and the quality of their teaching is to involve them in scoring or evaluating student test responses. The costs associated with this activity, such as meeting costs and the costs of transporting tests and teachers, have limited its use. However, technology can reduce the costs associated with scoring open-ended items (Odendahl, 1999; Whalen and Bejar, 1998). Commercial testing companies have developed computer support for live scoring. The supports range from online training and calibration checks to fully online systems in which raters can hold live conversations, see the same examples, interact with other meeting participants, share comments on the student work samples and rubrics, and amend the scoring rubrics as a group if necessary.

Managing the Data

Most states are already harnessing the power of technology to manage assessment data and to link it with other student information. For example, by providing every student in the state with a unique identification number (as many states are now doing), states can use data analysis programs to view assessment data in multiple ways. Such programs allow educators not only to look at overall achievement and the accomplishments of individual students, but also to disaggregate the information by teacher, by race, by poverty status, by disability status, and by English-learner status. The performance of students who have participated in particular instructional programs can be captured, and results can be linked to such factors as length of time in the school or course-taking patterns. Technology makes these types of analyses easier and more readily available than in the past.
Technology provides an efficient means of storing, managing, and reporting results from multiple assessment opportunities so that they can be retrieved, combined, and reported cost-effectively. It also makes possible the creation of databases of student work that teachers, students, and parents can use as a guide to expectations for student achievement. These examples of student work, if linked to specific performance levels as described by the state academic achievement standards, could facilitate students’ involvement in their own assessment by allowing them to compare their performance with acceptable performance, an important aspect of learning with understanding.

Support for Assessment Development

Many states and school districts have created item banks linked to state standards and made them accessible to teachers and others for use in classroom or district assessment activities.
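As a rough illustration of what an item bank “linked to state standards” can mean in practice, the Python sketch below tags each item with the standard codes it is keyed to, selects the items aligned with a chosen set of standards, and flags target standards that have no items at all. The Item record, the standard codes (such as “PS.1”), and both functions are hypothetical and are not drawn from any particular state’s or vendor’s item bank.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One entry in a hypothetical item bank (fields are illustrative assumptions)."""
    item_id: str
    standards: frozenset[str]  # hypothetical state standard codes the item is keyed to
    item_type: str             # e.g., "multiple-choice" or "constructed-response"


def aligned_items(bank: list[Item], targets: set[str]) -> list[Item]:
    """Items tagged with at least one standard, all of which are in the target set."""
    return [item for item in bank if item.standards and item.standards <= targets]


def uncovered_standards(bank: list[Item], targets: set[str]) -> list[str]:
    """Target standards that no item in the bank is keyed to, in sorted order."""
    covered: set[str] = set().union(*(item.standards for item in bank))
    return sorted(targets - covered)


# Hypothetical bank and target standards, for illustration only.
bank = [
    Item("SCI-001", frozenset({"PS.1"}), "multiple-choice"),
    Item("SCI-002", frozenset({"PS.1", "LS.2"}), "constructed-response"),
    Item("SCI-003", frozenset({"ES.4"}), "multiple-choice"),
]
targets = {"PS.1", "LS.2", "LS.3"}

print([item.item_id for item in aligned_items(bank, targets)])
print(uncovered_standards(bank, targets))
```

Here SCI-001 and SCI-002 fall within the target standards, and LS.3 is flagged as having no items. A tag-based check like this confirms only that items are labeled with the intended standards; whether an item actually measures the knowledge and cognitive demand a standard describes still requires careful expert review of alignment.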
The American Association for the Advancement of Science is developing a bank of science items linked to the Benchmarks for Science Literacy and the maps contained in the Atlas of Science Literacy. It also intends to make these items available to states and researchers through an online delivery system linked to the maps. Item banks are useful tools for teachers and others, but care must be taken to ensure that items drawn from the banks are aligned with state standards and goals. The wide variation among states’ standards complicates the sharing of item banks (see Quellmalz and Moody, 2004, for a discussion of some issues involved in operating an item bank).
In sum, the committee found that technology holds great promise for improving science assessment, but further development of its applications to assessment will be required before that potential can be realized. We urge all concerned to continue to pursue promising strategies.

QUESTIONS FOR STATES

Implementation

Designing an assessment system is an iterative process that cannot be accomplished in one fell swoop. States must build their science assessment systems carefully and deliberately over time, keeping in mind issues of validity and coherence and recognizing that adding new components or eliminating others can create changes in the system that need to be addressed. The committee proposes that in implementing a system, states ask themselves the following questions:
Question 6-1: Has the state brought together important stakeholders and required experts to develop or revise its science assessment system so that it reflects a shared vision of science education?
Question 6-2: Does the state have a written master plan for its science assessment system that specifies which types of assessments are to be used for which purposes; how frequently the different assessments will be administered; who will develop them; who will administer them; at what level of the education system they will be administered; and how the results will be scored, reconciled, and reported?
Question 6-3: Has the state developed both long- and short-term strategies for ensuring that resources are available for assessment development and revision? As part of this process, has consideration been given to such strategies as doing a little bit each year, purchasing curriculum materials that include quality assessments, collaborating with other states that have similar standards to develop assessments or item banks, or developing an assessment system that uses existing personnel and assessment opportunities to assess aspects of science learning that might otherwise be too expensive to assess?
Question 6-4: Is the state’s assessment system plan closely aligned with the complete array of its science standards, reflecting the breadth and depth of the science content knowledge, scientific skills and understandings, and cognitive demands that are articulated in the standards? Question 6-5: Does the state have, and use the support of, both technical and content-specific advisory committees to provide advice and guidance on the design, implementation, and ongoing monitoring and evaluation of the assessment system? Do these advisory committees make recommendations to improve particular aspects of the assessment system, and does the state have in place a plan for considering and responding to their suggestions?

Reporting Assessment Results

Question 6-6: Has consideration been given in designing the assessment system to the nature of the score reports and to the intended inferences that the assessment information will be used to support?
Question 6-7: Have the state and its contractors developed strategies to ensure that reports of assessment results are accessible, relevant, and meaningful to the targeted audiences and that they are provided in a timely manner?
Question 6-8: Do assessment reports include information on the precision of scores and on the accuracy with which the scores can be used to classify students by performance levels? Do they include information about and examples of the appropriate and inappropriate use of the scores and about the kinds of inferences that can and cannot be supported by the results?

Professional Development

Question 6-9: Do the state’s teachers, school administrators, and policy makers have ongoing opportunities to build their understanding of current assessment practices and expand their skills in using and interpreting assessment results?
Question 6-10: Do school, district, and state education administrative personnel possess sufficient assessment competence to use assessment information accurately and to communicate it effectively to interested stakeholders?
Question 6-11: Do school, district, and state education administrative personnel have sufficient resources to collect, store, manage, and analyze the data collected through the assessment system?
Question 6-12: Do the state, school districts, and schools include science educators in every step of the assessment process (from the design of the assessments to data collection to the use and interpretation of the results), thereby providing ongoing opportunities for individuals at each of these levels to build their understanding of current assessment practices and expand their skills in using and interpreting assessment results?
Question 6-13: Do the state’s teacher licensing regulations for certification and recertification require that all candidates demonstrate assessment competence at a level commensurate with their area of certification? Question 6-14: Does the state require as part of its certification and recertification standards that all teachers of science possess knowledge of the subjects they teach as well as the knowledge necessary to teach science well?