Environment for Embedding: Technical Issues
Pages 14-39


From page 14...
... For present purposes, a key element of this process is sampling. A national test represents a sample of possible tasks or questions drawn from a subject area, and the material to be embedded represents a sample of the national test.
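The two-stage sampling described above can be sketched in a few lines. This is a hypothetical illustration only; the pool size, test length, and item names are invented, not drawn from any actual assessment.

```python
import random

# Hypothetical illustration of the two-stage sampling described above: a
# national test samples items from a subject-area pool, and the embedded
# material samples items from that national test. All names and sizes here
# are invented.
def two_stage_sample(domain_pool, test_size, embed_size, seed=0):
    rng = random.Random(seed)
    national_test = rng.sample(domain_pool, test_size)      # domain -> test
    embedded_items = rng.sample(national_test, embed_size)  # test -> embedded block
    return national_test, embedded_items

pool = [f"item_{i}" for i in range(500)]  # stand-in subject-area item pool
test, embedded = two_stage_sample(pool, test_size=60, embed_size=12)
```

Because each stage samples without replacement from the previous one, the embedded block is, by construction, a subset of the national test, which is in turn a subset of the domain pool.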
From page 15...
... Commercial achievement test batteries cover equally broad content, as do state assessments. Because the time available to assess students is limited, wide-ranging tests can include only small samples of the full range of possibilities.
From page 16...
... 16 EMBEDDING QUESTIONS

Domain Definition: 8th-grade mathematics. Framework Definition: NAEP 8th-grade mathematics framework: specified content areas, skills, etc. Test Specification: specific mix of content areas and item formats; rules for scoring, etc.
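The domain → framework → test specification hierarchy shown above can be rendered as a simple data structure. The field names and proportions below are invented placeholders for illustration, not the actual NAEP specification.

```python
# Schematic test specification derived from a domain and a framework.
# All percentages and field names are invented placeholders.
test_specification = {
    "domain": "8th-grade mathematics",
    "framework": "NAEP 8th-grade mathematics framework",
    "content_mix": {          # fraction of items drawn from each content area
        "number_sense": 0.25,
        "measurement": 0.15,
        "geometry": 0.20,
        "data_analysis": 0.15,
        "algebra": 0.25,
    },
    "item_formats": {"multiple_choice": 0.6, "constructed_response": 0.4},
    "scoring": "rubric-based scoring for constructed-response items",
}
total_mix = sum(test_specification["content_mix"].values())  # should be 1.0
```

A well-formed specification allocates the whole test: both the content mix and the item-format mix should each sum to 1.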
From page 17...
... difficulty, and the relationship of the new items to the accompanying items.

COMMON MEASURES FROM A COMMON TEST

To clarify the distinction between common tests and common measures, and to establish a standard of comparison for embedding, we begin our discussion of methods for obtaining individual scores on a common measure with an approach that entails neither linking nor embedding,
From page 18...
... If test administration is not consistent from one location to another, for example, across states, even the use of a full common test may not guarantee a common measure. Moreover, when the national measure provides norms based on a standardization sample, the administration of the test must conform to the administration procedures used in the standardization.
From page 19...
... The committee noted that the fairness or reasonableness of a comparison hinges on the particular inferences the test scores are used to support. For example, suppose that two states agree to administer an identical mathematics test in the spring of the 8th grade.
From page 20...
... For example, instructions to the examiners, the amount of time allowed, the use of manipulatives or testing aids, and the mechanics of marking answers should be the same for all students. However, because of the expense involved in hiring external test administrators, most state tests are administered by the regular school staff: teachers, counselors, etc.
From page 21...
... If different states provide different directions for the national test, different opportunities to use calculators or manipulatives (see Figure 2-2), impose different time limits for students, or break the test into a different number of testing sessions, seemingly comparable scores from different states may imply different levels of actual proficiency.
From page 22...
... shorter testing periods with additional breaks, and use of a scribe for recording answers; see Table 2-1 for a list of accommodations that are used in state testing programs. Two recent papers prepared by the American Institutes for Research (1998a, 1998b)
From page 23...
... TABLE 2-1 Accommodations Used by States

Presentation format accommodations: oral reading of questions; Braille editions; use of magnifying equipment; large-print editions; oral reading of directions; signing of directions; audiotaped directions; repeating of directions; interpretation of directions; visual field template; short segment testing booklet; other presentation format accommodations.

Response format accommodations: mark response in booklet; use of template for recording answers; point to response; sign language; use of typewriter or computer; use of Braille writer; use of scribe; answers recorded on audiotape; other response format accommodations.

Test setting accommodations: alone, in study carrel; individual administration; with small groups; at home, with appropriate supervision; in special education class; separate room; other test setting accommodations.

Timing or scheduling accommodations: extra testing time (same day); more breaks; extending sessions over multiple days; altered time of day; other timing-scheduling accommodations.

Other accommodations: out-of-level testing; use of word lists or dictionaries; use of spell checkers; other.

(The number-of-states column did not survive extraction.) SOURCE: Adapted from Roeber et al.
From page 24...
... This table shows ...

2. The most recent national standardizations of the Stanford Achievement Test (SAT), the Comprehensive Tests of Basic Skills (CTBS)
From page 25...
... NAEP, TIMSS, and the proposed VNT are examples of such tests. If one of these tests is selected to serve as a freestanding test or as the source for the embedded items, the state tests would have to be administered during the same testing period as the national assessment.
From page 26...
... TABLE 2-2 State Rankings from the 1998 NAEP 4th-Grade Reading Assessment (states in rank order; scale scores not recovered): Connecticut, Montana, New Hampshire, Maine, Massachusetts, Wisconsin, Iowa, Colorado, Kansas, Minnesota, Oklahoma, Wyoming, Kentucky, Rhode Island, Virginia, Michigan, North Carolina, Texas, Washington, Missouri, New York, West Virginia, Maryland, Utah, Oregon, Delaware, Tennessee, Alabama, Georgia, South Carolina, Arkansas, Nevada, Arizona, Florida, New Mexico.
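A ranking like the one in Table 2-2 is simply an ordering of states by average scale score. A minimal sketch of that operation, using invented scores for three hypothetical states rather than the actual NAEP results:

```python
# Rank states by average scale score, highest first.
# The scores below are invented for illustration.
scores = {"State A": 232, "State B": 227, "State C": 230}
ranking = sorted(scores, key=scores.get, reverse=True)
```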
From page 27...
... Additionally, state practices and laws related to test security and the release of test items contained in state tests vary a great deal (see Figure 2-4). Some states release 100 percent of their tests' content every year, others release smaller percentages, and others none at all.
From page 28...
... If a state releases the items contained in a test, then the items must be changed every year so that breaches in test security and differential examinee exposure to the national test items do not differentially affect student performance.
From page 29...
... In other words, teachers might teach to the test in inappropriate ways that inflate test scores, thus undermining the common measure (see, e.g., Koretz et al., 1996a; Koretz et al., 1996b)
From page 30...
... ABRIDGMENT OF TEST CONTENT FOR EMBEDDING

In the previous section we outlined a variety of conditions that must be met to obtain a common measure and described how the policies and practices of state testing programs make such conditions difficult to achieve, even when embedding is not involved. Embedding, however, often makes it more difficult to meet these conditions and raises a number of additional issues as well.
From page 31...
... Between 1955 and 1977 the mathematics section of the ITBS consisted of math concepts and math problem-solving tests but did not include a separate math computation test. In 1978 a math computation test was added to the test battery, but the results from this test were not included in the total math score reported in the annual trend data.
From page 32...
... SOURCE: Adapted from Iowa Testing Programs (1999).
From page 33...
... In the 1997-1998 school year, 41 states tested students in 4th-grade reading, 8th-grade mathematics, or both: 27 states assessed students in reading in 4th grade, and 39 states assessed students in mathematics in 8th grade.4 Only 25 states tested both 4th-grade reading and 8th-grade mathematics, leaving a significant number of states without tests into which items for those subjects could be embedded (see Table 2-3). It could be possible for states that do not administer reading or mathematics tests in grades 4 and 8, respectively, to embed reading or mathematics items in tests of other subjects, but context effects (see below)
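The state counts quoted above fit together by inclusion-exclusion: the number of states testing either subject equals the reading count plus the mathematics count minus the overlap.

```python
# Inclusion-exclusion on the state counts from the text:
# |reading or math| = |reading| + |math| - |both|
reading_4th, math_8th, both = 27, 39, 25   # figures quoted in the text
either = reading_4th + math_8th - both     # states testing one subject or both
without_either = 50 - either               # states with neither test to embed into
```

The arithmetic reproduces the 41 states cited in the text and implies 9 states with no 4th-grade reading or 8th-grade mathematics test at all.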
From page 34...
... With embedding, it is possible that the changes in the context in which the national items are administered will affect student performance. Such context effects can lead to score inaccuracies and misinterpretations.
From page 35...
... ;
· display of answer choices in a vertical string versus a horizontal string;
· convention for ordering numerical answer choices for mathematics items (from smallest to largest or randomly) or ordering of punctuation marks as answer choices for language mechanics items;
· characteristics of manipulatives used with mathematics items (e.g., rulers, protractors)
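One of the conventions in the list above, ordering numerical answer choices from smallest to largest rather than randomly, amounts to a simple sort. The values below are invented distractors, not items from any actual test.

```python
# Apply the "smallest to largest" answer-choice convention to a set of
# hypothetical numerical choices.
choices = [0.5, 12, 3, 48]   # invented key and distractors
ordered = sorted(choices)    # smallest-to-largest presentation order
```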
From page 36...
... Nonetheless, even with careful attempts to follow these suggested test construction procedures, there can be no assurance that context effects have been completely avoided.

SPECIAL ISSUES PERTAINING TO NAEP AND TIMSS

Some embedding plans have the goal of reporting state or district achievement results in terms of the proficiency scales used by the National Assessment of Educational Progress (NAEP)
From page 37...
... The scale supports such statements as, "The average math proficiency of 8th graders has increased since the previous assessment," and "35 percent of state A's students are achieving above the national average." To facilitate interpreting the results in terms of standards of proficiency, panels of experts assembled by the National Assessment Governing Board
From page 38...
... The content, difficulty, and number of items vary across the booklets, and no single booklet is representative of the content domain. This approach to the distribution of items to test takers, called matrix sampling, allows coverage of a broad range of content without imposing a heavy testing burden on individual students.
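Matrix sampling as described above can be sketched in a few lines: split the item pool into blocks, give each booklet only a few blocks, and spiral the booklets across students so the whole pool is covered without any one student seeing every item. The block and booklet sizes below are invented, not the actual NAEP design.

```python
import itertools

# Toy matrix-sampling sketch: partition the pool into blocks, then form a
# booklet from every pair of blocks so each block appears in more than one
# booklet. Sizes are invented for illustration.
def make_booklets(items, block_size, blocks_per_booklet):
    blocks = [items[i:i + block_size] for i in range(0, len(items), block_size)]
    booklets = [sum(combo, [])  # concatenate the chosen blocks into one booklet
                for combo in itertools.combinations(blocks, blocks_per_booklet)]
    return blocks, booklets

item_pool = [f"item_{i}" for i in range(30)]
blocks, booklets = make_booklets(item_pool, block_size=10, blocks_per_booklet=2)
# 3 blocks of 10 items -> C(3, 2) = 3 booklets of 20 items each;
# together the booklets cover all 30 items, but no booklet holds them all.
```

This is the core trade-off the text describes: each student answers only 20 of the 30 items, yet across booklets the full domain is sampled.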
From page 39...
... The VNT is being planned as a conventional test that will yield individual student scores on a scale as similar as possible to the NAEP scale. The VNT is intended to provide a common metric for reporting achievement results for all test takers.
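One generic way to place scores from one test onto a scale "as similar as possible" to another test's scale is mean-sigma linear linking: rescale the scores so their mean and standard deviation match the reference scale. The sketch below is a standard illustration of that technique with invented numbers; it is not the actual VNT-NAEP linking procedure, which the report has not specified here.

```python
import statistics

# Mean-sigma linear linking: map scores x onto a reference scale by
# matching the mean and standard deviation. Numbers are invented.
def mean_sigma_link(x, ref_mean, ref_sd):
    m, s = statistics.mean(x), statistics.pstdev(x)
    return [ref_mean + ref_sd * (xi - m) / s for xi in x]

raw = [12, 15, 18, 21, 24]                              # hypothetical raw scores
linked = mean_sigma_link(raw, ref_mean=250, ref_sd=35)  # NAEP-like scale
```

After linking, the transformed scores have the reference mean and standard deviation, so statements framed on the reference scale (averages, percents above a cut) can be applied to them; whether such statements are *valid* depends on the comparability conditions discussed throughout this chapter, not on the arithmetic.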
