Suggested Citation: "10 Putting Surveys, Studies, and Datasets Together: Linking NCES Surveys to One Another and to Datasets from Other Sources." National Research Council. 2000. Grading the Nation's Report Card: Research from the Evaluation of NAEP. Washington, DC: The National Academies Press. doi: 10.17226/9751.

10 Putting Surveys, Studies, and Datasets Together: Linking NCES Surveys to One Another and to Datasets from Other Sources

George Terhanian and Robert Boruch

"Relations stop nowhere.... The exquisite problem is eternally but to draw the circle within which they happily appear to do so."
Henry James, Roderick Hudson, 1876

This paper examines ideas about combining different datasets so as to inform science and society. It was prepared at the invitation of the National Research Council's (NRC) Board on Testing and Assessment so as to inform the board's deliberations about policy on education surveys in the United States. The surveys of paramount interest are those sponsored by the National Center for Education Statistics (NCES).

The research reviewed here and the implications that are educed from it are directed first to the NRC. They are dedicated in the second place to the interests of the NCES. The third target is the social sciences community more generally.

Examples given here are drawn from a variety of sciences inasmuch as data linkage issues transcend academic disciplines. They are taken from different institutional jurisdictions because the issues cross geopolitical boundaries.

Two studies are used to provoke discussion and to frame some issues: Hilton's (1992) Using National Data-bases in Educational Research and Hedges and Nowell's (1995) paper on national surveys of the mathematics and science abilities of boys and girls. We also depend heavily on other materials generated by NCES, the NRC, and others. This includes work, for example, on teacher supply, demand, and quality (National Research Council, 1992) and on integrating federal statistics on children (National Research Council, 1995). The minutes of the NCES Advisory Council on Education Statistics reflect periodic interest in the way NCES surveys can be linked to one another or to data generated by other federal agencies (Griffith, 1992), and we exploit these also.

In what follows we begin with the two illustrations that help frame discussion. The pedigree of linkage is considered briefly, and the ubiquity of linkages in contemporary surveys is then discussed. Inasmuch as words such as linkage, merging, and so on are used differently in the research literature, the next section covers ways to clarify the language. Distinctions are further drawn between statistical policy for making surveys connectable and de facto policy in which post facto connections are difficult. Evaluating the products of any variety of linkages is important, and this topic is covered also, based on suggestions about mapping and registering linkage studies. In the next to last section of the paper we suggest exploring some new kinds of linkage. The paper concludes with a summary of the implications of this work.

TWO INTRODUCTORY ILLUSTRATIONS

The origin of Hilton's (1992) book was in a project undertaken by the Educational Testing Service (ETS) to understand whether different sources of statistical information, each based perhaps on a national sample, could be combined to produce a "comprehensive unified database" of science indicators for the United States. Sponsored by the National Science Foundation, the project's general goal was to improve the way we capitalize on data that bear on educating scientists, mathematicians, and engineers. The book's implications, inadvertent and otherwise, are important for designing NCES surveys, among others. Twenty-four education databases were reviewed by the project, including the Survey of Doctoral Recipients, national teacher examinations, and at least four massive longitudinal databases.
Only 8 of the 24 were deemed worthy of deeper examination. That is, the eight could be "linked" in some sense with others, given the resources available. They included the National Longitudinal Study of the Class of 1972 (NLS:72) and the National Education Longitudinal Study of 1988 (NELS:88), the Equality of Opportunity Surveys (1960s), cross-sectional systems such as the Scholastic Assessment Test (SAT), and the NCES National Assessment of Educational Progress (NAEP).

As Hilton made plain in the preface to his book, the project was "not feasible." Put more bluntly, the ETS effort to combine datasets was a flop despite competent and thoughtful efforts. The databases chosen for examination could not be used for the purpose considered (i.e., to produce a comprehensive science database). It was, nonetheless, a project noble in aspiration and diligent in its execution.

The questions posed in the Hilton project about the available databases, which are relevant to linking any datasets, seem important for designing new NCES surveys. Put in modified terms, the questions are as follows:

• What variables are common to various databases?
• What ways of measuring each variable, ways of sampling, and administration are common, making comparison (or linkage) among datasets easy?
• What differences in ways of measuring, administering, and sampling make comparison (or linkage) dubious or difficult?
• What can be done to fix different datasets so they are "comparable" (or linkable) in some way and therefore make it sensible to put them together?

The Hilton book contained no detailed catalog of why the databases failed to meet one or more of the criteria implied by these questions.

Hedges and Nowell (1995) attacked a different but related topic: understanding gender differences in mental abilities of various kinds based on disparate surveys. These authors chose to depend only on studies based on samples of roughly the same target populations and that purportedly measured the same abilities (e.g., reading). That is, they selected only studies that approached the first three questions above in similar ways. Their final selections included NCES-sponsored work, notably NELS:88, NLS:72, High School and Beyond (HS&B), NAEP (trend data only), Project Talent, and the National Longitudinal Survey of Youth sponsored by the U.S. Department of Labor, among others. These are summarized in Table 10-1. We rely periodically on its contents in what follows.

There was sufficient commonality in what was measured on whom in the target populations in the Hedges-Nowell (1995) ambit to produce an informative analysis. It is a fine illustration of combining different datasets so as to learn whether males and females really differ on mental abilities and how they might differ.
For instance, the authors' dependence on well-defined national probability samples avoided the inferential problems encountered in earlier studies, notably depending on self-selected samples (as in SAT/ACT testing), idiosyncratic samples (e.g., in test norming), and distributional assumptions (to get at characteristics of extreme scores). A main product of Hedges and Nowell's work is learning that males are more variable than females in their tested intellectual achievement. This finding helps to elevate substantially the scientific conversation about the purported differences in the mean levels of math and science abilities of boys and girls. It helps to show how more variability among boys may produce specious claims about their ability relative to girls.

[Table 10-1, which summarizes the surveys analyzed by Hedges and Nowell (1995), appears at this point in the printed chapter; its contents are not legible in this text.]

THE PEDIGREE OF EFFORTS TO PUT DIFFERENT DATASETS TOGETHER

The idea underlying any linkage effort undertaken by NCES or by others is that combining data from different sources can help us learn something new. More to the point, the combination permits us to learn something that cannot be learned from individual sources. The idea has fine origins. Alexander Graham Bell, for instance, exploited the notion in his study of genetic transmission of deafness. In the late 1880s he depended on completed Census Bureau interview forms found strewn in a government building basement and linked these to genealogical records from other sources (Bruce, 1973). One can also trace the theme to John Graunt's effort in the seventeenth century to learn how to use records in the Crown's interest. Graunt exhorted the King to understand his empire through a lens consisting of compilations of records in statistical form: the counts of soldiers at arms, for instance, from one source and the numbers of births, deaths, and so on from other sources. Scheuren (1995), similarly thoughtful and exhortative, has reviewed and refreshed our thinking about how to augment administrative records and understand them better through surveys.

The pedigree of linkage studies is also reflected in contemporary efforts to evaluate social programs. In studies of manpower training and employment, for example, it has become common to link the employment records on specified individuals to their program records and to link these data in turn to research records on individuals (Rosen, 1974). In agriculture, health, and taxation, there have been fine studies of why and how one ought to couple data from different sources in a variety of ways (Kilss and Alvey, 1985).

From papers by Scheuren (1995) and others we learn about the contemporary history of record linkage algorithms (developed by Tepping and by Fellegi and Sunter, among others), the construction of matching rules and the information exploited in matches, the idea of linkage documentation, and various approaches to adjusting for mismatches. We can learn about the role of privacy issues and statistical analysis implications from a related body of work (e.g., Cox and Boruch, 1988).
We learn about appraising the benefits and costs of linkage of administrative records, or the difficulty of doing so on account of sloppy practice, from aggressive investigatory agencies such as the U.S. General Accounting Office (1986a, 1986b).

The title of Hilton's book, Using National Data-bases in Educational Research, may suggest to some readers that they can learn something about whether, why, and how massive studies are combined and used. In fact, recent work on how to enhance the usefulness of statistical data is pertinent. Some of it has been economically oriented: for instance, Spencer's work (1980) on cost-benefit analysis to allocate resources to various data collection efforts and the follow-up papers by Moses, Spencer, and others. Scholarly papers on why and how social research data, including educational and health research data, are used are also relevant. Kruskal's volume (1982) is a gem on this account.

The analyses in Hilton's book were not burdened by the history of linkage. That is, the authors failed to put the ETS linkage studies into the larger context of such studies or the still larger context of design and exploitation of databases and surveys. We learn about attempts to link the Armed Services Vocational Aptitude Battery to tests given in the longitudinal HS&B survey and to SATs, but we are not told about how this would enhance science indicators or inform decisions or, more importantly, improve the design of surveys. Similarly, the Hedges and Nowell (1995) paper does not consider implications of the work for the design of better surveys that can be linked in any respect, despite the fact that the authors are sensitive to the implications of their work on other accounts.

THE UBIQUITY OF PUTTING DIFFERENT DATASETS TOGETHER AND FUNCTIONAL CATEGORIES

Some varieties of linkage are common, even pedestrian. So frequently do they occur that they are taken for granted. Other varieties of linkage are not encountered often. They may be undertaken for reasons that seem obviously important or, to the lay public, obscure or trivial. This section provides illustrations of linkages, pedestrian and otherwise. The examples are put into categories that have meaning for scientists and an informed public: national probability sample surveys, longitudinal studies, studies of the quality of data, intersurvey consistency, and hierarchical studies.

National Probability Sample Surveys

Virtually all national probability sample surveys in this country and elsewhere are exercises in combining information from different systems. Telephone surveys often draw on a population listing of telephone numbers. A population census may draw on an address list for dwellings. The NCES Schools and Staffing Survey, for instance, depends on lists of schools identified as administrative units or locations. List information is used to construct the sample. Listed information is often combined in the same microrecord with the information provided by the respondent.

Longitudinal Studies: Tracking Change

Any longitudinal survey involves linkage at a basic level. Microrecords obtained on individuals or institutions at one point in time are linked to those obtained subsequently, as in NELS:88, NLS:72, and HS&B.
The organization responsible for each wave of the survey may vary, of course, as when NCES used different contractors. Target populations, variables, and their measurement may also differ somewhat between waves.

Studies of the Quality of the Data

Any postenumeration survey of a national census and most post facto studies of the quality of a large survey employ linkage. Microrecords in the main initial survey, for instance, are compared to those generated in a more intensive, smaller, and presumably more accurate study of a subsample of the original target population. Efforts to estimate reliability of achievement tests focus on stability of individual scores over time; individual scores must be linked across time. Finally, many if not most studies of the validity of respondent reports in surveys rely on two or more sources of information on the trait or characteristics of interest. Enrollment records in colleges may be compared to self-reported enrollment information in a sample of students receiving subsidized loans.

In the federal statistics arena, most studies of response quality or measurement error require linkage and are described regularly in the professional literature. Scholarly reports usually appear, for instance, in the annual Proceedings of the Section on Survey Research Methods of the American Statistical Association and in reports issued by the federal agency that sponsored the work. It is disconcerting to see little representation of municipal statisticians in these Proceedings and reports. It is not clear why their contribution is sparse, and the matter deserves a bit of researchers' attention.

Intersurvey Consistency

The NCES has conducted a Private School Survey (PSS) independent of a special supplement to the Schools and Staffing Survey (SASS). SASS has depended on the PSS for a sampling frame of schools, using a basic form of linkage. More generally, both the supplemented SASS and PSS have provided estimates of the numbers of schools, teachers, and students in the private sector. The two surveys are normally run at different times and measure some of the same variables. On at least one occasion both were run in the same year (Holt et al., 1994). The results of the two surveys may or may not agree, differences in time frame being one possible reason for discordance. The occasion of a PSS and a SASS supplement in the same year permitted NCES to investigate the consistency between them.
At times, then, NCES depends on applying algorithms to SASS that reweight subgroups' totals of schools, teachers, and students in various categories so as to produce overall group totals that are consistent with PSS group totals. A "group" here might be a type of private school (e.g., Catholic). "Linkages" here are of two kinds. First, the PSS is used as the sampling frame for SASS. Second, the memberships of schools in subgroups are supposed to be identical in PSS and SASS, and a linkage between the two is required for estimating new sampling weights.

Consider next the problem of assuring that a school's locale is properly identified as a large city or as midsized, as urban or suburban, and so on. Each year NCES attempts to record every school and its locale through the annual Common Core of Data (CCD) survey. Census Bureau data are used in the CCD to identify locales, using seven well-defined locational categories used by the bureau. Every two to five years SASS is run, targeting a sample of schools. In this effort SASS also elicits information on locales using a simplified question involving eight categories or responses. A challenge lies in reconciling the two sources of information about school locale (Johnson, 1993; Bushery et al., 1992).

Reconciliation of the SASS and CCD files then involves linkage. Such studies reveal, for instance, that roughly 70 percent of SASS reports on locales are correct, that 87 percent of Census classifications are accurate, and that the most common discordance lies in the suburban categories. More important, note that both data sources are imperfect in different ways. This makes linkage-based reconciliation studies essential to assuring the quality and interpretability of the survey results.

Reconciliation studies that illuminate the discrepancies that might be found between two or more independent surveys are important. It would be discomfiting to find a 10 percent difference in the number of teachers in the United States based on one NCES survey, for example, in contrast to another NCES survey undertaken independently and within a year or two of the first survey. The differences between results of two independent surveys may be a matter of sampling error. Or they may be substantial and attributable to differences in questionnaire wording, definitions, and sampling frame. Being able to link records so as to understand the discrepancies is essential.

Linkages may be at the entity level, such as a school, school district, or state. Or they may be at the individual level, as when teachers respond to a questionnaire about their career in teaching. Consider the following examples based on Kasprzyk et al. (1994) and Jenkins and Wetzel (1994). Discrepancies between independent surveys of institutions, such as "schools," occur for a variety of reasons. For instance, some commercial firms define schools in terms of their physical locations.
The CCD defines schools in terms of administrative units, two or more of which may be lodged in the same location. These differences are relevant to sampling frames and to results of surveys, of course. Careful analyses are done to assure that discrepancies and their implications are understood.

Furthermore, estimates of the number of teachers in each state may be based on SASS or on state-generated counts for CCD. The estimates may and do differ at times for some states. For instance, overestimates of 15 percent in nine states appeared in the 1990 to 1991 SASS for a variety of reasons. One such reason was the questionnaire wording used in each survey. A respondent in the CCD would report on a unit involving grades kindergarten through 6; the SASS respondent might report on kindergarten through 6 and on grades 7 and 8. Postprocessing edits helped reduce discrepancies.

Hierarchical Studies

Once said, it is obvious that any survey of schools, teachers in schools, and students assigned to particular teachers must involve a basic linkage of microrecords to be useful as a hierarchical study. That is, one must be able to link each child to his or her teacher and each teacher to the school that the teacher serves. Research on the problem of doing such work in the context of SASS has been conducted since at least the early 1990s (King and Kaufman, 1994). Partly because such work often involves ex-ante design, rather than ex-post facto record linkage, difficulties in linkage appear to be ordinary; it is the estimation issues that appear to be difficult.

Of course, many more levels of linkage are possible. The Third International Mathematics and Science Study (TIMSS) is an obvious example. It involves no temporal linkage of the kind that longitudinal studies require. It does involve sampling test items for each child, sampling classrooms in schools, sampling schools in each nation, and a nonprobability sample of nations. Thousands of instances of linkage of diverse kinds are entailed in such a study.

WHAT DOES "LINKAGE" MEAN?

Vernacular in the sciences is not as uniform as one might expect. Recall, for instance, debates over what constitutes a gene or genome in the Human Genome Project. Discussions about integrating or linking data in the social sciences also are affected by dialect differences. We discuss illustrations below and then dimensionalize the idea of linkage. The focus is on units whose records are to be linked, the populations from which units are sampled, the variables that are measured on these units, and other matters. All of what follows depends on learning from others about what linkage has meant in the context of work sponsored by NCES and others.

Vernacular and Definitions in Education Statistics

The Hilton (1992) book's vernacular is sufficiently different from technical parlance in related areas to confuse some readers. For instance, there are repeated references to "linking" and "merging" of different databases, but these terms are undefined.
Further, the book's use of these words is at times not the same as is customary in contemporary statistical work. For instance, linkage is defined, in effect and occasionally, as combining microrecords based on a common identifier for the same person or entity. At times the book's use of the word link is to imply an intention to "put together." At other times the word link means to stratify the units in each database in the same way (e.g., high ability, Hispanic, and so on) in order to look at how frequencies in these strata change over time on a dimension such as persistence in studying science. The word merge is also used to describe putting different records together, records that may or may not have a common source.

The phrase "pooling data" was used by Hilton (1992) and has been used by others in the sense of doing a side-by-side comparison of statistical results from each of several different datasets. This phrase is not used in a way that some readers would expect. For some analysts "pooling data" means combining the data from two or more samples of the same population into one that can be analyzed as a complete sample. For others it means combining the results from samples of different populations.

Finally, consider another more recent example. Bohrnstedt (1997) uses the words link, integrate, and connect in a thoughtful essay entitled "Connecting NAEP Outcomes to a Broader Context of Educational Information." His use of these terms, at first glance, is instructive. The conscientious reader might observe, for example, that Bohrnstedt makes a careful distinction between link and integrate. He refers, for example, to the "integration" of CCD information with NAEP data, and he discusses the possible "linkage" of NELS:88 and NAEP data. The reader who also possesses some knowledge of what these datasets contain might then conclude that two datasets can be linked, at least in the context of education, if both involve the assessment of achievement or performance. This reader would be mistaken, though. As Bohrnstedt concludes, he uses the term link when referring to CCD/NAEP integration: that is, he substitutes link for integrate. The word connect does not reappear in the paper's prose. What are the implications of this example? Especially in creative efforts such as Bohrnstedt's, the precise meanings of such words as link, integrate, and connect ought to be made plain.

Vernacular in Other Sciences

Work on genes and genomes engenders problems of differences in labeling the objects of attention. For instance, a gene for one species may be called something different from the same gene in another species. Given the remarkable growth in genetic research, including the number and size of genome sequence databases, this is not a trivial matter (Williams, 1997).
Similarly, scientists have begun to build a World Wide Web-oriented database on gene mutations as a part of the Human Genome Project effort. A feature of the design problem is to agree on what to call a mutation. "The nomenclature is nearly agreed on . . . (with) the systematic name . . . based on the nucleic acid change and . . . the common name based on the amino acid change" (Cotton et al., 1998:9). The Internet will be used to explicate and debate further.

The vernacular problem is not confined to the life sciences. It extends to mathematics. "Computation," for instance, was heralded in a recent Science piece on bridging databases. In fact, basic statistical analyses, rather than computations, were the main topic: understanding how to estimate relationships when there are many errors attributable to sampling and measurement (Nadis, 1996). The lead on an interesting letter to Science was entitled "Bioinformatics: Mathematical Challenges" (Grace, 1997). Yet the letter concerns what is now regarded as a conventional statistical analysis approach to understanding the structure underlying data (i.e., analysis of variance), developed by two scholars who admired and exploited mathematics, R.A. Fisher and O. Kempthorne.

Science has also carried excellent articles with headings such as "Digital Libraries" (e.g., Nadis, 1996), "Letters" (Cotton et al., 1998), and "Bioinformatics" (Williams, 1997). They all deal with the names of things. But such papers are not easily found in any Web or library-based search based on a single keyword. One of us had to review the articles published over a five-year period to make the connection.

Implication: Understanding and Standardizing Nomenclature

One of the implications of this vernacular problem for NCES is that discussion, analysis, and agreement on terminology are in order. Because there has been little standardization in educational statistics produced at the state level, in recent years NCES has played a leadership role in getting state education agencies to agree to common definitions in statistical reporting. Witness the rough consensus on using two or three definitions of "dropout," for example. Witness also the NCES surveys of how public schools ask about students' race and ethnicity and the stupefying variety in measurement that then impedes better thinking.

NCES can play a related role here and refresh the roles taken at times by the Internal Revenue Service's Statistics of Income division, the Census Bureau's methods division, and others. That is, NCES can help make plain what we mean by "combining" datasets or surveys; "connecting" them; "linking" microrecords, datasets, or surveys; "pooling" datasets or surveys; "integrating" surveys or statistical systems; "unified" databases; and "merging" files. In other words, putting things together. Absent explicit definitions of what these words mean, reaching mutual understandings in the statistical and political communities will be difficult or impossible. Most importantly, designing surveys so that they can be linked, compared, merged, and so on will be impossible. NCES can be a leading agency in this effort.
Dimensions of Linkage

One way of arranging the way we think about linkage is to depend on the elements used in designing conventional statistical surveys. Consider then the ideas of units of sampling, populations, and variables in this context and extensions of the ideas.

Units: Individuals, Entities, or Both

Records on an individual may be linked, as when a child's school transcript is linked to the child's responses to a survey questionnaire, as in High School and Beyond. Or responses on one wave of the HS&B may be linked to responses on subsequent waves, as in any longitudinal study. Similarly, a child's or parent's response to an education survey may be linked to responses to another survey, as in the education component of the augmented National Health Interview Survey.

Records on institutions or other entities may be linked, as in NCES's planned longitudinal surveys of schools. Or records on a school may be linked in the sense that school responses to the SASS questionnaire may be linked to responses to the annual CCD survey.

The linkage may be hierarchical in that a child's record may be linked to his or her teacher's response to a survey. These in turn are linked to archival records on the school, school district, or state in which the child and teacher work. TIMSS is an example.

For deterministic linkage the individual or entity in one survey or survey wave must be identical to the entity in the second survey or wave. In other words, the records on the same identifiable entity appear in two places.

Populations and Sampling Frames

The totality of units of interest constitutes the target population. Overlaid on this are the sampling frame, design, and method used to determine who or what is in the sample at hand in any given survey.

For deterministic linkage to be possible, there must of course be some overlap in the target populations defined for each survey or archive. For instance, records on schools in SASS may be linked to records in CCD because the target populations overlap. Nonoverlap may occur because the surveys are run at different times and may reach different schools. Some schools, for example, disappear: they may be closed or merged with other schools.

Sampling frames must be defined similarly, if not identically, for linkage to be easy. For example, a change in sampling frame from one that is entity based to one that is location based led to the need for reconciliation studies in SASS. These would not have been possible without linkage.
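To make the ideas of deterministic linkage and population overlap concrete, the following is a minimal sketch in Python, not a description of any NCES procedure. It assumes two small, hypothetical files (a survey sample and a universe listing) that share a unique identifier called school_id and a locale field; these names and all of the records are invented for illustration. The sketch links records on the identifier, lists the units that fall outside the overlap, and computes the share of linked pairs whose locale classifications agree, which is the raw material of a reconciliation study.

```python
from collections import Counter


def link_by_id(primary, secondary, key="school_id"):
    """Deterministically link two lists of dict records on a shared identifier.

    Assumes the identifier is unique within each file (a hypothetical
    simplification). Returns the linked pairs plus the identifiers found
    in only one of the two files.
    """
    secondary_by_id = {rec[key]: rec for rec in secondary}
    primary_ids = {rec[key] for rec in primary}
    linked = [(rec, secondary_by_id[rec[key]])
              for rec in primary if rec[key] in secondary_by_id]
    only_primary = [rec[key] for rec in primary if rec[key] not in secondary_by_id]
    only_secondary = [rec[key] for rec in secondary if rec[key] not in primary_ids]
    return linked, only_primary, only_secondary


def agreement(linked, field):
    """Cross-tabulate one field over linked pairs and return the agreement rate."""
    pairs = Counter((a[field], b[field]) for a, b in linked)
    agree = sum(n for (x, y), n in pairs.items() if x == y)
    rate = agree / len(linked) if linked else float("nan")
    return rate, pairs


# Hypothetical records; identifiers and locale codes are invented.
survey_sample = [
    {"school_id": "A01", "locale": "urban"},
    {"school_id": "A02", "locale": "suburban"},
    {"school_id": "A03", "locale": "rural"},
    {"school_id": "A09", "locale": "urban"},      # not in the universe file
]
universe_file = [
    {"school_id": "A01", "locale": "urban"},
    {"school_id": "A02", "locale": "urban"},      # disagrees with the survey report
    {"school_id": "A03", "locale": "rural"},
    {"school_id": "A04", "locale": "suburban"},   # not in the survey sample
]

linked, only_sample, only_universe = link_by_id(survey_sample, universe_file)
rate, pairs = agreement(linked, "locale")
print(len(linked), only_sample, only_universe, round(rate, 2))
```

With these invented records, three schools are linked, one school appears only in each file, and two of the three linked pairs agree on locale. Where identifiers are missing or inconsistent, an exact match of this kind is not enough, and matching rules of the sort discussed in the record linkage literature cited earlier would be needed.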
Variables and Their Measurement

Different surveys or archives may measure the same variable, as when NLS elicits information on gender in repeated waves of a longitudinal survey. Or the variables may differ, as when early surveys measure an individual's academic ability and later ones elicit information about the person's job acquisition.

Linkage is facilitated by some redundancy in measurement of a variable. For instance, gender should remain the same across repeated waves in a longitudinal survey even if the full name changes somewhat with deletion of a middle name or a change in surname with marriage. Linkage is arguably productive when different variables are measured in different ways. NAEP, for example, gets at the broad socioeconomic characteristics of each student. If it were possible to link NAEP to independent tax return information, studies of the relationship between achievement and parental resources would be far more informative.
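The role of redundant variables in checking a proposed link can be sketched as follows. This is an illustrative Python fragment under stated assumptions: the field names (gender, birth_year, surname, middle_name) and the split into "stable" and "changeable" variables are hypothetical choices, not rules taken from any survey discussed here. A proposed link is rejected when a variable that should not change disagrees across waves, while differences in variables that can legitimately change are only noted.

```python
def check_link(earlier, later,
               stable=("gender", "birth_year"),
               changeable=("surname", "middle_name")):
    """Vet a proposed link between two records said to describe one person.

    Returns (accepted, report). The link is accepted only if every stable
    variable agrees; disagreements on changeable variables are reported
    but do not block the link. Field names are hypothetical.
    """
    conflicts = [f for f in stable if earlier.get(f) != later.get(f)]
    changes = [f for f in changeable if earlier.get(f) != later.get(f)]
    return not conflicts, {"conflicts": conflicts, "changes": changes}


# Hypothetical records from two waves of a longitudinal survey.
wave1 = {"gender": "F", "birth_year": 1970, "surname": "Lee", "middle_name": "Ann"}
wave2_same = {"gender": "F", "birth_year": 1970, "surname": "Ortiz", "middle_name": None}
wave2_other = {"gender": "M", "birth_year": 1970, "surname": "Lee", "middle_name": "Ann"}

print(check_link(wave1, wave2_same))
print(check_link(wave1, wave2_other))
```

In this invented example the first pair is accepted despite a new surname and a dropped middle name, and the second is rejected because gender disagrees. A rule this strict treats any recording error in a stable variable as a failed match, so a working system would likely need some tolerance for such errors.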

Suggestions: A More Orderly Vernacular

The language of linkage is, as we have suggested, as promiscuous as is the use of certain words in other sciences. The language will change as the science changes, of course. Nonetheless, a perspective on standardization is desirable and possible. The National Research Council, NCES, or Office of Management and Budget (OMB) might be the vehicles for obtaining agreement on nomenclature.

Our suggestion is as follows. First, focus one's attention on one study or sample survey dataset as primary. When two studies are equally important and must be put together, arbitrarily designate one as primary. View any linkage between this primary study and other studies or datasets as a linkage that involves augmentation of the primary study. Last, build on contemporary practice and some familiar vernacular to define the following eight kinds of linkage:

1. Sample augmentation. A different sample of the same target population is put together with the primary sample.
2. Variable augmentation. New variables, generated by different sources and observed on the primary sample, are added to the primary sample dataset. For instance, transcripts generated by schools on the courses that students take were added to a primary study, such as HS&B, that elicits information from students.
3. Time augmentation. New measures are put together with earlier measures of the same variables on the same sample. Longitudinal studies such as HS&B and NELS:88 involve this kind of linkage.
4. Family (kin) augmentation. Measures taken from relatives of units in the primary sample are added to the primary sample datasets. For instance, teachers' data are added to student data in TIMSS; the teachers' information bears on the students whose achievement levels are also measured and constitute the primary dataset.
5. Levels augmentation.
Measures taken on units at a higher level than the units in the primary sample are added to the primary sample dataset. For example, nation-level policy variables may be observed and added to TIMSS datasets on schools and students in schools in each nation covered by TIMSS. The primary TIMSS dataset did not include observations at the national level, but new studies will.
6. Mode augmentation. New ways of measuring roughly the same variables on roughly the same units are added to a primary sample dataset. For instance, digitized videotape data may be added to teacher and student records in the same schools in TIMSS in two countries as a different way of measuring what is taught and how.
7. Population augmentation. New populations having been surveyed using the same measures are put together in a file with the primary sample dataset, the primary sample having been drawn from a different population. For instance, a

new Chinese version of TIMSS might be added to the TIMSS data that heretofore had no Chinese data.
8. Replicative augmentation. A different sample of a different or the same population using identical measures is put together with a primary sample dataset. The studies combined by Hedges and Nowell (1995) constitute a replicative augmentation.

Eight kinds of linkage (augmentation) were just identified. To make their memorization easier, let us invent a mnemonic. Recall the musical scale: doh re me fah soh lah te doh? Change a couple of letters and we get:

Voh: Variable augmentation
Re: Replicative augmentation
Me: Mode augmentation
Fah: Family augmentation
Soh: Sample augmentation
Lah: Levels augmentation
Te: Time augmentation
Po: Population augmentation

This is miserable music but a possibly helpful way of arranging songs about linkage.

LINKAGE POLICY: EX-ANTE, EX-POST FACTO, OR BOTH

Planning disparate studies so as to permit their combination at a later time is, we believe, important. Understanding how to combine studies after the fact, when we have been unable or unwilling to plan, is no less important. At the national, regional, or local levels, no planning is possible absent an institutional vehicle for enhancing cooperation among the organizations that sponsor statistical surveys. In what follows we make more clear what is meant by ex-ante and ex-post facto policy, present an illustration, and briefly discuss the institutional vehicles that might actualize such policy.

Apropos of ex-ante linkage policy, there appears to be a fine opportunity to plan the combination of data generated by federal agencies with different missions. In this context the example we discuss concerns a federal agency that is authorized to generate national education data and a federal organization dedicated partly to generating data on the effects of education programs in the United States.
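To make the augmentation vocabulary defined earlier more concrete, three of the eight kinds can be read as simple operations on a primary dataset. The sketch below is only one possible reading, under stated assumptions: the dataset layout (a dictionary keyed by unit), the function names, and all values are hypothetical and do not come from any NCES file.

```python
# Hypothetical primary dataset: one record per student, keyed by a unit id.
primary = {
    "stu1": {"math_1988": 51},
    "stu2": {"math_1988": 47},
}


def sample_augmentation(primary, new_sample):
    """Add new units drawn from the same target population."""
    overlap = primary.keys() & new_sample.keys()
    if overlap:
        raise ValueError(f"units already present: {sorted(overlap)}")
    return {**primary, **new_sample}


def variable_augmentation(primary, new_vars):
    """Add new variables observed on units already in the primary sample."""
    return {unit: {**rec, **new_vars.get(unit, {})} for unit, rec in primary.items()}


def time_augmentation(primary, later_wave, variable, wave_label):
    """Add a later measure of the same variable on the same units."""
    return {
        unit: {**rec, f"{variable}_{wave_label}": later_wave.get(unit)}
        for unit, rec in primary.items()
    }


combined = sample_augmentation(primary, {"stu3": {"math_1988": 55}})
combined = variable_augmentation(combined, {"stu1": {"algebra_credits": 2}})
combined = time_augmentation(combined, {"stu1": 58, "stu3": 60}, "math", "1990")
print(combined)
```

In this reading, sample augmentation adds rows, variable augmentation adds columns, and time augmentation adds a dated column for a variable already present. Units missing from the later wave receive `None`, which keeps attrition visible.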
Definition and Analogy

In principle it is possible to construct a national policy that facilitates putting datasets together in the interest of science and society. Such a policy might

emphasize long-term planning for periodic linkage, that is, an ex-ante approach to the topic, or it might stress an ex-post facto perspective. The latter recognizes that the scientific or policy questions that invite putting different datasets together are often posed after particular studies are designed and data generated, rather than in advance of the studies' design and execution.

This distinction is analogous to that made in the specific context of a longitudinal study that itself entails linkage. Such a survey is planned so as to follow individuals or organizations over time. Information that permits follow-up is routinely obtained at the start of the survey and often in each follow-up wave of the survey. This "forward tracing" information includes, for instance, the names of relatives or organizations that might be helpful in locating, at later points in time, the individuals who were sampled in the first wave. Where a longitudinal study is not planned but is instead constructed after the fact, resources for "backward tracing" are brought to bear. In the survey arena these have typically included post offices, telephone books, and credit bureaus.

A similar compartmentalization of tactics is implicit in ex-ante versus ex-post facto linkage initiatives. Ex-ante requires that one anticipate and obtain the kinds of information that will foster future linkages. This information may be basic. For instance, obtaining data on the same background variables, such as age, gender, education, race, or ethnicity, and collecting these data in the same way, is one such forward linkage tactic. Ex-post facto linkages may require other resources. Among the latter we might include probabilistic matching algorithms that, in different ways, help determine that the same persons or entities appear in two independent sample surveys and administrative records systems.
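A rough sense of what a probabilistic matching algorithm does can be given with a short sketch in the spirit of the Fellegi-Sunter approach, one well-known formulation among several and not necessarily the one the authors had in mind. For each comparison field, `m` is the assumed probability that the field agrees when two records truly refer to the same person, and `u` is the assumed probability that it agrees by chance for different persons. All field names, probabilities, and thresholds below are hypothetical.

```python
import math

# Hypothetical m- and u-probabilities for three comparison fields.
FIELDS = {
    "surname": {"m": 0.90, "u": 0.01},
    "birth_year": {"m": 0.95, "u": 0.05},
    "gender": {"m": 0.98, "u": 0.50},
}


def match_weight(rec_a, rec_b, fields=FIELDS):
    """Sum log2 likelihood ratios over fields: positive when a field agrees,
    negative when it disagrees."""
    total = 0.0
    for name, p in fields.items():
        if rec_a[name] == rec_b[name]:
            total += math.log2(p["m"] / p["u"])
        else:
            total += math.log2((1 - p["m"]) / (1 - p["u"]))
    return total


def classify(weight, upper=6.0, lower=0.0):
    """Three-way decision; the two thresholds are arbitrary illustrative values."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "clerical review"


person = {"surname": "Rivera", "birth_year": 1975, "gender": "F"}
renamed = {"surname": "Rivera-Lopez", "birth_year": 1975, "gender": "F"}
stranger = {"surname": "Okafor", "birth_year": 1962, "gender": "M"}

for other in (person, renamed, stranger):
    print(classify(match_weight(person, other)))
```

A change of surname lowers the weight without driving it to a clear non-link, so the pair falls into the middle band that calls for human review. This is the kind of case, names altered by marriage for instance, that makes exact matching on names unreliable.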
See, for instance, the Proceedings of the Survey Research Methods Section of the American Statistical Association for articles on this topic.

Questions and Modes of Response: An Illustration

Hilton's (1992) book provides ample evidence for the United States that questions about a survey respondent's economic status, race/ethnicity, or other important topics have been asked differently across surveys and datasets. Such differences in questions prevent straightforward comparison of the results of independent surveys directed toward the same population. That is, they prevent linkage of a particular kind. The book, however, offered no recommendations about whether and how to standardize such questions.

Learning how to address an ostensibly simple question about race well, indeed figuring out what "well" means, is not easy. NCES has done work in this area, notably in discovering the variety of ways that schools ask related questions (see the citation in Evinger, 1997). The problem is, of course, general. Consider, for instance, recent federal efforts to determine how questions about racial and ethnic origins ought to be asked in surveys (Evinger, 1997). The Federal Interagency Committee for the Review of Racial and Ethnic Standards spent four

years on the problem. Nearly 60,000 respondents were involved in randomized field experiments using seven variations on such questions embedded in the Current Population Survey. The study's objectives included learning more about whether and how to ask about multiracial self-identification and categories of race and ethnicity. The committee's recommendations based on this evidence and revised by OMB resulted in, among other things, a directive requiring standard use of five racial categories: white; black or African American; American Indian or Alaska Native; Asian; and Native Hawaiian or Other Pacific Islander (see OMB Directive 15, 10/30/97, http://www.access.gpo.gov).

This kind of work helps avoid a major limit on the value of any linkage in at least one respect. Asking about race and ethnicity in roughly the same way, dictated by standardization of measurement, enhances linkage opportunities. Further, one major lesson of the work is that putting datasets together often raises fundamental issues. For instance, a "simple" question that is put to a respondent is not always simply put or interpreted. A second lesson is that some problems in the arena transcend federal (and state) agency boundaries and require methodological research. To put the matter bluntly, it took the cooperation of 60,000 citizens to figure this out. They were engaged in large-scale field trials on how to ask questions, a nontrivial exercise in a country as diverse as the United States. Neither the NRC committee nor NCES needs a reminder of this, but others might.

We also know that embedding different forms of the same question in the questionnaire, for a subsample at least, is a decent vehicle for learning about relationships among questions. More general tactics might be invented, based perhaps on the test-equating strategies that have been explored by Holland and Rubin (1982), among others.
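The tactic of embedding two forms of a question in the same questionnaire can be illustrated with a small sketch. It assumes a hypothetical subsample in which every respondent answered both wordings, and it simply cross-tabulates the paired answers to estimate how responses to one form map onto the other. This is a plain cross-tabulation, far simpler than the test-equating methods mentioned above, and the question wordings and responses are invented.

```python
from collections import Counter, defaultdict

# Hypothetical subsample: each tuple is (answer to form A, answer to form B).
paired = [
    ("employed", "full-time"),
    ("employed", "full-time"),
    ("employed", "part-time"),
    ("not employed", "unemployed"),
    ("not employed", "retired"),
    ("not employed", "retired"),
]


def crosswalk(pairs):
    """Estimate P(form B answer | form A answer) from paired responses."""
    counts = defaultdict(Counter)
    for answer_a, answer_b in pairs:
        counts[answer_a][answer_b] += 1
    return {
        answer_a: {b: n / sum(row.values()) for b, n in row.items()}
        for answer_a, row in counts.items()
    }


table = crosswalk(paired)
for answer_a, row in table.items():
    print(answer_a, row)
```

A table of this kind indicates how much information is lost when a survey that used only form A is compared with one that used only form B. With real data the estimates would need sampling weights and standard errors, both omitted here.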
Certainly the matter is pertinent to NCES's investments in learning how to integrate (and in what senses to integrate) the longitudinal and cross-sectional surveys that it sponsors (Griffith, 1992).

An implication of all this is that survey questions need to be designed with linkage in mind. NCES often does this implicitly and in an ad hoc fashion. We are unaware of an explicitly written standard for doing so as part of NCES's survey design strategy, nor does there appear to be a systematic program of empirical side studies or pilot work by NCES that regularly takes linkage seriously.

Organizational Vehicles

A national ex-ante strategy requires that an institutional vehicle be exploited to plan for linkage of surveys or other research projects across branches, dimensions, or units in an agency and to plan similarly across independent agencies. In the United States the Interagency Council on Statistical Policy is one vehicle for planning across statistical agencies. The council was created under the Paperwork Reduction Act. A section of the enabling legislation (Sec. 3504) gives authorization "to improve the efficiency and effectiveness . . . to coordinate the

activities of the Federal Statistical System ((e)(1)), and promote the sharing of information" and so forth. Each of these elements of the statute bears on linkage, including the diverse kinds of linkage discussed in this paper.

The Interagency Council on Statistical Policy is one of several possible organizational vehicles. Other options may be more attractive, feasible, or appropriate. Consider, for instance, that a broad research theme and set of questions might drive a de facto data linkage policy in the United States. One such theme, suggested by Pallas (1995), is better alignment of economic statistics with education statistics. Putting relevant datasets together ex-post facto requires cross-agency cooperation, which in the United States is a complicated matter. Pallas (1995) suggests the invention of an interagency working group. Regardless of the merits of the particular theme, economics and education, there are good precedents for the vehicle he suggests. They include the Interagency Task Force on Child Abuse and Neglect, to judge from recent conferences of the NRC's Board on Children, Youth and Families and the Committee on National Statistics.

Federal Statistical Agencies and Federal Agencies with Other Missions

In the United States and some other countries, considering ex-ante policy on putting datasets together invites thinking about the institutional separation of passive statistical surveys from actively controlled experiments for planning and evaluating programs. At the political level, such separation may be essential. Federal statistical agencies such as the NCES, the Bureau of Justice Statistics, the National Center for Health Statistics, and the Bureau of Labor Statistics are supposed to be free of political influence, for example. Federal agencies that sponsor controlled experiments in education, crime, and so forth focus scientific attention on innovations that at times are politically sensitive.
In education, for instance, the Planning and Evaluation Service of the U.S. Department of Education is responsible for medium- to large-scale evaluations of federally sponsored education programs in the United States. The staff of this Office of the Under Secretary have initiated high-quality randomized trials on dropout prevention programs, among others.

The political and statutory separation of these two kinds of institutions is not necessary on intellectual grounds. In particular, when the object of a study is to produce unbiased and generalizable estimates of the relative effectiveness of a program, combining the survey data with controlled field trials data is sometimes sensible. Estimates of effects based on the surveys are often generalizable because they are based on national or large probability samples, but they are suspect because they depend so heavily on specification of the models that underlie the analysis. The controlled experiments, on the other hand, usually must be localized, which limits generalizability. But they are more trustworthy on account of the randomization they use and the consequent lower vulnerability to violations of model-based assumptions.

Ex-post facto approaches to combining such data have been laid out in reports of the U.S. General Accounting Office. Specific applications of this "cross-design synthesis" approach include estimating the relative effect of mastectomy versus lumpectomy on five-year survival rates of women with breast cancer. Ex-ante approaches based on these ideas are given by Boruch and Terhanian (1996, 1998). Their hypothetical examples include coupling NAEP to controlled field experiments so as to learn about the relative effects of grouping students in schools by ability.

MAPS, DISPLAYS, REGISTRIES, AND EVALUATION

Consider the problem of how to make orderly our thinking about productive linkages of diverse kinds. This section considers three related topics. The first bears on the idea that it is possible to better map survey questions and response categories, this being crucial to understanding what variables (and ways of measuring them) are common or unique across different surveys. The second topic concerns the visual display of information on the contents of surveys. The third topic, inventing registries of linkages, has implications for enhancing understanding of the first two topics and for evaluating the products of any linkage effort or policy.

Mapping Questions and Response Categories

We are aware of no comprehensive effort to develop intellectual maps of the specific questions asked in surveys mounted by NCES or other federal statistical agencies in the United States. Nor are there maps of the variables that the questions are supposed to address. Quite apart from this it is difficult to conceive of definitions for a map, much less to specify how to construct one or to imagine the forms that such maps might take. Nonetheless, we present some ideas on the topic here. The premise is that intellectual maps, like contemporary geographic maps and genome representations, can be important to the development of the field.
Raw Material

What is the raw material for such a map? For any given survey it includes the basic element of a specific question and its associated response categories. A summary question and response of the sort often found in code books generated for statistical records, the labels for a table or chart in an academic research journal or government publication, and so forth are insufficient. Marginal tabulations of the distributions of response are commonly available. These, along with reports on methodological studies of the item where they exist, also might be included as ingredients for map making. It is well understood that

context counts, and so the block of questions into which a particular item is embedded and the entire protocol ought to be included as raw material. And, of course, structural information on a survey's sponsorship and timing is fundamental.

An Electronic Form of Map: Web Based

In principle it is possible to put a question and response onto a Web page with hypertext links to other raw material or distillations of the latter. Linkage levels might be based on one or more natural search propensities. For instance, a link to the block of questions into which a specific question is embedded is natural. A second-level hypertext connection to the questionnaire is also natural. Or the search propensities of inquirers might be studied empirically, so as to identify what types of links would be most helpful. That is, the links might be designed so as to get at what inquirers prefer first, then proceed to what they want second, and so on. The map then is tailored to their needs, just as contemporary geographic maps are tailored.

Regardless of the particular search mechanism, it seems sensible to exploit the opportunities presented by hypertext linkage in this context. That is, the technology permits easy lateral connections. We can then learn how questions about (roughly) the same variable are addressed in different surveys. It also makes easy the task of connecting vertically so as to get at target samples or sponsors, marginal distributions, and so on.

Precedent and Form: ZUMA

As we have said, we are unaware of a U.S. precedent for mapping. However, a potentially useful model is embodied in work on cross-nation surveys in Europe. In particular, Mannheim's Zentrum für Umfragen, Methoden und Analysen (ZUMA) has undertaken to consolidate and analyze information on background variables, how they are asked, and how they might be "harmonized."
The multicountry surveys of primary interest in this context are the Eurobarometer (European Commission), the International Social Survey Programme (29 member countries), the European Community Household Panel (ECHP, Eurostat), and the surveys falling within the purview of the European Society for Opinion and Marketing Research (ESOMAR).

ZUMA began the effort by focusing on background variables and learning which ones are commonly used in analyses and commonly asked about in the surveys. Such variables are often measured in NAEP, TIMSS, and other education surveys and have been the subject of considerable discussion. ZUMA classifies them into broad categories: A.G.E.: age, gender, education; CIEOV: class, income, employment status, occupation, vote; and RHEMMR: religion, household, ethnicity, marital status, group membership, etc. Each variable is defined

broadly. The specific wording of items and response categories is given across surveys. Each is summarized in tabular form exemplified by the next exhibit. The hard-copy ZUMA report at hand (McCabe and Harkness, 1998; Harkness and Mohler, 1998) is densely packed with information. A Web-based system for its display, one that exploits hypertext connections and vertical and horizontal links, is likely to be more user friendly.

Software

A partial precedent for automated mapping lies in contemporary software developed for linking different kinds of administrative records. For example, Austin's school district may have 3,000 courses on its books because this district, as others, reports all courses approved by the Texas State Department of Education in its course portfolio. The district, however, has only 1,000 course categories or elements. A method to "map" one set of information to the other has been created to facilitate the linkage. This "Success Finder Mapper" (www.evalsof.com) is automated in the sense of making the mapping easier. It does depend on human judgment, of course, to set parameters and rules (Ligon, 1998). The product appears interesting enough to justify exploring its utility in a survey context in contrast to an administrative record context.

Visual Representations

Developing visual representations of such information seems important given its potential density. For instance, multiple displays or maps might be constructed for each variable, with the questions' "distance" from one another plotted on a line or in a two-dimensional space. The distance might be based on some index of semantic differential or difference on cluster analyses or some other approach. Or it might be based on a simple count of common features.
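A distance based on a simple count of common features can be sketched as follows. The sketch treats each question's response categories as a set and uses the Jaccard distance (one minus the share of categories held in common), which is one convenient choice among many and is an assumption of this illustration rather than anything proposed in the text. The survey labels and categories are hypothetical.

```python
from itertools import combinations

# Hypothetical response categories for a question about work status.
questions = {
    "Survey A": {"full-time", "part-time", "unemployed", "retired", "student"},
    "Survey B": {"full-time", "part-time", "unemployed", "self-employed"},
    "Survey C": {"employed", "not employed"},
}


def common_feature_count(a, b):
    """Number of response categories that two questions share."""
    return len(a & b)


def jaccard_distance(a, b):
    """0.0 for identical category sets, 1.0 when nothing is shared."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


distances = {
    (x, y): jaccard_distance(questions[x], questions[y])
    for x, y in combinations(sorted(questions), 2)
}
for pair, d in distances.items():
    print(pair, common_feature_count(questions[pair[0]], questions[pair[1]]), d)
```

The resulting pairwise distances could then be fed to a scaling or clustering routine to place the questions on a line or in a plane. Matching categories by exact label is of course too crude for real questionnaires, where wording differs across surveys and languages.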
Consider, for instance, the crude number of response categories for a simple question about whether the respondent works in the four tabulated surveys:

Survey Number    Number of response categories
1                5
2                7+
3                10
4                11

They might be displayed so as to emphasize common features, for example:

ISSP ESOMAR EURO ECHP
Self-employed Never employed
N N Y Y Y Y Y N

Note: ISSP = International Social Survey Programme. ECHP = European Community Household Panel.

We are aware of no serious research on this topic of displaying different questions about the same variable and response categories. The work by Tufte (1990) and others seems pertinent. Learning about how to exploit multiple n-dimensional displays, "escaping flatland" in Tufte's vernacular, is a tantalizing prospect. Because the features of questions and response categories may themselves be of categorical character, recent developments in the visual display of categorical information also are relevant. (See also Blasius and Greenacre, 1998, for a provocative if numbing array of options discussed at a conference convened by the Zentralarchiv für Empirische Sozialforschung in Cologne.)

Analyses

Analyzing how questions about the same variable differ from one another across surveys, how response categories differ, whether and how marginal distributions of responses differ, and so on is a complex matter. Scholars can exploit registries or maps of questions in analyses, thereby adding value to the maps and increasing our understanding of how to measure well and how to link different surveys. (See Braun and Miller, 1997, for a nice illustration of subtle and not so subtle traps in asking about "education level" across different cultures, languages, and geopolitical jurisdictions, based on the ZUMA mapping project.)

Printed Displays

How do we better display information on multiple surveys so as to make plain what is common to two or more of them and what is unique? Commonness of some elements is fundamental to putting datasets together. Learning then how to fabricate a two-dimensional display to characterize commonness succinctly is likely to be helpful. Displays with this intent are helpful, in some respects, to judge from those given in Hedges and Nowell (1995) and an NRC (1995) report on integrating statistics on children.
From these one can learn how difficult it is to construct informative displays and some lessons about how to improve them.

First, nonuniformity in displays is discomfiting. The NRC volume displays surveys down the left margin in rows; the columns are variables or topics considered by one or more of the surveys. Hedges and Nowell arranged their display in the opposite way. Further, measured variables such as "family background" are identified as a broad category in some displays but not in others. Broad categories appear in some displays of the same study but not in others. For instance, Brooks-Gunn et al. (1995) identify studies that measure "family context" and others that do not. Some papers in the NRC volume classify variables as inputs or outputs, while others use a different classification.

Some of this variation is trivial and unnecessary. What ought to be rows and what ought to be columns is easily standardized, for instance. Arranging the variables vertically and the studies horizontally often works well. A rectangular standard is adequate, but the

complexity of surveys and the power of hypertext invite one to think in more than two dimensions for displays or maps.

A second lesson of the NRC volume and the Hedges and Nowell paper is that the usefulness of displays, at least for their authors, lies partly in the particular question, theory, or perspective embodied in their essay. Hofferth (1995), whose interest lies in evaluation, tailored her display to recognize inputs and outputs. Brooks-Gunn et al. (1995), whose theoretical work emphasized contextual variables, made a point of recognizing these variables explicitly rather than as inputs and outputs. All of this implies there is a deep need to develop a capacity for flexibility in the composition of displays, to permit or facilitate the fabrication of multiple different displays, each of which may be standardizable. The hypertext feature of Web sites, a potential n-dimensional analog to Tinkertoys, provides the feasibility.

Third, printed displays often do not get to the level of specific question and response. Consequently, we do not know whether questions concerning family resources, such as income, are asked identically in the Panel Study of Income Dynamics, NELS:88, and others (Brooks-Gunn et al., 1995). Similarly, recall that Pallas (1995) recognized that family background variables appear in NLS-72, HS&B, NELS:88, BPS, and the Beginning Post Secondary Study. Without a deeper and more burdensome search, we have no idea whether the questions that address these variables are the same or whether response categories are the same from one question to the next. Nor do we know whether question and response categories are the same as those in surveys cataloged by Brooks-Gunn et al. as "family material resources." Mapping questions in the ways suggested earlier can help reduce uncertainty in this respect.
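The rectangular display discussed above, with variables arranged vertically and studies horizontally, is easy to generate once coverage information is held in a uniform structure. The sketch below assumes a hypothetical mapping from study names to the variables each one measures. The study names and variable labels are placeholders and do not describe any real survey.

```python
# Hypothetical coverage: which broad variables each study measures.
coverage = {
    "Study A": {"family income", "parent education"},
    "Study B": {"family income", "household size"},
    "Study C": {"parent education"},
}


def presence_table(coverage):
    """Render variables as rows and studies as columns, marking Y or N."""
    studies = sorted(coverage)
    variables = sorted(set().union(*coverage.values()))
    width = max(len(v) for v in variables)
    lines = [" " * width + "  " + "  ".join(studies)]
    for v in variables:
        cells = [("Y" if v in coverage[s] else "N").center(len(s)) for s in studies]
        lines.append(v.ljust(width) + "  " + "  ".join(cells))
    return "\n".join(lines)


def shared_variables(coverage, a, b):
    """Variables common to two studies, the starting point for any linkage."""
    return sorted(coverage[a] & coverage[b])


print(presence_table(coverage))
print(shared_variables(coverage, "Study A", "Study B"))
```

Because the display is generated from data rather than typeset by hand, the same structure can be re-sorted, transposed, or filtered to suit a particular question or theory, which is the flexibility the second lesson calls for. A Y in two columns still says nothing about whether the underlying questions are worded alike.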
Registry of Analyses and Linkages

It is important to understand the consequences of putting datasets together. Little research, however, appears to have been done to advance this understanding. In developing this paper we lacked the resources (or did not have the wit to ask for them) to mount a full-blown empirical study of the value of earlier linkages or to guess at the value of future linkages. One can infer some of the value, of course, from the work we described here, but this is not entirely satisfactory.

Two related topics then invite our attention: a registry of linkage and evaluating the uses of linkage. The handling of both topics carries the implication that the NRC committee, NCES, or both can take action.

No federal agency or private foundation, including NCES, has an excellent system for tracking the uses to which the datasets that it sponsored are put. Uses here means the formal statistical analyses of either a stand-alone dataset or datasets that have been put together. The absence of a tracking system makes it difficult to periodically evaluate and improve any given survey. More to the point of this paper, assessing the value of

linked datasets and building linkage policy would be difficult without such a system.

Consider, for instance, that in the Hilton (1992) book there are few references to independent analysis of the datasets that are in the book's ambit. The Hedges and Nowell (1995) paper is a bit more conscientious on this account. In particular, there is a literature review, but it is perforce brief. Both resources reflect a symptomatic lack of a good registry on which scholars analyzed what dataset and, further, on which scholars educed what implications for the datasets' improvement. Linkage often constitutes one option for improvement.

A broad implication is that NCES might consider creating registers of the use of datasets in the interest of improving surveys, including linked datasets. More conscientious efforts by authors, professional journal editors, and systems such as ERIC could facilitate this. Certainly, exploiting Internet-based Web sites is feasible for this and related efforts. See the Terhanian Web site for a registry of analysis of NELS:88, for example (http://dolphin.upenn.edu/~terhania/index.html), and Boruch and Terhanian (1998) more generally.

Evaluating the Products of Linkage

Evaluation, regardless of its style or method, would have to take into account the purposes of putting datasets together. The purposes and dimensions outlined earlier might be used to organize the effort. Each of these purposes or dimensions, of course, can be examined with respect to its value for various audiences: scientists, policymakers, intermediary organizations that interpret public statistics for particular constituencies, and so on.

To begin such an evaluation, one might study the production process. Linkage is no easy matter in many studies, to judge from the papers reviewed here on research in this arena.
For example, merely assuming that an individual or entity is the same and can be identified as such in a record-linkage effort can be complicated. In a society as inchoate as the United States, names of individuals and institutions change or are altered for a variety of reasons. Their locations and other characteristics are often thought to be durable, but they also change often. Errors in understanding a question about identity or characteristics are not uncommon and are not trivial.

Certainly, the value of the products of linkage can be studied and understood. What do the products add to understanding? How? How do we know? Estimating the value added to a scientific body of knowledge in this, as in other arenas, is not always easy. So-called paradigm shifts, involving a remarkable and obvious change in the way science is done, are rare. More important, they come about only with industrious adherence to conventionally conscientious research standards and incremental advances that are discernible.

Much of the recent scholarly work on understanding the incremental value of scientific work depends on the system of peer-reviewed publications. Citation

counts are a stereotypical device for characterizing value, but other approaches can be exploited. For instance, the Proceedings of the American Statistical Association is not viewed by some as a scholarly journal. Nonetheless, the work products published therein are fundamental to our understanding of what goes on at NCES and other statistical agencies. NCES's planned journal, and other peer-reviewed journals, may publish works that appeared earlier in the Proceedings. But it would be as foolish to rely on the peer-reviewed journals alone as it would be to ignore the Proceedings.

SOME LINKAGE OPTIONS IN EDUCATION STATISTICS

There is no formal, well-articulated "linkage policy" at NCES or any other statistical or research agency in the United States. We are aware of no such policy in Sweden, Israel, France, the United Kingdom, Japan, or Germany. Absent formal policy, identifying viable and interesting examples of what is desirable is a dubious objective. In what follows we suppress our ambivalence and discuss what might be desirable. Each suggestion for the future ought to be considered in light of our earlier suggestions in this paper about evaluation and vernacular.

Linking NCES Surveys

Several of the NCES datasets mentioned earlier, including NAEP, SASS, NELS:88, and CCD, contribute in distinct ways to the research and policy-making communities' understanding of a variety of important educational issues. NAEP, for example, generates national and subnational estimates of achievement in core subject areas on a regular basis. SASS, on a somewhat less regular basis, produces a wealth of information concerning teacher supply, demand, quality, and, more generally, conditions in schools. NELS:88 allows researchers to test myriad hypotheses bearing on how, and how well, students learn over time. And the CCD provides general information on the nation's universe of school districts and schools on an annual basis.
Are These Datasets "Puzzle Pieces" that Fit Together Neatly? Despite their unique contributions, these NCES surveys are not pieces of an education puzzle that fit together neatly. On the contrary, certain pieces seem broken, several duplicate pieces exist, some pieces are inexplicably missing, and a few new pieces are produced so slowly that they appear to be altogether lost. Examples are given in what follows.

Broken Pieces: Example 1

Terhanian (1997) analyzed 1994 NAEP data in the interest of developing a deeper understanding of the relationship between school expenditures and student reading proficiency. To obtain school expenditure information for his analysis, Terhanian linked CCD district information (which he then converted to per-pupil values) with NAEP district, school, teacher, and student information. The task of linking CCD and NAEP data was by no means straightforward or seamless, however, because the NAEP dataset did not include the CCD unique identification code for participating school districts or schools. Yet, as Terhanian discovered inadvertently, the NAEP dataset did include the two "broken" pieces (i.e., separate variables) of the unique district code. By simply concatenating the two, Terhanian was able to create the one variable that was necessary to augment the NAEP data with CCD data.

A Peculiar Irony

In the NAEP user's manual, NCES does not provide researchers with instructions on how to "fix" the "broken" pieces. Nor do NCES representatives actively publicize the presence of these pieces. It is perhaps for these reasons that scholars who focus on NAEP's improvement often recommend linkage with the CCD. They simply do not realize that the two datasets are already linkable, albeit with difficulty.

Duplicate Pieces: Example 2

Several NCES datasets, including NELS:88, SASS, and NAEP, include questions about school quality, teacher experience, and other common areas that concern policy makers and researchers. In some cases the exact same questions, or very similar ones, appear on different surveys. In other cases, however, questions about the same topic are phrased so differently across surveys that it is impossible to compare responses. Understanding NCES's rationale here is not as complicated as it seems.
No one at NCES is charged with the responsibility of coordinating, at the microlevel, the various surveys, many of which run during the same year. That is, no one really knows which questions are on which surveys, much less how they got there. We believe there is a better way.

Missing Pieces: Example 3

Linkage efforts are less successful than planned at times because puzzle pieces are missing. In the 1992 NAEP eighth-grade national math assessment, for instance, only about 60 percent of 8,300 math teachers could be linked correctly to their students. Data were completely missing for 35 percent of the total sample of teachers and partly missing for another 5 percent. Attempts by researchers to shed light on the relationship between teacher characteristics and student achievement, then, could only fall short. NCES and its contractors seem to have corrected the within-school linkage problem: the teacher/student match rate improved appreciably for the 1994 and 1996 NAEP assessments. The ability of NCES and its contractors to learn from such failures certainly bodes well for the future.

Lost Pieces: Example 4

NCES datasets are not always produced expeditiously. Instead, some datasets, notably the CCD, are produced so slowly that they appear to be altogether lost. This not only diminishes the usefulness of the CCD to researchers and others but also adds to their frustration. Consider, as an example, the case of the JASON Foundation for Education. In 1997 the foundation developed a promising method to deliver science instruction via the Internet to middle school students. At the same time, it developed a simple registration process for potential participants that exploited the interactive nature of the Internet and relied on information from the 1992 to 1993 CCD. In order to register for the pilot program, participants had to first identify their school district from a menu of districts and then their school from a menu of all schools in their district. After they did so, additional information about the district and the school populated several data fields on the registration page. JASON then asked potential participants to complete the registration form by confirming or editing the CCD information that populated the data fields. From start to finish, the entire process should have taken less than five minutes.

The registration process turned out to be flawed, however, because a nontrivial percentage of CCD information was either obsolete or missing (i.e., it seemed "lost").
For this reason, about 10 percent of the first several hundred JASON registrants could not find their school districts or schools listed among those on the registration Web site menu. Others who were able to find their school districts or schools often felt obligated to correct dated information (e.g., number of students in the school). The registration process turned out to be a burden for respondents despite the good intentions of the folks at JASON.

What does this example of a "lost piece" imply for NCES? If researchers and others are to rely on the CCD, NCES must ensure that data are collected and compiled more expeditiously. Comparing the pace of the current collection and compilation process to that of the movement of a glacier, regardless of the cause (e.g., state officials possess no obvious incentive to provide NCES with information in a timely manner), seems fair.
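
Returning to Example 1, the repair Terhanian describes (concatenating the two partial district codes and using the result as a merge key) can be sketched in a few lines of Python. The sketch is illustrative only: the field names (code_part_a, code_part_b, expenditures, enrollment), the code values, and the dollar figures are hypothetical placeholders, not the actual NAEP or CCD layouts. It also reports a match rate because, as Examples 3 and 4 suggest, unmatched records deserve as much attention as matched ones.

```python
# Illustrative sketch only. Field names and values are hypothetical;
# the real NAEP and CCD file layouts differ.

def build_district_id(part_a: str, part_b: str) -> str:
    """Concatenate the two partial codes into a single district identifier."""
    return part_a.strip() + part_b.strip()


def link_to_ccd(naep_rows, ccd_by_id):
    """Attach CCD fields to each NAEP row; collect rows with no CCD match."""
    linked, unmatched = [], []
    for row in naep_rows:
        district_id = build_district_id(row["code_part_a"], row["code_part_b"])
        ccd = ccd_by_id.get(district_id)
        if ccd is None:
            unmatched.append(row)
        else:
            linked.append({**row, "district_id": district_id, **ccd})
    return linked, unmatched


# Toy NAEP-like records carrying the two "broken" pieces of the code.
naep_rows = [
    {"school": "A", "code_part_a": "42", "code_part_b": "00001"},
    {"school": "B", "code_part_a": "42", "code_part_b": "00002"},
    {"school": "C", "code_part_a": "36", "code_part_b": "99999"},
]

# Toy CCD-like district records keyed by the full district code.
ccd_by_id = {
    "4200001": {"expenditures": 5_000_000, "enrollment": 1_000},
    "4200002": {"expenditures": 3_000_000, "enrollment": 500},
}

linked, unmatched = link_to_ccd(naep_rows, ccd_by_id)

# Convert district totals to per-pupil values, as in the analysis described.
for row in linked:
    row["per_pupil"] = row["expenditures"] / row["enrollment"]

match_rate = len(linked) / len(naep_rows)
print(f"matched {len(linked)} of {len(naep_rows)} records ({match_rate:.0%})")
```

In this toy example two of the three records link, so a linkage report would flag school C for follow-up rather than silently dropping it.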

What Combination of NCES Data Is Available and at What Linkage Level?

For any randomly chosen public school in the United States, the CCD is likely to be the only NCES information source available to researchers and policy makers. Absent a change in how NCES designs its surveys, there is little reason to expect some nontrivial combination of CCD, SASS, NAEP, and NELS:88 data to be collected during the same year for a meaningfully representative sample of schools. This is despite the fact that some combination of these data would, in our opinion, better serve the research and policy-making communities.

Table 10-2 displays crudely the current linkages among and between the NCES datasets mentioned here. It also describes the level at which these datasets are currently linkable. What are the current research implications of these potential linkages on analysis? It is possible to link some combination of CCD (e.g., core per-pupil expenditures of the Amarillo Independent School District), SASS, NAEP, and NELS:88 information at the district level in a given year. See Terhanian (1997), Wenglinsky (1997), and Taylor (1997) for recent examples of analyses that have exploited some combination of these linkage opportunities. It is also possible, in some cases, to link CCD, SASS, and NELS:88 at the school level in a given year. About 23 percent of the schools in which the sample of NELS:88 students were enrolled in both 1990 and 1992, for instance, also participated in the 1990 to 1991 wave of SASS. CCD information, then, is also available for these schools during these years.

The value of linkage may seem trivial to researchers who wish to carry out analyses of student or school samples that are representative of the nation or states. The implications for the design of future surveys, however, are perhaps less trivial.
Just as we recommended that NCES or some other thoughtful federal agency develop a map or maps of variables across surveys, we also suggest that they consider doing so for the actual surveys they sponsor. The object of mapping is to better understand how the education puzzle pieces fit together, what pieces are missing, and what pieces are needed to better complete the puzzle.

TABLE 10-2 Linkages Between and Among NCES Datasets

Level       Data Source
District    SASS, NELS:88, CCD, NAEP
School      SASS, NELS:88, CCD

Linkage and Augmentation of NCES Data and Non-NCES Data

At times, states, other federal agencies, and government contractors produce information that can be linked to NCES datasets, including NAEP. For instance, the Pennsylvania Educational Policy Studies Project, which is affiliated with the University of Pittsburgh, maintains a database that provides general descriptive data on the universe of Pennsylvania's school districts. These data include valuable information that is not available through other sources such as the CCD, notably each school district's Equalized Subsidy for Basic Education (ESBE) revenue (which is the largest source of state aid to school districts) and the ratio the state uses to determine ESBE revenue.

States such as Pennsylvania, then, are in a position to exploit linkage opportunities. For instance, the Pennsylvania state department of education might compare NAEP results with results from its own state assessment. Or Pennsylvania might undertake a large-scale satisfaction survey of the sample of schools participating in NAEP or SASS in the interest of understanding the effect of school quality, measured more broadly than it is currently measured, on school and perhaps even student achievement. Instances of states capitalizing on NCES's efforts are hard to find, however.

An example of a government agency capitalizing on and augmenting NCES's work is not so hard to find. The General Accounting Office (GAO) used the SASS sample in its recent work to investigate the quality of school facilities across the United States. GAO did not, however, return an augmented dataset to NCES for analysis because no arrangement had been made with NCES in advance. To us this seems quite shortsighted on the part of NCES, GAO, or perhaps both.

The American Institutes for Research, a government contractor, has produced a Teacher Cost Index (TCI) to which NAEP or other NCES datasets might be linked. The TCI is a district-level index that accounts for factors that underlie differences in the cost of living among school districts (Chambers, 1995). Developed in part on the basis of an analysis of the 1993 to 1994 SASS, the TCI provides researchers with an arguably important tool for adjusting expenditure data to make expenditure effectiveness comparisons fairer. It enables researchers to estimate, for instance, the annual salary that school districts across a state would have to pay a similarly qualified teacher.

Private Organizations

At a high level of analysis, private organizations often link their efforts to a dataset generated by public agencies. Louis Harris and Associates, for instance, periodically surveys nationally representative samples of teachers, students, and parents. The sampling frames on which the organization relies include the CCD. Harris's efforts do not usually engender individual privacy issues because data are reported only in the aggregate. Moreover, the issues that concern Harris are not necessarily those that NCES and other federal agencies are able to focus on. Rather, Harris consciously seeks to fill information gaps and therefore focuses on certain important issues in far greater depth than NCES. These issues include parental involvement; safety and violence in schools, neighborhoods, and cities; and gender equity in schools.

There is no great reason why Louis Harris and Associates or other private organizations could not cooperate with NCES (or other statistical agencies) to enhance understanding of the value of sample augmentation linkage of the sort described earlier. Harris could have used the NCES Schools and Staffing Survey or any of the recent NAEP samples, for example, to inform or improve the design of the 1997 Metropolitan Life surveys that investigated gender equity and parental involvement in schools from the perspectives of students, teachers, and parents. And the organization might have provided NCES with resultant datasets as well as suggestions for improving future surveys and/or linkage.

Organizations such as Louis Harris and Associates are sensitive to the idea that linkages of various kinds can advance the company's mission in the public interest. They also recognize that linkage of datasets may be useless and that linkage engenders both naive and subtle privacy issues. More important, such organizations can be encouraged to develop more creative and innocuous approaches to policy on putting datasets together. This effort could be made for national samples of schools, local education agencies, sampling frames, and so forth. The information that comes about as a result ought to become a part of the knowledge base for NCES and other statistical agencies.

SUMMARY

Implication: Electronic Mapping

NCES, and perhaps other statistical agencies, can invent a Web-based system for mapping the variables measured in each survey sponsored by the agency (and other studies), the questions that address the variables, and the question response categories, exploiting hypertext to facilitate the acquisition of deeper information and wider searches.
This would make easier the task of understanding what is common and unique to diverse surveys in education and perhaps other areas. Such a system is a natural extension of NCES's work on data warehousing and electronic code books and can adopt software that meets open database connectivity standards.

Implications: Nomenclature

NCES can play a leadership role in clarifying and standardizing the semantics of linkage. This would help make plainer and more uniform the use of words such as merging, pooling, and connecting datasets, and would foster sensitivity to the definitions of these words in statistical policy, activity, and publications. NCES has been vigorous in related respects in the past, to judge from the agency's work with state education agencies on, for example, determining what dropout means and how a dropout is counted.

Implications: Dimensionalizing Linkage

NCES can explore ways to make plainer the functions of linking surveys, in effect dimensionalizing linkage activity. This might be done, as suggested earlier, by hinging dimensionalization on the ideas of augmenting a primary survey with two or more secondary ones, focusing on what is augmented: samples, populations, variables, modes of measurement, replication, and so on. The rationale is that we need to learn how to better arrange our thinking about very complex linkage efforts.

Implication: Linkage Policy

NCES can explore at least two approaches to linkage policy. Ex-ante policy stresses the idea that all surveys can be planned so as to be more connectable in specific senses. Ex-post facto policy recognizes that not all linkage can be planned and that unplanned linkage must be planned for. Further, institutional vehicles for developing policy can be identified and explored, such as interagency councils and statistical agency task forces. In the continued absence of coherent policy, we are unlikely to make much progress in productively exploiting diverse surveys or in better understanding the benefits and costs of linked studies.

Implication: Registries, Displays, and Evaluation

Developing a registry of each study that depends on linkage and developing new ways of displaying linkable or linked studies is possible. These are essential to understanding the linkage landscape and, moreover, to evaluating the value of linkages of various kinds. No such registries exist. Partly for this reason, perhaps, few formal and comprehensive evaluations of linkage efforts have been published.

Implication: Broken Pieces, Missing Pieces

NCES can consider approaching linkage issues productively by using a "broken pieces, missing pieces" theme.
That is, one tries to understand how a study could have been more informative had the possibility of linkage been actualized through better planning. This perspective is kin to the idea underlying good postmortems in medicine and good crash investigations in the aviation and nuclear sciences, engineering, and other disciplines. Statistical agencies can exploit and formalize it in the linkage context, as they already do, in effect, in individual survey efforts.
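
A postmortem of this kind might begin with a simple inventory of which variables each survey carries, in the spirit of the electronic mapping suggested above. The following Python sketch is a hypothetical illustration: the survey names come from this paper, but the variable lists and the set of wanted variables are invented placeholders, not the surveys' actual contents.

```python
# Hypothetical inventory. Survey names are from the text; the variable
# lists below are invented placeholders, not actual survey contents.
survey_variables = {
    "NAEP": {"teacher_experience", "school_quality", "reading_score"},
    "SASS": {"teacher_experience", "school_quality", "teacher_salary"},
    "NELS:88": {"teacher_experience", "student_aspirations"},
}

# Variables an analyst would like to find somewhere (also hypothetical).
wanted = {"teacher_experience", "per_pupil_expenditure"}


def map_variables(inventory):
    """Invert survey -> variables into variable -> set of surveys."""
    coverage = {}
    for survey, variables in inventory.items():
        for variable in variables:
            coverage.setdefault(variable, set()).add(survey)
    return coverage


coverage = map_variables(survey_variables)

# "Duplicate pieces": variables measured by more than one survey.
duplicates = sorted(v for v, surveys in coverage.items() if len(surveys) > 1)

# "Missing pieces": wanted variables that no survey in the inventory carries.
missing = sorted(wanted - set(coverage))

print("duplicates:", duplicates)
print("missing:", missing)
```

A real map would also need to record question wording and response categories, since, as Example 2 notes, questions on the same topic may be phrased too differently to compare.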

Implication: Cross-Agency and Cross-Institution Initiatives

NCES can play a leadership role in understanding whether, how, and how productively certain kinds of linkage studies that cross institutional and geopolitical jurisdiction lines have been and could be done. In principle, for example, some surveys sponsored by the public might easily be linked in one or more dimensions with privately sponsored surveys. In principle a survey mounted by a federal statistical agency such as NCES can be designed so as to permit easy connection to a study designed by a federal agency with another mission, such as program evaluation. What is possible in principle is not always possible in practice, but unless we explore the former, we will not improve the latter.

To return to the general topic of this essay, recall the quotation from Henry James at the start of this paper. It says, in other words, that everything is related to everything else. To make this manageable, NCES and the statistical and social sciences community have to draw circles around the more connectable things. In this respect the work reviewed in this paper and the implications educed here can help NCES and the research community do better in the future. This requires resources, of course, not the least among which is the political and scientific will to make data work harder to serve the public interest.

ACKNOWLEDGMENTS

Research for this paper was sponsored by the National Center for Education Statistics, the National Science Foundation, and the U.S. Department of Education. We are grateful to colleagues at the Planning and Evaluation Service of the U.S. Department of Education, the U.S. General Accounting Office, and the Education Statistical Services Institute for conversations that helped clarify our thinking on the topic.

REFERENCES

Blasius, J., and M. Greenacre
  1998 Visualization of Categorical Data. New York: Academic Press.
Boruch, R.F., and G. Terhanian
  1996 So what? The implications of new analytic methods for designing NCES surveys. Pp. 4.1-4.118 in From Data to Information: New Directions for the National Center for Education Statistics, G. Hoachlander, J. Griffith, and J.H. Ralph, eds. Washington, D.C.: U.S. Department of Education.
  1998 Controlled experiments and survey-based studies on educational productivity: Cross-design synthesis. Pp. 59-85 in Advances in Educational Productivity, Volume 7, A. Reynolds and H. Walberg, eds. Greenwich, Conn.: JAI Press.
Bohrnstedt, G.W.
  1997 Connecting NAEP Outcomes to a Broader Context of Educational Information. Paper presented at the annual meeting of the American Educational Research Association, Chicago.

Braun, M., and W. Miller
  1997 Measurement of education in comparative research. Comparative Social Research 16:163-201.
Brooks-Gunn, J., B. Brown, G.J. Duncan, and K.A. Moore
  1995 Child development in the context of community resources: An agenda for national data collection. Pp. 27-97 in Integrating Federal Statistics on Children: Report of a Workshop. Board on Children and Families and Committee on National Statistics, National Research Council. Washington, D.C.: National Academy Press.
Bruce, R.V.
  1973 Bell: Alexander Graham Bell and the Conquest of Solitude. New York: Little Brown.
Bushery, J., D. Royce, and D. Kasprzyk
  1992 The Schools and Staffing Survey: How re-interview measures data quality. In 1992 Proceedings of the Section on Survey Research Methods. Alexandria, Va.: American Statistical Association.
Citro, C.F.
  1997 Editor's postscript. Chance 10(4):31.
Chambers, J.
  1995 Public School Teacher Cost Differences Across the United States. Washington, D.C.: National Center for Education Statistics.
Cotton, R.G.H., V. McKusick, and C.R. Scriver
  1998 The HUGO Mutation Database Initiative. Science 279:10-11.
Cox, L.H., and R.F. Boruch
  1988 Emerging policy issues in record linkage and privacy. Journal of Official Statistics 4(1):3-16.
Evinger, S.
  1997 Recognizing diversity: Recommendations to OMB on standards for data on race and ethnicity. Chance 10(4):26-31.
Grace, J.B.
  1997 Letter. Science 275:1862-1863.
Griffith, J.
  1992 Presentation to the National Advisory Council on Education Statistics (March 12-13, 1992): Draft Paper on a Proposal for an Integrated Longitudinal Studies Program. Washington, D.C.: National Center for Education Statistics.
Harkness, J., and P. Mohler
  1998 Towards a Manual of European Background Variable: Part I, Appendix II: Report on Background Variables in a Comparative Perspective. Mannheim, Germany: Zentrum fur Umfragen, Methoden und Analysen.
Hedges, L.V., and A. Nowell
  1995 Sex differences in mental test scores, variability, and numbers of high scoring individuals. Science 269:41-45.
Hilton, T., ed.
  1992 Using National Data-bases in Educational Research. Hillsdale, N.J.: Lawrence Erlbaum Associates.
Hofferth, S.L.
  1995 Children's transition to school. Pp. 98-123 in Integrating Federal Statistics on Children: Report of a Workshop. Board on Children and Families and Committee on National Statistics, National Research Council. Washington, D.C.: National Academy Press.
Holland, P.W., and D.B. Rubin, eds.
  1982 Test Equating. New York: Academic Press.

Holt, A., S. Kaufman, F. Scheuren, and W. Smith
  1994 Intersurvey consistency in school surveys. Pp. 105-110 in Volume II: 1994 Proceedings of the Section on Survey Research Methods. Alexandria, Va.: American Statistical Association.
Jenkins, C.R., and A. Wetzel
  1994 The 1991-92 teacher follow-up survey reinterviewed and extensive reconciliation. Pp. 821-826 in Volume II: 1994 Proceedings of the Section on Survey Research Methods. Alexandria, Va.: American Statistical Association.
Johnson, F.
  1993 Comparisons of school locale settings: Self-reported vs. assigned. Pp. 689-691 in 1993 Proceedings of the Section of Survey Research Methods. Alexandria, Va.: American Statistical Association.
Kasprzyk, D., K. Gruber, S. Salvucci, M. Saba, F. Zhang, and S. Fink
  1994 Some data issues in school-based surveys. Pp. 815-820 in Volume II: 1994 Proceedings of the Section on Survey Research Methods. Alexandria, Va.: American Statistical Association.
Kilss, W., and W. Alvey, eds.
  1985 Record Linkage Techniques: Proceedings of the Workshop on Exact Matching Methodologies. Washington, D.C.: U.S. Department of the Treasury.
King, K.E., and S. Kaufman
  1994 Estimation issues related to the student component of SASS. Pp. 1111-1115 in 1994 Proceedings of the Section on Survey Research Methods. Alexandria, Va.: American Statistical Association.
Kruskal, W.H., ed.
  1982 The Social Sciences: Their Nature and Use. Chicago: University of Chicago Press.
Ligon, G.
  1998 Success Finder Mapper. Available at: www.evalusoft.com.
McCabe, B., and J. Harkness
  1998 Towards a Manual of European Background Variable: Part I, Appendix II: Report on Background Variables in a Comparative Perspective. Mannheim, Germany: Zentrum fur Umfragen, Methoden und Analysen.
Nadis, S.
  1996 Computation cracks semantic barriers between data-bases. Science 272:1419.
National Research Council
  1992 Teacher Supply, Demand, and Quality: Policy Issues, Models, and Data-bases, E.E. Boe and D.M. Gilford, eds. Committee on National Statistics. Washington, D.C.: National Academy Press.
  1995 Integrating Federal Statistics on Children. Board on Children and Families and Committee on National Statistics. Washington, D.C.: National Academy Press.
  1999 Grading the Nation's Report Card: Evaluating NAEP and Transforming the Assessment of Educational Progress, J.W. Pellegrino, L.R. Jones, and K.J. Mitchell, eds. Committee on the Evaluation of National and State Assessments of Educational Progress, Board on Testing and Assessment. Washington, D.C.: National Academy Press.
Pallas, A.
  1995 Federal data on educational attainment and the transition to work. Pp. 122-155 in Integrating Federal Statistics on Children: Report of a Workshop. Board on Children and Families and Committee on National Statistics, National Research Council. Washington, D.C.: National Academy Press.

Rosen, S., ed.
  1974 Final Report of the Panel on Manpower Training Evaluation: The Use of Social Security Earnings Data for Assessing the Impact of Manpower Training Programs. Washington, D.C.: National Academy of Sciences.
Scheuren, F.
  1995 Administrative Record Opportunities in Educational Survey Research. Report prepared for the National Center on Educational Statistics. Washington, D.C.: George Washington University.
Spencer, B.D.
  1980 Conducting benefit cost analysis. Pp. 38-59 in R.W. Pearson and R.F. Boruch, eds. Lecture Notes in Statistics: Survey Research Designs. New York: Springer-Verlag.
Taylor, C.
  1997 The Effect of School Expenditures on the Achievement of High School Students: Evidence from NELS and the CCD. Paper presented at the American Educational Research Association annual meeting, Chicago.
Terhanian, G.
  1997 School Policies and Practices, Student Proficiency, and Racial Differences in Proficiency: Evidence from a Multilevel Analysis of the Reading Proficiency of 4th Graders from Pennsylvania and New York. Paper presented at the Summer Data Conference of the National Center for Education Statistics, Washington, D.C. Homepage. Available at: http://dolphin.upenn.edu/~terhania.
Tufte, E.R.
  1990 Envisioning Information. Cheshire, Conn.: Graphics Press.
U.S. General Accounting Office
  1986a Computer Matching: Assessing Its Costs and Benefits. Washington, D.C.: U.S. General Accounting Office.
  1986b Computer Matching: Factors Influencing the Agency Decision Making Process. Washington, D.C.: U.S. General Accounting Office.
Vogel, G.
  1997 Publishing sensitive data: Who calls the shots? Science 276:523-526.
Wenglinsky, H.A.
  1997 When Money Matters: How Educational Expenditures Improve Student Performance and When They Don't. Princeton, N.J.: Policy Information Center, Educational Testing Service.
Williams, N.
  1997 How to get databases talking to one another. Science 275:301-330.
