Change and the 2020 Census: Not Whether But How

FORTY YEARS AGO, THE U.S. CENSUS BUREAU mailed questionnaires for the 1970 census to households “in the larger metropolitan areas and some surrounding counties”—covering roughly 60 percent of the population—and asked that the households return the completed form by mail (U.S. Census Bureau, 1976:1-6). Structured in 393 local offices and coordinated by staff in 13 regional (area) offices, a large and temporary workforce of 193,000 employees was assembled for two basic but massive tasks (U.S. Census Bureau, 1976:1-52, 5-1, 5-4).1 First, the temporary staff conducted census operations and interviewed respondents outside the dense urban areas—covering less than half of the total population but the vast majority of the land area of the nation. Second, the staff carried out the costly operation of knocking on doors and following up with households that did not return the mail questionnaire. Completed questionnaires were processed using the Bureau’s Film Optical Sensing Device for Input to Computers (FOSDIC) system, preparing the data for analysis and tabulation.

The 1970 census is instructive because—in broad outlines—it has provided the basic model of U.S. census-taking for every decennial count that has followed. The numbers of offices and staff have changed in later censuses, and some of the underlying technology has changed, including FOSDIC’s microfilm sensing giving way to optical character recognition in 2000 and 2010. The fraction of the population counted principally by mailout/mailback of questionnaires has increased, although temporary field staff are still the first point of contact—either for delivery of questionnaires

1 The number of offices does not include 6 local offices and 1 temporary regional office in Puerto Rico, and the 193,000 staffing figure does not include Alaska or Hawaii (U.S. Census Bureau, 1976:1-52, 5-4).
to be returned by mail or for direct interviewing—for people across much of the land area of the nation.

Although the methodological basics of the U.S. census have remained the same over those 40 years, the cost of the census decidedly has not. Since 1970, the per-housing-unit cost of the census has increased by at least 30 percent from decade to decade (and typically more); even with the Census Bureau’s announcement that the 2010 census will return $1.6 billion to the treasury, the per-housing-unit cost of the 2010 census is likely to exceed $100, relative to the comparable 1970 figure of $17 per unit (National Research Council, 2010:Table 2-2).2 To be sure, a contributor to the cost increases has been the addition of specialized operations to increase the coverage and accuracy of the count—to the extent that the 2000 census largely curbed historical trends of undercounting some demographic groups and, indeed, may have overcounted some (National Research Council, 2010:29–30). That said, the cost of American census-taking has reached the point of being unsustainable in an era when unnecessary government spending is coming under increased scrutiny; the Census Bureau is certain to face pressure to do far better than the projections of straight-line increases in costs that have accompanied earlier censuses.

At this writing, the tremendously complex and high-stakes civic exercise that is the 2010 census is still very much in operation. Even with the release of state-level population counts by the statutory deadline of December 31, 2010 (“within 9 months after the census date”; 13 USC §141[b]),3 work will continue toward the release of the detailed, census-block–level data for purposes of legislative districting by the end of March 2011 (13 USC §141[c]).
Indeed, even some field interviewing work for the Census Coverage Measurement (CCM) that will provide basic quality metrics for the census continues into the early months of 2011. However, although the 2010 census continues, it is not too early to turn attention to the census in 2020. It is both appropriate and essential that work and research begin very early in the 2010s, if the design of the 2020 census is to be more than an incremental tweak of the 2010 plan and if the 2020 census is to be more cost-effective than its predecessors.

2 Figures are based on conversions to 2009 dollars in National Research Council (2010:Table 2-2). The Census Bureau announced the $1.6 billion savings (relative to budgeted totals) in August 2010; see http://www.census.gov/newsroom/releases/archives/2010_census/cb10-cn70.html. Using $13.1 billion as the life-cycle cost for the 2010 census rather than $14.7 billion as in the cited table yields a per-household cost estimate of $102.42.

3 The Census Bureau announced the apportionment totals 10 days early, in a December 21 press event. Earlier that day, the secretary of commerce officially transmitted the results to the president as required by law; the president, in turn, transmitted the numbers and the corresponding allocation of seats to both houses of Congress during the first day of the 112th Congress on January 5, 2011.
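The arithmetic in footnote 2 can be checked directly. The sketch below assumes a national housing-unit count of roughly 127.9 million (approximately the 2010 inventory; the exact divisor is not stated in the text) and reproduces the quoted $102.42 figure from the $13.1 billion life-cycle cost:

```python
# Back-of-envelope check of footnote 2's per-housing-unit cost figure.
# The housing-unit count is an assumption (~127.9 million, roughly the
# 2010 inventory); only the $13.1 billion life-cycle cost is quoted in
# the text.
life_cycle_cost = 13.1e9   # 2010 census life-cycle cost, dollars
housing_units = 127.9e6    # assumed national housing-unit count

cost_per_unit = life_cycle_cost / housing_units
print(f"per-housing-unit cost: ${cost_per_unit:.2f}")
```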
A   THE PANEL AND THIS REPORT

Sponsored by the U.S. Census Bureau, the Panel to Review the 2010 Census has a broad charge to evaluate the methods and operations of the 2010 census with an eye toward designing and planning for a more cost-effective 2020 census. (The full statement of the charge is shown in Appendix A.) In our first year of operation, the panel has held five meetings with both public data-gathering sessions and deliberative sessions. In late fall 2009, the Census Bureau convened a series of informal brainstorming sessions on possible directions for 2020—on such topics as response options and coverage improvement—in which members of our panel participated along with other external experts and Census Bureau staff. Subsequently, small working groups of panel members held similar brainstorming sessions with Census Bureau staff on topics chosen by the panel, including automation of field processes. Between February and August 2010, panel members and staff conducted 58 site visits to local census offices, regional census centers, and data capture facilities in order to obtain information on current census operations with an eye toward future improvements; see Appendix B for a listing. A subgroup of the panel also visited the headquarters of Statistics Canada in Ottawa in May 2010 to discuss the use of the Internet for data collection in Canada’s 2006 and 2011 censuses, as well as the Statistics Canada approach to research and testing.

This first interim report is directed at the forward-looking part of our charge—general guidance for 2020 census planning—for two reasons alluded to in the introduction. First, there are important ways in which the 2010 census is still ongoing—neither the final census data nor the operational data needed to evaluate specific census functions are yet available—and so the actual “2010 evaluation” part of our charge is necessarily premature.
More fundamentally, the second reason for a forward-looking approach is that the early years of the decade are critical to the shape of a 2020 count that is meaningfully different from 2010. Change is difficult for any large organization, and confronting the swelling cost of census-taking is something that will require aggressive planning, research, development—and, to be clear, investment of resources—in a very short time frame. Importantly, the guidance in this report draws on the efforts and experience of several predecessor National Research Council panels. Our Panel to Review the 2010 Census effectively combines the functions of two expert panels sponsored by the Census Bureau to accompany the 2000 census: the Panel to Review the 2000 Census, tasked to observe the census in process, and the Panel on Research on Future Census Methods, tasked to evaluate the then-emerging plans for the 2010 census. Both of those panels’ final reports offer recommendations for later censuses that remain relevant for 2020 planning; in particular, the suggested directions in Reengineering the
2010 Census: Risks and Challenges (National Research Council, 2004b) on the Census Bureau’s geographic resources and its approach to developing the technical infrastructure of the census still apply directly to 2020. More recently, the immediate predecessor Panel on the Design of the 2010 Census Program of Evaluations and Experiments (CPEX) issued its final report, Envisioning the 2020 Census (National Research Council, 2010). Having offered its guidance on the design of the formal experiments and evaluations accompanying the 2010 census in its earlier reports, that panel’s final report deliberately addressed census research and evaluation in a much broader perspective. Consequently, our panel’s report serves to amplify and extend some of the themes from the CPEX panel’s study.

B   RESEARCH PLANS FOR THE 2020 CENSUS

At steps during the first year of our work, the panel reviewed initial suggestions by the Census Bureau for its research plan leading to the 2020 census. We have done so in our plenary meetings as well as in the working group sessions mentioned above. However, we cannot comment on the Bureau’s “final” version of its initial research plan for 2020 because it is not available to us. The intricacies of the federal budgeting process are such that some form of a research plan is factored into the Census Bureau’s budget submission for fiscal year 2012. However, as those submissions are not official in any sense until the administration formally proposes its budget in early 2011, the Census Bureau is not at liberty to discuss specific proposals with us or any other advisory group. We think it counterproductive to try to assess and speak about specifics in the research plan because we have no insight as to what details may or may not have made it into a final draft, not to mention what may or may not change in the plan during departmental and administration review.
Accordingly, this deliberately short report focuses on general principles, reserving discussion of specific proposals for future reports and interactions.

C   2020 DIRECTIONS: POSITIVE SIGNS, BUT FOCUS AND COMMITMENT NEEDED

With that caveat, our general assessment of the Bureau’s posture going into early 2020 census research and planning is that there are several good signs. The Census Bureau deserves credit for early moves that it has made in the direction of 2020 census planning. The Bureau’s expressed intent to create a parallel organizational directorate on 2020 while the existing 2010-focused directorate continues its work and its reinstatement of a core research and methodological directorate are both very positive signs. We are
also encouraged by the Bureau’s apparent intent to use smaller, more frequent experiments during the decade rather than rely principally on a small number of large-scale “test censuses” as in recent decennial cycles.4 At its December 2010 meeting, the panel also heard about the Bureau’s commitment to improve its cost accounting and its analytical cost modeling capabilities, both of which will be essential to informing the discussions of census cost and quality that await the decade. Finally, we think it is a positive sign that—as we have discussed broad sketches of a 2020 research strategy with the Census Bureau over the first year of the panel’s work—there appears to be agreement between the Bureau and the panel on the broad topic areas along which an effective and efficient 2020 census design should take shape.

The guidance on research offered by two predecessor National Research Council panels—one that reviewed plans for the 2010 census early in the last decade (National Research Council, 2004b) and a more recent one that looked ahead to 2020, having reviewed the experiments and evaluations planned for the 2010 census (National Research Council, 2010)—remains sound. We explicitly echo some of their points here and generally endorse their suggestions. However, we also share their concerns that early census planning efforts will founder if they lack a clear focus and strong organizational commitment. Accordingly, in this report, we are deliberately very sparing in our formal recommendations—reserving them to three main messages that are meant to suggest and cultivate a specific attitude toward 2020 census research.
First, we suggest that research and development energies be focused under four headings:

Recommendation 1: The Census Bureau should focus its research and development efforts on four priority topic areas, in order to achieve a lower-cost and high-quality 2020 census:

- Field Reengineering—applying modern operations engineering to census field data collection operations to make the deployment of staff and the processing of operational data more efficient;

- Response Options—emphasizing multiple modes of response to the census for both respondent convenience and

4 See National Research Council (2010:App. A) for a summary of major Census Bureau testing and experimentation activities between 1950 and 2010. From that summary, the National Research Council (2010:65–67) panel found it clear that “the Bureau used to be considerably more flexible” in its testing and that “small, targeted tests in selected sites used to be more frequent”; by comparison, in the 2000 and—particularly—2010 rounds of testing, “selected studies seem to have been chosen more based on the availability of testing ‘slots’” in large-scale tests than on research questions or operational concerns. As that panel noted, there is certainly value in large-scale tests or dress rehearsals to properly practice with new census machinery, but there are many more questions that can be answered through strategic use of smaller-scale research activities.
data quality, including provision for response via the Internet;

- Administrative Records—using records-based information to supplement and improve a wide variety of census operations; and

- Continuous Improvement of Geographic Resources—ensuring that the Census Bureau’s geographic databases, especially its Master Address File (MAF), are continually up to date and not dependent on once-a-decade overhauls.

We urge the Census Bureau to adopt a small number of focused goals. Individual research projects should be considered and conducted with reference to these priority areas; consideration should be given to how individual research efforts build on each other and contribute to an overall program of research within each topic area.

The key problems that we have observed in early iterations of the Bureau’s 2020 research plan are that—beyond identifying these broad priority topic areas—the Bureau’s plans have shown a lack of focus and a lack of commitment, and they have suffered somewhat from the “stovepipe” mentality that the Bureau’s new organizational approaches may help to break. It may be useful to elaborate on each of those phrases:

- By “lack of focus,” we mean that initial drafts of the research plan included dozens of specific projects, roughly falling under four main topic headings in one iteration but with little notion of how they contribute to that topic and how (or if) they build from one to the other. The point in laying out a research agenda is to provide some kind of direction toward an end result, outlining how specific research tasks shed light on the decisions and trade-offs that will ultimately need to be made in shaping the 2020 census; previous versions of the Bureau’s research plan appeared to try to overwhelm with the sheer number and range of activities, and lacked that sense of direction.
- By “lack of commitment,” we mean that the Bureau seems to have largely shied away from taking more than an exploratory position on these four priority areas—not wanting to be locked into any one design too early, which is understandable, but ultimately conveying a sort of half-heartedness about major changes in approach. The argument that “no one knows what X will look like in 2020”—in which, in varying discussions, X has been “the Internet,” “mail delivery,” “administrative records,” “commercial records,” “geography,” and others—is undeniably true. But that reasoning is dangerous if it is used to dispel or minimize research on future directions rather than as a challenge to work on those aspects of future technology and capability that can be studied and tested now—anticipating the kinds of capabilities that will be
available in commercial, off-the-shelf hardware and software closer to 2020.

- Finally, by the “stovepipe” mentality, we mean that—at least until the recent administrative changes began to circulate—2020 planning was being done on scarce resources by a very limited number of staff. The staff work was energetic and good and very useful for framing, but ultimately lacking because it was—and was presented as—a set of proposed activities done in isolation from other parts of the Bureau. Among other things, the draft research proposals for 2020 lacked any explicit connection or coordination with the formal research work being done in the 2010 CPEX program, the activities planned in the Geographic Support System initiative (about which more is said below), the American Community Survey (including the testing of Internet response to that survey) and other current surveys, and the Bureau’s economic directorate (which also makes use of Internet response and operational control systems).

Continuing to ask whether the Bureau should retool its field technical infrastructure or whether administrative records should play a role in the 2020 census is not the right approach; it seems to be grounded in the notion that a single fix or a single tweak in census approach will be sufficient to drive down 2020 census costs. We are convinced that no such single fix exists, and that the shape of the 2020 census will have to make use of work in all the priority research areas, in some measure, to materially change 2020 conduct. Accordingly, we recommend an aggressive, assertive posture toward research:

Recommendation 2: The Census Bureau should commit to implement, in the 2020 census, strategic changes in each of the four priority areas identified in Recommendation 1. The manner of implementing them should be guided by research on how each type of change may influence the trade-off between census accuracy and cost.
We think that this approach is the most effective way to build the research and evidentiary base for the 2020 census plan. The third and final central message of this report is meant as a practical way of underscoring and cultivating the kind of commitment to serious reengineering called for in Recommendation 2. We concur with the predecessor National Research Council (2010:43) panel that commitment to change can and should be helped by setting a bold goal that is “stark, ambitious, and public.” As the previous panel wrote, “it has become almost rote to include ‘containing cost’ as a goal of the decennial census” when what is needed is “meaningful reductions in per-household cost—through leveraging new technology and methodology—without impairing quality.”
We agree that a bold goal is crucial to motivating census research over the decade, and accordingly suggest a slight variant of the goal offered by the previous panel:

Recommendation 3: The Census Bureau should motivate its planning and reengineering for the 2020 census by setting a clear and publicly announced goal to reduce significantly (and not just contain) the inflation-adjusted per-housing-unit cost relative to 2010 census totals, while limiting the extent of gross and net census errors to levels consistent with both user needs and cost targets. This should take into account both overall national coverage errors and regional variations in them.

Quite deliberately, we phrase our recommendation in still-stark but more general terms than the previous panel, which urged (National Research Council, 2010:Rec. 2.1) the Bureau to plan for the 2020 census with the stated goal of holding per-household cost and national and major demographic group coverage errors to their 2000 census levels (not 2010). The previous panel’s report (National Research Council, 2010:Ch. 2) traces long-term trends in census cost and quality measures, noting the more than 600 percent increase in real-dollar per-household cost between the 1970 and 2010 censuses in contrast with much smaller relative gains in census accuracy (as measured by net census error). An earlier National Research Council (1995:55) panel devoted considerable attention to explaining the growth in census cost between the 1970 and 1990 censuses, finding itself unable to directly account for some three-fourths of the total increase.
That panel ultimately concluded that the increase was largely driven by the Census Bureau “pouring on resources in highly labor-intensive enumeration efforts to count every last person,” in response to demands for highly accurate small-area data, at the same time as public cooperation with the census dipped and measured net undercount actually increased from 1980 to 1990. Fifteen years later, the successor National Research Council (2010:39–40) panel concurred, noting a “steady accretion of coverage improvement programs” over the decades—all of which arguably have some value but few of which are subjected to extensive cost-benefit analysis, and none of which are cost-free. That panel observed that, looking ahead to 2020, the census is at a critical point at which additional spending on existing methods or adding still more coverage improvement layers “in an effort to reduce net undercoverage could conceivably add more error than it removes.”

We think that the research directions we suggest in this report are capable of achieving significant streamlining of effort and per-household cost reductions without tipping the balance to higher levels of census error. But we also think that it is premature to suggest specific totals or percentages as targets for 2020; setting those targets will depend critically on the raw and operational data from the 2010 census, the results of the 2010 census evaluation and Census Coverage Measurement programs—and on early, pilot research this decade.

D   FIELD REENGINEERING: NEED FOR MODERN OPERATIONS ENGINEERING

The priority research areas noted in Recommendation 1 are all important, but it is logical to start the discussion with the topic of field reengineering and the automation of field operations. We use the term “field reengineering” as a convenient shorthand, cognizant that the term is open to overly simplistic interpretations—“field” perhaps connoting a narrow focus on the moment-by-moment work of individual temporary enumerators and local staff, “reengineering” perhaps connoting a stringent restriction to the development of computer software or hardware systems. By “field reengineering,” we mean both of those individual threads and more: a fundamental evaluation of all major operations, with an eye toward optimization of effort and resources and improvement of cost-effectiveness.

If cost reduction—while maintaining quality—is to be a major focus of planning the 2020 census, then it follows that reexamining and streamlining field operations must be on the table. The largest component expenses of the modern decennial census are those that involve the mass hiring and deployment of temporary census workers. The most expensive single operation in modern censuses is Nonresponse Follow-up (NRFU): knocking on doors and otherwise attempting to collect information from households that do not return their questionnaire by mail or other means. Doing whatever is possible to contain the size of the NRFU workload is a key motivator for work on administrative records and response options, as discussed below. But, assuming that there will inevitably be a need for some substantial NRFU, finding ways to make it more cost-effective is important.
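The cost logic here can be made concrete with a toy model: NRFU field cost scales with the number of nonresponding households, the average number of visits per case, and the cost per visit. Every parameter value below is a hypothetical illustration, not a Census Bureau figure:

```python
# Hypothetical back-of-envelope model of NRFU cost drivers.
# All numeric values are illustrative assumptions, not Census Bureau figures.
def nrfu_cost(households: float, mail_response_rate: float,
              visits_per_case: float, cost_per_visit: float) -> float:
    """Estimated field cost of following up with nonresponding households."""
    nonresponding = households * (1.0 - mail_response_rate)
    return nonresponding * visits_per_case * cost_per_visit

baseline = nrfu_cost(130e6, 0.65, 3.0, 25.0)
improved = nrfu_cost(130e6, 0.70, 2.5, 25.0)  # better take-up, fewer visits
print(f"baseline: ${baseline / 1e9:.2f}B, improved: ${improved / 1e9:.2f}B")
```

Even this crude sketch shows why the two levers emphasized in the text compound: under these assumed numbers, a five-point gain in mail response combined with half a visit fewer per case cuts the modeled field cost by nearly 30 percent.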
Although NRFU is the single largest field operation, other major field operations also involve large-scale deployment of temporary staff: in support of the 2010 census, such operations included the complete Address Canvassing operation to verify address list entries in 2009, a series of operations to establish contact with and then count at group quarters (nonhousehold) locations, and the deployment of enumerators to either deposit questionnaires or conduct interviews in areas of the country where mail delivery of questionnaires was not feasible.

It is also appropriate to discuss field reengineering and automation first because they may be the most difficult for the Census Bureau to address, on three key levels. One is that true, systematic review of operations—close to approaching the basic ideas of census-taking from a blank-sheet or first-principles approach—is relatively new to the decennial census. The established and entrenched mechanics of census-taking, stemming from having used the same basic outline of operations for 40 years, breed a familiarity with normal routines; this familiarity, in turn, contributes to a culture in which “just-in-time” systems development and training are accepted, even though the risks are high and the costs substantial.

A second reason for the primacy of field reengineering is that it is cross-cutting and highly intertwined with the other three research areas. The technical systems that assign field staff must properly synchronize with systems for handling multiple response modes to the census form, the degree to which administrative records data may be used in census operations directly affects the scope of field operations and the level of follow-up necessary, and field systems are of little use if they do not reflect current and accurate geographic features and address information. Accordingly, a fresh approach to field automation can be difficult because the task is so large and extensive that it is not neatly compartmentalized into a single “project.”

But, arguably, the key difficulty in field reengineering is illustrated by the record of experience leading to the 2010 census. A more detailed account of systems development for 2010 must await future reports of the panel, in line with evaluating the systems that ultimately were used in the census. But for this first report, a brief summary suffices. The complication for approaching field reengineering in 2020 is that field automation—in the specific form of developing handheld computers for use in both NRFU and Address Canvassing—was a major plank in the Census Bureau’s plans for 2010. The Census Bureau assembled mobile computers using commercial, off-the-shelf products for preliminary testing in its 2004 and 2006 field tests.
As those tests continued, the Bureau moved toward issuance of a major contract to develop not only the handheld computers but also the operational control systems that manage the flow of information among census headquarters, regional and local census offices, and individual field staff. The five-year, $600 million Field Data Collection Automation (FDCA) contract was awarded to Harris Corporation in March 2006. Problems with use of the Harris-developed devices in spring and summer 2007, in Address Canvassing for the 2008 census dress rehearsal, began to be noted during that operation (e.g., U.S. Government Accountability Office, 2007) and into the fall. In early January 2008, online media broke the story that the Census Bureau had been advised in November 2007 that the handheld development was in sufficiently “serious trouble” that paper-based contingency operations should be immediately developed (Holmes, 2008). When the Census Bureau submitted a new, “final” set of requirements to its FDCA contractor in mid-January 2008, the resulting cost estimate prompted the Bureau and the U.S. Department of Commerce to assemble a task force to suggest options for the
FDCA work.5 Ultimately, the strategy chosen by the Bureau in 2008 was to abandon the use and development of the handhelds for all but the Address Canvassing operation—making the 2010 NRFU operation completely paper-based and putting the estimated total life-cycle cost of the 2010 census at roughly $14.5 billion.6

That the Census Bureau stumbled in field systems development—very visibly and expensively—in preparing for the 2010 census is a complication for 2020 because it may induce some skittishness about moving aggressively so early in the development for the next census. The high price tag of the collapse of the FDCA handhelds and the late switch to paper-based NRFU operations may also make it more difficult to sell the idea of field reengineering as a short-term investment to save money with an efficient and effective census in the long run. But we think it is a wise investment, and that it is key to avoiding a 2020 census that is merely an incremental tweak on 2010; having stumbled in 2010 systems development highlights the importance of trying again and succeeding where the previous efforts foundered.

Our predecessor Panel on Research on Future Census Methods (National Research Council, 2004b:172–173) sketched out the major stages of successful system reengineering efforts, based on “past experience with reengineering and upgrading information technology operations within corporations and government agencies.” In our assessment, these steps remain the right prescription, and we echo and endorse them here:

- Define a “logical architecture” or “business process” model: A “logical architecture” is a blueprint of the workflow and information flow of a particular enterprise—the full set of activities and functions and the informational dependencies that link them.
As described by the earlier panel (National Research Council, 2004b:175), the key attribute of the logical architecture model is that it is focused on function and purpose; it is not a timetable that assigns completion times to individual functions, and it should not be based on (or constrained by) existing organizational boundaries. The baseline logical architecture model becomes just that: an “as-was” model that serves as the basis for redesign or replacement.

5 Our predecessor Panel on Research on Future Census Methods—whose final report, issued in 2004, we reference in this section—forecast the problems that burst forth in 2008, stating that the handheld development effort would go awry without early attention to requirements and functionality rather than specific forms of devices: “A second risk inherent with the [handheld computer] technology lies in making the decision to purchase too early and without fully specified requirements, resulting in the possible selection of obsolete or inadequate devices” (National Research Council, 2004b:7).

6 For additional information on the handheld development portion of the FDCA contracts, see, e.g., U.S. Department of Commerce, Office of Inspector General (2006) and U.S. Government Accountability Office (2008).
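The “logical architecture” idea described above—activities linked by informational dependencies, independent of any timetable or organizational chart—maps naturally onto a directed graph. The sketch below is a minimal illustration with invented activity names, not the Bureau’s actual workflow:

```python
# Minimal sketch of a "logical architecture" as a dependency graph.
# Activity names are invented for illustration; each entry maps an
# activity to the set of activities whose outputs it consumes.
from graphlib import TopologicalSorter

dependencies = {
    "print questionnaires":  {"build address list"},
    "mailout":               {"print questionnaires"},
    "data capture":          {"mailout"},
    "nonresponse follow-up": {"data capture"},
    "tabulation":            {"data capture", "nonresponse follow-up"},
}

# A valid execution order: every activity appears after its inputs.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

A graph like this is function-focused in exactly the sense the panel describes: it says nothing about schedules or organizational units, but it makes every informational dependency explicit and so can serve as an “as-was” baseline for redesign.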
enumerators and field staff. Yet the underlying processes cannot be reengineered if they are not articulated and assessed up front.

E RESPONSE OPTIONS: PROMOTING EASIER AND LESS EXPENSIVE REPLIES

Field reengineering may be the most difficult of the topic areas for the Census Bureau to handle, for a variety of reasons, but arguably the topic area in which timidity of approach could be most costly is that of response options—and census response via the Internet in particular. Guiding respondents to submit their census information in an inexpensive and computer-ready format is critical to curbing the cost of moving and processing paper. To be sure, obtaining a substantial percentage of respondent take-up via the Internet is a challenging task; among other things, care must be taken to ensure that questions asked via paper or the Internet (or through other modes) share common structures and yield the same information regardless of mode, and the Census Bureau should tap the developing literature on building Internet participation in census and survey contexts. But Internet response must not be treated as a far-off or unattainable goal either, because it is likely a key contributor to a more cost-effective census.

Again, a full examination of the decisions made for the 2010 census awaits future (and more evaluative) reports, but a brief summary is useful here. Our predecessor National Research Council (2010) panel discussed the chronology in more detail in an appendix to its interim report (which is, in turn, reprinted as part of the 2010 volume). Although the 2000 census included an Internet response option (albeit an unpublicized one) and Internet response was advocated in very early planning documents for 2010, the Census Bureau reversed course in mid-decade and announced in summer 2006 that online response would not be permitted in the 2010 census.
The primary arguments cited by then-Census Bureau director Louis Kincannon (2006) and by a Bureau-commissioned report from the MITRE Corporation (2007) included intense worries about security (e.g., a "phishing" site set up to resemble the census) that could negatively affect the response rate, as well as concerns from pilot testing that offering Internet response as an option did not significantly increase overall response rates. Acknowledging the Bureau's stance, the previous panel pointedly remarked that "the panel does not second-guess that decision, but we think that it is essential to have a full and rigorous test of Internet methodologies in the 2010 CPEX" (National Research Council, 2010:206). No such major test was included in the formal experiments of the 2010 census, although the Bureau did announce a small-scale "Internet measurement re-interview study, focused on
how differently people answer questions on a web instrument from a paper questionnaire" (Groves, 2009:7) as a late CPEX addition.

Based on initial discussions of 2020 planning, we are heartened by signs of commitment on the Bureau's part to exploring Internet response, both in preliminary testing for 2020 and in regular response to the American Community Survey. However, as with field reengineering, we suggest that response modes are another area in which top-level commitment and championship are critical to success. It is particularly important that the Census Bureau not continually fall back on arguments along the lines of "no one knows what the Internet will look like in 2020"—certainly a true statement, but one that misses the broader point. The argument would be on point if the goal were the polished implementation of a census questionnaire in a particular computer markup language or on a specific computer platform—but it is not. Rather, the goal is to investigate important factors that are not bound to specific platforms: mode effects in response (e.g., whether different demographic groups respond differently or more quickly to an electronic questionnaire than to a paper version); the effectiveness of various cues or prompts to encourage respondents to adopt particular response modes; and the emergence of standards for the security of online transactions. Decennial census planners should pursue research projects that elucidate mode effects and that guide respondents toward lower-processing-cost response options, such as online response. But—importantly—they should not go into them trying to reinvent the wheel. Arguably, the most important immediate research and development task in this area is to track and learn from the experiences of other countries that have implemented online response in the 2010 round of censuses.
In particular, the soon-to-unfold case study of the 2011 Canadian census is a vital one. Conducted every five years by Statistics Canada, the Canadian census permitted online response in 2006 and achieved roughly a 20 percent Internet response rate—including considerably higher-than-anticipated Internet take-up rates in more rural provinces, where planners had not expected heavy Internet saturation (National Research Council, 2010:294–295). Statistics Canada hopes to double the Internet take-up rate in 2011; to do so, it is using a very aggressive “wave methodology” approach, under which most Canadian households will not receive a mailed questionnaire as their first contact by the census. Instead, some 60 percent of Canadian households—in areas where models suggest a high probability of Internet response—will receive only a letter (with a URL and Internet response instructions, including an ID keyed to the household address) in the first wave. Indeed, in subsequent reminder waves of contact, at least one more letter, telephone prompt, or postcard (generic, without Internet log-in information) will be tried before paper questionnaires are mailed en masse. The initial mailings (letters and
postcards) will include a telephone number so that households can request a paper questionnaire if desired. The other 40 percent of Canadian households are roughly evenly divided into two groups: one that will receive the census questionnaire as the initial mailing and another (in more rural locations) where conventional questionnaire drop-off by enumerators will be performed. Côté and Laroche (2009) provide a basic overview of the 2006 response option and the plan for 2011. The Canadian experience with strongly "pushing" some response options will be a useful one for the U.S. Census Bureau to monitor closely. Likewise, the Internet take-up rates and approaches to mobilizing online response in other national censuses, such as the 2011 United Kingdom census, will merit examination.

In addition to examining the results of the limited Internet reinterview study that was added to the 2010 CPEX program, the Census Bureau should also actively use the ACS as a testbed for decennial census methods. The ACS is already a multimode survey (mail, phone, and personal interview), and 2020 census planners should look to the emerging online response option for the ACS for guidance on how best to use and promote response modes in the decennial census. Clearly, the ACS is a more complex survey instrument than the short-form census, but the national scale of the ACS, its use of multiple data collection modes, and its overlap with census short-form content make it an important tool for census testing, including the insertion of questions in a "methods panel" portion of the ACS sample. Experience from elsewhere in the Census Bureau (Internet response is permitted in the Bureau's economic censuses and surveys) is also useful to study and port over, as appropriate, to the decennial census context.
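The Canadian "wave methodology" amounts to a simple decision rule plus an escalating contact schedule. The sketch below is illustrative only; the threshold, strategy names, and wave sequence are hypothetical stand-ins for whatever models and rules Statistics Canada actually uses:

```python
def first_contact(p_internet, rural):
    """Choose a household's initial contact, loosely mirroring the 2011
    Canadian plan described above (the 0.5 threshold is hypothetical)."""
    if rural:
        return "enumerator_dropoff"    # conventional questionnaire drop-off
    if p_internet >= 0.5:
        return "letter_with_url"       # URL + household ID, no paper form
    return "paper_questionnaire"       # questionnaire as the initial mailing

def contact_waves(strategy):
    """Escalating reminder waves before paper forms are mailed en masse."""
    if strategy == "letter_with_url":
        return ["letter_with_url", "reminder_letter", "reminder_postcard",
                "paper_questionnaire"]
    return [strategy]
```

The design choice worth noting is that paper is the terminal wave, not the first: households predicted likely to respond online never receive a questionnaire unless the cheaper contacts fail.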
F ADMINISTRATIVE RECORDS: SUPPLEMENTING MULTIPLE CENSUS OPERATIONS A significant wild card in planning for the 2020 census is the potential role of administrative data—records compiled from other agencies of federal, state, tribal, or local governments, as well as records available from commercial sources. In this research area, the main challenge for census planners is building a business case for the use of such records in a wide variety of census operations—and thus overcoming some historical expectations for the role of administrative data in the census—to permit informed decisions about the extent of records usage in 2020. To support the Administrative Records Experiment 2000 (AREX 2000) that accompanied the 2000 census—the Census Bureau’s first foray into the use of records in a major census test—the Bureau constructed the first incarnation of its Statistical Administrative Records System (StARS) database
using 1999-vintage data from federal agencies. A major challenge of AREX 2000 was the assembly, linkage, and unduplication of the StARS data, and a major focus of the experiment was the potential utility of administrative records as a replacement for the census process—to wit, the concept of an "administrative records census" that has historically driven consideration of the topic. To that end, AREX 2000 zeroed in on two sites for detailed comparison of StARS and census data (Baltimore City and County, Maryland, and Douglas, El Paso, and Jefferson Counties, Colorado). The results of AREX 2000 are summarized by Judson and Bye (2003) and related evaluation reports.

Having successfully built StARS, the Census Bureau decided to continue the work, formally posting notice (pursuant to requirements of the Privacy Act of 1974, 5 USC § 552a) of the establishment of StARS in a January 2000 Federal Register notice (65 FR 3203). The original notice indicated that StARS "will contain personally identifiable information from six national administrative federal programs" obtained from six federal agencies—"the Internal Revenue Service, Social Security Administration, Health Care Financing Administration, Selective Service System, Department of Housing and Urban Development, and the Indian Health Services." The notice also suggested that "compatible data may also be sought from selected state agencies, if available." Current incarnations of StARS rely on seven data sources from federal government agencies; the most prominent of the underlying sources is person-level information extracted from Internal Revenue Service (IRS) returns.
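The assembly, linkage, and unduplication step that a StARS-like database requires can be illustrated in miniature. This sketch merges person records from two hypothetical source files into one unduplicated roster, keeping one entry per identifier and tracking which sources contributed it; the field names and matching-on-a-single-key assumption are invented for illustration, not the Bureau's actual schema or linkage methodology:

```python
def combine_sources(sources):
    """Merge person-level records from several administrative files into
    one unduplicated roster, noting each person's contributing sources."""
    roster = {}
    for source_name, records in sources.items():
        for rec in records:
            # First occurrence of an id creates the roster entry;
            # later occurrences only add to its source set.
            entry = roster.setdefault(rec["id"], {"sources": set(), **rec})
            entry["sources"].add(source_name)
    return roster

# Invented example data: "A1" appears in both files and is unduplicated.
sources = {
    "irs": [{"id": "A1", "addr": "12 Oak St"}, {"id": "B2", "addr": "9 Elm"}],
    "ssa": [{"id": "A1", "addr": "12 Oak St"}, {"id": "C3", "addr": "4 Pine"}],
}
roster = combine_sources(sources)
```

Real linkage is far harder than this keyed merge (names, addresses, and dates must be reconciled probabilistically), which is exactly why the AREX 2000 assembly work was a major undertaking.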
Recent revisions and amendments to the regulatory notice in March 2009 (74 FR 12384) and October 2010 (75 FR 66061) have suggested an eventual wider scope for StARS, with the October 2010 notice indicating intent to obtain administrative record files from eight cabinet-level departments of the federal government and four other agencies, while "comparable data may also be sought from State agencies and commercial sources."7

To date, an important characteristic of the Bureau's StARS database is that it does not exist as a "living," ongoing entity. Rather, it is rebuilt anew each year, using new vintages of the underlying source data files, intended to match the March/April reference period of the decennial census to the greatest extent possible. Consequently, year-to-year dynamics in the database are as yet unexplored (save for comparison of aggregate record counts to see how "close" in size the compiled StARS database is relative to the Census Bureau's national-level intercensal population estimates). As an ongoing research enterprise, work on a continuous administrative records file would be extremely useful—permitting study of the evolution of the records over time and shedding light on undercoverage and overcoverage of persons and households in both the census and the administrative data themselves. However, given the orientation of this report toward important first steps, our discussion below assumes work within the current StARS framework of regular rebuilding.

As of late 2009, the Bureau plans to conduct a full matching study of the 2010 census results to the StARS database, as an addition to the CPEX program. When first announced, the study was characterized as "mount[ing] a post-hoc administrative records census, using administrative records available to the Census Bureau" (Groves, 2009:7). Later, the concept of the study was suggested in budget submissions for fiscal year 2011; one of several initial projects in an administrative data initiative throughout the federal statistical system is "using administrative records to simulate the 2010 Census in order to thoroughly examine and document the coverage and quality of major governmental and commercial administrative record sets" (U.S. Office of Management and Budget, 2010:317). In its recent reactivation of a research and methodology directorate, the Bureau has also signaled an intent to make the study of administrative data a high priority, creating a new office for administrative records research within the research directorate. If executed fully, the proposed StARS–2010 census matching study—and ongoing Census Bureau research on administrative data quality and uses—is much more than an ambitious scaling-up of the AREX 2000 work. The study is the pivotal research activity in the area of administrative records and should serve as a critical proving ground; we enthusiastically support its continuance.

7 Specifically, the sources named in the notice are "agencies including, the Departments of Agriculture, Education, Health and Human Services, Homeland Security, Housing and Urban Development, Labor, Treasury, Veterans Affairs, and from the Office of Personnel Management, the Social Security Administration, the Selective Service System, and the U.S. Postal Service" (75 FR 66062).
However, the key point that we make in this area—consistent with our "not whether but how" guidance—is that the Bureau should resist the temptation to stop at the question of national-level coverage of StARS relative to the census. The question of whether a complete "administrative records census" is possible—as a replacement for the census—is an interesting one, but it has too often been the beginning and the end of discussions; that question is no longer the most important one (if it ever was), nor is it arguably the most interesting. We encourage the Bureau to make full use of its matched records–census files to explore the use of administrative data in a supplementary role across a wide variety of census operations. In particular, roles for administrative data as a supplementary resource to NRFU operations should be explored; as we discuss in the next section, work with administrative records should also be a key part of assessing and upgrading the Bureau's geographic resources—whether as a source of address updates throughout the decade or as a way to identify areas that may require more (or less) intensive precensus address canvassing.
However imprecise they may be—and caveated as the results may need to be—the Bureau should also match StARS or other administrative data to the operational data from the 2010 census. Doing so would finally move toward empirical answers to important questions about possible roles of administrative data: for instance, whether administrative data might be a recourse for reducing the number of visits made by enumerators during NRFU (e.g., resorting to the records data after three contact attempts rather than six) or how they compare to data obtained from proxy respondents, such as neighbors or landlords of absent householders. Use of administrative data in a true simulation of the 2010 count, compared with time-stamped data on the return of census questionnaires, may also suggest diagnostic measures that could be supplied to local and regional census managers to target staff and resources during field operations. Matched administrative records and census data would also facilitate necessary study of data quality from both sources, including the accuracy of race and Hispanic-origin data in administrative records and the degree of correspondence between "household" membership in administrative data (persons affiliated with a particular address) and the census "usual residence" concept.

Work on the administrative records matching study should contribute to the development of a business case for wider access to and use of administrative data, to inform final decisions on the use of the data. This business case includes both a utility side—a pure cost–benefit articulation—and an acceptability side. On the utility side, the administrative records simulation should permit cost modeling, for instance of the potential cost impact of resorting to records at different phases of NRFU.
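The cost-modeling question raised above, what would be saved by resorting to records after three contact attempts rather than six, can be framed as a simple expected-visits calculation. All figures below are hypothetical illustrations, not Census Bureau estimates:

```python
def expected_visits(p_resolve, max_visits):
    """Expected enumerator visits to one household when each visit resolves
    the case with probability p_resolve, capped at max_visits (after which
    the case would fall back to administrative records)."""
    total, p_unresolved = 0.0, 1.0
    for _ in range(max_visits):
        total += p_unresolved           # a visit occurs only if still unresolved
        p_unresolved *= (1 - p_resolve)
    return total

def fieldwork_cost(n_households, p_resolve, max_visits, cost_per_visit):
    """Total NRFU fieldwork cost under a given visit cap."""
    return n_households * expected_visits(p_resolve, max_visits) * cost_per_visit

# Hypothetical inputs: 1M NRFU households, 40% chance a visit resolves a
# case, $25 per visit; compare a six-visit cap against a three-visit cap.
cost_6 = fieldwork_cost(1_000_000, 0.4, 6, 25.0)
cost_3 = fieldwork_cost(1_000_000, 0.4, 3, 25.0)
```

A real analysis would add the quality cost of substituting records for an interview, which is exactly why the matched records–census files matter: they would supply empirical estimates of both sides of that trade-off.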
It also speaks to the quality of the data; a study that stops at coarse demographic-group measures and does not investigate the quality of records data on characteristics (rather than just counts) would be unfortunate. The utility side of the business case is arguably more important for early research work than the acceptability side, but the acceptability side must also be addressed. By the acceptability side, we mean studying whether the respondent public, census stakeholders, and Congress (as the ultimate source of direction for conducting the census) will accept the wider use of administrative data in the census (and for which purposes). This includes assessing general public sensitivity to providing private information (e.g., the housing tenure question in the 2010 census on whether a home is rented or owned, free and clear or with a mortgage) in a census or survey context compared to drawing that information from records sources. It also includes respondents' reactions to specific questionnaire or mailing-package wording and cues that suggest the risks or benefits of comparing census returns with other data sources. From a technical standpoint, it also means documenting the effectiveness of the Bureau's data-handling standards. At present, a critical step in StARS assembly is the replacement of true personalized identifiers, like Social Security
numbers, with a generalized Protected Identification Key (PIK); contracting with external users to deliberately try to "break" the Bureau's identifiability safeguards (and correcting any detected shortcomings) would bolster the security case.

A clear concern moving forward in administrative records work is refining the mix of data sources that are compiled and combined into a StARS-like database. The current StARS relies heavily on IRS tax data. The IRS data may be very good in terms of coverage, but their use necessarily raises logistical and operational concerns, including potential impacts on response and goodwill toward the census from association with tax authorities,8 as well as the regulatory clashes between the privacy protections nested in Titles 13 (Census) and 26 (Internal Revenue) of the U.S. Code. To that end, the Bureau should finish work that it has started on outlining a full matrix of possible data sources for StARS, including state and local government resources as well as commercial files. The cost and quality (for generating data on characteristics) of the current federal-level StARS relative to one or more non-IRS alternatives should be examined in detail. The Census Bureau should also consider the quality and accessibility of data from sources beyond federal agency contributors as they pertain to the group quarters (GQ) population—people living in such places as college dormitories, correctional facilities, health care facilities, and military installations. The concept of a GQ-focused StARS built from facility or institutional records should be explored as a supplement to the traditional collection of data through distribution of questionnaires at large GQ facilities.
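The substitution of a generalized key for true identifiers, as in the PIK step described above, can be sketched with a keyed hash. The Bureau's actual PIK assignment is a record-linkage process rather than a bare hash, so this is only an illustration of the irreversible-substitution idea; the key and truncation length are invented:

```python
import hashlib
import hmac

# Hypothetical secret held internally by the agency; it never travels
# with the data, so hashes cannot be reproduced (or reversed) outside.
SECRET_KEY = b"held-internally-by-the-agency"

def protected_key(ssn: str) -> str:
    """Replace a true identifier with a deterministic but non-reversible
    token: the same input always yields the same token (so records can
    still be linked), while the original number cannot be recovered."""
    return hmac.new(SECRET_KEY, ssn.encode(), hashlib.sha256).hexdigest()[:16]

pik = protected_key("123-45-6789")
```

The determinism is what preserves linkability across files; the keyed construction is what distinguishes this from a plain hash, which an attacker could defeat by hashing all possible nine-digit numbers.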
G GEOGRAPHIC RESOURCES: MEASURING QUALITY AND UPDATING CONTINUOUSLY

As one of our predecessor National Research Council (2004b:57) panels observed, "a decennial census is fundamentally an exercise in geography"—its core constitutional mandate is to realign the nation's electoral geography, and its final data spotlight the nation's civic geography, describing "how and where the American public lives and how the characteristics of small geographic areas and population groups have changed with time." Accordingly, another National Research Council (2004a:57) panel concluded, without exaggeration, that the quality of the Census Bureau's geographic resources—

8 On the significance of these concerns, as in the development of Internet response options, comparison with the experience of other national statistics offices—particularly Canada—could be instructive. The 2006 Canadian census long-form sample adopted the approach of other Statistics Canada surveys of letting respondents check a box to permit Statistics Canada to use income tax returns to fill in questions on income. In all, 82.4 percent of long-form respondents chose the tax option (Statistics Canada, 2008:9), with no deleterious effects.
in particular, the accuracy of its address list—"may be the most important factor in determining the overall accuracy of a decennial census." This will continue to be true of the 2020 census, regardless of its eventual shape—any operational or methodological improvements are ultimately for naught if census data cannot be accurately linked to specific geographic locations and cross-checked and tabulated accurately.

In the 2000s, the Census Bureau undertook an eight-year MAF/TIGER Enhancements Program (MTEP), intended to upgrade both the Bureau's Master Address File (MAF) and its Topologically Integrated Geographic Encoding and Referencing System (TIGER) geographic database. The centerpiece activity of MTEP, in turn, was the MAF/TIGER Accuracy Improvement Project (MTAIP)—a major contract issued to Harris Corporation (which later won the FDCA contract, described above) in June 2002 to realign the county-level TIGER files and improve the locational accuracy of streets and other features. Although revolutionary when developed in the mid-1980s, both the database structure and the point, line, and polygon quality of the TIGER files had become dated by the 2000 census. Our predecessor panel on 2010 census planning (National Research Council, 2004b:84) strongly echoed the need for an overhaul of TIGER but cautioned that the MTEP—nominally meant to improve both of the Bureau's core geographic resources, MAF and TIGER—had an unmistakably "TIGER-centric feel," with other components of the MTEP "seem[ing] to speak to the MAF largely as it inherits its quality from TIGER" and not materially improving the MAF in its own right.
That panel took strong exception to language in extant Census Bureau planning documents for 2010 that signaled an intent to wait for a complete Address Canvassing operation in 2009 before seriously working on improving MAF quality (aside from periodic updates from Postal Service data; see National Research Council, 2004b:88). That panel also expressed concern that some supporting and later-stage objectives of the MTEP were ill specified or unspecified—among them the MTEP objective on quality metrics, to document the quality of MAF/TIGER information and identify areas in need of corrective action (National Research Council, 2004b:77). Ultimately, the Bureau proceeded with a complete Address Canvassing operation—sending field enumerators to every block to verify address information and collect geographic coordinates—in the one 2010 census operation that was able to make use of handheld computers.

In the early drive toward the 2020 census, the Census Bureau has expressed its intent to make upgrades to its geographic resources a strong early focus. Typically, the "geographic support" account in the Census Bureau's budget covers regular maintenance of the main components of the Census Bureau's Geographic Support System (GSS): the MAF and TIGER databases. These maintenance activities include regular updates of the MAF through the U.S. Postal Service's Delivery Sequence Files (DSF) and the
annual Boundary and Annexation Survey, which gathers and updates boundary information for local governments (and changes in their legal status). In its fiscal year 2011 budget request (U.S. Census Bureau, 2010), the Bureau seeks an additional $26.3 million (over the base $42.3 million request) in its geographic support account to kick off what has been dubbed the GSS Initiative. The budget request summarizes the initiative simply (U.S. Census Bureau, 2010:CEN-191):

The [initiative] supports improved address coverage, continual updating of positionally accurate road and other related spatial data, and enhanced quality measures of ongoing geographic programs. By focusing on activities that improve the [MAF] while maintaining and enhancing the spatial infrastructure that makes census and survey work possible, this initiative represents the next phase of geographic support after the MAF/TIGER Enhancement Program (MTEP).

Census Bureau staff also described the initiative at the panel's March 2010 meeting. Consistent with our predecessor National Research Council (2004b) panel, we generally support the aims of the Bureau's GSS Initiative; because the previous decade's development work was heavily TIGER-centric, we think it appropriate that the Bureau take a more balanced, closer-to-MAF-centric posture toward its geographic resources leading up to 2020. In particular, we welcome the expressed intention of moving toward continuous improvement of geographic resources over the whole decade, rather than gambling too heavily on one-time operations like the 2009 Address Canvassing round or the comprehensive mid-2000s TIGER realignment work. That said, we support the Bureau's GSS Initiative work with a significant catch: the Bureau's geographic work early in the decade should include serious attention to quality metrics for both MAF and TIGER.
The quality metrics and evaluation plank of the previous decade's MTEP slate never really materialized, and assertions that the Bureau's MAF represents a gold standard among address lists are no longer adequate or compelling. An important part of continuous improvement is being able to provide some manner of hard, quantitative information on how good MAF and TIGER are at any particular moment. Work in this area should include regular fieldwork, perhaps making use of the Bureau's ongoing corps of interviewers who collect information for regular demographic surveys, and may include systematic collection of GPS-accurate map spot and line-feature readings for comparison with TIGER, as well as comparison of MAF and TIGER content with comparable information from commercial and other sources (e.g., utility records or local conversions of rural route and other addresses to conventional city-style addresses for 9-1-1 location). In general, we suggest that an important research priority for the Census Bureau as it exits the 2010 census is to aggressively mine and probe its
current MAF. This includes ties to 2010 census operational data—the Bureau's knowledge of information added in the full Address Canvassing operation and in late field operations like the Vacant/Delete Check, as well as its knowledge of census mailings returned as "undeliverable as addressed." An earlier National Research Council (2004a) panel noted that evaluating the MAF and suggesting operational improvements were severely complicated because the structure of the Bureau's geographic sources did not readily allow the unique contributions of individual operations (e.g., the Local Update of Census Addresses returns suggested by local and tribal governments or the regular refreshes from Postal Service data) to be disentangled and compared. Ideally, the 2020 MAF/TIGER structure will be more amenable to reconstructing such operational histories for individual addresses or street features; accurate cost–benefit assessment of geographic support operations for 2020 depends vitally on the collection and analysis of these kinds of metadata.

The importance of vigorous, intensive analysis of the quality of MAF/TIGER cannot be overstated. It is tempting, but misguided, to minimize such work as simply clerical or as an exercise in fine-tuning cartographic accuracy. Spatial data quality is inextricably linked to census quality and, to the greatest extent possible, both the spatial data in MAF/TIGER and census operational data demand study at fine-grained geographic levels, not just national or other high-level aggregates. Phraseology that we invoked above is applicable here: analysis is a key first step, no matter how imprecise the source data might be or how caveated the results must be.
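A positional-accuracy metric of the kind suggested above, comparing field-collected GPS readings against the coordinates stored for the same features in a TIGER-like database, reduces to computing per-feature great-circle discrepancies. The coordinates below are invented for illustration:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    r = 6_371_000.0  # mean Earth radius, meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def positional_errors(gps_points, tiger_points):
    """Per-feature discrepancy between field GPS readings and the stored
    coordinates for the same features (paired by position in the lists)."""
    return [haversine_m(g[0], g[1], t[0], t[1])
            for g, t in zip(gps_points, tiger_points)]

# Invented example: the second stored point is displaced by ~0.0002 degrees
# of latitude, roughly 22 meters on the ground.
gps = [(38.8895, -77.0353), (38.9072, -77.0369)]
tiger = [(38.8895, -77.0353), (38.9074, -77.0369)]
errors = positional_errors(gps, tiger)
```

Summaries of such error distributions by county or place type would be one concrete form of the "hard, quantitative information" on geographic quality that the text calls for.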
Small-scale field collection of GPS readings and independent listings may not generalize well, but modeling and small-area estimation approaches could usefully be introduced; perhaps a spatial data quality estimate for every small county is infeasible, but an estimate for "places like us" (collections of places that are similar on demographic characteristics or other stratification variables) could still usefully steer geographic updating resources. An earlier National Research Council (2009:119–128) report discussed a framework for modeling census quality using both MAF/TIGER and census operational data as inputs, and that work may suggest possible directions.

Related to another core research area, a further priority for geographic work is to prepare for the possible use of administrative records data in geographic update operations. In addition to a person-level data file, the Bureau's current StARS system also generates a listing of addresses, dubbed the Master Housing File (MHF). Just as current work with StARS on the person-level side has largely been limited to gross counts, so too has the utility of the MHF as an update source for—or quality check of—the MAF/TIGER databases gone largely unexplored. Bureau staff attempt to use the TIGER database to geocode the MHF—associate each address with a specific geographic code—but to date have not delved deeply
into the attributes of MHF addresses that do or do not geocode. Likewise, the year-to-year flux in MHF content—"births" and "deaths" of addresses—remains to be explored.

It is our understanding that the Census Bureau is working on converting the samples for its ongoing demographic surveys to use the MAF as their address source, much as the ACS does now. The status quo for the current surveys is to draw their samples from parts of four different frames—an address frame (separate from the MAF), an area frame, a GQ inventory, and a listing of new-construction addresses. Switching the surveys to use the MAF as a base has the advantage of making the MAF a fuller "corporate resource" within the Census Bureau; it is also useful in that it gives the current surveys a direct stake in the quality of MAF/TIGER, and so could facilitate the use of survey interviewers as part of regular geographic quality assessment (as mentioned above). Our charge is focused on the decennial census and its specific operations, but we think it entirely appropriate to support the use of the MAF in all of the regular current surveys; updates and improvements to MAF/TIGER based on regular use of those systems ultimately accrue to the quality of the census.
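The year-to-year flux in an MHF-like address file, the "births" and "deaths" of addresses noted above, reduces in its simplest form to set differences between vintages. A minimal sketch with invented addresses (a real analysis would work with structured address records, not bare strings):

```python
def address_flux(prev_vintage, new_vintage):
    """Compare two yearly rebuilds of an address file: 'births' appear only
    in the new vintage, 'deaths' were dropped since the prior vintage."""
    prev_set, new_set = set(prev_vintage), set(new_vintage)
    return {
        "births": sorted(new_set - prev_set),
        "deaths": sorted(prev_set - new_set),
        "stable": len(prev_set & new_set),
    }

# Invented example: one address drops out, one new address appears.
flux = address_flux(
    ["12 Oak St", "9 Elm Ave", "4 Pine Rd"],
    ["12 Oak St", "4 Pine Rd", "77 Birch Ln"],
)
```

Tabulating such births and deaths by geography, and cross-checking them against MAF updates from Postal Service files, would be a natural first probe of the MHF's value as an update source or quality check.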