–2–
Planning the 2020 Census: Cost and Quality
LESSONS AND CONCLUSIONS FROM THE HISTORY of the modern U.S. decennial census—and the role of research and development (R&D) in past decades—are vital to considering directions for effective R&D for the 2020 and subsequent censuses. Our reference to the historical record takes different shapes in the later chapters of this report, with Chapter 3 critiquing the Census Bureau’s current research strategies with an eye on past efforts while Chapter 4 discusses organizational and structural features of the history of Census Bureau operational research.
This chapter looks at the historical record—particularly for the post–World War II decennial censuses—with a focus on broader forces. A lesson we learn from our historical review is that two key drivers—the costs of the decennial census and the quality of resulting census information—are the most important areas of concern for a successful census in 2020. Reconciling cost and quality involves trade-offs; the role of an R&D program such as we would like to see for the 2020 census is to provide the high-quality information and evidentiary basis for addressing those trade-offs. The history of the modern decennial census is also the story of a third set of factors—social and technological change—in which the census must operate and that offer both challenges and opportunities for a high-quality, cost-effective census in 2020.
We begin in Section 2–A with a general historical overview of the census in the post–World War II era to set the necessary context.1 We then turn in subsequent sections to examination of trends in census quality (2–B) and census costs (2–C). We close in Section 2–D with an assessment of how we think those two drivers should affect 2020 census planning in general.
2–A
DEVELOPMENT OF THE MODERN CENSUS
2–A.1
1940 and 1950: Sampling and R&D
The basic methodology of the 1940 and 1950 censuses was not dissimilar to that used in previous censuses: temporary Census Bureau employees (enumerators) went door to door, writing down answers provided by household respondents on large sheets of paper, or “schedules.”2 Each schedule had one line per person for 30–40 people on the front and one line per housing unit on the back for the people listed on the front.3 The data were keypunched and tabulated by clerks using technology invented by Herman Hollerith for the 1890 census. Yet underlying the similarities were important innovations that paved the way for today’s census.
The 1940 census was the first census to use newly developed probability sampling methods to ask a subset of the population some of the census content (6 of 60-odd questions); it was also the first census not only to include formal evaluations of the quality of the enumeration and of specific content items, but also to be followed by a program of pretests and experiments leading up to the next census (see Chapter 3 for details). This R&D program resulted in improvements to the wording of questions, enumerator instructions, and other features of the 1950 census.
The 1950 census included at least four important innovations. First, sampling was used much more extensively for collecting the census content: about two-fifths of the 60-odd questions were asked of samples of the population, which presumably contributed to the reduction in real dollar per housing unit costs in 1950 compared with 1940.4 Second, the first mainframe computer (UNIVAC I) for use outside academic and defense research was delivered to the Census Bureau in spring 1951 in time to help process some of the census results. Third, the 1950 census included several experiments with far-reaching effects: tests of a household schedule in place of a line schedule, a housing-based sampling scheme for content instead of a person-based sampling scheme, self-enumeration in place of enumerator reporting, and variation of enumerator assignments in such a way as to be able to measure the error in content due to differences among enumerators. Fourth, the 1950 census included the first postenumeration survey to measure completeness of the census count.
1 Our synopsis of census-taking in 1940 through 2010 is based principally on Jenkins (2000) [1940 census]; Goldfield and Pemberton (2000a,b) [1950, 1960 censuses]; National Research Council (1985:Chap. 3, 5) [1970, 1980 censuses]; National Research Council (1995) [1990 census]; National Research Council (2004a) [1990, 2000 censuses]; National Research Council (2004b) [planning for the 2010 census].
2 Enumerators were first hired in place of U.S. marshals for the 1880 census.
3 The content of the census questionnaire expanded greatly from a few items in the first censuses to dozens of items in censuses of the late 19th century; in 1940, a census of housing was added to the census of population.
4 The 1940 census was an outlier in the first half of the 20th century with regard to costs. It cost substantially more per housing unit than the 1930 census, so that the 1950 census costs reverted to the historical norm (see Section 2–C.1).
2–A.2
1960: Mailout and the “Long Form”
The results of the enumerator variance experiments in the 1950 census, which indicated that the census was no more accurate than a 25 percent sample (Bailar, 2000), galvanized the Census Bureau to proceed with R&D on self-enumeration in place of personal visits as the dominant enumeration method (see Section A–1.b). The 1960 census was the first to use household questionnaires in place of line schedules and the first to use two different questionnaires—a “short form” with basic questions asked of every person and household and a “long form” with questions asked of a sample. (There were several variations of the long form with different sample sizes.) It was also the first census to mail out the questionnaires: shortly before Census Day, U.S. postal carriers dropped off unaddressed short forms to all housing units on their routes; households were instructed to fill out the forms and wait for an enumerator to pick them up and transcribe the responses to a computer-readable form. At every fourth household, the enumerator left one of the long forms to be completed by the household and mailed back to the census district office (in rural areas, enumerators completed the long form at the time of their visit).
The long-form response rate was 77 percent; enumerators revisited households that failed to mail back their long form to obtain their answers in person. A Bureau-invented device called FOSDIC (film optical sensing device for input to computers) was used to read microfilm images of the census questionnaires and transmit the data to mainframe computers for editing and tabulation. The effective use of computer technology presumably contributed to the modest reduction in real dollar per housing unit costs in 1960 compared with 1950.
2–A.3
1970–1980: Mailout-Mailback, Computerized Address List, Coverage Improvement, Dual-System Estimation
The 1970 census saw the implementation of mailout-mailback technology as we know it today, in which the Census Bureau develops a computerized address list, coded to census geographic areas (e.g., blocks, tracts, places, counties), uses postal carriers or census enumerators to deliver labeled questionnaires to every address on the list, and uses enumerators to follow up those addresses that do not mail back a questionnaire. The concept behind the development of an address list was to improve coverage and have control over each questionnaire rather than simply leaving it up to postal carriers and enumerators to do a complete job. To ensure that the new procedures would work well, the Census Bureau decided to limit their use to areas of the country containing 60 percent of the population; the remaining areas were enumerated in person.5
The 1980 census expanded mailout-mailback techniques to areas of the country containing 95 percent of the population. The 1980 census also greatly expanded coverage improvement efforts that were begun in 1970. While the completeness of the census count was a concern from the first census in 1790, understanding of coverage errors in the census and their possible implications for the distribution of power and resources reached fever pitch in the years following the landmark “one-person, one-vote” decision of the U.S. Supreme Court in 1962 (Baker v. Carr). The Census Bureau adopted special coverage improvement programs for the 1970 census to obtain greater accuracy in the population counts because of their use for legislative redistricting and federal fund allocation and the belief that new methods were required to improve coverage given fears of being counted among some population groups, overlooked housing units in multiunit structures, and other factors. (By contrast, 1950 and 1960 census planning assumed that undercoverage was largely because enumerators failed to follow instructions—see U.S. Census Bureau, 1974:1.) The 1980 census greatly expanded and added to the coverage improvement programs used in 1970, spending about six times the amount spent in 1970 in real terms on such programs. For example, in 1980 enumerators rechecked units that appeared to be vacant or otherwise not eligible for the census on a 100 percent basis rather than for a small sample of the units as in 1970, and local officials were given the opportunity to review preliminary housing unit counts after nonresponse follow-up (Citro, 2000).
Finally, the 1980 census made the first use of dual-system methodology for estimating the net undercount and disparities in coverage for population groups by matching the results of an independent postenumeration survey to the census results in a sample of areas. (In contrast, the 1950 and 1960 postenumeration survey programs simply compared aggregate counts for the census and a postcensus recount by specially trained enumerators of a sample of areas—this “do it again, better” method was shown to underestimate the undercount compared with dual-system estimation.) As discussed in Section 2–B, net coverage error declined to an all-time low in 1980 compared with the 1940–1970 censuses, but real dollar per housing unit costs of completing the census almost doubled compared with 1970, whereas the 1970 cost was about the same per housing unit as 1960.
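The matching logic behind dual-system estimation can be illustrated with a minimal sketch. The simple capture-recapture form used here (census count times survey count, divided by the number of matched persons) is a textbook simplification and an assumption of this example; it is not the Census Bureau's production estimator, which operates within population strata and corrects for erroneous enumerations. The function name and all counts are hypothetical.

```python
def dual_system_estimate(census_count, survey_count, matched_count):
    """Capture-recapture estimate of the true population of a sample area.

    census_count  -- persons counted by the census in the area
    survey_count  -- persons counted by the independent postenumeration survey
    matched_count -- persons found in both the census and the survey
    """
    if matched_count <= 0:
        raise ValueError("at least one matched person is required")
    return census_count * survey_count / matched_count


# Hypothetical sample area: the census counts 950 people, the survey counts
# 500, and 475 of the survey's people can be matched to a census record.
estimated_population = dual_system_estimate(950, 500, 475)

# People implied to be missed by the census in this hypothetical area.
estimated_net_undercount = estimated_population - 950

print(estimated_population, estimated_net_undercount)
```

Under these assumptions the survey finds 95 percent of its people in the census, so the census count is treated as covering 95 percent of the true population. A simple recount, by contrast, has no way to estimate the people missed by both efforts.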
2–A.4
1990–2000: Controversy Over Adjustment; Incremental Change
Ambitious plans were originally developed for both the 1990 and 2000 censuses to adopt the recommendations of many statisticians to use sampling for the count itself and not just the content in order to improve the completeness of census coverage and save on costs. Ultimately, in July 1991, the secretary of commerce decided not to adjust the 1990 census data for coverage error. A court-ordered procedure to consider adjustment on the basis of a relatively small postenumeration survey ended with the courts upholding a decision by the secretary not to adjust the counts, even though measured net and differential undercount increased compared with 1980 (see Section 2–B).6
Similarly, plans to use sampling for nonresponse follow-up in the 2000 census were ruled to violate Title 13 by the U.S. Supreme Court in 1999, and problems with the large 2000 postenumeration survey led to widely supported decisions by the Census Bureau and the Department of Commerce not to adjust the census results for legislative redistricting or federal fund allocation. Indeed, the estimated net undercount for the 2000 census was close to zero, although this result reflected large numbers of duplicates almost offsetting equally large numbers of missed people.
Perhaps as a consequence of the attention devoted to the adjustment controversy and to refining the postenumeration survey methodology, neither the 1990 nor the 2000 census saw major innovations in census procedures of the magnitude of the use of computerized processing, mailout-mailback enumeration, and expanded coverage improvement programs that were introduced in the 1960–1980 censuses.7 Essentially, the 1990 and 2000 censuses made incremental modifications to previous census procedures; they did not alter the paradigm of the modern census as established in 1980: development and checking of a computerized address list; mailout-mailback; in-person follow-up for nonresponse; coverage improvement operations; computerized editing and tabulation; and postcensus evaluation of coverage. Census net coverage worsened somewhat in 1990 and improved in 2000; census real dollar per housing unit costs increased by about 30 percent from 1980 to 1990 and by about 60 percent from 1990 to 2000. From 1960 to 2000, real dollar per housing unit costs increased by over 300 percent, to more than four times their 1960 level (see Table 2-2).
The experiments included in the 1990 and 2000 censuses were limited in scope (see Sections A–5.b and A–6.b) and did not set forth a clear path for innovation in 2010. An exception was the Census 2000 Supplementary Survey, which tested the ability to conduct a separate American Community Survey (ACS) to obtain long-form questionnaire content at the same time as a full census (see next section). An experiment in using administrative records to substitute for a traditional census or for nonresponse follow-up was well conducted but, because of a late start in planning and limited resources, was limited in scope (Bye and Judson, 2004:1–2). The evaluations of census procedures in 1990 and especially 2000 were limited in usefulness for future census planning, consisting largely of descriptive reports that documented the inputs and outputs of particular procedures but did not provide rigorous cost-benefit analysis of them or an assessment of their relative effectiveness for different kinds of geographic areas or population groups.
2–A.5
2010: Recovery from Near Disaster?
By 2001, the Census Bureau articulated a strategy for a “reengineered” census process in 2010; it also announced that adjustment for measured net undercount would be off the table for 2010. Save for the pilot work that had been done on the ACS, this emergent strategy for 2010 did not extend directly from the 2000 census evaluations and experiments for the simple reason that it could not do so chronologically—most of the evaluation reports were only completed and released in 2002 and 2003. However, findings from the 2000 census experience would later influence 2010 census plans—for instance, in the decision to add two coverage probe questions to the 2010 census questionnaire.
The reengineered 2010 process hinged critically on three major initiatives. First, the Census Bureau planned to modernize its Topologically Integrated Geographic Encoding and Referencing (TIGER) system—the geographic database used to map census addresses and code them to specific census blocks (and thus to higher-level aggregates like cities or school districts). When it was developed in the 1980s, TIGER represented a significant improvement over the patchwork address coding guides that were used in the 1970 and 1980 censuses, but after almost 20 years of use, it needed realignment of its geographic data (through comparison with local geographic information system files) and overhaul of its software structure. The Census Bureau embarked on a multiyear Master Address File (MAF)/TIGER Enhancements Program as a key plank in its 2010 census plan; the major component of this program was a contract (the MAF/TIGER Accuracy Improvement Program) to perform the complete realignment of TIGER features in electronic files, which was awarded to Harris Corporation and carried out.
Second, the Census Bureau committed to replacing the census long-form sample—a detailed battery of social and economic questions administered to samples of census respondents—with the continuous American Community Survey. The idea for conducting a “rolling census” goes back several decades; formal planning of a continuous ACS began in the early 1990s, and pilot data collection began in 1996 in 4 counties, later expanded to about 30 counties. The Census 2000 Supplementary Survey, conducted in about one-third of all counties, confirmed that the Bureau could successfully field the ACS and the decennial census at the same time. This larger-scale administration also yielded data that could be compared with the 2000 long-form sample to assess the adequacy of the new survey. Based on the results, the Bureau decided that the 2010 and subsequent censuses would include only the short-form items and that the ACS would go into full production as soon as funding became available, which occurred in 2005.
Third, the Census Bureau decided to use modern handheld computer technology for two key census processes: checking the address list in 2009 and conducting nonresponse follow-up enumeration in 2010. The use of such technology was expected to significantly reduce the amount of paper (questionnaires, enumerator timesheets, maps, etc.) and office space required for the census, permit real-time monitoring of census field operations, and reduce census costs compared with paper-and-pencil methods. However, the R&D program for the handheld technology and the contract for implementation, also with the Harris Corporation, were poorly planned and executed, necessitating an extensive “replan” effort in early 2008 (see Section 2–C.2), which resulted in the decision to revert to paper-and-pencil methods for nonresponse follow-up operations.
Entering the 2010 planning cycle, the Census Bureau hoped that its 2008 “dress rehearsal” would be exactly that—a full operational pretest. The Bureau’s 1998 dress rehearsal for the 2000 census was less a rehearsal than a major experimental comparison of three competing census designs, each making different use of sampling and coverage measurement. However, the 2008 dress rehearsal in San Joaquin County, CA, and the Fayetteville, NC, area was not able to function as a full dress rehearsal. The planned rehearsal was already scaled back because of the budgetary constraints of operating for key periods under continuing resolutions at previous-fiscal-year funding levels. But, with the early 2008 replan, the Bureau could not conduct a nonresponse follow-up operation in its dress rehearsal for the basic reason that the late reversion to paper-based methods left it without a nonresponse follow-up operation to test. The lack of a full-fledged dress rehearsal left the 2010 census with many unanswered questions as to its procedures and plans.
We discuss the implications of these plans and developments on estimated 2010 census costs in Section 2–C. Of course, the effects on completeness of coverage in the 2010 census will not be known until after the census and its associated coverage measurement programs are conducted.
2–B
CENSUS QUALITY
Perhaps the most critical driver of decennial census planning and execution is the concern that the census achieve as complete coverage of the population as possible, including not only the total number of inhabitants, but also their distribution by state and other geographic areas and their racial and ethnic composition. This concern dates back to the first census in 1790, when Secretary of State Thomas Jefferson expressed the view that the census count was short of the “true” population, which he believed to be 4.1 million people and not 3.9 million as the census reported (Wells, 2000:116). Subsequent censuses also raised concerns about coverage—notably, in 1870, complaints of undercounts in New York City and Philadelphia led President Ulysses S. Grant to order a recount, which, however, added only 2 percent and 2.5 percent, respectively, to the two cities’ population totals. The dramatic growth in the population of the South between 1870 and 1880 ultimately led the 1890 census office to estimate that the 1870 census had undercounted the South by 10 percent and the country as a whole by 3 percent (Hacker, 2000b:129).
In the 20th century, the findings from the 1920 census that the population was more urban than rural for the first time in the country’s history led to attacks on the accuracy of the census, and, for the first time, Congress was not able to agree on a reapportionment of the House of Representatives to reflect the census results in a timely manner (McMillen, 2000:37–38). Following the civil rights revolution and the “one person, one vote” Supreme Court decisions, concern about undercount fueled controversy over whether to adjust the census for measured net undercount and to correct disparities in coverage by geographic area and population group and drove the planning, execution, and evaluation of the 1980–2000 censuses.
2–B.1
Definition and Measures
We use the term “census quality” to denote the accuracy of the count and its basic distribution by geography and population group. Since formal coverage evaluation began for selected population groups in the 1940 census and for the population as a whole in 1950, two general quality metrics have dominated the discussion. The first metric is net coverage error, simply the difference between census-based counts or estimates and their associated true values for some geographic or demographic domain; estimated counts greater than the true values are termed net overcoverage, while estimates lower than the true values represent net undercoverage. Because the true values are unknown, competing strategies have been devised to derive approximations—typically through an independent effort to estimate the same population (postenumeration survey) or derivation of estimates based on birth, death, and migration data (demographic analysis). To provide comparability across domains, net coverage error is often expressed as a percentage of the “true” counts. Due to the historical undercoverage of racial and ethnic minority groups, the second type of quality metric—differential net undercoverage—focuses on the difference between the rate of net undercoverage for a given demographic group and the national rate.
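The two metrics can be written out as a short sketch. The function names and all counts below are hypothetical and chosen only to make the arithmetic visible; in practice the "true" values are themselves estimates from a postenumeration survey or demographic analysis.

```python
def net_coverage_error_pct(census_count, true_count):
    """Net coverage error as a percentage of the 'true' count.

    Positive values indicate net overcoverage; negative values indicate
    net undercoverage.
    """
    return 100.0 * (census_count - true_count) / true_count


def net_undercount_rate(census_count, true_count):
    """Net undercoverage expressed as a positive percentage."""
    return -net_coverage_error_pct(census_count, true_count)


def differential_net_undercount(group_rate, national_rate):
    """Difference, in percentage points, between a group's net undercount
    rate and the national net undercount rate."""
    return group_rate - national_rate


# Hypothetical figures: the nation is counted at 9,800 against a 'true'
# 10,000; one demographic group is counted at 1,900 against a 'true' 2,000.
national_rate = net_undercount_rate(9_800, 10_000)
group_rate = net_undercount_rate(1_900, 2_000)
differential = differential_net_undercount(group_rate, national_rate)

print(national_rate, group_rate, differential)
```

Expressing the differential in percentage points keeps it distinct from the rates themselves, which are percentages of each domain's own "true" count.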
Census undercoverage and overcoverage are made up of two general types of errors:
- omission, which occurs when a current resident of the United States is not included in the census anywhere, and
- erroneous enumeration, which occurs when a nonresident (or nonperson) is erroneously included anywhere in the census or when a person is included more than once.
Coverage error may also arise when a (nonduplicated) resident is counted at the incorrect geographic location. The severity of this latter type of error depends on the degree of geographic displacement and the level of aggregation of interest; depending on those perspectives, geographic misallocations are either moot (e.g., when a person in one block is listed in the wrong block, but both blocks in question are within the same county for which coverage is being estimated) or count as two errors (undercoverage in one block and overcoverage in another).
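How the level of aggregation changes the accounting of a geographic misallocation can be shown with a small sketch. The place names, the helper function, and the three-person population are all hypothetical; the example assumes only that each person has one correct location and one location recorded by the census.

```python
from collections import Counter


def net_error_by_area(true_places, counted_places, key):
    """Return census count minus true count for each area that has an error.

    `key` maps a (county, block) location to the area of interest, so the
    same records can be tallied at the block level or the county level.
    """
    true_counts = Counter(key(place) for place in true_places)
    census_counts = Counter(key(place) for place in counted_places)
    areas = set(true_counts) | set(census_counts)
    return {
        area: census_counts[area] - true_counts[area]
        for area in areas
        if census_counts[area] != true_counts[area]
    }


# Three residents; one resident of Block 1 is listed by the census in Block 2.
true_places = [("County X", "Block 1"), ("County X", "Block 1"), ("County X", "Block 2")]
counted_places = [("County X", "Block 1"), ("County X", "Block 2"), ("County X", "Block 2")]

# Block level: one block is undercovered and the other overcovered.
block_errors = net_error_by_area(true_places, counted_places, key=lambda place: place)

# County level: both blocks lie in the same county, so the error cancels.
county_errors = net_error_by_area(true_places, counted_places, key=lambda place: place[0])

print(block_errors, county_errors)
```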
2–B.2
Net Coverage Error, 1940–2000
Table 2-1 shows estimated net coverage error, by the method of demographic analysis, and the difference in coverage estimates for blacks and all others, for the 1940 through 2000 censuses. With the exception of 1990, when the net undercoverage rate increased from the previous census, there has been a sustained trend toward more complete coverage of the total population. Whereas the 1940 census had an estimated net undercount rate as high as 5.4 percent, the 2000 census achieved an estimated net undercount rate of practically zero (0.1 percent). Estimated net undercount rates for blacks and nonblacks also declined over the period (with the exception of the uptick in 1990 for both groups), although the difference between black and nonblack net undercount rates increased from 3.4 percentage points in 1940 to 4.3 percentage points in 1970, and was as high as 4.4 percentage points in 1990, before declining to 3.1 percentage points in 2000.
Table 2-1 Estimates of Percentage Net Undercount, by Race, from Demographic Analysis, 1940–2000 (in Percent)
Demographic analysis does not provide estimates of coverage error for other population groups, such as Hispanics, but coverage rates for those groups appear to have improved as well. Thus, based on postenumeration survey methodology, net undercount decreased from 5 percent in 1990 to 0.7 percent in 2000 for Hispanics, from 4.6 percent in 1990 to 1.8 percent in 2000 for non-Hispanic blacks, from 2.4 percent in 1990 to a net overcount of 0.8 percent in 2000 for non-Hispanic Asians, and from 0.7 percent in 1990 to a net overcount of 1.1 percent in 2000 for non-Hispanic white and other races (National Research Council, 2004a:Table 6.7).
2–B.3
Another Metric: Gross Coverage Errors
Undoubtedly, the achievements in reducing estimated net undercount and narrowing the differences between estimated net undercount rates for racial and ethnic groups over the 1940–2000 period are due, in no small measure, to the proactive efforts by the Census Bureau, in cooperation with many public- and private-sector organizations, to improve coverage. As noted above, the addition of coverage improvement operations to the conduct of the census began in 1970 and greatly escalated in 1980.
Yet research has shown that coverage improvement programs not only add people to the census who may have been missed otherwise, but also add people who should not be counted at all or who may have been counted elsewhere. Examples include the 1980 Vacant/Delete Check program (U.S. Census Bureau, 1989:Ch. 8) and the 1990 program to count parolees and probationers (Ericksen et al., 1991:43–46). In 2000, the problem-plagued development of the MAF from multiple sources, some not used in previous censuses, contributed to large numbers of duplicate enumerations, only some of which were weeded out in subsequent census operations (National Research Council, 2004a:142–143).
Estimates of gross errors, including both erroneous enumerations and omissions, are as large as 36.6 million in 1990 (16.3 million erroneous enumerations and 20.3 million omissions) and 33.1 million in 2000 (17.2 million erroneous enumerations and 15.9 million omissions) (National Research Council, 2004a:253). Too much should not be made of the specific numbers, given that some errors are not of consequence for larger areas of geography and given different definitions and methods for estimating gross errors in the two censuses, but, however defined, the census is and has always been far from error-free.
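The relationship between gross and net error in these figures can be made explicit with a short calculation. The sketch below uses only the estimates quoted above; treating omissions minus erroneous enumerations as a net figure is a simplification assumed for illustration, and the result should not be read as an official net undercount estimate.

```python
# Gross error estimates in millions of persons, as quoted in the text
# (National Research Council, 2004a:253).
GROSS_ERRORS = {
    1990: {"erroneous_enumerations": 16.3, "omissions": 20.3},
    2000: {"erroneous_enumerations": 17.2, "omissions": 15.9},
}


def summarize(year):
    """Return gross error (the sum) and the implied net undercount
    (omissions minus erroneous enumerations), both in millions."""
    erroneous = GROSS_ERRORS[year]["erroneous_enumerations"]
    omissions = GROSS_ERRORS[year]["omissions"]
    return {
        "gross": round(erroneous + omissions, 1),
        "net_undercount": round(omissions - erroneous, 1),
    }


for year in sorted(GROSS_ERRORS):
    print(year, summarize(year))
```

The point of the comparison is that a small net figure can coexist with tens of millions of offsetting errors, which is why net coverage alone is an incomplete measure of census quality.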
2–C
CENSUS COSTS
It appears that throughout much of the history of the U.S. census—and particularly in the period after 1970—concerns about coverage have trumped concerns about costs, with Congress willing to appropriate ample funds for the conduct of the census. However, cost increases from census to census are not written in stone—in fact, real dollar per person or housing unit costs have held steady and even declined in some censuses, as seen in the next section. Yet in the period 1970 to 2010, costs have escalated enormously, and the increases appear harder and harder to justify. Because of this continued growth in census costs, it seems highly likely that containing costs—while maintaining or improving census quality—will and should be a major driver of 2020 census planning.
In the discussion that follows of historical census costs, it is important to note the difficulties in obtaining comparable cost estimates across time. Comparisons can be affected by the choice of a specific price deflator; also, it is not clear that what is included in census costs is strictly comparable from census to census. Thus, the reader should consider the data provided as indicative of the order of magnitude of the costs from one census to the next. Following usual practice, we discuss “life-cycle” costs, which are inclusive of precensus planning and testing, the actual conduct of the census, and processing and dissemination of census results. The limitations of available census cost data make it difficult to decompose cost increases (or decreases) by examining specific operations.8 In fact, the absence to date of a robust, comprehensible, fully parameterized cost model for the census and its components makes it difficult not only to analyze the reasons for changes in costs between censuses, but also to plan for the next census. Moreover, with few exceptions, there has been little analysis of the cost-benefit ratio, in terms of coverage improvement, of various census operations.
2–C.1
Census Costs Over Time
Since the first census, the American population has increased manyfold (from 3.9 million people in 1790 to over 300 million people in 2009) and become more diverse in living arrangements and many other ways; the needs of Congress and the executive branch for additional information beyond a basic head count have also grown. Hence, it is not surprising that the costs and complexity of the census have increased commensurately. There were only about 650 enumerators (U.S. marshals) for the 1790 census, whereas recent censuses have employed half a million or more enumerators plus postal service carriers; the office force for the census has also increased over time, as has the volume of data collected and produced for public use. By 1900, the real dollar per person cost of the census had increased from 11 cents to $3 in 1999 dollars. For the 1900–1960 period, however, despite continued population growth, costs remained in the range of $3–4 most of the time. Exceptions occurred for the 1920 census, which cost about 30 percent less than the average, and the 1940 census, which cost about 60 percent more than the average (Anderson, 2000:384—estimates are not available per housing unit).
Turning to census costs from 1960 to the present, we focus on the magnitude of estimates of real dollar costs per housing unit, which is the appropriate measure for the modern mail census. Table 2-2 shows the estimated per housing unit cost for each census beginning in 1960 in both nominal and real 2009 dollars. In real terms, the 1970 census cost only slightly more than the 1960 census, but costs per housing unit increased by over 90 percent from 1970 to 1980—the single largest percentage increase in the costs of census-taking since the Census Bureau was established as a permanent agency in 1902. The increase from 1980 to 1990 was only 30 percent, while the increase from 1990 to 2000 was over 60 percent. Although it is not and cannot be known at this writing what the full costs of the 2010 census will be, current estimates suggest that they could amount to $115 per housing unit (including the ACS and the TIGER modernization program), an increase of about 65 percent over 2000 and a cumulative increase of about 580 percent over 1960 in real dollar terms (that is, nearly seven times the 1960 figure).
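The per housing unit figures and census-to-census changes discussed here follow directly from the totals in Table 2-2, and the arithmetic can be reproduced with a short sketch. The data are transcribed from that table; the variable and function names are this example's own.

```python
# Transcribed from Table 2-2: total life-cycle cost in millions of real 2009
# dollars and housing units in millions. The 2010 values are estimates.
REAL_COST_2009_MILLIONS = {
    1960: 995, 1970: 1206, 1980: 2947, 1990: 4424, 2000: 8060, 2010: 14700,
}
HOUSING_UNITS_MILLIONS = {
    1960: 58.9, 1970: 70.7, 1980: 90.1, 1990: 104.0, 2000: 115.9, 2010: 127.9,
}


def cost_per_housing_unit(year):
    """Real 2009 dollars per housing unit for the given census year."""
    return REAL_COST_2009_MILLIONS[year] / HOUSING_UNITS_MILLIONS[year]


def percent_increase(start_year, end_year):
    """Percentage change in per housing unit cost between two censuses."""
    start = cost_per_housing_unit(start_year)
    end = cost_per_housing_unit(end_year)
    return 100.0 * (end - start) / start


years = sorted(REAL_COST_2009_MILLIONS)
for year in years:
    print(f"{year}: ${cost_per_housing_unit(year):.2f} per housing unit")
for earlier, later in zip(years, years[1:]):
    print(f"{earlier} to {later}: {percent_increase(earlier, later):+.1f} percent")
print(f"1960 to 2010: {percent_increase(1960, 2010):+.1f} percent")
```

Because the real dollar totals depend on the price deflator chosen, the output is best read as indicating orders of magnitude rather than exact values.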
A major factor in the substantial cost increase between the 1970 and 1980 censuses was that the enumerator workforce more than doubled in size, as did headquarters and processing staff. District field offices also stayed open several months longer on average (6–9 months) in 1980 compared with 1970 (4–6 months) and 1960 (3–6 months) (National Research Council, 1995:Table 3.3). One reason for the scaling up of field operations was the greatly expanded effort in 1980, referenced above, to reduce undercount by implementing a variety of coverage improvement programs, such as rechecks of the address list, a 100 percent Vacant/Delete Check, and others. Tables 2-4 to 2-8 illustrate the increase in the number and extent of such operations by listing the various coverage improvement methods used in the 1970–2000 censuses and planned for the 2010 census; because of the length of these tables, they are placed at the end of the chapter and numbered accordingly.
Table 2-2 Per Housing Unit Costs, 1960–2010 U.S. Censuses
| | 1960 | 1970 | 1980 | 1990 | 2000 | 2010a |
|---|---|---|---|---|---|---|
| Full-cycle census costs (millions, nominal dollars) | $120 | $231 | $1,136 | $2,600 | $6,600 | $14,700 |
| Population (millions) | 179.3 | 203.3 | 226.5 | 248.7 | 281.4 | 301.6 |
| Housing units (millions) | 58.9 | 70.7 | 90.1 | 104.0 | 115.9 | 127.9 |
| Costs in 2009 real dollars | | | | | | |
| Total (millions) | $995 | $1,206 | $2,947 | $4,424 | $8,060 | $14,700 |
| Per housing unit | $16.89 | $17.06 | $32.71 | $42.54 | $69.54 | $114.93 |
NOTES: 2010 estimates of population and housing units are the U.S. Census Bureau’s 2007-vintage estimates. For the cost of the 2010 census, we have used the high end of the Census Bureau’s budget submission for fiscal year 2009 (U.S. Census Bureau, 2009b:CEN-154), which includes the cost of the MAF/TIGER Enhancements Program and the American Community Survey. Nominal dollars are converted to real 2009 dollars by the gross domestic product chain-type price indexes for federal government nondefense consumption expenditures (based on Table 3.10.4, line 34, at http://www.bea.gov/national/nipaweb/SelectTable.asp?Selected=N#S3 [5/25/09]). Based on the life-cycle cost estimates shown in Table 2-3, the Census Bureau estimated the per housing unit (excluding group quarters) cost for the 2010 census as $93.58 in constant 2010 dollars; see text for discussion of the Table 2-3 figures.
a Estimated, 2009 dollars.
SOURCES: National Research Council (1995:Table 3.1); National Research Council (2004a:Table 3.1).
The real dollar per housing unit cost increase between 1980 and 1990 was only a third as much as that between 1970 and 1980. Some of the same factors contributed to increases as in 1980: offices stayed open even longer (9–12 months), and there were continued, although less marked, increases in staff. An analysis of the cumulative cost increases between 1970 and 1990 questioned the effectiveness of about two-thirds (over $1 billion in 2009 dollars) of the added real dollar per housing unit costs, even after taking account of a substantial decline in the mail response rate (from 78 percent in 1970 to 75 percent in 1980 to 65 percent in 1990), which necessitated many more expensive enumerator visits to households to obtain their data (National Research Council, 1995:Table 3.2, 50–55).
The 2000 census incurred major additional expenses due to the need to plan for two different censuses, one incorporating sampling for nonresponse follow-up and adjustment of the census counts and the other a “traditional” census, which was demanded by the congressional majority party in the second half of the 1990s. The final census plan was not confirmed until one year prior to Census Day, following the February 1999 U.S. Supreme Court decision that sampling as part of the census enumeration violated census law for apportionment counts. Recognizing the difficulties for planning the 2000 census caused by the political conflict between Congress and the administration about an appropriate census design, the appropriations for 2000 were greatly increased to enable the Census Bureau to hire the necessary staff to throw into the effort—for example, to rewrite software programs that had been developed assuming one design and subsequently had to be revamped for the final design. The 2000 census stemmed the decline in the mail response rate, an important achievement, and, as noted earlier, also produced a close-to-zero net undercount.
2–C.2
Life-Cycle Cost of the 2010 Census
The latest projections of the anticipated costs of the 2010 census make clear that it will be the most expensive in the nation’s history, representing a significant increase in per housing unit costs over the 2000 census. We now review the history of evolving cost estimates of the 2010 census.
In order to obtain funding for all components of the 2010 census reengineering, the Census Bureau produced an initial document on projected lifecycle costs in June 2001 (U.S. Census Bureau, 2001). This document was revised periodically; a September 2005 version showed a total estimated cost for the 2010 census of $11.255 billion—$1.707 billion for the ACS, $228.9 million for the MAF/TIGER Enhancements Program, and $9.013 billion for the core, short-form-only census program (U.S. Census Bureau, 2005:3).
The Census Bureau’s budget submission in 2007 to Congress for fiscal year 2008 (U.S. Census Bureau, 2007:CEN-159) noted:
In June 2001, the Census Bureau estimated that the life cycle costs of a 2010 Decennial Census Program that repeated the Census 2000 approach would be $11.725 billion, while the estimated lifecycle cost for the reengineered design was estimated to be $11.280 billion—a savings of $445 million. [The] estimated life cycle cost for the 2010 Decennial Census Program now stands at $11.525 billion [an increase of $245 million in 2009 dollars from the 2001 estimate]. However, the forecasted savings from pursuing the re-engineered design now are estimated to be $1.409 billion because the estimated life cycle cost of reverting now to a Census 2000 design is $12.934 billion [an increase from the 2001 estimate of $1.209 billion].
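The differences stated in the quoted passage follow directly from the four life-cycle estimates it reports. As a hedged illustration only (the constant names below are ours, not the Census Bureau’s), the arithmetic can be laid out as follows:

```python
# Life-cycle estimates quoted from the fiscal year 2008 budget submission,
# in millions of dollars (integers avoid floating-point rounding).
REPEAT_2000_DESIGN_2001 = 11_725  # June 2001 estimate, Census 2000 approach
REENGINEERED_2001 = 11_280        # June 2001 estimate, reengineered design
REENGINEERED_2007 = 11_525        # 2007 estimate, reengineered design
REPEAT_2000_DESIGN_2007 = 12_934  # 2007 estimate, reverting to Census 2000 design

# Savings claimed for the reengineered design at each point in time.
savings_2001 = REPEAT_2000_DESIGN_2001 - REENGINEERED_2001
savings_2007 = REPEAT_2000_DESIGN_2007 - REENGINEERED_2007

# Growth in each estimate between 2001 and 2007.
reengineered_growth = REENGINEERED_2007 - REENGINEERED_2001
traditional_growth = REPEAT_2000_DESIGN_2007 - REPEAT_2000_DESIGN_2001

print(f"Savings, 2001 estimates: ${savings_2001:,} million")
print(f"Savings, 2007 estimates: ${savings_2007:,} million")
print(f"Growth in reengineered estimate: ${reengineered_growth:,} million")
print(f"Growth in traditional estimate:  ${traditional_growth:,} million")
```

Laid out this way, it is clear that the larger forecasted savings come almost entirely from growth in the estimate for the traditional design, not from any reduction in the reengineered estimate.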
At that point, in 2007, the Census Bureau also expressed confidence that the 2010 census would “enjoy the lowest rate of cost increase in the last four decades.” Over those previous censuses, the average rate of cost increase, according to the Bureau, was 70.7 percent, and a basic extrapolation based on that growth rate would price the 2010 census at $15.1 billion or about $116.42 per household—substantially higher than the figures the Bureau projected for its reengineered process (U.S. Census Bureau, 2007:CEN-161). The Census Bureau reiterated the same discussion in its congressional budget submission for fiscal year 2009 (issued in February 2008), although it acknowledged a $20 million increase in projected life-cycle costs mainly due to changes in assumptions regarding mileage costs, office leasing costs, and number of housing units (U.S. Census Bureau, 2008a:CEN-166–CEN-167).
We see a number of problems with the Census Bureau’s budget submissions for 2010. The first is that the February 2008 estimate did not acknowledge the likelihood of cost increases due to the problematic performance of the handheld technology, which was known inside the Bureau but not yet outside it. The second is that there was no explanation of the changing estimates for conducting a traditional census in 2010 between the 2001 and 2007 estimates. Even more important, there has been no acknowledgement or awareness by the Bureau that forecasting increased costs for the traditional census at a rate of 70.7 percent might not be a reasonable or prudent thing to do. The historical cost data, even with their limitations, make it clear that the biggest increase in census costs occurred between 1970 and 1980 and also that the next biggest increase—that between 1990 and 2000—was due largely to the problems of being forced to plan for and test two different designs for the census. We see no justifiable basis for projecting traditional census costs as a straight line extrapolation of the average increase over the past few decades.
We requested from the Census Bureau its current estimated life-cycle costs for the major activities of the 2010 census (for comparative purposes, we also requested comparable category costs for the 2000 census but were not provided with those data). The life-cycle costs, provided in June 2009, are shown in Table 2-3. It is apparent that the estimates presented in the table refer strictly to the short-form-only census and not the entire 2010 census planning and implementation cycle: adding the MAF/TIGER Enhancements Program and the ACS by using the previously cited estimates of $239 million and $1.707 billion, respectively, would bring the bottom line in Table 2-3 to $14.5 billion in nominal (2009) dollars, or about $3 billion higher than the estimates of $11.5 billion cited above.
Table 2-3 Census Bureau Life-Cycle Cost Estimates for the 2010 Census (covering fiscal years 2002–2013)
The added costs for 2010 are due largely to the need to “replan” crucial census operations in the wake of problems with the Bureau’s plans for using handheld computers in the 2010 census. In spring and summer 2007, the handhelds received their first full field test through their use in the address canvassing operation for the 2008 census dress rehearsal. Problems experienced with the devices during the operations prompted the Bureau to commission a review of its Field Data Collection Automation (FDCA) contract with Harris Corporation, a contract involving not only the handheld computers, but also the development of various operations control systems (OCSs) that govern the flow of information during the census, as well as installation of computer equipment in field offices. This review prompted the Census Bureau to formalize a set of requirements for the FDCA contract and, in turn, for Harris to provide a “rough order of magnitude” estimate of additional funds needed to meet those requirements. The size of that estimate led then–newly on the job Census Director Steve Murdock in February 2008 to establish an internal task force to evaluate options for the FDCA contract; that task force recommended as its “replan” option that Harris retain authority for developing handheld computers for address canvassing but that the Census Bureau assume authority for a paper-based nonresponse follow-up operation. Harris was also initially slated to retain full authority for the OCS development, but the responsibility for OCSs other than that for the address canvassing was assumed by the Bureau later in 2008.
On April 4, 2008, Commerce Secretary Carlos Gutierrez testified to House appropriators that:9
The effect of moving forward with this alternative, as well as the non-FDCA related planning challenges we have faced will require an increase of $2.2 to $3 billion dollars through Fiscal Year (FY) 2013. This will bring the total lifecycle cost of the 2010 Census to between $13.7 to $14.5 billion.
These costs are driven in large part by increases in the numbers of people who will be needed to carry out the 2010 Census; these include enumerators and personnel to service the help desks, data centers, and the control system for the paper-based [nonresponse follow-up (NRFU)]. There are also additional costs that result from more recent increases in gas prices, postage, and printing.
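The range given in the testimony implies a pre-replan baseline, which can be compared with the life-cycle estimate of roughly $11.5 billion cited earlier in this section. The short Python sketch below works in millions of dollars; the variable names are illustrative assumptions.

```python
# Figures from the April 2008 testimony, in millions of dollars.
increase_low, increase_high = 2_200, 3_000   # stated additional funding range
total_low, total_high = 13_700, 14_500       # stated resulting life-cycle totals

# Baseline implied by each end of the range (total minus increase).
implied_baseline_low = total_low - increase_low
implied_baseline_high = total_high - increase_high

print(f"Implied baseline (low end):  ${implied_baseline_low:,} million")
print(f"Implied baseline (high end): ${implied_baseline_high:,} million")
```

Both ends of the range imply the same starting point, which is consistent with the earlier estimate once rounding is taken into account.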
To this end, the Census Bureau requested additional funding in fiscal year 2008 (largely through transfers from other Commerce Department programs) and formally submitted an amendment to its fiscal year 2009 budget request seeking an additional $546 million (U.S. Census Bureau, 2008b). In addition, the American Recovery and Reinvestment Act of 2009 (P.L. 111-5) economic stimulus package included provision of $1 billion for the 2010 census (made as a direct addition to the Periodic Censuses and Programs account), with explanatory text that one-quarter of the funds is intended for use in partnership and outreach programs. On the basis of these changes, the Census Bureau’s budget submission to Congress for fiscal year 2010 (dated May 2009) asserts that (U.S. Census Bureau, 2009b:CEN-154):
After factoring in appropriations for FY 2002 through FY 2008, the President’s Budget request for FY 2009, and ongoing programmatic enhancements or changes due to new requirements, the estimated life cycle cost for the 2010 Decennial Census Program now stands at $14.7 billion (in nominal dollars). The life cycle estimate has been revised to reflect the Field Data Collection Automation Program rescope, including the Census Bureau’s assuming activities descoped from that contract and increases to a contingency fund based on increased risk. Additional changes include: (1) higher estimated mileage costs (increased to 62.5 cents/mile) that we will have to pay to over 1,000,000 temporary employees in FY 2010; and (2) higher costs for Census Coverage Measurement field activities due to revised work hour assumptions.
9 The secretary’s prepared testimony is available at http://www.commerce.gov/NewsRoom/SecretarySpeeches/PROD01_005468.
We understand the difficulties in costing out such an extensive operation as the decennial census, but the Census Bureau should be able to provide information on component operations that is more useful for decomposing the factors that led to the 2010 cost estimates prior to and after the problems with the handheld contract. It is also unclear why comparable figures for the 2000 census are not more forthcoming. Still, the rough level of detail in Table 2-3 does make clear one basic truth about the cost of census-taking: that the NRFU operation, and the assumptions made about its conduct, is a critical driver of overall census costs. The table also speaks to the complexity of the census as it is currently conducted in terms of staff and space needs; the entries for office space and staff at Census Bureau headquarters and the regional and local census offices constitute over 40 percent of the total costs.
2–C.3
Comparison with Other Censuses
Systematic, time-series information on the lifecycle costs of censuses in countries that conduct a traditional-type census (as opposed to relying on a population register, rolling sample, or other means to determine population counts) is difficult to obtain. The United Nations Statistics Division conducted a survey of national census offices on expected “total cost of population and housing census” for the “2010 round” of censuses in participating countries, but the survey did not ask for comparable information for the 2000 round, nor does the report on the survey data provide detail on major components of those expected costs (Stukel, 2008).
The snapshots of costs that are available suggest that other countries have also experienced increases in per household or per capita costs in recent census rounds. For example, the United Kingdom Cabinet Office has projected the 2005–2016 costs for the 2011 census in England and Wales at £481.7 million (Cabinet Office, 2008:1.28); the 2001 census was said to have cost on the order of £260 million and the 1991 census £140 million (Geographical, 2004). Other countries have experienced sufficient growth in census costs as to take cost reduction as a first precept of census planning; the Australian Bureau of Statistics (ABS) commenced planning for its 2006 census “on the basis that the real per capita cost of conducting the 2006 Census be no more than the cost of conducting the 2001 Census” (Trewin, 2006:5). In still other cases, census costs continue to grow but in a somewhat more contained way than the U.S. experience. The estimated cost per household of the census in the Republic of Korea increased 116 percent from 1985 to 1990 and 125 percent again from 1990 to 1995; thereafter, Korean census officials have succeeded in bringing the rate of growth down to just over 39 percent for 1995–2000 and 2000–2005 (with the average per household cost in 2005 being 8,065 won, or about U.S. $6.50; National Statistical Office, Republic of Korea, 2006:Table 1).
A useful example is that of Canada, in which fixed costs (adjusted for population growth) are assumed by census planners and built into census designs. In real 2008 Canadian dollars, the per housing unit cost of recent quinquennial censuses in Canada has held stable. In particular, the per dwelling cost of the 1996 Canadian census (in 2008 dollars) was $38.85, growing only to $40.32 in 2006; Statistics Canada currently estimates that the per dwelling cost for its 2011 census will decrease to $39.98.10
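To put the Canadian figures in perspective, the percentage changes they imply can be set beside the change in U.S. per housing unit cost between 1990 and 2000 from Table 2-2. This is a rough sketch in Python with illustrative names; the two series are in different currencies and base years and cover different intervals, so only the relative changes are comparable.

```python
# Per dwelling cost of the Canadian census in real 2008 Canadian dollars.
# The 2011 value is Statistics Canada's estimate as cited in the text.
CANADA_PER_DWELLING = {1996: 38.85, 2006: 40.32, 2011: 39.98}

# U.S. per housing unit cost in real 2009 U.S. dollars, from Table 2-2.
US_PER_HOUSING_UNIT = {1990: 42.54, 2000: 69.54}


def percent_change(old, new):
    """Percentage change from old to new (negative means a decrease)."""
    return (new - old) / old * 100


canada_1996_2006 = percent_change(CANADA_PER_DWELLING[1996], CANADA_PER_DWELLING[2006])
canada_2006_2011 = percent_change(CANADA_PER_DWELLING[2006], CANADA_PER_DWELLING[2011])
us_1990_2000 = percent_change(US_PER_HOUSING_UNIT[1990], US_PER_HOUSING_UNIT[2000])

print(f"Canada 1996-2006: {canada_1996_2006:+.1f}%")
print(f"Canada 2006-2011 (estimated): {canada_2006_2011:+.1f}%")
print(f"United States 1990-2000: {us_1990_2000:+.1f}%")
```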
To be sure, censuses in other countries are not strictly comparable to the U.S. experience; the U.S. census presents significant challenges in terms of diversity, size, and distribution of the population and in trust in and receptiveness to government programs. That said, as we discuss in Section 3–A.2, the argument of the “special nature” of the U.S. census can be carried too far.
2–C.4
Coverage and Costs
Our historical review has found that census costs will have escalated by more than 600 percent over the period 1960–2010, even after adjusting for inflation and the growth in housing units. We have suggested that a major factor in cost escalation has been the effort by the Census Bureau, supported by Congress and the executive branch, to reduce coverage error to the greatest extent possible. Recent censuses have introduced new and seemingly better coverage improvement operations, layering operations to try to get more and better information on specific difficult-to-count population groups: college students, people who move on or around Census Day, people who live in housing projects, parolees and probationers, and so forth. Although these coverage improvement programs have value, they are not cost-free. The consequence for census cost has been a steady accretion of coverage improvement programs and other procedures that—once added—become difficult to subtract, lest census coverage appear to be harmed.
By 2000, the Census Bureau had achieved a major success in reducing levels of net undercoverage, to the point of yielding an estimated zero net national undercount, the first in census history. But the focus on net coverage masks significant levels of gross census errors—omissions, duplicate or erroneous enumerations, and geographic displacement—that fell into a delicate near-balance when considered in national net. The results of 2000 suggest the importance of studying the nature and extent of gross census errors and further scrutinizing the components of census error. Indeed, they suggest that the census-taking in the United States has reached the point at which the long-standing goal of reducing net undercount is no longer quite apt—that the steady accretion of coverage-building operations needs to be balanced with operations for detecting and filtering duplicates and diagnosis of the unique contributions (and gaps) in each step of increasing coverage (e.g., adding addresses and making questionnaires more widely available). Without research and careful inquiry into the components of error and the contributions of individual operations, the census is arguably at the point at which introducing further complexity to the enumeration process in an effort to reduce net undercoverage could conceivably add more error than it removes as well as adding costs.
2–D
ASSESSMENT: TIME TO RETHINK THE CENSUS AND CENSUS RESEARCH
If one accepts the premise that cost and quality are two critical factors in the decennial census, then we think that the preceding discussion makes clear the key point that we wish to express in this report: effective planning for the 2020 census must reflect the concept that the cost of conducting the census has grown out of control in recent decades and that increased spending—alone, and applied unwisely—is unlikely to radically increase the quality of the census. In beginning to conceptualize the 2020 census, we think it appropriate to take as a precept that an incremental approach to 2020 census planning is simply untenable. Simply assuming the life-cycle costs of the 2010 census as a funding base and scaling up from there—thus arriving at a 2020 census that costs in excess of $20 billion (in 2010 dollars) but uses methodology not substantially different from the 1970 census—is unworkable.
To be sure, the most recent censuses have included significant new additions to methodology and enumeration procedures. That said, the 2010 census follows the same basic fundamentals as the 1970 census, relying heavily on mailout-mailback methods (and, with it, reliance on an address register to associate census responses from households with address locations), coupled with programs to supplement census coverage in areas not amenable to mail. There are a variety of reasons that should compel attention to the basic assumptions of census-taking and, ideally, lead to a 2020 census that is significantly different from its predecessors. These reasons include:
- Continued complexity of household arrangements and ties to geographic place: As described in more detail by the National Research Council (2006) Panel on Residence Rules in the Decennial Census, segments of the population are becoming more difficult to accurately enumerate using traditional methodologies. Broad societal trends have made it increasingly difficult to uniquely identify the single “usual residence” envisioned by the census; these situations include children of divorced or separated parents in joint custody situations, long-distance commuting patterns and “commuter marriages” in which spouses work in different locations, prisoner reentry into the community in the wake of the 1980s and 1990s surge in correctional populations, and seasonal migration based on extreme summer or winter temperatures. Likewise, recent experience with shifting economic conditions (high rates of home foreclosure) and natural disasters (displacement of Gulf Coast residents by Hurricanes Katrina and Rita in 2005) have suggested other major enumeration difficulties.
- Decreased cooperation with surveys and confidentiality concerns: Declining response rates have been a major concern in survey research in recent years, owing in part to factors such as increased reliance on mobile phones (in some cases, to the exclusion of household land lines) and use of caller ID services to screen calls. Thus far, the Census Bureau has been able to hold off major dips in response in its demographic surveys, and the decennial census and the ACS are aided in this regard by their mandatory-by-law nature.
- Technological advances: As we discuss in this and our earlier reports, the Census Bureau has recently had a mixed record with regard to technological advances currently available that offer opportunities for facilitating various aspects of census-taking. The Internet is now a ubiquitous part of society, and it offers a number of ways for easing data collection and increasing data quality. Although the 2000 census made the United States one of the first countries to offer an Internet response option to a census, the Census Bureau declined to permit Internet response in 2010 and has no current plans to use the Internet on even an experimental basis in 2010—even as other countries have turned to Internet response to meet public expectations and possibly reduce some data-processing costs. Handheld computing devices are also now ubiquitous, with smartphones serving as full-fledged computers and portable devices being commonly used by delivery services and other service-oriented businesses to automate paper-based activities. The Census Bureau approached the 2000–2010 decade with a vision of using such handheld devices for major census operations, but multiple factors (including a failure to develop requirements in a timely fashion) forced a costly and risky late-stage switch back to paper-based nonresponse follow-up.
- Increased availability and quality of administrative data: Administrative records databases (including those maintained by government programs as well as those in the private sector) have grown in their coverage and completeness. Although administrative records were a focus of one of the formal experiments of the 2000 census, the Census Bureau has not yet integrated records databases into all steps of the census process to the extent that is possible. Thorough examination of the uses of administrative data in the census context will require working with Congress and possibly amending Title 13 of the U.S. Code, not only to secure access and permissions to conduct research using the data but also to resolve the question of whether reference to administrative data (in part or in whole) can constitute a census enumeration.
- Future of mail service: Related to the technological advances point is another reason why core reliance on the mailout-mailback model may require attention in coming years: the coming decade is likely to be a pivotal one for the U.S. Postal Service (USPS) as well, as it grapples with sharp declines in the volume of total mail. Since 2007, USPS has recorded annual losses of $5 billion or more and currently projects that its year-end debt at the end of 2010 will be $13.2 billion; the challenges faced by USPS in cutting costs, restructuring its workforce and networks, and range of mail products are such that the U.S. Government Accountability Office (2009c) added USPS finances to its “high-risk” list of government programs.11 It is also possible to envision electronic communication becoming so ubiquitous that physical mail may come to be perceived as a nuisance or inconvenience. By these points, we do not intend to imply the demise of the mail as the means for the census, but we do suggest that alternative contact strategies to achieve census participation should be a part of census research in the coming decade.
11 In this designation, USPS joins the 2010 census itself, which was added to the high-risk list in March 2008; see http://www.gao.gov/press/highrisk_pressrelease_3_2008.pdf.
Taken together, these and other dynamics suggest that this is an opportune time to reconsider how the census is taken—to consider an approach to the 2020 census that is more than an incremental change from the general parameters of census-taking that have been used since 1970. The goal should be an efficient and effective census—one that both reduces costs and maintains (or improves) quality—that challenges long-held assumptions and draws ideas from all quarters. To determine what new census designs can best respond to these dynamics, the Census Bureau needs to have a research program on census methodology that can assess which of several possible approaches to the 2020 census is most likely to provide a census responsive to these demands for reduced costs and for either equal or greater accuracy. In the remaining chapters of this report, we sketch some features of such a research program. We also suggest that commitment to major change should be matched by a bold goal, reflecting the key drivers of cost and quality. It has almost become rote to include “containing cost” as a goal of the decennial census. As a general vision for 2020, we suggest something more ambitious:
Recommendation 2.1: The Census Bureau should approach the design of the 2020 census with the clear and publicly announced goal of reducing the inflation-adjusted per housing unit cost to that of the 2000 census (subtracting the cost of the 2000 census long-form sample), while holding coverage errors (appropriately defined) to approximately 2000 levels.
(By “appropriately defined” coverage errors, we mean quality targets for subnational net error, including by major demographic groups, as well as overall national net error.)
To be clear, we do not make this recommendation to suggest cutting census costs simply for the sake of cost-cutting. The objective is a more efficient census that, for example, uses lower-cost options like the Internet and administrative records to supplement or replace expensive follow-up operations without compromising census quality; effective research and improved cost modeling are critical to determining to what extent these efficiencies can be realized. The reason for setting a cost goal that is stark, ambitious, and public is to convey commitment to a focused 2020 census planning process.
We note that in developing cost estimates throughout the decade, the Census Bureau should be consistent in presenting its estimates. Because the decennial census no longer includes a long-form sample, the 2020 census cost goal should exclude the costs of the American Community Survey, which will increasingly take on a life of its own as a source of intercensal data on a wide range of population characteristics.
Merely slowing or curbing the rate of cost growth of the decennial census is a good goal, but we think it more advisable to target meaningful reductions in per-household cost—through leveraging new technology and methodology—without impairing quality.
Table 2-4 Coverage Improvement Programs and Procedures, 1970–2010 Censuses—Address List Development
| Census | Original Sources of List(s) | Field Checks of List | Local Review of List |
|---|---|---|---|
| 1970 | | | None |
| 1980 | | | None |
| | | | None |
| 1990 | | | |
| 2000 | | | |
| 2010 | | | |
SOURCES: Anderson (2000); National Research Council (1995:App. B); U.S. Census Bureau (1976, 1989, 1993, 1995a,b, 1996, 1999).
Table 2-5 Coverage Improvement Programs and Procedures, 1970–2010 Censuses—Publicity/Outreach
| Census | Advertising | Pre-Census Day Outreach | Post-Census Day Outreach |
|---|---|---|---|
| 1970 | | | |
| 1980 | | | |
| 1990 | | | |
| 2000 | | | |
| 2010 | | | |
SOURCES: See Table 2-4.
Table 2-6 Coverage Improvement Programs and Procedures, 1970–2010 Censuses—Initial Enumeration Methods
| Census | Type of Enumeration | Questionnaire/Mailing Package | Office/Processing Structure |
|---|---|---|---|
| 1970 | | | |
| 1980 | | | |
| | | | |
| 1990 | | | |
| | | | |
| 2000 | | | |
| 2010 | | | |
SOURCES: See Table 2-4.
Table 2-7 Coverage Improvement Programs and Procedures, 1970–2010 Censuses—Follow-Up of Mail Returns
| Census | Field Follow-Up | Telephone Follow-Up |
|---|---|---|
| 1970 | | |
| 1980 | | |
| 1990 | | |
| 2000 | | |
| 2010 | | |
SOURCES: See Table 2-4.
Table 2-8 Coverage Improvement Programs and Procedures, 1970–2010 Censuses—NRFU (Nonresponse Follow-Up) and Post-NRFU
| Census | Program |
|---|---|
| 1970 | |
| 1980 | |
| 1990 | |
| | |
| 2000 | |
| 2010 | |
SOURCES: See Table 2-4.