1
Introduction
The goal of the decennial census is to count everyone in the country,
once and in the right place, for the purpose of allocating representation in
Congress. The census satisfies this goal only incompletely: some people
who should be included are omitted, and some enumerations are
duplicates, are in the wrong location, or do not represent residents of the
United States (or are not people at all). These four components of coverage error
have an important impact on the representation of demographic groups
and geographic jurisdictions in Congress.
PROGRAM OBJECTIVES
Since the 1950 census there has been an effort by the Census Bureau
to estimate the size of error in census counts for areas and demographic
groups and to use the information to improve census processes. The
programs to measure census coverage error are referred to as coverage
measurement programs. In recent years, coverage measurement programs
included a third objective: correcting the census for enumeration
error, referred to as census adjustment. The techniques used in coverage
measurement programs to understand the extent of enumeration errors
are sample surveys, dual-systems estimation (DSE), and demographic
analysis.
In contrast to the previous two censuses, the Census Bureau has
decided that the 2010 census coverage measurement (CCM) program
will have a new principal objective: to emphasize census improvement
rather than census correction.1 As a result of this change, rather than focus
on the measurement of net census coverage error for demographic and
geographic subsets of the U.S. population, the coverage measurement
program in 2010 will focus on measurement of the rates of the components
of census coverage error for subsets of the population defined not only
geographically and demographically, but also by the census processes
used. The hope is to use this information to help identify census processes
that are associated with a high rate of coverage error and then identify
alternative processes to reduce the rates. This feedback loop will then
help to facilitate census process improvement for subsequent censuses.
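
To make the idea of component error rates by census process concrete, here is a minimal sketch in Python. It assumes each sampled census enumeration has already been assigned a process label and a coverage status. The process labels (`operation_a`, `operation_b`) and the status names are hypothetical placeholders, not Census Bureau categories, and the counts are invented for illustration.

```python
from collections import Counter, defaultdict


def component_rates(records):
    """Return {process: {status: share of that process's enumerations}}.

    `records` is an iterable of (process, status) pairs. Both fields are
    hypothetical labels used only for this illustration.
    """
    counts = defaultdict(Counter)
    for process, status in records:
        counts[process][status] += 1

    rates = {}
    for process, tally in counts.items():
        total = sum(tally.values())
        rates[process] = {status: n / total for status, n in tally.items()}
    return rates


# Invented sample: 10 enumerations from each of two hypothetical operations.
sample = (
    [("operation_a", "correct")] * 8
    + [("operation_a", "duplicate")] * 2
    + [("operation_b", "correct")] * 9
    + [("operation_b", "wrong_location")] * 1
)

for process, shares in sorted(component_rates(sample).items()):
    print(process, shares)
```

A tabulation like this covers only errors among census enumerations. Omissions would have to be estimated separately, since by definition they do not appear among the enumerations.
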
Of course, an important secondary goal of the coverage measurement
program still remains, which is to inform census data users about the net
coverage error for large geographic areas and demographic groups. The
shift in the principal objective of the coverage measurement program from
that adopted in 2000 (and in 1990) stems from both the specific circumstances
surrounding the 2000 census and broader dynamics. With respect
to the 2000 census, a decision by the Supreme Court in 1999 precluded
the use of census adjustment for purposes of apportionment of the U.S.
House of Representatives if it is based on data from a sample survey (as
it would almost certainly be). Furthermore, the time needed to carry out
and review coverage measurement also very likely precludes the use of
adjusted counts as input into redrawing the boundaries of the districts for
the U.S. House of Representatives (unless the dates for the census, April 1,
or for redistricting, April 1 of the following year, are changed).
Also, the coverage problem has broadened from census omissions alone to both
erroneous enumerations (overcounts) and census omissions: Prior to 1990 the
main coverage problem was census omissions, but at the national level
in 2000 the number of erroneous enumerations was roughly the same as
the number of census omissions.2
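
The distinction between net and gross coverage error can be shown with a little arithmetic. The sketch below is only an illustration: the function name and all figures are invented, and it makes the simplifying assumption that the true population equals the census count minus erroneous enumerations plus omissions.

```python
def coverage_summary(census_count, omissions, erroneous):
    """Summarize net and gross coverage error for one population group.

    Simplifying assumption: the true population equals the census count
    minus erroneous enumerations plus omissions.
    """
    true_population = census_count - erroneous + omissions
    net_undercount = true_population - census_count  # omissions - erroneous
    gross_error = omissions + erroneous
    return {
        "true_population": true_population,
        "net_undercount": net_undercount,
        "gross_error": gross_error,
    }


# Invented figures: errors of equal size cancel in the net measure.
balanced = coverage_summary(census_count=1000, omissions=50, erroneous=50)
# Invented figures: omissions dominate, so the net measure is positive.
omission_heavy = coverage_summary(census_count=1000, omissions=60, erroneous=10)

print(balanced)
print(omission_heavy)
```

In the balanced case the net undercount is zero even though 100 records are in error, which is why a net measure alone can hide the component errors.
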
The new problem of both census omissions and erroneous enumerations
has arisen partly because of the effort to reduce the main problem
of census omissions that dominated prior to 2000. In response to the challenge
of reducing census omissions, between 1960 and 2000 the Census
Bureau added a number of alternative ways in which households could
be included on the Master Address File (including the Local Update of
Census Addresses Program), and in which individuals could be enumerated
in the census (including the Be Counted Program). These additional
ways for households and individuals to be included in the census certainly
increased the number of duplicate enumerations, which contributed to the
“balancing” of the undercount and the overcount in the 2000 census.
1 Actually, this is in a sense a return to the pre-1990 goal of coverage measurement.
2 While this balancing did not obtain for every demographic or geographic subgroup, it is
also true that the differential nature of net coverage error was reduced from that of previous
censuses. (For information on adjusted counts in 2000, see Schindler, 2006.)
In addition to the duplication resulting from new avenues for enumeration,
there is evidence that a number of social dynamics are also
increasing the potential for census overcounts (see National Research
Council, 2006). First, the structure of households is becoming more complicated,
with more people having attachments to multiple households,
including children in shared custody. Second, the number of people with
multiple residences is increasing: This group includes people with vacation
homes and “commuter marriages.” It has also been hypothesized that
the quality of the enumerator workforce has decreased over time.
The shift in the principal objectives of coverage measurement raises
many interesting and important technical issues. For example: What sample
design for the coverage measurement survey should be used? What
estimation approaches should be used in support of the attempt to link
error status to relevant census processes? What data products would best
communicate the linkages between census component coverage error and
census processes in need of improvement?
PANEL CHARGE AND WORK PLAN
At the Census Bureau’s request, the National Academies established
the Panel on Correlation Bias and Coverage Measurement in the 2010
Decennial Census to examine the Census Bureau’s coverage measurement
plans for 2010 with the following charge:
This project involves a study of four issues concerning census coverage
estimation with the goal of developing improved methods for use in
evaluating coverage of the 2010 census. A panel of experts will conduct
the study under the auspices of the Committee on National Statistics of
the Division of Behavioral and Social Sciences and Education. The panel
is charged to review Census Bureau work on these topics and recommend
directions for research. The panel’s work may require development
of statistical models to extend the dual-systems estimation (DSE)
approach, and may also include suggestions for the use of auxiliary data
sources such as administrative records. DSE, as applied to the 1990 and
2000 censuses, had several benefits as well as limitations as a means for
estimating net census coverage. Some of the limitations were:
1. The approach was designed for estimating net census coverage
errors and did not provide accurate estimates of gross coverage errors,
i.e., of gross census omissions separate from gross census erroneous
enumerations. In the DSE approach applied in the 1990 and 2000 censuses,
certain census enumerations classified as erroneous were balanced
against certain coverage survey cases classified as nonmatches (census
omissions) for the purpose of estimating net census coverage. Some of
these paired census enumerations and coverage survey cases did not
necessarily reflect gross errors.
2. The application of DSE in Accuracy and Coverage Evaluation
(A.C.E.) Revision II during the 2000 census accounted for duplicates
found in the census in a simplistic way due to lack of information as to
which member of a duplicate pair was a correct enumeration and which
was an erroneous enumeration. This led to estimation error, as did the
simplistic treatment of A.C.E. cases (P-sample) that matched to census
enumerations outside the search area.
3. The poststratification approach used to apply the DSE had certain
limitations. First, the number of factors that could be included in
the poststratification was limited because the approach cross-classified
the factors, so that each factor added to the poststratification greatly
split the sample. (Collapsing of poststrata was needed because many of
the cross-classified cells had small sample sizes.) Second, the synthetic
error that arose from the synthetic application of the poststratum coverage
correction factors to produce estimates for subnational areas and
population subgroups was not reflected in their corresponding variance
estimates.
4. Comparisons of aggregate tabulations of DSEs with estimates from
demographic analysis (DA), in both 1990 and 2000, suggested underestimation
by DSE of persons missed by both the census and the coverage
survey (correlation bias). In the 2000 A.C.E. Revision II, sex ratios
from DA were used to determine factors to correct adult male estimates
for correlation bias, assuming no correlation bias for children and adult
females. This approach appeared effective for adult blacks, but there
were concerns about the appropriateness of its assumptions for other
race/origin groups (particularly Hispanics). Also, DA totals for young
children (0–9) exceeded the corresponding aggregated DSEs from A.C.E.
Revision II by a sufficient amount to suggest possible correlation bias in
estimates for young children.
The Census Bureau is interested in improving the DSE methodology
to address the above issues to the extent possible, to develop improved
methods for estimating coverage of the 2010 census, both in regard to net
errors and gross errors.
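
The estimator at the center of the charge can be sketched in a few lines of Python. This is the textbook capture-recapture (Lincoln-Petersen) form of a dual-systems estimate, not the A.C.E. production estimator, which involves survey weights and further adjustments not shown here. The function names, poststratum names, counts, and the sex ratio are all hypothetical. The last function illustrates, in a simplified way, the idea in item 4 of using a DA sex ratio to derive an adult male estimate from the adult female estimate.

```python
def dual_systems_estimate(census_correct, survey_total, matched):
    """Textbook capture-recapture (Lincoln-Petersen) population estimate.

    census_correct: correct census enumerations in the group
    survey_total:   people found by the independent coverage survey
    matched:        survey people who were also found in the census
    Assumes the census and the survey capture people independently.
    """
    if matched <= 0:
        raise ValueError("at least one matched case is required")
    return census_correct * survey_total / matched


def poststratified_total(poststrata):
    """Sum separate estimates over poststrata given as {name: (c, s, m)}."""
    return sum(dual_systems_estimate(*cells) for cells in poststrata.values())


def sex_ratio_adjusted_males(female_estimate, da_sex_ratio):
    """Adult male estimate implied by a DA sex ratio (males per female).

    Simplified illustration that assumes no correlation bias for adult
    females, as in the approach the charge describes.
    """
    return female_estimate * da_sex_ratio


# Invented poststrata; names and counts are hypothetical.
poststrata = {
    "group_1": (900, 100, 90),
    "group_2": (400, 50, 40),
}
print(poststratified_total(poststrata))
print(sex_ratio_adjusted_males(1000.0, 0.95))
```

When people missed by the census are also more likely to be missed by the survey, the independence assumption fails and the matched count is relatively too large, so the estimate is too low. That is the correlation bias referred to in item 4.
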
This original charge to the panel had four areas of focus: (1) to effectively
measure the components of coverage error rather than net coverage
error; (2) to improve the determination of duplicate status and
the measurement of the rate of census duplication; (3) to assess the use
of model-based alternatives to poststratification, including their impact
on the ability to model local heterogeneous effects; and (4) to examine
the use of demographic analysis to correct for correlation bias. It was
also understood that the panel’s work might involve the review of other
statistical models proposed for estimation of net coverage error and the
use of auxiliary data sources, such as administrative records, in DSE.
Consistent with this, it was recognized that all the data retained from
the 2010 census (not only the census enumerations themselves and the
postenumeration survey and matching results, but also data collected
by the various management information systems that monitor census
processes) could prove useful in modeling census error rates and providing
information on the sources of census error. Therefore, the panel was
also asked to provide advice on what data should be retained from the
2010 census.
During the course of the study, several other issues in connection with
the panel’s overall task arose: a review of the Census Bureau’s draft document
providing a framework for defining and estimating components
of census coverage error; examination of the possibility of estimating
the match status of cases previously categorized as having insufficient
information for matching, in order to reduce the number of cases classified
as erroneous enumerations due to item nonresponse; assessment
of the various alternatives that could be used to reduce or address contamination
due to the similarity and simultaneity of the census coverage
followup interviews and the initial CCM interviews in 2010; and the CCM
postenumeration survey design. More generally, the panel considered
any other limitations that the 2000 A.C.E. Program had in addressing the
objective in 2010 of measuring the rate of census component coverage
error.
The panel took as given the basic design of data collection and matching
operations planned for census coverage measurement in 2010. The
plans include a sizable postenumeration survey that will be matched to
the census to assess match status for the housing units (and individual
residents in those housing units) found in a sample of census block clusters.
The panel examined modifiable aspects of the data collection for
the 2010 coverage measurement program, including the sample design,
seeking possible improvements. The panel did not address the broader
range of possible coverage measurement programs that might best support
census improvement over time.
A postenumeration survey that is matched to the census, along with
a sample of census records that are matched to the census enumerations,
can be used to directly support the new objective of census improvement
because one can identify individual census enumerations that are duplicates,
erroneous enumerations, and enumerations in the wrong location.
Furthermore, one can identify a sample of individuals that were omitted
in the census enumerations. In addition, and crucially, one can identify
the census processes that were used to enumerate these individuals, along
with characteristics of the individuals, their households and housing
units, and contextual variables. This information can then be analyzed
using statistical models to link higher rates of each of the four types of
census error to the associated census processes. Thus, a data collection
and estimation program that was originally proposed to be used in an
aggregate way for estimating net coverage error for large demographic
and geographic groups is also very useful for identifying individuals of
interest to populate a database to support statistical models predicting
census coverage error. The change in objectives also suggests that rather
than try to “fix” the census for net undercoverage using sampling-based
statistical procedures, it may be preferable to use information on census
coverage error to identify deficiencies in the decennial census processes.
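
As one illustration of a statistical model linking error status to a census process, the sketch below fits a logistic regression with a single binary process indicator using Newton's method in plain Python. It is a toy under stated assumptions, not the Census Bureau's model: the function names, the two hypothetical operations (coded 0 and 1), and the counts are all invented, and a real analysis would include many covariates and survey weights.

```python
import math


def inv_logit(value):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-value))


def fit_logistic(groups, iterations=25):
    """Fit logit(P(error)) = b0 + b1 * x by Newton's method.

    `groups` holds (x, n, errors) tuples: a 0/1 process indicator, the
    number of enumerations, and how many of them were in error.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, n, errors in groups:
            p = inv_logit(b0 + b1 * x)
            residual = errors - n * p      # gradient contribution
            weight = n * p * (1.0 - p)     # information contribution
            g0 += residual
            g1 += residual * x
            h00 += weight
            h01 += weight * x
            h11 += weight * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Invented data: 5 errors in 100 enumerations under the baseline operation
# (x = 0) and 20 errors in 100 under a hypothetical alternative (x = 1).
intercept, slope = fit_logistic([(0, 100, 5), (1, 100, 20)])
print("baseline error rate:", round(inv_logit(intercept), 4))
print("alternative error rate:", round(inv_logit(intercept + slope), 4))
print("odds ratio:", round(math.exp(slope), 3))
```

A positive slope here flags the hypothetical alternative operation as associated with a higher error rate, which is the kind of signal that could direct attention to a process in need of improvement.
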
Finally, in the course of its work, the panel also considered the
possible benefits of a broader program of research on census coverage
measurement. The panel explored other activities that might support
measurement of components of census coverage error. This work was
undertaken while recognizing that plans are close to final as the 2010
census nears, with a view to plans for coverage measurement for 2020.
In sum, the panel undertook to evaluate the Census Bureau’s plans
for coverage measurement in the 2010 census and to provide suggestions
and recommendations for changes and additions to those plans, given
the new objective of measuring the rates of components of census coverage
error, with the ultimate goal of assessing the contribution of various
census component processes to census coverage error.
PLAN OF THE REPORT
Substantial portions of this report are taken from the material in
the panel’s interim report (National Research Council, 2007). This report
expands the panel’s work in five areas: assessment of duplicate status,
missing data methods, the census coverage measurement sample design,
improvements to demographic analysis, and treatment of the potential
contamination of the census coverage measurement sample interview by
the overlap in the field with the census coverage followup interview.
To collect the necessary information for this study, the panel held six
plenary meetings between August 2004 and July 2007. During the course
of our meetings, Census Bureau staff described their current coverage measurement
research activities and intended directions for further work, their
test and dress rehearsal plans, and their plans for the 2010 CCM program.
Some of the Census Bureau’s research on net coverage error has
been facilitated by the development of an A.C.E. research database. This
database contains the data collected by A.C.E. to support estimation of
net coverage error in 2000, and it is weighted to represent the additional
information collected from the national duplicates search and the evaluation
followup survey so that the net coverage error estimates produced
are nearly identical to those from A.C.E. Revision II.
This introductory chapter is followed by four chapters and three
appendixes. Chapter 2 first discusses types of census coverage error
and the coverage error metrics for domains of interest. It then describes
the three primary purposes for coverage measurement and DSE and
demographic analysis, the two primary methods used to measure net
coverage error. Chapter 2 also presents short histories of the U.S. census
coverage measurement programs from 1950 to 1990, including a description
of A.C.E., the coverage measurement program for the 2000 census.
Chapter 3 examines how the 2010 census differs from the 2000 census
with respect to the impact on the coverage measurement program for
2010. It looks in some depth at the treatment of duplicates in the 2010
census and the 2010 coverage measurement program, including the possibility
of contamination of the 2010 coverage measurement data collection
through the application of the coverage followup interview. The chapter
also discusses how the use of administrative records could potentially
assist in both coverage improvement and coverage measurement for the
2010 census.
Chapter 4 discusses a number of technical topics introduced by the
various changes made in coverage measurement for 2010, including: the
sample design for the census coverage measurement postenumeration
survey in 2010; the use of logistic regression modeling as a substitute
for poststratification in modeling net coverage error; how one compares
competing models in this situation; and the treatment of missing data in
net coverage error modeling, including the Census Bureau’s current plans
for addressing missing data prior to fitting the logistic regression models
in 2010. In relation to the issue of missing data, the chapter includes a
description of an attempt by the Census Bureau to greatly reduce the
number of cases that are considered to have insufficient information to
support matching. The chapter concludes with a discussion of how to
improve demographic analysis for use in census coverage measurement
in 2010.
Chapter 5 first briefly outlines the Census Bureau’s framework for
defining and estimating components of census coverage error. It then
considers potential variables for use in statistical models to assess correlates
of components of census coverage error. The chapter ends with a
consideration of the purpose of the key output from the census coverage
measurement program in 2010, namely the analytic capability to develop statistical
models linking census coverage errors of various types to individual
and household characteristics and census process variables.
There are three appendixes. Appendix A provides additional details
from the paper by Mulry and Kostanich (2006). Appendix B provides
additional details on the use of logistic regression models as a substitute
for poststratification. Appendix C provides biographical sketches of panel
members and staff.