National Academies Press: OpenBook

Coverage Measurement in the 2010 Census (2009)

Chapter: Appendix A: A Framework for Components of Census Coverage Error

Suggested Citation:"Appendix A: A Framework for Components of Census Coverage Error." National Research Council. 2009. Coverage Measurement in the 2010 Census. Washington, DC: The National Academies Press. doi: 10.17226/12524.


Appendix A: A Framework for Components of Census Coverage Error

This appendix summarizes Mulry and Kostanich (2006). They begin by hypothesizing a P-census, which is what the P-sample would be if the entire United States were included in a postenumeration survey (PES). The P-census is also idealized in that no errors are assumed to be made in its data collection or matching, though the P-census can miss, at random, some correct enumerations in the census.

The authors then categorize people on the basis of the quality of their data, that is, whether their census questionnaire has errors or nonresponse, as follows:

1. those correctly enumerated in the census, CE,
2. those enumerated in the census but in the wrong location, WL,
3. those erroneously enumerated in the census, EE,
4. those with insufficient information for matching to the P-census, II,
5. those that are not data defined in the census, NDD, and
6. those omitted in the census, OM.

The authors also divide the population into four subsets by crossing the following two dichotomies: whether or not a census enumeration has sufficient information for matching and whether or not a census enumeration is in the P-census. The subscript ij indicates subset membership: the first index is equal to 1 for those with sufficient information for matching and 0 otherwise; the second index is equal to 1 with inclusion in the P-census and 0 otherwise. See Figure A-1 for a depiction of the various subsets of the total population using this taxonomy.

FIGURE A-1 Elements of dual-systems estimation. The figure cross-classifies census status (in or not in the census, the E-sample universe, and the set eligible for matching) against P-census status (in or not in). SOURCE: Adapted from Mulry and Kostanich (2006).

The result is 13 separate cells, defined as follows:

CE11: correct enumeration in the census and in the P-census
CE10: correct enumeration in the census and missed in the P-census
EE10: erroneous enumeration in the census and missed in the P-census (which would include both erroneous enumerations as defined in this report and duplicate enumerations in the census)
EEII00: erroneous enumeration in the census with insufficient information for matching and missed in the P-census
EENDD00: erroneous enumeration in the census and not data defined and missed in the P-census
WL11: enumerated in the wrong location in the census and in the P-census
WL10: enumerated in the wrong location in the census and missed in the P-census
II01: insufficient information for matching in the census and counted in the P-census
II00: insufficient information for matching in the census and missed in the P-census
NDD01: not data defined in the census and in the P-census
NDD00: not data defined in the census and missed in the P-census
OM01: missed in the census and in the P-census
OM00: missed in the census and missed in the P-census

The following additional relationships are used below:

CE = CE11 + CE10
WL = WL11 + WL10

II = II01 + II00 + EEII00
NDD = NDD01 + NDD00 + EENDD00
OM = OM01 + OM00

Thus:

Census = CE11 + CE10 + WL11 + WL10 + II01 + II00 + NDD01 + NDD00 + EE10 + EEII00 + EENDD00;

True Population = CE11 + CE10 + WL11 + WL10 + II01 + II00 + NDD01 + NDD00 + OM01 + OM00;   (1)

Net Census Error = True Population − Census = OM01 + OM00 − EE10 − EEII00 − EENDD00;

P-Census = CE11 + WL11 + II01 + NDD01 + OM01.

Given that the number of correct enumerations, CE, is equal to CE11 + CE10; that the number of enumerations in the P-census, P, is equal to CE11 + WL11 + II01 + NDD01 + OM01; and that the number of the P-census matches to correct census enumerations in the matching universe in the correct location, M, is equal to CE11, one can re-express the dual-systems estimator,

$$\mathrm{DSE} = \mathrm{CE}\,\frac{P}{M},$$

in terms of the cell counts as

$$\mathrm{DSE} = \frac{(\mathrm{CE}_{11} + \mathrm{CE}_{10})\,(\mathrm{CE}_{11} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01} + \mathrm{OM}_{01})}{\mathrm{CE}_{11}}. \tag{2}$$

To justify this formula, the authors express three assumptions that are used in practical implementation of dual-systems estimation as a function of the entire set of 13 quantities:

Assumption 1: The basic assumption underlying dual-systems estimation is that the proportion of the true population correctly enumerated in the census equals the proportion of the P-census enumerated in the census. This can be expressed as

$$\frac{\mathrm{CE} + \mathrm{WL} + \mathrm{II}_{01} + \mathrm{II}_{00} + \mathrm{NDD}_{01} + \mathrm{NDD}_{00}}{\mathrm{DSE}} = \frac{\mathrm{CE}_{11} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01}}{\mathrm{CE}_{11} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01} + \mathrm{OM}_{01}}.$$

Turning this around:

$$\mathrm{DSE} = (\mathrm{CE} + \mathrm{WL} + \mathrm{II}_{01} + \mathrm{II}_{00} + \mathrm{NDD}_{01} + \mathrm{NDD}_{00}) \left( \frac{\mathrm{CE}_{11} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01} + \mathrm{OM}_{01}}{\mathrm{CE}_{11} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01}} \right). \tag{3}$$

Assumption 2: It is assumed that correct enumerations in the matching universe are included in the P-census at the same rate as all correct enumerations. That is, it is assumed that cases insufficient for matching can be treated as missing completely at random. This is expressible as

$$\frac{\mathrm{CE}_{11} + \mathrm{WL}_{11}}{\mathrm{CE} + \mathrm{WL}} = \frac{\mathrm{CE}_{11} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01}}{\mathrm{CE} + \mathrm{WL} + \mathrm{II}_{01} + \mathrm{II}_{00} + \mathrm{NDD}_{01} + \mathrm{NDD}_{00}}. \tag{4}$$

Assumption 3: Given that the search for a match is geographically limited, it is assumed that the proportion of people that should be enumerated but are called erroneous because they are in the wrong location equals the proportion of matches that are not found because they are in the wrong location. This assumption is the so-called balancing of erroneous enumerations and nonmatches and is equivalent to the statement that the proportion of correct enumerations found because they are in the correct location equals the proportion of matches found because they are in the correct location. This can be expressed as

$$\frac{\mathrm{CE}_{11} + \mathrm{CE}_{10}}{\mathrm{CE}_{11} + \mathrm{CE}_{10} + \mathrm{WL}_{11} + \mathrm{WL}_{10}} = \frac{\mathrm{CE}_{11}}{\mathrm{CE}_{11} + \mathrm{WL}_{11}},$$

which can be re-expressed as

$$\frac{\mathrm{CE}_{11} + \mathrm{CE}_{10}}{\mathrm{CE}_{11}} = \frac{\mathrm{CE}_{11} + \mathrm{CE}_{10} + \mathrm{WL}_{11} + \mathrm{WL}_{10}}{\mathrm{CE}_{11} + \mathrm{WL}_{11}} = \frac{\mathrm{CE} + \mathrm{WL}}{\mathrm{CE}_{11} + \mathrm{WL}_{11}}. \tag{5}$$

Substituting expressions (4) and (5) into (3), we have:

$$\mathrm{DSE} = \frac{(\mathrm{CE}_{11} + \mathrm{CE}_{10})\,(\mathrm{CE}_{11} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01} + \mathrm{OM}_{01})}{\mathrm{CE}_{11}}, \tag{6}$$

which is expression (2), therefore justifying dual-systems estimation when the above three assumptions hold.

The dual-systems estimation expression can be rewritten as

$$\mathrm{DSE} = (\mathrm{CE}_{11} + \mathrm{CE}_{10} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01} + \mathrm{OM}_{01}) + \frac{\mathrm{CE}_{10}}{\mathrm{CE}_{11}}\,(\mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01} + \mathrm{OM}_{01}),$$

which is equal to the true population if the last term is equal to the missing elements in expression (1): that is, if

$$\frac{\mathrm{CE}_{10}}{\mathrm{CE}_{11}}\,(\mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01} + \mathrm{OM}_{01}) = \mathrm{WL}_{10} + \mathrm{II}_{00} + \mathrm{NDD}_{00} + \mathrm{OM}_{00}. \tag{7}$$

The quantity on the right-hand side of (7) is referred to as the fourth cell: the people who are missed by both the census and by the P-census. If one assumes that the property of being correctly included in the census at the correct location is statistically independent of being in the P-census, then

$$\frac{\mathrm{CE}_{10}}{\mathrm{CE}_{11}}\,(\mathrm{CE}_{11} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01} + \mathrm{OM}_{01}) = \mathrm{CE}_{10} + \mathrm{WL}_{10} + \mathrm{II}_{00} + \mathrm{NDD}_{00} + \mathrm{OM}_{00},$$

which is equivalent to (7).

Mulry and Kostanich also discuss what information is available from the field as to which of the sample of census enumerations, and which of the P-sample enumerations (many of which are the same individuals), fall into each of the 13 types of enumerations listed above. Recall that the P-sample enumerations are only matched to matchable census enumerations in a search area. Also, for persons who have moved into the P-sample block clusters after census day, the P-sample is matched to their residence address on census day. Matches therefore provide an estimate of the number of correct enumerations in the correct location that were included in the P-sample. The P-sample is composed of matches and nonmatches: the matches, again ignoring sampling variation, are equal to CE11, and the nonmatches are equal to II01 + WL11 + NDD01 + OM01. These various types of nonmatches are not distinguishable without further data collection.

The number of census enumerations is the sum of the correct enumerations and erroneous enumerations (as defined by the Census Bureau), or E = CE + EE, where CE = CE11 + CE10. In the expression CE = CE11 + CE10, the components are distinguishable for nonmovers because in matching the P-sample to the E-sample, it is determined which census enumerations were included and which were missed in the P-sample. However, the two components of correct enumerations are not distinguishable for movers.
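The algebra behind expressions (1), (2), and (7) can be checked numerically. The following is a minimal sketch, not part of Mulry and Kostanich (2006): the cell counts in `CELLS` are hypothetical values chosen so that condition (7) holds exactly, and the function names are illustrative assumptions.

```python
from fractions import Fraction

# Hypothetical cell counts, for illustration only (not census data).
# They are chosen so that condition (7) holds exactly.
CELLS = {
    "CE11": 800, "CE10": 200,
    "WL11": 40, "WL10": 10,
    "II01": 16, "II00": 4,
    "NDD01": 8, "NDD00": 2,
    "OM01": 48, "OM00": 12,
    "EE10": 30, "EEII00": 5, "EENDD00": 3,
}


def census_count(c):
    """Census total: all enumerated cells, including erroneous ones."""
    return (c["CE11"] + c["CE10"] + c["WL11"] + c["WL10"]
            + c["II01"] + c["II00"] + c["NDD01"] + c["NDD00"]
            + c["EE10"] + c["EEII00"] + c["EENDD00"])


def true_population(c):
    """Expression (1): enumerated non-erroneous cells plus omissions."""
    return (c["CE11"] + c["CE10"] + c["WL11"] + c["WL10"]
            + c["II01"] + c["II00"] + c["NDD01"] + c["NDD00"]
            + c["OM01"] + c["OM00"])


def p_census(c):
    """P-census total: every cell whose second index is 1."""
    return c["CE11"] + c["WL11"] + c["II01"] + c["NDD01"] + c["OM01"]


def dse(c):
    """Expression (2): DSE = CE * P / M, with M = CE11."""
    ce = c["CE11"] + c["CE10"]
    return Fraction(ce * p_census(c), c["CE11"])


def condition_7_holds(c):
    """Check (7): (CE10 / CE11) * (P-census nonmatches) equals the fourth cell."""
    nonmatches = c["WL11"] + c["II01"] + c["NDD01"] + c["OM01"]
    fourth_cell = c["WL10"] + c["II00"] + c["NDD00"] + c["OM00"]
    return Fraction(c["CE10"], c["CE11"]) * nonmatches == fourth_cell


print("condition (7) holds:", condition_7_holds(CELLS))
print("DSE:", dse(CELLS))
print("true population:", true_population(CELLS))
print("census:", census_count(CELLS))
```

With these counts the DSE and the true population coincide at 1140. Changing any fourth-cell count (for example, raising OM00) breaks condition (7), and the DSE then no longer equals the true population. Exact rational arithmetic via `fractions.Fraction` is used so the equality checks are not affected by floating-point rounding.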
Mulry and Kostanich further address the measurement of components of census coverage error. If one wants to decompose the various summary estimates, more information would be needed than that used to support dual-systems estimation. When the objective is the estimation of net coverage error, a very strict definition of correct enumeration is used, involving a small restricted search area within the relevant P-sample block cluster (and possibly a small area surrounding that area). But when the objective is to measure components of census coverage error, one can define a correct enumeration in a variety of ways to conform to a given tabulation of interest.

For instance, a correct enumeration can be in the correct county, in the correct state, or simply included correctly in the United States, the latter being the approach taken to simplify the argument given.

Mulry and Kostanich state their goal is partly to obtain estimates of the number of erroneous enumerations, EE10 + EEII00 + EENDD00, and the number of census omissions, OM01 + OM00. (In this report, the panel states there is also interest in estimating the number of enumerations in the wrong place and the number of duplicate enumerations.)

Unfortunately, because of enumerations in the wrong location and enumerations with either insufficient information for matching or not data defined, subtracting CE from the census count gives an inflated estimate of the number of erroneous enumerations, EE10 + EEII00 + EENDD00. Specifically,

Census − CE11 − CE10 = WL11 + WL10 + II01 + II00 + NDD01 + NDD00 + EE10 + EEII00 + EENDD00,

so Census − CE is the sum of erroneous census enumerations (which includes duplicates) plus census enumerations in the wrong location plus correct census enumerations that have insufficient information for matching or are not data defined.

For the same reason as for erroneous enumerations, subtracting the matching enumerations from the P-census does not provide an unbiased estimate of the number of omitted people in the census, OM01 + OM00. In fact, P − M = II01 + WL11 + NDD01 + OM01.

To obtain an estimate of the number of omissions, note that

DSE − Census = NetCensusError = OM01 + OM00 − (EENDD00 + EEII00 + EE10),

and, therefore, OM01 + OM00 = NetCensusError + EENDD00 + EEII00 + EE10. So, to estimate the number of omissions, one can take an estimate of the net census error and add to it the number of erroneous enumerations (including the number of duplicates).
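The component arithmetic above can be illustrated with a small numerical sketch. The cell counts are hypothetical and the function names are assumptions made for this example; in practice the net census error would be an estimate (DSE − Census) rather than a known quantity, whereas here it is computed from the cells directly.

```python
# Hypothetical cell counts, for illustration only (not census data).
CELLS = {
    "CE11": 800, "CE10": 200,
    "WL11": 40, "WL10": 10,
    "II01": 16, "II00": 4,
    "NDD01": 8, "NDD00": 2,
    "OM01": 48, "OM00": 12,
    "EE10": 30, "EEII00": 5, "EENDD00": 3,
}


def erroneous_enumerations(c):
    """EE10 + EEII00 + EENDD00."""
    return c["EE10"] + c["EEII00"] + c["EENDD00"]


def omissions(c):
    """OM01 + OM00."""
    return c["OM01"] + c["OM00"]


def census_count(c):
    """All enumerated cells, correct or not."""
    return (c["CE11"] + c["CE10"] + c["WL11"] + c["WL10"]
            + c["II01"] + c["II00"] + c["NDD01"] + c["NDD00"]
            + erroneous_enumerations(c))


def true_population(c):
    """Non-erroneous enumerated cells plus omissions."""
    return census_count(c) - erroneous_enumerations(c) + omissions(c)


# Naive estimate of erroneous enumerations: Census - CE.
census_minus_ce = census_count(CELLS) - (CELLS["CE11"] + CELLS["CE10"])

# Naive estimate of omissions: P - M (P-census nonmatches).
p_minus_m = CELLS["WL11"] + CELLS["II01"] + CELLS["NDD01"] + CELLS["OM01"]

# Omissions recovered as net census error plus erroneous enumerations.
net_census_error = true_population(CELLS) - census_count(CELLS)
omissions_from_net = net_census_error + erroneous_enumerations(CELLS)

print("Census - CE:", census_minus_ce,
      "vs. erroneous enumerations:", erroneous_enumerations(CELLS))
print("P - M:", p_minus_m, "vs. omissions:", omissions(CELLS))
print("net error + erroneous enumerations:", omissions_from_net)
```

In this example Census − CE is 118 against 38 actual erroneous enumerations, and P − M is 112 against 60 actual omissions, whereas net census error plus erroneous enumerations recovers the 60 omissions exactly.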
The Census Bureau plans to use two definitions of a correct enumeration in 2010, one to provide a quality estimate of net census error, which among other things will help to estimate the number of omissions, and one to estimate the remaining components of coverage error. To estimate the number of erroneous enumerations, the Census Bureau will need:

• to collect additional data to determine where enumerations should be included if the search area is not the correct location;
• to match the E-sample enumerations against the full set of census enumerations for duplicates, with field validation if necessary to establish proper census residence; and
• for enumerations in the E-sample but not in the matching universe, to strive to match to the P-sample (when possible) to identify those KEs (responses that are census data-defined but have insufficient information for matching as defined in 2000) that are correct enumerations.

This appendix omits the remaining details: Mulry and Kostanich discuss how one could separate out those enumerated in the wrong location from those that are erroneous, other complications raised by cases with insufficient information for matching, movers, and duplicates, and when to use imputation methods. Finally, the estimates of the components are generally represented as sample weighted averages, mainly of 0-1 indicator variables, but also of imputed probabilities.
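As a rough illustration of what a sample weighted average of 0-1 indicators and imputed probabilities looks like, the sketch below computes a weighted total and a weighted mean. The weights, the values, and the function names are hypothetical; the appendix does not specify the actual estimators, so this shows the general form only.

```python
def weighted_total(values, weights):
    """Sum of weight * value over sampled enumerations."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * y for y, w in zip(values, weights))


def weighted_mean(values, weights):
    """Weighted total divided by the sum of the weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return weighted_total(values, weights) / total_weight


# Hypothetical sampling weights for four sampled enumerations.
weights = [100.0, 150.0, 200.0, 50.0]

# 1.0 = resolved as erroneous, 0.0 = resolved as correct,
# 0.25 = unresolved case with an imputed probability of being erroneous.
is_erroneous = [1.0, 0.0, 0.25, 1.0]

print("weighted count of erroneous enumerations:",
      weighted_total(is_erroneous, weights))
print("weighted erroneous enumeration rate:",
      weighted_mean(is_erroneous, weights))
```

Treating an unresolved case as a probability rather than forcing it to 0 or 1 lets it contribute fractionally to the weighted total, which is the role the imputed probabilities play in such averages.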


The census coverage measurement programs have historically addressed three primary objectives: (1) to inform users about the quality of the census counts; (2) to help identify sources of error to improve census taking, and (3) to provide alternative counts based on information from the coverage measurement program.

In planning the 1990 and 2000 censuses, the main objective was to produce alternative counts based on the measurement of net coverage error. For the 2010 census coverage measurement program, the Census Bureau will deemphasize that goal, and is instead planning to focus on the second goal of improving census processes.

This book, which details the findings of the National Research Council's Panel on Coverage Evaluation and Correlation Bias, strongly supports the Census Bureau's change in goal. However, the panel finds that the current plans for data collection, data analysis, and data products are still too oriented towards measurement of net coverage error to fully exploit this new focus. Although the Census Bureau has taken several important steps to revise data collection and analysis procedures and data products, this book recommends further steps to enhance the value of coverage measurement for the improvement of future census processes.
