Coverage Measurement in the 2010 Census

Appendix A
A Framework for Components of Census Coverage Error

This appendix summarizes Mulry and Kostanich (2006). They begin by hypothesizing a P-census, which is what the P-sample would be if the entire United States were included in a postenumeration survey (PES). The P-census is also idealized in that no errors are assumed to be made in its data collection or matching, though the P-census can miss, at random, some correct enumerations in the census. The authors then categorize people on the basis of the quality of their data, that is, whether their census questionnaire has errors or nonresponse, as follows:

those correctly enumerated in the census, CE;
those enumerated in the census but in the wrong location, WL;
those erroneously enumerated in the census, EE;
those with insufficient information for matching to the P-census, II;
those that are not data defined in the census, NDD; and
those omitted in the census, OM.

The authors also divide the population into four subsets by crossing the following two dichotomies: whether or not a census enumeration has sufficient information for matching and whether or not a census enumeration is in the P-census. The subscript ij indicates subset membership: the first index is equal to 1 for those with sufficient information for matching and 0 otherwise; the second index is equal to 1 with inclusion in the P-census and 0 otherwise. See Figure A-1 for a depiction of the various subsets of the total population using this taxonomy.

FIGURE A-1 Elements of dual-systems estimation. SOURCE: Adapted from Mulry and Kostanich (2006).

The result is 13 separate cells, defined as follows:

CE11: correct enumeration in the census and in the P-census
CE10: correct enumeration in the census and missed in the P-census
EE10: erroneous enumeration in the census and missed in the P-census (which would include both erroneous enumerations as defined in this report and duplicate enumerations in the census)
EEII00: erroneous enumeration in the census with insufficient information for matching and missed in the P-census
EENDD00: erroneous enumeration in the census and not data defined and missed in the P-census
WL11: enumerated in the wrong location in the census and in the P-census
WL10: enumerated in the wrong location in the census and missed in the P-census
II01: insufficient information for matching in the census and counted in the P-census
II00: insufficient information for matching in the census and missed in the P-census
NDD01: not data defined in the census and in the P-census
NDD00: not data defined in the census and missed in the P-census
OM01: missed in the census and in the P-census
OM00: missed in the census and missed in the P-census

The following additional relationships are used below: [equations not reproduced in this text]
Thus: [equation (1) not reproduced in this text]

Given that the number of correct enumerations, CE, is equal to CE11 + CE10; that the number of enumerations in the P-census, P, is equal to CE11 + WL11 + II01 + NDD01 + OM01; and that the number of P-census matches to correct census enumerations in the matching universe in the correct location, M, is equal to CE11, one can re-express the dual-systems estimator in terms of the cell counts as [equation (2) not reproduced in this text].

To justify this formula, the authors state three assumptions that are used in the practical implementation of dual-systems estimation, each expressed as a function of the entire set of 13 quantities.

Assumption 1: The basic assumption underlying dual-systems estimation is that the proportion of the true population correctly enumerated in the census equals the proportion of the P-census enumerated in the census. This can be expressed as [equation not reproduced in this text].
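The definitions of CE, P, and M given above are enough to sketch the estimator numerically. The Python sketch below assumes the standard dual-systems form DSE = CE × P / M, which this text does not spell out; the `Cells` class, the function name, and every count are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cells:
    """Hypothetical counts for the cells that enter CE, P, and M."""
    CE11: float
    CE10: float
    WL11: float
    II01: float
    NDD01: float
    OM01: float


def dual_systems_estimate(c: Cells) -> float:
    """Assumed standard form DSE = CE * P / M (not taken from the text)."""
    ce = c.CE11 + c.CE10                              # correct enumerations, CE
    p = c.CE11 + c.WL11 + c.II01 + c.NDD01 + c.OM01   # P-census enumerations, P
    m = c.CE11                                        # matches, M
    if m <= 0:
        raise ValueError("M (CE11) must be positive")
    return ce * p / m


# Illustrative counts only; they are not census figures.
example = Cells(CE11=800, CE10=100, WL11=20, II01=10, NDD01=5, OM01=65)
estimate = dual_systems_estimate(example)
print(estimate)
```

With these made-up counts, CE = 900, P = 900, and M = 800, so the ratio P / M inflates the count of correct enumerations to account for people the census missed.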
Turning this around: [equation (3) not reproduced in this text]

Assumption 2: It is assumed that correct enumerations in the matching universe are included in the P-census at the same rate as all correct enumerations. That is, it is assumed that cases with insufficient information for matching can be treated as missing completely at random. This is expressible as [equation (4) not reproduced in this text].

Assumption 3: Given that the search for a match is geographically limited, it is assumed that the proportion of people that should be enumerated but are called erroneous because they are in the wrong location equals the proportion of matches that are not found because they are in the wrong location. This assumption is the so-called balancing of erroneous enumerations and nonmatches and is equivalent to the statement that the proportion of correct enumerations found because they are in the correct location equals the proportion of matches found because they are in the correct location. This can be expressed as [equation not reproduced in this text], which can be re-expressed as [equation (5) not reproduced in this text].

Substituting expressions (4) and (3) into (2), we have [equation (6) not reproduced in this text], which equals (2), therefore justifying dual-systems estimation when the above three assumptions hold.

The dual-systems estimation expression can be rewritten as [equation not reproduced in this text], which is equal to the true population if the last term is equal to the missing elements in expression (1): that is, if [equation (7) not reproduced in this text].

The quantity on the right-hand side of (7) is referred to as the fourth cell: the people who are missed by both the census and the P-census. If one assumes that the property of being correctly included in the census at the correct location is statistically independent of being in the P-census, then [equation not reproduced in this text], which is equivalent to (7).

Mulry and Kostanich also discuss what information is available from the field as to which of the sample of census enumerations, and which of the P-sample enumerations (many of which are the same individuals), fall into the 13 types of enumerations listed above. Recall that the P-sample enumerations are only matched to matchable census enumerations in a search area. Also, for persons who have moved into the P-sample block clusters after census day, the P-sample is matched to their residence address on census day. Matches therefore provide an estimate of the number of correct enumerations in the correct location that were included in the P-sample. The P-sample is composed of matches and nonmatches: the matches, again ignoring sampling variation, are equal to CE11, and the nonmatches are equal to II01 + WL11 + NDD01 + OM01. These various types of nonmatches are not distinguishable without further data collection.

The number of census enumerations is the sum of the correct enumerations and erroneous enumerations (as defined by the Census Bureau), or E = CE + EE, where CE = CE11 + CE10. In the expression CE = CE11 + CE10, the components are distinguishable for nonmovers because in matching the P-sample to the E-sample, it is determined which census enumerations were included and which were missed in the P-sample. However, the two components of correct enumerations are not distinguishable for movers.

Mulry and Kostanich further address the measurement of components of census coverage error.
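The fourth-cell argument above rests on a textbook identity for a two-by-two capture table, which a short Python sketch can make concrete. This is the generic dual-systems identity, not a reconstruction of the appendix's expression (7): the names `n11` (in both systems), `n10` (census only), and `n01` (P-census only) are assumptions introduced here, as are the counts.

```python
def fourth_cell(n11: float, n10: float, n01: float) -> float:
    """Estimated count missed by both systems, assuming independence."""
    if n11 <= 0:
        raise ValueError("n11 must be positive")
    return n10 * n01 / n11


def dual_systems_total(n11: float, n10: float, n01: float) -> float:
    """Generic dual-systems total: (census total) * (P total) / (matches)."""
    if n11 <= 0:
        raise ValueError("n11 must be positive")
    return (n11 + n10) * (n11 + n01) / n11


# Illustrative counts only; they are not census figures.
n11, n10, n01 = 800.0, 100.0, 100.0
n00_hat = fourth_cell(n11, n10, n01)
total_hat = dual_systems_total(n11, n10, n01)

# The estimated total equals the three observed cells plus the fourth cell.
observed_plus_fourth = n11 + n10 + n01 + n00_hat
print(n00_hat, total_hat, observed_plus_fourth)
```

Under independence, the odds of being missed by the second system are the same whether or not a person was counted by the first, which is what pins down the unobserved cell as `n10 * n01 / n11`.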
If one wants to decompose the various summary estimates, more information would be needed than that used to support dual-systems estimation. When the objective is the estimation of net coverage error, a very strict definition of correct enumeration is used, involving a small restricted search area within the relevant P-sample block cluster (and possibly a small area surrounding that area). But when the objective is to measure components of census coverage error, one can define a correct enumeration in a variety of ways to conform to a given tabulation of interest.
For instance, a correct enumeration can be in the correct county, state, or simply included correctly in the United States, the latter being the approach taken to simplify the argument given. Mulry and Kostanich state their goal is partly to obtain estimates of the number of erroneous enumerations, EE10 + EEII00 + EENDD00, and the number of census omissions, OM01 + OM00. (In this report, the panel states there is also interest in estimating the number of enumerations in the wrong place and the number of duplicate enumerations.)

Unfortunately, because of enumerations in the wrong location and enumerations with either insufficient information for matching or not data defined, subtracting CE from the census count gives an inflated estimate of the number of erroneous enumerations, EE10 + EEII00 + EENDD00. Specifically, Census − CE11 − CE10 = WL11 + WL10 + II01 + II00 + NDD01 + NDD00 + EE10 + EEII00 + EENDD00, so Census − CE is the sum of erroneous census enumerations (which includes duplicates) plus census enumerations in the wrong location plus correct census enumerations that have insufficient information for matching or are not data defined.

For the same reason as for erroneous enumerations, subtracting the matching enumerations from the P-census does not provide an unbiased estimate of the number of omitted people in the census, OM01 + OM00. In fact, P − M = II01 + WL11 + NDD01 + OM01. To obtain an estimate of the number of omissions, note that DSE − Census = NetCensusError = OM01 + OM00 − (EENDD00 + EEII00 + EE10), and, therefore, OM01 + OM00 = NetCensusError + EENDD00 + EEII00 + EE10. So, to estimate the number of omissions, one can take an estimate of the net census error and add to it the number of erroneous enumerations (including the number of duplicates).
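The accounting identities in the preceding paragraphs can be checked with a few lines of arithmetic. The Python sketch below fills the 13 cells with hypothetical counts (not census figures) and compares the naive differences, Census − CE and P − M, with the quantities they are meant to approximate; the variable names are introduced here for illustration.

```python
# Hypothetical counts for the 13 cells; the numbers are illustrative only.
cells = {
    "CE11": 800, "CE10": 100,
    "EE10": 15, "EEII00": 4, "EENDD00": 3,
    "WL11": 20, "WL10": 5,
    "II01": 10, "II00": 6,
    "NDD01": 5, "NDD00": 2,
    "OM01": 65, "OM00": 12,
}

ERRONEOUS = ("EE10", "EEII00", "EENDD00")
OMITTED = ("OM01", "OM00")


def total(names):
    """Sum the counts of the named cells."""
    return sum(cells[name] for name in names)


# The census count is every cell except the omissions.
census = sum(v for k, v in cells.items() if k not in OMITTED)
ce = total(("CE11", "CE10"))
p = total(("CE11", "WL11", "II01", "NDD01", "OM01"))
m = cells["CE11"]

erroneous = total(ERRONEOUS)
omissions = total(OMITTED)

naive_erroneous = census - ce   # also picks up the WL, II, and NDD cells
naive_omissions = p - m         # equals II01 + WL11 + NDD01 + OM01

# Net census error written in terms of cells, as in the text.
net_error = omissions - erroneous
recovered_omissions = net_error + erroneous

print(naive_erroneous, erroneous)
print(naive_omissions, omissions, recovered_omissions)
```

In this made-up example, Census − CE is 70 against 22 true erroneous enumerations, and P − M is 100 against 77 true omissions, while adding the erroneous enumerations back to the net error recovers the omissions exactly.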
The Census Bureau plans to use two definitions of a correct enumeration in 2010, one to provide a quality estimate of net census error, which among other things will help to estimate the number of omissions, and one to estimate the remaining components of coverage error. To estimate the number of erroneous enumerations, the Census Bureau will need: to collect additional data to determine where enumerations should be included if the search area is not the correct location; to match the E-sample enumerations against the full set of census enumerations for duplicates, with field validation if necessary to establish proper census residence; and for enumerations in the E-sample but not in the matching universe, to strive to match to the P-sample (when possible) to identify those KEs (responses that are census data-defined but have insufficient information for matching as defined in 2000) that are correct enumerations.

This appendix omits the remaining details: Mulry and Kostanich discuss how one could separate out those enumerated in the wrong location from those that are erroneous, other complications raised by cases with insufficient information for matching, movers, and duplicates, and when to use imputation methods. Finally, the estimates of the components are generally represented as sample weighted averages, mainly of 0-1 indicator variables, but also of imputed probabilities.