7
Accuracy and Coverage Evaluation: Assessment

This chapter presents the panel’s assessment of the Accuracy and Coverage Evaluation (A.C.E.) Program because the A.C.E. is crucial to any assessment of the census itself. We consider nine separate aspects of the A.C.E.:

  • conduct and timing;

  • household noninterviews in the P-sample;

  • imputation for missing characteristics and unresolved residence, match, and enumeration status;

  • quality of matching;

  • the targeted extended search;

  • post-stratification;

  • variance estimates;

  • final match codes and rates; and

  • gross errors.

We end this chapter with our summary assessment of the A.C.E.

CONDUCT AND TIMING

Overall, the A.C.E. appears to have been well executed. Although the sample size was twice as large as that fielded in 1990, the A.C.E. was carried out on schedule and with only minor problems that necessitated rearrangement or modification of operations after they had been specified.1 Some procedures, such as telephone interviewing, proved more useful than had been expected. All processes, from sampling through estimation, were carried out according to well-documented specifications, with quality control procedures (e.g., reviews of the work of clerical matchers and field staff) implemented at appropriate junctures.

1

Mostly, such modifications involved accommodation to changes in the Master Address File (MAF) that occurred in the course of the census. For example, the targeted extended search (TES) procedures had to be modified to handle deletions from and additions to the MAF that were made after the determination of the TES housing unit inventory (Navarro and Olson, 2001:11).




HOUSEHOLD NONINTERVIEWS IN THE P-SAMPLE

Because the quantity being estimated—the net undercount of the population—is very small relative to the total population (1–2%), it is essential that the P-sample survey meet high standards for completeness of reporting. A high rate of household noninterviews requiring extensive adjustments to the sampling weights would be detrimental to the dual-systems estimation that is the key to the A.C.E. A high noninterview rate would not only increase variance, but would also likely introduce bias, because nonresponding households tend to differ from responding households in systematic ways that are important for estimation.

Interview/Noninterview Rates

Overall, the A.C.E. obtained interviews from 98.9 percent of households that were occupied on interview day. This figure compares favorably with the 98.4 percent interview rate for the 1990 Post-Enumeration Survey (PES).2 However, the percentage of occupied households as of Census Day that were successfully interviewed in the A.C.E. was somewhat lower—97 percent, meaning that a weighting adjustment had to account for the remaining 3 percent of noninterviewed households. The lower interview rate for Census Day households is due largely to the fact that households occupied entirely by outmovers at the time of the census were harder to interview than other households. This result is not surprising: the new occupants of such households may know nothing of the people who lived there before, and it may not always be possible to interview a knowledgeable neighbor or landlord. The interview rate for outmover households was 81.4 percent; such households comprised 4 percent of Census Day occupied households in the P-sample.

2

These percentages are unweighted; they are about the same as the weighted percentages. Weighted percentages are not available for 1990.

Noninterview Weighting Adjustments

Two weighting adjustments were calculated so that interviewed households would represent all households that should have been interviewed: one for the A.C.E. interview day and the other for Census Day. Each of the two weighting adjustments was calculated separately for households by type (single-family unit, apartment, other) within each individual block cluster. Mover status was not a factor in the reweighting. For Census Day, what could have been a relatively large noninterview adjustment for outmover households in a block cluster was therefore spread over all interviewed Census Day households in the cluster within each of the three housing types. Consequently, adjustments to the weights for interviewed households were quite low, which had the benefit of minimizing the increase in the variance of A.C.E. estimates due to differences among weights: 52 percent of the weights were not adjusted at all because all occupied households in the adjustment cell were interviewed, and for another 45 percent of households the weighting adjustment was between 1.0 and 1.2 (Cantwell et al., 2001:Table 2; see also "Variance Estimates," below).

MISSING AND UNRESOLVED DATA

Another important aspect of A.C.E. data quality is the extent of missing and unresolved data in the P-sample and the E-sample and the effectiveness of the imputation procedures used to supply values for missing and unresolved variables. Understanding the role of imputation requires understanding the designation of the E-sample and the treatment of certain cases in the matching. As noted above, the E-sample excluded whole-person imputations in the census, defined as people with only one short-form characteristic (which could be name). Matching was performed on the P-sample and E-sample using only reported information. During the course of matching, some cases were found to lack enough reported data for matching and follow-up under a more stringent criterion than the one used to exclude whole-person imputations from the E-sample: cases in the P-sample and E-sample lacking name and at least two other short-form characteristics could not be matched. Such cases were retained in both samples; in the E-sample they were coded as erroneous enumerations, and in the P-sample they were not yet assigned a final match status.

After all matching and follow-up had been completed, the next step was item imputation. Missing characteristics were imputed separately for each item in the P-sample records (including those records that lacked enough reported data for matching). Imputations for missing characteristics in the E-sample records (including those records that lacked name and at least two other short-form characteristics) were obtained from those on the census data file (see Appendix A). Then, match probabilities and Census Day residence probabilities were imputed for unresolved P-sample cases, including those that were set aside in the matching, and correct enumeration probabilities were imputed for unresolved E-sample cases. E-sample cases set aside in the matching were assigned a correct enumeration probability of zero.
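The probability imputations just described are, in essence, cell based: an unresolved case receives a probability derived from the resolved cases that share its imputation cell (the specific cell definitions appear under "Unresolved Residence, Match, and Enumeration Status" below). The following is a minimal sketch of that general approach, assuming a person-level file with the listed columns; the column names are illustrative, and this is not the Census Bureau's production procedure.

    import pandas as pd

    def impute_status_probability(df, cell_vars, status_col, weight_col="weight"):
        # Resolved cases have status 1 (e.g., Census Day resident) or 0; unresolved are NaN.
        out = df.copy()
        resolved = out[out[status_col].notna()].copy()
        resolved["w_pos"] = resolved[status_col] * resolved[weight_col]

        # Weighted share of positive outcomes among resolved cases, by imputation cell.
        cells = resolved.groupby(cell_vars)[["w_pos", weight_col]].sum()
        cell_prob = (cells["w_pos"] / cells[weight_col]).rename("cell_prob").reset_index()

        out = out.merge(cell_prob, on=cell_vars, how="left")
        # Resolved cases keep their observed status; unresolved cases get the cell probability.
        out["status_prob"] = out[status_col].fillna(out["cell_prob"])
        return out.drop(columns="cell_prob")

    # Illustrative call: residence probabilities by match status group, race, and tenure.
    # p_sample = impute_status_probability(
    #     p_sample, ["match_status_group", "race", "tenure"], status_col="resident")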

Missing Characteristics

Extent

Overall, the extent of missing characteristics data in the P-sample and E-sample was low, ranging between 0.2 percent and 3.6 percent for the characteristics age, sex, race, Hispanic origin, and housing tenure. Missing data rates for most characteristics were somewhat higher for the E-sample than for the P-sample. Missing data rates for the 2000 A.C.E. showed no systematic difference (up or down) from the 1990 PES; see Table 7-1.

TABLE 7-1 Missing Data Rates for Characteristics, 2000 A.C.E. and 1990 PES P-Sample and E-Sample (weighted)

                        Percentage of People with Imputed Characteristics
                        2000 A.C.E.              1990 PES
Characteristic          P-Sample    E-Sample     P-Sample    E-Sample
Age                     2.4         2.9          0.7         2.4
Sex                     1.7         0.2          0.5         1.0
Race                    1.4         3.2          2.5         11.8
Hispanic Origin         2.3         3.4          N.A.        N.A.
Housing Tenure          1.9         3.6          2.3         2.5
Any of Above            5.4         10.4         N.A.        N.A.

NOTES: A.C.E. E-sample imputations were obtained from the imputations performed on the census records; PES E-sample imputations were performed specifically for the E-sample. A.C.E. E-sample "edits" (e.g., assigning age on the basis of the person's date of birth, or assigning sex from first name) are not counted as imputations here. The base for the A.C.E. P-sample imputation rates includes nonmovers, inmovers, and outmovers, including people who were subsequently removed from the sample as nonresidents on Census Day. Excluded from the base for the A.C.E. P-sample and E-sample imputation rates are people eligible for the targeted extended search who were not selected for the targeted extended search sample and who were treated as noninterviews in the final weighting. N.A., not available.
SOURCE: Cantwell et al. (2001:Tables 3b, 3c).

As would be expected, missing data rates in the P-sample were higher for proxy interviews, in which someone outside the household supplied information, than for interviews with household members; see Table 7-2. By mover status, missing data rates were much higher for outmovers than for nonmovers and inmovers, which is not surprising given that 73.3 percent of interviews for outmovers were obtained from proxies, compared with only 2.9 percent of interviews for nonmovers and 4.8 percent for inmovers. Even "non-proxy" interviews for outmovers may have been with household members who did not know the outmover.

TABLE 7-2 Percentage of 2000 A.C.E. P-Sample People with Imputed Characteristics, by Proxy Interview and Mover Status (weighted)

                        Percentage of People with Imputed Characteristics
Characteristic          Household    Proxy        Nonmover    Inmover    Outmover
                        Interview    Interview
Age                     2.1          7.9          2.3         2.3        6.0
Sex                     1.5          4.2          1.7         0.4        3.4
Race                    1.0          8.7          1.2         1.3        8.0
Hispanic Origin         1.8          11.0         2.1         0.8        9.0
Housing Tenure          1.7          5.2          1.9         0.4        2.4
Any of Above            4.4          21.9         5.0         3.7        17.4
Percent of Total
  P-Sample              94.3         5.7          91.7        4.8        3.4

NOTES: See notes to Table 7-1.
SOURCE: Cantwell et al. (2001:Table 3b).

For the E-sample, one can distinguish mail returns from returns obtained by enumerators in nonresponse follow-up, although there is no information on proxy interviews for the latter. Table 7-3 shows that missing data rates were higher for some, but not all, characteristics when the return was obtained in nonresponse follow-up than when the return was mailed back by the household.

TABLE 7-3 Percentage of 2000 A.C.E. E-Sample People with Imputed or Edited Characteristics, by Type of Return (weighted)

                                 Percentage of People with Imputed or Edited Characteristics
Characteristic                   Mail Return    Enumerator Return
Age
  Imputed                        1.1            7.0
  Edited                         1.2            1.9
Sex
  Imputed                        0.1            0.4
  Edited                         0.9            1.1
Race
  Imputed                        3.2            3.2
  Edited                         0.0            0.0
Hispanic Origin
  Imputed                        3.5            3.0
  Edited                         0.3            0.4
Housing Tenure
  Imputed                        2.2            6.8
  Edited                         0.5            0.8
Any of Above
  Imputed                        8.5            14.7
  Imputed or edited or both      10.9           18.1
Percent of Total E-Sample        69.3           28.0

NOTES: Mail returns are those obtained before the April 18, 2000, cutoff to begin nonresponse follow-up (NRFU). Enumerator returns are those obtained during NRFU. The table excludes 2.7 percent of the total E-sample (e.g., list/enumerate, rural update/enumerate, urban update/enumerate, late mail returns).
SOURCE: Tabulations by panel staff of U.S. Census Bureau, E-Sample Person Dual-System Estimation Output File, February 16, 2001; tabulations weighted using TESFINWT (see notes to Table 7-7).
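The imputation rates shown in Tables 7-1 through 7-3 are weighted percentages: the weighted number of people with an imputed value on an item, divided by the weighted number of people in the relevant base (with the exclusions noted under Table 7-1). A minimal sketch, with illustrative column names:

    def weighted_imputation_rate(records, flag_col, weight_col="weight"):
        """Weighted percentage of people with an imputed value for one item.
        `records` is an iterable of dicts with a 0/1 imputation flag and a weight."""
        num = sum(r[flag_col] * r[weight_col] for r in records)
        den = sum(r[weight_col] for r in records)
        return 100.0 * num / den

    # Illustrative check with three made-up people (weights 600, 300, 100):
    people = [{"tenure_imputed": 1, "weight": 600},
              {"tenure_imputed": 0, "weight": 300},
              {"tenure_imputed": 0, "weight": 100}]
    print(weighted_imputation_rate(people, "tenure_imputed"))   # 60.0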

Effects of Item Imputation

Because the overall rates of missing data were low, the imputation procedures had little effect on the distribution of individual characteristics (Cantwell et al., 2001:24–26). However, imputation could misclassify people by post-stratum and contribute to inconsistent post-stratum classification for matching P-sample and E-sample cases (see "Post-Stratification," below). The reason is that the P-sample and E-sample imputations were performed using somewhat different procedures; also, imputation procedures for the P-sample were carried out separately for each characteristic.3

3

For example, tenure on the P-sample was imputed by using tenure from the previous household of the same type (e.g., single-family home) with tenure reported, while race and ethnicity were imputed when possible from the distribution of race and ethnicity of other household members or from the distribution of race and ethnicity of the previous household with these characteristics reported (see Cantwell et al., 2001).

Unresolved Residence, Match, and Enumeration Status

Residence Status

The weighted percentage of all P-sample nonmover and outmover cases with unresolved Census Day residence status was 2.2 percent, of which 51.7 percent were cases lacking enough reported information for matching. The remaining 48.3 percent of unresolved residence cases were confirmed matches, confirmed nonmatches, and possible matches. After imputation, the percentage of cases estimated to be Census Day residents dropped slightly, from 98.2 percent of resolved cases to 97.9 percent of all cases, because the imputation procedure assigned lower residence probabilities to unresolved cases (77.4 percent overall; this figure is a correction of the original number in Cantwell et al., 2001:Table 8).4

4

One would not expect there to be confirmed non-Census Day residents or unresolved cases among nonmovers and outmovers; however, it could happen because mover status was assigned prior to field follow-up work.

To impute a residence probability, the Census Bureau classified resolved and unresolved cases by match status follow-up group, race, and tenure. The eight match status groups discriminated well: for example, residence probabilities were very low for potentially fictitious people or people said to be living elsewhere on Census Day (14%);5 moderate for college- and military-age children in partially matched households (84%); and very high for cases resolved before follow-up (99%). The addition of race and tenure to the imputation cells did not capture much additional variability in the probability of Census Day residence (Cantwell et al., 2001:Table 8). The residence probabilities assigned to people without enough reported data for matching—84 percent overall—were based on the average of the probabilities for people in the other match status groups within each race and tenure category.

5

Fictitious people are those for whom it seems clear that the data were fabricated by the respondent or enumerator (e.g., a return for Mickey Mouse).

Match Status

The weighted percentage of P-sample cases with unresolved match status was only 1.2 percent.6 This percentage compares favorably with the 1.8 percent of cases with unresolved match status in the 1990 PES. Very little was known about the A.C.E. P-sample people with unresolved match status; 98 percent of them lacked enough reported data for matching (i.e., they lacked a valid name or at least two characteristics or both). After imputation, the percentage of matches dropped slightly, from 91.7 percent of resolved cases (matches and nonmatches) to 91.6 percent of all cases, because the imputation procedure assigned lower match status probabilities to unresolved cases (84.3% overall). To impute a match status probability, the Census Bureau classified resolved and unresolved cases by mover status (nonmover, outmover), whether the person's housing unit did or did not match, and whether the person had one or more characteristics imputed or edited. These categories discriminated well: the probability of a match was 92 percent overall for nonmovers, compared with only 76 percent overall for outmovers. The lowest match probability was 52 percent, for outmovers when the housing unit did not match; the highest was 95 percent, for nonmovers when the housing unit matched and the person had no imputed characteristics (Cantwell et al., 2001:Table 9).

6

The denominator for the percentage is P-sample nonmovers and outmovers who were confirmed Census Day residents or had unresolved residence status; confirmed non-Census Day residents were dropped from the P-sample at this point.

Enumeration Status

The weighted percentage of E-sample cases with unresolved enumeration status was 2.6 percent, slightly higher than the comparable 2.3 percent for the 1990 PES. Most of the unresolved cases (89.4%) were nonmatches for which field follow-up could not resolve their status as a correct or erroneous enumeration; the remainder were matched cases for which field follow-up could not resolve their residence status, possible matches, and cases for which the location of the housing unit was not clear. After imputation, the percentage of correct enumerations dropped slightly, from 95.5 percent of resolved cases (correct and erroneous enumerations) to 95.3 percent of all cases, because the imputation procedure assigned lower correct enumeration probabilities to unresolved cases (76.2% overall). To impute a correct enumeration probability, the Census Bureau classified resolved and unresolved cases by match status group, whether the person had one or more imputed characteristics, and race (for some match status groups). The 12 match status groups discriminated well: for example, correct enumeration probabilities were very low for potentially fictitious people (6%) and people said to be living elsewhere on Census Day (23%); moderate for college- and military-age children in partially matched households (88%); and very high for cases resolved before follow-up (99%). The addition of race and whether the person had imputed characteristics did not capture much additional variability in the probability of correct enumeration (Cantwell et al., 2001:Table 10).

QUALITY OF MATCHING

Although the rates of unresolved match status and enumeration status were low, there remains a question about the accuracy of the classification of match and enumeration status for cases that were "resolved" before imputation. The accuracy of the matching and associated follow-up process is critical to dual-systems estimation (DSE). That accuracy is needed to distinguish the proportion of P-sample people who match a census record from the proportion who genuinely exist but were not enumerated in the census. If some of the nonmatched people should have been matched, or should have been removed from the P-sample because they were fictitious, were not residents at the P-sample address on Census Day, or for some other reason, then the estimated match rate will be too low and the estimate of the DSE will be too high. That accuracy is likewise needed to distinguish the proportion of E-sample people who were correctly counted (including matches and correct nonmatches) from the proportion who were enumerated erroneously because they were duplicates, fictitious, or erroneous for some other reason. If some cases that were classified as correct (nonmatched) enumerations were in fact erroneous, then the estimated correct enumeration rate will be too high and the estimate of the DSE will be too high (see the simplified expression for the DSE below).

It is not possible to assess the reliability of assignment of the final match codes until the Census Bureau publishes results from evaluation studies that involve rematching and verifying samples of A.C.E. records (see Executive Steering Committee on A.C.E. Policy, 2001b). The Bureau is also looking at possible errors in assigning correct or erroneous enumeration status to E-sample cases due to the operation of the targeted extended search and the treatment of group quarters residents who should have been excluded from the sample.
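The direction of these biases follows from the structure of the dual-systems estimator. In simplified form (suppressing post-stratum subscripts and the PES-C treatment of movers), with C denoting the census count, II the number of census people with insufficient information to be included in the E-sample, CE-hat the estimated correct enumeration rate from the E-sample, and M-hat the estimated match rate from the P-sample, the estimator can be written as:

    \widehat{\mathrm{DSE}} = (C - \mathrm{II}) \cdot \frac{\widehat{\mathrm{CE}}}{\widehat{M}},
    \qquad \text{coverage correction factor} = \frac{\widehat{\mathrm{DSE}}}{C}

An understated match rate or an overstated correct enumeration rate therefore inflates the DSE and the estimated net undercount, which is why the accuracy of the final match and enumeration status codes matters so much.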

Rematching studies for 1990 found some degree of clerical matching error, although analysts disagreed on its importance (National Research Council, 1999b:70–75). The results for 2000 are not yet known. The Census Bureau believed that the accuracy of matching would improve in 2000, compared with 1990, through greater computerization of the process and other steps. The results of quality assurance operations during the matching and follow-up interviewing indicated that relatively little error was identified in assigning match and enumeration status codes (see Childers et al., 2001). Nonetheless, the degree of matching error remains to be established. As indirect indicators of the quality of the matching, we examined specific match codes and how they related to the various steps in the process.

Extent of Checking Required to Confirm Final Match Code

We looked first at final match codes and asked what proportion of the cases in each category were confirmed at the conclusion of computer matching, at the conclusion of clerical matching, or not until after field follow-up.

Confirmed Matches

Table 7-4 shows that 80.3 percent of final confirmed P-sample matches were designated as a match by the computer and did not require follow-up in the field (last row, column 1). Another 18 percent of final confirmed matches were declared a match by clerks, technicians, or analysts and did not require a field check (last row, columns 2, 3, and 4). Only 1 percent of final confirmed matches were declared a match only after confirmation of their Census Day residence status in the field (column 5), and only 0.8 percent were declared a match only after confirmation of both their match and residence status in the field (column 6). Similar results were obtained for the E-sample (not shown).

By domain and tenure group, the percentage of final confirmed matches that were declared a match by computer varied from 65 percent to 84 percent, perhaps because of difficulties with names. However, there was relatively little variation in the percentage of final confirmed matches that did not require confirmation of residence or match status in the field (97.0% to 99.2%). Given the standards for computer and clerical matching, these results suggest that one can have a high degree of confidence in the designation of a matched case.7

7

The cutoff probability score for a computer match was set high enough, based on previous research, that false computer matches would almost never occur.

TABLE 7-4 Percentage of 2000 A.C.E. P-Sample Matches to Census Enumerations, by Source of Final Match Code Assignment, Race/Ethnicity Domain, and Housing Tenure (weighted)

                             No Field Check Needed                                Field Check Needed for
                             Computer   Computer P,   Computer NM,   Other        Residence   Match and    Percent Total
Domain and Tenure Group      M          Clerk M       Clerk M        Final M      Status      Residence    Matches
                             (1)        (2)           (3)            (4)          (5)         (6)          (7)
American Indian/Alaska Native on Reservation
  Owner                      80.7       10.4          6.4            1.7          0.4         0.4          0.1
  Renter                     82.1       12.5          2.9            1.8          0.4         0.4          0.1
American Indian/Alaska Native off Reservation
  Owner                      78.8       10.4          8.4            1.1          0.8         0.5          0.3
  Renter                     77.8       9.9           8.3            1.8          0.7         1.6          0.2
Hispanic Origin
  Owner                      76.4       12.5          8.1            1.1          1.1         0.9          5.8
  Renter                     68.1       14.6          12.5           1.8          1.2         1.7          5.9
Black (Non-Hispanic)
  Owner                      77.7       12.1          6.7            1.4          1.1         1.0          5.7
  Renter                     71.2       12.9          10.9           2.0          1.4         1.5          5.1
Native Hawaiian/Pacific Islander
  Owner                      76.9       12.1          6.4            2.4          0.8         1.4          0.1
  Renter                     64.6       15.2          14.9           3.6          0.6         1.0          0.1
Asian
  Owner                      72.9       14.3          8.9            1.4          1.0         1.4          2.1
  Renter                     66.7       16.1          12.4           1.7          1.2         1.8          1.2
White and Other Race (Non-Hispanic)
  Owner                      84.2       8.2           5.7            0.6          0.9         0.4          57.5
  Renter                     77.9       10.3          8.6            1.0          1.2         1.0          15.8
Total                        80.3       9.8           7.2            0.9          1.0         0.8          100.0

NOTES: Columns (1)–(6) in each row add to 100 percent; column (7), reading down, adds to 100 percent. M: match; P: possible match; NM: nonmatch (confirmed Census Day resident).
SOURCE: Tabulations by panel staff of P-sample cases that went through matching, from U.S. Census Bureau, P-Sample Person Dual-System Estimation Output File, February 16, 2001. Tabulations weighted using TESFINWT; exclude TES-eligible people not in TES sample block clusters, who have zero TESFINWT.
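Footnote 7 refers to the score cutoff used in computer matching: each candidate P-sample/E-sample pair receives an agreement score, pairs above a conservative cutoff are declared computer matches, intermediate scores are referred to clerks as possible matches, and low scores become computer nonmatches (the "Computer M," "Computer P," and "Computer NM" categories of Table 7-4). The sketch below illustrates a two-threshold rule of this kind; the field weights and cutoffs are invented for illustration and are not the Census Bureau's actual matching parameters.

    FIELD_WEIGHTS = {"name": 0.5, "age": 0.2, "sex": 0.1, "tenure": 0.2}

    def agreement_score(p_rec, e_rec):
        """Toy score: weighted share of fields on which the two records agree."""
        return sum(w for field, w in FIELD_WEIGHTS.items()
                   if p_rec.get(field) == e_rec.get(field))

    def computer_match_code(score, match_cutoff=0.95, possible_cutoff=0.70):
        """Two-threshold decision rule: only very high scores are declared matches."""
        if score >= match_cutoff:
            return "M"    # computer match (column 1 of Table 7-4)
        if score >= possible_cutoff:
            return "P"    # possible match, referred to clerical review
        return "NM"       # computer nonmatch, referred to clerical review

    # Example: agreement on name, age, and sex but not tenure gives a score of 0.8,
    # which falls between the cutoffs and is classified "P".
    p = {"name": "SMITH", "age": 34, "sex": "F", "tenure": "Owner"}
    e = {"name": "SMITH", "age": 34, "sex": "F", "tenure": "Renter"}
    print(computer_match_code(agreement_score(p, e)))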

Confirmed P-Sample Nonmatches

Assignment of confirmed nonmatch status was always based on a field check for certain types of P-sample cases (see Appendix C), amounting to 50.4 percent of the total confirmed P-sample nonmatches. There was relatively little variation in this percentage for most race/ethnicity domain and tenure groups (data not shown), although 69 percent of final confirmed nonmatches for American Indians and Alaska Natives were not declared a nonmatch until after being checked in the field, compared with only 47 percent for non-Hispanic whites and other races. How many nonmatches were correctly assigned, and how many should instead have been identified as matches or as cases to be dropped from the P-sample (e.g., fictitious cases or people residing elsewhere on Census Day), will not be known until the Census Bureau completes its studies of matching error.

Confirmed E-Sample Correct (Nonmatched) or Erroneous Enumerations

On the E-sample side, assignment of a final code as a correct (nonmatched) enumeration was always based on a field check. Of final erroneous enumerations (4% of the total E-sample), 35 percent were declared erroneous on the basis of a field check, while 65 percent were identified by clerks as duplicates or as lacking enough reported data and did not require confirmation in the field.

Unresolved Cases

As noted above, the E-sample had a higher percentage of cases that could not be resolved after field checking than did the P-sample: 2.6 percent and 2.2 percent, respectively. Moreover, 52.2 percent of the unresolved P-sample cases were those coded by the computer or clerks as not having enough reported data for matching. These cases were not field checked but had their residence or match status imputed.

Extent of Reassignment of Match Codes

Another indicator of matching quality is how often one stage of matching changed the code assigned at an earlier stage. Table 7-5 shows that such changes happened quite infrequently. Thus (see Panel A), 99.9 percent and 99.7 percent of confirmed matches assigned by the computer for the P-sample and the E-sample, respectively, remained as such in the final coding. Also, 93 percent of computer possible matches in both the P-sample and the E-sample were confirmed as such without the need for field follow-up; another 5.5–5.7 percent were confirmed as a match (or, in the case of the E-sample, as a nonmatched correct enumeration) in the field. Only 1.3–1.5

TABLE 7-6 2000 A.C.E. Matched P-Sample and E-Sample Cases: Consistency of Race/Ethnicity Post-Stratification Domain (unweighted)

                                          E-Sample Domain
P-Sample Domain                           1        2       3        4        5       6        7         Total     % Inconsistent
American Indian or Alaska Native
  on Reservations (Domain 1)              11,009   0       34       12       0       0        118       11,173    1.5
American Indian or Alaska Native
  off Reservations (Domain 2)             0        2,223   59       104      0       30       793       3,209     30.7
Hispanic Origin (Domain 3)                44       136     67,985   610      42      267      4,004     73,088    7.0
Non-Hispanic Black (Domain 4)             10       119     496      65,679   6       118      1,423     67,851    3.2
Native Hawaiian or Pacific
  Islander (Domain 5)                     0        3       31       19       1,671   204      177       2,105     20.6
Asian (Domain 6)                          1        31      107      102      143     19,679   1,062     21,125    6.8
Non-Hispanic White or Other
  Race (Domain 7)                         107      944     5,041    2,589    183     2,105    360,125   371,094   3.0
E-Sample Total                            11,171   3,456   73,753   69,115   2,045   22,403   367,702   549,645
E-Sample % Inconsistent                   1.5      35.7    7.8      5.0      18.3    12.2     2.1                 3.9

NOTE: See Table 6-2 for definitions of domains.
SOURCE: Farber (2001a:Table A-3).
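The "% Inconsistent" entries in Table 7-6 are the share of matched cases that fall off the diagonal of the cross-tabulation, computed along a row for the P-sample classification and down a column for the E-sample classification. A quick check against the Domain 2 (American Indian or Alaska Native off reservations) figures:

    # Domain 2 entries from Table 7-6
    row_total = 3_209     # P-sample cases classified in Domain 2
    col_total = 3_456     # E-sample cases classified in Domain 2
    diagonal = 2_223      # cases classified in Domain 2 on both sides

    p_inconsistent = 100 * (row_total - diagonal) / row_total   # 30.7 percent
    e_inconsistent = 100 * (col_total - diagonal) / col_total   # 35.7 percent
    print(round(p_inconsistent, 1), round(e_inconsistent, 1))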

been had there been no inconsistency. However, the coverage correction factor would have been lower yet for American Indians and Alaska Natives off reservations if they had been merged with the non-Hispanic white and other races stratum. The reverse flow of American Indians and Alaska Natives identifying themselves as non-Hispanic whites or other races had virtually no effect on the coverage correction factor for the latter group, given its much larger proportion of the population.

VARIANCE ESTIMATES

Overall, the A.C.E. was expected to have smaller variances, due to sampling error and other sources, than the 1990 PES, and that expectation was borne out. The coefficient of variation for the estimated coverage correction factor for the total population was reduced from 0.2 percent in 1990 to 0.14 percent in 2000 (a reduction of 30%). The coefficients of variation for the coverage correction factors for Hispanics and non-Hispanic blacks were reduced from 0.82 percent and 0.55 percent, respectively, to 0.38 percent and 0.40 percent (Davis, 2001:Tables E-1, F-1). However, the coefficients of variation for coverage correction factors were as high as 6 percent for particular post-strata, which translates into a very large confidence interval around the estimate of the net undercount.10

10

The variance estimates developed by the Census Bureau likely underestimate the true variance, but the extent of underestimation is not known. The variance estimation excludes some minor sources of error (specifically, the large block subsampling and the P-sample noninterview adjustment). It also excludes most sources of nonsampling error (see Appendix C).

The overall coefficient of variation was expected to be reduced by about 25 percent because of the larger sample size of the A.C.E., almost double that of the 1990 PES. In addition, better measures of population size were available during the selection of the A.C.E. block clusters than during the selection of PES clusters, and the A.C.E. sampling weights were less variable than the PES sampling weights. The 2000 targeted extended search (TES) was also much better targeted, and thereby more efficient, than the similar operation in 1990. Overall, TES was expected to reduce the variance of the DSE, although the 2000 TES also contributed somewhat to an increase in sampling error.

Looking at the size of and variation in weights, Table 7-7 shows the changes in the P-sample weights, from the initial weights that accounted for differential sampling probabilities, to the intermediate weights that included the household noninterview adjustments, to the final weights that accounted for TES sampling. (The table also shows the distribution of E-sample initial and final weights.) At the outset, 90 percent of the initial P-sample weights were between 48 and 654, and the lowest and highest weights were 9 and 1,288; the distribution did not differ by mover status. After the household noninterview adjustment for Census Day, 90 percent of the weights were between 49 and 674, and the lowest and highest weights were 9 and 1,619. After the TES adjustment, 90 percent of the final weights for confirmed Census Day residents were between 50 and 678, and the lowest and highest weights were 9 and 5,858 (the variation in weights was less for outmovers than for nonmovers). For inmovers, there was relatively little difference between the initial sampling weights and the final weights adjusted for household noninterview as of the P-sample interview day.

TABLE 7-7 Distribution of Initial, Intermediate, and Final Weights, 2000 A.C.E. P-Sample and E-Sample

                                       Number of                    Percentile of Weight Distribution
Sample and Mover Status                Non-Zeros    0    1    5    10   25    50    75    90    95    99    100
P-Sample
  Initial Weight (a)
    Total                              721,734      9    21   48   75   249   352   574   647   654   661   1,288
    Nonmovers                          631,914      9    21   48   76   253   366   575   647   654   661   1,288
    Outmovers                          24,158       9    21   48   69   226   348   541   647   654   661   1,288
    Inmovers                           36,623       9    21   47   67   212   343   530   647   654   661   1,288
  Intermediate Weight (b)
    Total with Census Day Weight       712,442      9    22   49   78   253   379   577   654   674   733   1,619
    Total with Interview Day Weight    721,426      9    21   48   76   249   366   576   651   660   705   1,701
  Final Weight (c)
    Census Day Weight
      Total                            640,795      9    22   50   83   273   382   581   654   678   765   5,858
      Nonmovers                        617,390      9    22   50   83   274   382   581   654   678   762   5,858
      Outmovers                        23,405       9    23   50   77   240   363   577   655   682   798   3,847
      Inmovers                         36,623       9    21   47   67   214   345   530   651   656   705   1,288
E-Sample
  Initial Weight (d)                   712,900      9    21   39   55   212   349   564   647   654   661   2,801
  Final Weight (e)                     704,602      9    21   39   56   217   349   567   647   654   700   4,009

(a) P-sample initial weight, PWGHT, reflects sampling through large block subsampling; total includes removed cases.
(b) P-sample intermediate weight, NIWGT, reflects the household noninterview adjustment for Census Day; NIWGTI reflects the household noninterview adjustment for A.C.E. interview day.
(c) P-sample final weight, TESFINWT, for confirmed Census Day residents, total, nonmovers, and outmovers (reflects targeted extended search sampling); NIWGTI for inmovers.
(d) E-sample initial weight, EWGHT, reflects sampling through large block subsampling.
(e) E-sample final weight, TESFINWT, reflects targeted extended search sampling.
SOURCE: Tabulations by panel staff of U.S. Census Bureau, P-Sample and E-Sample Person Dual-System Estimation Output Files, February 16, 2001.

While the variations in final weights for the A.C.E. P-sample (and E-sample) were not small, they were considerably less than the variations in final weights for the 1990 PES. In 1990, some P-sample weights exceeded 20,000, and 28 percent of the weights exceeded 700, compared with only 5 percent in the A.C.E.
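One common way to summarize why variation in weights matters for variance is Kish's approximation, under which unequal weighting inflates the variance of a weighted estimate by a factor of roughly one plus the relative variance of the weights. The snippet below computes that factor; it is a standard rule of thumb offered here for intuition, not the variance estimator the Census Bureau used for the A.C.E.

    import numpy as np

    def kish_design_effect(weights):
        """Approximate variance inflation from unequal weights:
        deff = 1 + CV^2(w) = n * sum(w^2) / (sum(w))^2."""
        w = np.asarray(weights, dtype=float)
        return len(w) * np.sum(w ** 2) / np.sum(w) ** 2

    # Equal weights imply no inflation; highly unequal weights (compare the
    # extremes in Table 7-7) imply a substantially larger factor.
    print(kish_design_effect([350, 350, 350, 350]))   # 1.0
    print(kish_design_effect([9, 250, 650, 5858]))    # about 3.0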

FINAL MATCH CODES AND RATES

Having examined individual features of the A.C.E., we next looked at the distribution of final match codes and rates for the P-sample and E-sample. We wanted to get an overall sense of the reasonableness of the results for key population groups and in comparison with 1990.

Final Match and Enumeration Status

P-Sample Match Codes

The distribution of final match codes for the P-sample was 89.5 percent confirmed match, 7.4 percent confirmed nonmatch, 2.2 percent match or residence status unresolved, and 0.9 percent not a Census Day resident or removed for another reason (e.g., a fictitious or duplicate P-sample case). Table 7-8 shows that the percentage of confirmed matches by domain and tenure varied from 80 percent for black renters and Native Hawaiian and Pacific Islander renters to 93 percent for non-Hispanic white and other race owners; conversely, the percentage of confirmed nonmatches varied from 15.8 percent for Native Hawaiian and Pacific Islander renters to 4.9 percent for non-Hispanic white and other race owners. Groups with higher percentages of nonmatched cases also tended to have higher percentages of unresolved cases, which varied from 1 percent for Native Hawaiian and Pacific Islander owners to 4.7 percent for black renters.

After imputation of residence and match status, the overall P-sample match rate (matches divided by matches plus nonmatches) was 91.6 percent. The match rate ranged from 82.4 percent for Native Hawaiian and Pacific Islander renters to 94.6 percent for non-Hispanic white and other race owners.

TABLE 7-8 2000 A.C.E. P-Sample Final Match Codes, and A.C.E. and PES Match Rates, by Race/Ethnicity Domain and Housing Tenure (weighted)

                               Percent Distribution of 2000 P-Sample Final Match Codes    P-Sample Match Rate (a)
Domain and Tenure Group        Match    Nonmatch   Unresolved   Removed                   2000 A.C.E.   1990 PES
American Indian/Alaska Native on Reservation
  Owner                        82.9     13.2       1.6          2.4                       85.43         78.13 (b)
  Renter                       85.6     11.5       1.6          1.3                       87.08
American Indian/Alaska Native off Reservation
  Owner                        88.5     9.2        1.4          0.9                       90.19         —
  Renter                       81.2     12.6       4.3          1.9                       84.65         —
Hispanic Origin
  Owner                        89.0     8.3        1.7          1.0                       90.79         92.81
  Renter                       81.7     13.2       3.9          1.2                       84.48         82.45
Black (Non-Hispanic)
  Owner                        87.9     8.8        2.3          1.1                       90.14         89.65
  Renter                       80.4     13.7       4.7          1.2                       83.67         82.28
Native Hawaiian/Pacific Islander
  Owner                        85.8     12.2       1.0          1.0                       87.36         —
  Renter                       80.3     15.8       2.7          1.2                       82.39         —
Asian (Non-Hispanic) (c)
  Owner                        90.1     6.6        2.3          1.0                       92.34         93.71
  Renter                       84.4     10.8       3.7          1.1                       87.33         84.36
White and Other Races (Non-Hispanic)
  Owner                        93.0     4.9        1.4          0.8                       94.60         95.64
  Renter                       85.5     9.8        3.7          1.0                       88.37         88.62
Total                          89.5     7.4        2.2          0.9                       91.59         92.22

NOTES: The first four columns in each row add to 100 percent; —, not estimated.
(a) Match rates (matches divided by the sum of matches and nonmatches) are after imputation for unresolved residence and match status for the A.C.E. and after imputation of unresolved match status for the PES.
(b) Total; not available by tenure.
(c) 1990 PES match rates include Pacific Islanders.
SOURCES: A.C.E. match codes are from tabulations by panel staff of P-sample cases that went through the matching process, weighted using TESFINWT and excluding TES-eligible people not in TES sample block clusters (who have zero TESFINWT), from U.S. Census Bureau, P-Sample Person Dual-System Estimation Output File, February 16, 2001; A.C.E. and PES match rates are from Davis (2001:Tables E-2, F-1, F-2).

E-Sample Match Codes

The distribution of final match codes for the E-sample was 81.7 percent matches, 11.6 percent other correct (nonmatched) enumerations, 4.0 percent erroneous enumerations, and 2.6 percent unresolved. Table 7-9 shows that the percentage of confirmed correct enumerations (the sum of matches and other correct enumerations in the first two columns) by domain and tenure ranged from 87.2 percent for black renters to 95.8 percent for non-Hispanic white and other race owners. The percentage of erroneous enumerations ranged from 3 percent for non-Hispanic white and other race owners and American Indian/Alaska Native on reservation renters to 7 percent for black renters, and the percentage of unresolved cases ranged from 1.2 percent for non-Hispanic white and other race owners to about 6 percent for Hispanic and black renters.

After imputation for enumeration status, the overall E-sample correct enumeration rate (matches and other correct enumerations divided by the sum of matches, other correct enumerations, and erroneous enumerations) was 95.3 percent. The correct enumeration rate ranged from 91.2 percent for non-Hispanic black renters to 96.7 percent for non-Hispanic white and other race owners.

TABLE 7-9 2000 A.C.E. E-Sample Final Match Codes, and 2000 A.C.E. and 1990 PES Correct Enumeration Rates, by Race/Ethnicity Domain and Housing Tenure (weighted)

                               Percent Distribution of 2000 E-Sample Final Match Codes         Correct Enumeration Rate (a)
Domain and Tenure Group        Match    Other Correct    Erroneous      Unresolved             2000 A.C.E.   1990 PES
                                        Enumeration      Enumeration
American Indian/Alaska Native on Reservation
  Owner                        77.1     17.1             3.7            2.1                    95.65         91.54 (b)
  Renter                       78.3     14.8             3.0            4.0                    96.15
American Indian/Alaska Native off Reservation
  Owner                        81.9     11.7             4.9            1.5                    94.56         —
  Renter                       74.1     15.2             5.0            5.7                    93.16         —
Hispanic Origin
  Owner                        83.2     11.7             3.3            1.9                    96.25         95.56
  Renter                       71.7     17.0             5.3            6.0                    92.79         90.58
Black (Non-Hispanic)
  Owner                        80.3     12.7             5.2            1.7                    94.25         92.84
  Renter                       68.2     19.0             7.0            5.9                    91.16         89.19
Native Hawaiian/Pacific Islander
  Owner                        83.0     9.8              5.7            1.5                    93.79         —
  Renter                       72.7     16.6             6.1            4.6                    92.33         —
Asian (Non-Hispanic) (c)
  Owner                        83.3     11.3             3.8            1.6                    95.84         93.13
  Renter                       72.1     15.3             5.9            6.7                    92.45         92.22
White and Other Races (Non-Hispanic)
  Owner                        86.6     9.2              3.0            1.2                    96.70         95.84
  Renter                       74.9     14.3             5.6            5.2                    93.20         92.61
Total                          81.7     11.6             4.0            2.6                    95.28         94.27

NOTES: The first four columns in each row add to 100 percent; —, not estimated.
(a) Correct enumeration (CE) rates (matches and other correct enumerations divided by the sum of matches, other correct enumerations, and erroneous enumerations) are after imputation for unresolved enumeration status.
(b) Total; not available by tenure.
(c) 1990 correct enumeration rates include Pacific Islanders.
SOURCES: A.C.E. match codes are from tabulations by panel staff of E-sample cases, weighted using TESFINWT and excluding TES-eligible people not in TES sample block clusters (who have zero TESFINWT), from U.S. Census Bureau, E-Sample Person Dual-System Estimation Output File, February 16, 2001; A.C.E. and PES correct enumeration rates are from Davis (2001:Tables E-2, F-1, F-2).

Comparisons with 1990

The P-sample match rates are similar for the 2000 A.C.E. and the 1990 PES for the total population and for many race/ethnicity domain and housing tenure groups (see Table 7-8). For the total population, the A.C.E. match rate is 0.6 percentage points lower than the PES rate; for population groups, the A.C.E. match rates are lower than the PES rates for some groups and higher for others. The E-sample correct enumeration rates are also similar between the 2000 A.C.E. and the 1990 PES (see Table 7-9), although there is a general pattern for the A.C.E. correct enumeration rates to be somewhat higher than the corresponding PES rates.

On balance, these patterns mean that the A.C.E. correction ratios (calculated by dividing the correct enumeration rate by the match rate) are higher than the corresponding PES correction ratios. If other things were equal, these results would imply that the A.C.E. measured higher net undercount rates than the PES, but the reverse is true. We explore in Chapter 8 the role of people reinstated in the census (late additions) and people requiring imputation to complete their census records—who could not be included in the A.C.E. process—in explaining the reductions in net undercount from 1990 levels that were measured in the A.C.E.

GROSS ERRORS

Our discussion has focused on net undercount. Some analysts are also interested in the level of gross errors in the census—that is, total omissions and total erroneous enumerations. The A.C.E. is designed to measure net undercount (or net overcount). It measures gross errors, but in ways that can be misleading. Many errors that are identified by the A.C.E. involve the balancing of a nonmatch on the P-sample side against an erroneous enumeration on the E-sample side—for example, when an E-sample case that should match is misgeocoded. These kinds of balancing errors are not errors for such levels of geography as counties, cities, and even census tracts, although they do affect error at the block cluster level.

Also, the classification of the type of gross error in the A.C.E. is not necessarily clean. For example, the A.C.E. will not classify an enumeration of a "snowbird" at the person's winter residence as duplicating an enumeration of the same person at his or her summer residence, because there is no nationwide search. The A.C.E. will likely classify duplicate snowbird enumerations as erroneous in the aggregate, but it will not label them as duplicates.

It is important to take note of gross errors, however, because higher or lower net undercount does not relate directly to the level of gross errors: there can be a zero net undercount together with high rates of gross omissions and gross erroneous enumerations. Hence, for completeness, Table 7-10 shows gross errors in the 2000 A.C.E. and the 1990 PES. The total gross errors in the A.C.E. appear to be somewhat reduced in percentage terms from the gross errors in the PES. However, the increased numbers of people requiring imputation and late additions, who likely had higher-than-average error rates, cloud the issue, as these people were not part of the E-sample. Also, the sizable differences between the A.C.E. and the PES in the distribution of types of gross erroneous enumerations are puzzling. For example, the A.C.E. estimates proportionately fewer duplicate enumerations than the PES. The Census Bureau is currently studying these discrepancies, which could also be due to the higher numbers of people requiring imputation and late additions who were not included in the A.C.E. processing.

TABLE 7-10 Gross Omissions and Erroneous Enumerations, 2000 A.C.E. and 1990 PES

                                             Percent of Weighted E-Sample    Estimated Number of People (millions)
Erroneous Enumerations                       2000 A.C.E.    1990 PES         2000       1990
Total                                        4.7            5.8              12.5       16.3
(1) Insufficient Information for Matching    1.8            1.2              4.8        3.4
(2) Duplicates                               0.7            1.6              1.9        4.5
(3) Fictitious                               0.3            0.2              0.7        0.6
(4) Geocoding Error                          0.2            0.3              0.6        0.8
(5) Other Residence                          1.0            2.2              2.7        6.2
(6) Imputed                                  0.6            0.3              1.8        0.8

                                                        2000 A.C.E. (counts in millions)    1990 PES (counts in millions)
Alternative Estimates of Gross Errors                   Erroneous         Omissions          Erroneous         Omissions
                                                        Enumerations (a)                     Enumerations
(1) Including All Types of Erroneous
    Enumerations (EEs)                                  12.5              15.8               16.3              20.3
(2) Excluding EEs with Insufficient Information
    to Match and Imputed EEs (EE types (1) and
    (6) above)                                          5.9               9.2                11.2              15.2
(3) Excluding EEs excluded in row (2) and also
    "geocoding errors" and "other residence"
    (EE types (4) and (5) above) (a)                    3.1               6.4                4.4               8.4
(4) Row (3) plus an allowance for 50 percent
    duplication among late additions                    4.3               7.6                N.A.              N.A.

NOTES: People with insufficient information who were excluded from the E-sample at the outset are not included in any of these numbers (EE category (1) above comprises additional cases found to lack enough reported data for matching). Gross omissions are calculated by adding net omissions (3.3 million people in 2000; 4 million people in 1990) to gross erroneous enumerations.
(a) The alternative estimates of erroneous enumerations in 2000 are not consistent with the information on types of erroneous enumerations above. The discrepancy is being investigated with the Census Bureau.
SOURCE: Adapted from Anderson and Fienberg (2001a:Tables 2, 3).

CONCLUSIONS

On the basis of the evidence now available, we conclude that the A.C.E. was conducted according to well-specified and carefully controlled procedures. We also conclude that it achieved a high degree of quality in such areas as sample design, interviewing, and imputation for missing data.

There are several outstanding questions that must be addressed before it will be possible to render a final verdict on the quality of the A.C.E. procedures (see Executive Steering Committee on A.C.E. Policy, 2001b). The major outstanding questions relate to those aspects of the 2000 A.C.E. that differed markedly from the 1990 PES and were relatively untested. First, there is concern that the targeted extended search may not have been balanced (in that the search areas for P-sample and E-sample cases may not have been equivalent) and that the imbalance could have led to incorrect treatment of nonmatched E-sample cases. There is a related concern that balancing error may have occurred because some E-sample cases were coded as correct when they were in fact outside the block cluster, or because not all correct enumerations in the block cluster were searched for a match to the P-sample. Second, there is concern that group quarters enumerations, such as those of college students, may not have been handled correctly in the A.C.E. Group quarters residents were supposed to be excluded from the A.C.E.; error would occur if, say, enumerations of college students at their parental homes were not classified as erroneous. Third, studies of the effect of the PES-C procedure on the estimates of match rates for movers and, more generally, estimates of matching error are not yet available. Finally, additional evaluations are needed to determine whether the post-stratification was the most efficient possible and to assess the sensitivity of the A.C.E. results to error from particular sources, such as matching, imputation, and the PES-C procedure used for movers.

Overall, the 2000 A.C.E. showed patterns of net undercount similar to, but less pronounced than, those of the 1990 PES. Given that P-sample match rates and E-sample erroneous enumeration rates were similar between the A.C.E. and the 1990 PES, the key question at this time is why the A.C.E. showed a reduced net undercount, overall and for such groups as Hispanics, non-Hispanic blacks, children, and renters. Because the only other component of the DSE equation is the number of census people with insufficient information to include in the E-sample (IIs), our attempts to resolve the undercount puzzle centered on that component of the census results. In Chapter 8, we analyze distributions of people requiring imputation and people reinstated in the census (late additions) and determine that people requiring imputation largely explain the reduced net undercount in 2000 for historically less well-counted groups.
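As a purely illustrative calculation of how these components interact, the sketch below plugs the overall 2000 rates from Tables 7-8 and 7-9 into the simplified DSE expression given earlier under "Quality of Matching." The census count and the share of IIs are placeholders, not data; the point is only that, with the match and correct enumeration rates held fixed, a larger share of IIs pulls the DSE, and hence the measured net undercount, downward.

    def dse(census_count, insufficient_info, correct_enum_rate, match_rate):
        """Simplified dual-systems estimate and implied net undercount rate
        (net undercount expressed as a percentage of the estimated true population)."""
        estimate = (census_count - insufficient_info) * correct_enum_rate / match_rate
        net_undercount = 100 * (estimate - census_count) / estimate
        return estimate, net_undercount

    MATCH_RATE = 0.9159          # Table 7-8, total
    CE_RATE = 0.9528             # Table 7-9, total
    print(round(CE_RATE / MATCH_RATE, 3))      # correction ratio, about 1.040

    C = 1_000_000                # placeholder census count, NOT actual data
    for ii_share in (0.01, 0.02, 0.03):
        est, undercount = dse(C, ii_share * C, CE_RATE, MATCH_RATE)
        print(ii_share, round(est), round(undercount, 2))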
