The 2000 Census: Interim Assessment
7
Accuracy and Coverage Evaluation: Assessment
This chapter presents the panel’s assessment of the Accuracy and Coverage Evaluation (A.C.E.) Program because the A.C.E. is crucial to any assessment of the census itself. We consider nine separate aspects of the A.C.E.:
conduct and timing;
household noninterviews in the P-sample;
imputation for missing characteristics and unresolved residence, match, and enumeration status;
quality of matching;
the targeted extended search;
post-stratification;
variance estimates;
final match codes and rates; and
gross errors.
We end this chapter with our summary assessment of the A.C.E.
CONDUCT AND TIMING
Overall, the A.C.E. appears to have been well executed. Although the sample size was twice as large as that fielded in 1990, the A.C.E. was carried out on schedule and with only minor problems that necessitated rearrangement or modification of operations after they had been specified.1 Some procedures, such as telephone interviewing, proved more useful than had been expected. All processes, from sampling through estimation, were carried out according to well-documented specifications, with quality control procedures (e.g., reviews of the work of clerical matchers and field staff) implemented at appropriate junctures.
1
Mostly, such modifications involved accommodation to changes in the Master Address File (MAF) that occurred in the course of the census. For example, the targeted extended search (TES) procedures had to be modified to handle deletions from and additions to the MAF that were made after the determination of the TES housing unit inventory (Navarro and Olson, 2001:11).
HOUSEHOLD NONINTERVIEWS IN THE P-SAMPLE
Because the quantity being estimated (the net undercount of the population) is very small relative to the total population (1–2%), it is essential that the P-sample survey meet high standards for completeness of reporting. A high rate of household noninterviews, requiring extensive adjustments to the sampling weights, would be detrimental to the dual-systems estimation that is the key to the A.C.E. A high rate would not only increase variance but also probably introduce bias, because nonresponding households are likely to differ from responding households in systematic ways that are important for estimation.
Interview/Noninterview Rates
Overall, the A.C.E. obtained interviews from 98.9 percent of households that were occupied on interview day. This figure compares favorably with the 98.4 percent interview rate for the 1990 Post-Enumeration Survey (PES).2 However, the percentage of occupied households as of Census Day that were successfully interviewed in A.C.E. was somewhat lower—97 percent, meaning that a weighting adjustment had to account for the remaining 3 percent of noninterviewed households.
The lower interview rate for Census Day households is due largely to the fact that households that had been occupied entirely by outmovers at the time of the census were harder to interview than other households. This result is not surprising because the new occupants of such households may know nothing of the people who lived there before, and it may not always be possible to interview a knowledgeable neighbor or landlord. The interview rate for outmover households was 81.4 percent. Such households comprised 4 percent of Census Day occupied households in the P-sample.
Noninterview Weighting Adjustments
Two weighting adjustments were calculated so that interviewed households would represent all households that should have been interviewed: one for the A.C.E. interview day and the other for Census Day. Each of the two weighting adjustments was calculated separately for households by type (single-family unit, apartment, other) within each individual block cluster. Mover status was not a factor for reweighting.
2
These percentages are unweighted; they are about the same as weighted percentages. Weighted percentages are not available for 1990.
For Census Day, what could have been a relatively large noninterview adjustment for outmover households in a block cluster was spread over all interviewed Census Day households in the cluster for each of the three housing types. Consequently, adjustments to the weights for interviewed households were quite low, which had the benefit of minimizing the increase in the variance of A.C.E. estimates due to differences among weights: 52 percent of the weights were not adjusted at all because all occupied households in the adjustment cell were interviewed; for another 45 percent of households, the weighting adjustment was between 1.0 and 1.2 (Cantwell et al., 2001:Table 2; see also “Variance Estimates,” below).
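The cell-based adjustment described above can be illustrated with a minimal sketch. It assumes the usual weighting-class form of a noninterview adjustment (the weighted count of all occupied households in a cell divided by the weighted count of interviewed households in that cell); the text does not spell out the Bureau's exact formula, and the function name, field names, and data below are hypothetical.

```python
from collections import defaultdict

def noninterview_factors(households):
    """Return one adjustment factor per (block cluster, housing type) cell.

    Assumed form: sum of weights of all occupied households in the cell
    divided by the sum of weights of interviewed households in the cell.
    """
    total = defaultdict(float)
    responded = defaultdict(float)
    for h in households:
        cell = (h["cluster"], h["htype"])
        total[cell] += h["weight"]
        if h["interviewed"]:
            responded[cell] += h["weight"]
    # Cells with no interviewed households are skipped in this sketch.
    return {c: total[c] / responded[c] for c in total if responded[c] > 0}

# Hypothetical block cluster "A": ten single-family households (one not
# interviewed) and five apartments (all interviewed), each with weight 100.
sample = (
    [{"cluster": "A", "htype": "single", "weight": 100.0, "interviewed": i != 0}
     for i in range(10)]
    + [{"cluster": "A", "htype": "apt", "weight": 100.0, "interviewed": True}
       for _ in range(5)]
)
factors = noninterview_factors(sample)
```

In this made-up cluster the single-family cell gets a factor of 10/9 (about 1.11) and the fully interviewed apartment cell gets exactly 1.0. This mirrors the pattern reported above, in which about half of the weights needed no adjustment and most of the rest were adjusted by a factor between 1.0 and 1.2.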
MISSING AND UNRESOLVED DATA
Another important aspect of A.C.E. data quality is the extent of missing and unresolved data in the P-sample and the E-sample and the effectiveness of imputation procedures to supply values for missing and unresolved variables. Understanding the role of imputation necessitates understanding the designation of the E-sample and the treatment of certain cases in the matching.
As noted above, the E-sample excluded whole person imputations in the census, defined as people with only one short-form characteristic (which could be name). Matching was performed on the P-sample and E-sample, using only reported information. During the course of matching, it was determined that some cases lacked enough reported data for matching and follow-up when a more stringent criterion was applied than that used to exclude whole person imputations from the E-sample. Cases in the P-sample and E-sample lacking name and at least two other short-form characteristics could not be matched. Such cases were retained in both the E- and the P-samples; in the E-sample they were coded as erroneous enumerations and in the P-sample they were not yet assigned a final match status.
After all matching and follow-up had been completed, the next step was item imputation. Missing characteristics were imputed separately for each item in the P-sample records (including those records that lacked enough reported data for matching). Imputations for missing characteristics in the E-sample records (including those records that lacked name and at least two other short-form characteristics) were obtained from those on the census data file (see Appendix A). Then, match probabilities and Census Day residence probabilities were imputed for unresolved P-sample cases, including those that were set aside in the matching, and correct enumeration probabilities were imputed for unresolved E-sample cases. E-sample cases set aside in the matching were assigned a correct enumeration probability of zero.
TABLE 7-1 Missing Data Rates for Characteristics, 2000 A.C.E. and 1990 PES P-Sample and E-Sample (weighted)
Percentage of People with Imputed Characteristics
| Characteristic | 2000 A.C.E. P-Sample | 2000 A.C.E. E-Sample | 1990 PES P-Sample | 1990 PES E-Sample |
|---|---|---|---|---|
| Age | 2.4 | 2.9 | 0.7 | 2.4 |
| Sex | 1.7 | 0.2 | 0.5 | 1.0 |
| Race | 1.4 | 3.2 | 2.5 | 11.8 |
| Hispanic Origin | 2.3 | 3.4 | N.A. | N.A. |
| Housing Tenure | 1.9 | 3.6 | 2.3 | 2.5 |
| Any of Above | 5.4 | 10.4 | N.A. | N.A. |
NOTES: A.C.E. E-sample imputations were obtained from the imputations performed on the census records; PES E-sample imputations were performed specifically for the E-sample. A.C.E. E-sample “edits” (e.g., assigning age on the basis of the person’s date of birth, or assigning sex from first name) are not counted as imputations here. The base for the A.C.E. P-sample imputation rates includes nonmovers, inmovers, and outmovers, including people who were subsequently removed from the sample as nonresidents on Census Day. Excluded from the base for the A.C.E. P-sample and E-sample imputation rates are people eligible for the targeted extended search who were not selected for the targeted extended search sample and who were treated as noninterviews in the final weighting. N.A., not available.
SOURCE: Cantwell et al. (2001:Tables 3b, 3c).
Missing Characteristics
Extent
Overall, the extent of missing characteristics data in the P-sample and E-sample was low, ranging between 0.2 percent and 3.6 percent for the characteristics age, sex, race, Hispanic origin, and housing tenure. Missing data rates for most characteristics were somewhat higher for the E-sample than for the P-sample. Missing data rates for the 2000 A.C.E. showed no systematic difference (up or down) from the 1990 PES; see Table 7-1.
As would be expected, missing data rates in the P-sample were higher for proxy interviews, in which someone outside the household supplied information, than for interviews with household members; see Table 7-2. By mover status, missing data rates were much higher for outmovers than for nonmovers and inmovers, which is not surprising given that 73.3 percent of interviews for outmovers were obtained from proxies, compared with only 2.9 percent and 4.8 percent of proxy interviews for nonmovers and inmovers, respectively. Even “non-proxy” interviews for outmovers may have been from household members who did not know the outmover.
For the E-sample, one can distinguish mailed back returns from returns obtained by enumerators in nonresponse follow-up, although there is no information on proxy interviews for the latter. Table 7-3 shows that missing data rates were higher for some, but not all, characteristics when the return was obtained in nonresponse follow-up than when the return was mailed back by the household.
TABLE 7-2 Percentage of 2000 A.C.E. P-Sample People with Imputed Characteristics, by Proxy Interview and Mover Status (weighted)
Percentage of People with Imputed Characteristics
| Characteristic | Household Interview | Proxy Interview | Nonmover | Inmover | Outmover |
|---|---|---|---|---|---|
| Age | 2.1 | 7.9 | 2.3 | 2.3 | 6.0 |
| Sex | 1.5 | 4.2 | 1.7 | 0.4 | 3.4 |
| Race | 1.0 | 8.7 | 1.2 | 1.3 | 8.0 |
| Hispanic Origin | 1.8 | 11.0 | 2.1 | 0.8 | 9.0 |
| Housing Tenure | 1.7 | 5.2 | 1.9 | 0.4 | 2.4 |
| Any of Above | 4.4 | 21.9 | 5.0 | 3.7 | 17.4 |
| Percent of Total P-Sample | 94.3 | 5.7 | 91.7 | 4.8 | 3.4 |
NOTES: See notes to Table 7-1.
SOURCE: Cantwell et al. (2001:Table 3b).
Effects of Item Imputation
Because the overall rates of missing data were low, the imputation procedures had little effect on the distribution of individual characteristics (Cantwell et al., 2001:24–26). However, imputation could misclassify people by post-strata and contribute to inconsistent post-strata classification for matching P-sample and E-sample cases (see “Post-Stratification,” below). The reason is that the P-sample and E-sample imputations were performed using somewhat different procedures; also, imputation procedures for the P-sample were carried out separately for each characteristic.3
Unresolved Residence, Match, and Enumeration Status
Residence Status
The weighted percentage of all P-sample nonmover and outmover cases with unresolved Census Day residence status was 2.2 percent, of which 51.7 percent were cases lacking enough reported information for matching. The remaining 48.3 percent of unresolved residence cases were confirmed matches, confirmed nonmatches, and possible matches. After imputation, the percentage of cases estimated to be Census Day residents dropped slightly, from 98.2 percent of resolved cases to 97.9 percent of all cases because the imputation procedure assigned lower residence probabilities to unresolved cases (77.4 percent overall; this figure is a correction from the original number in Cantwell et al., 2001:Table 8).4
3
For example, tenure on the P-sample was imputed by using tenure from the previous household of the same type (e.g., single-family home) with tenure reported, while race and ethnicity were imputed when possible from the distribution of race and ethnicity of other household members or from the distribution of race and ethnicity of the previous household with these characteristics reported (see Cantwell et al., 2001).
TABLE 7-3 Percentage of 2000 A.C.E. E-Sample People with Imputed or Edited Characteristics, by Type of Return (weighted)
Percentage of People with Imputed or Edited Characteristics
| Characteristic | Mail Return | Enumerator Return |
|---|---|---|
| Age: Imputed | 1.1 | 7.0 |
| Age: Edited | 1.2 | 1.9 |
| Sex: Imputed | 0.1 | 0.4 |
| Sex: Edited | 0.9 | 1.1 |
| Race: Imputed | 3.2 | 3.2 |
| Race: Edited | 0.0 | 0.0 |
| Hispanic Origin: Imputed | 3.5 | 3.0 |
| Hispanic Origin: Edited | 0.3 | 0.4 |
| Housing Tenure: Imputed | 2.2 | 6.8 |
| Housing Tenure: Edited | 0.5 | 0.8 |
| Any of Above: Imputed | 8.5 | 14.7 |
| Any of Above: Imputed or edited or both | 10.9 | 18.1 |
| Percent of Total E-Sample | 69.3 | 28.0 |
NOTES: Mail returns are those obtained before the April 18, 2000, cutoff to begin nonresponse follow-up (NRFU). Enumerator returns are those obtained during NRFU. The table excludes 2.7 percent of total E-sample (e.g., list/enumerate, rural update/enumerate, urban update/enumerate, late mail returns).
SOURCE: Tabulations by panel staff of U.S. Census Bureau, E-Sample Person Dual-System Estimation Output File, February 16, 2001; tabulations weighted using TESFINWT (see notes to Table 7-7).
To impute a residence probability, the Census Bureau classified resolved and unresolved cases by match status follow-up group, race, and tenure. The eight match status groups discriminated well: for example, residence probabilities were very low for potentially fictitious people or people said to be living elsewhere on Census Day (14%);5 moderate for college and military age children in partially matched households (84%); and very high for cases resolved before follow-up (99%). The addition of race and tenure to the imputation cells did not capture much additional variability in the probability of Census Day residence (Cantwell et al., 2001:Table 8). The residence probabilities assigned to people without enough reported data for matching (84 percent overall) were based on the average of the probabilities for people in the other match status groups within each race and tenure category.
4
One would not expect there to be confirmed non-Census Day residents or unresolved cases among nonmovers and outmovers; however, it could happen because mover status was assigned prior to field follow-up work.
5
Fictitious people are those for whom it seems clear that the data were fabricated by the respondent or enumerator (e.g., a return for Mickey Mouse).
Match Status
The weighted percentage of P-sample cases with unresolved match status was only 1.2 percent.6 This percentage compares favorably with the 1.8 percent of cases with unresolved match status in the 1990 PES. Very little was known about the A.C.E. P-sample people with unresolved match status; 98 percent of them lacked enough reported data for matching (i.e., they lacked a valid name or at least two characteristics or both).
After imputation, the percentage of matches dropped slightly, from 91.7 percent of resolved cases (matches and nonmatches) to 91.6 percent of all cases because the imputation procedure assigned lower match status probabilities to unresolved cases (84.3 percent overall). To impute a match status probability, the Census Bureau classified resolved and unresolved cases by mover status (nonmover, outmover), whether the person’s housing unit did or did not match, and whether the person had one or more characteristics imputed or edited. These categories discriminated well: the probability of a match for nonmovers was 92 percent overall, compared with only 76 percent for outmovers overall. The lowest match probability was 52 percent for outmovers when the housing unit did not match; the highest match probability was 95 percent for nonmovers when the housing unit matched and the person had no imputed characteristics (Cantwell et al., 2001:Table 9).
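The small drop in the match rate after imputation follows from simple weighted averaging. The sketch below is an approximate check, assuming the 1.2 percent unresolved share and the 84.3 percent average imputed probability can be combined directly with the 91.7 percent resolved-case rate; the function name is hypothetical, and the published figures are weighted tabulations that this arithmetic only approximates.

```python
def blended_rate(resolved_rate, unresolved_share, imputed_rate):
    """Weighted average of the rate among resolved cases and the average
    imputed probability among unresolved cases (rates in percent)."""
    return (1.0 - unresolved_share) * resolved_rate + unresolved_share * imputed_rate

# Figures quoted in the text: 91.7% matches among resolved cases,
# 1.2% of cases unresolved, 84.3% average imputed match probability.
match_rate_all = blended_rate(91.7, 0.012, 84.3)
print(round(match_rate_all, 1))
```

Under these assumptions the blended rate is about 91.6 percent, in line with the figure reported for all cases.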
Enumeration Status
The weighted percentage of E-sample cases with unresolved enumeration status was 2.6 percent, slightly higher than the comparable 2.3 percent for the 1990 PES. Most of the unresolved cases (89.4%) were nonmatches for which field follow-up could not resolve their status as a correct or erroneous enumeration; the remainder were matched cases for which field follow-up could not resolve their residence status, possible matches, and cases for which the location of the housing unit was not clear.
After imputation, the percentage of correct enumerations dropped slightly, from 95.5 percent of resolved cases (correct and erroneous enumerations) to 95.3 percent of all cases because the imputation procedure assigned lower correct enumeration probabilities to unresolved cases (76.2% overall). To impute a correct enumeration status probability, the Census Bureau classified resolved and unresolved cases by match status group, whether the person had one or more imputed characteristics, and race (for some match status groups). The 12 match status groups discriminated well: for example, correct enumeration probabilities were very low for potentially fictitious people (6%) and people said to be living elsewhere on Census Day (23%); moderate for college and military age children in partially matched households (88%); and very high for cases resolved before follow-up (99%). The addition of race and whether the person had imputed characteristics did not capture much additional variability in the probability of correct enumeration (Cantwell et al., 2001:Table 10).
6
The denominator for the percentage is P-sample nonmovers and outmovers who were confirmed Census Day residents or had unresolved residence status; confirmed non-Census Day residents were dropped from the P-sample at this point.
QUALITY OF MATCHING
Although the rates of unresolved match status and enumeration status were low, there remains a question about the accuracy of the classification of match and enumeration status for cases that were “resolved” before imputation. The accuracy of the matching and associated follow-up process is critical to dual-systems estimation (DSE).
That accuracy is critical to distinguish the proportion of P-sample people who match a census record from the proportion who genuinely exist but were not enumerated in the census. If some of the nonmatched people should have been matched or should have been removed from the P-sample because they were fictitious or not a resident at the P-sample address on Census Day or for some other reason, then the estimated match rate will be too low and the estimate of the DSE will be too high.
That accuracy is also critical to distinguish the proportion of E-sample people who were correctly counted (including matches and correct nonmatches) from the proportion who were enumerated erroneously because they were duplicate, fictitious, or for some other reason. If some cases who were classified as correct (nonmatched) enumerations were in fact erroneous, then the estimated correct enumeration rate will be too high and the estimate of the DSE will be too high.
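The direction of both effects can be seen in a minimal sketch of the dual-systems estimate for a single post-stratum. It assumes the simplified form DSE = (C − II) × (CE rate) / (match rate), where C is the census count and II the number of whole-person imputations; all numbers and the function name are hypothetical and chosen only for illustration.

```python
def dual_systems_estimate(census_count, whole_person_imputations,
                          correct_enum_rate, match_rate):
    """Simplified dual-systems estimate for one post-stratum (assumed form):
    data-defined census records, scaled by the E-sample correct enumeration
    rate and divided by the P-sample match rate."""
    data_defined = census_count - whole_person_imputations
    return data_defined * correct_enum_rate / match_rate

# Hypothetical post-stratum.
base = dual_systems_estimate(1000, 20, 0.95, 0.92)

# Matching error that wrongly lowers the match rate inflates the estimate.
low_match = dual_systems_estimate(1000, 20, 0.95, 0.90)

# Erroneous enumerations wrongly coded as correct also inflate the estimate.
high_ce = dual_systems_estimate(1000, 20, 0.96, 0.92)
```

With these made-up inputs the base estimate is about 1,012, while the understated match rate yields about 1,034 and the overstated correct enumeration rate yields about 1,023. Both errors push the estimate upward, as the two paragraphs above argue.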
It is not possible to assess the reliability of assignment of the final match codes until the Census Bureau publishes results from evaluation studies that involve rematching and verifying samples of A.C.E. records (see Executive Steering Committee on A.C.E. Policy, 2001b). The Bureau is also looking at possible errors in assigning correct or erroneous enumeration status to E-sample cases due to the operation of the targeted extended search and the treatment of group quarters residents who should have been excluded from the sample.
Rematching studies for 1990 found some degree of clerical matching error, although analysts disagreed on its importance (National Research Council, 1999b:70–75). The results for 2000 are not yet known. The Bureau believed that the accuracy of matching would improve through greater computerization of the process and other steps in 2000, compared with 1990. The results of quality assurance operations during the matching and follow-up interviewing indicated that relatively little error was identified in assigning match and enumeration status codes (see Childers et al., 2001). Nonetheless, the degree of matching error remains to be established. As indirect indicators of the quality of the matching, we examined specific match codes and how they related to the various steps in the process.
Extent of Checking Required to Confirm Final Match Code
We looked first at final match codes and asked what proportion of the cases in each category were confirmed at the conclusion of computer matching, at the conclusion of clerical matching, or not until after field follow-up.
Confirmed Matches
Table 7-4 shows that 80.3 percent of final confirmed P-sample matches were designated as a match by the computer and did not require follow-up in the field (last row, column 1). Another 18 percent of final confirmed matches were declared a match by clerks, technicians, or analysts and did not require a field check (last row, columns 2, 3, 4). Only 1 percent of final confirmed matches were declared a match only after confirmation of their Census Day residence status in the field (column 5); only 0.8 percent of final confirmed matches were declared a match only after confirmation of their match and residence status in the field (column 6). Similar results obtained for the E-sample (not shown).
By domain and tenure group, the percentage of final confirmed matches that were declared a match by computer varied from 65 percent to 84 percent, perhaps due to difficulties with names. However, there was relatively little variation in the percentage of final confirmed matches that did not require confirmation of residence or match status in the field (97.0% to 99.2%). Given the standards for computer and clerical matching, these results suggest that one can have a high degree of confidence about the designation of a matched case.7
7
The cutoff probability score for a computer match was set high enough, based on previous research, so that false computer matches would almost never occur.
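TABLE 7-4 Percentage of 2000 A.C.E. P-Sample Matches to Census Enumerations, by Source of Final Match Code Assignment, Race/Ethnicity Domain, and Housing Tenure (weighted)
Columns (1)–(4): No Field Check Needed. Columns (5)–(6): Field Check Needed for the status shown.
| Domain and Tenure Group | Computer M (1) | Computer P, Clerk M (2) | Computer NM, Clerk M (3) | Other Final M (4) | Residence Status (5) | Match and Residence (6) | Percent Total Matches (7) |
|---|---|---|---|---|---|---|---|
| American Indian/Alaska Native on Reservation, Owner | 80.7 | 10.4 | 6.4 | 1.7 | 0.4 | 0.4 | 0.1 |
| American Indian/Alaska Native on Reservation, Renter | 82.1 | 12.5 | 2.9 | 1.8 | 0.4 | 0.4 | 0.1 |
| American Indian/Alaska Native off Reservation, Owner | 78.8 | 10.4 | 8.4 | 1.1 | 0.8 | 0.5 | 0.3 |
| American Indian/Alaska Native off Reservation, Renter | 77.8 | 9.9 | 8.3 | 1.8 | 0.7 | 1.6 | 0.2 |
| Hispanic Origin, Owner | 76.4 | 12.5 | 8.1 | 1.1 | 1.1 | 0.9 | 5.8 |
| Hispanic Origin, Renter | 68.1 | 14.6 | 12.5 | 1.8 | 1.2 | 1.7 | 5.9 |
| Black (Non-Hispanic), Owner | 77.7 | 12.1 | 6.7 | 1.4 | 1.1 | 1.0 | 5.7 |
| Black (Non-Hispanic), Renter | 71.2 | 12.9 | 10.9 | 2.0 | 1.4 | 1.5 | 5.1 |
| Native Hawaiian/Pacific Islander, Owner | 76.9 | 12.1 | 6.4 | 2.4 | 0.8 | 1.4 | 0.1 |
| Native Hawaiian/Pacific Islander, Renter | 64.6 | 15.2 | 14.9 | 3.6 | 0.6 | 1.0 | 0.1 |
| Asian, Owner | 72.9 | 14.3 | 8.9 | 1.4 | 1.0 | 1.4 | 2.1 |
| Asian, Renter | 66.7 | 16.1 | 12.4 | 1.7 | 1.2 | 1.8 | 1.2 |
| White and Other Race (Non-Hispanic), Owner | 84.2 | 8.2 | 5.7 | 0.6 | 0.9 | 0.4 | 57.5 |
| White and Other Race (Non-Hispanic), Renter | 77.9 | 10.3 | 8.6 | 1.0 | 1.2 | 1.0 | 15.8 |
| Total | 80.3 | 9.8 | 7.2 | 0.9 | 1.0 | 0.8 | 100.0 |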
NOTES: Columns (1)–(6) in each row add to 100%; Column (7), reading down, adds to 100%. M: match; P: possible match; NM: nonmatch (confirmed Census Day resident).
SOURCE: Tabulations by panel staff of P-sample cases that went through matching, from U.S. Census Bureau, P-Sample Person Dual-System Estimation Output File, February 16, 2001. Tabulations weighted using TESFINWT; exclude TES-eligible people not in TES sample block clusters, who have zero TESFINWT.
Confirmed P-Sample Nonmatches
Assignment of confirmed nonmatch status was always based on a field check for certain types of P-sample cases (see Appendix C), amounting to 50.4 percent of the total confirmed P-sample nonmatches. There was relatively little variation in this percentage for most race/ethnicity domain and tenure groups (data not shown), although 69 percent of final confirmed nonmatches for American Indians and Alaska Natives were not declared a nonmatch until after being checked in the field, compared with only 47 percent for non-Hispanic whites and other races. How many nonmatches were correctly assigned and how many should have been identified as either matches or cases to be dropped from the P-sample (e.g., fictitious cases or people residing elsewhere on Census Day) will not be known until the Census Bureau completes its studies of matching error.
Confirmed E-Sample Correct (Nonmatched) or Erroneous Enumerations
On the E-sample side, assignment of a final code as a correct (nonmatched) enumeration was always based on a field check. Of final erroneous enumerations (4% of the total E-sample), 35 percent were declared on the basis of a field check, while 65 percent were identified by clerks as duplicates or as lacking enough reported data and did not require confirmation in the field.
Unresolved Cases
As noted above, the E-sample had a higher percentage of cases that could not be resolved after field checking than did the P-sample: 2.6 percent and 2.2 percent, respectively. Moreover, 52.2 percent of the unresolved P-sample cases were those coded by the computer or clerks as not having enough reported data for matching. These cases were not field checked but had their residence or match status imputed.
Extent of Reassignment of Match Codes
Another way to gauge matching quality is to ask how often one stage of matching changed the code assigned in an earlier stage. Table 7-5 shows that such changes happened quite infrequently. Thus (see Panel A), 99.9 percent and 99.7 percent of confirmed matches assigned by the computer for the P-sample and the E-sample, respectively, remained as such in the final coding. Also, 93 percent of computer possible matches in both the P-sample and the E-sample were confirmed as such without the need for field follow-up; another 5.5–5.7 percent were confirmed as a match (or, in the case of the E-sample, as a nonmatched correct enumeration) in the field. Only 1.3–1.5
TABLE 7-6 2000 A.C.E. Matched P-Sample and E-Sample Cases: Consistency of Race/Ethnicity Post-Stratification Domain (unweighted)
Rows: P-sample race/ethnicity domain. Columns: E-sample race/ethnicity domain.
| P-Sample Race/Ethnicity Domain | Domain 1 | Domain 2 | Domain 3 | Domain 4 | Domain 5 | Domain 6 | Domain 7 | P-Sample Total | % Inconsistent |
|---|---|---|---|---|---|---|---|---|---|
| American Indian or Alaska Native on Reservations (Domain 1) | 11,009 | 0 | 34 | 12 | 0 | 0 | 118 | 11,173 | 1.5 |
| American Indian or Alaska Native off Reservations (Domain 2) | 0 | 2,223 | 59 | 104 | 0 | 30 | 793 | 3,209 | 30.7 |
| Hispanic Origin (Domain 3) | 44 | 136 | 67,985 | 610 | 42 | 267 | 4,004 | 73,088 | 7.0 |
| Non-Hispanic Black (Domain 4) | 10 | 119 | 496 | 65,679 | 6 | 118 | 1,423 | 67,851 | 3.2 |
| Native Hawaiian or Pacific Islander (Domain 5) | 0 | 3 | 31 | 19 | 1,671 | 204 | 177 | 2,105 | 20.6 |
| Asian (Domain 6) | 1 | 31 | 107 | 102 | 143 | 19,679 | 1,062 | 21,125 | 6.8 |
| Non-Hispanic White or Other Race (Domain 7) | 107 | 944 | 5,041 | 2,589 | 183 | 2,105 | 360,125 | 371,094 | 3.0 |
| E-Sample Total | 11,171 | 3,456 | 73,753 | 69,115 | 2,045 | 22,403 | 367,702 | 549,645 | |
| E-Sample % Inconsistent | 1.5 | 35.7 | 7.8 | 5.0 | 18.3 | 12.2 | 2.1 | 3.9 | |
NOTE: See Table 6-2 for definitions of domains.
SOURCE: Farber (2001a:Table A-3).
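The inconsistency percentages in Table 7-6 can be reproduced from the cell counts. The sketch below assumes that "% inconsistent" for a domain is the share of matched cases in that row (P-sample) or column (E-sample) that fall off the diagonal; the variable and function names are hypothetical.

```python
# Matched-case counts from Table 7-6: rows are P-sample domains 1-7,
# columns are E-sample domains 1-7.
counts = [
    [11009, 0, 34, 12, 0, 0, 118],
    [0, 2223, 59, 104, 0, 30, 793],
    [44, 136, 67985, 610, 42, 267, 4004],
    [10, 119, 496, 65679, 6, 118, 1423],
    [0, 3, 31, 19, 1671, 204, 177],
    [1, 31, 107, 102, 143, 19679, 1062],
    [107, 944, 5041, 2589, 183, 2105, 360125],
]

def pct_inconsistent(consistent, total):
    """Percentage of cases that are not on the diagonal."""
    return 100.0 * (total - consistent) / total

n = len(counts)
row_totals = [sum(row) for row in counts]                                 # P-sample totals
col_totals = [sum(counts[i][j] for i in range(n)) for j in range(n)]      # E-sample totals
p_rates = [pct_inconsistent(counts[i][i], row_totals[i]) for i in range(n)]
e_rates = [pct_inconsistent(counts[j][j], col_totals[j]) for j in range(n)]
overall = pct_inconsistent(sum(counts[i][i] for i in range(n)), sum(row_totals))

for d in range(n):
    print(f"Domain {d + 1}: P-sample {p_rates[d]:.1f}%, E-sample {e_rates[d]:.1f}%")
print(f"Overall: {overall:.1f}%")
```

Under that assumption the computed rates agree with the published row, column, and overall percentages to one decimal place, including the 30.7 and 35.7 percent figures for American Indians and Alaska Natives off reservations.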
been had there been no inconsistency. However, the coverage correction factor would have been lower yet for American Indians and Alaska Natives off reservations if they had been merged with the non-Hispanic white and other races stratum. The reverse flow of American Indians and Alaska Natives identifying themselves as non-Hispanic whites or other races had virtually no effect on the coverage correction factor for the latter group, given its much larger proportion of the population.
VARIANCE ESTIMATES
Overall, the A.C.E. was expected to have smaller variances due to sampling error and other sources than the 1990 PES, and that expectation was borne out. The coefficient of variation for the estimated coverage correction factor for the total population was reduced from 0.2 percent in 1990 to 0.14 percent in 2000 (a reduction of 30%). The coefficients of variation for the coverage correction factors for Hispanics and non-Hispanic blacks were reduced from 0.82 percent and 0.55 percent, respectively, to 0.38 percent and 0.40 percent, respectively (Davis, 2001:Tables E-1, F-1). However, the coefficients of variation for coverage correction factors were as high as 6 percent for particular post-strata, which translates into a very large confidence interval around the estimate of the net undercount.10
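The reductions quoted above, and the meaning of a 6 percent coefficient of variation, can be checked with a few lines of arithmetic. The sketch assumes a normal approximation and a 90 percent confidence level (z = 1.645); the text does not state which confidence level the panel had in mind, and the function names are hypothetical.

```python
def relative_reduction(old, new):
    """Proportional reduction from old to new."""
    return (old - new) / old

# Coefficients of variation (in percent) quoted in the text, 1990 vs. 2000.
total_reduction = relative_reduction(0.20, 0.14)
hispanic_reduction = relative_reduction(0.82, 0.38)
black_reduction = relative_reduction(0.55, 0.40)

def ci_half_width_pct(cv_pct, z=1.645):
    """Half-width of a normal-approximation confidence interval, as a
    percentage of the estimate. z = 1.645 (90 percent) is an assumption."""
    return z * cv_pct

half_width_6pct = ci_half_width_pct(6.0)

print(f"{total_reduction:.0%} {hispanic_reduction:.0%} {black_reduction:.0%}")
print(f"{half_width_6pct:.1f}")
```

The total-population reduction works out to 30 percent, as stated; the corresponding reductions are about 54 percent for Hispanics and 27 percent for non-Hispanic blacks. With a 6 percent coefficient of variation, the assumed 90 percent interval extends roughly 10 percent of the estimate in either direction, which is far larger than a net undercount of 1 to 2 percent.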
The overall coefficient of variation was expected to be reduced by about 25 percent due to the larger sample size of the A.C.E., almost double that of the 1990 PES. In addition, better measures of population size were available during the selection of the A.C.E. block clusters than during the selection of PES clusters, and the A.C.E. sampling weights were less variable than the PES sampling weights. The 2000 TES was much better targeted and thereby more efficient than the similar operation in 1990. Overall, TES was expected to reduce the variance of the DSE, although the 2000 TES also contributed somewhat to an increase in sampling error.
Looking at size and variation in weights, Table 7-7 shows the changes in the P-sample weights, from the initial weighting that accounted for differential sampling probabilities, to the intermediate weights that included household noninterview adjustments, to the final weights that accounted for TES sampling. (The table also shows the distribution of E-sample initial and final weights.) At the outset, 90 percent of the initial P-sample weights were between 48 and 654, and the lowest and highest weights were 9 and 1,288; the distribution did not differ by mover status. After the household noninterview adjustment for Census Day, 90 percent of the weights were between 49 and 674, and the lowest and highest weights were 9 and 1,619. After the TES adjustment, 90 percent of the final weights for confirmed Census Day residents were between 50 and 678, and the lowest and highest weights were 9 and 5,858 (the variation in weights was less for outmovers than nonmovers). For inmovers, there was relatively little difference between the initial sampling weights and the final weights adjusted for household noninterview on the P-sample interview day.
10
The variance estimates developed by the Census Bureau likely underestimate the true variance, but the extent of underestimation is not known. The variance estimation excludes some minor sources of error (specifically, the large block subsampling and the P-sample noninterview adjustment). It also excludes most sources of nonsampling error (see Appendix C).
TABLE 7-7 Distribution of Initial, Intermediate, and Final Weights, 2000 A.C.E. P-Sample and E-Sample
Numbered columns are percentiles of the weight distribution.
| Sample and Mover Status | Number of Non-Zeros | 0 | 1 | 5 | 10 | 25 | 50 | 75 | 90 | 95 | 99 | 100 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| P-Sample, Initial Weight (a): Total | 721,734 | 9 | 21 | 48 | 75 | 249 | 352 | 574 | 647 | 654 | 661 | 1,288 |
| P-Sample, Initial Weight (a): Nonmovers | 631,914 | 9 | 21 | 48 | 76 | 253 | 366 | 575 | 647 | 654 | 661 | 1,288 |
| P-Sample, Initial Weight (a): Outmovers | 24,158 | 9 | 21 | 48 | 69 | 226 | 348 | 541 | 647 | 654 | 661 | 1,288 |
| P-Sample, Initial Weight (a): Inmovers | 36,623 | 9 | 21 | 47 | 67 | 212 | 343 | 530 | 647 | 654 | 661 | 1,288 |
| P-Sample, Intermediate Weight (b): Total with Census Day Weight | 712,442 | 9 | 22 | 49 | 78 | 253 | 379 | 577 | 654 | 674 | 733 | 1,619 |
| P-Sample, Intermediate Weight (b): Total with Interview Day Weight | 721,426 | 9 | 21 | 48 | 76 | 249 | 366 | 576 | 651 | 660 | 705 | 1,701 |
| P-Sample, Final Weight (c), Census Day Weight: Total | 640,795 | 9 | 22 | 50 | 83 | 273 | 382 | 581 | 654 | 678 | 765 | 5,858 |
| P-Sample, Final Weight (c), Census Day Weight: Nonmovers | 617,390 | 9 | 22 | 50 | 83 | 274 | 382 | 581 | 654 | 678 | 762 | 5,858 |
| P-Sample, Final Weight (c), Census Day Weight: Outmovers | 23,405 | 9 | 23 | 50 | 77 | 240 | 363 | 577 | 655 | 682 | 798 | 3,847 |
| P-Sample, Final Weight (c): Inmovers | 36,623 | 9 | 21 | 47 | 67 | 214 | 345 | 530 | 651 | 656 | 705 | 1,288 |
| E-Sample, Initial Weight (d) | 712,900 | 9 | 21 | 39 | 55 | 212 | 349 | 564 | 647 | 654 | 661 | 2,801 |
| E-Sample, Final Weight (e) | 704,602 | 9 | 21 | 39 | 56 | 217 | 349 | 567 | 647 | 654 | 700 | 4,009 |
(a) P-sample initial weight, PWGHT, reflects sampling through large block subsampling; total includes removed cases.
(b) P-sample intermediate weight, NIWGT, reflects household noninterview adjustment for Census Day; NIWGTI reflects household noninterview adjustment for A.C.E. interview day.
(c) P-sample final weight, TESFINWT, for confirmed Census Day residents, total, nonmovers, and outmovers (reflects targeted extended search sampling); NIWGTI for inmovers.
(d) E-sample initial weight, EWGHT, reflects sampling through large block subsampling.
(e) E-sample final weight, TESFINWT, reflects targeted extended search sampling.
SOURCE: Tabulations by panel staff of U.S. Census Bureau, P-Sample and E-Sample Person Dual-System Estimation Output Files, February 16, 2001.
While the variations in final weights for the A.C.E. P-sample (and E-sample) were not small, they were considerably less than the variations in final weights for the 1990 PES. In 1990, some P-sample weights were more than 20,000, and 28 percent of the weights exceeded 700, compared with only 5 percent in the A.C.E.
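The two kinds of summary used in this discussion, the range containing the central 90 percent of weights and the share of weights above a threshold such as 700, can be sketched as follows. This is only an illustration: the function names and the toy weights are hypothetical, the toy values are not A.C.E. or PES data, and the percentile interpolation rule used here is an assumption that may differ from the one behind Table 7-7.

```python
from statistics import quantiles


def central_90_range(weights):
    """Return the (5th, 95th) percentile cut points of the weights."""
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%.
    cuts = quantiles(weights, n=20, method="inclusive")
    return cuts[0], cuts[-1]


def share_above(weights, threshold):
    """Return the fraction of weights strictly greater than the threshold."""
    return sum(w > threshold for w in weights) / len(weights)


# Hypothetical weights for illustration only; these are NOT A.C.E. data.
toy_weights = [9, 48, 75, 249, 352, 352, 574, 647, 654, 1288]

low, high = central_90_range(toy_weights)
print(f"90 percent of the toy weights lie between {low:.0f} and {high:.0f}")
print(f"Share of toy weights above 700: {share_above(toy_weights, 700):.0%}")
```

A narrower central range and a smaller share of very large weights both indicate less variation in weights, which is the sense in which the A.C.E. weights compare favorably with those of the 1990 PES.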
FINAL MATCH CODES AND RATES
Having examined individual features of the A.C.E., we next looked at the distribution of final match codes and rates for the P-sample and E-sample. We wanted to get an overall sense of the reasonableness of the results for key population groups and in comparison with 1990.
Final Match and Enumeration Status
P-Sample Match Codes
The distribution of final match codes for the P-sample was 89.5 percent confirmed match, 7.4 percent confirmed nonmatch, 2.2 percent match or residence status unresolved, and 0.9 percent not a Census Day resident or removed for another reason (e.g., a fictitious or duplicate P-sample case). Table 7-8 shows that the percentage of confirmed matches by domain and tenure varied from 80 percent for black and Native Hawaiian and Pacific Islander renters to 93 percent for non-Hispanic white and other race owners; conversely, the percentage of confirmed nonmatches varied from 15.8 percent for Native Hawaiian and Pacific Islander renters to 4.9 percent for non-Hispanic white and other race owners. Groups with higher percentages of nonmatched cases also tended to have higher percentages of unresolved cases, which varied from 1 percent for Native Hawaiian and Pacific Islander owners to 4.7 percent for black renters.
After imputation of residence and match status, the overall P-sample match rate (matches divided by matches plus nonmatches) was 91.6 percent. The match rate ranged from 82.4 percent for Native Hawaiian and Pacific Islander renters to 94.6 percent for non-Hispanic white and other race owners.
TABLE 7-8 2000 A.C.E. P-Sample Final Match Codes, and A.C.E. and PES Match Rates, by Race/Ethnicity Domain and Housing Tenure (weighted)
First four data columns: Percent Distribution of 2000 P-Sample Final Match Codes. Last two columns: P-Sample Match Rate (a).

| Domain and Tenure Group | Match | Nonmatch | Unresolved | Removed | Match Rate, 2000 A.C.E. | Match Rate, 1990 PES |
|---|---|---|---|---|---|---|
| American Indian/Alaska Native on Reservation | | | | | | 78.13 (b) |
| Owner | 82.9 | 13.2 | 1.6 | 2.4 | 85.43 | |
| Renter | 85.6 | 11.5 | 1.6 | 1.3 | 87.08 | |
| American Indian/Alaska Native off Reservation | | | | | | |
| Owner | 88.5 | 9.2 | 1.4 | 0.9 | 90.19 | n.e. |
| Renter | 81.2 | 12.6 | 4.3 | 1.9 | 84.65 | n.e. |
| Hispanic Origin | | | | | | |
| Owner | 89.0 | 8.3 | 1.7 | 1.0 | 90.79 | 92.81 |
| Renter | 81.7 | 13.2 | 3.9 | 1.2 | 84.48 | 82.45 |
| Black (Non-Hispanic) | | | | | | |
| Owner | 87.9 | 8.8 | 2.3 | 1.1 | 90.14 | 89.65 |
| Renter | 80.4 | 13.7 | 4.7 | 1.2 | 83.67 | 82.28 |
| Native Hawaiian/Pacific Islander | | | | | | |
| Owner | 85.8 | 12.2 | 1.0 | 1.0 | 87.36 | n.e. |
| Renter | 80.3 | 15.8 | 2.7 | 1.2 | 82.39 | n.e. |
| Asian (Non-Hispanic) (c) | | | | | | |
| Owner | 90.1 | 6.6 | 2.3 | 1.0 | 92.34 | 93.71 |
| Renter | 84.4 | 10.8 | 3.7 | 1.1 | 87.33 | 84.36 |
| White and Other Races (Non-Hispanic) | | | | | | |
| Owner | 93.0 | 4.9 | 1.4 | 0.8 | 94.60 | 95.64 |
| Renter | 85.5 | 9.8 | 3.7 | 1.0 | 88.37 | 88.62 |
| Total | 89.5 | 7.4 | 2.2 | 0.9 | 91.59 | 92.22 |

NOTE: First four columns in each row add to 100%; n.e., not estimated.
(a) Match rates (matches divided by the sum of matches and nonmatches) are after imputation for unresolved residence and match status for the A.C.E. and after imputation of unresolved match status for the PES.
(b) Total; not available by tenure.
(c) 1990 PES match rates include Pacific Islanders.
SOURCES: A.C.E. match codes are from tabulations by panel staff of P-sample cases who went through the matching process, weighted using TESFINWT and excluding TES-eligible people not in TES sample block clusters (who have zero TESFINWT), from U.S. Census Bureau, P-Sample Person Dual-System Estimation Output File, February 16, 2001; A.C.E. and PES match rates from Davis (2001:Tables E-2, F-1, F-2).
E-Sample Match Codes
The distribution of final match codes for the E-sample was 81.7 percent matches, 11.6 percent other correct (nonmatched) enumerations, 4.0 percent erroneous enumerations, and 2.6 percent unresolved. Table 7-9 shows that the percent confirmed correct enumerations (the sum of matches plus other correct enumerations in the first two columns) by domain and tenure ranged from 87.2 percent for black renters to 95.8 percent for non-Hispanic white and other owners. The percent erroneous enumerations ranged from 3 percent for non-Hispanic white and other owners and American Indian/Alaska Native on reservation renters to 7 percent for black renters, and the percent unresolved ranged from 1.2 percent for non-Hispanic white and other race owners to about 6 percent for Hispanic and black renters.
After imputation for enumeration status, the overall E-sample correct enumeration rate (matches and other correct enumerations divided by those groups plus erroneous enumerations) was 95.3 percent. The correct enumeration rate ranged from 91.2 percent for non-Hispanic black renters to 96.7 percent for non-Hispanic white and other race owners.
Comparisons with 1990
The P-sample match rates are similar for the 2000 A.C.E. and the 1990 PES for the total population and for many race/ethnic domain and housing tenure groups (see Table 7-8). For the total population, the A.C.E. match rate is 0.6 percentage points lower than the PES rate; for population groups, the A.C.E. match rates are lower than the PES rates for some groups and higher for others. The E-sample correct enumeration rates are also similar between the 2000 A.C.E. and the 1990 PES (see Table 7-9). However, there is a general pattern for the A.C.E. correct enumeration rates to be somewhat higher than the corresponding PES rates. On balance, the net result of these patterns is that the A.C.E. correction ratios (calculated by dividing the correct enumeration rate by the match rate) are higher than the corresponding PES correction ratios. If other things were equal, these results would mean that the A.C.E. measured higher net undercount rates than the PES, but the reverse is true. We explore in Chapter 8 the role of people reinstated in the census (late additions) and people requiring imputation to complete their census records, none of whom could be included in the A.C.E. process, in explaining the reductions in net undercount from 1990 levels that were measured in A.C.E.
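As a worked check of the correction ratio comparison, the sketch below divides the published correct enumeration rate by the published match rate, using only the "Total" rows of Tables 7-8 and 7-9. The function and variable names are illustrative assumptions; the calculation covers only the ratio as defined in the text, not the full dual-system estimate.

```python
# Published rates (percent) from the "Total" rows of Tables 7-8 and 7-9.
RATES = {
    "2000 A.C.E.": {"match": 91.59, "correct_enum": 95.28},
    "1990 PES": {"match": 92.22, "correct_enum": 94.27},
}


def correction_ratio(correct_enum_rate, match_rate):
    """Correct enumeration rate divided by match rate, as defined in the text."""
    return correct_enum_rate / match_rate


# Illustrative container name; one ratio per coverage survey.
ratios = {
    survey: correction_ratio(r["correct_enum"], r["match"])
    for survey, r in RATES.items()
}

for survey, ratio in ratios.items():
    print(f"{survey}: correction ratio = {ratio:.3f}")
```

For the total population the ratio is about 1.040 for the A.C.E. and about 1.022 for the PES, consistent with the statement that the A.C.E. correction ratios are higher.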
TABLE 7-9 2000 A.C.E. E-Sample Final Match Codes, and 2000 A.C.E. and 1990 PES Correct Enumeration Rates, by Race/Ethnicity Domain and Housing Tenure (weighted)
First four data columns: Percent Distribution of 2000 E-Sample Final Match Codes. Last two columns: E-Sample Correct Enumeration Rate (a).

| Domain and Tenure Group | Match | Other Correct Enumeration | Erroneous Enumeration | Unresolved | CE Rate, 2000 A.C.E. | CE Rate, 1990 PES |
|---|---|---|---|---|---|---|
| American Indian/Alaska Native on Reservation | | | | | | 91.54 (b) |
| Owner | 77.1 | 17.1 | 3.7 | 2.1 | 95.65 | |
| Renter | 78.3 | 14.8 | 3.0 | 4.0 | 96.15 | |
| American Indian/Alaska Native off Reservation | | | | | | |
| Owner | 81.9 | 11.7 | 4.9 | 1.5 | 94.56 | n.e. |
| Renter | 74.1 | 15.2 | 5.0 | 5.7 | 93.16 | n.e. |
| Hispanic Origin | | | | | | |
| Owner | 83.2 | 11.7 | 3.3 | 1.9 | 96.25 | 95.56 |
| Renter | 71.7 | 17.0 | 5.3 | 6.0 | 92.79 | 90.58 |
| Black (Non-Hispanic) | | | | | | |
| Owner | 80.3 | 12.7 | 5.2 | 1.7 | 94.25 | 92.84 |
| Renter | 68.2 | 19.0 | 7.0 | 5.9 | 91.16 | 89.19 |
| Native Hawaiian/Pacific Islander | | | | | | |
| Owner | 83.0 | 9.8 | 5.7 | 1.5 | 93.79 | n.e. |
| Renter | 72.7 | 16.6 | 6.1 | 4.6 | 92.33 | n.e. |
| Asian (Non-Hispanic) (c) | | | | | | |
| Owner | 83.3 | 11.3 | 3.8 | 1.6 | 95.84 | 93.13 |
| Renter | 72.1 | 15.3 | 5.9 | 6.7 | 92.45 | 92.22 |
| White and Other Races (Non-Hispanic) | | | | | | |
| Owner | 86.6 | 9.2 | 3.0 | 1.2 | 96.70 | 95.84 |
| Renter | 74.9 | 14.3 | 5.6 | 5.2 | 93.20 | 92.61 |
| Total | 81.7 | 11.6 | 4.0 | 2.6 | 95.28 | 94.27 |

NOTES: First four columns in each row add to 100%; n.e., not estimated.
(a) Correct enumeration (CE) rates (matches and other correct enumerations divided by the sum of matches, other correct enumerations, and erroneous enumerations) are after imputation for unresolved enumeration status.
(b) Total; not available by tenure.
(c) 1990 correct enumeration rates include Pacific Islanders.
SOURCES: A.C.E. match codes are from tabulations by panel staff of E-sample cases, weighted using TESFINWT and excluding TES-eligible people not in TES sample block clusters (who have zero TESFINWT), from U.S. Census Bureau, E-Sample Person Dual-System Estimation Output File, February 16, 2001; A.C.E. and PES correct enumeration rates are from Davis (2001:Tables E-2, F-1, F-2).
GROSS ERRORS
Our discussion has focused on net undercount. Some analysts also are interested in the level of gross errors in the census, that is, total omissions and total erroneous enumerations. The A.C.E. is designed to measure net undercount (or net overcount). It measures gross errors but in ways that can be misleading. Many errors that are identified by A.C.E. involve the balancing of a nonmatch on the P-sample side against an erroneous enumeration on the E-sample side (for example, when an E-sample case that should match is misgeocoded). These kinds of balancing errors are not errors for such levels of geography as counties, cities, and even census tracts, although they affect error at the block cluster level. Also, the classification of type of gross error in the A.C.E. is not necessarily clean. For example, A.C.E. will not classify an enumeration of a “snowbird” at the person’s winter residence as duplicating an enumeration for the same person at his or her summer residence because there is no nationwide search. A.C.E. will likely classify duplicate snowbird enumerations as erroneous in the aggregate, but will not label them as duplicates.
It is important to take note of gross errors, however, because higher or lower net undercount does not relate directly to the level of gross errors. There can be a zero net undercount and a high rate of gross omissions and gross erroneous enumerations. Hence, for completeness, Table 7-10 shows gross errors in the 2000 A.C.E. and 1990 PES. The total gross errors in the A.C.E. appear to be somewhat reduced in percentage terms from the gross errors in the PES. However, the increased numbers of people requiring imputation and late additions, who likely had higher-than-average error rates, cloud the issue, as these people were not part of the E-sample. Also, the sizable differences between the A.C.E. and the PES in the distribution of types of gross erroneous enumerations are puzzling. For example, the A.C.E. estimates proportionately fewer duplicate enumerations than the PES. The Census Bureau is currently studying these discrepancies, which could also be due to the higher numbers of people requiring imputation and late additions who were not included in the A.C.E. processing.
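The arithmetic linking net and gross errors in Table 7-10 can be sketched as follows. Per the table notes, gross omissions are obtained by adding net omissions (3.3 million people in 2000; 4 million in 1990) to gross erroneous enumerations. The variable and function names are illustrative assumptions, and the erroneous enumeration inputs are the published alternative estimates (rows 1 through 4 for 2000, rows 1 through 3 for 1990), not independent calculations.

```python
# All figures in millions of people, taken from Table 7-10 and its notes.
NET_OMISSIONS = {"2000 A.C.E.": 3.3, "1990 PES": 4.0}

# Alternative estimates of gross erroneous enumerations, in table row order.
ERRONEOUS_ENUMERATIONS = {
    "2000 A.C.E.": [12.5, 5.9, 3.1, 4.3],
    "1990 PES": [16.3, 11.2, 4.4],
}


def gross_omissions(erroneous, net):
    """Gross omissions = net omissions + gross erroneous enumerations."""
    return round(erroneous + net, 1)


omissions = {
    survey: [gross_omissions(ee, NET_OMISSIONS[survey]) for ee in ee_rows]
    for survey, ee_rows in ERRONEOUS_ENUMERATIONS.items()
}

for survey, values in omissions.items():
    print(survey, values)
```

The results reproduce the omissions columns of the table (15.8, 9.2, 6.4, and 7.6 million for 2000; 20.3, 15.2, and 8.4 million for 1990). They also show why a small net undercount can coexist with large gross errors: erroneous enumerations and omissions largely offset each other in the net figure.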
TABLE 7-10 Gross Omissions and Erroneous Enumerations, 2000 A.C.E. and 1990 PES

| Erroneous Enumerations | Percent of Weighted E-Sample, 2000 A.C.E. | Percent of Weighted E-Sample, 1990 PES | Estimated Number of People (millions), 2000 | Estimated Number of People (millions), 1990 |
|---|---|---|---|---|
| Total | 4.7 | 5.8 | 12.5 | 16.3 |
| (1) Insufficient Information for Matching | 1.8 | 1.2 | 4.8 | 3.4 |
| (2) Duplicates | 0.7 | 1.6 | 1.9 | 4.5 |
| (3) Fictitious | 0.3 | 0.2 | 0.7 | 0.6 |
| (4) Geocoding Error | 0.2 | 0.3 | 0.6 | 0.8 |
| (5) Other Residence | 1.0 | 2.2 | 2.7 | 6.2 |
| (6) Imputed | 0.6 | 0.3 | 1.8 | 0.8 |

| Alternative Estimates of Gross Errors (counts in millions) | 2000 A.C.E. Erroneous Enumerations (a) | 2000 A.C.E. Omissions | 1990 PES Erroneous Enumerations | 1990 PES Omissions |
|---|---|---|---|---|
| (1) Including All Types of Erroneous Enumerations (EEs) | 12.5 | 15.8 | 16.3 | 20.3 |
| (2) Excluding EEs with Insufficient Information to Match and Imputed EEs (EE types (1) and (6) above) | 5.9 | 9.2 | 11.2 | 15.2 |
| (3) Excluding EEs excluded in row (2) and also “geocoding errors” and “other residence” (EE types (4) and (5) above) (a) | 3.1 | 6.4 | 4.4 | 8.4 |
| (4) Row (3) plus an allowance for 50 percent duplication among late additions | 4.3 | 7.6 | N.A. | N.A. |

NOTES: People with insufficient information who were excluded from the E-sample at the outset are not included in any of these numbers (EE category (1) above comprises additional cases found to lack enough reported data for matching). Gross omissions are calculated by adding net omissions (3.3 million people in 2000; 4 million people in 1990) to gross erroneous enumerations.
(a) The alternative estimates of erroneous enumerations in 2000 are not consistent with the information on types of erroneous enumerations above. The discrepancy is being investigated with the Census Bureau.
SOURCE: Adapted from Anderson and Fienberg (2001a:Tables 2, 3).
CONCLUSIONS
On the basis of the evidence now available, we conclude that the A.C.E. was conducted according to well-specified and carefully controlled procedures. We also conclude that it achieved a high degree of quality in such areas as sample design, interviewing, and imputation for missing data.
There are several outstanding questions that must be addressed before it will be possible to render a final verdict on the quality of the A.C.E. procedures (see Executive Steering Committee on A.C.E. Policy, 2001b). The major outstanding questions relate to those aspects of the 2000 A.C.E. that differed markedly from the 1990 PES and were relatively untested. First, there is concern that the targeted extended search may not have been balanced (in that the search areas for P-sample and E-sample cases may not have been equivalent) and that the imbalance could have led to incorrect treatment of nonmatched E-sample cases. There is a related concern that balancing error may have occurred because some E-sample cases were coded as correct when they were in fact outside the block cluster or because not all correct enumerations in the block cluster were searched for a match to the P-sample. Second, there is a concern that group quarters enumerations, such as of college students, may not have been handled correctly in the A.C.E. Group quarters residents were supposed to be excluded from the A.C.E.; error would occur if, say, enumerations of college students at their parental home were not classified as erroneous. Third, studies of the effect of the PES-C procedure on the estimates of match rates for movers and, more generally, estimates of matching error are not yet available. Finally, additional evaluations are needed to determine if the post-stratification was the most efficient possible and to assess the sensitivity of the A.C.E. results to error from particular sources, such as matching, imputation, and the PES-C procedure used for movers.
Overall, the 2000 A.C.E. showed patterns of net undercount similar to, but less pronounced than, those of the 1990 PES. Given that P-sample match rates and E-sample erroneous enumeration rates were similar between the A.C.E. and the 1990 PES, the key question at this time is why the A.C.E. showed a reduced net undercount, overall, and for such groups as Hispanics, non-Hispanic blacks, children, and renters. Because the only other component of the DSE equation is the number of census people with insufficient information to include in the E-sample (IIs), our attempts to resolve the undercount puzzle centered on that component of the census results. In Chapter 8, we analyze distributions of people requiring imputation and people reinstated in the census (late additions) and determine that people requiring imputation largely explain the reduced net undercount in 2000 for historically less well-counted groups.