
7
Accuracy and Coverage Evaluation: Assessment

This chapter presents the panel’s assessment of the Accuracy and Coverage Evaluation (A.C.E.) Program because the A.C.E. is crucial to any assessment of the census itself. We consider nine separate aspects of the A.C.E.:

  • conduct and timing;

  • household noninterviews in the P-sample;

  • imputation for missing characteristics and unresolved residence, match, and enumeration status;

  • quality of matching;

  • the targeted extended search;

  • post-stratification;

  • variance estimates;

  • final match codes and rates; and

  • gross errors.

We end this chapter with our summary assessment of the A.C.E.

CONDUCT AND TIMING

Overall, the A.C.E. appears to have been well executed. Although the sample size was twice as large as that fielded in 1990, the A.C.E. was carried out on schedule and with only minor problems that necessitated rearrangement or modification of operations after they had been specified.1 Some procedures, such as telephone interviewing, proved more useful than had been expected. All processes, from sampling through estimation, were carried out according to well-documented specifications, with quality control procedures (e.g., reviews of the work of clerical matchers and field staff) implemented at appropriate junctures.

1. Mostly, such modifications involved accommodation to changes in the Master Address File (MAF) that occurred in the course of the census. For example, the targeted extended search (TES) procedures had to be modified to handle deletions from and additions to the MAF that were made after the determination of the TES housing unit inventory (Navarro and Olson, 2001:11).

HOUSEHOLD NONINTERVIEWS IN THE P-SAMPLE

Because the quantity being estimated—the net undercount of the population—is very small relative to the total population (1–2%), it is essential that the P-sample survey meet high standards with regard to the completeness of reporting. A high rate of household noninterviews that required extensive adjustments to the sampling weights would be detrimental to the dual-systems estimation that is the key to the A.C.E. A high rate would not only increase variance, but also likely introduce bias due to the likelihood that nonresponding households differ from responding households in systematic ways that are important for estimation.

Interview/Noninterview Rates

Overall, the A.C.E. obtained interviews from 98.9 percent of households that were occupied on interview day. This figure compares favorably with the 98.4 percent interview rate for the 1990 Post-Enumeration Survey (PES).2 However, the percentage of occupied households as of Census Day that were successfully interviewed in A.C.E. was somewhat lower—97 percent, meaning that a weighting adjustment had to account for the remaining 3 percent of noninterviewed households.

The lower interview rate for Census Day households is due largely to the fact that households that had been occupied entirely by outmovers at the time of the census were harder to interview than other households. This result is not surprising because the new occupants of such households may know nothing of the people who lived there before, and it may not always be possible to interview a knowledgeable neighbor or landlord. The interview rate for outmover households was 81.4 percent. Such households comprised 4 percent of Census Day occupied households in the P-sample.
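These figures are internally consistent. With outmover households interviewed at 81.4 percent and making up 4 percent of Census Day occupied households, the remaining 96 percent of households must have been interviewed at roughly 97.7 percent to yield the 97 percent overall rate (an illustrative back-calculation, not a figure reported by the Bureau):

\[
0.04 \times 81.4\% \;+\; 0.96 \times 97.7\% \;\approx\; 97.0\%.
\]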

Noninterview Weighting Adjustments

Two weighting adjustments were calculated so that interviewed households would represent all households that should have been interviewed: one for the A.C.E. interview day and the other for Census Day. Each of the two weighting adjustments was calculated separately for households by type (single-family unit, apartment, other) within each individual block cluster. Mover status was not a factor for reweighting.

2. These percentages are unweighted; they are about the same as weighted percentages. Weighted percentages are not available for 1990.

For Census Day, what could have been a relatively large noninterview adjustment for outmover households in a block cluster was spread over all interviewed Census Day households in the cluster for each of the three housing types. Consequently, adjustments to the weights for interviewed households were quite low, which had the benefit of minimizing the increase in the variance of A.C.E. estimates due to differences among weights: 52 percent of the weights were not adjusted at all because all occupied households in the adjustment cell were interviewed; for another 45 percent of households, the weighting adjustment was between 1.0 and 1.2 (Cantwell et al., 2001:Table 2; see also “Variance Estimates,” below).
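In mechanical terms, the adjustment factor for a cell is the ratio of the weighted count of occupied households to the weighted count of interviewed households in that cell. A minimal sketch of this computation (field names and cell structure are illustrative, not the Bureau's production specification):

```python
from collections import defaultdict

def noninterview_adjustments(households):
    """Household noninterview adjustment factors by weighting cell.

    Each household is a dict with keys:
      'cluster'     - block cluster identifier
      'htype'       - housing type: 'single', 'apartment', or 'other'
      'weight'      - current sampling weight
      'interviewed' - True if an interview was obtained

    Within each (cluster, htype) cell, interviewed households are
    weighted up to represent all occupied households in the cell.
    """
    occupied = defaultdict(float)
    interviewed = defaultdict(float)
    for h in households:
        cell = (h['cluster'], h['htype'])
        occupied[cell] += h['weight']
        if h['interviewed']:
            interviewed[cell] += h['weight']
    # A factor of 1.0 means every occupied household in the cell responded.
    return {cell: occupied[cell] / interviewed[cell]
            for cell in occupied if interviewed[cell] > 0}

# Example: three single-family households in one cluster, one not interviewed.
hh = [
    {'cluster': 1, 'htype': 'single', 'weight': 300.0, 'interviewed': True},
    {'cluster': 1, 'htype': 'single', 'weight': 300.0, 'interviewed': True},
    {'cluster': 1, 'htype': 'single', 'weight': 300.0, 'interviewed': False},
]
print(noninterview_adjustments(hh))  # {(1, 'single'): 1.5}
```

A factor of exactly 1.0 corresponds to the 52 percent of A.C.E. weights that needed no adjustment because every occupied household in the cell was interviewed.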

MISSING AND UNRESOLVED DATA

Another important aspect of A.C.E. data quality is the extent of missing and unresolved data in the P-sample and the E-sample and the effectiveness of imputation procedures to supply values for missing and unresolved variables. Understanding the role of imputation necessitates understanding the designation of the E-sample and the treatment of certain cases in the matching.

As noted above, the E-sample excluded whole person imputations in the census, defined as people with only one short-form characteristic (which could be name). Matching was performed on the P-sample and E-sample using only reported information. During the course of matching, it was determined that some cases lacked enough reported data for matching and follow-up when a more stringent criterion was applied than that used to exclude whole person imputations from the E-sample: cases in the P-sample and E-sample that lacked a name and at least two other short-form characteristics could not be matched. Such cases were retained in both the E- and the P-samples; in the E-sample they were coded as erroneous enumerations, and in the P-sample they were not yet assigned a final match status.

After all matching and follow-up had been completed, the next step was item imputation. Missing characteristics were imputed separately for each item in the P-sample records (including those records that lacked enough reported data for matching). Imputations for missing characteristics in the E-sample records (including those records that lacked name and at least two other short-form characteristics) were obtained from those on the census data file (see Appendix A). Then, match probabilities and Census Day residence probabilities were imputed for unresolved P-sample cases, including those that were set aside in the matching, and correct enumeration probabilities were imputed for unresolved E-sample cases. E-sample cases set aside in the matching were assigned a correct enumeration probability of zero.


TABLE 7-1 Missing Data Rates for Characteristics, 2000 A.C.E. and 1990 PES P-Sample and E-Sample (weighted)

                      Percentage of People with Imputed Characteristics
                      2000 A.C.E.               1990 PES
Characteristic        P-Sample    E-Sample      P-Sample    E-Sample
Age                      2.4         2.9           0.7         2.4
Sex                      1.7         0.2           0.5         1.0
Race                     1.4         3.2           2.5        11.8
Hispanic Origin          2.3         3.4           N.A.        N.A.
Housing Tenure           1.9         3.6           2.3         2.5
Any of Above             5.4        10.4           N.A.        N.A.

NOTES: A.C.E. E-sample imputations were obtained from the imputations performed on the census records; PES E-sample imputations were performed specifically for the E-sample. A.C.E. E-sample “edits” (e.g., assigning age on the basis of the person’s date of birth, or assigning sex from first name) are not counted as imputations here. The base for the A.C.E. P-sample imputation rates includes nonmovers, inmovers, and outmovers, including people who were subsequently removed from the sample as nonresidents on Census Day. Excluded from the base for the A.C.E. P-sample and E-sample imputation rates are people eligible for the targeted extended search who were not selected for the targeted extended search sample and who were treated as noninterviews in the final weighting. N.A., not available.

SOURCE: Cantwell et al. (2001:Tables 3b, 3c).

Missing Characteristics
Extent

Overall, the extent of missing characteristics data in the P-sample and E-sample was low, ranging between 0.2 percent and 3.6 percent for the characteristics age, sex, race, Hispanic origin, and housing tenure. Missing data rates for most characteristics were somewhat higher for the E-sample than for the P-sample. Missing data rates for the 2000 A.C.E. showed no systematic difference (up or down) from the 1990 PES; see Table 7-1.

As would be expected, missing data rates in the P-sample were higher for proxy interviews, in which someone outside the household supplied information, than for interviews with household members; see Table 7-2. By mover status, missing data rates were much higher for outmovers than for nonmovers and inmovers, which is not surprising given that 73.3 percent of interviews for outmovers were obtained from proxies, compared with only 2.9 percent for nonmovers and 4.8 percent for inmovers. Even “non-proxy” interviews for outmovers may have been from household members who did not know the outmover.

For the E-sample, one can distinguish mailed-back returns from returns obtained by enumerators in nonresponse follow-up, although there is no information on proxy interviews for the latter. Table 7-3 shows that missing data rates were higher for some, but not all, characteristics when the return was obtained in nonresponse follow-up than when it was mailed back by the household.


TABLE 7-2 Percentage of 2000 A.C.E. P-Sample People with Imputed Characteristics, by Proxy Interview and Mover Status (weighted)

                      Percentage of People with Imputed Characteristics
                      Household    Proxy
Characteristic        Interview    Interview    Nonmover    Inmover    Outmover
Age                      2.1          7.9           2.3        2.3        6.0
Sex                      1.5          4.2           1.7        0.4        3.4
Race                     1.0          8.7           1.2        1.3        8.0
Hispanic Origin          1.8         11.0           2.1        0.8        9.0
Housing Tenure           1.7          5.2           1.9        0.4        2.4
Any of Above             4.4         21.9           5.0        3.7       17.4
Percent of Total
  P-Sample              94.3          5.7          91.7        4.8        3.4

NOTES: See notes to Table 7-1.

SOURCE: Cantwell et al. (2001:Table 3b).

Effects of Item Imputation

Because the overall rates of missing data were low, the imputation procedures had little effect on the distribution of individual characteristics (Cantwell et al., 2001:24–26). However, imputation could misclassify people by post-strata and contribute to inconsistent post-stratum classification for matching P-sample and E-sample cases (see “Post-Stratification,” below). The reason is that the P-sample and E-sample imputations were performed using somewhat different procedures; also, imputation procedures for the P-sample were carried out separately for each characteristic.3

3. For example, tenure on the P-sample was imputed by using tenure from the previous household of the same type (e.g., single-family home) with tenure reported, while race and ethnicity were imputed when possible from the distribution of race and ethnicity of other household members or from the distribution of race and ethnicity of the previous household with these characteristics reported (see Cantwell et al., 2001).

Unresolved Residence, Match, and Enumeration Status
Residence Status

The weighted percentage of all P-sample nonmover and outmover cases with unresolved Census Day residence status was 2.2 percent, of which 51.7 percent were cases lacking enough reported information for matching. The remaining 48.3 percent of unresolved residence cases were confirmed matches, confirmed nonmatches, and possible matches. After imputation, the percentage of cases estimated to be Census Day residents dropped slightly, from 98.2 percent of resolved cases to 97.9 percent of all cases, because the imputation procedure assigned lower residence probabilities to unresolved cases (77.4 percent overall; this figure is a correction from the original number in Cantwell et al., 2001:Table 8).4

TABLE 7-3 Percentage of 2000 A.C.E. E-Sample People with Imputed or Edited Characteristics, by Type of Return (weighted)

                                 Percentage of People with Imputed or
                                 Edited Characteristics
Characteristic                   Mail Return    Enumerator Return
Age
  Imputed                            1.1              7.0
  Edited                             1.2              1.9
Sex
  Imputed                            0.1              0.4
  Edited                             0.9              1.1
Race
  Imputed                            3.2              3.2
  Edited                             0.0              0.0
Hispanic Origin
  Imputed                            3.5              3.0
  Edited                             0.3              0.4
Housing Tenure
  Imputed                            2.2              6.8
  Edited                             0.5              0.8
Any of Above
  Imputed                            8.5             14.7
  Imputed or edited or both         10.9             18.1
Percent of Total E-Sample           69.3             28.0

NOTES: Mail returns are those obtained before the April 18, 2000, cutoff to begin nonresponse follow-up (NRFU). Enumerator returns are those obtained during NRFU. The table excludes 2.7 percent of the total E-sample (e.g., list/enumerate, rural update/enumerate, urban update/enumerate, late mail returns).

SOURCE: Tabulations by panel staff of U.S. Census Bureau, E-Sample Person Dual-System Estimation Output File, February 16, 2001; tabulations weighted using TESFINWT (see notes to Table 7-7).

To impute a residence probability, the Census Bureau classified resolved and unresolved cases by match status follow-up group, race, and tenure. The eight match status groups discriminated well: for example, residence probabilities were very low for potentially fictitious people or people said to be living elsewhere on Census Day (14%);5 moderate for college- and military-age children in partially matched households (84%); and very high for cases resolved before follow-up (99%). The addition of race and tenure to the imputation cells did not capture much additional variability in the probability of Census Day residence (Cantwell et al., 2001:Table 8). The residence probabilities assigned to people without enough reported data for matching—84 percent overall—were based on the average of the probabilities for people in the other match status groups within each race and tenure category.

4. One would not expect there to be confirmed non-Census Day residents or unresolved cases among nonmovers and outmovers; however, this could happen because mover status was assigned prior to field follow-up work.

5. Fictitious people are those for whom it seems clear that the data were fabricated by the respondent or enumerator (e.g., a return for Mickey Mouse).
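The underlying computation is cell-based imputation: unresolved cases receive the weighted share of positive outcomes among resolved cases in the same cell. A minimal sketch under that reading (field names are illustrative, not the Bureau's specification):

```python
from collections import defaultdict

def impute_probabilities(cases):
    """Cell-based probability imputation for unresolved cases.

    Each case is a dict with keys:
      'cell'    - imputation cell, e.g., (match status group, race, tenure)
      'weight'  - sampling weight
      'outcome' - 1.0 or 0.0 for resolved cases, None if unresolved

    Unresolved cases are assigned the weighted proportion of positive
    outcomes among resolved cases in the same cell.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for c in cases:
        if c['outcome'] is not None:
            num[c['cell']] += c['weight'] * c['outcome']
            den[c['cell']] += c['weight']
    return [c['outcome'] if c['outcome'] is not None
            else num[c['cell']] / den[c['cell']]
            for c in cases]
```

The same pattern applies to the match status and correct enumeration probabilities discussed below; the special case of people without enough reported data for matching, whose probability was an average over the other match status groups within race and tenure, would need an extra averaging step.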

Match Status

The weighted percentage of P-sample cases with unresolved match status was only 1.2 percent.6 This percentage compares favorably with the 1.8 percent of cases with unresolved match status in the 1990 PES. Very little was known about the A.C.E. P-sample people with unresolved match status: 98 percent of them lacked enough reported data for matching (i.e., they lacked a valid name, lacked at least two other characteristics, or both).

6. The denominator for the percentage is P-sample nonmovers and outmovers who were confirmed Census Day residents or had unresolved residence status; confirmed non-Census Day residents were dropped from the P-sample at this point.

After imputation, the percentage of matches dropped slightly, from 91.7 percent of resolved cases (matches and nonmatches) to 91.6 percent of all cases, because the imputation procedure assigned lower match status probabilities to unresolved cases (84.3% overall). To impute a match status probability, the Census Bureau classified resolved and unresolved cases by mover status (nonmover, outmover), whether the person’s housing unit did or did not match, and whether the person had one or more characteristics imputed or edited. These categories discriminated well: the probability of a match was 92 percent for nonmovers overall, compared with only 76 percent for outmovers. The lowest match probability was 52 percent, for outmovers when the housing unit did not match; the highest was 95 percent, for nonmovers when the housing unit matched and the person had no imputed characteristics (Cantwell et al., 2001:Table 9).

Enumeration Status

The weighted percentage of E-sample cases with unresolved enumeration status was 2.6 percent, slightly higher than the comparable 2.3 percent for the 1990 PES. Most of the unresolved cases (89.4%) were nonmatches for which field follow-up could not resolve their status as a correct or erroneous enumeration; the remainder were matched cases for which field follow-up could not resolve their residence status, possible matches, and cases for which the location of the housing unit was not clear.

After imputation, the percentage of correct enumerations dropped slightly, from 95.5 percent of resolved cases (correct and erroneous enumerations) to 95.3 percent of all cases, because the imputation procedure assigned lower correct enumeration probabilities to unresolved cases (76.2% overall). To impute a correct enumeration status probability, the Census Bureau classified resolved and unresolved cases by match status group, whether the person had one or more imputed characteristics, and race (for some match status groups). The 12 match status groups discriminated well: for example, correct enumeration probabilities were very low for potentially fictitious people (6%) and people said to be living elsewhere on Census Day (23%); moderate for college- and military-age children in partially matched households (88%); and very high for cases resolved before follow-up (99%). The addition of race and whether the person had imputed characteristics did not capture much additional variability in the probability of correct enumeration (Cantwell et al., 2001:Table 10).

QUALITY OF MATCHING

Although the rates of unresolved match status and enumeration status were low, there remains a question about the accuracy of the classification of match and enumeration status for cases that were “resolved” before imputation. The accuracy of the matching and associated follow-up process is critical to dual-systems estimation (DSE).

That accuracy is critical to distinguish the proportion of P-sample people who match a census record from the proportion who genuinely exist but were not enumerated in the census. If some of the nonmatched people should have been matched or should have been removed from the P-sample because they were fictitious or not a resident at the P-sample address on Census Day or for some other reason, then the estimated match rate will be too low and the estimate of the DSE will be too high.

That accuracy is also critical to distinguish the proportion of E-sample people who were correctly counted (including matches and correct nonmatches) from the proportion who were enumerated erroneously because they were duplicate, fictitious, or for some other reason. If some cases who were classified as correct (nonmatched) enumerations were in fact erroneous, then the estimated correct enumeration rate will be too high and the estimate of the DSE will be too high.
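The direction of both biases follows from the form of the dual-systems estimator; in a standard rendering for a post-stratum (see Chapter 6 for the Bureau's exact specification):

\[
\widehat{\mathrm{DSE}}
  = (C - II)\,\times\,
    \underbrace{\frac{\widehat{CE}}{\widehat{E}}}_{\text{correct enumeration rate}}
    \,\times\,
    \underbrace{\frac{\widehat{P}}{\widehat{M}}}_{1/\text{match rate}},
\]

where \(C\) is the census count; \(II\) the census people excluded from the E-sample (whole person imputations and late additions); \(\widehat{E}\) and \(\widehat{CE}\) the weighted E-sample total and estimated correct enumerations; and \(\widehat{P}\) and \(\widehat{M}\) the weighted P-sample total and matches. Understating the match rate \(\widehat{M}/\widehat{P}\), or overstating the correct enumeration rate \(\widehat{CE}/\widehat{E}\), therefore inflates the estimate.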

It is not possible to assess the reliability of assignment of the final match codes until the Census Bureau publishes results from evaluation studies that involve rematching and verifying samples of A.C.E. records (see Executive Steering Committee on A.C.E. Policy, 2001b). The Bureau is also looking at possible errors in assigning correct or erroneous enumeration status to E-sample cases due to the operation of the targeted extended search and the treatment of group quarters residents who should have been excluded from the sample.


Rematching studies for 1990 found some degree of clerical matching error, although analysts disagreed on its importance (National Research Council, 1999b:70–75). The results for 2000 are not yet known. The Bureau believed that the accuracy of matching would improve through greater computerization of the process and other steps in 2000, compared with 1990. The results of quality assurance operations during the matching and follow-up interviewing indicated that relatively little error was identified in assigning match and enumeration status codes (see Childers et al., 2001). Nonetheless, the degree of matching error remains to be established. As indirect indicators of the quality of the matching, we examined specific match codes and how they related to the various steps in the process.

Extent of Checking Required to Confirm Final Match Code

We looked first at final match codes and asked what proportion of the cases in each category were confirmed at the conclusion of computer matching, at the conclusion of clerical matching, or not until after field follow-up.

Confirmed Matches

Table 7-4 shows that 80.3 percent of final confirmed P-sample matches were designated as a match by the computer and did not require follow-up in the field (last row, column 1). Another 18 percent of final confirmed matches were declared a match by clerks, technicians, or analysts and did not require a field check (last row, columns 2, 3, 4). Only 1 percent of final confirmed matches were declared a match only after confirmation of their Census Day residence status in the field (column 5), and only 0.8 percent after confirmation of both their match and residence status in the field (column 6). Similar results were obtained for the E-sample (not shown).

By domain and tenure group, the percentage of final confirmed matches that were declared a match by computer varied from 65 percent to 84 percent, perhaps due to difficulties with names. However, there was relatively little variation in the percentage of final confirmed matches that did not require confirmation of residence or match status in the field (97.0% to 99.2%). Given the standards for computer and clerical matching, these results suggest that one can have a high degree of confidence about the designation of a matched case.7

7. The cutoff probability score for a computer match was set high enough, based on previous research, so that false computer matches would almost never occur.

TABLE 7-4 Percentage of 2000 A.C.E. P-Sample Matches to Census Enumerations, by Source of Final Match Code Assignment, Race/Ethnicity Domain, and Housing Tenure (weighted)

                                No Field Check Needed                               Field Check Needed for
                                Computer   Computer P,   Computer NM,   Other      Residence   Match and    Percent Total
Domain and Tenure Group         M          Clerk M       Clerk M        Final M    Status      Residence    Matches
                                (1)        (2)           (3)            (4)        (5)         (6)          (7)
American Indian/Alaska Native on Reservation
  Owner                         80.7       10.4           6.4           1.7        0.4         0.4            0.1
  Renter                        82.1       12.5           2.9           1.8        0.4         0.4            0.1
American Indian/Alaska Native off Reservation
  Owner                         78.8       10.4           8.4           1.1        0.8         0.5            0.3
  Renter                        77.8        9.9           8.3           1.8        0.7         1.6            0.2
Hispanic Origin
  Owner                         76.4       12.5           8.1           1.1        1.1         0.9            5.8
  Renter                        68.1       14.6          12.5           1.8        1.2         1.7            5.9
Black (Non-Hispanic)
  Owner                         77.7       12.1           6.7           1.4        1.1         1.0            5.7
  Renter                        71.2       12.9          10.9           2.0        1.4         1.5            5.1
Native Hawaiian/Pacific Islander
  Owner                         76.9       12.1           6.4           2.4        0.8         1.4            0.1
  Renter                        64.6       15.2          14.9           3.6        0.6         1.0            0.1
Asian
  Owner                         72.9       14.3           8.9           1.4        1.0         1.4            2.1
  Renter                        66.7       16.1          12.4           1.7        1.2         1.8            1.2
White and Other Race (Non-Hispanic)
  Owner                         84.2        8.2           5.7           0.6        0.9         0.4           57.5
  Renter                        77.9       10.3           8.6           1.0        1.2         1.0           15.8
Total                           80.3        9.8           7.2           0.9        1.0         0.8          100.0

NOTES: Columns (1)–(6) in each row add to 100%; column (7), reading down, adds to 100%. M: match; P: possible match; NM: nonmatch (confirmed Census Day resident).

SOURCE: Tabulations by panel staff of P-sample cases that went through matching, from U.S. Census Bureau, P-Sample Person Dual-System Estimation Output File, February 16, 2001. Tabulations weighted using TESFINWT; exclude TES-eligible people not in TES sample block clusters, who have zero TESFINWT.

Confirmed P-Sample Nonmatches

Assignment of confirmed nonmatch status was always based on a field check for certain types of P-sample cases (see Appendix C), amounting to 50.4 percent of the total confirmed P-sample nonmatches. There was relatively little variation in this percentage for most race/ethnicity domain and tenure groups (data not shown), although 69 percent of final confirmed nonmatches for American Indians and Alaska Natives were not declared a nonmatch until after being checked in the field, compared with only 47 percent for non-Hispanic whites and other races. How many nonmatches were correctly assigned and how many should have been identified as either matches or cases to be dropped from the P-sample (e.g., fictitious cases or people residing elsewhere on Census Day) will not be known until the Census Bureau completes its studies of matching error.

Confirmed E-Sample Correct (Nonmatched) or Erroneous Enumerations

On the E-sample side, assignment of a final code as a correct (nonmatched) enumeration was always based on a field check. Of final erroneous enumerations (4% of the total E-sample), 35 percent were declared erroneous on the basis of a field check, while 65 percent were identified by clerks as duplicates or as lacking enough reported data and did not require confirmation in the field.

Unresolved Cases

As noted above, the E-sample had a higher percentage of cases that could not be resolved after field checking than did the P-sample: 2.6 percent and 2.2 percent, respectively. Moreover, 52.2 percent of the unresolved P-sample cases were those coded by the computer or clerks as not having enough reported data for matching. These cases were not field checked but had their residence or match status imputed.

Extent of Reassignment of Match Codes

Another cut at the issue of matching quality is how often one stage of matching changed the code assigned at an earlier stage. Table 7-5 shows that such changes happened quite infrequently. Thus (see Panel A), 99.9 percent and 99.7 percent of confirmed matches assigned by the computer for the P-sample and the E-sample, respectively, remained as such in the final coding. Also, 93 percent of computer possible matches in both the P-sample and the E-sample were confirmed as such without the need for field follow-up; another 5.5–5.7 percent were confirmed as a match (or, in the case of the E-sample, as a nonmatched correct enumeration) in the field. Only 1.3–1.5 percent of computer possible match codes were changed to a nonmatch (P-sample) or an erroneous enumeration (E-sample) or could not be resolved in the field.

TABLE 7-5 Outcome of Computer Matching and Cases Followed Up in the Field, 2000 A.C.E. (weighted)

                                                           P-Sample                   E-Sample
                                                           Percent of  Percent of     Percent of  Percent of
                                                           Total       Match Group    Total       Match Group

A. Outcome of Computer Matching
   (Cases Included in Matching Process)                    100.0                      100.0
Computer Match
  Total                                                     72.6       100.0           69.4       100.0
  Final Match                                                           99.9                        99.7
  Other Code                                                             0.1                         0.3
Computer Possible Match
  Total                                                      9.7       100.0            9.2       100.0
  Final Match, No Field Check                                           93.0                        93.0
  Field-Based Final Match (P) or Correct Enumeration (E)                 5.5                         5.7
  Final Nonmatch (P) or Erroneous Enumeration (E)                        0.5                         0.6
  Unresolved (or Removed for P-Sample)                                   1.0                         0.7
Computer Nonmatch
  Total                                                     17.1       100.0           19.2       100.0
  Final Nonmatch (P) or Erroneous Enumeration (E)                       41.4                        10.8
  Final Correct Enumeration (E)                                         N.A.                        60.2
  Final Match                                                           43.9                        16.1
  Unresolved (or Removed for P-Sample)                                  14.7                        12.9
Computer Not Enough Reported Data for Matching
  Total                                                      0.6                        2.2

B. Outcome of Cases Followed Up in the Field^a
   (7.1% of Total P-Sample; 17.1% of Total E-Sample)       100.0                      100.0
Before Follow-Up Code of Match                              13.5       100.0            5.2       100.1
  Field Match                                                           82.3                        79.6
  Field Match but Unresolved Residence                                  14.9                        13.3
  Field Valid Nonmatch (P) or Correct Enumeration (E)                    0.2                         0.3
  Field Other Code (Removed, Unresolved, Erroneous)                      2.6                         6.9
Before Follow-Up Code of Possible Match                     10.3       100.0            3.8       100.0
  Field Match                                                           82.9                        82.1
  Field Match but Unresolved Residence                                   1.9                         1.9
  Field Valid Nonmatch (P) or Correct Enumeration (E)                    8.7                         8.5
  Field Other Code (Removed, Unresolved, Erroneous)                      6.5                         7.5
Before Follow-Up Code of Nonmatch                           75.6       100.0           73.3       100.0
  Field Valid Nonmatch (P) or Correct Enumeration (E)                   67.8                        71.9
  Field Removed (P) or Erroneous Enumeration (E)                         1.8                        11.4
  Field Unresolved                                                      27.1                        15.7
  Field Match                                                            3.3                         1.0
Before Follow-Up Code of Found in Surrounding Block         N.A.       N.A.            14.6       100.0
  Field Correct Enumeration (E)                                                                     89.9
  Field Other Code                                                                                  10.1
Before Follow-Up Other Code                                  0.6                        3.1

NOTES: P: P-sample; E: E-sample. Correct enumerations in the table are those not matching to the P-sample.

^a Cases followed up in the field included 1 percent of P-sample and E-sample before-follow-up matches; 100 percent of P-sample and E-sample possible matches; 61 percent and 100 percent of P-sample and E-sample nonmatches; and 100 percent of E-sample cases found in another block.

SOURCE: Tabulations by panel staff of U.S. Census Bureau, P-Sample and E-Sample Person Dual-System Estimation Output Files, February 16, 2001. Tabulations weighted using TESFINWT. P-sample cases in the matching process included nonmovers and outmovers; TES-eligible persons who were not in TES sample block clusters were excluded.


The analysts who reviewed clerical matches rarely overturned the clerks’ decisions, and field follow-up most often confirmed the before-follow-up code or left the case unresolved. Thus (see Panel B), among cases identified as matches by the computer, clerks, or analysts, only 1 percent were followed up, and 80–82 percent of those were confirmed in the field. Most of the rest remained unresolved with regard to residence status. Less than 0.5 percent were turned into a nonmatch (P-sample) or into a correct (nonmatched) enumeration (E-sample). Among E-sample cases with a before-follow-up code of nonmatch (as distinct from an erroneous enumeration or unresolved case), 100 percent were followed up, and only 1 percent turned into a match after follow-up. Among P-sample cases with a before-follow-up code of nonmatch, 61 percent were checked in the field, and only 3.3 percent of them turned into a match.

TARGETED EXTENDED SEARCH

The targeted extended search (TES) operation in the A.C.E. was designed to reduce the variance and bias associated with geocoding errors (i.e., assignment of addresses to the wrong block) in the census or in the P-sample address listing. In a sample of block clusters for which there was reason to expect geocoding errors (2,177 of 6,414 such clusters), the clerical search for matches of P-sample and census enumerations and for correct E-sample enumerations was extended to one ring of blocks surrounding the A.C.E. block cluster. Sampling was designed to make the search much more efficient than in 1990 (see Appendix C).

For the P-sample, only people in households that did not match a census address (4.7% of total P-sample cases that went through matching) were searched in the ring of blocks surrounding a sampled block cluster. On the E-sample side, only people in households identified as geocoding errors (3% of total E-sample cases) were searched in the ring surrounding a sampled block cluster. Weights were assigned to the TES persons in the sampled block clusters to adjust for the sampling.8 Correspondingly, persons who would have been eligible for TES but were not in a sampled block cluster were assigned a zero weight.

8. The weight was either 1 for the 60 percent of sampled TES persons that were selected with certainty or 4.9 for the remaining sampled TES persons.

The result of the extended search was to increase the overall P-sample match rate from 87.7 percent without TES to 91.6 percent with TES, an increase of 3.8 percentage points. At the same time, the overall E-sample correct enumeration rate increased from 92.3 percent to 95.3 percent, an increase of 2.9 percentage points (Navarro and Olson, 2001:Table 1). Because the increase was larger for matches than for correct enumerations, the correction ratio (the correct enumeration rate divided by the match rate) decreased by 1.3 percentage points, from 1.053 to 1.040; such a change has the effect of reducing the estimate of the DSE and the net undercount.
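As an arithmetic check, the rounded rates reproduce the reported correction ratios to within rounding of the inputs:

\[
\frac{0.923}{0.877} \approx 1.052 \ \ (\text{without TES}),
\qquad
\frac{0.953}{0.916} \approx 1.040 \ \ (\text{with TES}).
\]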

TES affected the correction ratios for age and sex groups used in the post-stratification about equally (Navarro and Olson, 2001:Table V). There was somewhat more variation in the effects on the correction ratios for race and ethnicity domains. In particular, the correction ratio for American Indians and Alaska Natives on reservations was reduced by 8.4 percentage points, compared with the average reduction of 1.2 percentage points (Navarro and Olson, 2001:Table IV).

The TES had the desired effect of reducing the variance of the DSE estimates for post-strata. The reduction in the average and median coefficient of variation (the standard error of an estimate as a percent of the estimate) for post-strata was 22 percent, similar to an average reduction of 20 percent for the nationwide extended search operation in 1990 (Navarro and Olson, 2001:7).

The underlying question, however, is whether the TES operation was unbalanced, thereby introducing bias into the DSE. The larger increase in the P-sample match rate than in the E-sample correct enumeration rate suggests an imbalance. Such an imbalance may also have occurred in 1990, when the extended search increased the P-sample match rate by 4.1 percentage points and the E-sample correct enumeration rate by 2.3 percentage points. A follow-up study to the 1990 census was not able to determine whether balancing error had occurred (Bateman, 1991).

What could cause balancing error in the TES? Such error would result if the search area was not defined consistently for the P-sample and the E-sample, so that the clerks might count as correct an enumeration outside the search area or fail to match to an enumeration inside the search area. One possible source of the observed imbalance of additional matches compared with additional correct enumerations in the TES is that the P-sample address listing could have contained errors. For example, the P-sample address list could have assigned an address to the A.C.E. block cluster when in fact it was located in the surrounding ring. When the clerk did not find a match in the A.C.E. block cluster because there was no corresponding census address, a search for a match in the surrounding ring would likely be successful. The Census Bureau has fielded a study to determine whether P-sample address geocoding errors largely explain the larger increase in the match rate compared with the correct enumeration rate. If they do, then there is no effect on the DSE.

Alternatively, it is possible that there was an underestimation of census housing units that were eligible for the E-sample TES. If a nonmatched E-sample address from the initial housing unit match was not found in the field, it was classified as an erroneous enumeration, when it might in fact have been located in a nearby block and therefore should have been classified as a geocoding error. Only misgeocoded E-sample cases were eligible for TES, so housing units that were miscoded as erroneous were excluded from TES. The Census Bureau has fielded a study to determine the accuracy of the identification of E-sample units that were eligible for TES. The Bureau is also studying possible discrepancies between the classification of erroneous E-sample housing units in the housing unit match and the classification of some of the people in those units during field follow-up as correct enumerations.

POST-STRATIFICATION

Post-stratification is an important aspect of dual-systems estimation. Because research suggests that the probabilities of being included in the census or in the P-sample vary by individual characteristics, it is important to classify P-sample and E-sample cases into groups or strata for which coverage probabilities are as similar as possible within the group and as different as possible from other groups. Estimation of the DSE then is performed stratum by stratum.

Counterbalancing the need for finely defined post-strata are two considerations: each post-stratum must have sufficient sample size for reliable estimates; and the characteristics used to define the post-strata should be consistently measured between the P-sample and the E-sample. As an example, a respondent to the census who is in the E-sample may have reported a household member as age 30 when a possibly different respondent for the same household in the P-sample reported that household member as age 29. The matched person, then, would contribute to the P-sample match rate for the 18-to-29-year-old post-strata and to the E-sample correct enumeration rate for the 30-to-49-year-old post-strata. Such misclassification could be consequential if the proportions misclassified were large and if the coverage probabilities varied greatly for the affected post-strata. At the same time, the Census Bureau wanted to define post-strata in a way that could be easily explained.

Taking all these considerations into account, the Bureau decided to identify a moderate number of post-strata for which direct estimates could be developed without the use of modeling (see Table 6-2 in Chapter 6). In this regard, the Bureau adhered fairly closely to the number and type of post-strata that were used for the revised 1990 estimates, for which 357 post-strata were identified.9 Given the larger size of the A.C.E. relative to the 1990 PES, the Bureau was able to identify a somewhat larger number of post-strata in 2000 (448, collapsed to 416) than the final number in 1990.

9. The revised set of 1990 post-strata was developed by analyzing census results that had become available (e.g., mail return rates, imputation rates, crowding) to determine which characteristics that could be used for post-stratification best explained variations in those results.

Several participants at a workshop in fall 1999 (National Research Council, 2001a) urged the Bureau to use modeling techniques to develop post-strata on the basis of the A.C.E. data; such a model would assess the best predictors of coverage. The Bureau decided that this approach was not feasible. It would be desirable now for the Bureau to estimate such models to determine whether the A.C.E. post-stratification was optimal or reasonably so. If a different stratification scheme seemed more effective, it could be used to develop revised dual-systems estimates for use in any adjustment of the census.

On the face of it, the A.C.E. post-stratification seems reasonable. There are certainly wide variations in estimated coverage correction factors—from 0.958 to 1.07 among the 64 post-strata groups, excluding age and sex breakdowns, and from 0.929 to 1.186 for all 416 post-strata. As noted above (see Chapter 6), both the A.C.E. and the PES estimated higher net undercount rates for minorities than whites, for renters than owners, and for children than older people; however, estimates of net undercount rates for minorities, renters, and children were significantly lower in the 2000 A.C.E. than in the 1990 PES.

There was some inconsistency of classification by post-strata between the P-sample and E-sample in the A.C.E., although whether the level of inconsistency was higher or lower than in 1990 cannot be determined because of the unavailability of data for 1990 matched cases. Overall, 4.7 percent of A.C.E. matched cases (unweighted) were misclassified as owner or renter; 5.1 percent were misclassified among age and sex groups, and 3.9 percent were misclassified among race/ethnic domains (Farber, 2001a:Table 1).

Rates of inconsistency were much higher for matched cases for which the characteristic in question was imputed than for nonimputed cases. For example, 36 percent of cases for which age or sex were imputed were classified inconsistently among age/sex post-strata, and such cases were almost half of all inconsistent cases. However, as just noted, only 5 percent of all cases were misclassified among age/sex post-strata. The percentage of inconsistent cases for specific age/sex groups ranged from 1.3 percent for children aged 0–17 to 8.8 percent for males aged 18–29.

By race/ethnicity domain, inconsistent cases as a percentage of E-sample matches ranged from 1.5 percent for American Indians and Alaska Natives on reservations to 18.3 percent for Native Hawaiians and Pacific Islanders to 35.7 percent for American Indians and Alaska Natives off reservations. By age and sex, the percentage of inconsistent cases among American Indians and Alaska Natives off reservations ranged from 54 percent for nonowner females aged 18–29 to 68 percent for nonowner males aged 50 or older. These rates of inconsistency are very high. The major factor is that a large number of non-Hispanic whites and other races in one sample (relative to the Native American population) identified themselves as American Indians or Alaska Natives off reservations in the other sample; see Table 7-6. The effect was to lower the coverage correction factor for the latter group below what it would have been had there been no inconsistency. However, the coverage correction factor would have been lower yet for American Indians and Alaska Natives off reservations if they had been merged with the non-Hispanic white and other races stratum. The reverse flow of American Indians and Alaska Natives identifying themselves as non-Hispanic whites or other races had virtually no effect on the coverage correction factor for the latter group, given its much larger proportion of the population.

TABLE 7-6 2000 A.C.E. Matched P-Sample and E-Sample Cases: Consistency of Race/Ethnicity Post-Stratification Domain (unweighted)

                                          E-Sample Domain
P-Sample Race/Ethnicity Domain            1        2        3        4        5        6         7         Total    % Inconsistent
American Indian or Alaska Native
  on Reservations (Domain 1)         11,009        0       34       12        0        0       118        11,173        1.5
American Indian or Alaska Native
  off Reservations (Domain 2)             0    2,223       59      104        0       30       793         3,209       30.7
Hispanic Origin (Domain 3)               44      136   67,985      610       42      267     4,004        73,088        7.0
Non-Hispanic Black (Domain 4)            10      119      496   65,679        6      118     1,423        67,851        3.2
Native Hawaiian or Pacific
  Islander (Domain 5)                     0        3       31       19    1,671      204       177         2,105       20.6
Asian (Domain 6)                          1       31      107      102      143   19,679     1,062        21,125        6.8
Non-Hispanic White or Other
  Race (Domain 7)                       107      944    5,041    2,589      183    2,105   360,125       371,094        3.0
E-Sample Total                       11,171    3,456   73,753   69,115    2,045   22,403   367,702       549,645
% Inconsistent                          1.5     35.7      7.8      5.0     18.3     12.2       2.1                      3.9

NOTE: See Table 6-2 for definitions of domains.

SOURCE: Farber (2001a:Table A-3).
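The percentage-inconsistent figures in Table 7-6 follow directly from the matched counts. For example, for American Indians and Alaska Natives off reservations (Domain 2):

\[
1 - \frac{2{,}223}{3{,}456} \approx 35.7\% \ \ (\text{E-sample column}),
\qquad
1 - \frac{2{,}223}{3{,}209} \approx 30.7\% \ \ (\text{P-sample row}).
\]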


VARIANCE ESTIMATES

Overall, the A.C.E. was expected to have smaller variances due to sampling error and other sources than the 1990 PES, and that expectation was borne out. The coefficient of variation for the estimated coverage correction factor for the total population was reduced from 0.2 percent in 1990 to 0.14 percent in 2000 (a reduction of 30%). The coefficients of variation for the coverage correction factors for Hispanics and non-Hispanic blacks were reduced from 0.82 percent and 0.55 percent, respectively, to 0.38 percent and 0.40 percent, respectively (Davis, 2001:Tables E-1, F-1). However, the coefficients of variation for coverage correction factors were as high as 6 percent for particular post-strata, which translates into a very large confidence interval around the estimate of the net undercount.10
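For a sense of scale (hypothetical numbers, chosen only for illustration): a post-stratum with a coverage correction factor of 1.05 and a 6 percent coefficient of variation has

\[
\mathrm{SE}(\widehat{F}) = 0.06 \times 1.05 = 0.063,
\qquad
\widehat{F} \pm 1.96\,\mathrm{SE} \approx 1.05 \pm 0.12,
\]

an interval wide enough to span both a net overcount and a double-digit net undercount for that post-stratum.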

The overall coefficient of variation was expected to be reduced by about 25 percent due to the larger sample size of the A.C.E., almost double that of the 1990 PES. In addition, better measures of population size were available during the selection of the A.C.E. block clusters than during the selection of PES clusters, and the A.C.E. sampling weights were less variable than the PES sampling weights. The 2000 TES was much better targeted and thereby more efficient than the similar operation in 1990. Overall, TES was expected to reduce the variance of the DSE, although the 2000 TES also contributed somewhat to an increase in sampling error.

Looking at size and variation in weights, Table 7-7 shows the changes in the P-sample weights, from the initial weighting that accounted for differential sampling probabilities, to the intermediate weights that included household noninterview adjustments, to the final weights that accounted for TES sampling. (The table also shows the distribution of E-sample initial and final weights.) At the outset, 90 percent of the initial P-sample weights were between 48 and 654, and the highest and lowest weights were 9 and 1,288; the distribution did not differ by mover status. After the household noninterview adjustment for Census Day, 90 percent of the weights were between 49 and 674, and the highest and lowest weights were 9 and 1,619. After the TES adjustment, 90 percent of the final weights for confirmed Census Day residents were between 50 and 678, and the highest and lowest weights were 9 and 5,858 (the variation in weights was less for outmovers than for nonmovers). For inmovers, there was relatively little difference between the initial sampling weights and the final weights adjusted for household noninterview on the P-sample interview day.

10. The variance estimates developed by the Census Bureau likely underestimate the true variance, but the extent of underestimation is not known. The variance estimation excludes some minor sources of error (specifically, the large block subsampling and the P-sample noninterview adjustment). It also excludes most sources of nonsampling error (see Appendix C).

TABLE 7-7 Distribution of Initial, Intermediate, and Final Weights, 2000 A.C.E. P-Sample and E-Sample

                                     Number of               Percentile of Weight Distribution
Sample and Mover Status              Non-Zeros     0    1    5   10   25   50   75   90   95   99    100
P-Sample
Initial Weight^a
  Total                              721,734       9   21   48   75  249  352  574  647  654  661  1,288
  Nonmovers                          631,914       9   21   48   76  253  366  575  647  654  661  1,288
  Outmovers                           24,158       9   21   48   69  226  348  541  647  654  661  1,288
  Inmovers                            36,623       9   21   47   67  212  343  530  647  654  661  1,288
Intermediate Weight^b
  Total with Census Day Weight       712,442       9   22   49   78  253  379  577  654  674  733  1,619
  Total with Interview Day Weight    721,426       9   21   48   76  249  366  576  651  660  705  1,701
Final Weight^c (Census Day Weight)
  Total                              640,795       9   22   50   83  273  382  581  654  678  765  5,858
  Nonmovers                          617,390       9   22   50   83  274  382  581  654  678  762  5,858
  Outmovers                           23,405       9   23   50   77  240  363  577  655  682  798  3,847
  Inmovers                            36,623       9   21   47   67  214  345  530  651  656  705  1,288
E-Sample
  Initial Weight^d                   712,900       9   21   39   55  212  349  564  647  654  661  2,801
  Final Weight^e                     704,602       9   21   39   56  217  349  567  647  654  700  4,009

^a P-sample initial weight, PWGHT, reflects sampling through large block subsampling; total includes removed cases.

^b P-sample intermediate weight, NIWGT, reflects the household noninterview adjustment for Census Day; NIWGTI reflects the household noninterview adjustment for A.C.E. interview day.

^c P-sample final weight, TESFINWT, for confirmed Census Day residents, total, nonmovers, and outmovers (reflects targeted extended search sampling); NIWGTI for inmovers.

^d E-sample initial weight, EWGHT, reflects sampling through large block subsampling.

^e E-sample final weight, TESFINWT, reflects targeted extended search sampling.

SOURCE: Tabulations by panel staff of U.S. Census Bureau, P-Sample and E-Sample Person Dual-System Estimation Output Files, February 16, 2001.


While the variations in final weights for the A.C.E. P-sample (and E-sample) were not small, they were considerably less than the variations in final weights for the 1990 PES. In 1990, some P-sample weights were more than 20,000, and 28 percent of the weights exceeded 700, compared with only 5 percent in the A.C.E.
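One standard summary of how much unequal weights inflate variance is Kish's approximate design effect, 1 + cv²(w). A minimal sketch with synthetic weights (the numbers below are illustrative; they are not the actual A.C.E. or PES weight files):

```python
import numpy as np

def kish_design_effect(weights):
    """Approximate variance inflation from unequal weights (Kish):
    deff = n * sum(w^2) / (sum(w))^2, equivalently 1 + cv(w)^2."""
    w = np.asarray(weights, dtype=float)
    return w.size * np.sum(w**2) / np.sum(w)**2

# Synthetic illustration: adding a small tail of very large weights
# (the TES adjustment produced maxima near 5,858) raises the design
# effect, i.e., the variance of weighted estimates.
rng = np.random.default_rng(0)
base = rng.uniform(50, 680, size=10_000)   # bulk of the distribution
tail = rng.uniform(3_000, 6_000, size=25)  # a few extreme weights
print(kish_design_effect(base))                          # about 1.25
print(kish_design_effect(np.concatenate([base, tail])))  # about 1.5
```

By this measure, a 1990-style distribution with weights above 20,000 and a quarter of weights over 700 would inflate variance considerably more than the tighter A.C.E. distribution.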

FINAL MATCH CODES AND RATES

Having examined individual features of the A.C.E., we next looked at the distribution of final match codes and rates for the P-sample and E-sample. We wanted to get an overall sense of the reasonableness of the results for key population groups and in comparison with 1990.

Final Match and Enumeration Status
P-Sample Match Codes

The distribution of final match codes for the P-sample was 89.5 percent confirmed match, 7.4 percent confirmed nonmatch, 2.2 percent match or residence status unresolved, and 0.9 percent not a Census Day resident or removed for another reason (e.g., a fictitious or duplicate P-sample case). Table 7-8 shows that the percentage of confirmed matches by domain and tenure varied from 80 percent for black and for Native Hawaiian and Pacific Islander renters to 93 percent for non-Hispanic white and other race owners; conversely, the percentage of confirmed nonmatches varied from 15.8 percent for Native Hawaiian and Pacific Islander renters to 4.9 percent for non-Hispanic white and other race owners. Those groups with higher percentages of nonmatched cases also tended to have higher percentages of unresolved cases, which varied from 1 percent for Native Hawaiian and Pacific Islander owners to 4.7 percent for black renters.

After imputation of residence and match status, the overall P-sample match rate (matches divided by matches plus nonmatches) was 91.6 percent. The match rate ranged from 82.4 percent for Native Hawaiian and Pacific Islander renters to 94.6 percent for non-Hispanic white and other race owners.


TABLE 7-8 2000 A.C.E. P-Sample Final Match Codes, and A.C.E. and PES Match Rates, by Race/Ethnicity Domain and Housing Tenure (weighted)

                             Percent Distribution of 2000 P-Sample
                             Final Match Codes                              P-Sample Match Rate^a
Domain and Tenure Group      Match   Nonmatch   Unresolved   Removed       2000 A.C.E.   1990 PES
American Indian/Alaska Native on Reservation
  Owner                      82.9      13.2        1.6         2.4           85.43         78.13^b
  Renter                     85.6      11.5        1.6         1.3           87.08           —
American Indian/Alaska Native off Reservation
  Owner                      88.5       9.2        1.4         0.9           90.19           —
  Renter                     81.2      12.6        4.3         1.9           84.65           —
Hispanic Origin
  Owner                      89.0       8.3        1.7         1.0           90.79         92.81
  Renter                     81.7      13.2        3.9         1.2           84.48         82.45
Black (Non-Hispanic)
  Owner                      87.9       8.8        2.3         1.1           90.14         89.65
  Renter                     80.4      13.7        4.7         1.2           83.67         82.28
Native Hawaiian/Pacific Islander
  Owner                      85.8      12.2        1.0         1.0           87.36           —
  Renter                     80.3      15.8        2.7         1.2           82.39           —
Asian (Non-Hispanic)^c
  Owner                      90.1       6.6        2.3         1.0           92.34         93.71
  Renter                     84.4      10.8        3.7         1.1           87.33         84.36
White and Other Races (Non-Hispanic)
  Owner                      93.0       4.9        1.4         0.8           94.60         95.64
  Renter                     85.5       9.8        3.7         1.0           88.37         88.62
Total                        89.5       7.4        2.2         0.9           91.59         92.22

NOTE: The first four columns in each row add to 100%; —, not estimated.

^a Match rates (matches divided by the sum of matches and nonmatches) are after imputation for unresolved residence and match status for the A.C.E. and after imputation of unresolved match status for the PES.

^b Total; not available by tenure.

^c 1990 PES match rates include Pacific Islanders.

SOURCES: A.C.E. match codes are from tabulations by panel staff of P-sample cases who went through the matching process, weighted using TESFINWT and excluding TES-eligible people not in TES sample block clusters (who have zero TESFINWT), from U.S. Census Bureau, P-Sample Person Dual-System Estimation Output File, February 16, 2001; A.C.E. and PES match rates are from Davis (2001:Tables E-2, F-1, F-2).

E-Sample Match Codes

The distribution of final match codes for the E-sample was 81.7 percent matches, 11.6 percent other correct (nonmatched) enumerations, 4.0 percent erroneous enumerations, and 2.6 percent unresolved. Table 7-9 shows that the percentage of confirmed correct enumerations (the sum of matches plus other correct enumerations in the first two columns) by domain and tenure ranged from 87.2 percent for black renters to 95.8 percent for non-Hispanic white and other race owners. The percentage of erroneous enumerations ranged from 3 percent for non-Hispanic white and other race owners and American Indian/Alaska Native on-reservation renters to 7 percent for black renters, and the percentage unresolved ranged from 1.2 percent for non-Hispanic white and other race owners to about 6 percent for Hispanic and black renters.

After imputation for enumeration status, the overall E-sample correct enumeration rate (matches and other correct enumerations divided by those groups plus erroneous enumerations) was 95.3 percent. The correct enumeration rate ranged from 91.2 percent for non-Hispanic black renters to 96.7 percent for non-Hispanic white and other race owners.
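A back-of-the-envelope check (ours, not the Bureau's) makes the effect of the imputation visible. Computing the rate from the resolved Total-row percentages of Table 7-9 alone gives a higher figure than the published 95.3 percent, which suggests that unresolved cases were assigned lower-than-average probabilities of being correct enumerations:

    # Back-of-the-envelope check using the Total row of Table 7-9,
    # resolved cases only (ignoring the 2.6 percent unresolved).
    match, other_correct, erroneous = 81.7, 11.6, 4.0
    resolved_only = (match + other_correct) / (match + other_correct + erroneous)
    print(f"resolved-only CE rate: {100 * resolved_only:.2f}%")  # ~95.89%

    # The published rate after imputation is 95.28 percent -- lower than
    # the resolved-only figure, consistent with unresolved cases being
    # imputed lower-than-average probabilities of correct enumeration.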

Comparisons with 1990

The P-sample match rates are similar for the 2000 A.C.E. and the 1990 PES, both for the total population and for many race/ethnicity domain and housing tenure groups (see Table 7-8). For the total population, the A.C.E. match rate is 0.6 percentage points lower than the PES rate; for population groups, the A.C.E. match rates are lower than the PES rates for some groups and higher for others. The E-sample correct enumeration rates are also similar between the 2000 A.C.E. and the 1990 PES (see Table 7-9), although there is a general pattern for the A.C.E. correct enumeration rates to be somewhat higher than the corresponding PES rates. On balance, these patterns mean that the A.C.E. correction ratios (calculated by dividing the correct enumeration rate by the match rate) are higher than the corresponding PES correction ratios. If other things were equal, these results would mean that the A.C.E. measured higher net undercount rates than the PES, but the reverse is true. We explore in Chapter 8 the role of people reinstated in the census (late additions) and people requiring imputation to complete their census records—who could not be included in the A.C.E. process—in explaining the reductions in net undercount from 1990 levels that were measured in the A.C.E.
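The correction-ratio pattern can be verified directly from the Total rows of Tables 7-8 and 7-9; the short computation below is ours, using the published national rates:

    # Correction ratio = correct enumeration rate / match rate,
    # computed from the Total rows of Tables 7-8 and 7-9.
    ace_ratio = 95.28 / 91.59   # 2000 A.C.E.
    pes_ratio = 94.27 / 92.22   # 1990 PES
    print(f"A.C.E. correction ratio: {ace_ratio:.4f}")  # ~1.0403
    print(f"PES correction ratio:    {pes_ratio:.4f}")  # ~1.0222

    # Other things being equal, a larger correction ratio implies a larger
    # dual-systems estimate relative to the census count, and hence a
    # higher measured net undercount -- yet the A.C.E. measured a lower
    # net undercount than the PES, the puzzle taken up in Chapter 8.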

GROSS ERRORS

Our discussion has focused on net undercount. Some analysts are also interested in the level of gross errors in the census—that is, total omissions and total erroneous enumerations. The A.C.E. is designed to measure net undercount (or net overcount); it measures gross errors, but in ways that can be misleading. Many errors identified by the A.C.E. involve the balancing of a nonmatch on the P-sample side against an erroneous enumeration on the E-sample side—for example, when an E-sample case that should match is misgeocoded.

TABLE 7-9 2000 A.C.E. E-Sample Final Match Codes, and 2000 A.C.E. and 1990 PES Correct Enumeration Rates, by Race/Ethnicity Domain and Housing Tenure (weighted)

                                               Percent Distribution of 2000                       E-Sample Correct
                                               E-Sample Final Match Codes                         Enumeration Rate^a
                                                      Other Correct  Erroneous
Domain and Tenure Group                        Match  Enumeration    Enumeration  Unresolved  2000 A.C.E.  1990 PES

American Indian/Alaska Native on Reservation
  Owner                                        77.1   17.1           3.7          2.1         95.65        91.54^b
  Renter                                       78.3   14.8           3.0          4.0         96.15
American Indian/Alaska Native off Reservation
  Owner                                        81.9   11.7           4.9          1.5         94.56        —
  Renter                                       74.1   15.2           5.0          5.7         93.16        —
Hispanic Origin
  Owner                                        83.2   11.7           3.3          1.9         96.25        95.56
  Renter                                       71.7   17.0           5.3          6.0         92.79        90.58
Black (Non-Hispanic)
  Owner                                        80.3   12.7           5.2          1.7         94.25        92.84
  Renter                                       68.2   19.0           7.0          5.9         91.16        89.19
Native Hawaiian/Pacific Islander
  Owner                                        83.0    9.8           5.7          1.5         93.79        —
  Renter                                       72.7   16.6           6.1          4.6         92.33        —
Asian (Non-Hispanic)^c
  Owner                                        83.3   11.3           3.8          1.6         95.84        93.13
  Renter                                       72.1   15.3           5.9          6.7         92.45        92.22
White and Other Races (Non-Hispanic)
  Owner                                        86.6    9.2           3.0          1.2         96.70        95.84
  Renter                                       74.9   14.3           5.6          5.2         93.20        92.61
Total                                          81.7   11.6           4.0          2.6         95.28        94.27

NOTES: The first four numeric columns in each row add to 100 percent; —, not estimated.

^a Correct enumeration (CE) rates (matches and other correct enumerations divided by the sum of matches, other correct enumerations, and erroneous enumerations) are after imputation for unresolved enumeration status.

^b Total; not available by tenure.

^c 1990 correct enumeration rates include Pacific Islanders.

SOURCES: A.C.E. match codes are from tabulations by panel staff of E-sample cases, weighted using TESFINWT and excluding TES-eligible people not in TES sample block clusters (who have zero TESFINWT), from U.S. Census Bureau, E-Sample Person Dual-System Estimation Output File, February 16, 2001; A.C.E. and PES correct enumeration rates are from Davis (2001:Tables E-2, F-1, F-2).

Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×

These kinds of balancing errors are not errors for such levels of geography as counties, cities, and even census tracts, although they do affect error at the block-cluster level. Also, the classification of types of gross error in the A.C.E. is not necessarily clean. For example, the A.C.E. will not classify an enumeration of a "snowbird" at the person's winter residence as duplicating an enumeration of the same person at his or her summer residence, because there is no nationwide search. The A.C.E. will likely classify duplicate snowbird enumerations as erroneous in the aggregate, but it will not label them as duplicates.

It is important to take note of gross errors, however, because the level of net undercount does not relate directly to the level of gross errors: there can be a zero net undercount alongside high rates of gross omissions and gross erroneous enumerations. Hence, for completeness, Table 7-10 shows gross errors in the 2000 A.C.E. and the 1990 PES. Total gross errors in the A.C.E. appear to be somewhat reduced, in percentage terms, from the gross errors in the PES. However, the increased numbers of people requiring imputation and late additions, who likely had higher-than-average error rates, cloud the comparison, because these people were not part of the E-sample. The sizable differences between the A.C.E. and the PES in the distribution of types of gross erroneous enumerations are also puzzling. For example, the A.C.E. estimates proportionately fewer duplicate enumerations than the PES. The Census Bureau is currently studying these discrepancies, which could also be due to the higher numbers of people requiring imputation and late additions who were not included in the A.C.E. processing.
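The note to Table 7-10 gives the accounting identity involved: gross omissions equal net omissions plus gross erroneous enumerations. The check below (ours) reproduces row (1) of the table's lower panel from that identity:

    # Gross omissions = net omissions + gross erroneous enumerations
    # (see the notes to Table 7-10). All figures in millions of people.
    net_omissions = {"2000": 3.3, "1990": 4.0}       # from the table notes
    erroneous_enums = {"2000": 12.5, "1990": 16.3}   # all EE types
    for year in ("2000", "1990"):
        gross_omissions = net_omissions[year] + erroneous_enums[year]
        print(f"{year}: gross omissions = {gross_omissions:.1f} million")
    # 2000: 15.8 million; 1990: 20.3 million -- matching row (1) of the
    # alternative-estimates panel of Table 7-10.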

TABLE 7-10 Gross Omissions and Erroneous Enumerations, 2000 A.C.E. and 1990 PES

                                                   Percent of Weighted     Estimated Number of
                                                   E-Sample                People (millions)
Erroneous Enumerations                             2000 A.C.E.  1990 PES   2000   1990

Total                                              4.7          5.8        12.5   16.3
(1) Insufficient information for matching          1.8          1.2         4.8    3.4
(2) Duplicates                                     0.7          1.6         1.9    4.5
(3) Fictitious                                     0.3          0.2         0.7    0.6
(4) Geocoding error                                0.2          0.3         0.6    0.8
(5) Other residence                                1.0          2.2         2.7    6.2
(6) Imputed                                        0.6          0.3         1.8    0.8

                                                   2000 A.C.E.             1990 PES
                                                   (counts in millions)    (counts in millions)
                                                   Erroneous               Erroneous
Alternative Estimates of Gross Errors              Enums^a     Omissions   Enums     Omissions

(1) Including all types of erroneous
    enumerations (EEs)                             12.5        15.8        16.3      20.3
(2) Excluding EEs with insufficient information
    to match and imputed EEs (EE types (1) and
    (6) above)                                      5.9         9.2        11.2      15.2
(3) Excluding EEs excluded in row (2) and also
    "geocoding errors" and "other residence"
    (EE types (4) and (5) above)^a                  3.1         6.4         4.4       8.4
(4) Row (3) plus an allowance for 50 percent
    duplication among late additions                4.3         7.6        N.A.      N.A.

NOTES: People with insufficient information who were excluded from the E-sample at the outset are not included in any of these numbers (EE category (1) above comprises additional cases found to lack enough reported data for matching). Gross omissions are calculated by adding net omissions (3.3 million people in 2000; 4.0 million people in 1990) to gross erroneous enumerations.

^a The alternative estimates of erroneous enumerations in 2000 are not consistent with the information on types of erroneous enumerations above. The discrepancy is being investigated with the Census Bureau.

SOURCE: Adapted from Anderson and Fienberg (2001a:Tables 2, 3).

CONCLUSIONS

On the basis of the evidence now available, we conclude that the A.C.E. was conducted according to well-specified and carefully controlled procedures. We also conclude that it achieved a high degree of quality in such areas as sample design, interviewing, and imputation for missing data.

Several outstanding questions must be addressed before it will be possible to render a final verdict on the quality of the A.C.E. procedures (see Executive Steering Committee on A.C.E. Policy, 2001b). The major outstanding questions relate to those aspects of the 2000 A.C.E. that differed markedly from the 1990 PES and were relatively untested. First, there is concern that the targeted extended search may not have been balanced (in that the search areas for P-sample and E-sample cases may not have been equivalent) and that the imbalance could have led to incorrect treatment of nonmatched E-sample cases. There is a related concern that balancing error may have occurred because some E-sample cases were coded as correct when they were in fact outside the block cluster, or because not all correct enumerations in the block cluster were searched for a match to the P-sample. Second, there is concern that group quarters enumerations, such as those of college students, may not have been handled correctly in the A.C.E. Group quarters residents were supposed to be excluded from the A.C.E.; error would occur if, say, enumerations of college students at their parental homes were not classified as erroneous. Third, studies of the effect of the PES-C procedure on the estimates of match rates for movers and, more generally, estimates of matching error are not yet available. Finally, additional evaluations are needed to determine whether the post-stratification was the most efficient possible and to assess the sensitivity of the A.C.E. results to error from particular sources, such as matching, imputation, and the PES-C procedure used for movers.

Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×

Overall, the 2000 A.C.E. showed patterns of net undercount similar to, but less pronounced than, those of the 1990 PES. Given that P-sample match rates and E-sample erroneous enumeration rates were similar between the A.C.E. and the 1990 PES, the key question at this time is why the A.C.E. showed a reduced net undercount, overall and for such groups as Hispanics, non-Hispanic blacks, children, and renters. Because the only other component of the DSE equation is the number of census people with insufficient information to include in the E-sample (IIs), our attempts to resolve the undercount puzzle centered on that component of the census results. In Chapter 8, we analyze the distributions of people requiring imputation and people reinstated in the census (late additions) and determine that people requiring imputation largely explain the reduced net undercount in 2000 for historically less well-counted groups.
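The dual-systems logic behind this reasoning can be shown in a brief sketch. The function below is our simplified rendering, not the Bureau's production formula (which is applied post-stratum by post-stratum): the DSE scales the census count net of IIs by the correction ratio, so a larger II component mechanically lowers the DSE, and with it the measured net undercount, even when the match and correct enumeration rates are unchanged. The census count and rates below are published national figures; the II values are hypothetical.

    # Simplified dual-systems estimate (DSE) sketch -- illustrative
    # notation, not the Census Bureau's production formula.
    def dse(census_count: float, ii: float,
            ce_rate: float, match_rate: float) -> float:
        """Scale the census count, net of IIs (people with insufficient
        information, who cannot enter the E-sample), by the correction
        ratio (correct enumeration rate / match rate)."""
        return (census_count - ii) * (ce_rate / match_rate)

    # Holding the published national rates fixed, a larger II component
    # lowers the DSE and hence the measured net undercount.
    for ii in (5.0, 8.0):  # millions, hypothetical II values
        estimate = dse(census_count=281.4, ii=ii,
                       ce_rate=0.9528, match_rate=0.9159)
        print(f"II = {ii}: DSE = {estimate:.1f} million, "
              f"net undercount = {estimate - 281.4:.1f} million")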

Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
This page in the original is blank.
Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
Page 103
Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
Page 104
Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
Page 105
Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
Page 106
Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
Page 107
Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
Page 108
Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
Page 109
Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
Page 110
Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
Page 111
Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
Page 112
Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
Page 113
Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
Page 114
Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
Page 115
Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
Page 116
Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
Page 117
Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
Page 118
Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
Page 119
Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
Page 120
Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
Page 121
Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
Page 122
Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
Page 123
Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
Page 124
Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
Page 125
Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
Page 126
Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
Page 127
Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
Page 128
Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
Page 129
Suggested Citation:"7. Accuracy and Coverage Evaluation: Assessment." National Research Council. 2001. The 2000 Census: Interim Assessment. Washington, DC: The National Academies Press. doi: 10.17226/10210.
×
Page 130
Next: 8. Imputations and Late Additions »
The 2000 Census: Interim Assessment Get This Book
×
Buy Paperback | $64.00 Buy Ebook | $49.99
MyNAP members save 10% online.
Login or Register to save!
Download Free PDF

This volume contains the full text of two reports: one is an interim review of major census operations, which also assesses the U.S. Census bureau's recommendation in March 2001 regarding statistical adjustment of census data for redistricting. It does not address the decision on adjustment for non-redistricting purposes. The second report consists of a letter sent to William Barron, acting director of the Census Bureau. It reviews the new set of evaluations prepared by the Census Bureau in support of its October decision. The two reports are packaged together to provide a unified discussion of statistical adjustment and other aspects of the 2000 census that the authoring panel has considered to date.

  1. ×

    Welcome to OpenBook!

    You're looking at OpenBook, NAP.edu's online reading room since 1999. Based on feedback from you, our users, we've made some improvements that make it easier than ever to read thousands of publications on our website.

    Do you want to take a quick tour of the OpenBook's features?

    No Thanks Take a Tour »
  2. ×

    Show this book's table of contents, where you can jump to any chapter by name.

    « Back Next »
  3. ×

    ...or use these buttons to go back to the previous chapter or skip to the next one.

    « Back Next »
  4. ×

    Jump up to the previous page or down to the next one. Also, you can type in a page number and press Enter to go directly to that page in the book.

    « Back Next »
  5. ×

    Switch between the Original Pages, where you can read the report as it appeared in print, and Text Pages for the web version, where you can highlight and search the text.

    « Back Next »
  6. ×

    To search the entire text of this book, type in your search term here and press Enter.

    « Back Next »
  7. ×

    Share a link to this book page on your preferred social network or via email.

    « Back Next »
  8. ×

    View our suggested citation for this chapter.

    « Back Next »
  9. ×

    Ready to take your reading offline? Click here to buy this book in print or download it as a free PDF, if available.

    « Back Next »
Stay Connected!