Suggested Citation:"6 Weighting and Estimation." National Research Council. 2012. Small Populations, Large Effects: Improving the Measurement of the Group Quarters Population in the American Community Survey. Washington, DC: The National Academies Press. doi: 10.17226/13387.

6

Weighting and Estimation

As is the case with the sample design, the current weighting and estimation procedures used in the American Community Survey (ACS) are not optimized to produce reliable small-area estimates for group quarters (GQ) residents, nor, as a result, are they adequate to produce reliable estimates of characteristics of the total population. Acknowledging these limitations, the Census Bureau is continuing to evaluate options for revising the weighting procedures. The methodology is expected to evolve based on decisions made about revising other aspects of the survey design, particularly the imputation plans discussed later in this chapter.

WEIGHTING PROCEDURES

The ACS estimates are based on a raking ratio estimation procedure that results in two sets of weights: a weight assigned to each sample person record and a weight assigned to each sample housing unit record. Estimates of person characteristics are based on summing the person weights in the geographic area of interest. Estimates of family, household, and housing unit characteristics are based on summing the housing unit weights.
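As a rough illustration of raking ratio estimation (iterative proportional fitting), the sketch below scales a small table of weighted counts until its row and column sums match a set of control totals. The cell values, the control totals, and the function name `rake` are hypothetical; the actual ACS raking dimensions and collapsing rules are not reproduced here.

```python
def rake(table, row_controls, col_controls, max_iter=100, tol=1e-9):
    """Iteratively scale rows, then columns, of a 2-D table of weighted
    counts so that its margins match the given control totals."""
    cells = [row[:] for row in table]  # work on a copy
    for _ in range(max_iter):
        # Scale each row so that it sums to its row control.
        for i, target in enumerate(row_controls):
            total = sum(cells[i])
            if total > 0:
                cells[i] = [c * target / total for c in cells[i]]
        # Scale each column so that it sums to its column control.
        for j, target in enumerate(col_controls):
            total = sum(row[j] for row in cells)
            if total > 0:
                for row in cells:
                    row[j] *= target / total
        # Columns are now exact; stop once the row margins are close enough.
        worst = max(abs(sum(cells[i]) - t) for i, t in enumerate(row_controls))
        if worst < tol:
            break
    return cells


# Hypothetical weighted sample counts (rows: two age groups, columns: two sexes).
weighted = [[40.0, 60.0], [30.0, 70.0]]
raked = rake(weighted, row_controls=[120.0, 90.0], col_controls=[100.0, 110.0])
```

The ratio of each raked cell to its starting value plays the role of a weight adjustment factor for the records in that cell. The row and column controls must sum to the same total for the procedure to converge.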

Current Weighting Procedures

The Census Bureau uses a design-based weighting procedure, conducted in two steps: the first step involves assigning weights to persons in group quarters; the second step involves assigning weights to both housing units and to persons within housing units. The GQ person weighting is conducted before the household person weighting because the weighting for household persons makes use of the GQ person weights. The household and GQ weights are combined to produce estimates of the total population.

The GQ person weighting itself proceeds in three steps. The first step applies a trimmed base weight that reflects the initial sampling probability and the within-GQ subsampling probability. The second step is a noninterview adjustment across group quarters, defined within state, by county and by major GQ type. If the sample is small or if the adjustment is large, the cells are collapsed to state by major GQ type. The third step applies a coverage adjustment, controlling the weighted number of GQ persons at the state level by major GQ type, using the GQ population estimates from the Population Estimates Program (PEP).
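The sequence of base weight, noninterview adjustment, and coverage adjustment can be sketched as follows. This is a minimal illustration under stated assumptions, not the Census Bureau's implementation: the record fields, the trimming cap of 500, and the sample values are hypothetical, and the cell-collapsing rule is omitted.

```python
from collections import defaultdict


def gq_person_weights(records, controls, trim_cap=500.0):
    """Return final weights for responding GQ persons.

    records: dicts with keys 'p_gq', 'p_within', 'cell', 'ctrl_cell', 'responded'.
    controls: independent population totals keyed by 'ctrl_cell'.
    """
    # Step 1: base weight = inverse of the overall selection probability, trimmed.
    for r in records:
        r["w"] = min(1.0 / (r["p_gq"] * r["p_within"]), trim_cap)

    # Step 2: noninterview adjustment within each cell
    # (for example, county by major GQ type).
    all_w, resp_w = defaultdict(float), defaultdict(float)
    for r in records:
        all_w[r["cell"]] += r["w"]
        if r["responded"]:
            resp_w[r["cell"]] += r["w"]
    respondents = [r for r in records if r["responded"]]
    for r in respondents:
        r["w"] *= all_w[r["cell"]] / resp_w[r["cell"]]

    # Step 3: coverage adjustment to the control total for each control cell
    # (for example, state by major GQ type).
    ctrl_w = defaultdict(float)
    for r in respondents:
        ctrl_w[r["ctrl_cell"]] += r["w"]
    for r in respondents:
        r["w"] *= controls[r["ctrl_cell"]] / ctrl_w[r["ctrl_cell"]]
    return [r["w"] for r in respondents]


# Hypothetical sample: two counties (A, B), one GQ type, one nonrespondent.
sample = [
    {"p_gq": 0.1, "p_within": 0.5, "cell": ("A", "college"), "ctrl_cell": "college", "responded": True},
    {"p_gq": 0.1, "p_within": 0.5, "cell": ("A", "college"), "ctrl_cell": "college", "responded": True},
    {"p_gq": 0.1, "p_within": 0.5, "cell": ("A", "college"), "ctrl_cell": "college", "responded": False},
    {"p_gq": 0.05, "p_within": 0.5, "cell": ("B", "college"), "ctrl_cell": "college", "responded": True},
]
final_weights = gq_person_weights(sample, controls={"college": 150.0})
```

Because the control is applied only at the state level, the final weights sum to the state control regardless of how the sampled facilities are spread across counties, which is the mechanism behind the small-area distortions discussed later in this chapter.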

On the basis of the current estimation procedures, only the total population (households and group quarters) is guaranteed to be controlled at the county (or groups of less populous counties) level. When some small geographic areas with GQ populations do not have group quarters represented in the sample, group quarters in other areas may be overrepresented. Thus, for some small areas, the 5-year estimates do not reflect local reality.

Alternative Approach Under Consideration

The Census Bureau is researching the possibility of introducing a new imputation and weighting approach, with the primary goal of achieving representation at the county level of all major GQ types present in that county for the 1-, 3-, and 5-year data. A secondary goal is to achieve representation at the tract level by major GQ type for the 5-year data. Keeping in mind the ongoing imputation research, the new method will make no distinction between sampled and imputed GQ person records, and it is developed to be sufficiently flexible to accommodate different possible outcomes of that research (Asiala, 2011).

The alternative GQ weighting methodology is based on the steps described below (Asiala, 2011). This approach is discussed in further detail later in this chapter.

Defining separate base weights for persons in large and small group quarters.
Applying tract- and county-level constraints based on the modeled populations on the frame and applying state by major GQ type-level controls based on independent population estimates.

PEP CONTROLS AND ALTERNATIVES

The population controls used in the ACS weighting process are based on estimates produced by the Census Bureau’s Population Estimates Program. The PEP publishes total population estimates annually, based on a methodology that essentially updates data from the most recent census with changes from births, deaths, and migration, as well as additional refinements based on Medicare enrollment data and estimates of the GQ population. After each new decennial census, the population estimates are rebenchmarked to reflect the new counts. For example, the 2010 ACS 1-year data, which are controlled to population estimates that reflect the 2010 census results, are not strictly speaking comparable to 2009 ACS 1-year data (or ACS 1-year data from previous years), which are controlled to population estimates derived as updates of the 2000 census (U.S. Census Bureau, 2011g).

To estimate changes in GQ populations, the Census Bureau starts with GQ population counts by facility type for each subcounty area from the previous decennial census and updates them with a time series of individual GQ records from the Group Quarters Report (GQR). The GQR is an annual estimate of GQ populations prepared by Federal-State Cooperative for Population Estimates program units (U.S. Census Bureau, 2008b). A time series of the GQ population is derived in two steps. First, facility-level GQ populations from the GQR are summed to the subcounty level by facility type for each estimate date in the time series. Second, a year-to-year change is calculated from the aggregated GQR time series of these populations.

As the decade progresses, the census counts become increasingly outdated and the updates, such as the GQR data collected from states, cannot always be relied on, which affects the overall quality of the GQ population estimates. For some GQ types, the population estimates are basically the decennial census counts kept constant. At the national and state levels, the Census Bureau urges caution when comparing the GQ population numbers based on the 2010 ACS and the 2010 census, and it advises data users not to compare the GQ data from these two sources at the substate level (U.S. Census Bureau, 2011h).

To better understand the magnitude of the differences among the GQ estimates from different sources, the panel compared the GQ counts from several ACS data releases (2005-2009 5-year, 2007-2009 3-year, and 2009 1-year) to expected counts interpolated from the 2000 and 2010 census data. Although the interpolated counts are themselves subject to error, they provide a reasonable comparison to ACS estimates as long as the change in population between 2000 and 2010 is fairly smooth. Table 6-1 shows the mean absolute percent errors (MAPE) and mean algebraic percent errors (MALPE) for the comparisons between the state-level ACS period estimates and the GQ count interpolated for the year in the middle of the time period, based on the 2000 and 2010 census counts (treating the interpolated number as the “gold standard”).¹

¹The MAPE is calculated as the average across all states of the absolute difference between the ACS estimate and the interpolated estimate, divided by the interpolated estimate and multiplied by 100. The MALPE is calculated similarly, except the sign of the difference (positive or negative) is considered in the calculation.
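The interpolation and the two error measures described above can be sketched as follows. The census counts and ACS estimates in the example are hypothetical, and the interpolation treats the change between censuses as linear in calendar years, which is a simplification.

```python
def interpolate(count_2000, count_2010, year):
    """Linear interpolation between the two census counts (assumes smooth change)."""
    return count_2000 + (count_2010 - count_2000) * (year - 2000) / 10.0


def percent_errors(estimates, expected):
    """Signed percent error of each estimate relative to its expected count."""
    return [100.0 * (e - x) / x for e, x in zip(estimates, expected)]


def mape(estimates, expected):
    """Mean absolute percent error: the sign of each difference is ignored."""
    errors = percent_errors(estimates, expected)
    return sum(abs(e) for e in errors) / len(errors)


def malpe(estimates, expected):
    """Mean algebraic percent error: positive and negative differences offset."""
    errors = percent_errors(estimates, expected)
    return sum(errors) / len(errors)


# Hypothetical GQ counts for three areas.
census_2000 = [1000, 2000, 500]
census_2010 = [1200, 1800, 500]
acs_2005_2009 = [1254, 1767, 500]

# The 2005-2009 period estimate is compared with the count expected for 2007.
expected_2007 = [interpolate(a, b, 2007) for a, b in zip(census_2000, census_2010)]
state_mape = mape(acs_2005_2009, expected_2007)
state_malpe = malpe(acs_2005_2009, expected_2007)
```

A MALPE well below the MAPE indicates that overestimates and underestimates partly cancel, whereas a MALPE close to the MAPE indicates errors that run mostly in one direction.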

TABLE 6-1 MAPE and MALPE of State-Level ACS Estimates of Group Quarters Compared with Expected GQ Counts


	ACS 0509/ Expected 2007	ACS 0709/ Expected 2008	ACS 09/ Expected 2009

MAPE	5.5	6.0	6.2
MALPE	2.5	2.2	1.7

NOTES: Expected counts are interpolated based on the 2000 and 2010 census counts. ACS = American Community Survey, GQ = group quarters, MALPE = mean algebraic percent error, MAPE = mean absolute percent error. SOURCE: Calculated by the panel based on 2000 census data and the 2010 census Advance Group Quarters Summary File.

Appendix H shows plots of the relative errors computed as the difference between the ACS estimates and the expected estimates of the GQ population, divided by the expected estimates of the GQ population in U.S. states. The graphs show that, in the case of the biggest states, the ACS estimates from all three data releases examined are uniformly higher than the expected estimates.

Table 6-2 shows the mean absolute percent error and mean algebraic percent error for counties by region and for counties with populations under 20,000. As anticipated, the MAPEs at the county level are higher than at the state level, and they are highest for the counties with the smallest number of residents (under 20,000). Table 6-3 shows the county-level errors using medians instead of means.

TABLE 6-2 MAPE and MALPE of County-Level ACS Estimates of Group Quarters Compared with Expected GQ Counts


Region		ACS 0509/ Expected 2007	ACS 0709/ Expected 2008	ACS 09/ Expected 2009

Northeast	MAPE	22.3	20.8	23.4
	MALPE	5.2	7.4	9.9
Midwest	MAPE	56.8	28.1	26.4
	MALPE	17.1	13.1	7.8
West	MAPE	64.8	27.2	26.0
	MALPE	8.0	6.0	4.1
South	MAPE	55.9	39.1	30.4
	MALPE	14.9	19.3	9.4
Counties with population under 20,000	MAPE	86.2	118.0	—
	MALPE	20.0	56.3	—

NOTES: Expected counts are interpolated based on the 2000 and 2010 census counts. ACS = American Community Survey, GQ = group quarters, MALPE = mean algebraic percent error, MAPE = mean absolute percent error. SOURCE: Calculated by the panel based on 2000 census data and the 2010 census Advance Group Quarters Summary File.

TABLE 6-3 MAPE and MALPE of County-Level ACS Estimates of Group Quarters Compared with Expected GQ Counts


Region		ACS 0509/ Expected 2007	ACS 0709/ Expected 2008	ACS 09/ Expected 2009

Northeast	MAPE	12.7	15.4	15.8
	MALPE	3.2	6.1	3.3
Midwest	MAPE	33.6	18.8	17.7
	MALPE	2.7	3.7	1.1
West	MAPE	34.7	17.4	16.6
	MALPE	-9.8	-1.7	-5.4
South	MAPE	33.5	24.5	24.1
	MALPE	0.3	7.0	0.5
Counties with population under 20,000	MAPE	68.5	67.9	—
	MALPE	-10.1	31.3	—

NOTES: Expected counts are interpolated based on the 2000 and 2010 census counts. ACS = American Community Survey, GQ = group quarters, MALPE = mean algebraic percent error, MAPE = mean absolute percent error. SOURCE: Calculated by the panel based on 2000 census data and the 2010 census Advance Group Quarters Summary File.

TABLE 6-4 Weighted MAPE and MALPE of County-Level ACS Estimates of Group Quarters Compared with Expected GQ Counts


Region		ACS 0509/ Expected 2007	ACS 0709/ Expected 2008	ACS 09/ Expected 2009

Northeast	MAPE	14.5	15.7	18.2
	MALPE	5.6	6.3	6.9
Midwest	MAPE	22.5	19.4	20.9
	MALPE	7.5	8.4	5.4
West	MAPE	17.7	15.8	18.6
	MALPE	3.1	2.8	2.2
South	MAPE	27.4	25.1	26.6
	MALPE	7.8	8.7	6.3
Counties with population under 20,000	MAPE	76.9	119.1	—
	MALPE	20.6	57.9	—

NOTES: GQ counts are weighted by the 2010 total population size. Expected counts are interpolated based on the 2000 and 2010 census counts. ACS = American Community Survey, GQ = group quarters, MALPE = mean algebraic percent error, MAPE = mean absolute percent error. SOURCE: Calculated by the panel based on 2000 census data and the 2010 census Advance Group Quarters Summary File.

In most cases, the MAPE statistics are larger for the 5-year estimates than for the 1- and 3-year estimates, possibly because that data release includes smaller counties that may have estimates that are disproportionately unreliable. Table 6-4 shows that the MAPEs and MALPEs are reduced when the means are weighted by the total population counts from the 2010 census. Yet it is troubling to see estimate errors of this magnitude. Part of the apparent error may be due to the simplistic manner by which the expected estimate was derived. But Table 6-3 reveals that for the counties selected for examination, more than half had ACS GQ estimates that deviated from the expected GQ estimate by more than 30 percent (for all but the Northeast) in 2005-2009 and close to 20 percent for 2007-2009. The story for small counties is much worse, with MAPEs for half the counties exceeding 65 percent error. For small counties, the population-weighted MAPE for 2007-2009 suggests that well over half of the selected counties had errors in ACS GQ estimates that exceed 100 percent.
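The difference among the unweighted means (Table 6-2), the medians (Table 6-3), and the population-weighted means (Table 6-4) can be illustrated with a small sketch. The three county-level errors and populations below are hypothetical and are chosen only to show how one small county with a very large error affects each summary.

```python
from statistics import median


def weighted_mean(values, weights):
    """Mean of the values, with each value weighted by the matching weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


abs_pct_errors = [5.0, 10.0, 150.0]      # hypothetical county-level absolute percent errors
populations = [900_000, 90_000, 10_000]  # hypothetical total populations

unweighted = sum(abs_pct_errors) / len(abs_pct_errors)
mid = median(abs_pct_errors)
weighted = weighted_mean(abs_pct_errors, populations)
```

In this example the small county dominates the unweighted mean but contributes little to the median or to the population-weighted mean, which is the pattern the tables show for the regional rows.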

Appendix I shows plots of the relative errors computed as the difference between the 2005-2009 ACS estimate and the expected estimates of the GQ population, divided by the expected estimates of the GQ population in selected counties by region. The upper and lower limits for the error bars were computed as plus or minus the margin of error of the ACS divided by the expected estimate, where the margin of error here is twice the standard error of the ACS.
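The relative error and its error-bar limits, as described above, can be computed as in the following sketch. The function name and the example values are hypothetical.

```python
def relative_error_with_bounds(acs_estimate, standard_error, expected):
    """Relative error of an ACS estimate and its error-bar limits.

    The margin of error is taken as twice the standard error, as in the text.
    """
    margin_of_error = 2.0 * standard_error
    rel_error = (acs_estimate - expected) / expected
    half_width = margin_of_error / expected
    return rel_error - half_width, rel_error, rel_error + half_width


# Hypothetical county: ACS estimate 1,300 with standard error 100; expected count 1,000.
lower, rel, upper = relative_error_with_bounds(1300.0, 100.0, 1000.0)
```

An interval that excludes zero suggests that the gap between the ACS estimate and the expected count is larger than sampling error alone would explain, subject to the error in the interpolated count itself.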

The ACS estimates tend to be higher than the expected values in the largest states in the Northeast and the Midwest. Appendix J shows similar relative error plots for selected counties with populations under 20,000. For these counties, the ACS estimates do not appear to be consistently higher or lower than the expected values.

The tables and graphs illustrate large overall differences between the GQ estimates from the ACS and the expected GQ population counts based on interpolated census numbers. The impact of these differences, however, varies greatly among counties, depending on local circumstances, which needs to be explored further. The panel anticipates that greater clarity regarding these differences will result from the Census Bureau’s research comparing ACS estimates for 2010 against the 2010 census counts. The comparisons conducted by the panel could be used as a template for a more thorough analysis by the Census Bureau to determine the impact of these differences, particularly for small areas, because in small areas inaccurate GQ estimates can have an especially large impact on the accuracy of the data for the total population. Issues specific to small areas are discussed in further detail later in this chapter.

Following the release of counts from the decennial census, the Census Bureau typically conducts a formal evaluation of errors (bias and precision) in its population estimates for various levels of geography. These tests generally treat the census counts as the gold standard against which the population estimates are evaluated. The Census Bureau awarded eight contracts to external researchers to evaluate the 2010 round of population estimates against the 2010 census and to assess alternative population estimation methodologies. The purpose of this work is to evaluate the current PEP method by comparing the population estimates of the total resident population and the household population at the national, state, and county levels with the census counts.

However, despite uncertainty surrounding the quality of the GQ estimates prepared by the PEP, the proposed evaluation research regrettably is focused only on the total population (household and GQ populations combined) and on the household population compared with total 2010 census counts. The Census Bureau plans to consider the GQ estimates separately at a later time, but this could be a missed opportunity to better understand the challenges surrounding the GQ population estimates in relation to the total population estimates and to inform the deliberations about the role of the GQ population in the ACS. The panel urges that an evaluation of the GQ estimates should be conducted along with the evaluation of other aspects of the Population Estimates Program.

Recommendation 6-1: The Census Bureau should conduct an evaluation of the 2010 American Community Survey estimates of the group quarters (GQ) population against the 2010 census counts at all levels of geography for which the Census Bureau’s Population Estimates Program (PEP) prepares such estimates. This research should estimate bias and imprecision by GQ type and seek to identify ways to improve the PEP estimates of group quarters.

Population controls for GQ estimates need to be considered in the context of their effect on error evaluations, given that inaccurate population controls are more likely to introduce error than to reduce it. Although there are arguments for considering county, or even subcounty controls, this is unrealistic at the moment, because GQ types often are collapsed as a result of small sample size or large adjustments. An alternative would be to control for demographic characteristics (age, sex, race, and Hispanic origin) and to drop controls for GQ type. This approach would reduce the likelihood that demographic characteristics for small areas are distorted because an age-clustered GQ, such as a nursing home or dormitory, happens to be included in the sample for the area.

Arguably, the use of outdated or inadequate controls may be worse than the use of no controls at all. As another alternative to the current approach, the use of population controls could be limited to those GQ types for which the controls are most reliable. If the updates received from outside sources about some GQ types are better than the PEP controls, it should be possible to use these population estimates instead. For example, the records of the Defense Manpower Data Center in the U.S. Department of Defense or the Federal Bureau of Prisons may supply better data than the current approach of updating the census counts for military and correctional facilities. In addition, many GQ facilities also maintain basic administrative records about their residents. If these facility-level records include sufficient information to produce population counts by demographic cross-classifications, they could also be used as controls.

As discussed, state and other local resources are underutilized as sources of data. State governments often have comprehensive lists of group quarters that are more current than any other source, and they often produce their own estimates as well (often based on a simple telephone call to facility administrators). Considering the limitations and costs of the current procedures, it should be worth exploring the possibility of obtaining state-generated estimates of GQ populations and assessing how these compare to the bureau’s own estimates, as recommended in Chapter 4.

Recommendation 6-2: Depending on the outcome of the evaluation discussed in Recommendation 6-1, the Census Bureau should evaluate the relative advantages and disadvantages of developing control totals for group quarters (GQ) residents in the American Community Survey by demographic characteristics (age, sex, race, ethnicity) at the state level, possibly in addition to the control totals that are currently implemented by GQ type. The Census Bureau should also evaluate the possibility of using population controls only for the GQ types for which reliable controls are available. Finally, the Census Bureau should evaluate whether data from outside sources that are currently used to provide updates for the sampling frame could also be used for controls.

ESTIMATES OF THE GQ POPULATION IN SMALL AREAS

The decennial census, because of its role of providing complete counts of the population down to the census block level, mostly succeeds in completely enumerating the GQ population everywhere and is able to support counts by GQ type for all entities in the census geographic hierarchy. In contrast, the state-based sample design of the ACS is not an adequate vehicle for providing small-area estimates of the GQ population.

The ACS substate samples are highly variable, particularly by GQ type, and there are large fluctuations over time in the characteristics associated with residence in group quarters. In some cases, this variation results in counties with known GQ facilities within their administrative boundaries having no group quarters represented in the sample. Table 6-5 shows the number of counties with specific GQ types on the sampling frame and whether the GQ type is actually represented in the 2006-2009 ACS sample.

At lower geographic levels this is an even more common occurrence, with approximately half of the census tracts that have group quarters according to the sampling frame ending up with none selected in the sample after 4 years (Asiala, 2010). Table 6-6 shows the breakdown of census tracts with and without group quarters in the sample, and Table 6-7 illustrates the differences in the availability of tract-level samples among major GQ types.

TABLE 6-5 GQ Sample in Counties with Group Quarters on the ACS Sampling Frame by Major Type of Group Quarters, 2006-2009


Major GQ Type	Percentage of Counties with GQ Sample in the ACS	Percentage of Counties Without GQ Sample in the ACS	Total Number of Counties with GQ Type on Frame

Correctional facilities for adults	65.3	34.7	2,745
Juvenile facilities	55.8	44.2	1,182
Nursing facilities/skilled nursing facilities	88.0	12.0	2,955
Other institutional facilities	41.4	58.6	1,332
College/university student housing	85.5	14.5	1,155
Military group quarters	54.5	45.5	396
Other noninstitutional facilities	66.9	33.1	2,823
Total	65.3	34.7	12,588

SOURCE: U.S. Census Bureau (2011e).

TABLE 6-6 GQ Sample in Census Tracts with Group Quarters on the ACS Sampling Frame, 2006-2009


Type of Census Tract	Percentage of Tracts	Number of Tracts

Census tracts with GQ sample	49.8	21,596
Census tracts without GQ sample	50.2	21,771
Total census tracts with group quarters	100.0	43,367

SOURCE: U.S. Census Bureau (2011e).

TABLE 6-7 GQ Sample in Census Tracts with Group Quarters on the ACS Sampling Frame by Major Type of Group Quarters, 2006-2009


Major GQ Type	Percentage of Tracts with ACS Sample	Percentage of Tracts Without ACS Sample	Total Number of Tracts with GQ Type on Frame

Correctional facilities for adults	57.7	42.3	4,994
Juvenile facilities	40.2	59.8	2,818
Nursing facilities/skilled nursing facilities	59.4	40.6	16,583
Other institutional facilities	27.1	72.9	3,633
College/university student housing	72.5	27.5	3,351
Military group quarters	49.8	50.2	576
Other noninstitutional facilities	28.7	71.3	34,971
Total	47.9	52.1	66,926

SOURCE: U.S. Census Bureau (2011e).

As illustrated in Table 6-2, the MAPEs and MALPEs associated with the differences between the GQ estimates from the ACS and the census counts are especially large for counties with populations under 20,000. The ACS estimates of the GQ population, and of total population characteristics, can be especially error prone not only if a county with GQ residents does not have any group quarters represented in the sample but also if the county has group quarters in the sample, in which case these may be weighted up to match state-level population controls. The controls will bring the data in line with the PEP estimates at the state level, but they can seriously skew the estimated distributions at the county and lower levels of geography.

For example, during the time period between the 2000 and 2010 censuses, the small county of Goochland, Virginia, was home to two large state correctional institutions: the Virginia Correctional Center for Women and the James River Correctional Center, both with a capacity of approximately 500 residents (Virginia Department of Corrections, 2011). While the 2000 and 2010 census numbers show little change in the number of GQ residents in the county and a slight drop in the proportion of GQ residents relative to the total population, the 2005-2009 5-year ACS estimate of the GQ population is roughly four times as large as either census count (an increase of about 300 percent) and carries a large margin of error (see Table 6-8). This also affects the estimates for the demographic characteristics of the total population in the county. For example, based on the census 2010 numbers, 19.2 percent of the county’s total population is black, whereas the 5-year ACS estimates show the black population to be 30 percent. The source of the problem seems to be the disproportional weighting up of the prisons in Goochland County to account for the lack of sample cases of prisons in other areas in the state.

As another example, the ACS data for Elmore County, Alabama, seem to suggest that the poverty rate in the county dropped from 14 to 10.4 percent between 2006 and 2007. However, a closer examination of the role of the group quarters in the sample reveals that the apparent change is largely explained by the fact that in 2006 the ACS estimate of the GQ population for the county was 1,976, and 90 percent of the GQ residents were in poverty. In 2007, no group quarters were included in the sample, so the 10.4 percent poverty rate for that year is essentially the household poverty rate, which is not very different from the 11.8 percent household poverty rate in 2006 (Asiala, 2010).
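The arithmetic behind the Elmore County example can be sketched as a population-weighted average of the household and GQ poverty rates. The household population of 68,000 used below is an assumption, chosen as a round figure that makes the 2006 numbers quoted in the text reconcile; it is not reported in the source.

```python
def combined_rate(hh_pop, hh_rate, gq_pop, gq_rate):
    """Poverty rate for the total population as a population-weighted average."""
    return (hh_pop * hh_rate + gq_pop * gq_rate) / (hh_pop + gq_pop)


HH_POP = 68_000  # assumed household population; not reported in the text

rate_2006 = combined_rate(HH_POP, 0.118, 1_976, 0.90)  # GQ residents in the sample
rate_2007 = combined_rate(HH_POP, 0.104, 0, 0.0)       # no GQ sample: household rate only
```

Under this assumption, fewer than 2,000 GQ residents with a 90 percent poverty rate are enough to raise the county rate from 11.8 percent to about 14 percent, so their disappearance from the sample accounts for most of the apparent year-to-year drop.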

TABLE 6-8 Census and 5-Year ACS Estimates of the GQ Population in Goochland County, Virginia


Source	Total Population	Number in Group Quarters	Percentage in Group Quarters

Census 2010	21,717	1,405	6.5
ACS 2005-2009	20,429	5,707*	27.9
Census 2000	16,863	1,388	8.2

*90 percent margin of error of +/– 1,638. SOURCE: U.S. Census Bureau. Available: http://factfinder2.census.gov/faces/nav/jsf/pages/index.xhtml.

It would be unfair and incorrect to judge an estimation system by selecting nonrandomly two counties with glaring errors and highlighting those as if they were typical examples. They are not. However, they do illustrate some of the potential difficulties facing the Census Bureau in this regard, and they serve as a reminder that there are communities like Goochland and Elmore counties in which estimates with large discrepancies may be data users’ first exposure to local data from the American Community Survey. Problems such as these draw attention to the immense difficulties of estimating, on the basis of a sample survey, a sparse and irregularly distributed population (such as those residing in group quarters) for small geographic units. This is a fundamental tension arising from the conflicting goals of providing relatively current and frequent estimates for what are often very small units of geography based on a sample survey. The challenges lead to sample-based estimates that have, for the statistician, very large standard errors and, for the unsophisticated data user, numbers that often simply make no sense.

Acknowledging that the Census Bureau has made the decision not to apply release restrictions for the 5-year estimates based on data quality, the panel thinks that it is important to ensure that the published numbers, and the metadata behind those numbers, resonate with reality from the perspective of small geographic areas and users of such data. The importance of improving the sampling frame and identifying solutions that can improve the sampling design cannot be overstated. In addition, statistical solutions that can be particularly cost-effective in improving the estimation procedures should be evaluated. One such option to consider is the use of some type of indirect estimate. There are a variety of estimators in this class, ranging from simple to complex. Which type would be both feasible and an improvement over the current method is a subject for study. The Census Bureau for many years has employed a variation of this general approach as part of its Small Area Income and Poverty Estimates (SAIPE) Program. It produces annual small-area income and poverty estimates for school districts, counties, and states using a model-based approach that relies on combining survey data with population estimates and administrative records (National Research Council, 2000).

An option would be to use a composite of a small-area model estimate and direct estimate. If the geographic entity has group quarters but the sample has none, then the direct estimate would receive a weight of zero, and the model-based estimate would apply. Otherwise, a combination estimate could be used that weights the direct estimate and the model-based estimate based on the variance of each.
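One way to read the composite described above is as an inverse-variance weighted average, sketched below. The choice of inverse-variance weights is an assumption (the text says only that the weights would be based on the variance of each estimate), the function name and example values are hypothetical, and any bias in the model-based estimate is ignored.

```python
def composite_estimate(direct, var_direct, model, var_model, in_sample=True):
    """Combine a direct and a model-based estimate using inverse-variance weights.

    If the area has no GQ sample, the direct estimate gets weight zero.
    """
    if not in_sample or direct is None:
        return model
    # The less variable estimate receives the larger weight.
    weight_direct = var_model / (var_direct + var_model)
    return weight_direct * direct + (1.0 - weight_direct) * model


# Hypothetical county with GQ sample: direct estimate 1,200 (standard error 400),
# model-based estimate 1,000 (standard error 200).
with_sample = composite_estimate(1200.0, 400.0 ** 2, 1000.0, 200.0 ** 2)

# Hypothetical county with group quarters on the frame but none in the sample.
without_sample = composite_estimate(None, None, 1000.0, 200.0 ** 2, in_sample=False)
```

With these values the direct estimate receives a weight of 0.2, so the composite stays close to the model-based figure; a county with a precise direct estimate would be pulled the other way.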

Sources of GQ data that could be used in a model include (but are likely not to be limited to) counts of residents and group quarters for small areas as shown on the frame, the previous census counts of GQ population by small area, data provided by state or local agencies regarding GQ populations, and possibly the PEP subcounty estimates of the GQ population. Another option would be to investigate the use of administrative records maintained by GQ facilities for this purpose, even if these records are found not to be comprehensive enough to replace interviews with residents.

The best estimate to use may depend on how old the latest census counts are at any particular point. The census counts might be used exclusively in the years immediately following the decennial census, but a few years later information obtained from administrative records or the PEP numbers (assuming that these can be improved for group quarters) might be more reliable. An additional issue to consider is how the unreliability of the GQ sampling frame may affect synthetic small-area estimates. An example is a similar effort, the Local Area Unemployment Statistics (LAUS) Program of the Bureau of Labor Statistics, which uses state-level estimates from the Current Population Survey (CPS) as input to create model-based estimates. This program found that the direct CPS estimates of unemployment for lower levels of geography are not reliable enough to publish (Pfeffermann and Tiller, 2006).

If a model-based small-area estimate were used for the total GQ population, for example, for a county, an additional dilemma arises. A decision would have to be made about whether acceptably accurate small-area estimates could be made for the GQ totals in demographic groups in the small area. If this is not possible, it may be reasonable to simply report a small-area estimate for the total GQ population without breakdown by characteristics, and breakdowns by characteristics for that area would be reserved for the household population only. One advantage of model-based estimates is that there would be fewer confidentiality concerns associated with the small-area data.

Recommendation 6-3: The Census Bureau should evaluate statistical methods, such as indirect estimation, for producing group quarters estimates for counties in which group quarters are known to exist based on the American Community Survey sampling frame but are not included in the sample.

CENSUS BUREAU IMPUTATION PLANS TO IMPROVE THE GQ ESTIMATES

In parallel with the panel’s work on this study, the Census Bureau has been conducting its own internal research to identify ways of improving the ACS estimates for substate geographies. Its research is focused on the possibility of using data from in-sample GQ facilities to impute person records for group quarters that are not in sample but are either on the ACS sampling frame or known to exist based on information from the 2010 census (Erdman and Nagaraja, 2010). The advantage of an imputation method over other model-based alternatives of producing estimates for the GQ population would be that imputation emulates the ACS data capture approach and enables the “modeled data” to be folded directly into estimates not only of the total population counts but also of the population characteristics. Given that at the time when this report was prepared, the Census Bureau was considering the implementation of the imputation approach, we discuss this approach in a little more detail in this section.

The Census Bureau considered two methods for selecting group quarters for hot-deck imputation of person records and two methods for selecting donors. Both sets of options are described below, followed by the panel’s comments on the proposals.

Selecting Group Quarters for Imputation

The first option for selecting group quarters for imputation is designed to improve representation for each major GQ type by county (before tract), based on the following steps:

For each year and for each large group quarters not in sample, 2.5 percent of the population (expected based on the sampling frame) is imputed.
For each year and for each combination of county and major GQ type on the sampling frame but not in that year’s sample (or among the imputed), one small group quarters is selected at random, with probability equal to the reciprocal of the number of small group quarters of the same major GQ type in the county.
For each small group quarters selected, person records equal to 20 percent of the population (expected based on the sampling frame) are imputed.
Each combination of tract and major GQ type on any year’s sampling frame, but not in any year’s sample (or among any year’s imputed records), is selected.
For each combination of tract and major GQ type above, for each year that the combination exists on the sampling frame, one small group quarters is selected at random, with probability equal to the reciprocal of the number of small group quarters of the same major GQ type in the tract.
For each small group quarters selected, person records equal to 20 percent of the expected population are imputed.

The second GQ selection option is designed to improve the representation of each major GQ type by tract (before county). To accomplish this, the steps described above are repeated, imputing for tracts before imputing for counties, as follows:

For each year and for each large group quarters not in sample, 2.5 percent of the expected population is imputed.
For each year and for each combination of tract and major GQ type on the sampling frame but not in that year’s sample (or among the imputed), one small group quarters is selected at random, with probability equal to the reciprocal of the number of small group quarters of the same major GQ type in the tract.
For each small group quarters selected, person records equal to 20 percent of the expected population are imputed.
Each combination of county and major GQ type on any year’s sampling frame, but not in any year’s sample (or among any year’s imputed records), is selected.
For each combination of county and major GQ type above, for each year that the combination exists on the sampling frame, one small group quarters is selected at random, with probability equal to the reciprocal of the number of small group quarters of the same major GQ type in the county.
For each small group quarters selected, person records equal to 20 percent of the expected population are imputed.
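The single-year part of either selection option (the first three steps of each list) can be sketched as follows. This is an illustrative sketch, not the Census Bureau's implementation: the record keys, the function name, and the use of rounding to turn a percentage into a whole number of person records are all assumptions made for the example. The later multi-year pass over the other geography is not shown.

```python
import random
from collections import defaultdict

LARGE_RATE = 0.025  # share of expected population imputed for each large GQ
SMALL_RATE = 0.20   # share imputed for each selected small GQ


def select_for_imputation(frame, sampled_ids, geo_key, rng=None):
    """One single-year selection pass.

    frame: list of dicts with hypothetical keys 'id', 'county', 'tract',
           'major_type', 'is_large', 'expected_pop'.
    sampled_ids: ids of the facilities in this year's sample.
    geo_key: 'county' or 'tract', the geography being filled in.
    Returns {facility id: number of person records to impute}.
    """
    rng = rng or random.Random()
    to_impute = {}
    covered = set()  # (geography, major type) cells already represented
    small_by_cell = defaultdict(list)

    for gq in frame:
        cell = (gq[geo_key], gq["major_type"])
        if gq["id"] in sampled_ids:
            covered.add(cell)
        elif gq["is_large"]:
            # Every large GQ not in sample receives imputed records.
            to_impute[gq["id"]] = round(LARGE_RATE * gq["expected_pop"])
            covered.add(cell)
        else:
            small_by_cell[cell].append(gq)

    for cell, candidates in small_by_cell.items():
        if cell in covered:
            continue
        # Equal-probability draw: each small GQ has chance 1/len(candidates).
        chosen = rng.choice(candidates)
        to_impute[chosen["id"]] = round(SMALL_RATE * chosen["expected_pop"])
    return to_impute
```

The two options differ only in which geography is passed as `geo_key` for this pass and which is left for the subsequent multi-year pass.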

Selecting Donors for Imputation

The Census Bureau also considered two options for selecting GQ residents with completed interviews who could serve as donors for the imputation. One option is to choose from within specific GQ type (when the donor-to-recipient ratio is reasonable) and give preference to donors from facilities that are geographically close. The donor pool is set to the first combination of geography and GQ type in which there is at least one donor per five imputed records, from the list of combinations below:

County and specific type
County and major type
State and specific type
State and major type
Division and specific type
Division and major type
Region and specific type
Region and major type
Specific type without restriction
Major type without restriction
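The expanding search over the list above can be sketched as a loop that stops at the first combination meeting the one-donor-per-five-imputed-records threshold. The record keys and function name below are hypothetical, and the sketch assumes that geographic codes are unique (for example, that a county code is not reused in another state).

```python
# Search order from the list above: (geography level, GQ type level).
# None means no geographic restriction.
SEARCH_ORDER = [
    ("county", "specific"), ("county", "major"),
    ("state", "specific"), ("state", "major"),
    ("division", "specific"), ("division", "major"),
    ("region", "specific"), ("region", "major"),
    (None, "specific"), (None, "major"),
]


def find_donor_pool(recipient, donors, n_imputed, order=SEARCH_ORDER):
    """Return ((geo, type_level), pool) for the first level with enough donors.

    recipient and each donor are dicts with hypothetical keys 'county',
    'state', 'division', 'region', 'specific', and 'major'.
    A level qualifies when it has at least 1 donor per 5 imputed records.
    Returns (None, []) if no level qualifies.
    """
    for geo, type_level in order:
        pool = [
            d for d in donors
            if d[type_level] == recipient[type_level]
            and (geo is None or d[geo] == recipient[geo])
        ]
        if 5 * len(pool) >= n_imputed:
            return (geo, type_level), pool
    return None, []
```

Because the sequence is passed in as a parameter, a different collapsing order can be tried by supplying another list of pairs.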

Another option for donor selection is to apply a K-means clustering algorithm that selects donors from tracts that are demographically similar. The Census Bureau identified eight demographic clusters of tracts as part of the marketing campaign for the 2010 census, taking into consideration tract characteristics, such as vacancy rates, housing unit type, family structure, poverty rate, employment rate, and others (Bates and Mulry, 2008). The clusters are as follows:

All around average I (homeowner skewed)
All around average II (renter skewed)
Economically disadvantaged I (homeowner skewed)
Economically disadvantaged II (renter skewed)
Ethnic enclave I (homeowner skewed)
Ethnic enclave II (renter skewed)
Single/unattached/mobiles
Advantaged homeowners

Under this option, the clusters above guide the donor selection process. The procedure involves grouping the group quarters selected for imputation by cluster and type. If there is at least 1 donor per 5 imputations needed, donors are selected at random from within cluster and specific type. If this approach does not yield at least 1 donor per 5 imputations needed, the subtypes of clusters (i.e., I and II) are collapsed.
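A rough sketch of the cluster-based procedure follows. It is illustrative only: the record keys and function names are hypothetical, donors are drawn with replacement as a simplifying assumption, and the sketch simply uses whatever the collapsed pool offers, since the text does not say what happens if collapsing still yields too few donors.

```python
import random


def base_cluster(name):
    """Collapse subtypes, e.g. 'Ethnic enclave I (homeowner skewed)' -> 'Ethnic enclave'."""
    label = name.split(" (")[0]
    for suffix in (" II", " I"):
        if label.endswith(suffix):
            return label[: -len(suffix)]
    return label


def select_cluster_donors(cluster, specific_type, donors, n_imputed, rng=None):
    """Draw n_imputed donors at random from within cluster and specific type.

    donors: list of dicts with hypothetical keys 'cluster' and 'specific'.
    If the pool has fewer than 1 donor per 5 imputations needed, the
    I and II subtypes of the cluster are collapsed before drawing.
    """
    rng = rng or random.Random()
    pool = [
        d for d in donors
        if d["cluster"] == cluster and d["specific"] == specific_type
    ]
    if 5 * len(pool) < n_imputed:
        pool = [
            d for d in donors
            if base_cluster(d["cluster"]) == base_cluster(cluster)
            and d["specific"] == specific_type
        ]
    if not pool:
        return []
    # Assumption: donors may be reused (sampling with replacement).
    return [rng.choice(pool) for _ in range(n_imputed)]
```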

Evaluation of the Imputation Methodology

The Census Bureau compared the imputation methods proposed and the current design-based ACS method using a GQ population simulated based on census 2000 data, using estimates of age, sex, race, and Hispanic origin for comparison (Erdman and Nagaraja, 2010). From this population, 25 independent ACS samples were generated, and each of the imputation procedures was tested on the simulated samples. The results of the two methods for selecting facilities for imputation were comparable. For donor selection, the expanding geographic search performed better than the cluster approach. The results of the imputation methods were systematically biased even at the state level, but the variances of the imputed estimates were smaller than variances of the estimates from the design-based method. Regardless of the method used, close to half of the augmented data consisted of imputed records, and in the case of some major GQ types, well over half of the records were imputed.

Table 6-9 shows that the number of imputed persons is around half overall, but it is particularly high for some group quarter types, such as “other long-term care” facilities.

Overall, 86 percent of imputations come from the same specific GQ type as the recipient, and 69 percent come from within the same county, although the results for geography vary greatly by type (see Table 6-10).

TABLE 6-9 Survey Respondent and Imputed Record Counts by Major GQ Type for 5-Year Estimates

Major GQ Type	Number of Respondents (a)	Number of Imputed Persons (b)	Percentage of Imputed Persons (b/(a+b))	Number of Respondents Who Are Donors
Correctional facilities for adults	236,946	132,931	35.9	87,242
Juvenile facilities	17,139	23,031	57.3	10,787
Nursing facilities/skilled nursing facilities	185,109	155,511	45.7	101,381
Other institutional facilities	7,331	28,582	79.6	6,883
College/university student housing	173,121	167,865	49.2	102,532
Military group quarters	25,416	30,325	54.4	16,530
Other noninstitutional facilities	84,322	177,700	67.8	67,879
Total	729,384	715,945	49.5	393,234

SOURCE: U.S. Census Bureau (2011i).

Based on the simulation study using census 2000 data, several changes were made to the imputation methodology:

Taking account of sex when selecting donors for GQ facilities that have been preidentified as single-sex facilities.
Adjusting the expected GQ populations based on an algorithm that applies observed population changes to the unobserved group quarters.
Restricting imputation for GQs with seasonal residence patterns.
Limiting the number of times a person can be used as a donor in a tract.

A second evaluation was conducted using the expanding search method emphasizing county coverage, based on ACS data from 2006 through 2010, so that the effects of the imputation could be evaluated on the full range of estimates produced by the ACS. Examining the impact of the imputation on state-level estimates revealed that the imputation-based estimates were relatively consistent with the design-based estimates. Smaller states, especially Delaware, Idaho, Maine, and Wyoming, tended to have more of the estimates flagged as different. Larger differences were observed for “other long-term care” and “other noninstitutional” categories, which were also the GQ types with the higher imputation rates.

Limitations of the Imputation Method

The imputation methods are largely dependent on the quality of the sampling frame. In other words, reliable information is necessary about the GQ facilities that are not in sample, including their type and number of residents. Otherwise, the shortcomings described in earlier sections related to the GQ frame could result in scenarios in which data are imputed into facilities that no longer exist. The panel thinks that improvements to the GQ sampling frame are essential to ensure the success of the imputation approach.

TABLE 6-10 Donor Sources for Imputed Records (in percentage)

Major GQ Type (and number of imputed records)	Donor in Same Specific Type	Donor in Same Tract	Donor in Same County (not tract)	Donor in Same State (not county)	Donor Outside of State	Total Proportion of Donors
Correctional facilities for adults (132,931)	FALSE	3.9	9.3	0.9	0.0	14.2
Correctional facilities for adults (132,931)	TRUE	40.5	15.5	25.4	4.5	85.8
Juvenile facilities (23,031)	FALSE	1.7	13.8	17.7	1.9	35.1
Juvenile facilities (23,031)	TRUE	5.9	11.5	44.9	2.4	64.9
Nursing facilities/skilled nursing facilities (155,511)	FALSE	—	—	—	—	—
Nursing facilities/skilled nursing facilities (155,511)	TRUE	7.2	75.4	17.1	0.3	100.0
Other institutional facilities (28,582)	FALSE	1.0	9.4	27.9	6.9	45.1
Other institutional facilities (28,582)	TRUE	4.6	6.2	32.8	11.3	54.9
College/university student housing (167,875)	FALSE	—	—	—	—	—
College/university student housing (167,875)	TRUE	37.1	46.2	13.4	3.2	100.0
Military group quarters (30,325)	FALSE	3.3	3.5	3.1	0.3	10.3
Military group quarters (30,325)	TRUE	40.9	18.8	15.6	14.5	89.7
Other noninstitutional facilities (177,700)	FALSE	0.4	22.5	8.7	0.2	31.8
Other noninstitutional facilities (177,700)	TRUE	1.2	33.3	30.0	3.7	68.2
All GQ Types (715,945)	FALSE	1.1	8.3	4.1	0.4	13.9
All GQ Types (715,945)	TRUE	20.2	39.8	22.4	3.7	86.1

NOTE: Nursing homes and college dormitories only have one specific type (see Box 1-1). SOURCE: U.S. Census Bureau (2011i).

The success of the item imputation plans also depends on the quality of the donors. Some of the data associated with the donor cases are also imputed due to item nonresponse, which, in essence, translates into “double imputation.” The item imputation rates in the GQ data are higher than in the household data and are particularly high for the income questions (see Table 6-11). Item imputation rates also vary by state (see Table 6-12). To the panel’s knowledge, the effects of the double imputation on the data have not yet been evaluated.

Panel Observations on the Imputation Plans

The Census Bureau’s plans to impute nonsample GQ person records are in line with the panel’s view that GQ estimates can be produced based on alternatives to a design-based weighting approach. The proposed method allows for the creation of a microdata file with all characteristics included that could also serve as the basis for a Public Use Microdata Sample (PUMS) file and would be valuable to data users. By contrast, small-area estimation would involve constructing separate estimates for group quarters, which would then be combined with the household estimates to obtain total population estimates. Moreover, person-level imputation would not need to be performed for the GQ types that are moved to the housing unit sample (see Recommendation 4-7), which also has the advantage of reducing the volume of records imputed.

We discuss below some refinements to the Census Bureau plans presented to the panel. We also make recommendations for additional research that could inform the direction of this work in the future.

There are several alternatives that could be explored to evaluate methods for identifying donors. One concern is that donors are pulled from multiple group quarters in order to impute for a recipient GQ. This does not reflect the natural intraclass correlation that occurs within a GQ facility, but it could nevertheless produce unbiased estimates of descriptive statistics. The variance of the imputation procedure could, in fact, be lower this way. If more complex statistics—having to do with the relationships of variables among persons in the same group quarters—were of interest, then the imputation method could be biased. Another issue is that the imputation model assumes that all GQ cases, in each cell, have the same mean or are, in some sense, exchangeable. This may not account for other important covariates.

TABLE 6-11 Item Imputation Rates (in percentage) for Selected Characteristics by GQ Type, 2005-2009 American Community Survey

Major GQ Type	Sex	Age	Race	Hispanic Origin	One or More Income Source	Marital Status	Citizenship	Speaks Another Language at Home	Mobility Status	Veteran Status
Total GQ population	0.2	1.1	2.5	3.1	37.9	5.0	5.7	10.7	7.7	10.2
Correctional facilities for adults	0.2	0.5	1.5	2.3	27.0	6.6	3.4	11.7	8.1	9.1
Juvenile facilities	0.3	3.2	2.0	2.8	25.4	3.0	5.4	10.0	7.8	7.7
Nursing facilities/skilled nursing facilities	0.2	1.2	0.7	1.6	63.4	3.3	6.0	9.5	5.6	13.1
Other institutional facilities	0.3	11.4	1.6	4.2	44.2	6.6	11.0	14.5	10.1	15.1
College/university student housing	0.1	0.8	5.4	5.4	28.8	5.8	7.6	12.4	9.2	10.5
Military group quarters	0.0	0.3	2.4	2.1	16.7	2.0	4.3	6.9	6.3	2.1
Other noninstitutional facilities	0.2	1.3	1.4	2.1	43.1	4.3	5.5	8.3	7.3	9.5
2005 household population	0.2	0.8	1.6	1.5	18.0	5.4	1.6	1.7	2.1	2.1

NOTE: The 2005 American Community Survey did not include group quarters. SOURCE: Beaghen (2011).

TABLE 6-12 Item Imputation Rates (in percentage) for Selected Characteristics of the GQ Population by State, 2005-2009 American Community Survey

State	Sex	Age	Race	Hispanic Origin	One or More Income Source	Marital Status	Citizenship	Speaks Another Language at Home	Mobility Status	Veteran Status
Alabama	0.2	0.7	0.8	2.7	28.7	4.6	3.4	6.0	5.8	7.5
Alaska	0.2	0.5	1.2	0.8	11.1	2.7	3.3	3.7	3.4	3.6
Arizona	0.1	0.5	3.1	3.3	30.0	7.4	6.0	9.0	7.2	9.8
Arkansas	0.0	1.0	2.3	1.2	38.4	2.3	4.6	6.4	5.4	7.9
California	0.2	1.0	3.6	2.8	36.6	7.5	8.3	12.3	7.8	12.8
Colorado	0.0	0.4	2.0	2.3	44.1	3.7	4.2	22.1	17.9	23.4
Connecticut	0.1	0.6	4.2	4.1	48.1	4.8	8.9	13.3	10.3	9.0
Delaware	0.0	0.3	1.5	3.0	37.6	2.0	11.0	15.6	14.9	8.7
District of Columbia	0.1	2.6	3.5	3.9	48.2	10.0	12.8	20.3	20.7	27.8
Florida	0.3	1.0	2.0	3.7	32.9	5.6	7.3	11.3	9.1	12.0
Georgia	0.3	0.4	1.0	1.4	22.1	1.9	2.0	4.1	2.8	4.1
Hawaii	0.1	1.0	1.4	1.9	29.8	1.1	1.8	4.6	2.8	7.6
Idaho	0.1	0.6	0.3	0.5	18.5	0.3	1.1	5.1	1.6	3.1
Illinois	0.2	1.5	2.7	3.4	41.4	3.4	4.5	9.9	5.2	10.3
Indiana	0.3	1.4	1.0	1.7	44.4	5.7	7.6	14.0	11.6	15.7
Iowa	0.1	0.8	3.3	3.9	49.7	5.6	6.5	9.8	7.0	10.7
Kansas	0.2	0.5	3.8	4.2	45.2	4.5	5.0	10.0	7.3	10.1
Kentucky	0.1	0.6	1.3	1.7	34.5	2.4	3.8	6.8	5.2	7.9
Louisiana	0.1	1.6	0.5	2.4	34.5	3.1	2.2	6.7	6.8	8.2
Maine	0.0	0.3	4.9	6.4	41.3	11.9	13.3	19.2	13.0	18.1
Maryland	0.3	0.6	2.7	3.1	36.0	4.3	8.4	14.7	12.7	16.3
Massachusetts	0.1	0.9	4.2	4.7	50.8	9.3	13.6	19.8	12.4	16.9
Michigan	0.1	0.6	1.1	1.5	33.9	2.8	2.7	4.5	3.5	6.3
Minnesota	0.1	0.6	2.1	2.8	52.6	2.8	4.6	7.4	5.1	8.4
Mississippi	0.2	0.5	0.4	1.1	28.4	2.0	2.8	5.9	4.6	6.7
Missouri	0.1	1.5	0.4	0.9	38.0	1.8	1.1	3.3	1.8	4.6
Montana	0.0	0.4	0.4	0.6	40.6	2.3	1.4	3.6	2.9	3.7
Nebraska	0.0	0.9	1.2	0.9	45.7	2.0	1.6	3.4	1.3	4.7
Nevada	0.2	0.5	1.3	0.6	23.4	1.7	1.5	3.1	1.7	2.3
New Hampshire	0.1	1.2	2.3	3.9	42.7	3.7	3.4	7.1	3.8	8.6
New Jersey	0.3	0.9	2.0	3.7	50.4	3.6	6.8	17.4	13.6	16.0
New Mexico	0.0	2.6	2.2	2.5	35.6	2.9	4.4	13.2	7.4	9.8
New York	0.3	2.5	6.1	6.7	42.8	7.6	9.4	15.4	10.0	12.8
North Carolina	0.1	0.8	1.4	2.1	34.7	4.1	3.5	7.8	5.7	9.0
North Dakota	0.0	0.1	0.6	0.3	41.7	0.8	1.7	3.2	2.3	3.6
Ohio	0.0	0.5	0.9	1.5	38.3	2.1	2.9	5.1	2.4	6.9
Oklahoma	0.1	1.1	2.9	3.5	36.2	3.8	2.5	7.0	5.0	7.3
Oregon	0.4	3.7	0.6	1.2	26.4	2.0	2.3	3.3	2.9	5.5
Pennsylvania	0.2	1.3	4.0	4.9	46.7	12.3	6.5	19.0	17.8	11.4
Rhode Island	0.0	0.3	5.9	9.9	37.0	10.5	11.9	13.3	12.3	14.1
South Carolina	0.1	0.2	1.1	3.0	31.2	2.4	4.8	7.8	6.9	10.0
South Dakota	0.0	1.4	0.7	0.7	34.7	0.7	0.9	2.2	2.0	2.3
Tennessee	0.0	1.0	1.3	1.9	36.3	4.0	5.2	9.8	9.0	11.3
Texas	0.2	1.2	1.3	2.0	29.2	3.0	3.7	5.8	4.9	6.8
Utah	0.1	1.7	1.2	1.2	27.1	1.5	5.1	8.6	6.9	4.1
Vermont	0.1	0.1	5.8	7.1	37.6	6.2	6.7	12.2	9.7	13.5
Virginia	0.2	0.7	2.1	2.7	44.0	4.5	4.9	19.8	5.5	8.3
Washington	0.1	1.2	1.3	2.1	33.8	5.7	6.2	8.2	6.4	9.4
West Virginia	0.0	1.0	3.0	4.5	44.8	4.6	4.8	11.1	9.6	13.2
Wisconsin	0.2	0.4	3.7	3.6	38.8	4.9	4.8	8.4	6.3	9.5
Wyoming	0.0	2.5	3.1	3.1	43.2	3.7	3.5	9.9	8.2	8.0
Puerto Rico	0.1	0.3	0.7	0.8	27.5	1.4	0.4	0.6	0.8	1.9

SOURCE: Beaghen (2011).

In the case of the donor selection procedure that prioritizes donor pools based on geographic proximity, it is not clear that the sequence of combinations of geography and GQ type proposed is the best or only option. For example, a sequence of combinations of geography and GQ type that collapses geographic areas before GQ type could be considered, as follows:

County and specific type
State and specific type
County and major type
State and major type

A classification algorithm may be useful in exploring this further using 2010 census data or frame data. For example, a regression tree could be used within either a specific or a major GQ type to model the number of persons in a facility with a specific characteristic. In the case of such characteristics as disability, the predictors could be dummy variables for tracts, dummy variables for county, number of persons in different age ranges, number of persons by educational attainment, and so on. The predictors selected would have to be variables that are available on the sampling frame of group quarters or could be tabulated by group quarters based on the census. The hierarchy created by the tree could be used in deciding which variables are the most effective predictors of disability (or other analytic variables). The results would then guide the order of collapsing of group quarters.

Another option to consider for the imputation would be to identify GQ facilities rather than GQ residents to serve as donors. In this case, a block of persons from the donor group quarters would be assigned to the recipient group quarters. This would more closely reflect the population structure that exists within a GQ facility, although it would probably increase variances of some descriptive statistics because of the imputation of correlated observations.

In the case of the cluster approach to donor selection, the initial clusters formed for census marketing purposes and based on household data were not ideally suited to evaluate this method of donor selection. This approach should be evaluated based on clusters formed for this purpose, from 2010 census GQ data.

The Census Bureau’s test of the proposed imputation procedures using 25 simulated samples generated based on census data (Erdman and Nagaraja, 2010) should be repeated on a larger scale. It is possible that a test performed on a larger number of samples will be able to reveal more differences between the imputation-based and the design-based estimates.

Recommendation 6-4: The Census Bureau’s research on imputing group quarters (GQ) person records in the American Community Survey should further investigate the possibility of using a donor selection procedure that deemphasizes geographic proximity in relation to matching by GQ type, trying out alternatives to the proposed sequence of collapsing the combinations of geography and GQ type. The possibility of using a cluster approach to donor selection should be reevaluated using clusters formed for this purpose based on GQ data from the 2010 census. The Census Bureau should also expand its simulation study of imputation methods to include a sufficiently large number of samples capable of revealing significant differences between the imputation-based and the design-based estimates.

Finally, the concerns related to the double imputation, resulting from the fact that many of the donor cases themselves have imputed data, raise a broader question about whether the GQ questionnaires could be revised to better reflect the ways group quarters differ from households. The questionnaire currently used to collect data from the residents of group quarters is very similar to the data collection instrument used for the housing unit sample, except that the questions about the physical and financial characteristics of the household are not asked of GQ residents. The GQ questionnaire has not been customized further, in part because it is operationally more efficient to maintain as much overlap between the two forms as possible. However, the Census Bureau currently imputes one or more sources of income for 38 percent of the GQ population, compared with an 18 percent imputation rate for this question for the household population (Asiala, 2011). Another item with much higher imputation rates among GQ residents is the question about the language spoken at home (10.7 percent for GQs compared with 1.7 percent for households), presumably because the concept of “at home” is not as straightforward for people who may be living in a GQ facility for the long term or permanently as it is for those who live in households.

The high item imputation rates in the case of some of the questions asked of GQ residents warrant a closer look at whether the questionnaire in its current form is appropriate for the GQ population, particularly the institutional population. The Census Bureau should conduct an assessment of the reasons for the high item imputation rates and the need for revisions to the questionnaire, possibly conducting cognitive interviews with GQ residents living in different GQ types, and an analysis of the impacts of the revisions on both data quality and ability to meet data user needs. Customizing the questionnaire would reduce the burden on GQ respondents, which is likely to have a positive impact not only on the questions that have high imputation rates but also on other questions, which may be affected by cognitive shortcuts taken by respondents as a result of the less than optimal questionnaire design. Dropping or revising the questions with high item imputation rates will also greatly reduce double imputation, if the individual record-level approach is implemented for group quarters.

Recommendation 6-5: The Census Bureau should evaluate the possibility of customizing by group quarter (GQ) type the American Community Survey questionnaire for the GQ population with the goal of reducing item imputation rates, improving data quality, and reducing the burden on the GQ respondents who are required to answer questions that are not applicable to their circumstances. Changes to consider should include omitting or revising some of the questions on the GQ questionnaire for some types of group quarters.