4
Innovation in Design and Data Collection

The Census Bureau’s plans for the redesigned Survey of Income and Program Participation (SIPP) have three primary elements. The first, to make greater use of administrative data to improve data quality, is discussed in Chapter 3. The second, to improve the processing system for SIPP, involves converting a computer-assisted personal interview (CAPI) survey instrument that is currently implemented in an obsolete survey questionnaire programming language to the widely used Windows-based BLAISE survey programming language (see http://www.blaise.com/?q=ShortIntroduction). Moreover, the Census Bureau is converting the postinterview data processing system from Fortran to SAS and is improving the documentation of SIPP data editing and imputation procedures. The panel commends the Census Bureau’s efforts in these important undertakings. The panel has the general belief that these are worthwhile, constructive steps, but they were outside the scope of the panel’s review. Hence, the panel says nothing further about them.

The third element is to change SIPP from its current structure, in which interviews are conducted every 4 months for each of four staggered rotation groups (thus ensuring a uniform month-by-month workload for SIPP interviewers), to an annual interview making use of an event history calendar (EHC) to document intrayear changes in demographic and economic circumstances. Regularly scheduled topical modules will no longer be included in the redesigned SIPP, although some prior topical module content will be incorporated into the primary survey instrument, and federal agencies may pay for supplemental questions to be asked between annual interviews.








SIPP sample members will be followed for 3 to 4 years, but, following SIPP practice since 1996, the panels will not overlap.

The first part of this chapter discusses concerns about moving SIPP to a nonoverlapping annual survey that relies on EHCs to develop month-to-month information on households. The remainder of the chapter discusses several additional issues related to SIPP design features (length and frequency of interviews, length and overlap of panels), content, timeliness, and budget that the panel thinks are important.

One feature of SIPP that the panel does not discuss is the sample size and design. The current design (see Chapter 2), which oversamples low-income populations based on the previous census, has been in use beginning with the 1996 panel, and sample sizes have been what the SIPP budget could afford. While data users would always prefer additional sample, SIPP users have found the sample sizes of recent SIPP panels (see Table 2-1) to be adequate for most purposes. The design, although not state representative, includes cases in every state (most of which are identified on the public-use microdata files) so that researchers can take account of differences in state tax and transfer program rules in their analyses. Ordinarily, the design would next be revised based on the 2010 census; however, that census will not include a long-form sample with data on income and other socioeconomic characteristics. Instead, the continuous American Community Survey (ACS) now provides that information (beginning in 2005). It will be necessary to redesign the SIPP sample to use the ACS, but it is our understanding that the ACS will not be available until 2012 for this purpose. As the ACS is relatively new and the shape of the reengineered SIPP is not finalized, the panel thinks it would be premature to comment on sample design issues.

EVENT HISTORY CALENDARS

As emphasized throughout this report, a unique feature of SIPP is its capacity to measure short-run dynamics. Monthly data on incomes, employment, program participation, health insurance coverage, and demographic characteristics of the household allow analysts to study transitions into marriage and divorce, transitions into and out of poverty, and transitions in health insurance coverage, at a monthly frequency. Monthly data also make SIPP particularly well suited for assessing eligibility for major transfer programs, since program rules typically depend on economic and demographic characteristics in the month or months prior to application. Studies of program take-up require careful calculations of eligibility (the denominator of the take-up rate) and high-quality measures of program participation (the numerator of the take-up rate). Studies of short-run dynamics are impossible with other nationally representative data sets, and studies of take-up are badly flawed if the period reflected in the data does not align with the period over which program eligibility is assessed. In short, the monthly time frame is essential for many of the applications that use SIPP data. The Census Bureau’s plans to move SIPP to an annual survey, filling in intrayear dynamics using EHCs, potentially affect (perhaps positively, perhaps negatively) SIPP’s single most important feature.

What Is an Event History Calendar?

An EHC interview is centered on a customized calendar that shows the reference period under investigation (1 year in the case of the reengineered SIPP). The calendar contains time lines for different domains, for example, residence history, household composition, work history, and other areas that might be helpful in aiding the respondent’s memory. As discussed in Belli (1998), in an EHC, “respondents are encouraged to consider various events that constitute their personal pasts as contained within broader thematic streams of events. Not only can respondents note the interrelationship of events within the same themes (top-down and sequential retrieval) but, depending on which themes are represented by the calendar, respondents can also note the interrelationships among events that exist within different themes (parallel retrieval).” Put more concretely, if respondents tend to remember life events as “I lost my job a month after having my second baby,” interview accuracy may improve if respondents are allowed to connect these events in calendar time, rather than reporting births in a household roster and job changes later in the interview in an employment section of the questionnaire.

Another potential advantage of the EHC approach, if it proves capable of generating high-quality monthly data, is that the first year of income data could be collected with no added sample attrition beyond the loss of households that refuse to participate in the survey at all. This is a substantial potential advantage relative to the data conventionally collected in SIPP. Under the current design, annual income must be aggregated across four waves in order to have a common 12-month reference period for the four rotation groups. Annual income for the first calendar year of the conventionally collected SIPP panel requires data through Wave 4, which will be affected by three waves of attrition beyond the initial sample loss at Wave 1. In the 2004 SIPP panel, the cumulative sample loss after four waves was 28 percent compared with a Wave 1 nonresponse rate of 15 percent (from information provided by Census Bureau staff; see also Table 2-1 in Chapter 2).

Several ongoing surveys make use of EHCs for at least a portion of their survey content, including the Panel Study of Income Dynamics (PSID), the 1997 National Longitudinal Survey of Youth (NLSY97), the Los Angeles Family and Neighborhood Survey, and the British Panel Survey. In December 2007, leaders at the Census Bureau and the Panel Study of Income Dynamics convened a conference of survey design experts and other scholars knowledgeable about event history methodology to learn from and improve their plans (see http://psidonline.isr.umich.edu/Publications/Workshops/ehc-07papers.html). The panel commends the Census Bureau for sponsoring this conference and reaching out to additional experts in this methodology.

The Census Bureau and Panel Study of Income Dynamics conference highlighted many of the reasons the Census Bureau is envisioning that an event history methodology may play a key role in the reengineered SIPP’s efforts to reduce burden on respondents, reduce program costs, improve accuracy, and improve timeliness and accessibility. Belli (2007) noted that EHCs are “expected to provide advantages to data quality by encouraging respondents to use idiosyncratic cues available in the chronological and thematic structures of autobiographical memory.” Fields and Moore (2007) noted that the approach may mitigate missing or erroneous responses by developing timelines for major life events. In particular, the EHC can gather information on overlapping events (such as multiple transfer program participation) or nonoverlapping events (such as a succession of jobs). Moreover, the status at the end of a previously reported calendar year could, in principle, be preloaded to help control seam problems (subject to the respondent being able to override the prior response). If a single annual EHC interview could replace three conventional interviews gathering retrospective information from the prior 4 months, the cost savings could be significant.

There is considerable evidence that the event history methodology can be used successfully to identify demographic changes to a household (the arrival and departure of children, spouses, and other family members) and to identify employment transitions. Both types of events are generally regarded as major life transitions, and it is perhaps not surprising that calendar time may be a convenient way to elicit accurate recall of major life transitions. It is less clear, however, that recall over a 12-month period will be similarly precise for potentially less consequential life events, such as whether a household received benefits from the Special Supplemental Nutrition Program for Women, Infants and Children (WIC) 11 months earlier or when a pay raise occurred for a household member.

The panel is not aware of conclusive evidence that a 12-month EHC framework is capable (or not) of generating accurate information on program participation and income. The Census Bureau recently presented preliminary results from a 2008 paper test of the EHC approach, discussed below, in which it claimed success for the EHC. However, these results (also discussed below) were limited in scope and showed a mixed picture with regard to the ability of the EHC to accurately capture monthly income.

Several passages in the papers prepared for the Census Bureau’s EHC conference highlighted the uncertainty associated with the approach. Sastry, Pebley, and Peterson (2007:20), writing about the Los Angeles Family and Neighborhood Survey, conclude, “we recommend keeping the period covered by the EHC to a minimum and only using it to collect information on domains and topics that are difficult to collect using standard question-list approaches.” Callegaro and Belli (2007) suggest that the EHC approach may reduce seam bias, but they also expect that the magnitude of the seam effect will increase when moving from quarterly to yearly data collection. In a different paper, Belli (2007:13), writing about an experimental subsample of the PSID, finds “with program participation, the [conventional questionnaire] showed consistent advantages in reports among disadvantaged groups in comparison to the event history calendar for the timing of receipt of benefits during 1996.” Pierret and colleagues (2007:28), writing about the NLSY97, note: “one decision that we have made is not to collect all details on every spell for every event history. This decision reflects our experience that many respondents have difficulty recalling details of events that occurred far in the past and lasted a very short time.” This conclusion is troubling for the proposed changes to SIPP, since the interview time frame for the NLSY97, like the reengineered SIPP, is 1 year.

Testing the EHC Approach for SIPP

The lack of evidence about the ability of an EHC to collect monthly data on the many topics that are covered in SIPP places considerable pressure on the Census Bureau.
Not only must the bureau design an effective pretesting program for the EHC methodology, but it must also make its survey reengineering plans for SIPP sufficiently flexible so that it can modify its plans if the pretesting reveals unanticipated, negative evidence on the likely success of the proposed methodology.

Paper EHC Test

The Census Bureau administered a paper test of the EHC approach that was completed in June 2008. This test was designed primarily to give the bureau a relatively quick “go/no-go” signal for continued investment in further development of an automated instrument and larger scale testing. The sample for this test was drawn from 2004 SIPP panel participants from Illinois and Texas. A point of emphasis in the paper test was on designing and administering the EHC instrument. Given this, professionals from the Census Bureau and the Office of Management and Budget observed a large number of paper test interviews. Assessments from observation reports and field representative debriefing reports, in addition to comparisons of estimates from the standard SIPP and EHC questionnaires and comparisons with administrative records for selected programs, will be obtained, with the goal of furthering knowledge about the overarching question: Can the EHC methodology produce data of similar quality to that of the standard SIPP interview?

The Census Bureau recently presented preliminary findings from the 2008 paper test based on comparing aggregate reports of selected income sources and other characteristics from the standard SIPP questionnaire and the EHC questionnaire for 1,620 cases that completed both types of questionnaires (Moore et al., 2009). The results are both promising and disquieting. For SSI and WIC (Illinois only), the aggregate estimates of recipients track very closely for the months of January-December 2007. For Medicare, Social Security, WIC (Texas only), and food stamps (Illinois only), aggregate estimates of recipients show the same patterns over the 12-month period, but the EHC levels are significantly lower than the standard questionnaire levels (by several percentage points for Medicare, for example). For food stamps (Texas only), employment, and school enrollment, the trends in monthly aggregates differ between the standard and EHC questionnaires; for example, the standard questionnaire aggregates are several percentage points higher than the EHC aggregates in January-September 2007 and about the same as the EHC aggregates in the rest of the year. No results have been presented as yet on comparisons of benefit amounts, on the extent to which the standard and EHC responses track across time on an individual respondent basis, or on comparisons with administrative records, which will involve the entire test sample, including SIPP participants who were cut from the sample in 2006 and so did not respond to the standard SIPP questionnaire for 2007.

The panel commends the Census Bureau for conducting this paper test of the EHC methodology.
It undoubtedly will provide valuable information on ways to administer the calendars in the context of a comprehensive national survey. Moreover, it will provide the first available information on the ability of households to recall spells of program participation and amounts of monthly income. Nevertheless, an extensive program of design and research must be conducted to assess the EHC approach. We describe a set of unresolved issues below.

First, more needs to be learned about how data collection mode affects content. The paper test, of course, uses a different mode than the BLAISE-based computer-assisted interviewing that is envisioned for the reengineered SIPP. There is evidence in some contexts that survey mode (e.g., paper versus computer) has relatively minor effects on survey responses in some domains (see Carini et al., 2003), but that respondents tend to prefer computer-based applications. If so, particularly for a long, time-intensive survey like SIPP, the paper test may understate the ability of the EHC approach to elicit accurate information if people are put off by the structure of the paper test. Alternatively, the specially trained interviewers for the paper test may aid respondents in a manner that would not occur for the reengineered SIPP. At a minimum, the discrepancy between the paper test and the actual data collection mode that will be used raises one concern about the value of the paper test results.

Second, samples used for a test of the EHC approach need to be large enough to generate reliable results. To give a sense of the sampling difficulties that EHC tests face, consider the following: in 2006, about 8.9 percent of the U.S. population received food stamp benefits, whereas only about 2.4 percent received Supplemental Security Income (SSI) benefits and only about 1.6 percent received Temporary Assistance for Needy Families (TANF) benefits (Assistant Secretary for Planning and Evaluation, 2008:Tables IND 3a, 3b, 3c). Given these figures, serious tests of the EHC need large samples to ensure there are a substantial number of respondents receiving TANF benefits, SSI benefits, or food stamps. This can be done by making appropriate power calculations and then drawing appropriately sized test samples, perhaps augmented by oversamples of program recipients drawn from administrative records. If too few program participants are in an EHC test sample, it will be extremely difficult for the Census Bureau to assess whether the EHC can provide accurate month-to-month information on program participation for sampled individuals. The problem is even more acute if the test is to provide useful information on multiple program participation, since even smaller fractions of the population will simultaneously participate in more than one program. Facilitating accurate analysis of program participation is one of the central goals of SIPP.

Tests of the EHC face another sample-related concern.
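
Before turning to that second concern, the receipt rates cited above can be turned into a rough sample-size sketch. The Python below is illustrative only and is not the Census Bureau's method: it assumes a simple random sample drawn at the national 2006 rates (actual SIPP-based test samples oversample low-income households, so real counts would differ), the target of 200 recipients is a hypothetical threshold, and the function names are invented for this example. The last function uses a standard normal-approximation formula for comparing two independent proportions, which is only one of several ways a power calculation could be set up.

```python
import math
from statistics import NormalDist

# Receipt rates per 1,000 people in 2006, as quoted in the text.
# Stored as integers so the ceiling division below stays exact.
RATE_PER_1000 = {"food stamps": 89, "SSI": 24, "TANF": 16}


def expected_recipients(sample_size, rate_per_1000):
    """Expected number of recipients in a simple random sample."""
    return sample_size * rate_per_1000 / 1000


def sample_size_for_count(target_recipients, rate_per_1000):
    """Smallest simple random sample expected to contain the target count."""
    return -(-target_recipients * 1000 // rate_per_1000)  # ceiling division


def per_group_size(p1, p2, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group needed to detect a
    difference between proportions p1 and p2 with a two-sided test on
    two independent samples (a hypothetical design, not the paper test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2)


if __name__ == "__main__":
    # 1,620 is the number of paper-test cases that completed both
    # questionnaires; 200 is an assumed target, chosen for illustration.
    for program, rate in RATE_PER_1000.items():
        print(program,
              expected_recipients(1620, rate),
              sample_size_for_count(200, rate))
    # Cases per group to detect a 2-point shortfall in food stamp receipt.
    print(per_group_size(0.089, 0.069))
```

Under these assumptions, a sample the size of the paper-test comparison group would be expected to contain only a few dozen SSI or TANF recipients, which is the kind of shortfall that oversampling from administrative records is meant to address.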
The Census Bureau needs to have some benchmark that it can use to assess the quality of EHC responses. Two possibilities suggest themselves. First, the test results can be matched against administrative data. The Census Bureau is pursuing this approach. The paper test includes matching the survey results to data drawn from administrative records on program receipt in Illinois and Texas. This raises the question mentioned above: Are SIPP samples from Texas and Illinois large enough to provide a reasonable assessment of the EHC approach? In addition, can results for a sample from Texas and Illinois be generalized to the U.S. population? A subsequent “electronic prototype” test, described below, will add more states to the evaluation, which is a positive step forward. The second benchmark would be to field an EHC-based survey concurrently with a traditional SIPP survey, allowing for immediate comparisons of the two approaches. We say more about this possibility below.

A third unresolved issue has to do with the effects of an EHC approach on seam bias and sample attrition. As described in Chapter 2, a major issue for the traditional SIPP is that too many transitions (on and off programs, in and out of the formal labor market, in and out of health insurance coverage) happen at the beginning of a new survey wave. Moreover, large percentages of sample participants leave the survey following the first wave. It is not clear how the EHC approach will affect these problems. By having fewer waves (or seams), seam bias may be diminished. But transitions may pile up at the point of annual sampling, making the longitudinal information elicited from the EHC less valuable. Respondent burdens with the EHC approach are high, since calendars must be used for an extensive set of employment, program, and demographic characteristics. It is not clear how the burdens will affect survey attrition.

Finally, a problem with the 2008 paper test comparisons reported to date is that participants in the “traditional” SIPP were also the sample for the comparisons. These households in the test already provided monthly detail on incomes, employment, demographic changes, insurance coverage, and program participation. This raises the question of whether respondents who have already recorded this information in the SIPP 4-month interviews were better able to respond accurately to the paper EHC than would be the case if the EHC sample cases had all been drawn independently.

Electronic EHC Test

To provide further evidence on these issues, the Census Bureau plans to test a one- or two-wave electronic prototype EHC in early 2010. If funding during FY 2010 and FY 2011 is available, this prototype would examine issues that arise with locating movers when interviews are 1 year rather than 4 months apart, as well as the consistency of data reports between interviews that are 1 year apart. The development and implementation of the prototype experiment is a valuable next step in developing the information base needed for the reengineered SIPP.

The panel does not have enough detail on the 2010 one- or two-wave electronic prototype test to fully assess its ability to resolve questions about whether the EHC approach can adequately replace the traditional SIPP interview structure. Our understanding is that the Census Bureau will not use respondents to the 2008 traditional SIPP panel as the sample for the electronic EHC because of a concern that doing so could compromise responses to the traditional interviews for some or all waves following the 2010 EHC test. Just as important, in our view, is that using a separate sample obviates the concern, expressed above for the paper test, that respondents would provide more accurate reports to the EHC given their participation in the traditional SIPP than if they had not participated in the SIPP.

Instead of using SIPP cases, the Census Bureau plans to conduct EHC interviews in 10 states with about 8,000 households in high-poverty strata that are selected from the areas in which traditional SIPP interviews are currently being conducted. The bureau will then select traditional SIPP cases from the same areas and do side-by-side comparisons of the EHC and SIPP estimates. In addition, the Census Bureau hopes to acquire administrative records from the 10 states that will be used to help evaluate the validity of responses in both the traditional SIPP 2008 panel interviews and the 2010 EHC electronic prototype for calendar year 2009. The panel thinks this broad approach is a promising basis for developing important additional knowledge about the EHC and the traditional SIPP, particularly if the electronic prototype EHC test can be carried out for two waves and not just one wave.

Overlap of Traditional and Reengineered SIPP Panels

While the panel thinks the Census Bureau’s EHC electronic prototype plans are promising, it is clear that the knowledge base for EHC methods is not yet sufficiently well developed to have confidence that the approach can be used to generate data of equal or better quality than found in the traditionally collected SIPP. The paper test prototype provides only limited information on data quality for the reasons given above. Moreover, the electronic prototype EHC test, even with its fairly large sample size and even if it is conducted for two waves, is not likely to provide conclusive evidence about the ability of EHCs to match month-to-month details on program eligibility and participation, employment, and income that are obtained with a 4-month interview cycle. Instead, it is likely to provide mixed results, identifying not only strengths but also weaknesses of the EHC approach that require modification and further testing, as well as leaving some issues unresolved, either pro or con.

Consequently, we think it is essential for the Census Bureau to administer (and for Congress to appropriate resources for) a full-blown implementation of the “reengineered” SIPP, concurrently with a traditional SIPP panel. The concurrent surveys should be fielded for at least 2 years, with samples large enough to ensure that a substantial number of survey respondents will in fact be receiving transfer program benefits. Ideally, administrative information on earnings (from the Social Security Administration, SSA), employment (from state employment and wage records), and program participation (from selected state records on TANF and SSA records on SSI and Old-Age, Survivors, and Disability Insurance [OASDI]) would be linked to both surveys, which would allow the Census Bureau to compare aspects of data quality for the traditional and reengineered SIPP designs.

The panel further recommends that the Census Bureau start a new, traditional SIPP panel in February 2012 to provide a comparison data set for the reengineered SIPP panel that will begin in 2013. Respondents who participate over time in longitudinal surveys gain experience in responding to survey questions. Moreover, they are the people who do not leave the survey. Given the experience and selection issues that arise, results from the reengineered SIPP should be compared with the first (rather than fourth) year of a traditional SIPP panel. Assuming the reengineered panel has annual interviews, then the traditional panel with its 4-month interviews must begin a year ahead so that the traditional panel obtains data for the period covered by the first interview of the reengineered panel (2012).

Furthermore, the traditional panel should continue for at least 2 years so that comparisons can be made for at least two interviews of the reengineered panel. Otherwise, it will be impossible to adequately evaluate attrition bias and seam issues that arise in the reengineered SIPP. Moreover, if Wave-1-to-Wave-2 seam bias issues with the reengineered SIPP prove to be a major problem, the Census Bureau can continue to field the traditional SIPP as it further refines the EHC approach. If the expense of having two SIPP surveys in the field is prohibitive, cost savings could be achieved by making the 2012 traditional SIPP panel smaller than prior panels.

There is another reason why it is critical to field overlapping traditional and reengineered SIPP panels. Policy makers, analysts, and researchers who use SIPP need to assess the effects that the new methodology will have on survey findings. One of SIPP’s strengths is that it has been fielded since 1984 (with a significant redesign in 1996). Because SIPP panels cover a 25-year period, a common, important use of the data is to document trends in household behavior. As noted earlier, it is clear that problems exist with the traditionally conducted SIPP.
But analysts need to have some way of assessing whether changes in trends that arise when comparing results from the reengineered SIPP to results from the traditionally collected SIPP reflect true changes in the population or whether they are a result of changes in survey methodology. The only way to be able to even roughly account for the changes due to survey methodology is to have at least 1 and preferably 2 years of overlapping data.

A third reason to have 2 years of overlap between a traditionally collected SIPP and the reengineered SIPP, in addition to better understanding attrition, seam bias, and the effects of changes in survey methodology, is that responses to the EHC may improve between the first and second interviews. Without a second year of overlap, this improvement would be difficult to detect.

When comparing a full-blown implementation of the reengineered SIPP to a concurrently fielded traditional SIPP or to administrative data, it is important to keep in mind that the form of the measurement error can have important implications for empirical analysis. For example, Gottschalk and Huynh (2006) compare data on inequality from SIPP and detailed administrative earnings records from the Social Security Administration. They show that while SIPP understates inequality, primarily because measurement error is mean-reverting, measures of mobility are very similar in SIPP and the administrative data. The point is that all surveys will have error; the importance of error depends on context.

Considerable content evaluation has been done with the traditionally collected SIPP over the years. It is critical to have a solid basis for assessing the changes in survey results that arise primarily from changes in survey design, as distinct from changes in respondent behavior. Full overlapping panels are the only way to assess the effects of survey design changes, although they will not necessarily settle all questions about data quality. A third data source, particularly administrative data, would be useful to interpret systematic differences between the reengineered and the traditionally fielded SIPP.

LENGTH AND FREQUENCY OF INTERVIEWS

Respondent Burden Concerns

Moving to an annual schedule of interviews, in which monthly information for an entire year is elicited with EHCs, and continuing to include some topical module content, as planned for the reengineered SIPP, raise concerns that the overall length of the SIPP interview and the burden it places on respondents may exceed that of the current questionnaire. In turn, respondent burden may contribute to item nonresponse, poor quality responses, and attrition from the survey.
It is essential, as the Census Bureau evaluates its electronic EHC prototype and implements the overlapping redesigned and traditional SIPP panels, that it not only carefully examine the ability of the EHC approach to generate accurate month-by-month transitions in employment, earnings, household structure, and program participation, but also determine whether the burden on respondents from the redesigned questionnaire is so taxing as to degrade the overall quality of responses.

The SIPP topical modules have historically provided a large amount of information of considerable interest to the SIPP user community. Many programs have asset tests associated with eligibility rules, making the SIPP asset and liability topical modules essential for accurate modeling of program participation. Other topical modules also contain vital information (see Box 2-1). Yet while the topical modules have provided a great deal of information that is valuable to the fundamental purpose of SIPP, their costs also need to be recognized and weighed against their benefits. Costs include that topical modules require resources that could presumably be used for research and evaluation to improve SIPP; that some topical modules (like the tax topical module) require extensive imputation; and that topical

contains information on the status of immigrants upon entry to the United States (the New Immigrant Survey contains detailed information on the immigration and admission status of legal immigrants, but not of unauthorized immigrants or nonimmigrants; see http://nis.princeton.edu). In addition, SIPP is the only nationally representative population sample that follows a large sample of immigrants over time. In these respects, SIPP is a unique, valuable data source for immigration scholars. However, additional information would further enhance the usefulness of the data for policy-relevant analysis of immigrant populations. Three specific suggestions are listed below.

Ask migration history questions for new adult household members. Currently, the detailed immigration information is asked only in Wave 2. To obtain a complete picture of the migration history of household members, it would be useful to administer the migration history questionnaire to adults who join a sample household after Wave 2.

Collect information on parents’ place of birth. A major question about immigrant populations concerns the degree to which they change and adapt with increasing time in the country. Duration in the country can be measured as time since arrival within the lifetime of immigrants themselves or as the number of generations a person’s family has been in the country (i.e., first-generation immigrants, second-generation U.S.-born children of immigrants, and third-or-higher generation U.S.-born children of U.S.-born parents). Although SIPP includes information about the timing of immigration for individuals, it would be useful to also collect data on mother’s and father’s place of birth, which would permit the identification of the first, second, and third-or-higher generations. Currently, the monthly CPS is the only nationally representative sample that includes information on parents’ place of birth.
The addition of these items to SIPP would make it possible to compare income dynamics and other characteristics of immigrant generations.

Investigate alternative techniques for collecting sensitive information on immigration status. By collecting data on immigration status, SIPP goes well beyond most other surveys. Nevertheless, the quality of the data on immigration status is questionable. Many respondents fail to answer these questions, and, of those who do, many appear to provide inaccurate information. Among the foreign-born in the 2004 panel migration history topical module, 28 percent did not answer the question about immigration status (compared with 21 percent for the question on country of birth). In addition, the accuracy of reporting is doubtful. For example, among Mexican-born adults in the 2004 SIPP panel who reported on immigration status, 33 percent (weighted) said they were not admitted as a legal permanent resident, had not naturalized, and had not converted to this

status, thus suggesting that no more than 33 percent were unauthorized. But other estimates based on demographic methods suggest that nearly half (47 percent) of the Mexican foreign-born were unauthorized migrants in 2004 (Passel, 2006). The imputation procedures used in SIPP to fill in missing values do not improve the situation. When imputed responses are included in the sample, the upper-bound estimate of unauthorized migrants drops from 33 to 28 percent.

It is understandable that many unauthorized migrants would misreport their citizenship or immigration status to employees of the U.S. federal government. One possible way to improve reporting is to use a self-administered questionnaire for these items. Another possibility is to use the randomized response method, first introduced by Warner (1965).²

Still another way to improve the accuracy of data on immigration status is to attempt to match respondents with the immigration admission and naturalization administrative records of the Office of Immigration Statistics (OIS) in the U.S. Department of Homeland Security. Matching these data would be challenging because the electronic OIS records currently do not contain a field for Social Security number (personal communication with OIS). Thus, matches would have to be made on the basis of such identifiers as name, sex, date of birth, year of admission, and country of birth, although the Census Bureau has made striking advances in its ability to link data based on these or similar characteristics. If SIPP foreign-born respondents were successfully matched to OIS admission and naturalization records, the information in the administrative records could be used to improve the quality of SIPP data on citizenship and immigration status.
For example, matched data could be used to evaluate the accuracy of responses generated by alternative survey methodologies (e.g., in-person interviews versus self-administered questionnaires, or the randomized response method versus standard questions). In addition, matched data could be used to improve imputations of missing data on immigration and citizenship status as well as items related to immigration status; for example, unauthorized immigrants are ineligible for many public assistance programs, so they should not be imputed as recipients.

² Respondents are presented with two alternative questions: one about their immigration status and another on an innocuous topic (e.g., favorite color). Respondents then roll a die in private (or engage with some other random device) to determine which question to answer (e.g., those rolling “1” or “2” answer the question about favorite color, and those rolling other numbers answer the question about immigration status). Because no one but the respondent knows which question was answered, privacy is maintained, and respondents may be more likely to give truthful answers. Response error is better managed because it is more likely to be randomly distributed. Statistical methods have been developed for analyzing this type of data. See also U.S. General Accounting Office (1999), which proposes a three-card method for collecting sensitive information such as immigration status.
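
To make the logic of the footnote concrete, the following is a minimal sketch of how a population share might be recovered from responses collected with the die-roll design described above. It is not the estimator used by the Census Bureau or by Warner (1965); it follows the two-question ("unrelated question") variant in the footnote and assumes, purely for illustration, that both questions are answered yes or no and that the share answering yes to the innocuous question is known in advance. The function names, parameter names, and numbers are hypothetical.

```python
import random


def estimate_sensitive_share(observed_yes_rate, p_sensitive, innocuous_yes_rate):
    """Back out the share answering 'yes' to the sensitive question.

    Assumes (illustratively) that the observed 'yes' rate is a mixture:
        observed = p_sensitive * sensitive_share
                   + (1 - p_sensitive) * innocuous_yes_rate
    """
    if not 0 < p_sensitive <= 1:
        raise ValueError("p_sensitive must be in (0, 1]")
    estimate = (observed_yes_rate
                - (1 - p_sensitive) * innocuous_yes_rate) / p_sensitive
    # Sampling noise can push the raw estimate outside [0, 1]; clip it.
    return min(1.0, max(0.0, estimate))


def simulate_yes_rate(true_share, p_sensitive, innocuous_yes_rate, n, seed=0):
    """Simulate n truthful respondents using a private random device."""
    rng = random.Random(seed)
    yes = 0
    for _ in range(n):
        if rng.random() < p_sensitive:      # e.g., die shows 3, 4, 5, or 6
            yes += rng.random() < true_share
        else:                               # e.g., die shows 1 or 2
            yes += rng.random() < innocuous_yes_rate
    return yes / n


if __name__ == "__main__":
    p = 4 / 6  # chance of being routed to the sensitive question
    rate = simulate_yes_rate(true_share=0.30, p_sensitive=p,
                             innocuous_yes_rate=0.50, n=20000, seed=1)
    print(round(estimate_sensitive_share(rate, p, 0.50), 3))
```

The interviewer sees only an overall share of yes answers, never which question any individual answered, yet the aggregate share can still be estimated. The price is a larger sampling variance than a direct question would have, which is one reason such designs need larger samples.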

SIPP PROCESSING, ACCESS, MANAGEMENT, AND BUDGET

Timeliness

Given the absence of an external agency SIPP sponsor (discussed below), it is critical that SIPP meet the needs of its large, diverse user community in order to have a strong base of support. The panel thinks SIPP data would be used even more extensively if the Census Bureau could significantly shorten the amount of time needed to release the data, consistent with maintaining a high-quality product.

One model for efficiency of data collection and release in the Census Bureau itself is the CPS. For example, data from the CPS Annual Social and Economic Supplement (ASEC) (which are typically collected in February, March, and April) are made publicly available by August of the same year. There are several reasons why this time frame could not realistically be applied to SIPP, a prominent one being that processing longitudinal SIPP data is in many ways considerably more complicated than processing the cross-sectional information collected in the CPS ASEC supplement. The SIPP instrument is also longer and collects a broader range of information. Nonetheless, as noted in Chapter 2, the release of SIPP data is often not timely, lagging 2 or more years behind data collection.

One survey that is more comparable to SIPP than the CPS is the PSID. Like SIPP, the PSID is a longitudinal household survey that asks a broad array of questions on demographics and employment. The PSID has the advantage of going into the field every 2 years, rather than every 4 months, as SIPP does. The PSID generally releases data in a timelier manner than SIPP, typically 12 months after the data are collected. The Medical Expenditure Panel Survey also releases each year “point-in-time” public-use files within 12 months of data collection; these files are based on a single round of interviewing (from two overlapping panels) and in that respect are similar to SIPP wave files.
A reasonable goal for the reengineered SIPP to adopt could be to release wave files within 12 months of data collection, and, indeed, the SIPP 2001 panel data were released on roughly this schedule. The usefulness of SIPP data to users would be increased by consistently having a relatively short lag time between data collection and release of 1 year or less.

The Census Bureau is capable of timely dissemination of data, as evidenced by the efficiency of the processing of the CPS ASEC supplement and occasional past SIPP panels. The bureau needs to ensure that the same type of management attention and coordination is applied to ensure timely delivery of future SIPP panels, particularly in years when the survey instrument or processing procedures are being updated, which occurs periodically. The panel anticipates that the move to the BLAISE-based instrument and SAS-based processing system will improve the speed at which the reengineered SIPP is processed. Regardless, the Census Bureau should identify the key bottlenecks that are hindering timely release of the data and take the steps necessary to reduce them, while not forgoing thorough quality checks that might help prevent the need to rerelease a SIPP file with corrections. The goal should be to meet the best practices of other national surveys in the release of data. The panel thinks that 1 year between the end of a survey and data release should be an achievable target.

Enhancing Access to SIPP

One common complaint from current and prospective SIPP data users is the difficulty associated with working with SIPP files. Longitudinal files are inevitably more complex than cross-sectional files, particularly for researchers interested in linking individual and household information over time. Moreover, since each wave of a SIPP panel consists of four staggered rotation groups, new users often grapple with creating calendar-year files (if that is their goal). Most importantly, the quality and quantity of documentation of SIPP files were poor in the past.

SIPP documentation is improving. An early edition of a SIPP Users’ Guide was released in 1987 and updated in 1991. A comprehensive third edition was released in 2001 (available at http://www.census.gov/sipp/usrguide.html), which is currently being updated to include information about the 2001, 2004, and 2008 panels. The SIPP website also provides a link to a tutorial (see http://www.census.gov/sipp/). Moreover, in recent years, it has become easier to access and download SIPP data over the Internet. The main mechanisms for downloading SIPP data from the Census Bureau are via (1) a file transfer protocol (FTP) with a link at the SIPP home page, which is for users who wish to download entire longitudinal, core, or topical module files and (2) the DataFerrett application tool, with which researchers can download a subset of variables or observations from particular SIPP files.
Despite documentation improvements and the various data extraction tools available, there is still room for improvement. For example, a rather minor change would be to integrate the documentation that is available at the SIPP homepage with the DataFerrett data extraction tool. The latter could at least have various links to the former. More importantly, the process of updating the SIPP Users’ Guide should be completed as soon as possible. Chapters of the guide that have not yet been revised refer only to data up to the 1996 panel. Another feature that would assist some users would be to provide code on how to construct calendar-year files, which would assist them in dealing with the complexities introduced by having different rotation groups for a given wave. This issue would become irrelevant, of course, if the SIPP moves to the EHC instrument that collects data annually, as the rotation groups would be eliminated. Finally, the Census Bureau could enhance DataFerrett, making it even easier to use (see Box 4-1).
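
As an illustration of the kind of calendar-year code the preceding paragraph has in mind, the following is a rough sketch rather than Census Bureau code. It assumes that each record carries a wave number, a rotation group (1 to 4), and a reference month (1 to 4, counting forward through the 4 months before the interview), that rotation groups are interviewed in consecutive months, and that rotation group 1 of wave 1 is interviewed in the month given by `first_interview`. The field names, the function names, and the default start date are hypothetical and would need to be checked against the documentation for a given panel.

```python
from collections import defaultdict


def reference_period(wave, rotation_group, ref_month, first_interview=(2004, 2)):
    """Return the (year, month) that one person-month record describes.

    Assumes rotation group g is interviewed (g - 1) months after group 1,
    waves are 4 months apart, and reference months 1..4 are the four
    calendar months immediately preceding the interview month.
    """
    year0, month0 = first_interview
    interview = year0 * 12 + (month0 - 1) + (rotation_group - 1) + 4 * (wave - 1)
    absolute = interview - 5 + ref_month  # ref_month 4 is the month before interview
    return absolute // 12, absolute % 12 + 1


def calendar_year_file(records, year, first_interview=(2004, 2)):
    """Group records by person and keep people observed in all 12 months."""
    by_person = defaultdict(dict)
    for rec in records:
        rec_year, rec_month = reference_period(
            rec["wave"], rec["rotation_group"], rec["ref_month"], first_interview
        )
        if rec_year == year:
            by_person[rec["person_id"]][rec_month] = rec
    # Drop people with incomplete coverage of the calendar year.
    return {pid: months for pid, months in by_person.items() if len(months) == 12}


if __name__ == "__main__":
    # One hypothetical person in rotation group 2, observed in waves 1-4.
    records = [
        {"person_id": "A", "wave": w, "rotation_group": 2, "ref_month": r}
        for w in range(1, 5)
        for r in range(1, 5)
    ]
    year_file = calendar_year_file(records, 2004)
    print(sorted(year_file["A"]))  # calendar months covered for person A
```

Under these assumptions, a full calendar year for a given person draws on four waves, and which waves contribute which months differs by rotation group. That difference is the source of the complexity for new users.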

BOX 4-1
Improving Access to SIPP Data via DataFerrett

DataFerrett (available at http://dataferrett.census.gov/) is the central access point for many users to data from the Survey of Income and Program Participation. It is an online data access tool that permits users to create a customized data extract of selected variables and observations from any one of a large number of Census Bureau data sets, including SIPP. The user interface of DataFerrett is “point-and-click” and does not require specialized programming knowledge. Users are guided through several steps in which they select a data set (e.g., the 2001 SIPP longitudinal file), select a set of variables from the data set, and select a subsample (e.g., men ages 20-29). Users then may either download the data extract (so that they can analyze it with their own statistical software) or continue to work online to create a table of descriptive results (e.g., frequency distributions, cross-tabulations).

Several points of concern about DataFerrett warrant further scrutiny by the Census Bureau to improve access to SIPP and other data sets:

• Tutorial: In general, the directions in the tutorial for using DataFerrett are unclear.

• SIPP-specific information: DataFerrett is not tailored for any specific data set; the user interface and information provided are structured in the same way for the Current Population Survey, the American Community Survey, and SIPP. Yet there are unique features of SIPP that may require special treatment. For example, SIPP is longitudinal, and data for each panel are contained in several files, which may not be readily apparent to a new user. Another unique feature of SIPP is its topical modules. Although DataFerrett will display information about each data set, the specific information provided about the contents of the topical modules is not useful.

• Variable selection: Finding and selecting variables in DataFerrett can be tedious and frustrating. For example, once users have selected a list of variables, they always have to click the browse/selection variables and values button, then click the selection box, then click ok. An easier approach should be possible. The search tool for variable selection could be improved by providing an “advanced search” option in which users can enter four or five search items and combinations of those items (using either AND or OR), and by providing a list of commonly used search terms or list of variables or topic areas. It would be helpful if DataFerrett provided more guidance to users about which variables to include in their data extracts. First-time users (and even experienced analysts) may be confused about or unaware of important variables to include, such as sampling weights and key identifiers (e.g., sampling unit, address, family, person, entry identification, and wave). DataFerrett could provide a description of these key variables and alert users if they fail to download them. Other data access programs, such as the one used with the Integrated Public Use Microdata Series (IPUMS; see http://www.ipums.umn.edu), go so far as to automatically include these key variables on all extracts.

• Merging data across waves: One of the barriers for new users in working with SIPP is its complex, longitudinal design. DataFerrett could be designed to provide an easy-to-use, transparent way of merging data for individuals across waves. One especially valuable feature, the ability to select and download in a single extract variables from multiple topical module and core data files and waves across a panel, exists but is very hard to find in the current interface. Also, the task of selecting variables from multiple data files (e.g., from a topical module and the core) can be tedious. A better design might be to list all of the variables in the core and topical modules together in one place (not broken down by data file or wave). As the user selects variables, information on the available waves for the selected variable would pop up, and the user would then select the waves he or she wants. This design would make it easier to quickly identify and download all variables that repeat across waves of a panel and would not require users to know in advance which items are in which topical modules.

• Table and recode functions: It is difficult to determine how to use the tabulation and recode functions, and some users may not find them helpful. It is difficult to code a variable as a dummy or to assign the same value to more than one variable. In addition, DataFerrett does not permit users to export tables as Microsoft Excel files. It would be helpful to include a prominent button that users can select if they want to export a table. A dialog box could then appear with various format options, including an Excel worksheet.

SOURCE: Analysis by students of panel member Jennifer Van Hook.

SIPP Management and Budget

As we recounted in Chapter 2, SIPP has a unique position among the Census Bureau’s data creation activities for the household sector. Unlike other surveys of people and households that the Census Bureau conducts, SIPP does not have a government client outside the Census Bureau or a federally mandated set of reports that are based on the survey. The earlier Committee on National Statistics SIPP panel recommended that this situation be addressed, most naturally by making a required report to Congress on poverty (or poverty transitions) based on SIPP (National Research Council, 1993:85). This recommendation was not adopted. Not having an external client, such as the Bureau of Labor Statistics (which has a collaborative and financial stake in the monthly CPS), or a set of regular reporting requirements, as with the decennial census and the American Community

Survey, has contributed to setbacks in the development of SIPP (see also National Research Council, 2001:150-154, on this point). In addition, as described in Chapter 2 and in the prior SIPP report (National Research Council, 1993:20), the value of the survey has been materially diminished over its history by sample cutbacks necessitated by cutbacks in funding.

Historically, SIPP has also lacked a project director with full management and budget authority for all aspects of the survey. A recommendation in the earlier SIPP report reads as follows (National Research Council, 1993:235-236):

To be as effective as possible in carrying out its responsibilities to produce timely, comprehensive, relevant, high-quality, and analytically appropriate statistics on income and program participation, the Census Bureau should establish a senior-level position of project director for the Bureau’s income surveys, SIPP and the March CPS income supplement. This position should include full management and budgetary authority for the income statistics program and sufficient resources to obtain the level of analysis staff that is needed to provide substantive guidance to the program, prepare reports, conduct analyses, and evaluate analytical concepts and methods. The person who fills this position should have recognized substantive expertise in topics related to income, poverty, and assistance programs, combined with strong survey management skills.

This recommendation was never acted upon, yet we continue to think that SIPP would benefit from a project director with a distinct budget. The budget must always include adequate research and development funding, since SIPP is a major ongoing survey that requires regular evaluation and improvement.
CONCLUSIONS AND RECOMMENDATIONS

Event History Calendar Approach

Conclusion 4-1: The Survey of Income and Program Participation (SIPP) is the only national survey that provides information on the short-term dynamics of employment, income, program participation, and other family characteristics, and its monthly time frame is essential for many applications. The Census Bureau’s plans to move SIPP to an annual survey, filling in intrayear dynamics using event history calendars, potentially affect (perhaps positively, perhaps negatively) SIPP’s single most important feature.

Conclusion 4-2: The panel is not aware of conclusive evidence that a 12-month event history calendar (EHC) framework is capable (or not) of

generating accurate monthly information on income, program participation, and other topics that are covered in the Survey of Income and Program Participation (SIPP). The lack of evidence about the ability of an EHC to collect monthly data places considerable pressure on the Census Bureau, not only to design an effective pretesting program for the EHC methodology, but also to make its survey reengineering plans for SIPP sufficiently flexible so that it can modify its plans if the pretesting reveals unanticipated, negative evidence on the likely success of the proposed methodology in providing high-quality monthly information.

Conclusion 4-3: Understanding transitions at the seam between interviews in a reengineered Survey of Income and Program Participation (SIPP) using the event history calendar approach will require data from at least two annual interviews. Moreover, not enough is yet known about the factors driving seam bias in the traditional SIPP.

Conclusion 4-4: A parallel traditional Survey of Income and Program Participation (SIPP) panel that provides 2 or more years of data is a necessary component of a thorough evaluation of the reengineered SIPP using the event history approach. The recently completed paper test is of limited value for this purpose. The Census Bureau’s planned electronic prototype test is promising, but, as a single test, is unlikely to provide conclusive findings.

Recommendation 4-1: The Census Bureau should engage in a major program of experimentation and evaluation of the event history approach for developing suitable data on the short-run dynamics of household composition, income, employment, and program participation from a reengineered Survey of Income and Program Participation (SIPP). The details of the Census Bureau’s plans should be disseminated to SIPP stakeholders for comment and suggestions for improvement.
If the experimental results indicate that the quality of data on income and program dynamics is significantly worse under the event history calendar approach than in the traditional SIPP, the Census Bureau should return to a more frequent interview schedule, say, every 6 months, devise other methods to improve data on short-run dynamics, or revert to the traditional SIPP with 4-month interviews using standard questionnaires.

Recommendation 4-2: To ensure not only adequate evaluation of a reengineered Survey of Income and Program Participation (SIPP), but also a bridge between data collected under the new and old methods, the Census Bureau should conduct traditional and reengineered SIPP panels to provide at least 2 years of comparable data. If the new design works, then the parallel traditional panel provides a bridge. If the new design does not

work, then the parallel panel provides a backup for the continued collection of SIPP data while the new design is modified as appropriate.

Recommendation 4-3: Because the reengineered Survey of Income and Program Participation (SIPP) should be compared with the first year of a traditional SIPP panel in order to minimize attrition bias, the Census Bureau should begin a new traditional SIPP panel in February 2012. If the costs of fielding two concurrent national longitudinal surveys appear prohibitive, the 2012 traditional SIPP panel could be smaller than previous SIPP panels without substantially diminishing its scientific value.

Length and Frequency of Interviews and Panels

Conclusion 4-5: Design features for a reengineered Survey of Income and Program Participation (SIPP) that are important to evaluate in terms of their effects on respondent burden, survey costs, data quality, and operational complexity include the length and frequency of interviews, the length of panels, and whether successive panels overlap. With regard to interviews, there is no evidence that a 12-month event history calendar strikes the optimal balance between respondent burden, costs, and data quality in comparison to the traditional SIPP design of 4-month interviews. With regard to panels, there is evidence that nonoverlapping panels have adverse effects on cross-sectional estimates of trends over time, yet they are advantageous in terms of larger sample sizes per panel and operational feasibility.

Recommendation 4-4: The Census Bureau should study the trade-offs in survey quality and respondent burden in comparison to survey costs between longer but less frequent event history-based interviews in a reengineered Survey of Income and Program Participation (SIPP) and more frequent interviews in the traditional SIPP. The Census Bureau’s research and evaluation program for SIPP should also improve understanding of panel bias and how it grows over time.
Because overlapping panels remain the best way to document the extent of panel bias across the full range of variables collected in SIPP, they should be on the research agenda for possible implementation at a future time. Due to technical demands and capacity issues that arise in launching the reengineered SIPP, the initial design plans should not include overlapping panels.

Content

Conclusion 4-6: The Census Bureau has done an exemplary job in reaching out to the Survey of Income and Program Participation user community with “content matrices” and other efforts to identify critical portions of the core questionnaire and topical modules for data users.

Recommendation 4-5: The Census Bureau should expand the scope of the reconstituted Survey of Income and Program Participation (SIPP) Working Group or establish a new SIPP advisory group with members from academic institutions and policy research organizations that would meet periodically to assist the Census Bureau in its efforts to continually improve the quality and relevance of the SIPP survey content. This group, which could include government members from the recommended interagency working group on uses of administrative records in SIPP (see Recommendation 3-5), would review the Census Bureau’s use of cognitive and other methods to evaluate and improve survey question wording and improve response rates (or, when that is not possible, either dropping the question or seeking an alternate data source); assist in benchmarking survey responses against external, reliable sources; and advise the bureau on ways to improve imputation and editing procedures. The group would provide a sounding board for the Census Bureau’s plans to develop appropriate survey content in a reengineered SIPP and advise the bureau on appropriate modifications to survey content as policy developments occur, such as health care and immigration reform.

Timeliness

Conclusion 4-7: The release of Survey of Income and Program Participation (SIPP) data is often not timely. Data from the 2004 SIPP panel were generally released more than 2 years after being collected. Other panel surveys have more timely data release, often within a year of data collection, which enhances their usefulness to external users.

Recommendation 4-6: The Census Bureau should release Survey of Income and Program Participation data within 1 year of data collection.
Management and Budget

Conclusion 4-8: Unlike other surveys of people and households that the Census Bureau conducts, the Survey of Income and Program Participation (SIPP) does not have a government client outside the Census Bureau or a federally mandated set of reports that are based on the survey. Not having an external client, such as the Bureau of Labor Statistics (which has a collaborative and financial stake in the monthly Current Population Survey), or a set of regular reporting requirements, as with the decennial census and the American Community Survey, has contributed to setbacks in the development of SIPP. The value of the survey has also been diminished over its history by sample cutbacks necessitated by cutbacks in funding. We agree with an earlier Committee on National Statistics panel (National Research Council, 1993) that SIPP would benefit from a project director with full management and budget authority for design, evaluation, and operations. The budget should always include adequate research and development funding, since SIPP is a major ongoing survey that requires regular evaluation and improvement.