7
Evaluation of Public-Use Data Sets

HUD is similar to other government agencies in providing data for public use. Like all government agencies, HUD collects data for internal administrative purposes. Like most, it makes some of these data sets, or information from them, available on a routine basis for public use. Like many, it collects and provides at little or no cost to users other data sets that are not primarily collected for purposes of program administration.

PD&R’s public-use data sets are too numerous and their uses too varied for the committee to have been able to evaluate the benefits and costs of every one or suggest opportunities for improving each of them. Table 7-1 shows the funding of the major surveys. This chapter focuses heavily on the role of PD&R’s public-use data sets in program evaluation and policy development, including the preparation of accurate information about current housing conditions, the evaluation of existing programs, and the prediction of the likely consequences of future policies.

The chapter devotes most attention to two data sets—those of the American Housing Survey (AHS) and the Low-Income Housing Tax Credit (LIHTC) Program—that have already played important roles in assessing the performance of government housing programs, but could play even more important roles without significant additional costs. The large expenditure on government housing programs argues for a focus on data sets that are particularly relevant for assessing the performance of these programs, and the substantial fraction of PD&R’s budget devoted to the AHS argues for a focus on this data set in particular.

The chapter also discusses several other data sets, as well as issues that pertain to several data sets and broader issues about public use.




TABLE 7-1 Funding by Survey by Year (dollars in thousands)

| Year | AHS (a) | AHS-FF (b) | New Home Sales (c) | FF (b) | SOMA (d) | Manufactured Homes (e) | LIHTC (f) | RFS (g) | FMR RDD (h) | TOTAL |
|------|---------|------------|--------------------|--------|----------|------------------------|-----------|---------|-------------|--------|
| 2007 | 14,471 | 1,529 | 2,437 | 453 | 800 | 1,100 | 440 | 0 | 1,000 | 22,230 |
| 2006 | 14,471 | 1,529 | 2,397 | 453 | 400 | 500 | 100 | 0 | 500 | 20,350 |
| 2005 | 18,471 | 1,529 | 2,027 | 453 | 715 | 949 | 400 | 200 | 1,500 | 26,244 |
| 2004 | 19,971 | 1,529 | 1,932 | 453 | 687 | 912 | 420 | 237 | 2,125 | 28,266 |
| 2003 | 16,790 | 1,529 | 1,840 | 464 | 661 | 877 | 457 | 681 | 600 | 23,899 |
| 2002 | 14,065 | 1,529 | 1,690 | 453 | 618 | 820 | 465 | 3,625 | 1,722 | 24,987 |
| 2001 | 14,041 | 1,574 | 1,654 | 466 | 611 | 811 | 465 | 2,855 | 1,300 | 23,777 |
| 2000 | 4,981 | 1,574 | 2,038 | 0 | 588 | 780 | 374 | 5,510 | 1,052 | 16,897 |
| 1999 | 15,176 | 5,823 | 1,960 | 0 | 565 | 750 | 0 | 1,144 | 0 | 25,418 |
| 1998 | 14,426 | 0 | 1,900 | 0 | 550 | 730 | 0 | 0 | 0 | 17,606 |
| 1997 | 15,026 | 2,574 | 1,840 | 0 | 525 | 705 | 0 | 0 | 0 | 20,670 |

NOTE: The funding is money paid to the Census Bureau and other organizations to provide the data under PD&R’s major survey programs, including the costs to prepare the data for distribution and produce reports that summarize the results. The totals are not the entire cost of providing PD&R’s public-use data sets, however, as they do not include the value of PD&R staff time and other assets involved in the provision of data sets listed in the table, nor do they include the costs of providing other PD&R data sets, such as the Picture of Subsidized Households. Nonetheless, it almost surely accounts for the bulk of the cost of providing public-use data sets.

(a) American Housing Survey.
(b) Forward funding (FF) is funding for activity to be carried out in the subsequent fiscal year; for example, AHS activity in 2007 will include the $14.471 million in fiscal 2007 funds plus the $1.529 million forward funded from fiscal 2006.
(c) Survey of new homes sales and completions.
(d) Survey of Market Absorption.
(e) Survey of manufactured homes placements.
(f) LIHTC data set.
(g) Residential Finance Survey.
(h) Random digit dial survey done to develop fair market rents.

SOURCE: Unpublished data from HUD, Office of Policy Development and Research.

The specific data sets are presented in the following order: the AHS, the Residential Finance Survey (RFS), surveys of current housing market conditions, the LIHTC data set, the Picture of Subsidized Households (PSH), and the State of the Cities Data System (SOCDS). The chapter then considers several issues of access and administrative data. As background for all these issues, the next section comments on the role of government in providing data on public programs.

PROVIDING DATA

A strong argument can be made for public provision of certain types of data. The same data may be of use to many different organizations. Although some data are so valuable to an individual organization that it would collect them on its own, the cost of collecting information often exceeds its value to any single organization. In many such cases, however, the total value of the data to all organizations that might use them exceeds, or even greatly exceeds, the total cost of data collection. In these cases, government can create value by collecting and disseminating the data at little or no cost to users.

The committee’s conversations with representatives of a number of organizations, representing many firms and agencies, indicated that they frequently used HUD data sets that their own organizations would not be able to collect on their own. The impressive use of HUD’s public-use data sets and the frequent citation of published reports summarizing their results provide other evidence of their value. For example, there were more than 3 million hits in 2007 on SOCDS, more than 2 million hits on the files containing the 2007 income limits for HUD programs, and more than 1 million hits on the files reporting and documenting the 2007 fair market rents in HUD’s Section 8 Housing Choice Voucher Program.1

PD&R plays the major role in HUD in providing data for public use. Most of PD&R’s public-use data sets are available at no cost from its website, HUD USER.
A booklet entitled Data Sets Available from HUD USER provides short descriptions of these data sets except for a few recent additions (see U.S. Department of Housing and Urban Development, n.d.). The most interesting of the recently added data sets is a set of quarterly reports on vacancy rates at the census tract level from the U.S. Postal Service (USPS). This information is likely to prove useful for studying the operation of housing markets, making business decisions, and analyzing public policy. Other HUD-funded public data sets that are collected by the Census Bureau, namely, the AHS, the RFS, the Survey of Market Absorption, a survey of new residential sales, and the Manufactured Homes Survey, are available from the HUD USER website. The publication Housing Data Between the Censuses provides a detailed overview of the AHS and brief descriptions of other data sets (see U.S. Census Bureau, 2004). The two largest data sets collected by the Census Bureau, the AHS and the RFS, are also available on HUD USER.

1 Since these files are designed to provide information to users without downloading the files, hits surely reflect usage to a much greater extent than for publications and large data sets that must be downloaded to be used. The number of downloads of HUD’s large public-use data sets is much smaller, but once downloaded, these data sets are typically used over long periods. HUD’s budget justifications projected that, in fiscal 2008, more than 7.6 million files related to housing and community development topics would be downloaded from PD&R’s website.

Some of PD&R’s data sets are primarily intended for the use of people involved in the operation of HUD programs and the housing and community development programs of other agencies. For example, data on fair market rents are used mainly in the administration of the Section 8 Housing Choice Voucher Program; annual adjustment factors are used in the administration of HUD programs that subsidize privately owned low-income housing projects; a list of metropolitan areas and particular census tracts in other locations where larger subsidies are provided for low-income housing tax-credit projects is used by project developers and program administrators; and income limits in different localities are used to determine eligibility for various HUD and non-HUD housing programs. Researchers studying these programs also rely heavily on the same data sets. HUD USER also provides a data set helpful to state and local agencies in preparing the comprehensive plans that they must submit in order to receive HUD support under the HOME Investment Partnerships Program and Community Development Block Grant (CDBG) Programs.

Other data sets are intended primarily for the use of researchers inside and outside of HUD, both governmental and nongovernmental, including those interested in estimating the effects of government programs.
The AHS is by far the most important data set in this regard, and it accounts for a significant fraction of the PD&R budget. It is also the oldest and covers the longest period of time. Other data sets in this category include the RFS, PSH, the LIHTC data set, SOCDS, the government sponsored enterprise data set, the Property Owners and Managers Survey, and the multifamily assistance and Section 8 contracts data set. A few of these surveys, such as the Property Owners and Managers Survey, were conducted in only 1 year, but most are produced periodically.

Researchers use PD&R’s data sets for a variety of purposes. Private decision makers and those involved in policy development seek information on the current state of the nation’s housing and related markets: the AHS, the Survey of Market Absorption, SOCDS, data on new residential sales, and the Manufactured Homes Survey are particularly important for this purpose. The new USPS data set on vacancy rates is likely to join this group. Other researchers use PD&R’s data sets to study the behavior of individuals, the operation of markets, or the performance of government programs: the AHS and RFS are especially important for these purposes.

Providing public-use data sets is arguably one of PD&R’s most important functions. The production and public availability of many PD&R data sets are necessary for the administration of HUD programs. Other PD&R public-use data sets are essential for policy development. They provide information on the current state of the nation’s housing and related markets and the information needed to estimate the effects of existing government programs and predict the likely consequences of proposed programs. Many of PD&R’s public-use data sets are important to private parties for making good decisions. Finally, PD&R’s public-use data sets have stimulated independent research on a wide range of urban policy issues without additional government funding, thereby injecting new ideas into public policy debates.

AMERICAN HOUSING SURVEY

As mentioned above, the AHS is PD&R’s most expensive public-use data set. It accounts for 72 percent of the amount paid to outside parties for the major surveys (see Table 7-1). Since the core PD&R budget is about $57 million and some of the cost of the AHS is not included in the table, the AHS accounts for more than 28 percent of the entire PD&R budget.2

The AHS has two components: a national survey and a survey of selected metropolitan areas. The national AHS was conducted annually from 1973 through 1981, and has been conducted biennially since then. Its sample size has varied between 53,000 and 80,000 households depending on the budget available. In 2007 the sample size was about 55,000. Since 1974, the AHS has collected data on enough households in certain specific metropolitan areas to make inferences about housing conditions and other matters in these places.
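
The two budget shares cited above for the AHS can be checked against the fiscal 2007 row of Table 7-1. The short Python sketch below is only an illustration. It assumes that AHS spending is the sum of the AHS and AHS-FF columns, and it reads "about $57 million" as exactly 57,000 thousand dollars; both are assumptions made here, not statements from the committee. Because the forward-funded amount is 1,529 in both fiscal 2006 and fiscal 2007, counting the prior year's forward funding instead would give the same total.

```python
# Fiscal 2007 row of Table 7-1, in thousands of dollars.
FY2007_FUNDING = {
    "AHS": 14_471,
    "AHS-FF": 1_529,
    "New Home Sales": 2_437,
    "FF": 453,
    "SOMA": 800,
    "Manufactured Homes": 1_100,
    "LIHTC": 440,
    "RFS": 0,
    "FMR RDD": 1_000,
}

# "About $57 million" from the text, in thousands of dollars (assumed exact here).
CORE_PDR_BUDGET = 57_000


def ahs_shares(funding, core_budget):
    """Return (AHS share of major-survey funding, AHS share of core budget).

    Assumption: AHS spending is the AHS column plus the AHS-FF column.
    """
    ahs = funding["AHS"] + funding["AHS-FF"]
    total = sum(funding.values())
    return ahs / total, ahs / core_budget


survey_share, budget_share = ahs_shares(FY2007_FUNDING, CORE_PDR_BUDGET)
print(f"AHS share of major-survey funding: {survey_share:.1%}")
print(f"AHS share of core PD&R budget:     {budget_share:.1%}")
```

Under these assumptions the script prints 72.0% and 28.1%, in line with the figures in the text. The second share is a lower bound, since the table omits some AHS costs.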
Over its history, the AHS metropolitan sample has seen severe cutbacks: in the number of metropolitan areas, from 60 to 21; in the frequency of data collection, from once every 3 years to once every 6 years; and in sample sizes, from about 15,000 in the largest metropolitan areas and 5,000 in the others to about 3,000 in each kind of area.

2 We exclude from the PD&R budget the cost of the University Partnerships Grants Program. As noted earlier, although PD&R administers this program, it is only tangentially related to its core mission of policy development and research (see Chapter 2).

The AHS collects a much wider range of information than any other HUD-funded data set. Indeed, it is one of the federal government’s richest data sets. The codebook describing its contents covers about 1,200 pages. Its length is due in part to the necessity of documenting improvements in the wording of questions over time to elicit more accurate answers. However, it also reflects the wealth of information collected in each year. The types of information include (1) whether the unit is occupied; (2) the size and composition of the household living in it; (3) characteristics of household members such as their age, race, ethnicity, nativity, citizenship, education level, and income from various sources; (4) detailed housing and neighborhood characteristics, including recent alterations and renovations; (5) housing expenditures, including expenditures on utilities; (6) details regarding the mortgages of homeowners; and (7) respondent-reported information on the type of government housing assistance received.

The AHS is the only national data set that contains detailed information about housing characteristics. Other data sets, such as the decennial census and the American Community Survey (ACS), are poor substitutes for the AHS in this regard because they contain no information on the condition of dwelling units and little information about their amenities (see Eggers, 2007a). Their data on the housing stock are limited to a few rudimentary measures, such as the number of rooms and bedrooms, the existence of complete plumbing and kitchen facilities, and the age of the structure. Dwelling units that are the same with respect to these characteristics can differ enormously in their condition. Some have large cracks in their walls, peeling paint, leaking roofs, and multiple heating breakdowns each winter, while others have none of these defects. The AHS contains this and much more information about the conditions of the housing units.

Since the bulk of HUD’s budget is devoted to low-income housing assistance and the primary purpose of this assistance is to ensure that everyone lives in housing units that meet certain minimum housing standards, detailed information about housing conditions is particularly important for HUD’s mission.
Without knowledge of the current condition of the housing stock, it is impossible to make an informed decision about whether additional housing assistance is called for. In targeting housing assistance, it is important to know housing conditions of subsets of the population. The AHS is the only periodic survey that combines detailed information on the characteristics of housing units with information on the characteristics of their occupants.

The uses of the AHS for policy development go well beyond simply describing current housing conditions. The AHS has been used to estimate the effects of existing programs and to predict the effects of proposed programs. To give a few examples, it has been used to estimate the effects of public housing and housing vouchers on the nature of the housing occupied by recipients of housing assistance; the cost-effectiveness of alternative methods for delivering housing assistance; the effects of the affordable housing goals of government sponsored enterprises (GSEs) on home ownership rates; the adequacy of Section 8 subsidies for providing housing meeting the program’s minimum housing standards; the effects of housing vouchers on the rents of unsubsidized units; the effects of rent control on the rents of uncontrolled apartments; and the benefits of increased home ownership to the neighbors of the new homeowners. In almost all of these studies, accurate estimation relied on detailed information in the AHS about housing characteristics.

The AHS has also been used to study the workings of housing and related markets and the behavior of actors in those markets, modeling such things as the housing filtering process; homelessness; mortgage terminations and refinancing; home improvement decisions; and tenure choice. Such studies are often used to predict the effects of proposed government actions to deal with housing problems.

Many uses of the AHS and other major surveys require (or would benefit from) a good index of the market rent for identical housing units in different locations. Such a price index is valuable for a wide range of studies of the workings of housing markets, the behavior of families, and the effects of government programs. Because the AHS contains detailed data on the characteristics of housing, it has been used to produce the best housing price indices (see, e.g., Thibodeau, 1995). These indices are superior to other widely used alternatives such as median rent and HUD’s fair market rents because differences in the values of these other indices between locations reflect differences in the quality of the housing as well as differences in price of identical units.3

The AHS offers by far the most important data set for studying the effects of low-income housing programs. It nonetheless has several major deficiencies from this viewpoint. The most cost-effective approach to producing data useful for program evaluation and policy development would be to modify the AHS to overcome these deficiencies.

One important deficiency of the AHS from the viewpoint of studying the effects of low-income housing programs is that the sample of assisted families in each program is much too small. The most recent matching of administrative records with households in the AHS to identify the type of HUD rental assistance identified 326 households living in public housing projects, 636 in privately owned subsidized projects, and 571 that received housing vouchers (U.S. Department of Housing and Urban Development, 2008). These households accounted for 9.8 percent of all renter households in the sample. Since the allocation of the budget among different programs is one of the most important decisions in housing policy, evidence on the comparative performance of different programs is essential for good decision making. In light of the clustered nature of the sample, the current samples are too small even for estimating the average effects of the three broad types of assistance (public housing projects, privately owned subsidized projects, and housing vouchers) with much precision. These sample sizes are unambiguously too small for comparing program performance for particular types of households, such as minorities and the elderly. Finally, the layering of subsidies from multiple programs on individual units raises important questions about the cost-effectiveness and value of these various combinations. Addressing this issue requires a much larger sample of subsidized units because there are many more combinations of programs than individual programs.

3 The reduction in the frequency and number of areas covered by the AHS metropolitan sample has increasingly led researchers to use these inferior alternative price indices.

It is standard practice in major surveys to oversample subsets of the population that are rare but of particular interest to the organization funding the survey. Households that receive low-income housing assistance meet these criteria for HUD. Therefore, oversampling of households receiving housing subsidies would be logical for the AHS. By reducing the fraction of the renter sample that does not receive low-income rental housing assistance from 92.3 to 84.6 percent, the sample of renters that do receive such assistance could be doubled. This modification of the AHS would not require more resources. The cost of increasing the size of the subsidized sample would be offset by reducing the size of the unsubsidized sample. Alternatively, the AHS sample size could be increased towards its historical norm by adding only subsidized units.

Another major shortcoming of the AHS from the viewpoint of program evaluation is its attempt to determine the type of assistance received by asking respondents. Despite several efforts over the years, the questions asked do not yield accurate answers, even at the level of the three broad categories of public housing projects, privately owned subsidized projects, and housing vouchers (see Shroder, 2002, for a description and analysis of the inaccuracies).
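
The oversampling arithmetic mentioned earlier (moving the unassisted share of the renter sample from 92.3 to 84.6 percent in order to double the assisted sample) follows from holding the total renter sample fixed. The Python sketch below works through that calculation; the function name and the `factor` parameter are illustrative choices made here, not anything defined in the AHS documentation.

```python
def oversample_assisted(unassisted_pct, factor=2.0):
    """Scale the assisted share of a fixed-size renter sample by `factor`.

    Returns (new_assisted_pct, new_unassisted_pct). The total renter
    sample is held constant, so the two shares always sum to 100.
    """
    assisted_pct = 100.0 - unassisted_pct
    new_assisted_pct = assisted_pct * factor
    if not 0.0 <= new_assisted_pct <= 100.0:
        raise ValueError("scaled assisted share must stay between 0 and 100")
    return new_assisted_pct, 100.0 - new_assisted_pct


# Figures from the text: 92.3 percent of sampled renters are unassisted.
new_assisted, new_unassisted = oversample_assisted(92.3)

# Relative reduction in the unassisted sample needed to pay for the change.
unassisted_cut = 1.0 - new_unassisted / 92.3

print(f"Assisted share of renter sample:   {new_assisted:.1f}%")
print(f"Unassisted share of renter sample: {new_unassisted:.1f}%")
print(f"Relative cut in unassisted sample: {unassisted_cut:.1%}")
```

The assisted share rises from 7.7 to 15.4 percent and the unassisted share falls to 84.6 percent, as stated in the text. The unassisted sample shrinks by only about 8 percent, which is why the change could be made without more resources.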
Furthermore, program evaluation requires more detailed information about the programs involved that cannot possibly be obtained by asking respondents. Housing subsidies from multiple sources are paid on behalf of many assisted households. For example, about 28 percent of tax-credit units receive additional development subsidies from HUD’s HOME housing block grant program,4 and owners of tax-credit projects received subsidies in the form of unit-based or tenant-based Section 8 assistance on behalf of 40 percent of their tenants (see Climaco, Chiarenza, and Finkel, 2006). Any analysis of the performance of housing programs should thus combine subsidies from multiple sources, requiring accurate information on the specific programs that serve each household. Recipients of housing assistance do not know this information. The most recent HUD-funded study that addressed this problem produced some sensible suggestions for revising the wording of the questions (Gordon et al., 2005). However, it did not test the extent to which the proposed questions would lead to more accurate assignment of families to the three broad categories, and more fundamentally, this approach has no potential to obtain accurate information about the specific multiple programs that provide assistance on behalf of many recipients of housing assistance.

4 HOME is the largest federal block grant to state and local governments that is designed exclusively to create affordable housing for low-income households.

The solution to this problem is to use administrative records on the addresses of subsidized projects and voucher recipients to identify the low-income housing programs that serve each household in the AHS. HUD has this information for HUD-subsidized and LIHTC projects. It also has addresses of families using housing vouchers. These programs account for the overwhelming majority of low-income households that receive rental assistance. PD&R first used HUD’s administrative records to create AHS data sets that identify HUD-assisted households by broad type of assistance in 1989, and published tabulations for 1989, 1991, and 1993; these tabulations then lapsed until the 2003 AHS. They should be reinstated on a regular basis in the future, for specific programs. Indeed, PD&R could assemble the addresses of households that receive assistance from HUD’s CDBG Program and other block grant programs (HOME and American Indian) that already exist in HUD’s administrative data sets. PD&R could also explore the possibility of assembling addresses for other households served by these programs as well as the U.S. Department of Agriculture’s low-income housing programs. The technology for matching records based on geographic identifiers has improved enormously in recent years; PD&R has yet to take advantage of this technological development.

The third shortcoming of the AHS for policy analysis is the absence of data on taxpayer costs associated with each subsidized unit.
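
To make the address-matching proposal above concrete, the following Python sketch links survey units to administrative program records by normalized address and reports every program attached to each unit, including layered subsidies. It is a toy illustration under stated assumptions: the addresses, unit identifiers, abbreviation table, and function names are all invented here, and exact matching on a normalized string is a stand-in for the geocoding and record-linkage methods an actual match would need. It does not describe PD&R's or the Census Bureau's procedures.

```python
import re

# Hypothetical abbreviation table; a real match would need a far fuller one.
ABBREVIATIONS = {
    "street": "st",
    "avenue": "ave",
    "apartment": "apt",
    "north": "n",
    "south": "s",
}


def normalize_address(raw):
    """Lowercase, strip punctuation, and abbreviate common address words."""
    text = re.sub(r"[^a-z0-9\s]", " ", raw.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in text.split())


def match_programs(survey_units, admin_records):
    """Map each survey unit id to the sorted list of programs at its address.

    survey_units: dict of unit id -> address string
    admin_records: iterable of (program name, address string) pairs
    """
    programs_by_address = {}
    for program, address in admin_records:
        key = normalize_address(address)
        programs_by_address.setdefault(key, set()).add(program)
    return {
        unit_id: sorted(programs_by_address.get(normalize_address(address), ()))
        for unit_id, address in survey_units.items()
    }


# Made-up example data, for illustration only.
survey_units = {
    "unit-001": "12 North Main Street, Apt. 4",
    "unit-002": "98 Elm Avenue",
}
admin_records = [
    ("LIHTC", "12 N Main St Apt 4"),
    ("Section 8 voucher", "12 N. Main St., Apt 4"),
    ("Public housing", "500 South Oak Street"),
]

matches = match_programs(survey_units, admin_records)
for unit_id, programs in matches.items():
    print(unit_id, programs or "no assisted-housing record found")
```

Keeping a set of programs per address, rather than a single label, is the design choice that matters here: it lets one unit carry several subsidies at once, which respondent reports cannot capture.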
An assessment of the performance of any government program requires information on its costs as well as its benefits. Knowing what programs provided assistance on behalf of each household in the AHS sample is different from knowing the dollar amount of the subsidy from each source. Unfortunately, it would be too expensive to overcome this shortcoming on a regular basis in the AHS because respondents have no knowledge of these costs and HUD’s administrative records do not contain much of the needed information either. Although HUD’s administrative records contain data on the taxpayer cost of providing vouchers to recipients who live in unsubsidized housing units, a significant minority of voucher recipients live in housing units that receive other subsidies as well, such as units in tax credit or HOME projects. Similarly, HUD has information on the amount that it pays on behalf of each family living in one of its privately owned subsidized projects, but these projects often receive subsidies from other sources. For example, some Section 8 projects receive subsidies for rehabilitation from the LIHTC Program. The evidence available indicates that the taxpayer cost for HUD-assisted households significantly exceeds the cost that appears in HUD’s administrative records.

Given the complexity of the issue, it would be prohibitively expensive to produce an estimate of full taxpayer costs incurred on behalf of each subsidized household in the AHS on a regular basis. However, this topic would be an excellent choice for a separate HUD-funded study based on the AHS in a single year or a topical module to include in the survey on occasion.

Many major surveys, such as the Current Population Survey, Survey of Income and Program Participation (SIPP), and Panel Study of Income Dynamics (PSID), contain regular modules that collect information on topics that are important enough to justify data collection from time to time but not important enough to include in each survey. This practice recognizes that it is important to keep survey questionnaires short enough so that respondents are willing to answer carefully all questions and also recognizes that not all potential questions are equally important. The AHS has contained topical modules (or occasional supplements) from time to time, on lead-based paint, housing modifications for persons with disabilities, second homes, characteristics of neighbors, journey to work, and ownership of cars and appliances, but this has not been a regular feature of the AHS.

If PD&R increased the use of topical modules in the AHS, with the ultimate goal of including one in each biennial survey, this could be accomplished without increasing the length of the questionnaire by delegating to a topical module some of the questions that are currently asked in each survey. In our judgment, little would be lost by asking the least important questions less frequently. This would make it possible to ask new questions in each survey without increasing the length of the questionnaire and thereby compromising participation or accuracy.

Topical modules typically involve asking the same respondents additional questions. A particularly promising variant on the theme of increasing the use of topical modules would be to collect selected AHS data on members of a subset of the households who move from units in the AHS sample in some year. Following individuals as they move from one dwelling unit to another over a number of years has advantages over the current sampling procedure of collecting data on the same dwelling units and a changing set of occupants. Following individuals makes it possible to observe how they respond to changes in their circumstances, for example, the difference in their housing conditions and expenditures before and after receipt of housing assistance.

Making topical modules a regular feature of the AHS would require some additional expenditure. To defray this additional cost, PD&R could encourage organizations and individual scholars with substantial funding for data collection to propose topical modules. Indeed, these funding sources could be asked to pay some of the fixed costs of the AHS. PD&R could use the additional money to improve the AHS in other dimensions. The Moving to Opportunity (MTO) for Fair Housing demonstration is a recent example of this type of collaboration between government agencies and foundations in funding data collection (see Chapter 3).

Any evaluation of the AHS should assess the effects of the cutbacks in the AHS metropolitan sample. As already noted, over its history there have been severe cutbacks in the number of metropolitan areas in the metropolitan sample, the frequency of data collection, and sample sizes. These cutbacks are not particularly damaging for some types of studies, such as estimating the condition of the nation’s housing stock or the overall effects of low-income housing programs, as these are based on the national AHS sample. Studies of the workings of housing markets that rely on substantial samples from many markets are also not yet substantially affected because many different markets have been surveyed in more than 1 year as a part of the metropolitan sample since its inception. However, to the extent that innovations such as new mortgage products alter the operation of housing markets, the cutbacks in the number of areas, sample size, and frequency will progressively decrease HUD’s ability to estimate accurately housing market models used to predict the consequences of government interventions.

For other types of studies, the effects of the cutbacks have been substantial. To the extent that it is desirable to know housing conditions in particular metropolitan areas or the effects of housing programs in these areas, only the AHS metropolitan sample has enough observations to produce a reliable picture, especially for subsets of the population in these areas.
As a result of the cuts, however, the AHS no longer produces such informa- tion for San Francisco, CA; Albany, NY; Springfield, MA; Seattle, WA; Honolulu, HI; Orlando, FL; Louisville, KY; Raleigh, NC; and a dozen other large metropolitan areas that were once included in its metropolitan sample. Moreover, other large metropolitan areas such as Austin, TX; Jacksonville, FL; Nashville, TN; and Richmond, VA; were never in the AHS. If data on housing conditions in many specific areas are important from the viewpoint of national public policy, the enormous reduction in the number of metropolitan areas in the AHS is alarming. If the data for particular metropolitan areas are only important for local policy issues, it might be argued that the solution is to offer to include metropolitan areas if local governments in these areas are willing to pay the cost of the survey. A problem with this solution is that metropolitan areas typically contain many political jurisdictions. This results in a “free-rider problem” that may justify federal funding. Consideration of these issues might well be addressed by an ad hoc committee to thoroughly review the content and other aspects of the AHS. PD&R regularly solicits advice about these matters from Census Bureau

[...] families living in the projects. This shortcoming of the LIHTC data set could be partially overcome by using HUD administrative data on families with tenant-based and project-based Section 8 assistance. Determining the magnitudes of the other development subsidies would be more difficult: it would require combining data from the LIHTC data set with data from multiple HUD and non-HUD administrative data sets.

The shortcomings of the LIHTC data set for policy analysis are not limited to the absence of information about the magnitudes of subsidies associated with tax-credit units. The LIHTC data alone cannot be used to answer some of the most important questions about the program. For example, they cannot be used to determine the effect of the tax credit on the types of neighborhoods in which families live because the data set does not contain information on the previous residence of occupants. The LIHTC data set also cannot be used to estimate the effect of the tax credit on the housing conditions of occupants because it does not contain any information on the housing provided in tax-credit projects beyond the location of the unit and the number of units with each number of bedrooms. It contains no direct information on the previous housing of occupants of tax-credit units or information that could be used to estimate their previous housing conditions. Indeed, it contains no information at all about occupants of tax-credit units.

The LIHTC data set only provides information available at the time that a project was placed in service, and it offers no information about the quality of the housing even at that time. It contains no information on the characteristics of the housing provided, other than its location and number of bedrooms, and no information on the characteristics of the families living in the housing. Adding this information to the LIHTC data set for all projects in any single year would be very expensive. Furthermore, since neither the condition of the units nor the characteristics of the families living in them remain constant over time, this information would have to be updated periodically to provide an accurate picture. Supplementing the LIHTC data set to provide the above information for all projects would surely be a poor use of PD&R’s limited resources, and doing it for a random sample of sufficient size to produce credible estimates of program effects would be expensive.

A more cost-effective approach to increase the usefulness of the LIHTC data set would be to use information on the address of each project to append some or all of its information to the voluminous data on households and housing units in the AHS and perhaps other major national data sets, such as the ACS, SIPP (and its successor, the Dynamics of Economic Well-Being System), and PSID.

In summary, the LIHTC data set does not presently contain the full
range of information needed to estimate most important effects of the tax-credit program or the impact of the subsidies that HUD provides to many families in its projects.

In the future, more information should be available about residents of tax-credit projects. Section 2835(d) of the Housing and Economic Recovery Act (P.L. 110-289) requires that state agencies administering LIHTC programs submit data annually to HUD on the characteristics (including race, ethnicity, family composition, age, disability status, receipt of vouchers, income, and rent payments) of tenants living in each LIHTC development. This information should be very useful for policy makers and analysts, but it is not sufficient to identify assisted households because some households in these projects receive vouchers or other forms of assistance in addition to any cost reduction attributable directly to the LIHTC itself. The best approach to using the data set for program evaluation is to use its information on addresses of tax-credit projects to append the information in the LIHTC data set to other information on the tax-credit households in the AHS, particularly if PD&R follows the committee’s recommendation to oversample such units.

PICTURE OF SUBSIDIZED HOUSEHOLDS

The Picture of Subsidized Households (PSH) data base provides summary statistics on the characteristics of HUD-assisted households from two sources: HUD’s Multifamily Tenant Characteristics System (MTCS), which covers public housing tenants and housing voucher recipients, and the Tenant Rental Assistance Certification System (TRACS), which covers households living in privately owned HUD-assisted housing projects. These two data sets contain information on each household when it is initially offered assistance and when it is recertified for continued assistance.

The PSH does not provide data on individual households. Rather, it provides information to the project level for project-based assistance (except for project-based certificates and vouchers) and to the census-tract level for Section 8 certificates and vouchers (e.g., the mean income of all households in each public housing project or the number of households per tract with Section 8 housing vouchers). (For reasons of confidentiality, tract data are suppressed when fewer than 11 households are involved.) Summary statistics are also provided for other geographic levels, such as states and, since 2000, metropolitan statistical areas and cities. The data set is available in its entirety for purposes of analysis, and since the 2000 PSH, it can be searched easily for particular pieces of information through a user-friendly customized search feature.

Delayed production of this data set has been a chronic problem, and the committee heard complaints about it from analysts seriously interested
in HUD’s low-income housing programs.[6] Until the fall of 2007, the most recent data set referred to 1998. Data for 2000 were made available late in 2007. The long delay was due in part to making the transition to a modern computer program. It also reflected the time necessary to develop a customized search feature. However, the long delay cannot be explained by these factors alone. There is little doubt that PD&R staff shortages, combined with the priority assigned to the production of this data set by PD&R’s assistant secretaries, played a role in the long delays. In the early stages of the committee’s deliberations, PD&R staff said that they expected to produce the 2004 PSH by the end of 2007 and the 2006 PSH early in 2008 and then to backfill the missing years. As of September 2008, however, the 2004 data set had not appeared on HUD USER.

[6] Concerns with respect to the timely release of data were also raised during a public information-gathering meeting organized by the committee.

HUD’s Office of Public and Indian Housing (PIH) recently launched its own customized search program, the Resident Characteristics Report (RCR). This system provides summary statistics for public housing tenants and voucher recipients at levels of geography similar to the PSH (see http://www.hud.gov/offices/pih/systems/pic/50058/rcr/index.cfm [accessed August 15, 2008]). Unlike the PSH, this information is quite up to date: by early September 2008, the RCR had been updated through August 2008.

The RCR provides almost as much information about public housing tenants and voucher recipients as the PSH. However, the RCR is not a substitute for the PSH because it does not contain information about occupants of privately owned, HUD-subsidized projects, it does not report HUD’s subsidy on behalf of public housing tenants and voucher recipients, and it does not provide analysts with an electronic version of the entire data set. Therefore, it is important for PD&R to continue its efforts to increase the timeliness of the PSH. However, since it is not reasonable to expect PD&R to produce a public-use data set that includes information on families living in private subsidized projects as promptly as PIH produces the RCR, the RCR will continue to play a useful role in providing up-to-date information on public housing tenants and housing voucher recipients.

The PSH and RCR provide useful simple descriptive statistics about HUD-assisted households, such as the percentage of all households whose head is disabled and the percentage with annual incomes less than $5,000. However, they do not support more complicated descriptive statistics. For example, they cannot be used to calculate the percentage of all households with annual incomes less than $5,000 separately for households with disabled and nondisabled heads. Furthermore, the PSH and RCR alone do not contain the information needed to estimate the effects of the programs. The effect of a
program is the difference between the household’s outcomes, such as the characteristics of its housing, with and without the program. The PSH and RCR contain little information about outcomes under the program, and no information about these outcomes in the absence of the program.

STATE OF THE CITIES DATA SYSTEM

The State of the Cities Data System (SOCDS) compiles data on urban and metropolitan areas from multiple public sources. Data available through SOCDS include demographic and economic data from the census, unemployment data from the Bureau of Labor Statistics (through October 2007), data on business establishments from the County Business Patterns data base (through 2002), crime statistics from the Federal Bureau of Investigation (through 2005), building permit information from the census (through 2007), city and urban government finances from the census (through 1997), and housing affordability indexes used for the CDBG and HOME Programs. It serves as a one-stop shop that allows many constituencies to construct data sets that can be used for various purposes, including research, policy making, and advocacy.

SOCDS has been used extensively by HUD in the production of in-house reports. For example, the widely read State of the Cities reports have drawn heavily from its data. Evidence also suggests that SOCDS is used heavily by external constituencies, including local and community organizations seeking information on their locales. SOCDS averages more than 250,000 hits per month (nearly 3.5 million per year).

A key issue for the success of a data repository is making the most current data available as quickly as possible. PD&R staff aspire to update the employment and building permits data monthly and the FBI crime statistics, ACS, and County Business Patterns (CBP) data annually. However, SOCDS lags behind the source data for the CBP data (available at least through 2006), the crime data (available through 2006), and the city and urban government finance data (available for 2002), perhaps due to funding shortfalls. The lack of current data reduces the utility of the SOCDS data and inhibits wider use.

At the time of its creation in the 1990s, SOCDS was a one-of-a-kind data base. No other resource allowed individuals to acquire so much information about a particular location from a single place. However, the marketplace has created a competitor: “Dataplace,” which was jointly constructed by the Fannie Mae Foundation (FMF), the Urban Institute, EconData.net, and Vinq Incorporated. Available online since 2004, Dataplace expanded on SOCDS by including data from additional sources, most notably the information from the Home Mortgage Disclosure Act (HMDA) on mortgage lending, and by creating a more user-friendly interface for finding and using data. Also, because of its connection with FMF, which more actively engages its
constituency, its reach has been broader than that of SOCDS. The Dataplace site has had over 4 million hits since its inception, with annual use holding at approximately 1.5 million hits a year.

Since the demise of the FMF in 2007, the future of Dataplace has been in some doubt. KnowledgePlex, an FMF spin-off nonprofit, now has the lead, but its funding stream beyond 2008 is not certain. This might represent an opportunity to eliminate the duplication by consolidating Dataplace and SOCDS.

DATA ACCESS AND AVAILABILITY

Administrative Data on HUD-Assisted Households

One issue that arose in the committee’s deliberations was the possibility of making HUD’s administrative data on individual households and dwelling units available as public-use data sets. The MTCS and TRACS data sets that provide information on the characteristics of HUD-assisted households have been mentioned most often in this regard. At present, PD&R does not make administrative data on individual households available to all researchers who would like to use them. Instead, it provides aggregate data from these data sets to the general public and data on individual households to selected researchers.

As noted above, PSH provides unrestricted access to a data set containing summary statistics at the project level for project-based assistance and at the census tract level for Section 8 certificates and vouchers when 11 or more households are involved. Providing average values of variables for all assisted households in a housing project or census tract rather than the values of these variables for individual households is one method for protecting the privacy of assisted households.

HUD has also provided MTCS/TRACS data on individual households to researchers in its Research Cadre Program for the purpose of statistical analysis. These researchers must sign an agreement to protect the confidentiality of the information on individual households, and they face punishments for violating this agreement. Any researcher can apply for membership in the Research Cadre Program whenever it is open to new members, and it has been open to new members on several occasions. Nevertheless, the distribution of the MTCS/TRACS data only to the members of the Research Cadre Program undoubtedly limits the number of researchers with access to the data. Some researchers interested in using the data surely did not hear about the program the last time it was open for membership. Others did not have a project that would use the data at that time. Still others had not completed their advanced degrees until after the most recent invitation to join the program. So it is reasonable to believe that many
others would have used these data sets if they had been available to any researcher willing to sign the confidentiality agreement.

In the committee’s view, PD&R should create a public-use version of the MTCS and TRACS data sets that would be available to anyone who wants to use it. The privacy of the households involved can be fully protected by limiting the information about their location in the unrestricted public-use data by geography (e.g., by metropolitan area for households in such areas and by state for households in nonmetropolitan areas). This is the standard procedure for protecting the privacy of households in unrestricted public-use data sets.

Since some analyses require information on location at a smaller geographic level, PD&R could provide information about the location of each household at the smallest level of geography consistent with protecting the household’s personal information, as long as such an effort did not unduly delay the production of an unrestricted public-use data set. It could also develop procedures for providing access to a restricted-use version of the data set that contains more detailed information about location to any person with a valid research use for it. Many other government agencies and organizations have found ways to routinely provide such data sets on individual households to researchers in ways that protect confidential information about households from abuse. The Panel Study of Income Dynamics, for example, provides each household’s census tract, as well as its personal information, to researchers who sign confidentiality agreements designed to avoid abuse of this information. Since other agencies and organizations have developed protocols for dealing with confidentiality issues, PD&R would not have to start from ground zero in developing protocols that would expand the access of independent researchers to HUD’s administrative data.
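The geographic coarsening described above (metropolitan area for metropolitan households, state otherwise) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the field names, and the sample values are hypothetical and are not taken from the MTCS or TRACS file formats.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HouseholdRecord:
    """Hypothetical internal record; field names are illustrative only."""
    household_id: str
    state: str
    metro_area: Optional[str]  # None for households outside metropolitan areas
    census_tract: str
    annual_income: int


def to_public_use(record: HouseholdRecord) -> dict:
    """Keep only coarse geography: metro area if there is one, else state."""
    if record.metro_area is not None:
        location = {"geo_level": "metro", "geo_name": record.metro_area}
    else:
        location = {"geo_level": "state", "geo_name": record.state}
    # The census tract and the internal household identifier are left out.
    return {"annual_income": record.annual_income, **location}


# Made-up records used only to exercise the function.
records = [
    HouseholdRecord("A1", "OH", "Cleveland", "tract-0001", 8400),
    HouseholdRecord("B2", "MT", None, "tract-0002", 6100),
]
public_use = [to_public_use(r) for r in records]
```

A restricted-use version, as discussed in the text, would retain the finer geography and be released only under a confidentiality agreement; that access control is outside the scope of this sketch.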
Data from HUD-Funded Studies

HUD sponsors many studies that involve substantial original data collection, such as the 2000 Housing Discrimination Study (HDS-2000) and the MTO demonstration. Contractors are routinely required to deliver to HUD data sets from their studies and documentation for the data sets. However, due to the staff time that would be involved, PD&R has rarely created unrestricted public-use data sets from the data sets delivered.[7]

[7] The HDS-2000 data set is the primary exception, but due to concerns about the amount of staff time that would be required to answer questions about it, this data set is not listed among PD&R’s data sets on HUD USER. Instead, it is stored in a folder with the project’s final report in the publications directory (see http://www.huduser.org/publications/hsgfin/hds.html [accessed August 15, 2008]). An early MTO data set was made available, but only for a limited period of time, to avoid demands on staff time.
Instead of creating unrestricted public-use data sets, PD&R responds on a case-by-case basis to requests for access to the data sets supplied by HUD contractors. Given the fixed cost of creating a public-use data set, a case-by-case approach may make sense for data sets of limited interest to researchers outside HUD. However, for rich data sets of interest to many researchers, such as the MTO data, another approach could be used. PD&R could produce an unrestricted public-use version of the major data sets that result from its funded research, and it could always produce a restricted-use version that would be available to any reputable researcher who is willing to sign a confidentiality agreement.[8] This would be a very effective way to leverage the taxpayer’s investment in the original study, and it could result in important new analyses. It would enable analysts who were not involved in the study to attempt to replicate the results reported by HUD’s contractors, determine the sensitivity of these results to reasonable alternative assumptions and methods, and produce new findings outside the purview of the funded study.

It is the committee’s understanding that creating a public-use or publicly available data base is not currently required of contractors and grantees because of the additional expense involved in preparing such data bases, primarily as regards careful documentation. But, particularly in the case of major studies, the potential additional value of multiple new analyses almost certainly outweighs the cost. To achieve that value, the budget for each study that involves the collection of data of broad interest to researchers would have to include sufficient money to prepare a carefully documented public-use or publicly available data base.

One option is for PD&R to work with the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan, which is a long-standing and well-regarded repository of major social science data bases, including the AHS. ICPSR’s core mission is to “acquire and preserve social science data,” and it is particularly interested in data arising from survey research and administrative records.[9] ICPSR follows established practices for protecting the confidentiality of research subjects. Although ICPSR prefers data bases that are accompanied by comprehensive technical documentation, it will consider “lower quality data” if they have “unique historical value.” ICPSR also offers significant value added by preparing data and documentation files in user-friendly formats and providing a detailed description of each study in its archive.

[8] When almost all research uses of a data set require information on the location of the household at a small level of geography, an unrestricted public-use data set may be of little value; in these cases, it may make sense to produce only a restricted-use version of the data set.

[9] For a description of the two main types of data assembled for PD&R studies, see http://www.icpsr.umich.edu/ICPSR/org/policies/colldev.html [accessed August 15, 2008].

Administrative Data on Housing Assistance Recipients

In the past, HUD collected data on the previous rent of new recipients of housing assistance (see U.S. Department of Housing and Urban Development, 1978), but then stopped collecting this information. The reasons for the decision to stop are not clear, and important information is being missed. For families moving from unsubsidized units, previous rent provides an excellent summary measure of the overall desirability of the housing occupied by new recipients of housing assistance immediately prior to receipt of assistance. Before HUD stopped collecting this information, it was used in several studies to greatly reduce the bias and increase the precision of estimates of the effects of low-income housing programs on the desirability of the housing occupied and the recipients’ expenditure on other goods (see Murray, 1975).

Asking about previous rent and a few other questions, such as whether the previous unit was publicly subsidized or shared with others, on the form (HUD 50058) used to determine a family’s eligibility for assistance would provide extremely important information about the performance of low-income housing programs at very low cost. Going a step further and adding the family’s previous address would be very useful in determining the effect of the housing program on the type of neighborhood in which the family lives. HUD 50058 does contain information on the family’s previous ZIP code, but address information would enable the identification of previous addresses at such geographic levels as census tract, for which data on many neighborhood characteristics are available.

Indeed, HUD’s large expenditure on low-income housing assistance and the dearth of evidence on the effects of this assistance argue for going beyond these simple measures.
PD&R’s Customer Satisfaction Surveys have demonstrated that accurate detailed information about the housing of assisted households can be obtained at a modest cost by asking recipients to fill out a questionnaire (see U.S. Department of Housing and Urban Development, 1998b). Asking a large random sample of new recipients of housing assistance to complete a slightly expanded version of this questionnaire for both their previous and their new subsidized unit would yield reliable information about program effects. To determine the effect of housing assistance on the amount that families have to spend on other goods and their neighborhood, the expanded survey could also contain information about the rent and location of their previous unit. The information in these surveys, together with the information routinely collected to determine each family’s eligibility for assistance and contribution to rent under the
housing program, could provide the basis for an analysis of the effects of each housing program on the types of housing and neighborhoods occupied by recipients of assistance and their expenditure on other goods, and how these effects differ for different types of households.

CONCLUSIONS AND RECOMMENDATIONS

The provision of data for public use is arguably one of PD&R’s most important functions. Its data sets are heavily used for program administration and evaluation, policy development, private decision making, and studying the behavior of individuals and the operation of markets. As noted at the beginning of this chapter, PD&R’s public-use data sets are too numerous and their uses too varied for the committee to have evaluated the benefits and costs of every one or suggest opportunities for improving each of them.

The AHS is PD&R’s most important data set for program evaluation and policy development. It is one of the federal government’s richest data sets and collects a much wider range of information than any other HUD-funded data set. Most importantly, the AHS is the only national data set that contains detailed information about the characteristics of dwelling units.

Despite its many virtues, the AHS has some serious limitations for program evaluation and policy development: most important, it does not accurately identify the type of housing assistance received by each household and its sample of subsidized households is too small. These limitations can be overcome by using administrative data to identify which specific programs provide housing assistance to each household in the AHS and by increasing the sample of assisted households, if necessary at the expense of fewer unassisted households.

In addition, to cover important topics not now covered in the AHS, PD&R could increase the use of topical modules, with the ultimate goal of including one in each biennial survey. This can be done at minimal cost by delegating to a topical module some of the questions that are currently asked in each survey. Doing so would make it possible to ask new questions in each survey without increasing the length of the questionnaire and thereby compromising participation or accuracy.

Second, the committee recommends that PD&R establish an ad hoc committee to thoroughly review the content and other aspects of the AHS. The committee believes that a survey as expensive as the AHS would benefit from an occasional comprehensive reconsideration of its many features by a committee representative of its many users and uses.

Because HUD has a significant interest in the tax-credit program, PD&R has funded the collection of information about tax-credit projects
from the state agencies that administer it. Although this data set has been used for important purposes, the LIHTC data set does not contain the information needed to estimate the most important effects of the tax-credit program or the effects of the subsidies that HUD provides to many families in its projects. It would be very expensive to overcome this deficiency by adding the necessary information for each project to the LIHTC data set. The best approach to using the data set for program evaluation would be to use information on addresses of tax-credit projects to append the information in the LIHTC data set to other information on the households and housing units in the AHS.

The Picture of Subsidized Households provides summary statistics on the characteristics of HUD-assisted households at the level of housing projects and census tracts. For some time now, this data set has been badly out of date. Although it provides useful simple descriptive statistics about HUD-assisted households, the PSH does not permit more complicated descriptive statistics, let alone the data needed to estimate program effects.

The committee recognizes that the provision of additional public-use data sets requires additional resources. However, the committee believes that this would be money well spent. The availability of these data sets will stimulate considerable independent research that is important for achieving HUD’s goals.

Finally, the committee is deeply concerned about the steady and substantial cutbacks in PD&R’s provision of public-use data over the past decade that have resulted from the reduction in PD&R’s inflation-adjusted budget. These cutbacks include, most importantly, the reduction in the number of metropolitan areas, the frequency of data collection, and the sample sizes in the AHS metropolitan sample and the apparent cancellation of the 2010 RFS. In the committee’s judgment, the country can ill afford decisions about important public policy initiatives based on inferior information about the current situation and the likely impacts of these policy reforms. Timely data of high quality are a key ingredient in producing this information.

Major Recommendation 5: PD&R should strengthen its surveys and administrative data sets and make them all publicly available on a set schedule.

Recommendation 7-1: The number of metropolitan areas in the AHS, the frequency with which they are surveyed, and the sizes of the sample in each area should be increased substantially.

Recommendation 7-2: PD&R should modify the AHS to increase its usefulness for program evaluation and policy development. Administrative data should be used to identify the combination of programs that provide
assistance on behalf of each household, and the sample of households receiving housing assistance should be greatly increased. PD&R should also increase the use of topical modules in the AHS, funded in part by external sources.

Recommendation 7-3: PD&R should establish an ad hoc committee to thoroughly review the content and other aspects of the AHS.

Recommendation 7-4: Ensuring that the RFS is conducted in 2011 should be a high priority.

Recommendation 7-5: PD&R should assign a high priority to the production of an up-to-date PSH.

Recommendation 7-6: PD&R should produce a public-use version of HUD’s administrative data sets that provide information on the characteristics of HUD-assisted households, and it should develop procedures for providing access to a restricted-use version of the data set that contains more detailed information about household location to any reputable researcher.

Recommendation 7-7: PD&R contracts for studies that involve the collection of data of interest to many researchers should provide for a restricted-use version of the data set that would be available to any reputable researcher and a public-use version when at least one important research use of the data set does not require information on the location of the household at a low level of geography.

Recommendation 7-8: PD&R should use its Customer Satisfaction Survey to collect information on the housing and neighborhood conditions right before and after receipt of housing assistance for a random sample of new recipients to assess the effects of housing assistance.
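The before-and-after comparison in Recommendation 7-8 amounts to a paired difference for each sampled household. The Python sketch below shows one simple way such survey responses might be summarized. It rests on assumptions that are not part of the committee's text: the function name and the housing-quality scores are invented for illustration, and the mean difference can be read as a program effect only if conditions would otherwise have stayed as they were before assistance.

```python
import math
import statistics


def paired_effect(before, after):
    """Return the mean before/after difference and its standard error."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two matched before/after observations")
    # One difference per household: outcome after assistance minus outcome before.
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, std_err


# Made-up housing-quality scores for four hypothetical new recipients.
before = [4.0, 5.0, 3.0, 6.0]
after = [6.0, 6.0, 5.0, 7.0]
effect, se = paired_effect(before, after)
```

The same calculation could be repeated within subgroups to examine how the differences vary across types of households.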