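Understanding Business Dynamics: An Integrated Data System for America’s Future
Copyright © National Academy of Sciences. All rights reserved.

Appendix A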
Overview of Current Data Collections

This appendix provides an overview of current data sources used to construct business and employment statistics and to inform research and policy related to business formation, dynamics, and performance. Our focus is on data produced by the U.S. federal statistical system, but we also cite other examples.

The material in this appendix is organized into subsections loosely defined in terms of data source characteristics and purpose:

  • data to count firms and catalogue essential characteristics—the business lists;

  • longitudinal data for tracking businesses over time;

  • data sources designed to improve coverage of small businesses;

  • aggregate employment statistics;

  • data on the self-employed, entrepreneurs, and business gestation;

  • coverage of special sectors, such as agriculture, nonprofit organizations, and e-commerce; and

  • financial data.

Beyond describing the basic design elements, we indicate the extent to which data are available to users outside the agencies (or other organizations) that collect them. Statistical agencies generally provide documentation for their accessible data sources, and we try to avoid reproducing detailed descriptions that can be accessed elsewhere.1 Table A-1, located at the end of this appendix, allows quick cross-comparisons of various data sets.

In this appendix, we omit several important kinds of business data that are more closely linked to the production of aggregate statistics and are less central to the panel’s charge. For example, we do not directly discuss price data—notably the producer price index (PPI), which measures changes over time in the selling prices received by producers of goods and services—or the array of industry and input/output data (much of it deflated by the PPI) that are crucial to productivity measurement, to the construction of the national accounts, and to statistics on gross domestic product (GDP).

A.1 COUNTING FIRMS AND CATALOGING ESSENTIAL CHARACTERISTICS—THE BUSINESS LISTS

The two primary business lists administered by federal statistical agencies in the United States are the Census Bureau’s business register (BR) and the Bureau of Labor Statistics’ (BLS) Quarterly Unemployment Insurance (UI) address file, more commonly referred to as the Quarterly Census of Employment and Wages (QCEW). Administrative data from the Internal Revenue Service (IRS), which maintains the Business Master File, and from the Social Security Administration (SSA) underpin the BR, while the QCEW relies on data from the state UI programs. The most noteworthy business list maintained outside government is Dun & Bradstreet’s Dun’s Market Identifiers (DMI).

A.1.1 The Census Business Register

In 1968, the Office of Management and Budget directed the Census Bureau to develop and maintain a comprehensive business list. Known until recently as the Standard Statistical Establishment List (SSEL), the BR covers the universe of businesses—over 7 million employer businesses and some 16.5 million nonemployer businesses.
The BR serves as the master enumeration list for sampling frames drawn for the Census Bureau’s firm and establishment surveys, most notably the quinquennial economic census. The economic census, conducted in years ending in 2 and 7, covers over 5 million companies; nonemployer and small businesses are covered by sample only, not by a full census (http://www.census.gov/econ/overview/mu0000.html). Domestic, nonfarm business data are collected at the metropolitan statistical area (MSA) geographic level. Because the economic census occurs only every five years, and a firm can start up and shut down (or vice versa) within a shorter period, the census does not comprehensively capture business birth and death information.

1 The Kauffman Foundation web page (http://research.kauffman.org) has a well-organized list, with links, of government and private sources of data on U.S. and international businesses; the list focuses on entrepreneurship, small business, and self-employment information. RAND, with funding from the Kauffman Foundation, has also assembled an overview of data resources on small businesses (see http://www.rand.org/pubs/working_papers/WR293/index.html).

The BR also serves the important function of providing central storage for an array of administrative data—most notably, payroll tax records, corporate and individual tax returns, and Employer Identification Number (EIN) application information. Maintenance of the BR is heavily dependent on these administrative data. Data on nonemployer firms are drawn exclusively from administrative sources, mainly business income tax records.2

Within the BR, data are organized at the establishment level; an establishment is a single location where goods are produced or services provided. Reflecting the composition of the economy, most establishments are single-unit businesses, but some are part of businesses operating in multiple locations. Because taxes, and in turn tax information, are collected from firms, Census researchers must break down IRS administrative data to the establishment level for multiunit enterprises.3 In years between economic censuses, this is done using information from the Company Organization Survey (COS)—an annual survey of all large employers (250 employees or more) and a sample of mid-size companies, reaching approximately 50,000 of the largest multiunit enterprises.
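The firm-to-establishment split described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that an EIN’s payroll is divided among its establishments in proportion to the employment each location reports on the COS. The Census Bureau’s actual allocation procedure is not described in this appendix, and the class, function, and identifiers below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Establishment:
    est_id: str          # hypothetical establishment identifier
    cos_employment: int  # employment reported for this location on the COS (assumed input)


def allocate_payroll(ein_payroll: float,
                     establishments: list[Establishment]) -> dict[str, float]:
    """Split one EIN's payroll across its establishments in proportion to
    COS-reported employment. Illustrative rule only, not the Census method."""
    if not establishments:
        return {}
    total = sum(e.cos_employment for e in establishments)
    if total == 0:
        # No employment reported at any location: fall back to an equal split.
        equal_share = ein_payroll / len(establishments)
        return {e.est_id: equal_share for e in establishments}
    return {e.est_id: ein_payroll * e.cos_employment / total
            for e in establishments}


# Hypothetical multiunit firm that files payroll taxes under a single EIN.
sites = [Establishment("plant-A", 300), Establishment("store-B", 100)]
print(allocate_payroll(2_000_000.0, sites))
# {'plant-A': 1500000.0, 'store-B': 500000.0}
```

A proportional rule keeps the establishment figures summing to the EIN total, which is the basic property any such allocation has to preserve.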
The accuracy of the single-unit/multiunit identification is reported to improve around economic census years and to decline thereafter (Jarmin and Miranda, 2002). More generally, the COS is used in an effort to maintain up-to-date company affiliation, location, closing, spin-off, and operating information for multiestablishment companies. This allows fuller coverage of such companies, which account for the vast majority of the nation’s business activity. Title 13 of the U.S. Code authorizes this and the other economic census-related surveys and makes response mandatory.

A key element of the BR program is the identification and tracking of individual establishments owned by multiestablishment firms. The BR has excellent coverage of multiestablishment firms every five years, but in the years between economic censuses, the Census Bureau relies on the COS, along with information about multiestablishment firms gathered from its other surveys (e.g., the Annual Survey of Manufactures4), to update the multiunit segment of the business register. The limited scope of the sampled firms and their rotation over time affect the timeliness and coverage of smaller multiestablishment companies in the register. Several studies have noted that both births and deaths of smaller multiunit establishments are disproportionately concentrated in the year before the economic census, when the register is being prepared for the upcoming census.5 The same clumping of changes appears in other dimensions of the data as well: McGuckin and Peck (1992) show that industry coding changes for establishments were especially concentrated in economic census years.

A number of improvements were made to the newest version of the BR, which became fully operational in January 2002: additional data elements were added; seven years of data are now maintained, instead of three, allowing businesses to be tracked from one quinquennial census to the next; processing of nonemployer statistics, which previously were not maintained, has been expedited; and industry detail has been brought into concordance with the North American Industry Classification System (NAICS). In addition, in 2005, the IRS began providing employment data from tax Form 941 for every quarter instead of only for the first quarter. Form 941 includes the EIN, employer-reported wages and other compensation, employment for the pay period, income and Social Security tax withholdings, and related information. When a new business payroll record is received from the IRS, the Census Bureau adds a business employer record to the BR.

2 See Jarmin and Miranda (2002) for a thorough description of the Census Bureau’s business register, including specifics about industry-level coverage.

3 As noted throughout this report, the unit of observation is a key issue. Administrative data, such as those originating with the IRS, are collected and stored by taxpayer ID. The Census Bureau and BLS create enterprise and establishment units through supplemental surveys and various processing techniques. Both the QCEW and the BR are establishment-based but are built from data organized at the taxpayer ID (EIN) level.
Nonemployers cannot be identified as quickly, since personal income tax returns are filed annually rather than quarterly. Form 941 now also includes an identifier for businesses filing final tax returns, which is useful for capturing business deaths. In July 2004, the Census Bureau began receiving SS-4 (EIN application) form data directly from the IRS on a weekly basis (rather than by way of the SSA, as before), which allows industry codes to be assigned to new businesses more quickly.6

4 For noneconomic census years, the Annual Survey of Manufactures provides sample estimates of employment, plant hours, payroll, number of establishments, cost of materials, value of shipments, inventories, and detailed capital expenditures statistics for commercial manufacturing establishments with paid employees (http://factfinder.census.gov).

5 Jarmin and Miranda (2002) discuss the importance of retiming both small multiunit births and deaths in these data in order to improve the accuracy of the annual birth and death statistics.

6 Salyers (2004) provides a “progress report” for the BR, including a full description of the expanded use of administrative records, as well as a more general listing of recent changes and improvements (http://www.stats.gov.cn/english/18roundtable/papers/t20041230_402219768.htm).
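The register updates described above (adding an employer record when the first payroll record for a new EIN arrives, and using the final-return indicator to flag deaths) can be sketched as follows. The record layout, field names, and status values are hypothetical simplifications, not the actual Form 941 or BR file formats.

```python
from dataclasses import dataclass


@dataclass
class PayrollRecord:
    ein: str            # Employer Identification Number on the quarterly filing
    quarter: str        # hypothetical label such as "2005Q3"
    employment: int     # employment for the pay period
    final_return: bool  # hypothetical flag standing in for the final-return indicator


def update_register(register: dict[str, dict], records: list[PayrollRecord]) -> None:
    """Apply one batch of quarterly payroll records to a register keyed by EIN."""
    for rec in records:
        if rec.ein not in register:
            # First payroll record seen for this EIN: add a new employer record.
            register[rec.ein] = {"first_quarter": rec.quarter}
        entry = register[rec.ein]
        entry["last_quarter"] = rec.quarter
        entry["employment"] = rec.employment
        # A final return marks the business as closed (a death);
        # any other filing keeps it active.
        entry["status"] = "closed" if rec.final_return else "active"


register: dict[str, dict] = {}
update_register(register, [PayrollRecord("12-3456789", "2005Q1", 8, False)])
update_register(register, [PayrollRecord("12-3456789", "2005Q2", 0, True)])
print(register["12-3456789"])
# {'first_quarter': '2005Q1', 'last_quarter': '2005Q2', 'employment': 0, 'status': 'closed'}
```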

Data from the BR, as well as from the more than 100 surveys that rely on its sampling frame, are used to produce a wide range of publicly available aggregate statistics (many available on the Census Bureau’s American FactFinder web page at http://factfinder.census.gov). A widely used product of the BR is the Census Bureau’s County Business Patterns. First-quarter employment and payroll numbers, cross-tabulated by county and kind of business, are published, cooperatively with the SSA, in the County Business Patterns and ZIP Code Business Patterns statistical series. In addition, the Census Bureau’s Nonemployer Statistics (NES) “provides U.S. and sub-national data by industry for businesses without paid employees.” Originating primarily from administrative records, the NES “summarizes the number of establishments and receipts of sole proprietorships, partnerships, and corporations without paid employees.” The Census Bureau began publishing NES data annually in 1997, and annual releases beginning with the year 2002 can be found on its American FactFinder web page. These publications provide geographic aggregates of the BR microdata. BR data are also essential to economic research conducted at the Center for Economic Studies (for a description of these uses, see http://www.ces.census.gov/index.php/ces/1.00/researchprogram). Although a number of BR-based aggregate statistics enjoy high visibility, the BR is also structured with confidentiality very much at the fore. The BR itself is not a publicly available document, although parts of the register can be used by researchers under highly restrictive arrangements at the Census Bureau’s research data centers (RDCs). Beyond this, data from administrative records are maintained in separate tables, and IRS Title 26 data are segregated from Census-collected data.
Microdata on race and gender, required for the Survey of Business Owners (SBO), are likewise stored in a separate table for use by SBO analysts only.

A.1.2 The BLS Business List

The other primary business list maintained in the federal statistical system is BLS’s QCEW, formerly the Business Establishment List, initiated in 1988. The QCEW converts data submitted by the universe of employer businesses covered by state UI systems (ES-202), as well as by federal agencies subject to the Unemployment Compensation for Federal Employees program, to an establishment basis. The master file includes a number of key fields (establishment name, address, telephone number, monthly employment and quarterly wages, and federal EIN), all available by NAICS code, county, and ownership sector for the entire United States.7 UI wage records for individuals working in UI-covered employment include name, Social Security number, employer name and address, employer ID, and total earnings paid; BLS and the states at times use these records to validate individual cases of large wage fluctuations.

The QCEW serves as the sampling frame for most BLS surveys, and it is used to benchmark the Current Employment Statistics (CES) establishment survey. The establishment count also sets the population base in establishment birth and death estimators. The QCEW program provides a comprehensive source of employment and wage statistics, as well as a virtual census (98 percent) of employees on nonfarm payrolls (Spletzer et al., 2004). A crucial limitation of the QCEW, particularly for understanding the dynamics of new and young businesses, is that it excludes nonemployer businesses and data on owner characteristics. The QCEW, which is currently geocoded to the rooftop level for 90 percent of private-sector employment, has plans to develop data at the census tract level. The QCEW provides industry, employment, county, and physical location addresses for over 3 million firms, mostly new and small businesses, to the Census Bureau. However, the QCEW and BR have different structures, which make cross-survey comparisons difficult. In addition, requirements under the UI program’s Multiple Worksite Report (MWR) vary from state to state and have size thresholds that may exclude certain businesses.8 Finally, longitudinal and cross-state linkages are complicated because the database contains no firm ID fields other than the tax ID number (this is discussed again in the next section).

In the MWR, “multi-location employers with a total of 10 or more employees in their secondary locations are required or requested” to break out their employment and payroll by individual establishment. The MWR is mandatory in 21 states and provides good, timely coverage for all but the smallest multiestablishment employers.9 The timing of small multiestablishment births may not be accurate, however, because reporting depends on the secondary establishment passing a size threshold. Thus, when a single-location firm expands to a multilocation firm, the “new” establishment will not be observed until it has at least 10 employees. In addition, if the expansion occurs across state lines, it may be captured not as a multiestablishment birth but as a new firm in the other state, if the firm did not already have a presence in that state and has different EINs across states. Firms that have multiple UI and EIN accounts within a state raise further issues that may affect multiestablishment measurement.

The QCEW has tried to identify cross-state expansions in two ways. First, state staff may notice a significant change in the employment and wages reported by a firm; upon follow-up, the staff may determine that the firm should file the MWR. Second, if the change in employment and wages is too small for state staff to notice, the need for an MWR filing is captured after the employer completes the Annual Refiling Survey (ARS) and reports a new location.10 Through the ARS, about 2 million businesses are contacted annually to update information such as business name, address, and industry codes.

As with the BR, numerous data products and statistics are derived from the QCEW, most prominently the quarterly wage and employment statistics, aggregated at various industry and geographic levels. Microdata underlying the QCEW are not publicly accessible; however, BLS does offer limited opportunities for researchers to access confidential data for the purpose of conducting statistical analyses. Data access is restricted to onsite use at the BLS national office in Washington (a list of the restricted-access data sets available to researchers can be found on the BLS web site, http://www.stats.bls.gov/bls/blsresda.htm).

7 Full details are documented at the BLS QCEW home page (http://www.bls.gov/cew/).

8 For example, businesses are not required to report a location in another state if there is only one; other sites within the state if total employment from these sites is less than 10; or any site that is under a different UI account number (http://bls.gov/cew/).

9 BLS, http://www.bls.gov/cew/cewmrr00.htm.

A.1.3 Dun’s Market Identifiers

Business data are also collected by private-sector firms. These efforts are typically geared more toward marketing or informing business decisions and less toward research and public policy. The most prominent private-sector collection (and one that has been used for both purposes) is the Dun & Bradstreet (D&B) DMI. Because the BLS and Census business lists are not typically available as sampling frames outside those agencies, D&B data, along with D&B’s Data Universal Numbering System (DUNS), have been widely used in a variety of applications elsewhere in government.
For example, the DMI serves as the sampling frame for the Federal Reserve’s Survey of Small Business Finances. DUNS numbers are also used by the federal government to identify entities receiving federal contracts. D&B data have been broadly used by private-sector firms to estimate numbers of businesses, establishments, and employees, as well as sales, and to perform cost-benefit analyses and risk assessment exercises. D&B data products can be purchased and used subject to the company’s terms and conditions, which differ by type of end user (individuals, businesses, and information professionals).11

10 Based on correspondence from Jim Spletzer, BLS.

11 A full description of these terms and conditions can be found at http://library.dialog.com/bluesheets/htmlaa/bl0518.html.

DMI includes basic data, updated monthly, on over 2.9 million private and public companies and 17 million U.S. business establishment locations (about 18.4 million records as of January 2006) operating in private, public, and government spheres (there are also European and other international versions). The data set is broadly representative of all businesses but is limited to private and public companies with five or more employees or sales of $1 million or more; consequently, it does not include many of the newest start-up firms or self-employed individuals.12 In contrast, the IRS reports that for 2003 about “19.7 million individual income tax returns reported nonfarm sole proprietorships” (Pierce, 2005), of which about 3 million filed a Schedule C-EZ, on which annual receipts totaled less than $25,000 (www.irs.gov).

The file contains up to three years of basic data (the length of coverage varies by company), such as type of business, legal and trade names, physical and mailing addresses, geographical descriptions, product and industry descriptors, sales and number of employees (and the number at each corporate location), growth rates, annual sales, net worth and profit, names and titles of key executives, corporate linkages, DUNS numbers, and other marketing information. D&B data are collected from various sources, such as in-person and telephone interviews, government publications, and business trade programs and mailings, a fact that limits the quality of information in some important ways. For example, there is no standard guideline for detecting new businesses and incorporating them into the file; information is brought in ad hoc from applications for credit, classified advertising, and other private sources. Similarly, there is no clear process for purging records. Unlike several of the government data sources, DMI has no mechanism for distinguishing establishment records from firm records. Furthermore, the data are not longitudinal; in fact, DMI is not even a cross-section for a specific point in time, since there is no regular schedule for updates and the process is ongoing (Haviland and Savych, 2005).

A.2 TRACKING BUSINESSES OVER TIME: BUSINESS LIST-BASED SOURCES OF LONGITUDINAL MICRODATA

Sources of longitudinal business microdata have historically been scarce, particularly for smaller and newer businesses. However, new data programs are emerging that greatly enhance the information available on the topics covered in this report. Among the most promising data sets now online or soon to come online are the Integrated Longitudinal Business Database (ILBD) and the Longitudinal Employer-Household Dynamics program at the Census Bureau and BLS’s Business Employment Dynamics.

12 As such, D&B data have limited coverage of nonemployers.

A.2.1 ILBD and Precursors

The ILBD has evolved as a natural extension of the Longitudinal Business Database (LBD), which the Census Bureau’s Center for Economic Studies began constructing in 1999. The LBD covers employer establishments, currently for the period 1975-2003. These programs, which can be traced (under various names) to the early 1990s, have expanded research capabilities to new frontiers that would not have been possible with aggregate and cross-sectional data alone. The LBD was constructed by using EINs, along with name and location information contained in the Census Bureau’s SSEL, to link year-to-year snapshots of all employer establishments. Work is ongoing to add such fields as payroll employment, location, industrial activity, and firm affiliation. The LBD is useful for researching elements of business dynamics, such as firm entry and exit and job flows. Establishment identifiers also facilitate linking the LBD to other data sets. The value of the data set is enhanced by its algorithm for flagging establishment records as births, deaths, or continuers. Generally speaking, a birth is identified when a record appears for the current year that does not match any record from the previous year; a death is detected when a record for the previous year does not match any record for the current year; and continuing establishments show a match from one year to the next (see Jarmin and Miranda, 2002, for a detailed explanation of this algorithm). Using EINs in conjunction with name and address information is intended to increase the accuracy with which establishment births and deaths can be identified; missing source data for some years make this a challenge.
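The birth/death/continuer rule can be sketched as a year-over-year set comparison. This simplified version matches establishments on a single hypothetical identifier; the LBD algorithm documented by Jarmin and Miranda (2002) also uses EINs together with name and address information, which the sketch does not attempt.

```python
def classify_establishments(prev_year: set[str], curr_year: set[str]) -> dict[str, str]:
    """Flag each establishment ID as a birth, death, or continuer.

    birth:     present this year, no match last year
    death:     present last year, no match this year
    continuer: matched in both years
    """
    flags = {}
    for est in curr_year - prev_year:
        flags[est] = "birth"
    for est in prev_year - curr_year:
        flags[est] = "death"
    for est in prev_year & curr_year:
        flags[est] = "continuer"
    return flags


flags = classify_establishments({"E1", "E2", "E3"}, {"E2", "E3", "E4"})
print(sorted(flags.items()))
# [('E1', 'death'), ('E2', 'continuer'), ('E3', 'continuer'), ('E4', 'birth')]
```

Because the comparison sees only the two years it is given, a year of missing source data would make continuing establishments look like deaths followed by births, which is one reason missing data complicate the real algorithm.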
The LBD itself is an extension of another Center for Economic Studies predecessor, the Longitudinal Research Database (LRD), which contains longitudinally linked plant-level data from the censuses and annual surveys of manufactures. With the relatively rapid growth of nonmanufacturing sectors, and the resulting interest in them, the narrow focus of the LRD has become an increasing concern. Furthermore, LRD coverage of firms with fewer than 250 employees is limited, and the plant-level data are not linked to enterprises, so the overall size and industry of enterprises owning large plants are not always known. Despite these limitations, the LRD has been intensively analyzed and has spawned a robust literature (see Bartelsman and Doms, 2000, for a review of these efforts). The LBD allowed academic research on employment dynamics at the establishment level (pioneered by Dunne, Roberts, and Samuelson, 1989; Davis and Haltiwanger, 1990, 1992; and Davis, Haltiwanger, and Schuh, 1996) to begin expanding beyond the manufacturing sector.

The ILBD marks another distinct advance for business research aimed at understanding how small and young firms evolve over time, as its coverage is much broader than that of its predecessors. Extending work by researchers such as Boden and Nucci (2004),13 the ILBD integrates federal government administrative records and survey sources for nearly all private, nonagricultural employer and nonemployer businesses in the United States, currently covering the years 1992 and 1994-2000 (see Jarmin and Miranda, 2003, and Miranda et al., 2005, for a detailed description of the ILBD). One clear advantage of the ILBD over earlier data sets in the lineage is that it allows analysts to track a business’s characteristics as it transitions from nonemployer to employer status (or vice versa), a key but difficult-to-study aspect of business evolution. ILBD data have shown, for example, that over three-year horizons, about 5 percent of nonemployer businesses become employer businesses or are acquired by, or absorbed into, employer businesses. This translates to approximately 750,000 businesses, a large number in absolute terms, and is an important component of job creation.

Employer businesses and some nonemployers are linked from period to period by EIN; most nonemployers are linked using business owner ID (Social Security number) fields. This technique is not seamless; for example, ID numbers can change over time for legal or other reasons.14 In addition, inconsistent data formats, the volatility of young and small firms, and the sheer number of records (over 15 million nonemployers and over 5 million employer businesses) all pose challenges for the Census Bureau staff carrying out the project. The ILBD remains under development and is not currently available to users outside the Census Bureau. Initial versions of some statistics are scheduled to be made available in the near future. Access to microdata will become available at RDCs after further documentation of data quality assurance is completed, perhaps as early as 2007. Access to ILBD data is governed by Title 13 of the U.S. Code (i.e., for statistical purposes only and with a “predominant purpose” consistent with Census Bureau purposes).15

13 Richard Boden and Alfred Nucci linked nonemployer entities to the business register both cross-sectionally and longitudinally for the years 1992 through 1999. Their paper enumerates the myriad issues that arise when attempting to track nonemployer businesses over time, including those involving sole proprietorships (not the least of which is a change in legal form of organization).

14 The technical challenges inherent in ILBD linking procedures are documented in Davis et al. (2006).

15 The Census Bureau’s RDC guidelines define “predominant purpose” and describe Title 13 requirements generally (http://www.ces.census.gov/index.php/ces/1.00/researchguidelines).
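The ILBD linking approach and transition statistic described above can be sketched as follows, assuming each record carries an owner ID (for a sole proprietor, a Social Security number) and, for employers, an EIN. The record fields, the assumption that employer records also carry an owner ID, and the sample data are all hypothetical, and the sketch ignores the acquisitions and absorptions that the ILBD figure includes. The last line checks the arithmetic behind the quoted figure: 5 percent of roughly 15 million nonemployers is about 750,000 businesses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    owner_id: str               # hypothetical owner identifier (e.g., a sole proprietor's SSN)
    ein: Optional[str] = None   # Employer Identification Number, if any


def link_key(rec: Record) -> str:
    """Period-to-period link key: the EIN when present, otherwise the owner ID."""
    return f"EIN:{rec.ein}" if rec.ein else f"OWNER:{rec.owner_id}"


def transition_share(nonemployers_start: list[Record],
                     employers_end: list[Record]) -> float:
    """Share of start-of-horizon nonemployers whose owner ID appears on an
    employer record at the end of the horizon. Assumes employer records carry
    an owner ID; acquisitions and absorptions are not captured."""
    if not nonemployers_start:
        return 0.0
    employer_owners = {r.owner_id for r in employers_end}
    moved = sum(r.owner_id in employer_owners for r in nonemployers_start)
    return moved / len(nonemployers_start)


start = [Record("owner-1"), Record("owner-2"), Record("owner-3"), Record("owner-4")]
end = [Record("owner-2", ein="98-7654321")]

# The same business gets a different, EIN-based key once it hires,
# so the transition count matches on owner ID instead.
print(link_key(start[1]), link_key(end[0]))  # OWNER:owner-2 EIN:98-7654321
print(transition_share(start, end))          # 0.25

# Scale check on the figures quoted in the text: 5 percent of ~15 million.
print(round(0.05 * 15_000_000))              # 750000
```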

A.2.2 BLS’s Business Employment Dynamics (BED) Program

The BLS’s BED program produces a quarterly series of gross job gains and gross job losses statistics based on the universe of establishments covered in the QCEW (those subject to state unemployment insurance laws). Sectoral designations now conform to the NAICS classification system. Again, the major exclusions are the self-employed, along with certain nonprofit organizations. Data from the program were first published in September 2003 and are now complete for the period from 1992 through the first quarter of 2006. New quarterly data will be released every three months, making them more timely than the alternative employment data sources previously available.16

The BED data allow employment changes to be disaggregated into their underlying components: the number (and percentage) of gross jobs gained at opening and expanding establishments and the number (and percentage) of gross jobs lost at closing and contracting establishments.17 These data, constructed using a multistep procedure to link QCEW microdata across periods, provide a picture of the dynamics underlying aggregate employment growth statistics.18 Research based on the quarterly time series contributes to knowledge of the processes underlying the business cycle; for example, Clayton, Sadeghi, and Talan (2005) identify seasonally adjusted job changes resulting from establishment openings and closings, as opposed to expansions and contractions. More generally, BED data have revealed how firm and establishment growth rates vary by size, and how these results differ from those produced by analyses limited to annual data.
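The BED decomposition described above can be sketched from establishment employment in two consecutive quarters, assuming the records have already been linked (the multistep linkage itself is not shown). Treating zero or missing employment as “not operating” and expressing rates relative to the average employment of the two quarters are assumptions made here for illustration; the establishment IDs and counts are hypothetical.

```python
def gross_job_flows(prev: dict[str, int], curr: dict[str, int]) -> dict[str, float]:
    """Split the net employment change between two quarters into gross job
    gains (openings, expansions) and gross job losses (closings, contractions).

    prev, curr map an establishment ID to its employment; the IDs are assumed
    to be already linked across quarters. Zero or missing employment is
    treated as "not operating".
    """
    flows = {"openings": 0, "expansions": 0, "closings": 0, "contractions": 0}
    for est in prev.keys() | curr.keys():
        before, after = prev.get(est, 0), curr.get(est, 0)
        if before == 0 and after > 0:
            flows["openings"] += after
        elif before > 0 and after == 0:
            flows["closings"] += before
        elif after > before:
            flows["expansions"] += after - before
        elif after < before:
            flows["contractions"] += before - after
    gains = flows["openings"] + flows["expansions"]
    losses = flows["closings"] + flows["contractions"]
    result = dict(flows, gains=gains, losses=losses, net=gains - losses)
    # Rates relative to average employment in the two quarters (assumed denominator).
    avg_emp = (sum(prev.values()) + sum(curr.values())) / 2
    if avg_emp:
        result["gain_rate_pct"] = 100 * gains / avg_emp
        result["loss_rate_pct"] = 100 * losses / avg_emp
    return result


prev_q = {"A": 10, "B": 5, "C": 8}            # hypothetical establishments
curr_q = {"A": 12, "B": 0, "C": 6, "D": 4}
flows = gross_job_flows(prev_q, curr_q)
print(flows["gains"], flows["losses"], flows["net"])  # 6 7 -1
```

By construction, gains minus losses equals the net change in total employment (here 22 − 23 = −1), so the example shows how much larger the gross flows (6 and 7 jobs) can be than the net change they produce.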
The primary obstacle to further development of the BED is that EINs are imperfect for creating record linkages (see Okolie, 2004); however, both the QCEW and the BED incorporate a complex multistage process to link records across quarters.19 As with QCEW microdata, researchers must submit proposals to access BED data; if a proposal is accepted, the data must be used at the BLS research center in Washington.

16 These and other details can be found at http://www.bls.gov/bdm/bdmover.htm.

17 Getz et al. (2005) provide a detailed description of the methodologies used to capture business births and deaths in the various Census Bureau and BLS data sources.

18 Pivetz et al. (2001) describe the technique used to longitudinally link the data.

19 Clayton, Sadeghi, and Talan (2005) provide some detail on the linkage procedures for the QCEW.

Current Employment Statistics (CES) [also known as the Payroll Establishment Survey]
Purpose/uses: Provides employment, hours, and earnings estimates based on payroll records. Provides the first economic indicator of current economic trends each month (along with the unemployment rate).
Design basics: Based on a sample of about 400,000 business establishments (160,000 firms). The LDB, stratified by state, industry, and employment size, serves as the sampling frame.
Frequency: Monthly
Unit level: Establishment
Coverage: Payroll employment for establishments in nonagricultural industries (over 1,150 industries). Hours and earnings data are collected from SESAs for about 850 industries.
Content: Total employment, full address, number of women employed, number of production or nonsupervisory workers, average hourly earnings, average weekly hours, average weekly earnings, and average weekly overtime hours in manufacturing industries.
Limitations or lag time: As with the QCEW, there is no nonemployer, self-employed, or farm coverage, and there are no detailed owner or small firm characteristics. Geographic coding is available only by MSA. Establishments are not tracked over time, and multiple jobholders are overrepresented.
Accessibility of data: Electronic access to selected indicator data is available. Microdata are not publicly available. Researchers can apply for access to the confidential microdata.

American Time Use Survey (ATUS)
Purpose/uses: Collects information on how people in the United States spend their time, including the kinds of activities they engage in and the time spent doing them. Used in the preparation of BLS press releases and to produce categorical time use tables on the ATUS website.
Design basics: The sample frame is drawn from households that have completed their final month of interviews for the CPS, using a stratified, three-stage sample.
Frequency: Data have been collected since 2003 and are published annually.
Unit level: Individual (household)
Coverage: Civilian noninstitutional population and workers ages 16 and over. For 2004 and 2005, approximately 27,000 cases yielded about 13,500 completed interviews; the survey was roughly 50% larger in 2003. Diaries are used to capture time spent on various activities.
Content: Data are collected on major activity categories (work, sleep, eating, etc.) and on selected variables such as earnings, school enrollment, selected demographics, household and labor force characteristics, and hours worked. There is also a self-employment identifier.
Limitations or lag time: Little information is collected on secondary activities (those done in combination with other activities). Estimates are subject to nonsampling errors, particularly if nonresponse is correlated with time use.
Accessibility of data: Published tables and microdata files are available on the ATUS website.
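Time-use estimates such as the ATUS tables are weighted averages over diary days. The sketch below assumes each record carries a final survey weight and the minutes reported for one activity; the record layout and weights are hypothetical, not the ATUS file format. It computes the average hours per day over the whole population and the weighted share of people who did the activity at all.

```python
def weighted_avg_hours(records: list[tuple[float, float]]) -> float:
    """Average hours per day spent on one activity.

    records holds (weight, minutes) pairs, one per respondent diary day;
    weight is a hypothetical final survey weight and minutes is the time
    reported for the activity (0 if the respondent did not do it).
    """
    total_weight = sum(w for w, _ in records)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * m for w, m in records) / total_weight / 60


def participation_rate(records: list[tuple[float, float]]) -> float:
    """Weighted share of people who did the activity on the diary day."""
    total_weight = sum(w for w, _ in records)
    return sum(w for w, m in records if m > 0) / total_weight


diaries = [(1000.0, 120.0), (3000.0, 60.0), (1000.0, 0.0)]
print(weighted_avg_hours(diaries))   # 1.0
print(participation_rate(diaries))   # 0.8
```

Weighting matters here: the unweighted mean of the same three diaries would be 60 minutes by coincidence, but in general it differs because the weights correct for unequal selection and response probabilities.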

Job Openings and Labor Turnover Survey (JOLTS)
Purpose/uses: Data serve as demand-side indicators of labor shortages at the national level. The availability of unfilled jobs, measured by the job openings rate, is an important measure of job market dynamics that complements measures of unemployment.
Design basics: Data from a sample of approximately 16,000 U.S. business establishments are collected on a voluntary basis. The sample frame consists of approximately UI million establishments on the BLS ES-202 (QCEW) file. The reference period for total employment is the pay period that includes the 12th of the month; for job openings, it is the last business day of the month; for hires and separations, it is the entire calendar month.
Frequency: Data tables are released monthly.
Unit level: Establishment
Coverage: The survey covers all nonagricultural industries in the public and private sectors for the 50 states and the District of Columbia.
Content: Total employment, job openings, hires, quits, layoffs and discharges, and other separations.
Limitations or lag time: Data are available only at the national level. There are no turnover rates by occupation.
Accessibility of data: Data are disseminated in a news release and through updated tables on the BLS website.

BLS AND U.S. CENSUS BUREAU

Current Population Survey (CPS)
Purpose/uses: Provides information on the labor force characteristics of the U.S. population. Data are used to calculate total employment (by occupation) and unemployment statistics. Used as the sample frame for the ATUS. Used to produce supplements on displaced workers, job tenure, and occupational mobility. CPS data have also been used to construct the KIEA (1996 to 2004), a measure of business creation defined as the percentage of nonbusiness owners who started a business each month.
Design basics: Uses a household-based sampling frame (from the Census Bureau) and a rotating sample design: respondents are in the survey for 4 months, out for 8 months, and back in for an additional 4 months.
Frequency: Monthly; longitudinal panel capability upon matching, 1962 to present (matching is imperfect, with annual match rates around 70%).
Unit level: Individual, family, and household
Coverage: Civilian noninstitutionalized population ages 16 and over. The survey size is approximately 60,000 households.
Content: Employment (by occupation and industry), indicators for the self-employed, unemployment, business ownership, some characteristics of small business employees, earnings, hours of work, age, sex, race, marital status, and educational attainment. Supplemental questions on school enrollment, income, previous work experience, health, employment benefits, and work schedules.
Limitations or lag time: Record matching over time is imperfect.
Accessibility of data: Microdata are publicly available. Data can be accessed electronically.
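A monthly business-creation rate such as the KIEA depends on matching the same respondents in consecutive months, which the CPS rotation pattern only partly allows. The sketch below encodes the 4-8-4 rotation to show which rotation groups can be matched from one month to the next, then computes an unweighted share of month-t non-owners who own a business in month t+1. It is an illustration under stated assumptions: the published KIEA applies survey weights and further definitional details that are not modeled here, and the input pairs are hypothetical.

```python
from typing import Optional


def month_in_sample(entry_month: int, month: int) -> Optional[int]:
    """Month-in-sample (1-8) for a rotation group first interviewed in
    entry_month, or None if the group is out of the sample in `month`.
    Encodes the 4-8-4 pattern: 4 months in, 8 months out, 4 months in."""
    offset = month - entry_month
    if 0 <= offset <= 3:
        return offset + 1
    if 12 <= offset <= 15:
        return offset - 7
    return None


def matchable_share(month: int) -> float:
    """Share of rotation groups interviewed in `month` that are also
    interviewed in the following month (and so can be matched)."""
    present = [e for e in range(month - 15, month + 1)
               if month_in_sample(e, month) is not None]
    matched = [e for e in present if month_in_sample(e, month + 1) is not None]
    return len(matched) / len(present)


def monthly_startup_rate(pairs: list[tuple[bool, bool]]) -> float:
    """pairs holds (owns business in month t, owns business in month t+1)
    for matched respondents; returns the share of month-t non-owners who
    own a business in month t+1 (unweighted)."""
    at_risk = [after for before, after in pairs if not before]
    if not at_risk:
        raise ValueError("no non-owners in month t")
    return sum(at_risk) / len(at_risk)


print(matchable_share(100))  # 0.75: groups in months-in-sample 4 and 8 rotate out
print(monthly_startup_rate([(False, True), (False, False), (False, False), (True, True)]))
```

At most three-quarters of a month's sample can be matched to the next month even before nonresponse and movers are taken into account, which is one reason matched-record measures are noisier than cross-sectional CPS estimates.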

U.S. CENSUS BUREAU

Census Bureau's Business Register (BR)
Purpose/uses: Provides a comprehensive database of U.S. business establishments and companies for statistical program use. Serves as the master enumeration list for sampling frames drawn for the Census Bureau's firm and establishment surveys, most notably the quinquennial economic census. Source for annual reports providing summary statistics (e.g., number of establishments, payroll, employment) by county and 6-digit NAICS industry.
Design basics: Draws from the IRS Business Master File, tax return data from Schedule C (Form 1040), SSA data, the economic census, the COS, and other Census Bureau business surveys.
Frequency: Establishment listings are initiated and updated continuously with information from Census Bureau and other federal statistical and administrative records programs. Individual data items are updated anywhere from every quarter to every 5 years (with the economic census).
Unit level: Establishment (organized by EIN, enterprise, and alternate reporting units)
Coverage: Employer and nonemployer businesses: 180,000 multiunit companies representing 1.5 million affiliated establishments, UI million single-establishment companies, and approximately 16.5 million nonemployer businesses.
Content: Business location (mailing and physical address), organization type, EIN, NAICS code, LFO code, tax status, employment and payroll, IRS-reported sales and receipts or revenue, assets, interest income, gross rents, parent EIN, activity status, and filing requirement codes.
Limitations or lag time: Geographic and industry codes are updated only every 5 years. Detail on small business owners is lacking. The accuracy of single- versus multiunit identification, and of small multiunit establishment births and deaths, declines between economic censuses; there are no data on government or farms; and ownership links for multistate firms are not comprehensive.
Accessibility of data: Information is confidential under Title 13 and Title 26. No public-use data set is available. Researchers can apply for access to the confidential microdata.

Company Organization Survey (COS)
Purpose/uses: Used to obtain current organization and operating information on multiestablishment firms in order to maintain and update the BR. Source for CBP reports.
Design basics: Some multiestablishment companies receive annual mail-out/mail-back surveys. Smaller companies are selected, using a probability sampling procedure, when administrative data indicate a probable organizational change. About 40,000 multiunit companies with more than 250 employees and about 10,000 smaller multiunit companies are selected on a rotating basis. Content and coverage vary during the 5-year economic census program cycle.
Frequency: Conducted annually since 1974.
Unit level: Multiestablishment firms
Coverage: Cross-sectional survey of multiestablishment companies with payroll (and their establishments), excluding agricultural production companies.
Content: Companies identify establishments (including mailing and physical address) that have been sold, closed, continued, started, or acquired. Businesses are asked about first-quarter and annual payroll, employment during the pay period including March 12 for each establishment, large foreign equity positions, and controlling interests held by other domestic or foreign-owned organizations.
Limitations or lag time: The limited scope and the rotation of sampled firms over time affect the timeliness and coverage of smaller multiestablishment companies in the BR.
Accessibility of data: No public-use version is available. Researchers can apply for access to the confidential microdata, which are typically available for a given year with about an 8-month lag.

The Economic Census
Purpose/uses: Provides a detailed portrait of the economy once every 5 years at both the national and local levels. Used to update the Census BR and to produce industry and geographic area series and supplemental surveys of minority- and women-owned businesses.
Design basics: More than UI million companies are mailed a census form. Large- and medium-size firms, plus all firms known to operate more than one establishment, are sent forms. For most very small firms, data from existing administrative records of other federal agencies are used. Data can be linked longitudinally. The geographic detail available varies by sector and can range from the state to the ZIP code level.
Frequency: Data are collected every 5 years (years ending in 2 and 7).
Unit level: Establishment, firm
Coverage: All domestic nonfarm business establishments (not operated by government).
Content: Statistics tabulated for all industries covered include number of establishments, number of employees, payroll, and a measure of output (sales, receipts, revenue, value of shipments, or value of construction work done). Additional items are available for certain sectors.
Limitations or lag time: Collected only every 5 years, making birth and death coverage incomplete. Nonemployer coverage by sample only (though this can be supplemented using annual nonemployer statistics); no detailed owner characteristics; and no government or farm coverage.
Accessibility of data: Though many publications based on the economic census are readily available, no public-use version of the underlying microdata is available. Researchers can apply for access to the confidential microdata.

Longitudinal Research Database (LRD)
Purpose/uses: Provides company-level data that have supported research on employment dynamics as well as on issues related to productivity, profitability, and the uses of research and development.
Design basics: Data are collected primarily using a mail-out/mail-back process. Periodically, visits to key companies are conducted to record the changing nature of R&D activities and any reporting difficulties companies may have, and to determine the collectibility of proposed new items. A probability-proportionate-to-size approach is used to select individual companies for participation.
Frequency: Underlying survey data have been collected annually since 1957; however, only data from 1972 forward are included in the database. The R&D database is generally updated annually within UI years after the survey reference year.
Unit level: Plant
Coverage: Sample size has been approximately 25,000 companies since 1992. In any given year, the number of sampled companies that conduct or sponsor R&D activities is in the 3,500 to 4,000 range. Because R&D activities are concentrated among larger companies, most companies with significant R&D activities remain in the sample for a number of years.
Content: Mandatory items: federal- and company-financed R&D, domestic net sales, domestic employment, and R&D by state. Voluntary items: information about scientists and engineers employed; basic and applied R&D using federal and company funds; contracted-out, foreign, and budgeted research; R&D by major type of expense and technology area; and energy R&D.
Limitations or lag time: Data historically limited to manufacturing sectors; coverage of firms with fewer than 250 employees is limited; plant-level data are not linked to enterprises.
Accessibility of data: Research is conducted by permanent and specially sworn Census Bureau employees. All current research is done at the Center for Economic Studies.
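Probability-proportionate-to-size (PPS) selection can be sketched with systematic PPS sampling, one common implementation; the profile above does not say which variant is used, so treat this as an illustration only. Units whose size measure is at least as large as the sampling interval are taken with certainty first (a standard refinement), and the rest are drawn from one random start along the cumulated sizes. The company names and size values are hypothetical, and sizes are assumed to be positive.

```python
import random


def pps_systematic(sizes: dict[str, float], n: int, rng: random.Random) -> list[str]:
    """Select n units with probability proportional to size.

    Units at least as large as the sampling interval go into a certainty
    group; the remaining units are drawn by systematic PPS sampling along
    the cumulated size measure, using a single random start.
    """
    if not 0 < n <= len(sizes):
        raise ValueError("n must be between 1 and the number of units")
    if any(s <= 0 for s in sizes.values()):
        raise ValueError("size measures must be positive")
    certainty: list[str] = []
    rest = sorted(sizes)  # fixed order so a given seed gives reproducible results
    while True:
        k = n - len(certainty)
        if k == 0:
            return certainty
        interval = sum(sizes[u] for u in rest) / k
        big = [u for u in rest if sizes[u] >= interval]
        if not big:
            break
        certainty.extend(big)
        rest = [u for u in rest if u not in big]
    start = rng.random() * interval
    points = [start + i * interval for i in range(k)]
    selected: list[str] = []
    cumulative, p = 0.0, 0
    for unit in rest:
        cumulative += sizes[unit]
        # Every remaining unit is smaller than the interval, so it can hold
        # at most one selection point.
        while p < k and points[p] < cumulative:
            selected.append(unit)
            p += 1
    return certainty + selected


sizes = {"a": 50, "b": 30, "c": 10, "d": 5, "e": 5}
print(pps_systematic(sizes, 2, random.Random(0)))
```

In this example "a" is large enough to be a certainty unit, and the second pick falls on "b" with probability 30/50, which is why larger R&D performers tend to stay in such a sample year after year.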

Longitudinal Business Database (LBD)
Purpose/uses: Used for researching establishment and firm dynamics (entry and exit) and job flows. Extends the LRD beyond manufacturing sectors.
Design basics: Contains annual observations on employment and payroll for all businesses in the U.S. private sector. The database sources are periodic business surveys conducted by the Census Bureau and federal government administrative records. It uses SSEL data and EIN-based year-to-year linking.
Frequency: Annual
Unit level: Establishment and firm
Coverage: Longitudinal data set of all employer business establishments from 1975 to 2003.
Content: Establishment identifiers, age and tenure, payroll, employment, firm affiliation, name, and location (mailing and physical address) information. Work is ongoing to add payroll employment, location, industry activity, and firm affiliation.
Limitations or lag time: Linkages can be difficult due to inconsistent data formats, changing business ID numbers, and the sheer number of records.
Accessibility of data: None outside the Census Bureau. Plans are in place to make the microdata available at RDCs after further documentation and quality assurance.

Integrated Longitudinal Business Database (ILBD)
Purpose/uses: Extends LBD coverage to include nonemployer businesses, providing research data on firm and job dynamics. The database allows a business's characteristics to be tracked as it transitions from nonemployer to employer status (or vice versa). The ILBD is currently used by researchers working on a wide variety of projects at the Census Bureau's Center for Economic Studies and RDCs.
Design basics: Integrates federal administrative records and survey data in a longitudinal structure. Records are linked by EIN or Social Security number.
Frequency: Data compiled annually for 1992 and 1994 to 2000.
Unit level: Firm
Coverage: All private, nonagricultural, employer and nonemployer businesses in the United States, currently covering the years 1992 and 1994 to 2000. This amounts to approximately 21 million businesses (over 15 million nonemployers and over UI million employers).
Content: A range of business characteristics, including location (mailing and physical address), as businesses transition from nonemployer to employer status and vice versa.
Limitations or lag time: Linkages can be difficult due to inconsistent data formats, changing business ID numbers, and the sheer number of records.
Accessibility of data: Data access is governed by Title 13 of the U.S. Code. Microdata will be available at RDCs in the near future.
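The year-to-year linking used by the LBD and ILBD can be illustrated with a minimal sketch that matches two annual snapshots on a business identifier and classifies each business as an entrant, an exit, a continuer, or a switcher between nonemployer and employer status. The identifiers and the dict layout are hypothetical; the actual files use their own identifiers and far more elaborate matching.

```python
def classify_transitions(year0: dict[str, bool], year1: dict[str, bool]) -> dict[str, str]:
    """Classify businesses linked across two years by identifier.

    year0 and year1 map a hypothetical business identifier (standing in for
    an EIN or SSN) to True if the business had paid employees that year and
    False if it was a nonemployer.
    """
    out: dict[str, str] = {}
    for bid in year0.keys() | year1.keys():
        if bid not in year0:
            out[bid] = "entry (employer)" if year1[bid] else "entry (nonemployer)"
        elif bid not in year1:
            out[bid] = "exit"
        elif year0[bid] == year1[bid]:
            out[bid] = "continuing employer" if year0[bid] else "continuing nonemployer"
        else:
            out[bid] = "became employer" if year1[bid] else "became nonemployer"
    return out


y0 = {"A": False, "B": True, "C": True}
y1 = {"A": True, "B": True, "D": False}
for bid, status in sorted(classify_transitions(y0, y1).items()):
    print(bid, status)
```

The sketch also shows why changing business ID numbers are a problem: if business C simply reappeared under a new identifier, it would be counted as one spurious exit and one spurious entry.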

Longitudinal Employer-Household Dynamics (LEHD)
Purpose/uses: A microdata source designed to provide a detailed and comprehensive picture of workers, employers, and their interaction in the U.S. economy. Employer-household linked microdata create opportunities to conduct longitudinal research on business start-ups, early life-cycle dynamics, and local labor market conditions. Used by the QWI to measure job churning and in designing the LED program.
Design basics: A set of infrastructure files using administrative data provided by state agencies, enhanced with information from demographic and business surveys and censuses. Uses the LBD (linked to household data); federal and state administrative data; and core Census Bureau censuses and surveys.
Frequency: Yearly, panel (1992 to 2001)
Unit level: Establishment and household
Coverage: Establishments from about 20 states (about UI million) and 80 million individual records.
Content: Integrates information about employers (including employment and payroll levels, industry, location, and employment history of employees); employees (including gender, race, foreign-born status, and date of birth); the skill mix of businesses; and employer-level accessions, separations, job creation, and job destruction.
Limitations or lag time: The data reveal only workers' quarterly earnings, not hours worked. For most workers, data are not available on education or family characteristics.
Accessibility of data: Available to authorized users at Census Bureau-controlled facilities. No public-use version is available. Researchers can apply for access to the confidential microdata.

Survey of Business Owners (SBO)
Purpose/uses: Provides statistics describing the composition of U.S. businesses by gender, race, and ethnicity of owner and by sources of financing. Economic policy makers in federal, state, and local governments use SBO data as a source of information on business success and failure rates.
Design basics: The sample frame is based on IRS administrative data. The sample size is typically around 2.3 million businesses.
Frequency: Tied to the economic census (every 5 years)
Unit level: Establishment and owner
Coverage: Firms operating during the reference year with receipts of $1,000 or more that filed tax forms as individual proprietorships, partnerships, or any type of corporation. Excludes those classified as agricultural production, domestically scheduled airlines, railroads, U.S. Postal Service, mutual funds (except real estate investment trusts), religious grant operations, private households and religious organizations, public administration, and government.
Content: Legal form of organization, receipts, business owner's race (self-identified and allowing for identification of more than one racial group), gender, ethnicity, age, education level, veteran status, and primary function in the business. Also includes family- and home-based businesses, types of customers and workers, and sources of financing for expansion, capital improvements, or start-up.
Limitations or lag time: Infrequent; not longitudinal.
Accessibility of data: No public-use version is available. Researchers can apply for access to the confidential microdata.
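The employer-level accessions, separations, job creation, and job destruction listed in the LEHD profile can be illustrated with a headcount sketch over hypothetical person identifiers. Churning is computed here as worker flows in excess of job flows, a common convention in the job-flows literature; the QWI's official measures are defined in more detail (for example, around beginning- and end-of-quarter employment), so this is a simplified illustration, not the QWI formula.

```python
def employer_flows(prev_workers: set[str], curr_workers: set[str]) -> dict[str, int]:
    """Worker and job flows at one employer between two quarters.

    prev_workers and curr_workers hold hypothetical person identifiers for
    everyone with a job at the employer in each quarter.
    """
    accessions = len(curr_workers - prev_workers)
    separations = len(prev_workers - curr_workers)
    net = len(curr_workers) - len(prev_workers)
    creation, destruction = max(net, 0), max(-net, 0)
    return {
        "accessions": accessions,
        "separations": separations,
        "job_creation": creation,
        "job_destruction": destruction,
        # Worker flows beyond what the net change in jobs requires.
        "churning": accessions + separations - (creation + destruction),
    }


print(employer_flows({"a", "b", "c", "d"}, {"c", "d", "e"}))
```

Here one job disappears, but three workers move (one hire, two departures), so two of those moves are churning: turnover that would show up in linked worker records but not in establishment-level job counts alone.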

DUN & BRADSTREET (D&B)

Duns Market Identifiers (DMI)
Purpose/uses: Provides basic company data for U.S. business establishments. Used as a sampling frame for a wide variety of government (e.g., the SSBF) and private-sector applications.
Design basics: Data are collected from a wide range of public and private sources, including in-person and telephone interviews, government publications, business trade programs, mailings, and applications for credit.
Frequency: Data are updated continuously, albeit in an ad hoc fashion.
Unit level: Establishment and firm
Coverage: U.S. business establishment locations of all sizes and types, including public and private companies, government agencies and contractors, and schools and universities. Includes over 17 million establishments and over 2.9 million private and public companies. Limited to companies with UI or more employees or sales of $1 million.
Content: Information on owners, sales, employment, and legal status; full address; names and titles of executives; corporate linkages; Duns numbers; organization status; marketing information; primary SIC code; and sometimes a NAICS code.
Limitations or lag time: Relies on disparate sources, with no standard guidelines, for detecting the appearance of new businesses. There is no distinction between firms and establishments. Information on ownership and small firm characteristics is limited.
Accessibility of data: Microdata are available for a fee.

FEDERAL RESERVE BOARD (FRB)

Survey of Small Business Finances (SSBF) [conducted by NORC]
Purpose/uses: The most comprehensive source of information available on the characteristics of small businesses and their owners, focusing on financial data. Data have been used to prepare the Report to Congress on the Availability of Credit to Small Business every 5 years. Facilitates research on factors affecting the prices and availability of credit; characteristics of small businesses and their influence on credit needs; experiences with credit applications; the impact of government regulations on credit access; and financial and nonfinancial sources used for financing needs. The FRB intends to discontinue the SSBF.
Design basics: About 24,000 firms from D&B are screened for a final sample of 4,240 (for 2003).
Frequency: About every 5 years (1987, 1993, 1998, and 2003); cross-sectional.
Unit level: Firm
Coverage: Nationally representative sample from D&B of firms with fewer than 500 employees; oversamples African American-, Asian American-, and Hispanic American-owned firms.
Content: Income and expenses, assets and liabilities, and financing sources.
Limitations or lag time: Infrequent; last conducted in 2003, with a low response rate (around 33%).
Accessibility of data: Only a small number of authorized staff at NORC and the Federal Reserve System has access to the raw microdata. A public-use version, altered to maintain respondent confidentiality, is available.

GLOBAL ENTREPRENEURSHIP MONITOR CONSORTIUM

Global Entrepreneurship Monitor (GEM)
Purpose/uses: Measures differences in the level of entrepreneurial activity between countries and the relationship between entrepreneurship and national economic growth; uncovers factors that lead to higher levels of entrepreneurship; and suggests policies that may enhance levels of entrepreneurial activity. Data have been used to produce an Indicator of Total Entrepreneurial Activity and reports on women and entrepreneurship.
Design basics: Data are collected through a series of coordinated household surveys in an increasing number of countries, using a common interview schedule and consolidating and standardizing responses. Adult population surveys range from 1,000 to nearly 27,000 individuals per country; the average sample size is about 2,000.
Frequency: Samples are drawn annually.
Unit level: Household
Coverage: Cross-national assessment of entrepreneurship in 35 countries covering three types of data: adult population surveys, national expert interviews, and standardized cross-national data.
Content: Level of entrepreneurial activity, variance between countries, and change over time; the relationship between entrepreneurship and economic growth; how national experts assess the entrepreneurial climate in their countries; who becomes an entrepreneur, why, and what types of businesses they are creating; and the importance of venture capital and informal finance.
Limitations or lag time: Individual-level data are available only after a lag of several years (national summary reports are available with a lag of less than a year).
Accessibility of data: GEM consortium members have access to individual-level survey data, interview schedules, data collection procedures, and other material needed for systematic analysis. Public users can view all reports.
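An indicator of total entrepreneurial activity is, in essence, a weighted prevalence rate from the adult population surveys. The sketch below assumes, following GEM's usual convention as we understand it, that the rate is the share of adults aged 18 to 64 who are either nascent entrepreneurs or owner-managers of a new business, with each person counted once. The age bounds, the two flags, and the weights are assumptions to check against GEM's own documentation, and the records are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    weight: float    # hypothetical survey weight
    age: int
    nascent: bool    # actively trying to start a business
    new_owner: bool  # owner-manager of a young business


def tea_rate(sample: list[Respondent], min_age: int = 18, max_age: int = 64) -> float:
    """Weighted percentage of eligible adults who are nascent entrepreneurs,
    new-business owner-managers, or both (each counted once)."""
    eligible = [r for r in sample if min_age <= r.age <= max_age]
    denom = sum(r.weight for r in eligible)
    if denom <= 0:
        raise ValueError("no eligible respondents")
    numer = sum(r.weight for r in eligible if r.nascent or r.new_owner)
    return 100 * numer / denom


sample = [
    Respondent(1, 30, True, False),
    Respondent(1, 40, False, True),
    Respondent(2, 50, False, False),
    Respondent(1, 25, True, True),   # counted once, not twice
    Respondent(5, 70, True, False),  # outside the assumed age range
]
print(tea_rate(sample))  # 60.0
```

Counting a person who is both nascent and a new-business owner only once keeps the indicator a prevalence rate rather than a count of activities, which matters when comparing countries with different mixes of the two groups.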
INTERNAL REVENUE SERVICE (IRS)

Statistics of Income (SOI)
Purpose/uses: Provides the only publicly available financial information on all corporations. Data products for S-corporations (those with 75 or fewer shareholders) are also available. The SOI provides data annually to BEA on partnerships and also produces annual information on nonfarm sole proprietorships from Schedule C data.
Design basics: The survey is based on a stratified probability sample of 130,000 preaudited income tax returns or other forms filed with the IRS.
Frequency: Yearly, cross-section (1990 to 2002)
Unit level: Firm
Coverage: Corporations, S-corporations, partnerships, and nonfarm sole proprietorships.
Content: Net income statements, balance sheets, and tax information by industry, accounting period, size of assets, receipts, and income taxes after credits.
Limitations or lag time: Potential reporting errors and inconsistencies, processing errors, and the effects of any early cutoff of sampling.
Accessibility of data: Though statistics are publicly available, researchers must apply for access to the confidential microdata.
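Estimates from a stratified probability sample such as the SOI weight each sampled return by the number of returns it represents in its stratum. The sketch below shows the textbook expansion estimator of a population total under simple random sampling within strata; the stratum labels, counts, and amounts are hypothetical, and SOI's actual strata and weighting adjustments are not described in this profile.

```python
def stratified_total(strata: dict[str, tuple[int, list[float]]]) -> float:
    """Estimate a population total from a stratified random sample.

    strata maps a hypothetical stratum label to (N_h, values), where N_h is
    the number of returns in the stratum and values are the amounts on the
    sampled returns. Each sampled return represents N_h / n_h returns.
    """
    total = 0.0
    for label, (population, values) in strata.items():
        if not values:
            raise ValueError(f"stratum {label!r} has no sampled returns")
        total += population / len(values) * sum(values)
    return total


strata = {
    "large (sampled with certainty)": (2, [500.0, 700.0]),
    "small": (1000, [1.0, 3.0]),
}
print(stratified_total(strata))  # 3200.0
```

Sampling the largest returns with certainty (weight 1) and thinly sampling small ones (weight 500 here) is what lets a sample of this size produce precise totals for corporate income, which is concentrated in relatively few large filers.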

KAUFFMAN FOUNDATION

Kauffman Firm Survey (KFS) [with Mathematica Policy Research, Inc.]
Purpose/uses: Produces data on the financial development of new businesses and tracks them in their first four years of existence. The data set is intended to create a public-use data source that informs policy decisions and academic analysis.
Design basics: A longitudinal survey, sampled from D&B, of the principals of 5,000 firms that started operations in 2004. The survey is oriented primarily to generate data on the financial development of new businesses in their first four years of existence. High-technology businesses are oversampled. Surveys are conducted by telephone or over the Internet.
Frequency: Begun in 2005, an annual survey with three follow-up panels over the period 2006 to 2008.
Unit level: Owner and firm
Coverage: New businesses in the United States starting in the year prior to the reference year.
Content: Business characteristics, strategy and innovation, employment, business organization and benefits, business finances, and work behaviors and demographics of owner(s).
Limitations or lag time: The D&B sampling frame is limited in its ability to quickly incorporate new firms (see D&B).
Accessibility of data: Data are not yet available; ultimately, longitudinal data on new firms will be publicly available.

Panel Study of Entrepreneurial Dynamics (PSED) [with the University of Michigan]
Purpose/uses: A nationally representative database designed to enhance understanding of the business start-up phenomenon. The resulting data are intended to promote research into the business gestation process (i.e., the period before the business actually produces output).
Design basics: A longitudinal sample of U.S. households was contacted to find individuals who were actively engaged in starting new businesses. Those identified as nascent entrepreneurs were included in the group and asked to participate in two follow-up interviews (each 12 months apart). Four waves of PSED exist for 1998 to 2003; a new cohort has been developed for interview in the UI years beginning in 2006.
Frequency: Annual
Unit level: Individual entrepreneurs located by household
Coverage: Approximately 670 nascent entrepreneurs, identified through a survey of 64,000 U.S. households.
Content: Proportion and characteristics (gender, ethnicity, age, education, household income, and urban context) of the adult population attempting to start new businesses, the kinds of activities nascent entrepreneurs undertake during start-up, and the proportion and characteristics of start-up efforts that become infant firms.
Limitations or lag time: Does not interview respondents who do not qualify as nascent entrepreneurs when initially selected (doing so could eliminate the need for comparison groups).
Accessibility of data: Data from the PSED are maintained and made available for download by the University of Michigan's Institute for Social Research. Four panels of data are currently available covering the time period 1998 to 2003.

Kauffman Financial and Business Database (KFBD)
Purpose/uses: To collect financial information on U.S. businesses.
Design basics: Data primarily used for credit scoring purposes are purchased from D&B. Data include recent, detailed financial information. On average, the longitudinal database contains complete, consecutive financial statements for a period of UI years.
Frequency: Annual (data are purchased from D&B on a semiannual basis)
Unit level: Firm
Coverage: The longitudinal file includes data for every year since 1996 and contains more than UI million records with financial information on more than 500,000 unique firms.
Content: Financial statements for 3-year periods for about 50,000 companies, including annual balance sheet, annual income statement, 14 standard financial ratios, and various firm-level demographic items. Data may be sorted by industry, year started, number of employees, annual sales, minority ownership, and detailed location information.
Limitations or lag time: D&B is limited in its coverage of the newest start-up firms and of self-employed individuals.
Accessibility of data: Data are available for legitimate research from the Kauffman Foundation.

SMALL BUSINESS ADMINISTRATION (SBA)

Statistics of U.S. Businesses (SUSB) (1989 to present) [compiled by the Census Bureau]
Purpose/uses: An annual series that provides national and subnational data on the distribution of economic activity by size and industry. Provides data on firms, establishments, employment, annual payroll, and estimated receipts (or sales), from which various tables are produced.
Design basics: Data items are extracted from the SSEL. The annual COS provides individual establishment data for multiestablishment companies. Data for single-establishment companies are obtained from the Annual Survey of Manufactures and Current Business Surveys, as well as from administrative records from the IRS, the SSA, and the BLS.
Frequency: Annual. Historical comparability is affected by definitional changes in establishments, activity status, and industrial classifications over the period 1988 to 2002.
Unit level: Establishment and firm
Coverage: The 1999 Statistics of U.S. Businesses covers all NAICS industries except crop and animal production; rail transportation; U.S. Postal Service; pension, health, welfare, and vacation funds; trusts, estates, and agency accounts; private households; and public administration.
Content: Tabulations can be made to estimate employment, annual payroll, number of firms, and number of establishments by location and industry categories.
Limitations or lag time: The series excludes data on self-employed individuals, employees of private households, railroad employees, agricultural production employees, and most government employees. There is a 2-year time lag in reporting.
Accessibility of data: Tabulations of data by enterprise size for the country, states, and/or metropolitan statistical areas can be accessed for recent years.
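Tabulations "by enterprise size" classify every establishment by the total employment of the firm that owns it, not by its own employment. The sketch below shows that two-pass logic on hypothetical firm and establishment records; the size classes are illustrative only and do not reproduce the SUSB table stubs.

```python
from collections import defaultdict

# Illustrative enterprise-size classes (not the official SUSB breakdown).
SIZE_CLASSES = [(0, 19, "0-19"), (20, 99, "20-99"), (100, 499, "100-499"), (500, None, "500+")]


def size_class(employment: int) -> str:
    for lo, hi, label in SIZE_CLASSES:
        if employment >= lo and (hi is None or employment <= hi):
            return label
    raise ValueError(f"no size class for {employment}")


def tabulate_by_enterprise_size(establishments: list[tuple[str, int]]) -> dict[str, dict[str, int]]:
    """establishments holds (firm_id, establishment_employment) records."""
    firm_emp: dict[str, int] = defaultdict(int)
    for firm, emp in establishments:  # pass 1: total employment per firm
        firm_emp[firm] += emp
    table: dict[str, dict[str, int]] = defaultdict(
        lambda: {"firms": 0, "establishments": 0, "employment": 0})
    for total in firm_emp.values():
        table[size_class(total)]["firms"] += 1
    for firm, emp in establishments:  # pass 2: place each establishment by firm size
        cell = table[size_class(firm_emp[firm])]
        cell["establishments"] += 1
        cell["employment"] += emp
    return dict(table)


records = [("F1", 10), ("F1", 15), ("F2", 5), ("F3", 300), ("F3", 250)]
for label, cell in tabulate_by_enterprise_size(records).items():
    print(label, cell)
```

Firm F1's 10-employee establishment lands in the 20-99 class because F1 employs 25 people in total; tabulating by establishment size instead would put it in the smallest class, which is why enterprise-size and establishment-size tables can tell different stories about small business.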

Business Information Tracking Series (BITS) (also known as Longitudinal Establishment and Enterprise Microdata (LEEM)) [constructed by the Census Bureau]
Purpose/uses: To identify firm births and deaths, expansions and contractions, and mergers and acquisitions, and to examine job flows.
Design basics: BITS is constructed by longitudinally linking archived SUSB data. The data set currently includes about 13 million establishments.
Frequency: Yearly, panel (1989 to present)
Unit level: Establishment and firm
Coverage: Private-sector establishments (single physical locations) with positive payroll. Same industry coverage as CBP.
Content: Establishment- and firm-level data on annual payroll, 4-digit SIC, location, start year, legal entity, total employment, firm affiliation, census geography, census file number, and constant firm identifiers (meaning there is no change in the ID even if legal or ownership status changes).
Limitations or lag time: No self-employed; long lag in production (about UI years); tracks only establishments (not firms); and no farm coverage.
Accessibility of data: Not publicly available. Users must become sworn Census researchers and use the data at Census RDCs.

STANDARD & POOR’S (S&P)

COMPUSTAT
Purpose/uses: Tracks firm-level activity for publicly traded, listed firms since 1950. Standardizes financial and accounting statement information on companies around the world for investors. Data are used by hedge funds, money managers, analysts, researchers, corporations, government (the IRS), and regulatory agencies.
Design basics: Database produced by S&P. Reporting units are identified by firm and by 4-digit SIC code and are business or industry segments, defined as components of an enterprise engaged in providing a product, service, or group of related products or services primarily to customers outside the enterprise for profit.
Frequency: Quarterly (longitudinal since 1980)
Unit level: Firm and industry segment
Coverage: All publicly traded firms in U.S. stock markets (about 65,000 firms).
Content: Data include quarterly and annual income statements, balance sheets, and cash flow statements. Source information includes annual and quarterly SEC filings; 8-K, 20-F, and proxy filings; EDGAR filings; media releases; and original annual reports.
Limitations or lag time: By design, limited to publicly traded firms (generally mature entities).
Accessibility of data: Data are available for a fee.