6. Existing Data Sources

Specific data requirements are driven by a study’s methodology, and their definition would normally follow that discussion. As noted earlier, however, the status of prior agency studies and agency databases will affect the methodologies selected and hence data collection needs. The Committee will make an initial effort to identify and review existing data sources. These will be extended and undoubtedly modified during the early stages of Phase II of the NRC study.

Existing data will be drawn mainly from the following sources:

  • Agency SBIR databases

  • Published agency reports

  • Internal agency analysis

  • SBA and GAO reports

  • Previously conducted recipient surveys

  • Academic literature

  • Prior NRC studies

These existing data sources are briefly discussed below.

Existing agency and SBA reports

The agencies appear to have produced few major reports on their own SBIR programs, aside from annual reports to SBA. In addition to Fast Track, DoD has unpublished studies; NASA recently completed some analysis; NSF also has some internal assessments. These agency reports must be assessed for accuracy and comprehensiveness, as an early-stage priority under Phase II of the NRC study.51 Annex E provides a list of these agency studies.

Existing agency SBIR databases

All five agencies maintain databases of awards and awardees. These databases typically contain basic information about the awardee (e.g., company name, Principal Investigator, contact address), information about the award (amount, date, award number), and in many cases, additional detailed project information (e.g., proposal summary, commercialization prospects).52

In general, the agency databases offer reasonably strong input data – award amounts, dates, Principal Investigator information, etc. – and relatively weak output data – commercial impact, etc. The agency databases may have information on modifications that have added funds, but do not typically contain sufficient information about the use of funds. (The abstract, which may be useful for case study decisions, does not lend itself to statistical use since the sample size is one for each unique abstract.)53

Thus, the agency databases will be most useful as sources of two critical sets of information:

  • Basic information about awards, including some demographic data about awardees;

  • Contact information for awardees, useful as the survey distribution lists are developed.

More technically, issues related to agency databases may include:

  • Completeness of the agency’s data

    • Do the data cover all of the applications received by the agency?

    • Are all grants accounted for?

    • Is the contact data up to date (i.e., what percentage respond to a contact effort based on this information)?

    • What year was the database started?

    • Does it maintain information about non-awardees?

    • What percent of SBIR Phase I awards get converted to Phase II awards?

51 See Annex D for a list of these reports.

52 See NIH/NSF background papers for specifics.

53 It would not be cost effective to try to group abstracts in any fashion.







    • How many SBIR Phase II contracts lead to Phase III?

  • Accuracy of the data. The biggest challenge here will be the transient nature of the firms and the information.

    • PIs come and go; firms shrink and grow; firms are acquired; firms may close down, move, or change names;

    • Answers often depend on whom you ask;

    • Firms that are very successful may have new management in place as a result of venture capital activity or other financial arrangements, or due to firm acquisition by another organization;

    • There is often a long gestation between award of the SBIR Phase II and achievement of significant revenues. Often other SBIR grants and other R&D may have occurred in the interval. There may not be anyone still at the firm knowledgeable of the link between the product and the SBIR;

    • The most serious analytical issue may be the dependency on self-reporting, as the agencies generally know little about commercialization except that which is self-reported by the firm.

  • Depth of the data – Does the data reach firm level variables, award data, projects, and outcomes? Conversely, what primary gaps in the data should be filled by primary research? The Committee will also need to assess data collected by agencies beyond that required by SBA, to see if there are opportunities and/or gaps.

  • The expanded role of DoD data. Recent DoD collections include information on projects in the earlier studies, as well as in the next Fast Track: about one-third of the DoD collection is on projects awarded by other agencies. Note that information from the various data collections has not been cross-referenced and analyzed. It will take extensive effort to properly identify each project in each collection (as the collections, for example, lack common unique identifiers).

  • The form of the data – is the agency data in paper form or is it computerized?

Relevant Features of Existing Survey Data

Four substantial surveys have addressed commercial and other outcomes from SBIR: GAO (1992), DoD (1997), SBA (1999), and DoD Fast Track.54 In many areas, these surveys ask similar or identical questions, creating extensive databases of results relevant to many of the metrics being considered for use in this study.

The Fast Track surveys each addressed a single SBIR Phase II award, and collected some information on the firm. 80 to 90 percent of the questions were about the specific award. Some firms have only one award. Some have over 100.

GAO (1992), SBA (1999) and DoD (1997) each surveyed 100 percent of the SBIR Phase II awards made from 1983 through an end date that was four years prior to the date of the survey: i.e., GAO (1992) surveyed, in 1991, all SBIR Phase II project awards from 1983 through 1987.55 These studies provide coverage for the early years of the program.

The existing survey results showed the distribution of commercialization to be quite skewed. For example, 868 of the 1310 reporting projects in the SBA survey had no sales. Fifty-five had over $5 million in sales, one of which was over $240M, two were slightly over $100M, and five were between $46M and $60M. Those 55 projects represent 1.5 percent of the number surveyed, 4.2 percent of the responses, but 76 percent of the total sales. This means that in collecting commercialization data, firm selection becomes critically important. Surveying a high percentage of the awards (using a long survey) has the related problems of imposing a substantial burden, and risks causing multiple award winners not to respond.56

A note on the SBA Tech-Net database: SBA maintains a database of information derived from the annual reports made on SBIR by the agencies.57 Mandatory data include award year and amount, agency topic number, awarding agency, phase, title, and agency tracking number. (Tracking numbers were not mandatory through 1998.) However, this database is far from complete for our purposes:

  • Principal Investigator (PI) information. Today, reporting the PI name is mandatory, but although there are fields for title, email address, and phone, these are not mandatory entries for the agencies to report. As recently as 1998, agencies did not have to report the name of the PI.

  • Company information. There are fields for the name, title, phone, and email of a company contact official, but these fields are not mandatory for the agencies to report.

  • Award information. Agency award contract or grant number, solicitation number, year of solicitation and number of employees have fields, but they are not mandatory.

  • Technical project information. There are large fields for technical abstract, project anticipated results, and project comments, but they are not mandatory.

  • Women and minorities. Although information on minority or women ownership is mandatory, it was not complete in the SBA data for the years before 1993.58

  • Other data. Other data, such as award date for SBIR Phase I and Phase II, completion date for each phase, additional (non SBIR Phase II) and subsequent funding provided by the agencies, agency POC for each SBIR Phase II, information on cost sharing (if applicable), etc. may be available in some agency databases.

54 See U.S. General Accounting Office, “Federal Research: Small Business Innovation Research shows success but can be strengthened.” Washington, D.C.: U.S. General Accounting Office, 1992. The DoD study on the commercialization of DoD SBIR was based on a survey of Phase II awards from 1984–1992. It involved 80 in-person and 69 telephone interviews with SBIR firms, and interviews with DoD program managers and laboratory officials. This study, completed in October 1997, is unpublished. The SBA study on the commercialization of SBIR was based on a 100 percent survey of Phase II awards from 1983 to 1993 of non-DoD agencies, and 43 in-person interviews with SBIR firms. This study, completed in July 1999, is unpublished. The DoD Fast Track study was conducted by the National Research Council. See National Research Council, The Small Business Innovation Research Program: An Assessment of the Department of Defense Fast Track Initiative, 2000, op. cit.

55 See GAO (1992) op. cit. An unpublished study by the SBA was completed in 1999, and an unpublished study by DoD was completed in 1997. See footnote 27 for description.

56 All prior efforts addressed only Phase II. NIH and perhaps other agencies have indicated that they would be interested in a survey of Phase I winners that did not submit or did not win a Phase II award.

57 Tech-Net is an electronic gateway of technology information and resources, maintained by SBA, for and about small high tech businesses. It provides a search engine for researchers, scientists, state, federal, and local government officials, can serve as a marketing tool for small firms, and can "link" investment opportunities for investors and other sources of capital. Visit the SBA Tech-Net database at http://tech-net.sba.gov/index.html

58 Agencies have often reported information that is not mandatory so some of the above is available for many projects. For example, the SBA database through 1993 had names for 78 percent of the PIs. It had phone numbers for just over half of the named PIs. Number of firm employees was entered in 5 percent of the entries.
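The kind of field-by-field completeness figures quoted for the SBA database (for example, the share of entries with a PI name or an employee count) could be produced with a check along the following lines. This is only a minimal sketch under stated assumptions: the field names (pi_name, pi_phone, num_employees), the record layout, and the sample records are hypothetical illustrations, not the actual Tech-Net schema or data.

```python
# Hypothetical optional fields; these names are illustrative assumptions,
# not actual Tech-Net column names.
OPTIONAL_FIELDS = ("pi_name", "pi_phone", "num_employees")


def is_filled(value):
    """Treat None and blank strings as missing; anything else counts as reported."""
    return value is not None and str(value).strip() != ""


def field_completeness(records, fields=OPTIONAL_FIELDS):
    """Return, for each field, the fraction of records that report a value."""
    records = list(records)
    if not records:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for rec in records if is_filled(rec.get(field))) / len(records)
        for field in fields
    }


# Made-up award records, for illustration only.
sample = [
    {"pi_name": "A. Smith", "pi_phone": "555-0100", "num_employees": 12},
    {"pi_name": "B. Jones", "pi_phone": "", "num_employees": None},
    {"pi_name": "", "pi_phone": None, "num_employees": None},
    {"pi_name": "C. Lee", "pi_phone": "555-0101", "num_employees": None},
]

rates = field_completeness(sample)
for field, rate in rates.items():
    print(f"{field}: {rate:.0%} of entries report a value")
```

Running the same check separately by award year would show when a non-mandatory field began to be reported consistently, which is one way to judge how far back a given field can be relied upon.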