Annex G: Issues Related to Sampling
BACKGROUND:
Number of Phase II Awards (1992-2000): Until we receive and integrate all databases, we do not know this number.
Combining data from the SBA web site (1997-2000) and SBA published reports for 1992, 1993, and 1994, and
extrapolating from DoD data for 1995 and 1996, I estimate this number to be about 10,800. Based on the three
published reports, about 7 percent of these Phase II awards are from the smaller agencies. Thus, if we consider only the big
five agencies, 10,000 is a good approximation.
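The arithmetic behind the 10,000 figure can be checked with a short sketch. The inputs are the estimates quoted above; the variable names are illustrative only.

```python
# Inputs are the estimates quoted in the text; the names are illustrative.
total_phase2_estimate = 10_800   # estimated Phase II awards, 1992-2000
small_agency_share = 0.07        # share attributed to the smaller agencies

# Awards remaining once the smaller agencies are set aside.
big_five_estimate = total_phase2_estimate * (1 - small_agency_share)
print(round(big_five_estimate))  # prints 10044, i.e. roughly 10,000
```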
Number of Awards per Firm: Until we receive and integrate all databases, we do not know how many firms have
only one Phase II award, or two, or three, etc. Thus I must estimate how many surveys will be generated by the
following approach.
Existing Commercialization Data: DoD has data by project for 10,372 Phase II projects. (This includes projects
from 1983 to 1991 and 2001.) Since 1999, firms that have submitted SBIR or STTR proposals to DoD have had to
enter firm information and information on sales and investments for all of the Phase II awards that they received,
regardless of awarding agency. As a percentage of the Phase II awards made by each agency from 1992 to 2000, we have data on
approximately 75 percent of DoD, 67 percent of NASA and DOE, 54 percent of NSF, and 16 percent of NIH/HHS
Phase II awards. DOE has provided commercialization data by product, which cannot be directly associated with
projects due to double counting. NASA has collected data by project, which could be very useful to our examination of
NASA.
Proposed New Commercialization Database: We may set up a database comparable to the DoD one, to collect
initial data from firms not in the DoD Commercialization Database. The commercialization data include substantial
information about the firm, which will not then have to be collected on the firm survey. The database provides a broad overview
of all projects. This allows us to survey a sample rather than 100 percent of projects, yet still have information on a high
percentage of projects and firms. It also reduces the chance that we will miss any high-performing projects when we
sample.
Addresses: The use of a commercialization database ensures we have a point of contact, phone number, and email
address, which is important if not essential to executing a good online survey.
SAMPLING APPROACH:
I propose several different samples described below.
Random Sample. After integrating the 10,000 awards in a single database, I will generate a random sample of
some percent of the awards (for example, 20 percent) for each of the years; e.g., 20 percent of the 1992 awards, etc.
Generating the total sample one year at a time will provide a balanced sample.
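A minimal sketch of the year-by-year draw, assuming each award is a record with a "year" field. The field name, the function name, and the fixed seed are illustrative assumptions, not part of the proposal.

```python
import random
from collections import defaultdict

def sample_by_year(awards, fraction=0.20, seed=0):
    """Draw the same fraction of awards from each award year.

    `awards` is a list of dicts with a "year" key (a hypothetical layout).
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_year = defaultdict(list)
    for award in awards:
        by_year[award["year"]].append(award)

    sample = []
    for year in sorted(by_year):
        group = by_year[year]
        k = round(len(group) * fraction)     # sample size for this year
        sample.extend(rng.sample(group, k))  # draw without replacement
    return sample
```

Sampling within each year, rather than once over the pooled awards, keeps every year represented at the same rate even when award counts differ by year.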
Random Sample by Agency. I would then group by agency and randomly select a few more as required to ensure
each agency has at least 20 percent surveyed.
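The agency top-up step might look like the sketch below. It assumes each award carries hypothetical "id" and "agency" fields and that the ids already drawn are available as a set; none of these names come from the proposal.

```python
import random
from collections import defaultdict

def top_up_by_agency(awards, sampled_ids, min_percent=20, seed=0):
    """Add randomly chosen awards until each agency reaches min_percent.

    `awards` is a list of dicts with "id" and "agency" keys (hypothetical).
    Returns a new set of selected ids that includes `sampled_ids`.
    """
    rng = random.Random(seed)
    selected = set(sampled_ids)
    by_agency = defaultdict(list)
    for award in awards:
        by_agency[award["agency"]].append(award["id"])

    for agency in sorted(by_agency):
        ids = by_agency[agency]
        # Integer ceiling of min_percent of this agency's awards.
        needed = (len(ids) * min_percent + 99) // 100
        have = sum(1 for i in ids if i in selected)
        shortfall = needed - have
        if shortfall > 0:
            pool = [i for i in ids if i not in selected]
            selected.update(rng.sample(pool, shortfall))
    return selected
```

Awards that were already selected are never dropped, so an agency that is above the floor after the first draw is left unchanged.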
Top Performers. From the Commercialization database, we would identify the top projects in sales and investment.
(Since the current DoD Commercialization data include 10,372 projects, it gives us an approximation of how many
projects this would entail.) If we select all projects that had at least $5,000,000 in sales or at least $5,000,000 in
investment, this would entail about 385 projects.
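As a sketch, the top-performer rule is a simple filter on either threshold. The "sales" and "investment" field names are assumptions about how the commercialization records might be laid out.

```python
THRESHOLD = 5_000_000  # dollars, the cutoff named in the text

def top_performers(projects, threshold=THRESHOLD):
    """Return projects with sales OR investment at or above the threshold.

    `projects` is a list of dicts with "sales" and "investment" keys
    (hypothetical field names).
    """
    return [
        p for p in projects
        if p["sales"] >= threshold or p["investment"] >= threshold
    ]
```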
100 percent for Firms with a Small Number of Projects. I would like to survey 100 percent of the projects
that went to firms with only one or two awards (perhaps three). I would estimate that about a third of the 10,000
awards went to firms with two or fewer awards. (This is based on data from 1983 to 1993, which show that two-thirds of all Phase II
awards went to firms with four or fewer awards, with a roughly exponential distribution in which firms with a single award
were most common, followed by firms with two, etc.) These are the hardest firms to find; address information is
perishable, and thus the response rate is much lower. We usually have good address information for multiple winners, and thus a much
higher level of response.
Coding: The database will track which sample(s) each survey belongs to. It would be possible for a randomly
sampled project to be a top performer from a firm that had only two awards. Thus it could be coded as random
sample for the program, random sample for the awarding agency, top performer, and 100 percent of single or double
winners. The database itself can group surveys that came from multiple winners once we establish how many
awards we use as a cutoff for that designation.
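One way to record overlapping sample membership is a set of combinable flags per project. The sketch below uses Python's enum.Flag; the flag names and the function signature are illustrative assumptions, not the database design.

```python
from enum import Flag, auto

class SampleCode(Flag):
    """Sample memberships for one project survey (names are illustrative)."""
    NONE = 0
    RANDOM_PROGRAM = auto()  # random sample for the program
    RANDOM_AGENCY = auto()   # random sample for the awarding agency
    TOP_PERFORMER = auto()   # top project in sales or investment
    SMALL_FIRM = auto()      # 100 percent of single or double winners

def code_project(project_id, program_ids, agency_ids, top_ids, small_firm_ids):
    """Combine every sample a project belongs to into a single code."""
    code = SampleCode.NONE
    if project_id in program_ids:
        code |= SampleCode.RANDOM_PROGRAM
    if project_id in agency_ids:
        code |= SampleCode.RANDOM_AGENCY
    if project_id in top_ids:
        code |= SampleCode.TOP_PERFORMER
    if project_id in small_firm_ids:
        code |= SampleCode.SMALL_FIRM
    return code
```

A project in all four samples carries all four flags at once, and a test such as `SampleCode.TOP_PERFORMER in code` recovers any one membership when the results are analyzed by sample.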
How many surveys: I estimate that if the random sample were 20 percent, this approach would generate about
5,000 to 5,500 project surveys and about 3,000 firm surveys, assuming each firm that received at least one project
survey also received a firm survey. Although we would be sampling over half of the awards, firms that had many
awards would have surveys on slightly over 20 percent of their awards. The response rate depends on how much effort is spent
before the survey in ensuring good addresses (do we create the new commercialization database?) and how many
follow-ups and phone calls we make to people who do not respond. One agency mentioned that its survey had a 70 to
80 percent response rate, but until it began phone calls that rate was 15 percent.
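A rough cross-check of the project survey count, not the method used for the estimate above. It assumes the three selection rules operate independently of one another, and that the top-performer share seen in the DoD data (385 of 10,372 projects) holds across all 10,000 awards. Both assumptions are simplifications, and the agency top-up is left out.

```python
def expected_project_surveys(
    total_awards=10_000,
    random_fraction=0.20,    # random sample rate
    small_firm_share=1 / 3,  # awards to firms with one or two awards
    top_share=385 / 10_372,  # top performers, from the DoD data
):
    """Expected awards selected by at least one rule, assuming independence."""
    # Probability that an award is missed by all three rules.
    p_missed = (1 - random_fraction) * (1 - small_firm_share) * (1 - top_share)
    return total_awards * (1 - p_missed)

print(round(expected_project_surveys()))
```

Under these assumptions the count comes out a little under 5,000 before any agency top-up, which is broadly in line with the lower end of the range quoted above.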