14 The terms of federal research
funding could also support efforts for a clearinghouse. For
example, the National Science Foundation expects grantees to share
research data with other researchers (with safeguards in place for
the rights of experimental subjects and the like). Depositing
research data in a clearinghouse could be an efficient way for
researchers to satisfy this expectation. An explicit expectation
that data be deposited in appropriate data banks could also be
added as a condition of receiving grants.
In addition, a clearinghouse could encourage
comparability (in both format and research
methodology) across data sets and the reuse of data, especially
if academic researchers and commercial data sources were to
collaborate on defining standards.
A possible model for such a clearinghouse is the
Inter-university Consortium for Political and Social Research
located within the Institute for Social Research at the University
of Michigan. Funded by subscribing member institutions, it provides
access to a large archive of computerized social science data. A
clearinghouse could also derive support from grant funding and
charges for access to data sets.
Both the archiving and standards-setting functions
would enable increased secondary use of data sets, which would of
course depend on the social science community's ease of access to
data in a clearinghouse. Joint work between social scientists and
technologists could lead to building new kinds of data
clearinghouses and new tools and techniques for making use of
them.
• Exploring ways for researchers to gain access to
private-sector data. Commercial data on firms' capital
investment in information technology is of considerable value to
researchers examining the social and economic impacts of computing
and communications. Consultant, trade magazine, and industry group
data is another valuable resource (see section 3.2.1).
Overall, however, several factors impede
collaboration between researchers and the private sector. First,
data on individual firms is often protected because of competitive
concerns. One remedy is for social scientists and the commercial
sector to explore aggregation and other ways of hiding individual
corporate identity. Second, incentives for collaboration by
private-sector firms typically are lacking, although one way of
providing them is to establish an agreement whereby researchers who
use private-sector data then make research results available in a
useful form to the firm or organization that supplied the data. To
both protect proprietary interests and increase incentives,
strengthened institutional relationships between the research
community and industry associations would be valuable.
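As a rough illustration of the aggregation remedy mentioned above, the following sketch sums firm-level figures by industry and withholds any group that contains too few firms to mask an individual contributor. It is a minimal sketch under stated assumptions, not a description of any procedure used by the sources discussed here: the function name, the record fields (`firm`, `industry`, `it_investment`), the sample values, and the three-firm threshold are all hypothetical, and real disclosure-limitation practice involves further safeguards.

```python
from collections import defaultdict


def aggregate_by_group(records, group_key, value_key, min_firms=3):
    """Sum value_key within each group of records.

    Groups with fewer than min_firms distinct firms are suppressed
    (reported as None) so that no single firm's figure can be read
    off the published totals. The threshold of 3 is an illustrative
    assumption, not a recommended standard.
    """
    totals = defaultdict(float)
    firms = defaultdict(set)
    for rec in records:
        group = rec[group_key]
        totals[group] += rec[value_key]
        firms[group].add(rec["firm"])

    summary = {}
    for group, total in totals.items():
        n_firms = len(firms[group])
        if n_firms >= min_firms:
            summary[group] = {
                "firms": n_firms,
                "total": total,
                "mean": total / n_firms,
            }
        else:
            summary[group] = None  # suppressed: too few firms to hide identity
    return summary


# Hypothetical firm-level records (values are made up for illustration).
records = [
    {"firm": "A", "industry": "banking", "it_investment": 120.0},
    {"firm": "B", "industry": "banking", "it_investment": 80.0},
    {"firm": "C", "industry": "banking", "it_investment": 100.0},
    {"firm": "D", "industry": "retail", "it_investment": 40.0},
]

summary = aggregate_by_group(records, "industry", "it_investment")
for industry, stats in sorted(summary.items()):
    print(industry, stats)
```

In this sketch the banking group is published as a total and a mean over three firms, while the single-firm retail group is suppressed. An intermediary such as an industry association could apply a step of this kind before releasing data to researchers.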
It is important to note that private-sector
sources of data may have a number of possible limitations,
including a lack of consistent definitions and methods over time
and the tendency of private-sector firms to preserve only the
current information.
In many of these cases closer working
relationships between researchers and the private sector can
provide solutions (see section 3.2.1 for examples).
• Increasing data collection efforts by government. As
described in section 3.2.3, deregulation and privatization can
reduce the quantity and availability of data on telecommunications
and computing at a time when more, not less, information is needed
to guide policy decisions. Budget constraints and government
reforms to reduce information gathering burdens also have reduced
data collection. In addition, fewer resources have been available
for analyzing data and for making the data publicly available.
Workshop participants noted that loss of such data
sources inhibits social science explorations of the social and
economic impacts of computing and communications. At a minimum,
government decision makers need to be aware of the cost of losing
such data. In some cases they may be able to find other ways to
gather valuable information. For example, additional questions
might be added to the Census Bureau's Current Population Survey
(CPS) to measure wireless phone or Internet use (see Box 3.6), as
was done by the National Telecommunications and Information
Administration and described in the report "Falling Through the
Net" (NTIA, 1995), which reported on computer and modem use and
explored the demographics of telephone, computer, and modem use in
terms of population density, ethnicity, age, and economic status.
However, this approach has been taken only once, because it is
expensive to add supplemental questions to the CPS.
• Exploring the development of new multipurpose data sets by
the research community. To what extent can multipurpose data
sets based on such techniques as user diaries prove helpful to
researchers examining the social impacts of information technology?
Careful observational methods have been critical to specific deep
organizational studies in such areas as computer-supported
cooperative work (e.g., the study of Xerox technicians described in
section 2.3.4). However, such research has not typically relied on
multipurpose public data sets. As the body of observational data
grows, it may be possible to start development of such
nonquantitative multipurpose data sets. A precedent is the Human
Relations Area Files at Yale University, which consist of
ethnographic extracts organized in various categories. Given a rich
enough corpus of observational data in a given domain, it may prove
both possible and valuable to invest in the creation of new,
qualitative multipurpose data sets.
• Establishing stronger ties with industry associations to
facilitate collaborative research. In general, proprietary
concerns are likely to impede collaboration between academic
researchers and private-sector firms. Yet to explore topics such as
the relationships among organizational structure, the use of
information technology, and productivity, researchers need access
to firm- and process-level data that typically is not public.
Nonpublic data relevant to other social and economic research
includes details of pricing, employment, demand forecasts,
and the like. Lack of experience in collaborative
work between the two communities is another barrier.
Industry associations are a possible bridge
between the communities, to allow each to benefit from the
resources of the other. One approach might involve sponsorship of
forums where academics and industry people can meet to discuss
common interests. These events can create serendipitous
opportunities for cooperation that could not have been predicted or
planned in formal brokering. Industry associations might also help
connect the academic and industry communities in more formal roles
such as the following:
  - As an intermediary that aggregates or otherwise
    makes proprietary data anonymous so that firms will be more
    comfortable about providing it to outsiders;
  - As a matchmaker in bringing together industry and
    the research community to work on projects of mutual interest;
  - As a depository for research results based on
    industry-provided data (if research reports are readily available
    to them, industries may be more willing to provide data); and
  - As a sponsor of research on topics of interest to
    the membership.
Note that limited financial resources may place
constraints on such collaboration. Since trade associations are
unlikely to have in-house resources to cover the administrative
costs of a research study collaboration, it may prove necessary to
structure such a project on a break-even basis for the
association.
• Exploring, in workshop sessions, uses of the Internet as a
source of data on social interactions. As described in section
3.3.2, the Internet can provide a wealth of information on group
and community behavior. It would be very useful to convene a
workshop of technologists, social scientists, academics, and
representatives of commercial interests to discuss and resolve such
issues as the following:
  - How to develop indicators of group behavior that
    are publicly available on the Internet;
  - The feasibility of commercial services providing
    data such as those on use of their forums and chat rooms;
  - Appropriate sampling and estimation
    procedures;
  - Appropriate publishing and archiving
    procedures;
  - Standards for data collection and exchange;
    and
  - How to establish relationships with possible
    providers of information (e.g., search engines or newsgroup
    archives).
Such an endeavor would also need to address ethical and privacy
issues associated with data collection, archiving, and reporting as
well as the proprietary interests of commercial Internet
services.
Notes
1. Many of them were developed for other users, but some have
been able to provide useful input to social science studies of
information technology. Multipurpose data sets are often collected
by organizations dedicated to this task such as the Institute for
Social Research (ISR) at the University of Michigan (see
‹http://www.isr.umich.edu/›), the National Opinion
Research Center at the University of Chicago (see
‹http://www.norc.uchicago.edu›), and several other
research organizations. Many such multipurpose data sets are
maintained by organizations like the Inter-university Consortium
for Political and Social Research (ICPSR, see
‹http://www.icpsr.umich.edu/›), which is supported by
member-university subscriptions. ICPSR, located within the
Institute for Social Research at the University of Michigan, is a
membership-based, not-for-profit organization serving member
colleges and universities in the United States and abroad. Data
sets can be found online at
‹http://www.isr.umich.edu›.
2. For details see the project description and related documents
available online at
‹http://www.INDEX.berkeley.edu›.
3. Data sets and reports are available online at
‹http://stats.bls.gov/cesprog.htm›.
4. Information on the Institute for Social Research can be found
online at ‹http://www.icpsr.umich.edu›.
5. Work to prepare the new edition (Susan Carter, Scott Gartner,
Michael Haines, Alan Olmstead, Richard Sutch, and Gavin Wright,
editors, Historical Statistics of the United States from
Colonial Times to the Present, Millennial Edition, Cambridge
University Press, in preparation, scheduled for publication in
2000) has received partial support from the National Science
Foundation and the Alfred P. Sloan Foundation.
6. A workshop participant noted that university and college
promotion committees seem to give little weight to faculty
contributions to such projects, perhaps because they, too, share
the widespread misunderstanding of the value of metadata and the
scholarly research required to create a metadata set.
7. Brynjolfsson and Kemerer knew that Software Digest,
published by National Software Testing Labs (NSTL), a private firm,
had regularly reviewed all the major spreadsheet products and
conducted detailed feature evaluations. By matching these data with
price data from Dataquest, another private firm, they could
estimate the values that consumers placed on various software
features.
The difficulties came in trying to get historical data. Only
intervention by the president of Dataquest following a chance
meeting at an industry dinner finally led the researchers to
historical data on prices. Obtaining data on software features
required a different sort of intervention. To save on storage
space, NSTL had simply erased all information regarding previous
years' spreadsheet products because there was no market for that
information. Repeated queries to various managers of the firm, as
well as efforts to find the back issues in libraries, were
unsuccessful. Finally, a mid-level employee of the firm came to the
rescue. On his own initiative, he had stockpiled back issues of
Software Digest in the basement of his home, along with
thousands of other magazines. He agreed to ship several large boxes
with the relevant issues to Brynjolfsson and Kemerer, and the data
were duly re-entered into a computer database.
8. Dealing with people's inquiries for data can be time
consuming because people often need different data formats, more
detailed documentation, and follow-up explanations of what the
variables mean; make requests for related data; and have other
requirements that need attention. At a minimum, such database
support activities involve one or more conversations with people at
each participating company.
9. The NSFNET backbone was originally the core of the Internet,
and thus much of the total Internet traffic passed through it,
making useful measurement of total Internet traffic possible. When
this backbone was replaced by a new architecture, data on total
traffic became harder to acquire. Also, to satisfy the traffic
measurement deliverables in the agreement with NSF, the backbone's
architecture had been designed to allow more measurements than are
now possible with the off-the-shelf router components that were
used post-NSFNET.
10. Web advertising is one area where an effort has been made to
develop useful definitions of access and use (Novak and Hoffman,
1997).
11. For several years in the early 1990s Brian Reid, an employee
of the Digital Equipment Corporation, collected and posted on the
Internet data on Usenet groups. Each month he used a sampling plan
to report estimates of Usenet readership and message traffic for
all Usenet groups. Researchers were able to use Reid's data to
track growth in overall group membership over time, track the
relative popularity of different groups at any one point in time,
or identify groups worthy of further study (e.g., Sproull and
Faraj, 1995). Reid stopped collecting and reporting Usenet data in
June 1995.
12. Reid's study of Usenet traffic is a case in point. One
reason Reid stopped collecting Usenet data was that its relevance
declined when the World Wide Web was invented and the ratio of
quality material to junk declined markedly. Another reason was the
evolution of the way Usenet data was distributed, which made the
study's measurement techniques increasingly statistically
meaningless. However, the ultimate reason for ending the study was
not technological, but rather was related to the threat of legal
challenges over privacy issues surrounding collection, analysis,
and publication of the Usenet data (personal communication, Brian
Reid, Digital Equipment Corporation, 1998).
13. This work replicated findings by Hesse et al. (1993) derived
from analysis of electronic survey data.
14. GenBank, an annotated collection of publicly available DNA
sequences, is part of the International Nucleotide Sequence
Database Collaboration, which comprises the DNA Data Bank of Japan,
the European Molecular Biology Laboratory data library, and
GenBank. More information on GenBank may be found online at
‹http://www.ncbi.nlm.nih.gov/Web/Genbank/index.html›.