Copyright © 2009. National Academy of Sciences. All rights reserved.
Issues and Recommendations
INTRODUCTION
Data are the building blocks of empirical research, whether in the behavioral,
social, biological, or physical sciences. To understand fully and extend the
work of others, researchers often require access to the data on which that work
is based. Yet many members of the scientific community are reluctant or un-
willing to share their data even after publication of analyses of them.
Sometimes this unwillingness results from the conditions under which data
were gathered; sometimes it results from a desire to carry out further analyses
before others do; and sometimes it results from the anticipated costs, in time
or money or both.
The Committee on National Statistics believes that sharing scientific data
with colleagues reinforces the practice of open scientific inquiry. Cognizant
of the often substantial costs to the original investigator for sharing data, the
committee seeks to foster attitudes and practices within the scientific community
that encourage researchers to share data with others as much as feasible.
Some examples illustrate the benefits, problems, controversies, and other
consequences of sharing research data.
Committee on National Statistics
Reanalysis of shared data may lead to a conflicting conclusion. Because an
original investigator published his raw data on measurements of human cranial
capacity by race and described his procedures and methods of summarization,
reanalysis of the data was possible. A reanalysis more than 120 years
later overturned the original investigator's conclusions (Gould, 1978).
Confidentiality may be breached by legally imposed sharing of data. Despite
promises of confidentiality to respondents, researchers may be in jeopardy of
arrest if police or the courts request or demand data. A study headed by
James Carroll at Syracuse University on the confidentiality of social science
research sources and data identified many such cases (Carroll and Knerr,
1975); one was the Office of Economic Opportunity's New Jersey negative
income tax experiment, in which a local prosecutor issued 14 subpoenas requesting
the names of welfare families receiving excess payments (Kershaw
and Fair, 1976).
When data are not shared, an investigator's results may have a greater in-
fluence on public policy than if the data are analyzed by others. An economist
prepared a paper on the deterrent effect of capital punishment, in which he
concluded that one execution prevents eight murders. A draft version of this
paper was used by the Solicitor General of the United States as an appendix to
the government's pro-capital punishment brief in a case before the Supreme
Court. Detailed data were not available for reanalysis. Other researchers
have now assembled what are believed to be virtually identical data sets, and
many analysts believe the data do not support the deterrence hypothesis.
Marketing of biomedical research militates against data sharing. Several
university researchers have refused to share with colleagues the exact details
of how they did experiments that were reported in papers submitted for publication
because such details might compromise the profit-making potential of
their work.
Sharing proprietary data may be forbidden by the originator of the data. A
distinguished professor of business is carrying out research based on data
from a firm that not only does not want others to see the data, but is not even
willing to be identified. The professor considers the research useful, but is
disturbed because the conditions under which he obtained the data preclude
the possibility of anyone verifying his statistical analyses.
These and other situations fuel an ongoing debate in the research community
on what are appropriate principles and practices of data sharing.
Sharing Research Data
Issues in Data Sharing
The Committee on National Statistics convened a conference on sharing social
science research data in October 1979, chaired by Clifford Hildreth (see
Committee on National Statistics, 1980; see the appendix for a list of partici-
pants). The participants were in substantial agreement regarding the exigen-
cies faced by social science researchers and how these often conflicted with
goals of greater access to data. The issues they considered included whether
there is ever justification for refusing or unduly postponing access to data; the
impact on data access of data collectors' responsibility for maintaining the pri-
vacy of respondents and the confidentiality of records; the professional re-
sponsibility of researchers to promote access; and procedures under which ba-
sic data should be released to others.
The conference participants presented the Committee on National Statistics
with the following conclusions:
1. Guidelines on data sharing need to be developed. Desirable practices
may vary with the source of the data and whether the research is publicly or
privately funded.
2. A variety of institutions could be helpful in promulgating guidelines for
desirable practices. The institutions include professional associations and
their journals, consortia for data archiving, and foundations and other organizations
that fund research.
3. Government policy on access to data is important. Much social science
research relies heavily on data provided by the government directly or in-
directly through grants and contracts for research.
4. Many problems of access to data in the natural sciences are similar to
those in the social sciences.
5. Standards for classifying, documenting, and archiving data would great-
ly facilitate access to data.
In response to the conclusions of the conference, this report suggests guidelines
for appropriate sharing of data and how government agencies and other
institutions can encourage and foster such sharing of data.
Scope of the Report
The exploratory conference focused on the sharing of social science research
data. Most people believe that natural scientists have fewer problems in sharing
data than do social scientists. The need for shared data may be less acute
for natural science experiments, which usually are replicable, a situation that
occurs more rarely in the social sciences. Nonetheless, data-sharing problems
have existed in the natural sciences that are really not much different
from those in the social sciences, such as instances in which only some observations
are reported rather than all.
Selective reporting of experimental results in the physical sciences is not
uncommon. For example, Millikan's 1910 Science paper on the oil drop experiment
(see Holton, 1978) gave results based on 27 observations, although
40 observations were available; the most extreme 13 values were dropped.
Similarly, in a 1919 report to the Astronomical and Royal Societies on expeditions
to test predictions of Einstein's general theory of relativity, Eddington
chose not to mention the results of one complete set of measurements that produced
a value for the deflection of starlight consistent with the Newtonian,
rather than the Einstein, prediction (see Earman and Glymour, 1980).
Some data-sharing problems in the biomedical sciences are also similar to
those in the social sciences: for example, problems associated with large-scale,
controlled clinical trials closely resemble those associated with large-scale
social surveys. For these reasons, and because of the interests of the
Committee on National Statistics in areas such as clinical trials, public health,
and environmental monitoring, this report looks beyond the social sciences
and addresses the issues of data sharing more broadly. The emphasis of the
report remains on problems and practices in the social and behavioral
sciences, but occasional links and parallels to the natural and biomedical
sciences are identified and pursued.
This report specifically does not address two kinds of research. The first is
research with nonquantitative data. Researchers often depend on materials
other than quantitative information, such as anthropological field notes, oral
histories, photographs, or videotape records. Problems of access to research
archives in university libraries have occurred (see, for example, Halberstadt,
1982). Although such materials are research data, the principles and practices
recommended in this report are not intended to cover them, primarily because
their consideration was beyond the resources of the committee. It does
not mean, however, that access to such research materials is not important or
that this report may not help in clarifying relevant issues.
The second kind of research not specifically addressed is research pertaining
to national security matters. Recently the National Security Agency has
requested that some scientists who are not employed by the government submit
their papers on the mathematical theory of codes to the agency for review
prior to publication. The purpose of such reviews is to prevent the publication
of information damaging to national security. One government spokesman
has proposed that reviews be extended to fields such as computer hardware
and software and crop projections (Hilts, 1982a, 1982b). Although prior
review militates against free and open research, the Committee believed
that to recommend guidelines for such review was beyond its scope. This report,
however, notes the existence of such pressure affecting the environment
in which data sharing occurs.
The sharing of research data occurs in many ways. Sometimes data are published
as appendices to papers and books. Sometimes data are made available
in response to requests from other investigators. More formal methods
for exchange often involve archives and data libraries, which may be particu-
larly appropriate for the massive data files from surveys and experiments.
Careful documentation is important to facilitate data sharing. Poor documen-
tation or its absence inhibits replication and thereby allows some researchers
to make bolder claims than they otherwise might. This report pays special at-
tention to the needs for and costs of good documentation, but the formal tech-
nical aspects of data archives and the documentation required to make data of
use to others are not covered.
The principles and guidelines for data sharing in this report are addressed
not only to researchers in academia and government but also to institutions
that provide funds for research. Over the past 20 years, government agencies
and private and public foundations have underwritten social science research
to collect and analyze substantial bodies of data. Social science data col-
lected by the government in particular have been analyzed extensively by
many researchers. This report, however, does not treat the special case of
transfer of large data sets (usually general-purpose statistics or data from
administrative records) among different agencies of the federal government, although
many of the findings and suggestions in the report may be applicable.
Such transfers were not included in the scope of this study because they are
governed by specific statutes and regulations.
This report summarizes some of the benefits and costs of sharing research
data with qualitative statements based on judgment that is bolstered by anec-
dotal evidence. Although quantitative estimates of benefits and costs are
highly desirable, the committee unfortunately did not have the time or re-
sources for assembling such estimates. Quantitative estimates of the benefits
of data sharing are related to an assessment of the benefits of data generally,
an issue that the committee has been and will continue exploring (National
Research Council, 1976; Committee on National Statistics, 1980).
Parties to Scientific Research
Many different parties are involved in or affected by scientific research, from
the initial investigator to the public. These parties have different, sometimes
conflicting interests.
Initial investigators: scientists who first collect data for analysis. These
scientists may work alone or in teams and in academic, commercial, nonprofit,
or government settings. They have an interest in being the first to examine
and analyze their data and to publish results of their research.
Subsequent analysts: scientists who analyze one or more data sets collected
by others, for purposes of verification of the original analysis as well as
for analysis of new problems.¹ These scientists have an interest in obtaining
data of others for analysis.
Scientific community: all scientists who engage in research. Their interest
in the advancement of science through new knowledge is promoted by the
sharing of data.
Agencies and foundations that fund research: public and private groups
that give grants or contracts for research to be performed by others. Their
interest is in advancing science rather than in commercial gain.
Organizations that conduct research: universities, nonprofit institutions,
commercial organizations (such as biopharmaceutical concerns), individuals,
and government agencies that conduct research, whether they use their own
funds or are supported by others. Their interests in sharing data can be those
of initial investigators, subsequent analysts, the scientific community, or any
combination of them.
Respondents to surveys and participants in experiments: those who agree
to participate in a survey or experiment, whether voluntarily or whether they
receive remuneration or other direct benefit. Respondents have an interest in
the protection of the confidentiality of information they have given, in limiting
the invasion of their privacy, in reducing their time and effort required to
participate in surveys and experiments, as well as in the advancement of
science resulting from such investment of time and effort.
The public: society generally. The public interest is served by open, free,
productive, and efficient science.
The different parties involved in or affected by scientific research have different
and sometimes conflicting interests when it comes to issues of data sharing.
The report and the papers in this volume address the interests of these
groups, and many of the committee's recommendations reflect a balancing of
conflicting interests.
Occasionally in the report and frequently in the papers, cases are mentioned
in which data were shared or in which unsuccessful attempts were made to obtain
data from principal investigators. These cases are included to illustrate
various aspects of data sharing: the benefits, the costs, the barriers. The
cases are not included to assess blame on particular principal investigators or
other parties. Sometimes an incomplete account is given; sometimes the
¹ By this definition, subsequent analysts include secondary analysts. A definition of secondary
analysis is provided by Hyman (1972:1): "extraction of knowledge on topics other than those
which were the focus of the original surveys."