Copyright © 2009. National Academy of Sciences. All rights reserved.
Appendix A
Privacy for Research Data
Robert Gellman
INTRODUCTION
Scope and Purpose
The purpose of this paper is to describe privacy rules in the three most
important areas relevant to research uses of information involving remotely
sensed and self-identifying data. The three issues are (1) When is informa-
tion sufficiently identifiable so that privacy rules apply or privacy concerns
attach? (2) When does the collection of personal information fall under
regulation? and (3) What rules govern the disclosure of personal informa-
tion? In addition, a short discussion of liability for improper use or disclo-
sure is included. The goal is to provide sufficient information to illustrate
where lines—albeit vague, inconsistent, and incomplete—have been drawn.
Spatial information can have a variety of relationships with personal
data. A home address is spatial information that is likely to be personally
identifiable and will typically be included within the scope of statutory
privacy protections along with name, number, and other personal data.
Even in the absence of a statute, spatial data that are identifiable raise overt
privacy issues. In other contexts, spatial information linked with otherwise
nonidentifiable personal data (e.g., from an anonymous survey) may pro-
duce data that are personally identifiable or that may be potentially person-
ally identifiable. Spatial information is not unique in being either identifi-
able or linkable. However, the manner in which spatial information can
become linked with identifiable data or may create identifiable data differs
in practice from that for other types of data in both overt and subtle ways.
In general, data about individuals are growing more identifiable as
more information is collected, maintained, and available for public and
private uses. Technological developments also contribute to the increasing
identifiability of data that do not have overt identifiers. Spatial information
has both of these characteristics: more data and better technology. Linking
spatial information to research data can affect promises of confidentiality
that were made at the time of data collection and in ways that were not
foreseeable at that time. These are some of the challenges presented by the
use of spatial information.
Two preliminary observations about the complexity of privacy regula-
tion are in order. First, privacy regulation can be highly variable and unpre-
dictable in application. In the United States, privacy standards established
by statute may differ depending on the extent to which the information is
identifiable, the type of information, the identity of the record keeper, the
identity of the user, the purpose for which the information was collected or
is being used, the type of technology employed, and other elements. For
some information activities, such as surveillance, additional factors may be
relevant, including the manner in which information is stored or transmit-
ted, the location being surveilled, the place from which the surveillance is
done, and the nationality of the target. This list of factors is not exhaustive.
Second, American privacy regulation is often nonexistent. Privacy stat-
utes are often responsive to widely reported horror stories, and there are
huge gaps in statutory protections for privacy. For many types of personal
information, many categories of record keepers, and many types of infor-
mation collection and disclosure activities, no privacy rules apply. Further-
more, where regulation exists, information can sometimes be transferred
from a regulated to a nonregulated environment. A person in possession of
information regulated for privacy may be able to disclose the information
to a third party who is beyond the regulatory scheme. Common law stan-
dards may apply at times, but they rarely provide clear guidance.
The paper begins by discussing terminology, particularly distinctions
between privacy and confidentiality, and considers privacy as it is addressed
in legislation, administrative process, professional standards, and litigation
in the United States. Major legal and policy issues considered are identifi-
ability of personal data, data collection limitations, disclosure rules, and
liability for misuse of data.
A Note on Terminology
Privacy and confidentiality are troublesome terms because neither has a
universally recognized definition. While broad definitions can be found,
none is enlightening because definitions are at too high a level of abstrac-
tion and never offer operational guidance applicable in all contexts. Never-
theless, because the terms are impossible to avoid, some clarification is
appropriate.
Privacy is generally an attribute of individuals. While some foreign
laws grant privacy rights to legal persons (e.g., corporations) as well as
individuals, American usage usually ties privacy interests to individuals.
That usage will be followed in this paper.
The scope of privacy interests that should be recognized is much de-
bated. Philosophers, sociologists, economists, physicians, lawyers, and oth-
ers have different views on the goals and meaning of privacy protection. For
present purposes, however, the focus is primarily on the privacy of personal
information. Europeans and others refer to this aspect of privacy as data
protection.
The most universally recognized statement of information privacy
policy comes from a 1980 document about Fair Information Practices (FIPs)
from the Organisation for Economic Co-Operation and Development.1
While this statement is not free from controversy, FIPs provide a highly
useful framework for discussing and evaluating information privacy mat-
ters. FIPs are useful because the principles define the elements of privacy in
some detail, and the details are crucial. The implementation of FIPs in any
context will vary because the principles are broad and not self-executing.
Applying FIPs is as much art as science.
Confidentiality is an attribute that can apply to individuals and to legal
persons. Both personal information and business information may be con-
fidential. However, the precise meaning of that designation is often unclear.
Statutes often designate information with the single-word descriptor of
confidential. However, these laws routinely fail to define the scope of con-
fidentiality, the obligations of record keepers, or the rights of record sub-
jects or third parties. Those who maintain statutorily designated confiden-
tial records may have to decide on their own if they can disclose information
to contractors, to police, to researchers, when required by other statutes, in
response to a subpoena, when requested by the data subject, or otherwise.
Standards for data collection, security, data quality, accountability, or ac-
cess and correction rights are typically wholly unaddressed.
Statutes that protect business information from disclosure suffer from
the same lack of specificity. The federal Freedom of Information Act (FOIA)
allows agencies to withhold “trade secrets and commercial or financial
information obtained from a person and privileged or confidential.”2 Each
of the terms in this phrase has been the subject of litigation, and different
courts have reached significantly different interpretations of what consti-
tutes confidential business information.
Categories of data held by government agencies sometimes have a des-
ignation suggesting or imposing a degree of secrecy. Under the Executive
Order on Security Classification,3 confidential is one of three terms with a
defined scope and process for designation of information that requires
protection in the interests of national defense and foreign policy. The other
terms are secret and top secret. However, many other terms used by federal
agencies (e.g., “for official use only” or “sensitive but unclassified”) to
categorize information as having some degree of confidentiality have no
defined standards.
The term confidential is much harder to encircle with a definition,
whether in whole or in part. It retains a useful meaning as broadly descrip-
tive of information of any type that may not be appropriate for unrestricted
public disclosure. Unadorned, however, a confidential designation cannot
be taken as a useful descriptor of rights and responsibilities. It offers a
sentiment and not a standard.
The terms privacy and confidentiality will not, by themselves, inform
anyone of the proper way to process information or balance the interests of
the parties to information collection, maintenance, use, or disclosure. In
any context, the propriety and legality of any type of information process-
ing must be judged by legal standards when applicable or by other stan-
dards, be they ethical, social, or local.
Local standards may arise from promises made by those who collect
and use personal data. Standards may be found, for example, in website
privacy policies or in promises made by researchers as part of the informed
consent process. In nearly all cases, broad promises of confidentiality may
create expectations that record keepers may not be able to fulfill. The laws
that may allow or require disclosure of records to third parties—and par-
ticularly the federal government—create a reality that cannot be hidden
behind a general promise of confidentiality. Other aspects of privacy (i.e.,
FIPs) may also require careful delineation. The vagueness of commonly
used terminology increases the need for clarity and specificity.
IDENTIFIABILITY AND PRIVACY
Information privacy laws protect personal privacy interests by regulat-
ing the collection, maintenance, use, and disclosure of personal informa-
tion. The protection of identifiable individuals is a principal goal of these
laws.4 Usually, it is apparent when information relates to an identifiable
individual because it includes a name, address, identification number, or
other overt identifier associated with a specific individual. Personal infor-
mation that cannot be linked to a specific individual typically falls outside
the scope of privacy regulation. However, the line between the regulated
and the unregulated is not always clear.
Removing overt identifiers does not ensure that the remaining informa-
tion is no longer identifiable. Data not expressly associated with a specific
individual may nevertheless be linked to that individual under some condi-
tions. It may not always be easy to predict in advance when deidentified5
data can be linked. Factors that affect the identifiability of information
about individuals include unique or unusual data elements; the number of
available nonunique data elements about the data subject; specific knowl-
edge about the data subject already in the possession of an observer; the size
of the population that includes the data subject; the amount of time and
effort that an observer is willing to devote to the identification effort; and
the volume of identifiable information about the population that includes
the subject of the data.
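Two of these factors, the number of available nonunique data elements and
the size of the population, can be illustrated with a minimal Python sketch.
It counts how many records are the only one of their kind on a chosen
combination of fields. The function name, the field names, and the records
are all hypothetical and made up for illustration; real assessments of
identifiability involve the other factors listed above as well.

```python
from collections import Counter


def uniqueness_report(records, fields):
    """Count records that are unique on the given combination of fields."""
    # Build one key per record from the selected (nonunique) data elements.
    keys = [tuple(record[f] for f in fields) for record in records]
    group_sizes = Counter(keys)
    unique = sum(1 for key in keys if group_sizes[key] == 1)
    return {
        "records": len(records),
        "unique": unique,
        "smallest_group": min(group_sizes.values()),
    }


# Hypothetical records with no overt identifiers (illustrative values only).
records = [
    {"zip": "60601", "birth_year": 1950, "sex": "F"},
    {"zip": "60601", "birth_year": 1950, "sex": "F"},
    {"zip": "60601", "birth_year": 1950, "sex": "M"},
    {"zip": "60602", "birth_year": 1971, "sex": "F"},
    {"zip": "60602", "birth_year": 1971, "sex": "F"},
    {"zip": "60602", "birth_year": 1983, "sex": "M"},
]

# One data element alone leaves every record in a group of three.
coarse = uniqueness_report(records, ["zip"])
# Three data elements together single out two of the six records.
fine = uniqueness_report(records, ["zip", "birth_year", "sex"])
```

In this toy population no record is unique on zip code alone, but adding
birth year and sex isolates two records, which an observer with outside
knowledge might then be able to associate with specific individuals.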
In recent decades, the volume of generally available information about
individuals has expanded greatly. Partly because of an absence of general
privacy laws, the United States is the world leader in the commercial collec-
tion, compilation, and exploitation of personal data. American marketers
and data brokers routinely combine identifiable public records (e.g., voter
registers, occupational licenses, property ownership and tax records, court
records), identifiable commercial data (e.g., transaction information), and
nonidentifiable data (e.g., census data). They use the data to create for
nearly every individual and household a profile that includes name, ad-
dress, telephone number, educational level, homeownership, mail buying
propensity, credit card usage, income level, marital status, age, children,
and lifestyle indicators that show whether an individual is a gardener,
reader, golfer, etc.6 Records used for credit purposes are regulated by the
Fair Credit Reporting Act,7 but other consumer data compilations are
mostly unregulated for privacy. As the amount of available personal data
increases, it becomes less likely that nonidentifiable data will remain
nonidentifiable. Latanya Sweeney, a noted expert on identifiability, has
said: “I can never guarantee that any release of data is anonymous, even
though for a particular user it may very well be anonymous.”8
For the statistician or researcher, identifiability of personal data is
rarely a black and white concept. Whether a set of data is identifiable can
depend on the characteristics of the set itself, on factors wholly external to
the set, or on the identity of the observer. Data that cannot be identified by
one person may be identifiable by another, perhaps because of different
skills or because of access to different information sources. Furthermore,
identifiability is not a static characteristic. Data not identifiable today may
be identifiable tomorrow because of developments remote from the original
source of the data or the current holder of the data. As the availability of
geospatial and other information increases, the ability to link wholly
nonidentifiable data or deidentified data with specific individuals will also
increase.
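The kind of linking described here can be sketched in a few lines of Python,
under simplifying assumptions. The sketch joins a deidentified survey to an
identifiable register on shared fields and reports only the rows that match
exactly one named person. The function name, the field names (a location
code and a birth year), and all of the data are hypothetical; actual linkage
efforts typically rely on inexact matching and many more sources.

```python
def link(deidentified, identified, keys):
    """Return (row, name) pairs where a row matches exactly one named person."""
    # Index the identifiable source by the shared fields.
    index = {}
    for person in identified:
        index.setdefault(tuple(person[k] for k in keys), []).append(person)

    matches = []
    for row in deidentified:
        candidates = index.get(tuple(row[k] for k in keys), [])
        # Only an unambiguous match counts as a reidentification here.
        if len(candidates) == 1:
            matches.append((row, candidates[0]["name"]))
    return matches


# Hypothetical survey responses with no overt identifiers.
survey = [
    {"block": "B12", "birth_year": 1950, "answer": "yes"},
    {"block": "B12", "birth_year": 1971, "answer": "no"},
    {"block": "B40", "birth_year": 1983, "answer": "yes"},
]

# Hypothetical identifiable register (for example, a public record).
register = [
    {"name": "Resident A", "block": "B12", "birth_year": 1950},
    {"name": "Resident B", "block": "B12", "birth_year": 1971},
    {"name": "Resident C", "block": "B12", "birth_year": 1971},
    {"name": "Resident D", "block": "B77", "birth_year": 1983},
]

reidentified = link(survey, register, ["block", "birth_year"])
```

Here only the first survey response is linked to a name. The second matches
two residents and the third matches none, which reflects the point that the
same dataset may be identifiable for some records and observers and not for
others, and that the result changes as the identifiable source grows.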
From a legislative perspective, however, identifiability is more likely to
be a black and white concept. Privacy legislation tends to provide express
regulation for identifiable data and nonregulation for nonidentifiable data,
without any recognition of a middle ground. However, statutes do not yet
generally reflect a sophisticated understanding of the issues. Until recently,
policy makers outside the statistical community paid relatively little atten-
tion to the possibility of reidentification. Nevertheless, a selective review of
laws and rules illustrates the range of policy choices to date.
U.S. Legislative Standards
The Privacy Act of 1974,9 a U.S. law applicable mostly to federal
agencies, defines record to mean a grouping of information about an indi-
vidual that contains “his name, or the identifying number, symbol, or other
identifying particular assigned to the individual, such as a finger or voice
print or a photograph.”10 An identifier is an essential part of a record. The
ability to infer identity or to reidentify a record is not sufficient or relevant.
A location may or may not be an identifier under the Privacy Act. A
home address associated with a name is unquestionably an identifier. A
home address without any other data element could be an identifier if only
one individual lives at the address, but it might not be if more than one
individual lives there. As data elements are added to the address, the con-
text may affect whether the information is an identifier and whether the act
applies. If the information associated with the address is about the property
(“2,000 square feet”), then the information is probably not identifying
information about an individual. If the information is about the resident
(“leaves for work every day at 8:00 a.m.”), it is more likely to be found to
be identifying information. Part of the uncertainty here is that there is a
split in the courts about how to interpret the act’s concept of what is
personal information. The difference does not relate specifically to location
information, and the details are not enlightening.
However, the question of when a location qualifies as an identifier is an
issue that could arise outside the narrow and somewhat loosely drafted
Privacy Act of 1974.11 If a location is unassociated with an individual, then
it is less likely to raise a privacy issue. However, it may be possible to
associate location information with an individual, so that the addition of
location data to other nonidentifiable data elements may make it easier to
identify a specific individual.
Other federal laws are generally unenlightening on identifiability ques-
tions. Neither the Driver’s Privacy Protection Act12 nor the Video Privacy
Protection Act13 addresses identifiability in any useful way. The Cable
Communications Policy Act excludes from its definition of personally iden-
tifiable information “any record of aggregate data which does not identify
particular persons.”14 This exclusion, which probably addressed a political
issue rather than a statistical one, raises as many questions as it answers.
Congress took a more sophisticated approach to identifiability in the Con-
fidential Information Protection and Statistical Efficiency Act of 2002
(CIPSEA).15 The law defines identifiable form to mean “any representation of
information that permits the identity of the respondent to whom the informa-
tion applies to be reasonably inferred by either direct or indirect means.” This
language is probably the result of the involvement of the statistical community
in the development of the legislation. The standard is a reasonableness stan-
dard, and some international examples of reasonableness standards will be
described shortly. CIPSEA’s definition recognizes the possibility of using indi-
rect inferences to permit identification, but it does not indicate the scope of
effort that is necessary to render deidentified data identifiable. That may be
subsumed within the overall concept of reasonableness.
No Standard
National privacy laws elsewhere do not always include guidance about
identifiability. Canada’s Personal Information Protection and Electronic
Documents Act (PIPEDA) defines personal information as “information
about an identifiable individual.”16 The act includes no standard for deter-
mining identifiability or anonymity, and it does not address the issue of
reidentification. A treatise on the act suggests that “caution should be
exercised in determining what is truly ‘anonymous’ information since the
availability of external information in automated format may facilitate the
reidentification of information that has been made anonymous.”17
Strict Standard
The 1978 French data protection law defines information as “nomina-
tive” if in any way it directly or indirectly permits the identification of a
natural person.18 According to an independent analysis, “the French law
makes no distinction between information that can easily be linked to an
individual and information that can only be linked with extraordinary means
or with the cooperation of third parties.”19 The French approach does not
appear to recognize any intermediate possibility between identifiable and
anonymous. Unless personal data in France are wholly nonidentifiable, they
appear to remain fully subject to privacy rules. This approach may provide
greater clarity, but the results could be harsh in practice if data only theoreti-
cally identifiable fall under the regulatory scheme for personal data. How-
ever, the French data protection law includes several provisions that appear
to ameliorate the potentially harsh results.20
Reasonableness Standards
The definition of personal data in the European Union (EU) Data Pro-
tection Directive refers to an identifiable natural person as “an individual
person . . . who can be identified, directly or indirectly.”21 On the surface,
the EU definition appears to be similar to the strict standard in French law.
However, the directive’s introductory Recital 26 suggests a softer intent
when it states that privacy rules will not apply to “data rendered anony-
mous in such a way that the data subject is no longer identifiable.” It also
provides that “to determine whether a person is identifiable, account should
be taken of all the means likely reasonably to be used either by the control-
ler or by any other person to identify the said person.”22 Thus, the directive
offers a reasonableness standard for determining whether data have been
adequately deidentified.
Variations on a reasonableness standard can be found elsewhere. The
Council of Europe’s recommendations on medical data privacy provide
that an individual is not identifiable “if identification requires an unreason-
able amount of time and manpower.”23 An accompanying explanatory
memorandum says that costs are no longer a reliable criterion for determin-
ing identifiability because of developments in computer technology.24 How-
ever, it is unclear why “time and manpower” are not just a proxy for costs.
The Australian Privacy Act defines personal information to mean “in-
formation . . . about an individual whose identity is apparent, or can
reasonably be ascertained, from the information.”25 It appears on the sur-
face that a decision about identifiability is limited to determinations from
the information itself and not from other sources. This language highlights
the general question of just what activities and persons are included within
the scope of a reasonableness determination inquiry. Under the EU direc-
tive, it is clear that identification action taken by any person is relevant. The
Council of Europe uses a time and manpower measure, but without defin-
ing who might make the identification effort. The Australian law appears to
limit the question to inferences from the information itself. The extent to
which these standards differ significantly in application or intent is
not clear.
The British Data Protection Act’s definition of personal data covers
data about an individual who can be identified thereby or through “other
information which is in the possession of, or is likely to come into the
possession of, the data controller.”26 The British standard does not ex-
pressly rely on reasonableness or on the effort required to reidentify data. It
bases an identifiability determination more narrowly by focusing on infor-
mation that a data controller has or is likely to acquire. This appears to be
only a step removed from an express reasonableness test.
The Canadian Institutes of Health Research (CIHR) proposed a clarifi-
cation of the definition of personal information from PIPEDA that may
offer the most specific example of a reasonableness standard.27 The CIHR
language refers to “a reasonably foreseeable method” of identification or
linking of data with a specific individual. It also refers to anonymized
information “permanently stripped” of all identifiers such that the information
has “no reasonable potential for any organization to make an identification.”
In addition, the CIHR proposal provides that reasonable foreseeability
shall “be assessed with regard to the circumstances prevailing at the
time of the proposed collection, use or disclosure.”
Administrative Process
The Alberta Health Information Act takes a different approach. It
defines individually identifying to mean when a data subject “can be readily
ascertained from the information,”28 and it defines nonidentifying to mean
that the identity of the data subject “cannot be readily ascertained from the
information.”29 This appears to limit the identifiability inquiry to the infor-
mation itself.
Alberta’s innovation comes in its regulation of data matching,30 which
is the creation of individually identifying health information by combining
individually identifying or nonidentifying health information or other in-
formation from two or more electronic databases without the consent of
the data subjects. The data matching requirements, which attach to anyone
attempting to reidentify nonidentifying health information, include submis-
sion of a privacy impact assessment to the commissioner for review and
comment.31
The Alberta law is different because it expressly addresses
reidentification activities by anyone (at least, anyone using any electronic
databases). In place of a fixed standard for determining whether identifi-
able information is at stake, the act substitutes an administrative process.32
The law regulates conduct more than information, thereby evading the
definitional problem for information that is neither clearly identifiable nor
wholly nonidentifiable.
Data Elements and Professional Judgment Standards
In the United States, general federal health privacy standards derive
from a rule33 issued by the Department of Health and Human Services
under the authority of the Health Insurance Portability and Accountability
Act34 (HIPAA). The rule defines individually identifiable health informa-
tion to include health information for which there is a reasonable basis to
believe that the information can be used to identify an individual.35 This is
an example of a reasonableness standard that by itself provides little inter-
pretative guidance. HIPAA’s approach to identifiability does not end with
this definition, however. HIPAA offers what may be the most sophisticated
approach to identifiability found in any privacy law.
The rule offers two independent methods to turn identifiable (regu-
lated) data into deidentified (unregulated) data. The first method requires
removal of 18 specific categories of data elements.36 With these elements
removed, any risk of reidentification is deemed too small to be a concern.
The HIPAA rule no longer applies to the stripped data, which can then be
used and disclosed free of HIPAA obligations. The only condition is that
the covered entity does not have actual knowledge that the information
could be used, either on its own or in combination with other data, to
identify an individual.37 The advantage of this so-called safe harbor method
is that mechanical application of the rule produces data that can nearly
always be treated as wholly nonidentifiable. Some critics claim that the
resulting data are useless for many purposes.
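The mechanical character of the safe harbor method can be suggested with a
minimal Python sketch that drops a fixed set of fields from a record. The
field names and the sample record are hypothetical, and the short set shown
is an illustrative stand-in, not the rule's list of 18 categories. The
sketch also does not model the actual knowledge condition described above.

```python
# Hypothetical field names for illustration only; NOT the regulatory list.
FIELDS_TO_REMOVE = frozenset(
    {"name", "street_address", "phone", "email", "record_number"}
)


def strip_fields(record, fields_to_remove=FIELDS_TO_REMOVE):
    """Return a copy of the record without the listed fields."""
    return {k: v for k, v in record.items() if k not in fields_to_remove}


# Hypothetical record (made-up values).
patient = {
    "name": "Resident A",
    "street_address": "1 Example Street",
    "phone": "555-0100",
    "record_number": "R-0001",
    "state": "IL",
    "diagnosis_year": 2005,
    "diagnosis": "asthma",
}

# The original record is left unchanged; only the copy is stripped.
stripped = strip_fields(patient)
```

Because the operation is applied field by field, it requires no judgment
about what remains, which is the source of both the predictability of the
method and the criticism that the resulting data lose much of their value.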
The second way to create deidentified (unregulated) health data re-
quires a determination by “a person with appropriate knowledge of and
experience with generally accepted statistical and scientific principles and
methods for rendering information not individually identifiable.”38 The
required determination must be that “the risk is very small that the infor-
mation could be used, alone or in combination with other reasonably avail-
able information, by an anticipated recipient to identify an individual who
is a subject of the information.”39 The person making the determination
must document the methods used and the results of the analysis on which
the determination is based.40
HIPAA includes another procedure for disclosure of a limited dataset
that does not include overt identifiers but that has more data elements than
the safe harbor method. In order to receive a limited dataset, the recipient
must agree to a data use agreement that establishes how the data may be
used and disclosed, requires appropriate safeguards, and sets other terms
for processing.41 Disclosures under the limited dataset procedure can be
made only for activities related to research, public health, and health care
operations. A recipient under this procedure is not by virtue of the receipt
subject to HIPAA or accountable to the secretary of health and human
services, but the agreement might be enforced by the covered entity that
disclosed the data or, perhaps, by a data subject.
Litigation
Identifiability issues have arisen in a few court cases.
• One U.S. case involved a commercial dispute between two large
health data processing companies. WebMD purchased a company (Envoy)
from Quintiles in 2000. As part of the acquisition, WebMD agreed to
supply Quintiles with nonidentifiable patient claims data processed by En-
voy. Quintiles processes large volumes of data to assess the usage of pre-
scription drugs. Quintiles sells the resulting information in nonidentifiable
form primarily to pharmaceutical manufacturers. The litigation arose be-
cause of concerns by WebMD that the combination of its data with identi-
fiable data otherwise in the possession of Quintiles would allow
reidentification.42 The resolution of this dispute did not involve a ruling on
the identifiability issues raised, but it may be a precursor to other similar
battles.
• A United Kingdom case43 involving identifiability began with a
policy document issued by the British Department of Health. The document
expressly stated that stripping of identifiers from patient information be-
fore disclosure to private data companies seeking information on the habits
of physicians is not sufficient to avoid a breach of the physician’s duty of
confidentiality. Even the disclosure of aggregated data would be a violation
of confidentiality. A company that obtains prescription data identifiable to
physicians and not patients sued to overturn the policy. The lower court
found that disclosure of patient information was a breach of confidence
notwithstanding the anonymization. However, an appellate court found
the reverse and overturned the department policy. Both courts proceeded
on the theory that either personal data were identifiable, or they were not.
Neither opinion recognized or discussed any middle ground.
• An Illinois case arose under the state Freedom of Information Act
when a newspaper requested information from the Illinois Cancer Registry
by type of cancer, zip code, and date of diagnosis.44 The registry denied the
request because another statute prohibits the public disclosure of any group
of facts that tends to lead to the identity of any person in the registry. The
court reversed and ordered the data disclosed. Although an expert witness
was able to identify most of the records involved, the court was not con-
vinced. The court held that the “evidence does not concretely and conclu-
sively demonstrate that a threat exists that other individuals, even those
with skills approaching those of Dr. Sweeney, likewise would be able to
identify the subjects or what the magnitude of such a threat would be, if it
existed.” The Illinois Supreme Court upheld the decision in 2006.45
• Litigation over the constitutionality of a federal law prohibiting so-
called partial birth abortions produced a noteworthy decision on identifi-
ability.46 The specific dispute was over disclosure during discovery of pa-
tient records maintained by physicians testifying as expert witnesses. The
records were to be deidentified before disclosure so that a patient’s identity
could not reasonably be ascertained. The case was decided in part on
grounds that a privacy interest remains even if there is no possibility
that the patient’s identity could be determined.47 Arguments that wholly
[...]
CONCLUDING OBSERVATIONS
The law surrounding the collection, maintenance, use, and disclosure
of personal information by researchers and others is typically vague, incom-
plete, or entirely absent. The possibility of civil liability to a data subject for
collection, use, or disclosure of personal information exists, but lawsuits
are not frequent, successes are few, and cases are highly dependent on facts.
However, the research community faces other risks. For example, if an
aggressive researcher or tabloid newspaper acquires deidentified research
data and reidentifies information about politicians, celebrities, or sports
heroes, the story is likely to be front-page news everywhere. The resulting
public outcry could result in a major change in data availability or the
imposition of direct restrictions on researchers. Many privacy laws origi-
nated with horror stories that attracted press attention. When a reporter
obtained the video rental records of a U.S. Supreme Court nominee, ner-
vous members of Congress quickly passed a privacy law restricting the use
and disclosure of video rental records.127 The Driver’s Privacy Protection
Act also had its origins in a horror story.
The demise of Human Resources Development Canada’s Longitudinal
Labour Force File in the summer of 2000 offers an example of how privacy
fears and publicity can affect a research activity. The file was the largest
repository of personal information on Canadian citizens, with identifiable
information from federal departments and private sources. The database
operated with familiar controls for statistical records, including exclusive
use for research, evaluation, and policy and program analysis. The public
did not know about the database until the federal privacy commissioner
raised questions about the “invisible citizen profile.”128 The database was
staunchly defended, but the public objections were too strong, and Canada
dismantled the database. The case for the database was not helped by its
media designation as the “Big Brother Database.”129
Methods for collecting and using data while protecting privacy interests
exist, but how effective they are, how much they compromise research
results, and how much they are actually used are unclear. It appears that
there is room for improvement using existing policies, methodologies, and
practices. However, there may be some natural limits to what can be
accomplished. The availability of personal data and the technological
capabilities for reidentification seem to increase routinely over time as the
result of factors largely beyond anyone’s control.
Basic transparency rules (for both privacy and human subjects protec-
tion) require that respondents be told of the risks and consequences of
supplying data. For data collected voluntarily from respondents, it is pos-
sible that cooperation will vary inversely with the length of a privacy notice.
Even when data activities (research or otherwise) include real privacy pro-
tections, people may still see threats regardless of the legal, contractual, or
technical measures promised. Reports of security and other privacy breaches
are commonplace.
Complex privacy problems will not be solved easily because of the
many players and interests involved. Those who need data for legitimate
purposes have incentives for reducing the risks that data collection and
disclosure entail, but data users are often more focused on obtaining and
using data and less on remote possibilities of bad publicity, lawsuits, and
legislation. The risk to a data subject is a loss of privacy. The risks to data
suppliers and users include legal liability for the misuse of data and the
possibility of additional regulation. The risk to researchers, statisticians,
and their clients is the loss of data sources. The risk to society is the loss of
research that serves important social purposes. These risks should encour-
age all to work toward better rules governing the use and disclosure of
sensitive personal information. Risks can be minimized, but most cannot be
eliminated altogether.
Self-restraint and professional discipline may limit actions that threaten
the user community, but controls may not be effective against all members
of the community and they will not be effective against outsiders. Industry
standards may be one useful way to minimize risks, maximize data useful-
ness, and prevent harsher responses from elsewhere. If standards do not
come from elsewhere, however, then the courts and the legislatures may
eventually take action. Judicial and legislative actions always follow tech-
nological and other developments, and any changes imposed could be harsh
and wide-reaching, especially if the issue is raised as a result of a crisis.
Privacy legislation often begins with a well-reported horror story.
NOTES
1. Collection Limitation Principle: There should be limits to the collection of personal
data and any such data should be obtained by lawful and fair means and, where
appropriate, with the knowledge or consent of the data subject.
Data Quality Principle: Personal data should be relevant to the purposes for which
they are to be used and, to the extent necessary for those purposes, should be accu-
rate, complete, and kept up-to-date.
Purpose Specification Principle: The purposes for which personal data are collected
should be specified not later than at the time of data collection, and the subsequent
use limited to the fulfillment of those purposes or such others as are not incompatible
with those purposes, and as are specified on each occasion of change of purpose.
Use Limitation Principle: Personal data should not be disclosed, made available or
otherwise used for purposes other than those specified in accordance with the Pur-
pose Specification Principle except (a) with the consent of the data subject, or (b) by
the authority of law.
Security Safeguards Principle: Personal data should be protected by reasonable secu-
rity safeguards against such risks as loss or unauthorized access, destruction, use,
modification or disclosure of data.
Openness Principle: There should be a general policy of openness about developments,
practices and policies with respect to personal data. Means should be readily available
of establishing the existence and nature of personal data, and the main purposes of
their use, as well as the identity and usual residence of the data controller.
Individual Participation Principle: An individual should have the right (a) to obtain
from a data controller, or otherwise, confirmation of whether or not the data con-
troller has data relating to him; (b) to have communicated to him data relating to
him within a reasonable time; at a charge, if any, that is not excessive; in a reason-
able manner; and in a form that is readily intelligible to him; (c) to be given reasons if
a request made under subparagraphs (a) and (b) is denied, and to be able to challenge
such denial; and (d) to challenge data relating to him and, if the challenge is success-
ful to have the data erased, rectified, completed, or amended.
Accountability Principle: A data controller should be accountable for complying
with measures, which give effect to the principles stated above.
Organisation for Economic Co-Operation and Development (1980).
2. 5 U.S.C. § 552(b)(4).
3. Executive Order 12958.
4. Laws in other countries sometimes extend privacy protections to legal persons. Cor-
porate confidentiality interests (whether arising under privacy laws, through statisti-
cal surveys that promise protection against identification, or otherwise) can raise
similar issues of identification and reidentification as with individuals. Corporate
confidentiality interests are beyond the scope of this paper.
Another set of related issues is group privacy. Groups can be defined in many
ways, but race, ethnicity, and geography are familiar examples. If the disclosure of
microdata can be accomplished in a way that protects individual privacy interests,
the data may still support conclusions about identifiable racial, ethnic, or
neighborhood groups that may be troubling to group members. Group privacy has received
more attention in health care than in other policy arenas. See Alpert (2000).
5. The term deidentified is used here to refer to data without overt identifiers but that
may still, even if only theoretically, be reidentified. Data that cannot be reidentified
are referred to as wholly nonidentifiable data.
6. See generally Gellman (2001). For more on the growth in information collection and
availability, see Sweeney (2001).
7. 15 U.S.C. § 1681 et seq.
8. National Committee on Vital and Health Statistics, Subcommittee on Privacy and
Confidentiality (1998a).
9. 5 U.S.C. § 552a.
10. 5 U.S.C. § 552a(a)(4). The value of a fingerprint as an identifier is uncertain. With-
out access to a database of fingerprints and the ability to match fingerprints, a single
fingerprint can rarely be associated with an individual. The same is true for a photo-
graph. For example, a photograph of a four-year-old taken sometime in the last 50
years is not likely to be identifiable to anyone other than a family member.
11. Just to make matters even more complex, the federal Freedom of Information Act (5
U.S.C. § 552) has a standard for privacy that is not the same as the Privacy Act. In
Forest Guardians v. U.S. FEMA (10th Cir. 2005) available: http://www.kscourts.org/
ca10/cases/2005/06/04-2056.htm, the court denied a request for “electronic GIS files
. . . for the 27 communities that have a flood hazard designated by FEMA . . .
showing all of the geocoded flood insurance policy data (with names and addresses
removed) including the location of structures relative to the floodplain and whether
the structure insured was constructed before or after the community participated in
the NFIP.” The court found that disclosure would constitute an unwarranted inva-
sion of privacy, the privacy standard under the FOIA. The court reached this conclu-
sion even though virtually identical information had been released in a paper file.
The case turned mostly on the court’s conclusion that there was a lack of public
interest in disclosure, a relevant standard for FOIA privacy determinations. In
striking a balance, the court found that any privacy interest, no matter how small,
outweighed a nonexistent public interest in disclosure.
12. Personal information means information that identifies “an individual, including an
individual’s photograph, social security number, driver identification number, name,
address (but not the 5-digit zip code), telephone number, and medical or disability
information, but does not include information on vehicular accidents, driving viola-
tions, and driver’s status.” 18 U.S.C. § 2725(3).
13. Personally identifiable information “includes information which identifies a person
as having requested or obtained specific video materials or services from a video tape
service provider.” 18 U.S.C. § 2710 (a)(3).
14. 47 U.S.C. § 551(a)(2)(A).
15. E-Government Act of 2002, Pub. L. 107-347, Dec. 17, 2002, 116 Stat. 2899, 44
U.S.C. § 3501 note §502(4).
16. S.C. 2000, c. 5, § 2(1), available: http://www.privcom.gc.ca/legislation/02_06_01_01_
e.asp.
17. Perrin, Black, Flaherty, and Rankin (2001).
18. Loi No. 78-17 du 6 janvier 1978 at Article 4, available: http://www.bild.net/
dataprFr.htm. A 2004 amendment added these words: “In order to determine whether
a person is identifiable, all the means that the data controller or any other person
uses or may have access to should be taken into consideration.” Act of 6 August
2004 at Article 2, available: http://www.cnil.fr/fileadmin/documents/uk/78-17VA.pdf.
The amendment does not appear to have changed the strict concept of identifiability
or to have added any reasonableness standard.
19. Joel R. Reidenberg and Paul M. Schwartz, Data Protection Law and Online Services:
Regulatory Responses (1998) (European Commission), Available: http://ec.europa.eu/
justice_home/fsj/privacy/docs/studies/regul_en.pdf.
20. See Loi No. 78-17 du 6 janvier 1978 (as amended) at Article 32 (IV) (allowing the
French data protection authority to approve anonymization schemes), Article 54
(allowing the French data protection authority to approve methodologies for health
research that do not allow the direct identification of data subjects), and Article 55
(allowing exceptions to a requirement for coding personal data in some medical research
activities), available: http://www.cnil.fr/fileadmin/documents/uk/78-17VA.pdf.
21. Directive on the Protection of Individuals with Regard to the Processing of Personal
Data and on the Free Movement of Such Data, Council Directive 95/46/EC, 1995
O.J. (L 281) 31, at Article 2(a), available: http://europa.eu.int/comm/internal_market/
en/dataprot/law/index.htm.
22. Id. at Recital 26.
23. Council of Europe, Recommendation No. R (97) 5 of the Committee of Ministers to
Member States on the Protection of Medical Data §1 (1997), available: http://www.
cm.coe.int/ta/rec/1997/word/97r5.doc.
24. Council of Europe, Explanatory Memorandum to Recommendation No. R (97) 5 of
the Committee of Ministers to Member States on the Protection of Medical Data §
36 (1997), available: http://www.cm.coe.int/ta/rec/1997/ExpRec(97)5.htm.
25. Privacy Act 1988 § 6 (2001), available: http://www.privacy.gov.au/publications/
privacy88.pdf.
26. UK Data Protection Act 1998 § 1(1) (1998), available: http://www.legislation.hmso.
gov.uk/acts/acts1998/19980029.htm.
27. Canadian Institutes of Health Research, Recommendations for the Interpretation
and Application of the Personal Information Protection and Electronic Documents
Act (S.C.2000, c.5) in the Health Research Context 6 (Nov. 30, 2001), available:
http://www.cihr.ca/about_cihr/ethics/recommendations_e.pdf.
1(a) For greater certainty, ‘information about an identifiable individual’, within the
meaning of personal information as defined by the Act, shall include only that infor-
mation that can:
(i) identify, either directly or indirectly, a specific individual; or,
(ii) be manipulated by a reasonably foreseeable method to identify a specific indi-
vidual; or
(iii) be linked with other accessible information by a reasonably foreseeable
method to identify a specific individual.
1(b) Notwithstanding subsection 1(a), ‘information about an identifiable individual’
shall not include:
(i) anonymized information which has been permanently stripped of all identifi-
ers or aggregate information which has been grouped and averaged, such that
the information has no reasonable potential for any organization to identify a
specific individual; or
(ii) unlinked information that, to the actual knowledge of the disclosing organiza-
tion, the receiving organization cannot link with other accessible information
by any reasonably foreseeable method, to identify a specific individual.
(c) Whether or not a method is reasonably foreseeable under subsections 1(a) and
1(b) shall be assessed with regard to the circumstances prevailing at the time of the
proposed collection, use or disclosure.
28. Alberta Health Information Act § 1(p) (1999), available: http://www.qp.gov.ab.ca/
Documents/acts/H05.CFM.
29. Id. at § 1(r).
30. Id. at § 1(g).
31. Id. at § 68-72.
32. Nonstatutory administrative reviews of data disclosure may be commonplace. For
example, the National Center for Health Statistics in the Department of Health and
Human Services uses an administrative review process with a Disclosure Review
Board to assess the risk of disclosure for the release of microdata files for statistical
research. National Center for Health Statistics, Staff Manual on Confidentiality
(2004), http://www.cdc.gov/nchs/data/misc/staffmanual2004.pdf.
33. U.S. Department of Health and Human Services, “Standards for Privacy of Individu-
ally Identifiable Health Information,” 65 Federal Register 82462-82829 (Dec. 28,
2000) (codified at 45 C.F.R. Parts 160 & 164).
34. Public Law No. 104-191, 110 Stat. 1936 (1996).
35. 45 C.F.R. § 160.103.
36. Id. at § 164.514(b)(2). The complete list of data elements includes “(A) Names; (B)
All geographic subdivisions smaller than a State, including street address, city, county,
precinct, zip code, and their equivalent geocodes, except for the initial three digits of
a zip code if, according to the current publicly available data from the Bureau of the
Census: (1) The geographic unit formed by combining all zip codes with the same
three initial digits contains more than 20,000 people; and (2) The initial three digits
of a zip code for all such geographic units containing 20,000 or fewer people is
changed to 000; (C) All elements of dates (except year) for dates directly related to
an individual, including birth date, admission date, discharge date, date of death; and
all ages over 89 and all elements of dates (including year) indicative of such age,
except that such ages and elements may be aggregated into a single category of age
90 or older; (D) Telephone numbers; (E) Fax numbers; (F) Electronic mail addresses;
(G) Social security numbers; (H) Medical record numbers; (I) Health plan beneficiary
numbers; (J) Account numbers; (K) Certificate/license numbers; (L) Vehicle identifi-
ers and serial numbers, including license plate numbers; (M) Device identifiers and
serial numbers; (N) Web Universal Resource Locators (URLs); (O) Internet Protocol
(IP) address numbers; (P) Biometric identifiers, including finger and voice prints; (Q)
Full face photographic images and any comparable images; and (R) Any other unique
identifying number, characteristic, or code.”
37. Id. at § 164.514(b)(2)(ii).
38. 45 C.F.R. § 164.512(b)(1).
39. Id. at § 164.512(b)(1)(i). The commentary accompanying the rule includes references
to published materials offering guidance on assessing risk, and it recognizes that
there will be a need to update the guidance over time. Those materials are Federal
Committee on Statistical Methodology, Statistical Policy Working Paper 22, Report
on Statistical Disclosure Limitation Methodology (1994), available: http://www.fcsm.
gov/working-papers/wp22.html; “Checklist on Disclosure Potential of Proposed Data
Releases,” 65 Federal Register 82709 (Dec. 28, 2000), available: http://www.fcsm.
gov/docs/checklist_799.doc.
40. 45 C.F.R. § 164.512(b)(1)(ii).
41. 45 C.F.R. § 164.514(e).
42. Quintiles Transnational Corp. v. WebMD Corp., No. 5:01-CV-180-BO(3), (E.D.
N.C. Mar. 21, 2002).
43. R. v. Dept of Health ex parte Source Informatics Ltd., 1 All E.R. 786, 796-97 (C.A.
2000), reversing 4 All E.R. 185 (Q.B. 1999).
44. The Southern Illinoisan v. Illinois Department of Public Health, 812 N.E.2d 27
(Ill.App. Ct. 2004), available: http://www.state.il.us/court/Opinions/AppellateCourt/
2004/5thDistrict/June/html/5020836.htm.
45. The Court’s opinion focused in significant part on the expert abilities of Sweeney and
found a lack of evidence demonstrating whether other individuals could identify
individuals in the same fashion. Available: http://www.state.il.us/court/opinions/
SupremeCourt/2006/February/Opinions/Html/98712.htm. The opinion suggests that
a different result might be obtained with a better factual showing that identifiability
capabilities were more widespread among the population. Just how difficult it would
be for others to reidentify the records is not entirely clear. However, both courts
ignored the possibility that a recipient of data could hire someone with Sweeney’s
skills and learn the names of patients. The court’s basis for decision does not seem to
be sustainable in the long run.
46. Northwestern Memorial Hospital v. Ashcroft, 362 F.3d 923 (7th Cir. 2004), avail-
able: http://www.ca7.uscourts.gov/tmp/I110H5XZ.pdf.
47. Two quotes from the decision are worth reproducing:
Some of these women will be afraid that when their redacted records are
made a part of the trial record in New York, persons of their acquaintance,
or skillful “Googlers,” sifting the information contained in the medical
records concerning each patient’s medical and sex history, will put two
and two together, “out” the 45 women, and thereby expose them to
threats, humiliation, and obloquy.
***
Even if there were no possibility that a patient’s identity might be learned
from a redacted medical record, there would be an invasion of privacy.
Imagine if nude pictures of a woman, uploaded to the Internet without her
consent though without identifying her by name, were downloaded in a
foreign country by people who will never meet her. She would still feel that
her privacy had been invaded. The revelation of the intimate details con-
tained in the record of a late-term abortion may inflict a similar wound.
48. See generally, Gellman (2005).
49. Extensive rules and laws govern surveillance by wire, whether by government actors
or private parties.
50. 389 U.S. 347 (1967).
51. 389 U.S. at 351.
52. 389 U.S. at 361.
53. See Schwartz (1995).
54. 460 U.S. 276 (1983).
55. 460 U.S. at 281.
56. Id. at 284.
57. 476 U.S. 207 (1986).
58. 476 U.S. 227 (1986).
59. Id.
60. In Kyllo v. United States, 533 U.S. 27 (2001), the Supreme Court found that police
use of heat imaging technology to search the interior of a private home from the
outside was a Fourth Amendment search that required a warrant. The case turned in
part on the use by the government of “a device that is not in general public use, to
explore the details of the home that would previously have been unknowable with-
out physical intrusion.” Id. at 40. The broader implications of the Court’s standard
for technology not in general public use are not entirely clear.
61. Wash. Rev. Code § 9A-44-115.
62. Wash. Rev. Code § 9A-44-115(1)(c).
63. 2003 Wash. Laws § 213 (amending Wash. Rev. Code § 9A-44-115).
64. Ariz. Rev. Stat. § 13-3019(C)(4).
65. Conn. Gen. Stat. § 31-48b(b).
66. Tex. Health & Safety Code § 242.501(a)(5).
67. The other torts are for appropriation of a name or likeness, publicity given to private
life, and publicity placing a person in a false light. 3 Restatement (Second) of Torts §
652A et seq. (1977)
68. Id. at § 652B.
69. Id. at comment c.
70. Nader v. General Motors Corp., 255 N.E.2d 765 (N.Y. 1970), 1970 N.Y. LEXIS
1618.
71. Galella v. Onassis, 487 F.2d 986 (2d Cir. 1973).
72. See, e.g., In the Matter of an Application of the United States For an Order (1)
Authorizing the Use of a Pen Register and a Trap and Trace Device and (2) Autho-
rizing Release of Subscriber Information and/or Cell Site Information, Magistrate’s
Docket No. 05-1093 (JO), available: www.eff.org/legal/cases/USA_v_PenRegister/
celltracking_denial.pdf; Brief for amicus Electronic Frontier Foundation at 7,
available: http://www.eff.org/legal/cases/USA_v_PenRegister/celltracking_EFFbrief.pdf
(“The prospective collection of cell site data will therefore reveal the cell phone’s
location even when that information could not have been derived from visual surveil-
lance, but only from a physical search” [footnote omitted]).
73. Note, Harvard Journal of Law and Technology (fall, 2004).
Given current database and storage capacities, the door is open for an
Orwellian scenario whereby law enforcement agents could monitor not
just criminals, but anyone with a cell phone. If it sounds improbable,
consider that commercial tracking services already provide real-time loca-
tion information for families and businesses. (p. 316)
74. Organisation for Economic Co-Operation and Development, Council Recommenda-
tions Concerning Guidelines Governing the Protection of Privacy and Transborder
Flows of Personal Data, 20 I.L.M. 422 (1981), O.E.C.D. Doc. C (80) 58 (Final)
(Oct. 1, 1980), available: http://www.oecd.org/document/18/0,2340,en_2649
_34255_1815186_1_1_1_1,00.html.
75. Council Directive 95/46, art. 28, on the Protection of Individuals with Regard to the
Processing of Personal Data and on the Free Movement of Such Data, 1995 O.J.
(L281/47), available: http://europa.eu.int/comm/justice_home/fsj/privacy/law/index_
en.htm.
76. Additional rules govern the processing of special categories of data (racial or ethnic
origin, political opinions, religious or philosophical beliefs, trade union membership,
and data concerning health or sex life). Generally, explicit consent is necessary for
collection of these special categories, with some exceptions.
77. Article 7.
78. UK Data Protection Act 1998 §§ 10, 11 (1998), available: http://www.legislation.
hmso.gov.uk/acts/acts1998/19980029.htm.
79. U.S. Department of Health and Human Services, “Standards for Privacy of Individu-
ally Identifiable Health Information,” 65 Federal Register 82462-82829 (Dec. 28,
2000) (codified at 45 C.F.R. Parts 160 & 164).
80. 5 U.S.C. § 552a.
81. Id. at §§ 552a(e)(1), (2), & (7).
82. U.S. Department of Health and Human Services, “Standards for Privacy of Individu-
ally Identifiable Health Information,” 65 Federal Register 82462- 82464 (Dec. 28,
2000).
83. 45 C.F.R. §164.502(b).
84. 15 U.S.C. § 6502.
85. 47 U.S.C. § 551(b).
86. Uniting and Strengthening America by Providing Appropriate Tools Required to
Intercept and Obstruct Terrorism (USA Patriot Act) Act of 2001, Public Law No.
107-056, 115 Stat. 272, available: http://frwebgate.access.gpo.gov/cgi-bin/getdoc.
cgi?dbname=107_cong_public_laws&docid=f:publ056.107.
87. 50 U.S.C. § 1861.
88. 5 U.S.C. § 552a.
89. The conditions of disclosure are at 5 U.S.C. § 552a(b), with the routine use authority
at (b)(2). The definition of routine use is at 5 U.S.C. § 552a(a)(7).
90. 15 U.S.C. § 1681b.
91. 45 C.F.R. § 164.512.
92. Id. at § 164.512(i).
93. 44 U.S.C. § 3501 note, § 512(a). An exception allows disclosure to a law enforcement
agency for the prosecution of submissions of false statistical information under stat-
utes imposing civil or criminal penalties. Id. at § 504(g).
94. See Privacy Protection Study Commission, Personal Privacy in an Information Soci-
ety 573 (1977), available: http://www.epic.org/privacy/ppsc1977report/. See also
National Research Council and the Social Science Research Council (1993:34-35).
95. 44 U.S.C. § 3501 note, § 502(5).
96. 18 U.S.C. § 2721.
97. Id. at § 2721(b)(5).
98. N.H. Rev. Stat. Online § 237:16-e (2004), available: http://www.gencourt.state.nh.us/
rsa/html/XX/237/237-16-e.htm.
99. 42 U.S.C. § 934 (formerly 42 U.S.C. § 299c-3(c)).
100. 42 U.S.C. § 242m(d).
101. 42 U.S.C. § 3789g(a).
102. 21 U.S.C. § 872(c).
103. 20 U.S.C. § 9573. The law formerly applied only to the National Center for Educa-
tion Statistics.
104. USA Patriot Act of 2001 at § 508 (amending 20 U.S.C. § 9007), Public Law No.
107-056, 115 Stat. 272, available: http://frwebgate.access.gpo.gov/cgi-bin/
getdoc.cgi?dbname=107_cong_public_laws&docid=f:publ056.107.
105. 42 U.S.C. § 241(d).
106. The National Institutes of Health encourages investigators working on sensitive bio-
medical, behavioral, clinical, or other types of research to obtain certificates.
107. 5 U.S.C. § 552.
108. U.S. Office of Management and Budget, Circular A-110 (Uniform Administrative
Requirements for Grants and Agreements with Institutions of Higher Education,
Hospitals, and Other Non-Profit Organizations) (9/30/99), available: http://
www.whitehouse.gov/omb/circulars/a110/a110.html.
109. Id. at .36(d)(2)(i)(A).
110. See generally, Gellman (1995).
111. 18 U.S.C. § 2721.
112. More on this general subject can be found in Perritt (2003).
113. 15 U.S.C. § 1681 et seq.
114. Id. at § 1681s-2.
115. See, e.g., 13 U.S.C. § 214 (Census Bureau employees).
116. 44 U.S.C. § 3501 note § 513. Interestingly, while CIPSEA regulates both use and
disclosure of statistical information, id. at § 512, only wrongful disclosure is subject to
criminal penalties.
117. 44 U.S.C. § 3501 note § 502 (“The term ‘‘agent’’ means an individual—
(A)(i) who is an employee of a private organization or a researcher affiliated with
an institution of higher learning (including a person granted special sworn status by
the Bureau of the Census under section 23(c) of title 13, United States Code), and
with whom a contract or other agreement is executed, on a temporary basis, by an
executive agency to perform exclusively statistical activities under the control and
supervision of an officer or employee of that agency;
(ii) who is working under the authority of a government entity with which a
contract or other agreement is executed by an executive agency to perform
exclusively statistical activities under the control of an officer or employee of
that agency;
(iii) who is a self-employed researcher, a consultant, a contractor, or an employee
of a contractor, and with whom a contract or other agreement is executed by
an executive agency to perform a statistical activity under the control of an
officer or employee of that agency; or
(iv) who is a contractor or an employee of a contractor, and who is engaged by
the agency to design or maintain the systems for handling or storage of data
received under this title; and
(B) who agrees in writing to comply with all provisions of law that affect informa-
tion acquired by that agency.”)
118. 3 Restatement (Second) of Torts §§ 652B, 652D (1977).
119. The HIPAA criminal penalties may not apply, either. See U.S. Department of Justice,
Office of Legal Counsel, Scope of Criminal Enforcement Under 42 U.S.C. § 1320d-6
(June 1, 2005), available: http://www.usdoj.gov/olc/hipaa_final.htm.
120. See, e.g., Reidenberg (1992).
121. Restatement (Second) of Contracts §§ 302, 303 (1981).
122. The original draft HIPAA privacy rule required business partner agreements to state
that the agreements intended to create third-party beneficiary rights. In the final rule,
the third-party beneficiary language was removed. The commentary stated that the
rule’s intent was to leave the law in this area where it was. The discussion in the final
rule shows that there were strongly divergent views on the issue. See 65 Federal
Register 82641 (Dec. 28, 2000).
123. Considerable amounts of patient-level information are available. For example, the
Healthcare Cost and Utilization Project distributes four databases for health services
research, with data dating back to 1988. This joint federal-state partnership is spon-
sored by the Agency for Healthcare Research and Quality, a part of the federal
Department of Health and Human Services. The databases contain patient-level in-
formation for either inpatient or ambulatory surgery stays in a uniform format “while
protecting patient privacy.” Healthcare Cost and Utilization Project, Description of
Healthcare Cost and Utilization Project (undated), available: http://www.ahcpr.gov/
downloads/pub/hcup/appkitv15b.pdf. Whether the privacy protections are adequate
to protect against reidentification under all conditions is uncertain. Numerous other
medical data sets are available from other sources.
124. See National Committee on Vital and Health Statistics, Subcommittee on Privacy
and Confidentiality (1998b).
125. 5 U.S.C. § 552a.
126. 5 U.S.C. § 552a(b)(3) allows agencies to define a routine use to justify a disclosure.
127. Video Privacy Protection Act (“Bork Law”), 18 U.S.C. § 2710.
128. Privacy Commissioner (Canada), Annual Report 1999-2000 available: http://www.
privcom.gc.ca/information/ar/02_04_09_e.asp.
129. McCarthy (2000).
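The zip code, date, and age rules quoted in note 36 can be read as simple generalization steps. The following is a minimal sketch in Python and not a compliance tool. The function names, the "90+" label, and the population table are hypothetical, and the population figures are invented for illustration; real figures would come from current Census Bureau data. The sketch covers only parts of elements (B) and (C) of the list, and it does not handle the further requirement that a year indicative of an age over 89 also be removed.

```python
from datetime import date

# Hypothetical lookup: population of each three-digit zip code area.
# These numbers are made up; real values would come from Census Bureau data.
ZIP3_POPULATION = {"606": 2_700_000, "036": 15_000}

POPULATION_THRESHOLD = 20_000  # an area must contain MORE than this many people
MAX_REPORTABLE_AGE = 89        # ages over 89 are aggregated into one category


def generalize_zip(zip_code: str, zip3_population: dict) -> str:
    """Keep the initial three digits only if that area has more than 20,000 people.

    Areas with 20,000 or fewer people (or with unknown population, as a
    conservative assumption of this sketch) are changed to "000".
    """
    zip3 = zip_code[:3]
    if zip3_population.get(zip3, 0) > POPULATION_THRESHOLD:
        return zip3
    return "000"


def generalize_date(d: date) -> int:
    """Drop all elements of a date except the year."""
    return d.year


def generalize_age(age: int) -> str:
    """Report ages over 89 as a single category of age 90 or older."""
    return "90+" if age > MAX_REPORTABLE_AGE else str(age)
```

Under these assumptions, a record with zip code 60614 would keep only "606", while a record from the small hypothetical area "036" would be reported as "000".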
REFERENCES
Alpert, S.
2000 Privacy and the analysis of stored tissues. Pp. A-1–A-36 in Research Involving
Human Biological Materials: Ethical Issues and Policy Guidance (Volume II
Commissioned Papers). Rockville, MD: National Bioethics Advisory Commis-
sion. Available: http://bioethics.georgetown.edu/nbac/hbmII.pdf. [accessed De-
cember 2006].
Gellman, R.
1995 Public records: Access, privacy, and public policy. Government Information
Quarterly 12:391-426.
2001 Public Record Usage in the United States. Paper presented at the 23rd Interna-
tional Conference of Data Protection Commissioners, September 25, Paris,
France. Available: http://www.personaldataconference.com/eng/contribution/
gellman_contrib.html [accessed December 2006].
2005 A general survey of video surveillance law in the United States. In S. Nouwt,
B.R. de Vries, and C. Prins, eds., Reasonable Expectations of Privacy? Eleven
Country Reports on Camera Surveillance and Workplace Privacy. Hague, Neth-
erlands: T.M.C. Asser Press.
Harvard Journal of Law and Technology
2004 Who knows where you’ve been? Privacy concerns regarding the use of cellular
phones as personal locators. Harvard Journal of Law and Technology 18(1):307,
316 (fall).
McCarthy, S.
2000 Ottawa pulls plug on big brother database, Canadians promised safeguards on
data. Globe and Mail, May 30.
National Committee on Vital and Health Statistics, Subcommittee on Privacy and Confidentiality
1998a Proceedings of Roundtable Discussion: Identifiability of Data. Hubert Humphrey
Building, January 28, Washington, DC. Transcript available:
http://ncvhs.hhs.gov/980128tr.htm [accessed December 2006].
1998b Roundtable Discussion: Identifiability of Data. Available:
http://ncvhs.hhs.gov/980128tr.htm [accessed January 2007].
National Research Council and the Social Science Research Council
1993 Private Lives and Public Policies. G.T. Duncan, T.B. Jabine, and V.A. de Wolf,
eds. Panel on Confidentiality and Data Access. Committee on National Statistics,
Commission on Behavioral and Social Sciences and Education. Washington,
DC: National Academy Press.
Organisation for Economic Co-Operation and Development
1980 Council Recommendations Concerning Guidelines Governing the Protection of
Privacy and Transborder Flows of Personal Data. O.E.C.D. Doc. C (80) 58
(Final). Available:
http://www.oecd.org/document/18/0,2340,en_2649_34255_1815186_1_1_1_1,00.html [accessed December 2006].
Perrin, S., H.H. Black, D.H. Flaherty, and T.M. Rankin
2001 The Personal Information Protection and Electronic Documents Act: An
Annotated Guide. Toronto, Canada: Irwin Law.
Perritt, H.H., Jr.
2003 Protecting Confidentiality of Research Data through Law. Paper prepared for
Committee on National Statistics, National Research Council Data Confidentiality
and Access Workshop, Washington, DC. Available:
http://www7.nationalacademies.org/cnstat/Perritt_Paper.pdf [accessed January 2007].
Reidenberg, J.R.
1992 The privacy obstacle course: Hurdling barriers to transnational financial
services. Fordham Law Review 60:S137, S175.
Reidenberg, J.R., and P.M. Schwartz
1998 Data Protection Law and Online Services: Regulatory Responses.
Commissioned from ARETE by Directorate General XV of the Commission of
the European Communities. Available:
http://ec.europa.eu/justice_home/fsj/privacy/docs/studies/regul_en.pdf [accessed December 2006].
Schwartz, P.
1995 Privacy and participation: Personal information and public sector regulation in
the United States. Iowa Law Review 80:553, 573.
Sweeney, L.
2001 Information explosion. Chapter 3 in P. Doyle, J. Lane, J. Theeuwes, and L.
Zayatz, eds., Confidentiality, Disclosure, and Data Access: Theory and Practical
Applications for Statistical Agencies. New York: North-Holland Elsevier.