J
The Total/Terrorist Information Awareness Program
J.1 A BRIEF HISTORY1
In 2002, in the wake of the September 11, 2001, attacks, the Defense
Advanced Research Projects Agency (DARPA) of the U.S. Department of
Defense (DOD) launched a research and development effort known as the
Total Information Awareness (TIA) program. Later renamed the Terrorism
Information Awareness program, TIA was a research and development
program intended to counter terrorism through prevention by developing
and integrating information analysis, collaboration, and decision-support
tools with language-translation, data-searching, pattern-recognition, and
privacy-protection technologies.2 The program included the development
of a prototype system/network to provide an environment for integrating
technologies developed in the program and as a testbed for conducting
experiments. Five threads for research investigation were to be pursued:
secure collaborative problem-solving among disparate agencies and institutions,
structured information-searching and pattern recognition based
on information from a wide array of data sources, social-network analysis
tools to understand linkages and organizational structures, data-sharing
in support of decision-making, and language-translation and
information-visualization tools. A technical description of the system stressed
the importance of using real data and real operational settings that were
complex and huge.3
1 This description of the TIA program is based on unclassified, public sources that are
presumed to be authoritative because of their origin (for example, Department of Defense
documents and speeches by senior program officials). Recognizing that some aspects of the
program were protected by classification, the committee believes that this description is
accurate but possibly incomplete.
2 Defense Advanced Research Projects Agency (DARPA), “Report to Congress Regarding
the Terrorism Information Awareness Program: In response to Consolidated Appropriations
Resolution, 2003, Pub. L. No. 108-7, Division M, § 111(b),” DARPA, Arlington, Va., May 20,
2003.
PROTECTING INDIVIDUAL PRIVACY IN THE STRUGGLE AGAINST TERRORISTS
The TIA program sought to pursue important research questions,
such as how data mining techniques might be used in national-security
investigations and how technological approaches might be able to amelio-
rate the privacy impact of such analysis. For example, in a speech given
in August 2002, John Poindexter said that4
IAO [Information Awareness Office] programs are focused on making
Total Information Awareness—TIA—real. This is a high level, visionary,
functional view of the world-wide system—somewhat over simplified.
One of the significant new data sources that needs to be mined to dis-
cover and track terrorists is the transaction space. If terrorist organiza-
tions are going to plan and execute attacks against the United States,
their people must engage in transactions and they will leave signatures
in this information space. This is a list of transaction categories, and it
is meant to be inclusive. Currently, terrorists are able to move freely
throughout the world, to hide when necessary, to find sponsorship and
support, and to operate in small, independent cells, and to strike in-
frequently, exploiting weapons of mass effects and media response to
influence governments. We are painfully aware of some of the tactics
that they employ. This low-intensity/low-density form of warfare has
an information signature. We must be able to pick this signal out of the
noise. Certain agencies and apologists talk about connecting the dots,
but one of the problems is to know which dots to connect. The relevant
information extracted from this data must be made available in large-
scale repositories with enhanced semantic content for easy analysis to
accomplish this task. The transactional data will supplement our more
conventional intelligence collection.
Nevertheless, authoritative information about the threats of interest
to the TIA program is scarce. In some accounts, TIA was focused on a
generalized terrorist threat. In other informed accounts, TIA was premised
on the notion of protecting a small number of high-value targets in
the United States, and a program of selective hardening of those targets
would force terrorists to carry out attacks along particular lines, thus limiting
the threats of interest and concern to TIA technology.
3 Defense Advanced Research Projects Agency (DARPA), Total Information Awareness
Program System Description Document, version 1.1, DARPA, Arlington, Va., July 19, 2002.
4 J. Poindexter, Overview of the Information Awareness Office, Remarks prepared for
DARPATech 2002 Conference, Anaheim, Calif., August 2, 2002, available at
http://www.fas.org/irp/agency/dod/poindexter.html.
APPENDIX J
The TIA program was cast broadly as one that would “integrate
advanced collaborative and decision support tools; language translation;
and data search, pattern recognition, and privacy protection technologies
into an experimental prototype network focused on combating terrorism
through better analysis and decision making.”5 Regarding data-searching
and pattern recognition, research was premised on the idea that
. . . terrorist planning activities or a likely terrorist attack could be uncov-
ered by searching for indications of terrorist activities in vast quantities
of transaction data. Terrorists must engage in certain transactions to co-
ordinate and conduct attacks against Americans, and these transactions
form patterns that may be detectable. Initial thoughts are to connect these
transactions (e.g., applications for passports, visas, work permits, and
drivers’ licenses; automotive rentals; and purchases of airline ticket and
chemicals) with events, such as arrests or suspicious activities.6
As described in the DOD TIA report, “These transactions would form
a pattern that may be discernable in certain databases to which the U.S.
Government would have lawful access. Specific patterns would be identified
that are related to potential terrorist planning.”7
Furthermore, the program would focus on analyzing nontargeted
transaction and event data en masse rather than on collecting information
on specific individuals and trying to understand what they were doing.
The intent of the program was to develop technology that could discern
event and transaction patterns of interest and then identify individuals of
interest on the basis of the events and transactions in which they partici-
pated. Once such individuals were identified, they could be investigated
or surveilled in accordance with normal and ordinary law-enforcement
and counterterrorism procedures.
The driving example that motivated TIA was the set of activities of
the 9/11 terrorists who attacked the World Trade Center. In retrospect, it
was discovered that they had taken actions that together could be seen
as predictors of the attack even if no single action was unlawful. Among
those actions were flight training (with an interest in level flight but not
in takeoff and landing), the late purchase of one-way air tickets with
cash, foreign deposits into banking accounts, and telephone records that
could be seen to have connected the terrorists. If the actions could have
been correlated before the fact, presumably in some automated fashion,
suspicions might have been aroused in time to foil the incident before it
happened.
5 Defense Advanced Research Projects Agency (DARPA), “Report to Congress Regarding
the Terrorism Information Awareness Program: In response to Consolidated Appropriations
Resolution, 2003, Pub. L. No. 108-7, Division M, § 111(b),” DARPA, Arlington, Va., May 20,
2003.
6 DARPA. Defense Advanced Research Projects Agency’s Information Awareness Office and
Terrorism Information Awareness Project. Available at
http://www.taipale.org/references/iaotia.pdf.
7 Defense Advanced Research Projects Agency (DARPA), “Report to Congress Regarding
the Terrorism Information Awareness Program: In response to Consolidated Appropriations
Resolution, 2003, Pub. L. No. 108-7, Division M, § 111(b),” May 20, 2003, p. 14.
Because the TIA program was focused on transaction and event data
that were already being collected and resident in various databases, pri-
vacy implications generally associated with the collection of data per se
did not arise. But the databases were generally privately held, and many
privacy questions arose because the government would need access to
the data that they contained. The databases also might have contained the
digital signatures of most Americans as they conducted their everyday
lives, and this gave rise to many concerns about their vast scope.
After a short period of intense public controversy, Congress took
action on the TIA program in 2003. Section 8131 of H.R. 2658, the Depart-
ment of Defense Appropriations Act of 2004, specified that
(a) Notwithstanding any other provision of law, none of the funds ap-
propriated or otherwise made available in this or any other Act may be
obligated for the Terrorism Information Awareness Program: Provided,
That this limitation shall not apply to the program hereby authorized
for processing, analysis, and collaboration tools for counterterrorism
foreign intelligence, as described in the Classified Annex accompanying
the Department of Defense Appropriations Act, 2004, for which funds
are expressly provided in the National Foreign Intelligence Program for
counterterrorism foreign intelligence purposes.
(b) None of the funds provided for processing, analysis, and collabora-
tion tools for counterterrorism foreign intelligence shall be available for
deployment or implementation except for:
(1) lawful military operations of the United States conducted outside
the United States; or
(2) lawful foreign intelligence activities conducted wholly overseas, or
wholly against non-United States citizens.
(c) In this section, the term “Terrorism Information Awareness Program”
means the program known either as Terrorism Information Awareness or
Total Information Awareness, or any successor program, funded by the
Defense Advanced Research Projects Agency, or any other Department
or element of the Federal Government, including the individual compo-
nents of such Program developed by the Defense Advanced Research
Projects Agency.
It is safe to say that the issues raised by the TIA program have not
been resolved in any fundamental sense. Though the program itself was
terminated, much of the research under it was moved from DARPA to
another group, which builds technologies primarily for the National Secu-
rity Agency, according to documents obtained by the National Journal and
to intelligence sources familiar with the move. The names of key projects
were changed, apparently to conceal their identities, but their funding
remained intact, often under the same contracts.8
The immediate result, therefore, of congressional intervention was to
drive the development and deployment of data mining at DOD from pub-
lic view, relieve it of the statutory restrictions that had previously applied
to it, block funding for research into privacy-enhancing technologies, and
attenuate the policy debate over the appropriate roles and limits of data
mining. Law and technology scholar K.A. Taipale wrote:9
At first hailed as a “victory” for civil liberties, it has become increasingly
apparent that the defunding [of TIA] is likely to be a pyrrhic victory.
. . . Not proceeding with a focused government research and develop-
ment project (in which Congressional oversight and a public debate
could determine appropriate rules and procedures for use of these tech-
nologies and, importantly, ensure the development of privacy protecting
technical features to support such policies) is likely to result in little secu-
rity and, ultimately, brittle privacy protection. . . . Indeed, following the
demise of IAO and TIA, it has become clear that similar data aggregation
and automated analysis projects exist throughout various agencies and
departments not subject to easy review.
Thus, many other data mining activities supported today by the U.S.
government continue to raise the same issues as did the TIA program: the
potential utility of large-scale databases containing personal information
for counterterrorism and law-enforcement purposes and the potential
privacy impact of the use of such databases by law-enforcement and
national-security authorities.
J.2 A TECHNICAL PERSPECTIVE ON TIA’S
APPROACH TO PROTECTING PRIVACY
As noted above, managers of the TIA program understood that their
approach to identifying terrorists before they acted had major privacy
implications. To address privacy issues in TIA and similar programs, such
as MATRIX, Tygar10 and others have advocated the use of what has come
to be called selective revelation, involving something like the risk-utility
tradeoff in statistical disclosure limitation. Sweeney11 used the term
to describe an approach to disclosure limitation that allows data to be
shared for surveillance purposes “with a sliding scale of identifiability,
where the level of anonymity matches scientific and evidentiary need.”
That corresponds to a monotonically increasing threshold for maximum
tolerable risk in the risk-utility confidentiality-map framework previously
described in Duncan et al.12 Some related ideas emanate from the
computer-science literature, but most authors attempt to demand a stringent
level of privacy, carefully defined, and to restrict access by adding noise
and limitations on the numbers of queries allowed (e.g., see Chawla et
al.13).
8 S. Harris, “TIA lives on,” National Journal, February 23, 2006, available at
http://nationaljournal.com/about/njweekly/stories/2006/0223nj1.htm#.
9 K.A. Taipale, “Data mining and domestic security: Connecting the dots to make sense of
data,” Columbia Science and Technology Law Review 5(2):1-83, 2003.
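The idea of restricting access by adding noise and limiting the number of queries can be illustrated with a minimal Python sketch. It is not taken from Chawla et al. or from any TIA document: the class name NoisyCountServer, the Gaussian noise, and the parameters max_queries and noise_scale are all hypothetical choices made only for illustration.

```python
import random


class NoisyCountServer:
    """Hypothetical query interface: noisy counts plus a hard cap on queries."""

    def __init__(self, records, max_queries=3, noise_scale=2.0, seed=None):
        self._records = list(records)
        self._remaining = max_queries      # assumed per-analyst query budget
        self._noise_scale = noise_scale    # assumed standard deviation of the noise
        self._rng = random.Random(seed)

    def count(self, predicate):
        """Return the number of matching records, perturbed by random noise."""
        if self._remaining <= 0:
            raise PermissionError("query budget exhausted")
        self._remaining -= 1
        true_count = sum(1 for record in self._records if predicate(record))
        # Gaussian noise is an illustrative choice, not a prescribed mechanism.
        return true_count + self._rng.gauss(0.0, self._noise_scale)
```

An analyst would call count with a predicate such as lambda r: r["age"] > 40 and receive only a perturbed total, never the underlying rows. How much noise and how many queries are acceptable is exactly the risk-utility question raised in the text; the sketch does not answer it.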
The TIA privacy report suggests that14
selective revelation [involves] putting a security barrier between the
private data and the analyst, and controlling what information can flow
across that barrier to the analyst. The analyst injects a query that uses
the private data to determine a result, which is a high-level sanitized
description of the query result. That result must not leak any private
information to the analyst. Selective revelation must accommodate multiple
data sources, all of which lie behind the (conceptual) security barrier.
Private information is not made available directly to the analyst, but
only through the security barrier.
10 J.D. Tygar, “Privacy Architectures,” presentation at Microsoft Research, June 18, 2003,
available at http://research.microsoft.com/projects/SWSecInstitute/slides/Tygar.pdf; J.D.
Tygar, “Privacy in sensor webs and distributed information systems,” pp. 84-95 in Software
Security Theories and Systems, M. Okada, B. Pierce, A. Scedrov, H. Tokuda, and A. Yonezawa,
eds., Springer, New York, 2003.
11 L. Sweeney, “Privacy-preserving surveillance using selective revelation,” LIDAP Working
Paper 15, Carnegie Mellon University, 2005; updated journal version is J. Yen, R. Popp,
G. Cybenko, K.A. Taipale, L. Sweeney, and P. Rosenzweig, “Homeland security,” IEEE
Intelligent Systems 20(5):76-86, 2005.
12 G.T. Duncan, S.E. Fienberg, R. Krishnan, R. Padman, and S.F. Roehrig, “Disclosure
limitation methods and information loss for tabular data,” pp. 135-166 in Confidentiality,
Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies, P. Doyle,
J. Lane, J. Theeuwes, and L. Zayatz, eds., North-Holland, Amsterdam, 2001. See also G.T.
Duncan, S.A. Keller-McNulty, and S.L. Stokes, Database Security and Confidentiality:
Examining Disclosure Risk vs. Data Utility Through the R–U Confidentiality Map, Technical Report 142,
National Institute of Statistical Sciences, Research Triangle Park, N.C., 2004; G.T. Duncan
and S.L. Stokes, “Disclosure risk vs. data utility: The R–U confidentiality map as applied to
topcoding,” Chance 17(3):16-20, 2004.
13 S.C. Chawla, C. Dwork, F. McSherry, A. Smith, and H. Wee, “Towards Privacy in Public
Databases,” in Theory of Cryptography Conference Proceedings, J. Kilian, ed., Lecture Notes in
Computer Science, Volume 3378, Springer-Verlag, Berlin, Germany.
14 Information Systems Advanced Technology (ISAT) panel, Security with Privacy, DARPA,
Arlington, Va., 2002, p. 10, available at
http://www.cs.berkeley.edu/~tygar/papers/ISAT-final-briefing.pdf.
One effort to implement this scheme was dubbed privacy appliances
by Golle et al. and was intended to be a stand-alone device that would
sit between the analyst and the private data source so that private data
stayed in authorized hands.15 The privacy controls would also be inde-
pendently operated to keep them isolated from the government. Accord-
ing to Golle et al., the device would provide:
• Inference control to prevent unauthorized individuals from complet-
ing queries that would allow identification of ordinary citizens.
• Access control to return sensitive identifying data only to authorized
users.
• Immutable audit trails for accountability.
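The three functions listed above can be made concrete with a small Python sketch. It is a toy under stated assumptions, not the design of Golle et al.: the class PrivacyAppliance, the minimum-group-size rule used for inference control, the authorized_users set, and the hash-chained log are all hypothetical. A hash chain makes tampering detectable rather than making the log truly immutable.

```python
import hashlib
import json


class PrivacyAppliance:
    """Toy barrier between an analyst and a private data source (hypothetical)."""

    def __init__(self, records, authorized_users, min_group_size=3):
        self._records = list(records)
        self._authorized = set(authorized_users)
        self._min_group_size = min_group_size  # assumed inference-control rule
        self._audit = []                       # list of (entry, digest) pairs
        self._last_hash = "0" * 64

    @staticmethod
    def _digest(entry):
        return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()

    def _log(self, user, action, allowed):
        # Each entry stores the previous digest, forming a tamper-evident chain.
        entry = {"user": user, "action": action, "allowed": allowed,
                 "prev": self._last_hash}
        self._last_hash = self._digest(entry)
        self._audit.append((entry, self._last_hash))

    def count(self, user, field, value):
        """Inference control: suppress counts over groups that are too small."""
        n = sum(1 for r in self._records if r.get(field) == value)
        allowed = n >= self._min_group_size
        self._log(user, f"count {field}={value}", allowed)
        return n if allowed else None

    def identify(self, user, field, value):
        """Access control: release names only to authorized users."""
        allowed = user in self._authorized
        self._log(user, f"identify {field}={value}", allowed)
        if not allowed:
            raise PermissionError("user is not authorized to see identities")
        return [r["name"] for r in self._records if r.get(field) == value]

    def audit_is_intact(self):
        """Recompute the chain and report whether any entry was altered."""
        prev = "0" * 64
        for entry, digest in self._audit:
            if entry["prev"] != prev or self._digest(entry) != digest:
                return False
            prev = digest
        return True
```

In this sketch every request, allowed or refused, is logged before a result is returned, so a later reviewer can check both who asked for identities and whether the log was altered. A real appliance would also need the independent operation described above, which code alone cannot provide.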
Implicit in the TIA report and in the Golle et al. approach was the
notion that linkages between databases behind the security barrier would
use identifiable records and thus some form of multiparty computation
method involving encryption techniques.
The real questions of interest in “inference control” are, What disclo-
sure-limitation methods should be used? To which databases should they
be applied? How can the “inference control” approaches be combined
with the multiparty computation methods? Here is what is known in the
way of answers:
• Both Sweeney and Golle et al. refer to microaggregation, known
as k-anonymity, but with few details on how it could be used in this context.
The method combines observations in groups of size k and reports
either the sum or the average of the group for each unit. The groups
may be identified by clustering or some other statistical approach. Left
unsaid is what kinds of analyses users might perform with such aggregated data.
Furthermore, neither k-anonymity nor any other confidentiality tool does
anything to cope with the implications of the release of exactly linked files
requested by “authorized users.”
• Much of the statistical and operations-research literature on confidentiality
fails to address the risk-utility trade-off, largely because it
focuses primarily on privacy or on technical implementations without
understanding how users wish to analyze a database.16
• A clear lesson from the statistical disclosure-limitation literature
is that privacy protection in the form of “safe releases” from separate
databases does not guarantee privacy protection for a merged database. A
figure in Lunt et al.17 demonstrates recognition of that by showing privacy
appliances applied for the individual databases and then independently
for the combined data.
• There have been a small number of crosswalks between the statistical
disclosure-limitation literature on multiparty computation and
risk-utility trade-off choices for disclosure limitation. Yang et al. provide
a starting point for discussions on k-anonymity.18 There are clearly a
number of alternatives to k-anonymity and alternatives that yield
“anonymized” databases of far greater statistical utility.
• The “hype” associated with the TIA approach to protection has
abated, largely because TIA no longer exists as an official program. But
similar programs continue to appear in different places in the federal government
and no one associated with any of them has publicly addressed
the privacy concerns raised here regarding the TIA approach.
15 Philippe Golle et al. “Protecting Privacy in Terrorist Tracking Applications,” presentation
to Computers, Freedom, and Privacy 2004, available at
http://www.cfp2004.org/program/materials/w-golle.ppt.
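The microaggregation step described in the first item of the list above (combine observations in groups of size k and report the group average for each unit) can be sketched in Python as follows. This is a minimal, single-variable illustration: the function name microaggregate and the choice to form groups by sorting are assumptions, and the text itself notes that groups may instead be found by clustering or another statistical approach.

```python
def microaggregate(values, k):
    """Replace each value with the mean of its group of at least k neighbours.

    Groups are formed by sorting the values (an illustrative choice); a short
    tail is folded into the last group so that no group has fewer than k members.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(values)
    if n < k:
        raise ValueError("need at least k observations")

    order = sorted(range(n), key=lambda i: values[i])  # indices in sorted order
    result = [0.0] * n
    start = 0
    while start < n:
        end = start + k
        if n - end < k:      # fewer than k values left over: absorb them
            end = n
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean  # each unit reports its group average
        start = end
    return result
```

For example, microaggregate([50, 10, 90, 12, 52], 2) groups 10 with 12 and 50 with 52 and 90, so each unit is reported as either 11.0 or 64.0. The released values preserve the overall total but, as the list above points out, what analyses remain useful on such data is left open.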
When Congress stopped the funding for DARPA’s TIA program in
2003, the research and development effort on the privacy appliance
at the Palo Alto Research Center (PARC) was an attendant casualty. Thus, prototypes of
the privacy appliance have not been made publicly available since then,
nor are they likely to appear in the near future. The claims of privacy
protection and selective revelation continued with MATRIX and other
data warehouse systems but without an attendant research program, and
the federal government continues to plan for the use of data mining techniques
in other initiatives, such as the Computer Assisted Passenger
Prescreening System II (CAPPS II). Similar issues arise in the use of government,
medical, and private transaction data in bioterrorism surveillance.19
16 R. Gopal, R. Garfinkel, and P. Goes, “Confidentiality via camouflage: The CVC approach
to disclosure limitation when answering queries to databases,” Operations Research
50:501-516, 2002.
17 T. Lunt, J. Staddon, D. Balfanz, G. Durfee, T. Uribe, D. Smetters, J. Thornton, P. Aoki, B.
Waters, and D. Woodruff, “Protecting Privacy in Terrorist Tracking Applications,” presentation
at the University of Washington/Microsoft Research/Carnegie Mellon University
Software Security Summer Institute, Software Security: How Should We Make Software Secure?
on June 15-19, 2003, available at
http://research.microsoft.com/projects/SWSecInstitute/five-minute/Balfanz5.ppt.
18 Z. Yang, S. Zhong, and R.N. Wright, “Anonymity-preserving data collection,” pp. 334-343
in Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining (KDD’05), Association for Computing Machinery, New York, N.Y., 2005.
J.3 ASSESSMENT
Section J.1 provided a brief history of the TIA program. Whatever
one’s views regarding the desirability or technical feasibility of the TIA
program, it is clear that from a political standpoint, the program was a
debacle. Indeed, after heated debate, the Senate and House appropriations
committees decided to terminate funding of the program.20 On passage
of the initial funding limitation, a leading critic of the TIA program,
Senator Ron Wyden, declared:
The Senate has now said that this program will not be allowed to grow
without tough Congressional oversight and accountability, and that there
will be checks on the government’s ability to snoop on law-abiding
Americans.21
The irony of the TIA debate is that although the funding for the TIA
program was indeed terminated, both research on and deployment of
data mining systems continue at various agencies (Appendix I,
“Illustrative Government Data Mining Programs and Activity”), whereas research on
privacy-management technology did not continue, and congressional
oversight of data mining technology development has waned to some
degree.
The various outcomes of the TIA debate raise the question of whether
the nature of the debate over the program (if not the outcome) could have
been any different if policy makers had addressed in advance some of the
difficult questions that the program raised. In particular, it is interesting
to consider questions in the three categories articulated in the framework
of Chapter 2: effectiveness, consistency with U.S. laws and values, and
possible development of new laws and practices. The TIA example further
illustrates how careful consideration of the privacy impact of new tech-
nologies is needed before a program seriously begins the research stage.
The threshold consideration of any privacy-sensitive technology is
whether it is effective in meeting a clearly defined law-enforcement or
national-security purpose. The question of effectiveness must be assessed
through rigorous testing guided by scientific standards. The TIA research
program proposed an evaluation framework, but none of the results of
evaluation have been made public. Some testing and evaluation may have
occurred in a classified setting, but neither this committee nor the public
has any knowledge of results. Research on how large-scale data-analysis
techniques, including data mining, could help the intelligence community
to identify potential terrorists is certainly a reasonable endeavor. Assuming
that initial research justifies additional effort on the basis of scientific
standards of success, the work should continue, but it must be accompanied
by a clear method for assessing the reliability of the results.
19 See S.E. Fienberg and G. Shmueli, “Statistical issues and challenges associated with
rapid detection of bio-terrorist attacks,” Statistics in Medicine 24:513-529, 2005; L. Sweeney,
“Privacy-Preserving Bio-Terrorism Surveillance,” presentation at AAAI Spring Symposium,
AI Technologies for Homeland Security, Stanford University, Stanford, Calif., 2005.
20 U.S. House, Conference Report on H.R. 2658, Department of Defense Appropriations
Act, (House Report 108-283), U.S. Government Printing Office, Washington, D.C., 2004.
21 Declan McCullagh, “Senate limits Pentagon ‘snooping’ plan,” CNET News.com, January
24, 2003. Available at http://sonyvaio-cnet.com.com/2100-1023_3-981945.html.
Even if a proposed technology is effective, it must also be consistent
with existing U.S. law and democratic values. First, one must assess
whether the new technique and objective comply with law. In the case of
TIA, DARPA presented to Congress a long list of laws that it would com-
ply with and affirmed that “any deployment of TIA’s search tools may
occur only to the extent that such a deployment is consistent with current
law.” Second, inasmuch as TIA research sought to enable the deployment
of very large-scale data mining over a larger universe of data than the U.S.
government had previously analyzed, even compliance with then-current
law would not establish consistency with democratic values.
The surveillance power that TIA proposed to put in the hands of U.S.
investigators raised considerable concern among policy makers and the
general public. That the program, if implemented, could be said to com-
ply with law did not address those concerns. In fact, the program raised
the concerns to a higher level and ultimately led to an effort by Congress
to stop the research altogether.
TIA-style data mining was, and still is, possible because there are few
restrictions on government access to third-party business records. Any
individual business record (such as a travel reservation or credit-card
transaction) may have relatively low privacy sensitivity when looked
at in isolation; but when a large number of such transaction records are
analyzed over time, a complete and intrusive picture of a person’s life
can emerge.
Developing the technology to derive such individual profiles was
precisely the objective of the TIA program. It proposed to use such pro-
files in only the limited circumstances in which they indicated terrorist
activity. That may be a legitimate goal and could ultimately be recognized
explicitly as such by law. However, that the program was at once legal
and at the same time appeared to cross boundaries not previously crossed
by law-enforcement or national-security investigations gives rise to ques-
tions that must be answered.
John Poindexter, director of the DARPA office responsible for TIA,
was aware of the policy questions and took notable steps to include
in the technical research agenda various initiatives to build technical
mechanisms that might minimize the privacy impact of the data mining
capabilities being developed. In hindsight, however, a more comprehen-
sive analysis of both the technical and larger public-policy considerations
associated with the program was necessary to address Congress’s con-
cerns about privacy impact.