Protecting Individual Privacy in the Struggle Against Terrorists: A Framework for Program Assessment

1 Scoping the Issue: Terrorism, Privacy, and Technology

1.1 THE NATURE OF THE TERRORIST THREAT TO THE UNITED STATES

Since September 11, 2001, the United States has faced a real and serious threat from terrorist action. Although the primary political objectives of terrorist groups vary depending on the group (e.g., the political objectives of Al Qaeda differ from those of Aum Shinrikyo), terrorist actions throughout history have nevertheless shared certain common characteristics and objectives. First, they have targeted civilians or non-combatants for political purposes. Second, they are usually violent, send a message, and have symbolic significance. The common objectives of terrorists include seeking revenge, renown, and reaction; that is, terrorists generally seek to "pay back" those they see as repressing them or their people; to gain notoriety or social or spiritual recognition and reward; and to cause those they attack to respond with fear, an escalating spiral of violence, irrational reaction and thus self-inflicted damage (e.g., reactions that strengthen the hand of the terrorists), or capitulation. Third, terrorists often blend with the targeted population—and in particular, they can exploit the fundamental values of open societies, such as the United States, to cover and conceal their planning and execution.

Despite these commonalities, today's terrorist threat is fundamentally different from those of the past. First, the scale of damage to which modern terrorists aspire is much larger than in the past. The terrorist acts of September 11, 2001, took thousands of lives and caused hundreds of billions of dollars in economic damage. Second, the potential terrorist
use of weapons of mass destruction (e.g., nuclear weapons, biological or chemical agents) poses a threat that is qualitatively different from a threat based on firearms or chemical explosives. Third, terrorists operate in a modern environment rich in available information and increasingly pervaded by information technology.

Even as terrorist ambitions and actions have increased in scale, smaller bombings and attacks are also on the rise in many corners of the world. To date, all seem to have been planned and executed by groups or networks and therefore have required some level of interaction and communication to plan and execute. Left unaddressed, this terrorist threat will create an environment of fear and anxiety for the nation's citizens. If people come to believe that they are infiltrated by enemies whom they cannot identify, that those enemies have the power to bring death, destruction, and havoc to their lives, and that preventing such attacks is beyond the capability of their governments, then the quality of national life will be greatly diminished as citizens refrain from fully participating in their everyday lives. That scenario would constitute a failure to "establish Justice, insure domestic Tranquility, provide for the common defense, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity," as pledged in the Preamble to the Constitution.

To address this threat, new technologies have been created and are being developed that offer dramatic new ways to observe and identify people, keep track of their location, and perhaps even deduce things about their thoughts and behaviors. The task for policy makers now is to determine who should have access to these new data and capabilities and for what purposes they should be used.
These new technologies, coupled with the unprecedented nature of the threat, are likely to bring great pressure to apply them, and some such measures might intrude on the fundamental rights of U.S. citizens. Appendix B ("Terrorism and Terrorists") addresses the terrorist threat in greater detail.

1.2 COUNTERTERRORISM AND PRIVACY AS AN AMERICAN VALUE

In response to the mounting terrorist threat, the United States has increased its counterterrorist efforts with the aim of enhancing the ability of the government to prevent terrorist actions before they occur. These efforts have raised concerns about the potential negative impacts of counterterrorism programs on the privacy and other civil liberties of U.S. citizens, as well as the adequacy of relevant civil liberties protections. Because terrorists blend into law-abiding society, activities aimed at
detecting and countering their actions before they occur inherently raise concerns that such efforts may damage a free, democratic society through well-intentioned steps intended to protect it. One such concern is that law-abiding citizens who come to believe that their behavior is watched too closely by government agencies and powerful private institutions may be unduly inhibited from participating in the democratic process, may be inhibited from contributing fully to the social and cultural life of their communities, and may even alter their purely private and perfectly legal behavior for fear that intimate details of their lives will be revealed and used against them in some manner.

Privacy is, and should continue to be, a fundamental dimension of living in a free, democratic society. An array of laws protects "government, credit, communications, education, bank, cable, video, motor vehicle, health, telecommunications, children's, and financial information; generally carve out exceptions for disclosure of personal information; and authorize use of warrants, subpoenas, and court orders to obtain information."1 These laws usually create boundaries between individuals and institutions (or sometimes other individuals) that may limit what information is collected (as in the case of wiretapping or other types of surveillance) and how that information is handled (such as the fair information practices that seek care and openness in the management of personal information). They may establish rules governing the ultimate use of information (such as prohibitions on the use of certain health information for making employment decisions), access to the data by specific individuals or organizations, or aggregation of these data with other data sets.
The great strength of the American ideal of privacy has been its robustness in the face of new social arrangements, new business practices, and new technologies. As surveillance technologies have expanded the technical capability of the government to intrude into personal lives, the law has sought to maintain a principled balance between the needs of law enforcement and democratic freedoms. Public attitudes, as identified in public opinion polls, mirror this delicate balance.2

For example, public support for counterterrorism measures appears to be strongly influenced by perceptions of the terrorist threat, an assessment of government effectiveness in dealing with terrorism, and perceptions of how these measures are affecting civil liberties. Thus, one finds that since 9/11, public opinion surveys reflect a diminishing acceptance of government surveillance measures: people have become less willing to cede privacy and other civil liberties to expanded terrorism investigation, less willing to give up their own freedoms, and more pessimistic about protection of the right to privacy. Yet recent events, such as the London Underground bombings of July 2005 and reports in August 2006 that a major terrorist attack on transatlantic airliners had been averted, appeared to influence public attitudes; support increased for such surveillance measures as expanded camera surveillance, monitoring of chat rooms and other Internet forums, and expanded monitoring of cellular phones and e-mail. However, public attitudes toward recently revealed monitoring programs are mixed, with no clear consensus.

Public opinion polls also indicate that the public tends to defend civil liberties more vigorously in the abstract than in specific situations. At the same time, people seem to be less concerned about privacy in general (i.e., the privacy of others) than about protecting the privacy of information about themselves. In addition, most people are more tolerant of surveillance when it is aimed at specific racial or ethnic groups, when it concerns activities they do not engage in, or when they are not focusing on its potential personal impact. Thus the perception of threat might explain why passenger screening and searches, both immediately after September 11, 2001, and continuing through 2006, consistently receive high levels of support, while, at the same time, the possibility of personal impact reduces public support for government collection of personal information about travelers.

1 U.S. Congressional Research Service, Privacy: Total Information Awareness Programs and Related Information Access, Collection, and Protection Laws (RL31730), updated March 21, 2003, by Gina Marie Stevens.
2 See Appendix M ("Public Opinion Data on U.S. Attitudes Toward Government Counterterrorism Efforts") for more details. Two caveats apply to the identification of public attitudes through public opinion surveys. The first has to do with the framing of survey questions, in terms of both wording and context, which has been shown to strongly influence the opinions elicited. The second has to do with declining response rates to national sample surveys and the inability to detect or estimate nonresponse bias.
The public is also ambivalent regarding biometric identification technologies and public health uses, such as prevention of bioterrorism and the sharing of medical information. For these, support increases with assurances of anonymity and personal benefit, or when the technologies demonstrate a high degree of reliability and are used with consent.

Legal analysts,3 and even courts,4 if not the larger public, have long recognized that innovation in information and communications technologies often moves faster than the protections afforded by legislation, which is usually written without an understanding of new or emerging technologies, unanticipated terrorist tactics, or new analytical capabilities. Some of these developing technologies are described in Section 1.6 and in greater detail in Appendixes C ("Information and Information Technology") and H ("Data Mining and Information Fusion"). The state of the law and its limitations are detailed in Appendix F ("Privacy-Related Law and Regulation: The State of the Law and Outstanding Issues"). As new technologies are brought to bear in national security and counterterrorism efforts, the challenge is no different from what has been faced in the past with respect to potential new surveillance powers: identify those new technologies that can be used effectively and establish specific rules that govern their use in accordance with basic constitutional privacy principles.5

3 For example, see R.A. Pikowsky, "The need for revisions to the law of wiretapping and interception of email," Michigan Telecommunications & Technology Law Review 10(1), 2004.
4 U.S. Court of Appeals (No. 00-5212; June 28, 2001), p. 10. Available at http://www.esp.org/misc/legal/USCA-DC_00-5212.pdf.
5 "[T]he law must advance with the technology to ensure the continued vitality of the Fourth Amendment," Senate Judiciary Committee Report on the Electronic Communications Privacy Act of 1986 (S. 2575), Report 99-541, 99th Congress, 2nd Session, 1986, p. 5.

1.3 THE ROLE OF INFORMATION

Information and information technology are ubiquitous in today's environment. Massive databases maintained by both governments and private-sector businesses include information about each person and about his or her activities. For example, public and private entities keep bank and credit card records; tax, health, and census records; and information about individuals' travel, purchases, viewing habits, Web search queries, and telephone calls. Merchants record what individuals look at, the books they buy and borrow, the movies they watch, the music they listen to, the games they play, and the places they visit. Other kinds of databases include imagery, such as surveillance video, or location information, such as tracking data obtained from bar code readers or RFID (radio frequency identification) tags. Through formal and informal relationships between government and private-sector entities, much of the data available to the private sector is also available to governments. In addition, digital devices for paying tolls, computer diagnostic equipment in car engines, and global positioning services are increasingly common in passenger vehicles. Cellular telephones and personal digital assistants record not only call and appointment information but also location, transmitting this information to service providers. Internet service providers record online activities; digital cable and satellite systems record what individuals watch and when; alarm systems record when people enter and leave their homes. People back up personal data files online and access online photo, e-mail, and music storage services. Global positioning technologies are appearing in more and more products, and RFID tags are beginning to be used to identify consumer goods, identification documents, pets, and even people. Modern technology offers myriad options for communication between individuals and among small groups, including cell phones, e-mail, chat rooms, text messaging, and various forms of mass media. With voice-over-IP telephone service, digital phone calls are becoming indistinguishable from digital documents: both can be stored and accessed remotely. New sensor technologies enable the tagging and tracking of information about individuals without their permission or awareness.

As noted earlier, the terrorists of today are embedded and operate in this environment. It is not unreasonable to believe that terrorists planning an attack might leave "tracks" or "signatures" in these digital databases and networks and might make use of the communications channels available to all. Extracting terrorist tracks from nonthreat tracks is the goal, but it is not easy. Aspects of a terrorist signature may involve information that is not easily available or easily linked to other information, and some signatures may garner suspicion without representing real threats. However, with appropriate investigative leads, the potential increases that examining these databases, monitoring the contents of terrorist communications, and using other techniques, such as tagging and tracking, may yield valuable clues to terrorist intentions.

These possibilities have not gone unnoticed by the U.S. government, which has increased the number of and investment in counterterrorism programs that collect and analyze information to protect America from terrorism and other threats to public health and safety.6 The government collects information from many industry and government organizations, including telecommunications, electricity, transportation and shipping, law enforcement, customs agents, chemical and biological industries, finance, banking, and air transportation. The U.S.
government also has the technical capability and, under some circumstances, the legal right to collect and hold information about U.S. citizens both at home and abroad. To improve the overall counterterrorism effort, the government has mandated interagency and interjurisdictional information sharing.7 In short, the substantial capability of the U.S. government to collect information about individuals in the United States, as well as that of private-sector corporations and organizations, and the many ways in which advancing technology is improving that capability, necessitate explicit steps to protect against misuse.

6 In this report, the term "program" refers to the resources required to execute a specific function—for example, a counterterrorism program, such as the Terrorist Information Awareness program. A program always involves people executing information-intensive processes. Frequently, a program involves an information system and other information systems with which it exchanges information. Humans are always fully responsible for the actions of a program.
7 Intelligence Reform and Terrorism Prevention Act of 2004, Public Law 108-458, December 17, 2004.

If it were possible to automatically find the digital tracks of terrorists and to automatically monitor only the communications of terrorists, public policy choices in this domain would be much simpler. But it is not possible to do so. All of the data contained in databases and on networks must be analyzed to attempt to distinguish between the data associated with terrorist activities and those associated with legitimate activities. Much of the analysis can be automated, which provides some degree of protection for most personal information: data can be manipulated within the system and restricted from human viewing. However, at some point, the outputs need to be considered and weighed, and some data associated with innocent individuals will necessarily and inevitably be examined by a human analyst—a fact that leads to some of the privacy concerns raised above. (Other privacy concerns, largely rooted in a technical definition of privacy described below, arise from the mere fact that certain individuals are singled out for further attention, regardless of whether a human being sees the data at all.)

In conceptualizing how information is used, it is helpful to consider what might be called the information life cycle. As addressed in greater detail in Appendix C, digital information typically goes through a seven-step life cycle:

1. Collection. Information, whether accurate or inaccurate, is collected by some means, whether in an automated manner (e.g., financial transactions at a point-of-sale terminal or on the Web, call data records in a telecommunications network) or a manual manner (e.g., a Federal Bureau of Investigation (FBI) agent conducting an interview with an informant). Information may often be collected or transmitted (or both) without the subject's awareness. In some instances, the party collecting the information may not be the end user of that information. This is especially relevant in government use of databases compiled by private parties, since laws that regulate government collection of information do not necessarily place comparable restrictions on government use of such information.

2. Correction. Information determined to be erroneous, whether through automated or manual means, may be discarded or corrected. Information determined to be incomplete may be augmented with additional information. Under some circumstances, the person associated with the collected information can make corrections. Information correction is not trivial, especially when large volumes of data are involved. The most efficient and practical means of correcting information may reduce uncertainties but is not likely to eliminate them, and indeed error correction may itself sometimes introduce more error.

3. Storage. Information is stored in data repositories—databases, data warehouses, or simple files.

4. Analysis and processing. Information is used or analyzed, often using query languages, business intelligence tools, or analytical techniques, such as data mining. Analysis may require access to multiple data repositories, possibly distributed across the Internet.

5. Dissemination and sharing. Results of information analysis and processing are published or shared with the intended customer or user community (which may consist of other analysts). Disseminated information may or may not be in a format compatible with users' applications.

6. Monitoring. Information and analytical results are monitored and evaluated to ensure that technical and procedural requirements have been and are continuing to be met. Examples of important requirements include security (Are specified security levels being maintained?), authorization (Are all accesses authorized?), service level agreements (Is performance within promised levels?), and compliance with applicable government regulations.

7. Selective retention or deletion. Information is retained or deleted on the basis of criteria (explicit or implicit) set for the information repository by the steward or by prevailing laws, regulations, or practices. The decreasing cost of storage and the increasing belief in the potential value to be mined from previously collected data are important factors enabling the increase in typical data retention periods. The benefits of retention and enhanced predictive power have to be balanced against the costs of reduced confidentiality. Data retention policies should therefore be regularly justified through an examination of this trade-off.
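Several of these life-cycle stages can be made concrete in a small sketch. The following is a hypothetical illustration only: the Record fields, the Repository class, and the 90-day retention window are invented for this example and do not describe any actual government or commercial system. It shows collection with provenance metadata, correction, storage, query-time access logging (a simple form of monitoring), and retention-based purging.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical sketch of the information life cycle described above.
# All names, fields, and policies are invented for illustration.

@dataclass
class Record:
    subject_id: str
    payload: dict
    collected_at: datetime
    source: str                      # provenance: who collected it, and how
    corrected: bool = False
    audit_log: list = field(default_factory=list)

class Repository:
    def __init__(self, retention: timedelta):
        self.retention = retention   # retention policy set by the data steward
        self.records: list[Record] = []

    def store(self, rec: Record):                      # storage
        self.records.append(rec)

    def correct(self, subject_id: str, fixes: dict):   # correction
        for rec in self.records:
            if rec.subject_id == subject_id:
                rec.payload.update(fixes)
                rec.corrected = True
                rec.audit_log.append(("corrected", list(fixes)))

    def query(self, predicate):                        # analysis and processing
        hits = [r for r in self.records if predicate(r)]
        for r in hits:                                 # monitoring: log each access
            r.audit_log.append(("accessed", datetime.now().isoformat()))
        return hits

    def purge(self, now: datetime):                    # selective retention/deletion
        kept = [r for r in self.records if now - r.collected_at < self.retention]
        removed = len(self.records) - len(kept)
        self.records = kept
        return removed

# Hypothetical usage: one collected record flows through the stages.
repo = Repository(retention=timedelta(days=90))
repo.store(Record("A-1", {"amount": 950}, datetime(2008, 1, 1), source="pos-terminal"))
repo.correct("A-1", {"amount": 95.0})           # the data subject disputes the amount
hits = repo.query(lambda r: r.payload["amount"] < 100)
removed = repo.purge(now=datetime(2008, 6, 1))  # past the 90-day retention window
```

Note that even this toy version records an audit trail for every correction and access; the monitoring and retention questions raised in the text correspond to inspecting that trail and justifying the retention parameter.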
As described, these steps in the information life cycle can be regarded as a notional process for the handling of information. In practice, however, one or more of these steps may be omitted, or the sequencing may be altered or iterated. For example, in some instances data may first be stored and then corrected. Or data may be stored with no correction at all, or processed without being stored at all, as a firewall does.

Additional issues arise when information is assembled or collected from a variety of storage sources for presentation to an analysis application. Assembling such a collection generally entails linking records based on data fields, such as unique identifiers if present and available (identification numbers) or less perfect identifiers (combinations of name, address, and date of birth). The challenge of accurately linking large databases should not be underestimated. In practice, data are often linked with little or no control for accuracy or ability to correct errors in these fields, with the likely outcome that many records will be linked improperly and that many other records that should be linked are not. Without checks on the accuracy of such linkages, there is no way of understanding how errors resulting from linkage may affect the quality or provenance of the subsequent analysis.

Finally, different entities handle information differently because of the standards and regulations imposed on them. The types of information that can be collected, corrected, stored, disseminated, and retained, and by whom, when, and for how long, vary across private industries and government agencies. For example, three different kinds of agencies in the United States have some responsibility for combating terrorism: agencies in the intelligence community (IC), agencies of federal law enforcement (FLE), and agencies of state, local, and tribal law enforcement (SLTLE). The information-handling policies and practices of these different types of agency are governed by different laws and regulations. For example, the information collection policies and practices of SLTLE agencies require the existence of a "criminal predicate" to collect and retain information that identifies individuals and organizations; a criminal predicate refers to the possession of "reliable, fact-based information that reasonably infers that a particularly described … subject has committed, is committing or is about to commit a crime."8 No such predicate is required for the collection of similar information by agencies in the intelligence community. Some FLE agencies (in particular, the FBI and the Drug Enforcement Administration) are also members of the intelligence community, and when (and only when) they are acting in this role, they are not required to have such predicates, either.
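The record-linkage pitfalls discussed above are easy to demonstrate. In this toy sketch (all records, names, and identifiers are invented), linking on a unique identifier finds the one true match, while linking on the quasi-identifiers name and date of birth both misses that match (because of a spelling variant) and falsely joins two different people who happen to share a name and birth date.

```python
# Toy illustration of record linkage across two databases.
# All records are invented; real linkage uses far richer comparison rules.

db_a = [
    {"ssn": "111", "name": "J. Smith", "dob": "1970-01-01", "city": "Dayton"},
    {"ssn": "222", "name": "A. Jones", "dob": "1982-05-17", "city": "Tulsa"},
]
db_b = [
    {"ssn": "111", "name": "John Smyth", "dob": "1970-01-01"},  # same person, variant spelling
    {"ssn": "333", "name": "A. Jones",   "dob": "1982-05-17"},  # different person, same name+dob
]

def link(a_rows, b_rows, keys):
    """Link every pair of records that agrees exactly on all fields in `keys`."""
    return [(a, b) for a in a_rows for b in b_rows
            if all(a.get(k) == b.get(k) for k in keys)]

exact = link(db_a, db_b, ["ssn"])          # unique identifier: one correct link
fuzzy = link(db_a, db_b, ["name", "dob"])  # quasi-identifiers: misses Smith/Smyth
                                           # and falsely joins the two "A. Jones"
```

Here the quasi-identifier linkage produces exactly the two failure modes the text warns about: a missed true match and a spurious one, and nothing in the output distinguishes the spurious link from a correct one without an external accuracy check.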
The rules for information retention and storage are also more restrictive for SLTLE agencies than for IC agencies (or FLE agencies acting in an IC role).

8 D.L. Carter, Civil Rights and Privacy in the Law Enforcement Intelligence Process, Intelligence Program, School of Criminal Justice, Michigan State University, March 2008.

1.4 ORGANIZATIONAL MODELS FOR TERRORISM AND THE INTELLIGENCE PROCESS

A variety of models exists for how terrorist groups are organized, so it is helpful to consider two ends of a spectrum of organizational practices. At one end is a command-and-control model, which also characterizes traditional military organizations and multinational corporations. In this top-down structure, the leaders of the organization are responsible for planning, and they coordinate the activities of operational cells. At the other end of the spectrum is an entrepreneurial model, in which terrorist cells form spontaneously and do their planning and execution without asking anybody's permission or obtaining external support, although they may be loosely coordinated with respect to some overall high-level objective (such as "kill Westerners in large numbers"). In practice, terrorist groups can be found at either end of this spectrum, as well as somewhere in the middle. For example, a terrorist cell might form spontaneously but then make contact with a central organization in order to obtain some funding and technical support (such as a visit by a bomb-making expert).

The spectrum of organizational practice is important because the nature of the organization in question is closely related to the various information flows among elements of the organization. These flows are important because they provide opportunities for disruption and exploitation in counterterrorist efforts. Exploitation in particular is important because it is what yields information that may be relevant to anticipating an attack.

Because it originates spontaneously and organically, the decentralized terrorist group, almost by definition, is usually composed of individuals who blend easily into the society in which they are embedded. Thus, their attack planning and preparation activities are likely to be largely invisible against the background of normal, innocent activities of the population at large. Information on such activities is much more likely to come to the attention of the authorities through tips originating in the relevant neighborhoods or communities or through observations made by local law enforcement authorities.
Although such tips and observations are received in the context of many other tips and observations, some useful and others not, the amount of winnowing necessary in this case is very much smaller than the amount required when the full panoply of normal, innocent activities constitutes the background.

By contrast, the command-and-control terrorist group potentially leaves a more consistent and easily discernible information footprint in the aggregate (although the individual elements may be small, such as a single phone call or e-mail). By definition, a top-down command structure involves regular communication among various elements (e.g., between platoon leaders and company commanders). Against the background noise, such regularities are more easily detected and understood than if the communication had no such structure. In addition, such groups typically either "light up" with increased command traffic or "go dark" prior to conducting an attack. Under these circumstances, there is greater value in a centralized analysis function that assembles the elements into a mosaic.

Although data mining techniques are defined and discussed below in Section 1.6.1, it is important to point out here that different kinds of analytical approaches are suitable in each situation. This report focuses on two general types of data mining techniques (described further in Appendix H): subject-based and pattern-based data mining. Subject-based data mining starts from an initiating individual or other datum that is considered, on the basis of other information, to be of high interest; the goal is to determine what other persons, financial transactions, movements, and so on are related to that initiating datum. Pattern-based data mining looks for patterns (including anomalous data patterns) that might be associated with terrorist activity—these patterns might be regarded as small signals in a large ocean of noise.

In the case of the decentralized group, subject-based data mining is likely to augment and enhance traditional police investigations by making it possible to access larger volumes of data more quickly. Furthermore, communications networks can more easily be identified and mapped if one or a few individuals in the network are known with high confidence. By contrast, pattern-based data mining may be more useful in finding the larger information footprint that characterizes centrally organized terrorist groups.

Note that there is also a role for an analytical function after an attack occurs or a planned attack is uncovered and its participants captured. Under these circumstances, plausible starting points are available to begin an investigation, and this kind of analytical activity follows quite closely the postincident activities in counterespionage: who were these people, who visited them, with whom were they communicating, where did the money come from, and so on.
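The contrast between the two mining approaches can be sketched in a toy example (the call graph, names, and threshold are entirely invented): subject-based mining starts from a known high-interest seed and expands outward only through linked records, while pattern-based mining must scan every individual in the data for a pre-defined pattern.

```python
from collections import deque

# Invented call-record graph: who communicated with whom.
calls = {
    "seed":   ["p1", "p2"],
    "p1":     ["seed", "p3"],
    "p2":     ["seed"],
    "p3":     ["p1"],
    "other":  ["friend"],
    "friend": ["other"],
}

def subject_based(graph, seed, hops):
    """Expand outward from a high-interest seed, up to `hops` links away."""
    seen, frontier = {seed}, deque([(seed, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == hops:
            continue
        for nbr in graph.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, d + 1))
    return seen

def pattern_based(graph, threshold):
    """Flag every node matching a pre-defined pattern -- here, simply
    'unusually many contacts'. Note that everyone is examined."""
    return {n for n, nbrs in graph.items() if len(nbrs) >= threshold}

network = subject_based(calls, "seed", hops=2)  # touches only records linked to the seed
flagged = pattern_based(calls, threshold=2)     # scans the whole population
```

The subject-based expansion mirrors both the prospective case (mapping a network from one known individual) and the postincident tracing described above; the pattern-based scan, by contrast, examines every record in the population, which is one root of the privacy concerns this chapter discusses.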
Such postincident efforts (often known as "rolling up the network") serve both a prosecutorial function, in seeking to bring the perpetrators to justice, and a prophylactic function, in seeking to prevent others in the network from carrying out further terror attacks.

1.5 ACTIVITIES OF THE INTELLIGENCE COMMUNITY AND OF LAW ENFORCEMENT AGENCIES

The intelligence community is responsible for protecting U.S. national security from threats that have been defined by the executive branch. When threats are defined, further information (i.e., "intelligence requirements") is sought to understand the status and operations of the threat, from which intervention strategies are developed to prevent or mitigate the threat. The information collection and management process for the intelligence community is driven by presidential policy. In contrast, law enforcement agencies identify threats based on behaviors that are specifically identified as criminal (i.e., with the Fourth Amendment requirement of particularity). The law enforcement approach
life have become ubiquitous.27 Such transactions include detailed information about individuals’ behavior, communications, and relationships. At the same time, people who live in modern society do not have a real choice to refrain from leaving behind such trails. Even in the 1970s when Miller and Smith were decided, individuals who wrote checks and made telephone calls did not voluntarily convey information to third parties—they had no choice but to convey the information if they wanted to make large-value payments or communicate over physical distances. And in those cases, the third parties did not voluntarily supply the records to the government. Financial institutions are required to keep records (ironically, this requirement is found in the Right to Financial Privacy Act), and telephone companies are subject to a similar requirement about billing records. In both cases, the government demanded the records. And, at the same time, the information collected and stored by banks and telephone companies is subject to explicit or implicit promises that it will not be further disclosed. Most customers would be astonished to find their checks or telephone billing records printed in the newspaper. Today, such transactional records may be held by more private parties than ever before. For example, a handful of service providers already process, or have access to, the large majority of credit and debit card transactions, automated teller machine (ATM) withdrawals, airline and rental car reservations, and Internet access, and the everyday use of a credit card or ATM card involves the disclosure of personal financial information to multiple entities.
In addition, digital networks have facilitated the growth of vigorous outsourcing markets, so information provided to one company is increasingly likely to be processed by a separate institution, and customer service may be provided by another. And all of those entities may store their data with still another. Moreover, there are information aggregation businesses in the private sector that already combine personal data from thousands of private-sector sources and public records. They maintain rich repositories of information about virtually every adult in the country, which are updated daily by a steady stream of incoming data.28 Finally, in this view, the fact that all of the data in question are in digital form means that increasingly powerful tools—such as automated data mining—can be used to analyze it, thereby reducing or eliminating privacy protections that were previously based on obscurity and difficulty 27 K.M. Sullivan, “Under a watchful eye: Incursions on personal privacy,” pp. 128-146 in The War on Our Freedoms: Civil Liberties in an Age of Terrorism (R.C. Leone and G. Anrig Jr., eds.), The Century Foundation, New York, N.Y., 2003. 28 See, generally, U.S. Government Accountability Office, Personal Information: Agency and Reseller Adherence to Key Privacy Principles, GAO 06-421, Washington, D.C., 2006.
of access to the data. The impact of Miller in 1976 was limited primarily to government requests for specific records about identified individuals who had already done something to warrant the government’s attention, whether or not the suspicious activity amounted to probable cause. Today, the Miller and Smith decisions allow the government to obtain the raw material on millions of individuals without any reason for identifying anyone in particular. Thus, in this view, the argument suggests that by removing the protection of the Fourth Amendment from all of these records solely because they are held by third parties, there is a significant reduction in the constitutional protection for personal privacy—not as the result of a conscious legal decision, but through the proliferation of digital technologies. In short, under current Fourth Amendment jurisprudence, all personal information in third-party custody, no matter how sensitive or how revealing of a person’s health, finances, tastes, or convictions, is available to the government without constitutional limit. The government’s demand need not be reasonable, no warrant is necessary, no judicial authorization or oversight is required, and it does not matter if the consumer has been promised by the third party that his or her data would be kept confidential as a condition of providing the information. A contrary view is that Miller and Smith are important parts of the modern Fourth Amendment and that additional privacy protections in this context should come from Congress rather than the courts. According to this view, Miller and Smith ensure that there are some kinds of surveillance that the government can conduct without a warrant.
Fourth Amendment doctrine has always left a great deal of room for unprotected activity, such as what happens in public: the fact that the police can watch in public areas for criminal activity without being constrained by the Fourth Amendment is critical to the balance of the Fourth Amendment’s rule structure. In switching from physical activity to digital activity, everything becomes a record. If all records receive Fourth Amendment protection, treating every record as private, the equivalent of something inside the home, then the government will have considerable difficulty monitoring criminal activity without a warrant. In effect, under this interpretation, the Fourth Amendment would apply much more broadly to records-based and digital crimes than it does to physical crimes, and all in a way that would make it very difficult for law enforcement to conduct successful investigations. In this view, the best way forward is for the Supreme Court to retain Smith and Miller and for Congress to provide statutory protections when needed, much as it has done with the enactment of privacy laws, such as the Electronic Communications Privacy Act. Given these contrasting perspectives and the important issues they
raise, the constitutional and policy challenges for the future are to decide—explicitly and in light of new technological developments—the appropriate boundaries of Fourth Amendment jurisprudence regarding the disposition of data held by third parties. The courts are currently hearing cases that help get to this question; so far they have indicated that noncontent information is covered by Miller but that content information receives full Fourth Amendment protection. But these cases are new and may be overturned, and it will be some years before clearer boundaries emerge definitively. 1.8.4 False Positives, False Negatives, and Data Quality29 False positives and false negatives arise in any kind of classification exercise.30 For example, consider a counterterrorism exercise in which it is desirable to classify each individual in a set of people as “not worthy of further investigation/does not warrant obtaining more information on these people” or “worthy of further investigation/does warrant obtaining more information on these people,” based on an examination of data associated with each individual. A false positive is someone placed in the latter category who has no terrorist connection. A false negative is someone placed in the former category who has a terrorist connection. Consider a naïve notional system in which a computer program or a human analyst examines the data associated with each individual, searching for possible indications of terrorist attack planning. This examination results in a score for each individual that indicates the relative likelihood of him or her being “worthy of further investigation” relative to all of the others being examined.31 When all of the individuals are examined, they are sorted according to this score.
This rank ordering does not, in itself, determine the classification—in addition, a threshold must be established to determine what scores will correspond to each category. The critical point here is that setting this threshold is the responsibility of a human analyst—technology does not, 29 This section is adapted largely from National Research Council, Engaging Privacy and Information Technology in a Digital Age, The National Academies Press, Washington, D.C., 2007, Chapter 1. 30 An extensive treatment of false positives and false negatives (and the trade-offs thereby implied) can be found in National Research Council, The Polygraph and Lie Detection, The National Academies Press, Washington, D.C., 2003. 31 The score calculated by any given system may simply be an index with only ordinal (rank-ordering) properties. If more information is available and a more sophisticated analytical approach is possible, the score may be an actual Bayesian probability or likelihood that could be manipulated quantitatively in accordance with the mathematics of probability and statistics.
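The rank-then-threshold procedure described above can be sketched in a few lines. The scores and names here are invented for illustration; in a real system the scores would come from an analytic model, and the threshold would be chosen by a human analyst:

```python
# Notional scores for four hypothetical individuals (purely illustrative).
scored = {"alice": 0.91, "bob": 0.40, "carol": 0.72, "dave": 0.15}

def classify(scores, threshold):
    """Rank individuals by score and split them at an analyst-chosen
    threshold into 'worthy of further investigation' and 'not worthy'."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    flagged = [p for p in ranked if scores[p] >= threshold]
    cleared = [p for p in ranked if scores[p] < threshold]
    return flagged, cleared

# A higher threshold flags fewer people (fewer false positives, more
# false negatives); a lower threshold does the opposite.
print(classify(scored, 0.7))  # (['alice', 'carol'], ['bob', 'dave'])
print(classify(scored, 0.3))  # (['alice', 'carol', 'bob'], ['dave'])
```

Note that the technology produces only the ranking; the two calls with different thresholds yield different classifications of the same people, which is precisely why the threshold decision cannot be delegated to the system itself.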
indeed cannot, set this threshold. Moreover, it is likely that the appropriate setting of a threshold depends on the consequences for the individual being miscategorized. If the real-world consequence of a false positive for a given individual is being denied boarding of an airplane, rather than merely having more records about that individual examined, one may wish greater certainty to reduce the likelihood of a false positive—this desire would tend to drive the threshold higher in the first instance than in the second. In addition, any real analyst will not be satisfied with a system that impedes the further investigation of someone whose score is below the threshold. That is, an analyst will want to reserve the right (have the ability) to designate for further examination an individual who may have been categorized as below threshold—to say, in effect, “That guy has a lower score than most of the others, but there’s something strange about him anyway, and I want to look at him more closely even if he is below threshold.” Because the above approach is focused on individuals, any realistic setting of a threshold is likely to result in enormous numbers of false positives. One way to reduce the number of false positives significantly is to exploit the fact that terrorists—especially those with big plans in mind—are most likely to operate in small groups (also known as cells). Thus, a more sophisticated system could consider a different unit of analysis—groups of individuals rather than individuals—that might be worth further investigation.
This approach, known as collective inference, focuses on analyzing large collections of records simultaneously (e.g., people, places, organizations, events, and other entities).32 Conceptually, the output of this system could be a rank ordering of all possible groups (combinations) of two individuals, another rank ordering of all possible groups of three individuals, and so on. Once again, thresholds would be set to determine groups that were worth further investigation. The rank orderings resulting from a group-oriented analysis could also be used to rule out individuals who might otherwise be classified as worthy of further investigation—if an individual with an above-threshold score was not found among the groups with above-threshold scores, that individual would be either a lone wolf or clearly seen to be a false positive and thus eliminated before the investigation went any further. A “brute-force” search of all possible groups of two, of three, and so on when the population in question is that of the United States is daunting, to say the least. But in practice, most of those groups will be individuals with no plausible connections among them, and thus the 32 More detail on these ideas can be found in D. Jensen, M. Rattigan, and H. Blau, “Information awareness: A prospective technical assessment,” Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003, available at http://kdl.cs.umass.edu/papers/jensen-et-al-kdd2003.pdf.
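A minimal sketch of the pair-level version of this idea follows. It enumerates candidate groups of two, prunes any pair whose members have never communicated (the early-elimination criterion discussed below), and scores the survivors. The scores, the communication records, and the averaging rule for combining individual scores are all illustrative assumptions:

```python
from itertools import combinations

# Hypothetical per-person scores and observed communication links.
scores = {"p1": 0.9, "p2": 0.8, "p3": 0.7, "p4": 0.1}
talked = {("p1", "p2"), ("p3", "p4")}

def candidate_pairs(scores, talked, threshold):
    """Enumerate pairs of individuals, pruning any pair whose members
    have never communicated, then keep pairs scoring above threshold."""
    pairs = []
    for a, b in combinations(sorted(scores), 2):
        if (a, b) not in talked and (b, a) not in talked:
            continue  # pruned without examining any further records
        group_score = (scores[a] + scores[b]) / 2  # toy combination rule
        if group_score >= threshold:
            pairs.append((a, b))
    return pairs

print(candidate_pairs(scores, talked, 0.5))  # [('p1', 'p2')]
```

Of the six possible pairs, four are pruned outright for lack of any communication link, illustrating how such criteria tame the combinatorial explosion—at the cost, as the text notes, of risking false negatives whenever the pruning criterion is wrong.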
records containing information about those groups need not be examined. Identifying such groups is a problem, but other techniques may be useful in eliminating some groups at a fairly early stage—for example, if a group does not contain individuals who have communicated with each other, that group might be eliminated from further consideration. All such criteria also run the risk of incurring false negatives, and it remains to be seen how useful such pruning efforts are in practice. False positives and false negatives arise from two other sources. One is the validity of the model used to distinguish between terrorists and innocent individuals. A perfectly valid model of a terrorist is one in which a set of specific measurable characteristics, if correctly associated with a given individual, would correctly identify that individual as a terrorist with 100 percent accuracy, and other individuals lacking one or more of those characteristics would be correctly identified as innocent. Of course, in the real world, no model is perfect, and so false positives and false negatives are inevitable from the imperfection of models. The second and independent source of false positives and false negatives is imperfect data. That is, even if a model were perfect, in the real world, the data asserted to be associated with a given individual may not in fact be associated with that individual. For example, an individual’s height may be recorded as 6.1 meters, whereas his height may in fact be 1.6 meters. Her religion may be recorded as Protestant, but in fact she may be a practicing Catholic. Such data errors arise for a wide range of reasons, including keyboarding errors, faulty intelligence, errors of translation, and so on.
Improving data quality can thus reduce the rate of false positives and false negatives, but only up to the limits inherent in the imperfections of the model. Since models, for computability, abstract only some of the variables and behaviors of reality, they are by design imperfect. Model imperfections are a built-in source of error, and better data cannot compensate for a model’s inadequacies. Model inadequacies stem from several possible sources: (1) the required data for various characteristics in the assumed causal model may not be available, (2) some variables may be left out to simplify computations, (3) some variables that are causal may be available but unknown, (4) the precise form of the relationship between the predictor variables and the assessment of degree of interest is unknown, (5) the form of the relationship may be simplified to expedite computation, and (6) the phenomenon may be dynamic in nature, and therefore any datedness in the inputs could cause erroneous predictions. Data quality is the property of data that allows them to be used effectively and rapidly to inform and evaluate decisions.33 Ideally, data should 33 A.F. Karr, A.P. Sanil, and D.L. Banks, “Data quality: A statistical perspective,” Statistical Methodology 3:137-173, 2006; T.C. Redman, “Data: An unfolding quality disaster,”
be correct, current, complete, and relevant. Data quality is intimately related to false positives and false negatives, in that it is intuitively obvious that using data of poor quality is likely to result in larger numbers of false positives and false negatives than would be the case if the data were of high quality. Data quality is a multidimensional concept. Measurement error and survey uncertainty contribute (negatively) to data quality, as do issues related to measurement bias. Many issues arise as the result of missing data fields; inconsistent data fields in a given record, such as recording a pregnancy for a 9-year-old boy; data incorrectly entered into the database, such as that which might result from a typographical error; measurement error; sampling error and uncertainty; timeliness (or lack thereof); coverage or comprehensiveness (or lack thereof); improperly duplicated records; data conversion errors, as might occur when a database of vendor X is converted to a comparable database using technology from vendor Y; use of inconsistent definitions over time; and definitions that become irrelevant over time. All of the foregoing discussion relates to the implications of measurement error that could easily arise in a given environment or database. However, when data come from multiple databases, they must be linked, and the methodology for performing data linkages in the absence of clear, unique identifiers is probabilistic in nature.
Even in well-designed record linkage studies, such as those developed by the Census Bureau, automated matching is capable of reliably matching only about 75 percent of the people (although some appreciable fraction of the remainder are not matchable), and hand-matching of records is required to reduce the remaining number of unresolved cases.34 The difficulty of reliable matching, superimposed on measurement error, will inevitably produce much more substantial problems of false positives and false negatives than most analysts recognize. Data issues also arise as the result of combining databases—syntactic inconsistencies (one database records phone numbers in the form 202-555-1212 and another in the form 2025551212); semantic inconsistencies (weight measured in pounds vs. weight measured in kilograms); different DM Review Magazine, August 2004, available at http://www.dmreview.com/article_sub.cfm?articleId=1007211; W.W. Eckerson, “Data warehousing special report: Data quality and the bottom line,” Application Development Trends Magazine, May 1, 2002, available at http://www.adtmag.com/article.aspx?id=6303; Y. Wand and R. Wang, “Anchoring data quality dimensions in ontological foundations,” Communications of the ACM 39(11):86-95, November 1996; and R. Wang, H. Kon, and S. Madnick, “Data quality requirements analysis and modelling,” Ninth International Conference of Data Engineering, Vienna, Austria, 1993. 34 M.J. Anderson and S.E. Fienberg, Who Counts? The Politics of Census-Taking in Contemporary America, Russell Sage Foundation, New York, 1999, p. 70.
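Some of the syntactic and semantic inconsistencies named below can be reduced by normalizing records before linkage is attempted. The sketch below shows this for two of the examples in the text—phone-number formats and weight units. The field names and record values are hypothetical:

```python
import re

def normalize(record):
    """Normalize a record before linkage: unify phone-number syntax
    and convert weights recorded in pounds to kilograms."""
    out = dict(record)
    # Syntactic: reduce phone numbers to digits only, so that
    # "202-555-1212" and "2025551212" become identical strings.
    if "phone" in out:
        out["phone"] = re.sub(r"\D", "", out["phone"])
    # Semantic: convert weight in pounds to kilograms (1 lb = 0.453592 kg).
    if out.get("weight_unit") == "lb":
        out["weight"] = round(out["weight"] * 0.453592, 1)
        out["weight_unit"] = "kg"
    return out

a = normalize({"phone": "202-555-1212", "weight": 154, "weight_unit": "lb"})
b = normalize({"phone": "2025551212", "weight": 69.9, "weight_unit": "kg"})
print(a["phone"] == b["phone"], a["weight"] == b["weight"])  # True True
```

Before normalization these two records would disagree on every field; afterward they match exactly, illustrating why such cleaning is a prerequisite for the probabilistic matching discussed above rather than a substitute for it.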
provenance for different databases; inconsistent data fields for records contained in different databases on a given data subject; and lack of universal identifiers to specify data subjects. Missing data are a major cause of reduction in data quality. In the situation in which network linkages are of interest and are directly represented in a database, the problem of missing data can sometimes be easier and sometimes more challenging than in the case of a rectangular file. A rectangular file usually consists of a list of individuals with their associated characteristics. In this situation, missing data can be of three general types: item nonresponse, unit nonresponse, and undercoverage. Item and unit nonresponse, while certainly problematic in the current context, are limited in impact and can sometimes be addressed using such techniques as imputation. Even undercoverage, while troubling, is at least limited to the data for the individual in question. (If such an individual is represented on another database to which one has access, merging and unduplicating operations can be helpful to identification, and estimates of the number of omissions can be developed using dual-systems estimation.) On one hand, when the appropriate unit of analysis is networks of individuals (i.e., the individuals and their characteristics along with the various linkages between them are represented as being present or absent), the treatment of missing data can be easier when linkages from other individuals present in a database, such as phone calls, e-mails, or the joint issuance of plane tickets, can help inform the analyst of another individual’s existence for whom no direct information was collected. On the other hand, treating missing data can also be a challenging problem.
If the data for a person in a network are missing, not only is the information on that individual unavailable, but also the linkages between that person and others may be missing. This can have a substantial impact on the data for the missing individual, as well as the data for the other members of the group in the network and even the structure of the network, since in an extreme case it may be that the missing individual is the sole linkage between two otherwise separate groups. It is likely that existing missing data techniques can be adapted to provide some assistance in the less extreme cases, but at this point this is an area in which additional research may be warranted. False positives and false negatives are in some sense complementary for any given database and given analytical approach. More precisely, for a given database and analytical approach, one can drive the rate of false positives to zero or the rate of false negatives to zero, but not simultaneously. Decreases in the false positive rate are inevitably accompanied by increases in the false negative rate and vice versa, although not necessarily in the same proportion. However, as the quality of the data is
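The extreme case—a missing individual who is the sole linkage between two otherwise separate groups—can be illustrated with a toy network. The edges and names below are invented for illustration:

```python
# A toy network in which one person ("m") is the sole link between
# two otherwise separate groups; dropping m's record splits the network.
edges = [("a", "b"), ("b", "m"), ("m", "c"), ("c", "d")]

def components(edges, missing=()):
    """Connected components after dropping any 'missing' individuals
    and every linkage that touches them."""
    adj = {}
    for u, v in edges:
        if u in missing or v in missing:
            continue  # a missing person takes his or her linkages too
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps

print(len(components(edges)))                 # 1: one connected network
print(len(components(edges, missing={"m"})))  # 2: m's absence splits it
```

With the full data the analyst sees one five-person network; with m's record missing, the same data appear to describe two unrelated pairs, which is exactly the structural distortion described above.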
improved or if the classification technique is improved, it is possible to reduce both the false positive rate and the false negative rate, provided an accurate model for true positives and negatives is used. Both false positives and false negatives pose problems for counterterrorist efforts. In the case of false positives, a counterterrorism analyst searching for evidence of terrorist attack planning may obtain personal information on a number of individuals. All of these individuals surrender some privacy, and those who have not been involved in terrorist activity (the false positives) have had their privacy violated or their rights compromised despite the lack of such involvement. Moreover, the use of purloined identities—identity theft—has enabled various kinds of fraud and evasion of law enforcement already. If terrorists are able to assume other identities, not only will that capability enable them to evade some detection and obfuscate the data used in the models—that is, deliberately manipulate the system, resulting in the generation of false positives against innocent individuals—but it might also result in extreme measures being taken against the innocent individuals whose identities have been stolen. Every false positive also has an opportunity cost; that is, it is associated with a waste of resources—precious investigative or analytical resources that are expended in the investigation of an innocent individual. In addition, false positives put pressure on officials to justify the expenditure of such resources, and such pressures may also lead to abuses against innocent individuals. From an operational standpoint, the key question is how many false alarms are acceptable. If one has infinite resources, it is easy to investigate every false alarm that may emerge from any system, no matter how poor its performance.
But in the real world of constrained resources, it is necessary to balance the number of false alarms against the resources available to investigate them as well as the severity of the perceived threat. Furthermore, it is also important to consider other approaches that might be profitably applied to the problem, as well as other security issues in need of additional effort. False negatives are also a problem and the nightmare of the intelligence analyst. A false negative is someone who should be under suspicion and is not. That is, the analyst simply misses the terrorist. From a political standpoint, the only truly acceptable number for false negatives is zero—but this political requirement belies the technical reality that the number of false negatives can never be zero. Moreover, identifying false negatives in any given instance may be problematic. In the case of the terrorist investigation, it is essentially impossible to know with certainty if a person is a false negative until he or she is known to have committed a terrorist act. False positives and false negatives (and data quality, because it affects
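The scale of the false-alarm problem follows from simple base-rate arithmetic. The sketch below uses illustrative assumptions (a screened population of 300 million, 1,000 true terrorists, and a classifier that is 99 percent sensitive and 99.9 percent specific); none of these figures comes from the report itself:

```python
# Back-of-the-envelope base-rate arithmetic: even a highly accurate
# screen applied to a large population yields far more false alarms
# than true ones. All figures are illustrative assumptions.
population = 300_000_000   # people screened
terrorists = 1_000         # assumed true positives in the population
tpr = 0.99                 # true positive rate (sensitivity)
fpr = 0.001                # false positive rate (99.9% specificity)

true_alarms = terrorists * tpr
false_alarms = (population - terrorists) * fpr

print(int(true_alarms))    # 990
print(int(false_alarms))   # 299999: roughly 300 false alarms per true one
```

Under these assumptions an analyst would face about 300 false positives for every true positive, which is why the number of acceptable false alarms must be weighed against available investigative resources and the severity of the threat.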
both false positives and false negatives) are important in a discussion of privacy because they are the language in which the trade-offs between privacy and other needs are often cast. One might argue that the consequences of a false negative (a terrorist plan is not detected and many people die) are in some sense much larger than the consequences of a false positive (an innocent person loses privacy or is detained). For this reason, many decision makers assert that it is better to be safe than sorry. But this argument is fallacious. There is no reason to expect that false negatives and false positives trade off against one another in a one-for-one manner. In practice, the trade-off will almost certainly entail one false negative against an enormous number of false positives, and a society that tolerates too much harm to innocent people based on a large number of false positives is no longer a society that respects civil liberties. 1.8.5 Oversight and Prevention of Abuse Administrators of government agencies face enormous challenges in ensuring that policies and practices established by higher authorities (e.g., Congress, the Executive Office of the President, the relevant agency secretary or director) are actually followed in the field by those who do the day-to-day work of the agency. In the counterterrorism context, one especially important oversight responsibility is to ensure that the policies and practices meant to protect citizen privacy are followed in a mission environment that is focused on ensuring transportation safety, protecting borders, and pursuing counterterrorism. Challenges in this domain arise not only from external pressures based on public concern over privacy but also from internal struggles about how to motivate high performance while adhering to legal requirements and staying within budget.
Preventing privacy abuses from occurring is particularly important in a counterterrorism context, since privacy abuses can erode support for efforts that might in fact have some effectiveness in or utility for the counterterrorist mission. In this context, abuse refers to practices that result in a dissemination of personally identifiable information and thereby violate promised, implied, or legally guaranteed confidentiality or civil liberties.35 This point implies that oversight must go beyond the enforcement 35 Personally identifiable information (PII) refers to any information that identifies or can be used to identify, contact, or locate the person to whom such information pertains. This includes information that is used in a way that is personally identifiable, including linking it with identifiable information from other sources, or from which other personally identifiable information can easily be derived, including, but not limited to, name, address, phone number, fax number, e-mail address, financial profiles, Social Security number, credit card information, and in some cases Internet IP address. Although PII is also said to not include information collected anonymously, the discussion above suggests that the ability to make
of rules and procedures established to cover known and anticipated situations, to be concerned with unanticipated situations and circumstances. Oversight can occur at the planning stage to approve intended operations, during execution to monitor performance, and retrospectively to assess previous performance so as to guide future improvements. Effective oversight may help to improve trust in government agencies and enhance compliance with stated policy. 1.9 THE NEED FOR A RATIONAL ASSESSMENT PROCESS In the years since the September 11, 2001, attacks, the U.S. government has initiated a variety of information-based counterterrorist programs that involved data mining as an important component. It is fair to say that a number of these programs, including the Total Information Awareness program and the Computer-Assisted Passenger Prescreening System II (CAPPS II), generated significant controversy and did not meet the test of public acceptability, leaving aside issues of technical feasibility and effectiveness. Such outcomes raise the question of whether the nature and character of the debate over these and similar programs could have been any different if policy makers had addressed in advance some of the difficult questions raised by a program. Although careful consideration of the privacy impact of new technologies is necessary even before a program seriously enters the research stage, it is interesting and important to consider questions in two categories: effectiveness and consistency with U.S. laws and values. The threshold consideration of any privacy-sensitive technology is whether it is effective toward a clearly defined law enforcement or national security purpose. The question of effectiveness must be assessed through rigorous testing guided by scientific standards.
Research on the question of how large-scale data analytical techniques, including data mining, could help the intelligence community identify potential terrorists is certainly a reasonable endeavor. Assuming that the initial scientific research justifies additional effort based on the scientific community’s standards of success, that work should continue, but it must be accompanied by a clear method for assessing the reliability of the results. an identification may depend both on the specific values of the PII in question and on the ability to aggregate data in ways that reduce significantly or even eliminate the anonymity originally promised or implied. Thus, information that previously was not PII may at a later date become PII as new techniques are developed or as other non-PII information becomes available. In short, the definition of PII can easily vary with context. For more discussion, see National Research Council, Engaging Privacy and Information Technology in a Digital Age, The National Academies Press, Washington, D.C., 2007.
Even if a proposed technology is effective, it must also be consistent with existing U.S. law and democratic values. Addressing this issue may involve a two-part inquiry. One must assess whether the new technique and objective comply with existing law, yet the inquiry cannot end there. Inasmuch as some programs seek to enable the deployment of very large-scale data mining over a larger universe of data than the U.S. government has previously analyzed, the fact that a given program complies with existing law does not establish that such surveillance practice is consistent with democratic values. A framework for decision making about information-based programs couched in terms of questions in these two categories is presented in Chapter 2.