Protecting Individual Privacy in the Struggle Against Terrorists: A Framework for Program Assessment

I Illustrative Government Data Mining Programs and Activity

Several federal agencies have sought to use data mining to reduce the risk of terrorism, including the Department of Defense (DOD), the Department of Homeland Security (DHS), the Department of Justice (DOJ), and the National Security Agency (NSA). Some of the data mining programs have been withdrawn; some are in operation; some have changed substantially in scope, purpose, and practice since they were launched; and others are still in development. This appendix briefly describes a number of the programs, their stated goals, and their current status (as far as is known publicly).1

The programs described vary widely in scope, purpose, and sophistication. Some are research efforts focused on the fundamental science of data mining; others are intended as efforts to create general toolsets and developer toolkits that could be tailored to meet various requirements. Most of the programs constitute specific deployments of one or more forms of data mining technology intended to achieve particular

1 A 2004 U.S. Government Accountability Office (GAO) report provided a comprehensive survey of data mining systems and activities in federal agencies up to that time. See GAO, Data Mining: Federal Efforts Cover a Wide Range of Uses, GAO-04-548, GAO, Washington, D.C., May 2004. Other primary resources: J.W. Seifert, Data Mining and Homeland Security: An Overview, RL31798, Congressional Research Service, Washington, D.C., updated June 5, 2007; U.S. Department of Homeland Security (DHS), "Data Mining Report: DHS Privacy Office Response to House Report 108-774," DHS, Washington, D.C., July 6, 2006; DHS Office of Inspector General, "Survey of DHS Data Mining Activities," OIG-06-56, DHS, Washington, D.C., August 2006.
operational goals. The programs vary widely in the sophistication of the technologies used to achieve operational goals; they also vary widely in the sources of data used (such as government data, proprietary information from industry groups, and data from private data aggregators) and in the forms of the data (such as structured and unstructured). The array of subject matter of the projects is broad: they cover law enforcement, terrorism prevention and pre-emption, immigration, customs and border control, financial transactions, and international trade. Indeed, the combination of the variety of applications and the variety of definitions of what constitutes data mining makes any overall assessment of data mining programs difficult.

The scientific basis of many of these programs is uncertain or at least not publicly known. For example, it is not clear whether any of the programs have been subject to independent expert review of performance. This appendix is intended to be primarily descriptive, and the mention of a given program should not be taken as an endorsement of its underlying scientific basis.

I.1 TOTAL/TERRORISM INFORMATION AWARENESS (TIA)

Status: Withdrawn as such, but see Appendix J for a description.

I.2 COMPUTER-ASSISTED PASSENGER PRESCREENING SYSTEM II (CAPPS II) AND SECURE FLIGHT

Status: CAPPS II abandoned; Secure Flight planned for deployment in 2008.

In creating the Transportation Security Administration (TSA), Congress directed that it implement a program to match airline passengers against a terrorist watch list. CAPPS II was intended to fulfill that directive. It was defined as a prescreening system whose purpose was to enable TSA to assess and authenticate travelers' identities and perform a risk assessment to detect persons who may pose a terrorist-related threat. However, it went beyond the narrow directive of checking passenger information against a terrorist watch list and included, for instance, assessment of criminal threats. According to the DHS fact sheet on the program, CAPPS II was to be an integral part of its layered approach to security, ensuring that travelers who are known or potential threats to aviation are stopped before they or their baggage board an aircraft.2 It was meant to be a rule-based system that used information provided by the passenger (name, address, telephone number, and date of birth) when purchasing an airline ticket to determine whether the passenger required additional screening or should be prohibited from boarding. CAPPS II would have examined both commercial and government databases to assess the risk posed by passengers.

In an effort to address privacy and security concerns surrounding the program, DHS issued a press release about what it called myths and facts about CAPPS II.3 For instance, it stated that retention of the data collected would be limited: all data collected and created would be destroyed shortly after the completion of a traveler's itinerary. It also said that no data mining techniques would be used to profile and track citizens, although assessment would have extended beyond checking against lists and would have included examining a wide array of databases. A study by GAO in 2004 found that TSA was sufficiently addressing only one of eight key issues related to implementing CAPPS II.4 The study found that accuracy of data, stress testing, abuse prevention, prevention of unauthorized access, policies for operation and use, privacy concerns, and a redress process were not fully addressed by CAPPS II.

Despite efforts to allay concerns, CAPPS II was abandoned in 2004. It was replaced in August 2004 with a new program called Secure Flight. Secure Flight is designed to fulfill the Congressional directive while attempting to address a number of concerns raised by CAPPS II. For instance, unlike CAPPS II, Secure Flight makes TSA responsible for cross-checking passenger flight information with classified terrorist lists rather than allowing such checking to be done by contracted vendors.

2 U.S. Department of Homeland Security, "Fact Sheet: CAPPS II at a Glance," February 13, 2004, available at http://www.dhs.gov/xnews/releases/press_release_0347.shtm.
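The kind of rule-based watch-list check described above can be sketched in a few lines. Everything here is a hypothetical illustration: the list entries, the similarity threshold, and the decision categories are invented, since the actual CAPPS II and Secure Flight rules and data were never made public.

```python
from difflib import SequenceMatcher

# Hypothetical watch-list entries; real lists and matching rules are classified.
NO_FLY = ["DOE, JOHN"]
SELECTEE = ["ROE, RICHARD"]

def similar(a: str, b: str) -> float:
    """Crude name similarity in [0, 1]; real systems use far more
    sophisticated phonetic and transliteration-aware matching."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()

def prescreen(name: str, threshold: float = 0.85) -> str:
    """Return 'DENY', 'SELECTEE', or 'CLEARED' for a passenger name."""
    if any(similar(name, w) >= threshold for w in NO_FLY):
        return "DENY"        # prohibited from boarding
    if any(similar(name, w) >= threshold for w in SELECTEE):
        return "SELECTEE"    # additional screening required
    return "CLEARED"

print(prescreen("Doe, Jon"))      # near-match to a no-fly entry
print(prescreen("Smith, Jane"))   # no match
```

The fuzzy threshold is the crux of the policy problem the text describes: set it high and spelling variants slip through; set it low and innocent passengers are misidentified, which is why a redress process for false matches recurs throughout the GAO critiques.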
Although the possibility of using commercial databases to check for threats was initially contemplated, the use of commercial data is now precluded.5 Other differences between CAPPS II and Secure Flight include limiting screening to checking for terrorism threats, not criminal offenses (although these were initially included), and using only historical data during testing phases. TSA states that the mission of Secure Flight is "to enhance the security of domestic commercial air travel within the United States through the

3 U.S. Department of Homeland Security, "CAPPS II: Myths and Facts," February 13, 2004, available at http://www.dhs.gov/xnews/releases/press_release_0348.shtm.
4 U.S. Government Accountability Office (GAO), Aviation Security: Computer-Assisted Passenger Prescreening System Faces Significant Implementation Challenges, GAO-04-385, GAO, Washington, D.C., February 2004.
5 U.S. Transportation Security Administration, "Secure Flight: Privacy Protection," available at http://www.tsa.gov/what_we_do/layers/secureflight/secureflight_privacy.shtm.
use of improved watch list matching."6 According to TSA, when implemented, Secure Flight would:

- Decrease the chance of compromising watch-list data by centralizing use of comprehensive watch lists.
- Provide earlier identification of potential threats, allowing for expedited notification of law-enforcement and threat-management personnel.
- Provide a fair, equitable, and consistent matching process for all aircraft operators.
- Offer consistent application of an expedited and integrated redress process for passengers misidentified as posing a threat.

However, Secure Flight has continued to raise concerns about privacy, abuse, and security. A 2006 GAO study of the program found that although TSA had made some progress in managing risks associated with developing and deploying Secure Flight, substantial challenges remained.7 After publication of the study report, TSA announced that it would reassess the program and make changes to address concerns raised in the report. The 2006 DHS Privacy Office report on data mining did not include an assessment of Secure Flight; it stated that searches or matches are done with a known name or subject and thus do not meet the definition of data mining used in the report.8 In a prepared statement before the Senate Committee on Commerce, Science, and Transportation in January 2007, the TSA administrator noted progress in addressing those concerns and the intention to make the program operational some time in 2008.9 Most recently, DHS Secretary Michael Chertoff announced that Secure Flight would no longer include data mining and would restrict the information collected about passengers to full name and, optionally, date of birth and sex. Chertoff stated that Secure Flight will not collect commercial data, assign risk scores, or attempt to predict behavior, as was envisioned in earlier versions of the program.10 The information provided will be compared with a terrorist watch list.

I.3 MULTISTATE ANTI-TERRORISM INFORMATION EXCHANGE (MATRIX)

Status: Pilot program ended; no follow-on program started.

This program was an effort to support information-sharing and collaboration among law-enforcement agencies.11 It was run as a pilot project administered by the Institute for Intergovernmental Research for DHS and DOJ.12 MATRIX involved collaborative information-sharing among public, private, and nonprofit institutions. A Congressional Research Service (CRS) report described MATRIX as a project that "leverages advanced computer/information management capabilities to more quickly access, share, and analyze public records to help law enforcement generate leads, expedite investigations, and possibly prevent terrorist attacks."13 The MATRIX system was developed and operated by a private Florida-based company, and the Florida Department of Law Enforcement controlled access to the program and was responsible for the security of the data.14 Although "terrorism" is part of the program's name, the primary focus appears to have been on law enforcement and criminal investigation. Until the system was redesigned, participating states were required to transfer state-owned data to a private company.15 The core function of the system was the Factual Analysis Criminal Threat Solution (FACTS) application, which queried disparate data sources by using available investigative information, such as a portion of a vehicle license number, to combine records dynamically and identify people of potential interest. According

6 U.S. Transportation Security Administration, "Secure Flight: Layers of Security," available at http://www.tsa.gov/what_we_do/layers/secureflight/index.shtm.
7 U.S. Government Accountability Office (GAO), Aviation Security: Significant Management Challenges May Adversely Affect Implementation of the Transportation Security Administration's Secure Flight Program, GAO-06-374T, GAO, Washington, D.C., February 9, 2006.
8 U.S. Department of Homeland Security (DHS), "Data Mining Report: DHS Privacy Office Response to House Report 108-774," July 6, 2006; DHS Office of Inspector General, "Survey of DHS Data Mining Activities," OIG-06-56, DHS, Washington, D.C., August 2006, p. 20, footnote 25.
9 Prepared statement of Kip Hawley, Assistant Secretary of the Transportation Security Administration, before the U.S. Senate Committee on Commerce, Science and Transportation, January 17, 2007, available at http://www.tsa.gov/press/speeches/air_cargo_testimony.shtm.
10 Michael J. Sniffen, "Feds offer simpler flight screening plan," Associated Press, August 9, 2007.
11 U.S. Department of Homeland Security (DHS), "MATRIX Report: DHS Privacy Office Report to the Public Concerning the Multistate Anti-Terrorism Information Exchange," DHS, Washington, D.C., December 2006, p. 1.
12 The Institute for Intergovernmental Research (IIR) is a Florida-based nonprofit research and training organization specializing in law enforcement, juvenile justice, criminal justice, and homeland security. See http://www.iir.com/default.htm.
13 W.J. Krouse, The Multi-State Anti-Terrorism Information Exchange (MATRIX) Pilot Project, RL32536, U.S. Congressional Research Service (CRS), Washington, D.C., August 18, 2004, p. 1, italics in original. Note that the official Web site for the MATRIX program cited in this CRS report is no longer available.
14 Ibid., p. 2. The company, Seisint, was acquired by Reed Elsevier subsidiary LexisNexis in July 2004.
15 J. Rood, "Controversial data-mining project finds ways around privacy laws," CQ Homeland Security—Intelligence, July 23, 2004, p. 1.
to the CRS report, FACTS included crime-mapping, association-charting, lineup and photograph montage applications, and extensive query capabilities.16 Data sources available in the system included both those traditionally available to law enforcement—such as criminal history, corrections-department information, driver's license, and motor-vehicle data—and nontraditional ones, such as:17

- Pilot licenses issued by the Federal Aviation Administration,
- Aircraft ownership,
- Property ownership,
- U.S. Coast Guard vessel registrations,
- State sexual-offender lists,
- Corporate filings,
- Uniform Commercial Code filings or business liens,
- Bankruptcy filings, and
- State-issued professional licenses.

Concerns were raised that the data would be combined with private data, such as credit history, airline reservations, and telephone logs, but the MATRIX Web site stated that those would not be included. The system initially included a scoring system called High Terrorist Factor (HTF) that identified people who might be considered high-risk, although it was later claimed that the HTF element of the system had been eliminated.18

The pilot program ended in April 2005. Legal, privacy, security, and technical concerns about requirements to transfer state-owned data to MATRIX administrators, along with the continuing costs of using the system, prompted several states that initially participated or planned to participate in MATRIX to withdraw.19 By March 2004, 11 of the 16 states that originally expressed interest in participating had withdrawn from the program. In an attempt to address some of the concerns, the architecture was changed to allow distributed access to data in such a way that no

16 Krouse, op. cit., p. 4.
17 Data sources are identified in the Congressional Research Service report (Krouse, op. cit., p. 6) as referenced from the official MATRIX program Web site, http://www.matrix-at.org, which is no longer available.
18 B. Bergstein, "Database firm gave feds terror suspects: 'Matrix' developer turned over 120,000 names," Associated Press, May 20, 2004, available at http://www.msnbc.msn.com/id/5020795/.
19 See, for instance, Georgia Department of Motor Vehicle Safety, "Department of Motor Vehicle Safety's Participation in MATRIX," September 29, 2003; New York State Police, letter to the chairman of MATRIX, March 9, 2004; Texas Department of Public Safety, letter to the chair, Project MATRIX, May 21, 2003. Those documents and additional information on state involvement are available from the American Civil Liberties Union (ACLU) Web site at http://www.aclu.org/privacy/spying/15701res20050308.html.
data transfers from state-controlled systems would be required to share data.20 The CRS report concluded that "it remains uncertain whether the MATRIX pilot project is currently designed to assess and address privacy and civil liberty concerns."21 For instance, there appears to have been no comprehensive plan to put safeguards and policies in place to avoid potential abuses of the system, such as monitoring of the activities of social activists or undermining of political activities.22

I.4 ABLE DANGER

Status: Terminated in January 2001.

This classified program, established in October 1999 by the U.S. Special Operations Command and terminated in January 2001, called for the use of data mining tools to gather information on terrorists from government databases and from open sources, such as the World Wide Web.23 The program used link analysis to identify underlying connections between people.24 Analysis would then be used to create operational plans designed to disrupt, capture, and destroy terrorist cells.

Link analysis is a form of network analysis that uses graph theory to identify patterns and measure the nature of a network. The related field of social-network analysis is now considered a critical tool in sociology, organizational studies, and the information sciences. Cohesion, betweenness, centrality, clustering coefficient, density, and path length are some of the measures used in network analysis to model and quantify connections. The combination of complex mathematics and the enormous volumes of data required to gain an accurate and complete picture of a network makes the use of information technology critical if useful analysis is to be performed on a large scale. Several network-mapping software packages are available commercially. Applications include fraud detection, relevance ratings in Internet search engines, and epidemiology.

20 Rood, op. cit., p. 1.
21 Krouse, op. cit., p. 10.
22 Ibid., p. 8.
23 U.S. Department of Defense (DOD) Office of the Inspector General, "Report of Investigation: Alleged Misconduct by Senior DOD Officials Concerning the Able Danger Program and Lieutenant Colonel Anthony A. Shaffer, U.S. Army Reserve," Case Number H05L97905217, DOD, Washington, D.C., September 18, 2006.
24 "Link analysis" was an informal term used to describe the analysis of connections between individuals rather than any kind of formal "record linkage" between database records.
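The network measures named above (density, degree centrality, path length) can be illustrated on a toy graph. This is a minimal pure-Python sketch with an invented four-node graph; it is not the classified tooling such programs used, only the arithmetic behind the measures.

```python
from collections import deque

# Toy undirected graph; the edges are invented for illustration.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "d")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

n = len(adj)
# Density: fraction of possible ties that actually exist.
density = 2 * len(edges) / (n * (n - 1))
# Degree centrality: each node's ties as a fraction of possible ties.
degree_centrality = {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def path_length(src, dst):
    """Shortest path length between two nodes by breadth-first search."""
    seen, frontier = {src: 0}, deque([src])
    while frontier:
        u = frontier.popleft()
        if u == dst:
            return seen[u]
        for w in adj[u]:
            if w not in seen:
                seen[w] = seen[u] + 1
                frontier.append(w)
    return None  # disconnected

print(density, degree_centrality["b"], path_length("a", "d"))
```

On graphs of investigative scale these same quantities are computed by the commercial network-mapping packages the text mentions; the point of the sketch is only that each measure reduces to counting ties and traversals.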
Able Danger was focused specifically on mapping and analyzing relationships within and with Al Qaeda. The program became public in 2005, after claims surfaced in the mass media, made by Rep. Curt Weldon, that Able Danger had identified 9/11 hijacker Mohammad Atta before the attacks. A member of Able Danger, Anthony Shaffer, later identified himself as the source of Weldon's information.25 He further claimed that intelligence discovered as part of Able Danger was not passed on to the Federal Bureau of Investigation (FBI) and other civilian officials. Shaffer said a key element of Able Danger was the purchase of data from information brokers that identified visits by individuals to specific mosques.26 That information was combined with other data to identify patterns and potential relationships among alleged terrorists.

The claims made by Shaffer were refuted in a report written by the DOD inspector general.27 The report showed examples of the types of charts produced by link analysis.28 It characterized Able Danger operations as initially an effort to gain familiarity with state-of-the-art analytical tools and capabilities and eventually an effort to apply link analysis to a collection of data from other agencies and from public Web sites to understand the Al Qaeda infrastructure and develop a strategy for attacking it.29 The program was then terminated, having achieved its goal of developing a (still-classified) "campaign plan" that "formed the basis for follow-on intelligence gathering efforts."30 An investigation by the Senate Select Committee on Intelligence concluded that Able Danger had not identified any of the 9/11 hijackers before September 11, 2001.31 No follow-on intelligence effort using the link-analysis techniques developed by Able Danger has been publicly acknowledged. However, the existence of a program known as Able Providence, supported through the Office of Naval Intelligence, which would reconstitute and improve

25 Cable News Network, "Officer: 9/11 panel didn't receive key information," August 17, 2005, available at http://www.cnn.com/2005/POLITICS/08/17/sept.11.hijackers.
26 J. Goodwin, "Inside Able Danger—the secret birth, extraordinary life and untimely death of a U.S. military intelligence program," Government Security News, September 5, 2005, available at http://www.gsnmagazine.com/cms/lib/410.pdf.
27 U.S. Department of Defense Office of the Inspector General, "Report of Investigation: Alleged Misconduct by Senior DOD Officials Concerning the Able Danger Program and Lieutenant Colonel Anthony A. Shaffer, U.S. Army Reserve," Case Number H05L97905217, September 18, 2006.
28 Ibid., pp. 8-9.
29 Ibid., p. 14.
30 Ibid.
31 G. Miller, "Alarming 9/11 claim is baseless, panel says," Los Angeles Times, December 24, 2006.
on Able Danger, was reported by Weldon in testimony to the U.S. Senate Committee on the Judiciary as part of its investigation.32

I.5 ANALYSIS, DISSEMINATION, VISUALIZATION, INSIGHT, AND SEMANTIC ENHANCEMENT (ADVISE)

Status: Under development (some deployments decommissioned).

This program, being developed by DHS, was intended to help detect potentially threatening activities by using link analysis of large amounts of data and producing graphic visualizations of identified linkage patterns. It was one of the most ambitious data mining efforts being pursued by DHS. ADVISE was conceived as a data mining toolset and development kit on which applications could be built for deployment to address specific needs. An assessment of ADVISE was not included in the 2006 DHS Privacy Office report on data mining, because it was considered a tool or technology and not a specific implementation of data mining.33 That position was noted in a GAO report on the program that questioned the decision not to include a privacy assessment of the program, given that "the tool's intended uses include applications involving personal information, and the E-Government Act, as well as related Office of Management and Budget and DHS guidance, emphasize the need to assess privacy risks early in systems development."34 The GAO report identified the program's intended benefit as helping to "detect activities that threaten the United States by facilitating the analysis of large amounts of data that otherwise would be very difficult to review," noting that the tools developed as part of ADVISE are intended to accommodate both structured and unstructured data.35 The report concluded that ADVISE raised a number of privacy concerns and that although DHS had added security controls related to ADVISE, it had failed to assess privacy risks, including erroneous associations of people, misidentification of people, and repurposing of data collected for other purposes.36 It called on DHS "to conduct a privacy impact assessment of the ADVISE tool and implement privacy controls as needed to mitigate any identified risks."37 DHS responded to the GAO report, saying that it was in the process of developing a privacy impact assessment tailored to the unique character of the ADVISE program (as a tool kit).

A later DHS Privacy Office report did review the ADVISE program and drew a careful distinction between ADVISE as a technology framework and ADVISE deployments.38 The report first reviewed the technology framework in light of the privacy compliance requirements of the DHS Privacy Office described in the report.39 In light of those requirements, it then assessed six planned deployments of ADVISE:40

- Interagency Center for Applied Homeland Security Technology (ICAHST). ICAHST evaluates promising homeland-security technologies for DHS and other government stakeholders in the homeland-security technology community.
- All-Weapons of Mass Effect (All-WME). Originally begun by the Department of Energy, All-WME used classified message traffic collected by the national laboratories' field intelligence elements to analyze information related to foreign groups and organizations involved in WME material flows and illicit trafficking. Deployment has been discontinued.
- Biodefense Knowledge Management System. This was a series of three deployment initiatives planned by the Biodefense Knowledge Center with the overall goal of identifying better methods for assisting DHS analysts in identifying and characterizing biological threats posed by terrorists. All the deployments have ended, and there are no plans for future deployments.
- Remote Threat Alerting System (RTAS). RTAS sought to determine whether the ADVISE technology framework could assist DHS Customs and Border Protection (CBP) in identifying anomalous shipments on the basis of cargo type and originating country. All RTAS activities ended in September 2006.
- Immigration and Customs Enforcement Demonstration (ICE Demo). This deployment was operated by the DHS Science and Technology Directorate and Lawrence Livermore National Laboratory to determine whether the ADVISE technology framework could assist DHS Immigration and Customs Enforcement (ICE) in making better use of existing ICE data. All activity related to this deployment of ADVISE has ended.
- Threat Vulnerability Integration System (TVIS). TVIS used a series of data sets to identify opportunities to test the capability of the ADVISE technology framework to help analysts in the DHS Office of Intelligence and Analysis. An early pilot deployment phase has been followed by subsequent pilot deployment phases.

The report found that some of the deployments did use personally identifiable information without conducting privacy impact assessments.41 It also recommended short- and long-term actions to address the problems. In particular, it recommended actions that would integrate privacy compliance requirements into project development processes, echoing recommendations made in the GAO report on the program.42 DHS ended the program in September 2007, citing the availability of commercial products that provide similar functions at much lower cost.43

32 Representative Curt Weldon in testimony to the United States Senate Committee on the Judiciary, September 21, 2005, available at http://judiciary.senate.gov/testimony.cfm?id=1606&wit_id=4667. See also P. Wait, "Data-mining offensive in the works," Government Computer News, October 10, 2005.
33 U.S. Department of Homeland Security (DHS), "Data Mining Report: DHS Privacy Office Response to House Report 108-774," DHS, Washington, D.C., July 6, 2006; DHS Office of Inspector General, "Survey of DHS Data Mining Activities," OIG-06-56, DHS, Washington, D.C., August 2006, p. 20, footnote 25.
34 U.S. Government Accountability Office (GAO), Data Mining: Early Attention to Privacy in Developing a Key DHS Program Could Reduce Risks, GAO-07-293, GAO, Washington, D.C., February 2007, p. 3.
35 Ibid., p. 3.
36 Ibid., p. 18.
37 Ibid., from "Highlights: What GAO Recommends." See also p. 23.
38 U.S. Department of Homeland Security, "DHS Privacy Office Review of the Analysis, Dissemination, Visualization, Insight and Semantic Enhancement (ADVISE) Program," DHS, Washington, D.C., July 11, 2007. Page 2 discusses and defines these terms.
39 Ibid., pp. 3-5.
40 Ibid. Definitions and descriptions of these programs are drawn from the report beginning on p. 7.

I.6 AUTOMATED TARGETING SYSTEM (ATS)

Status: In use.

This program is used by CBP, part of DHS, to screen cargo and travelers entering and leaving the United States by foot, car, airplane, ship, and rail. ATS assesses risks by using data mining and data-analysis techniques. The risk assessment and links to the information on which the assessment is based are stored in the ATS for up to 40 years.44 The assessment is based on combining and analyzing data from several existing sources of information—including the Automated Commercial System, the Automated Commercial Environment System, the Advance Passenger Information System, and the Treasury Enforcement Communications System—and from information about people crossing the U.S. land border, as well as airline reservation data (the Passenger Name Record).
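The risk assessments described for ATS, like the RTAS screening of anomalous shipments mentioned earlier, can be caricatured as frequency-based anomaly scoring. This is a hedged sketch under invented assumptions: the shipment records, fields, and scoring rule are all hypothetical illustrations, not the actual (non-public) ATS methodology.

```python
from collections import Counter

# Hypothetical shipment history: (cargo type, origin) pairs, invented.
history = (
    [("electronics", "DE")] * 40
    + [("textiles", "VN")] * 55
    + [("machine parts", "XX")] * 5
)
freq = Counter(history)
total = len(history)

def anomaly_score(cargo: str, origin: str) -> float:
    """Score in [0, 1]: rarer (cargo, origin) pairs score closer to 1.0;
    pairs never seen before score exactly 1.0."""
    return 1.0 - freq.get((cargo, origin), 0) / total

print(anomaly_score("machine parts", "XX"))  # rare pair, high score
print(anomaly_score("textiles", "VN"))       # common pair, low score
```

Even this toy version exhibits the failure mode the critics raise: any legitimate but unusual shipment scores as anomalous, so the false-alarm rate depends entirely on how representative the historical data are.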
ATS compares a traveler's name with a list of known and suspected terrorists. It also performs link analysis, checking, for example, the telephone number associated with an airline reservation against telephone numbers used by known terrorists.45 Such checking has been credited by DHS with preventing the entry of suspected terrorists and with identifying criminal activity,46 but concerns about high numbers of false alarms, the efficacy of the risk assessment, the lack of a remediation process, and the ability of the agency to protect and secure collected data properly have been raised by some in the technical community and by civil-liberties groups.47

I.7 THE ELECTRONIC SURVEILLANCE PROGRAM

Status: Continuing subject to oversight by the Foreign Intelligence Surveillance Court.

This program, also called the Terrorist Surveillance Program, involves the collection and analysis of domestic telephone-call information with the goal of targeting the communications of Al Qaeda and related terrorist groups and affiliated individuals. Details about the program remain secret, but as part of the program the president authorized NSA to eavesdrop on communications of people in the United States without obtaining a warrant when there is a "reasonable basis to conclude that one party to the communication is a member of Al Qaeda."48 Existence of the program first surfaced in a New York Times article published in December 2005.49 Questions as to the legality of the program led a federal judge to declare the program unconstitutional and illegal and to order that it be suspended. That ruling was overturned on appeal on narrow grounds regarding the standing of the litigants rather than the legality of the program.50 In a letter to the Senate Committee on the

45 Remarks of Stewart Baker, Assistant Secretary for Policy, Department of Homeland Security, at the Center for Strategic and International Studies, Washington, D.C., December 19, 2006.
46 Ibid. Baker noted the use of ATS to identify a child-smuggling ring: CBP officers who examined ATS data noticed that a woman with children had not taken them with her on the outbound flight, which led to further investigation.
47 See, for instance, B. Schneier, "On my mind: They're watching," Forbes, January 8, 2007; Electronic Privacy Information Center, Automated Targeting System, http://www.epic.org/privacy/travel/ats/default.html.
48 Press briefing by Attorney General Alberto Gonzales and General Michael Hayden, Principal Deputy Director for National Intelligence, December 19, 2005, available at http://www.whitehouse.gov/news/releases/2005/12/20051219-1.html.
49 J. Risen and E. Lichtblau, "Bush lets U.S. spy on callers without courts," New York Times, December 16, 2005.
50 A. Goldstein, "Lawsuit against wiretaps rejected," The Washington Post, July 7, 2007, p. A1.
Judiciary on January 17, 2007, Attorney General Alberto Gonzales stated that the president would not reauthorize the program, although the surveillance would continue subject to oversight by the Foreign Intelligence Surveillance Court (FISC). Although the legality of the program has been the primary focus of the press, it is unclear to what extent data mining technology is used as part of the program. Some press reports suggest that such technology is used extensively to collect and analyze data from sources that include telephone and Internet communication, going well beyond keyword searches to use link analysis to uncover hidden relationships among data points.51 The adequacy of the FISC to address technology advances, such as data mining and traffic-analysis techniques, has also been called into question.52 As this report is being written (June 2008), changes in the Foreign Intelligence Surveillance Act are being contemplated by Congress. The final disposition of the changes is not yet known.

I.8 NOVEL INTELLIGENCE FROM MASSIVE DATA (NIMD) PROGRAM

Status: In progress.

NIMD is a research and development program funded by the Disruptive Technology Office,53 which is part of the Office of the Director of National Intelligence. The program, which has many similarities to the Total/Terrorism Information Awareness program, is focused on the development of data mining and analysis tools to be used in working with massive data. According to a “Call for 2005 Challenge Workshop Proposals,” “NIMD aims to preempt strategic surprise by addressing root causes of analytic errors related to bias, assumptions, and premature attachment to a single hypothesis.”54 Two key challenges are identified: data triage to support decision-making and real-time analysis of petabytes of data, and practical knowledge representation to improve machine processing and

51 E. Lichtblau and J. Risen, “Spy agency mined vast data trove, officials report,” New York Times, December 23, 2005; S. Harris, “NSA spy program hinges on state-of-the-art technology,” National Journal, January 20, 2006.
52 See, for instance, K.A. Taipale, “Whispering wires and warrantless wiretaps: Data mining and foreign intelligence surveillance,” NYU Review of Law and Security, Issue 7, Supplemental Bulletin on Law and Security, Spring 2006.
53 The Disruptive Technology Office was previously known as the Advanced Research and Development Activity (ARDA).
54 Advanced Research and Development Activity, “Call for 2005 Challenge Workshop Proposals,” available at http://nrrc.mitre.org/arda_explorprog2005_cfp.pdf.
data-sharing among disparate agencies and technologies. The challenge identifies five focus areas for NIMD research, with the overarching goal of building “smart software assistants and devil’s advocates that help analysts deal with information overload, detect early indicators of strategic surprise, and avoid analytic errors”: “modeling analysts and analytic processes, capturing and reusing prior and tacit knowledge, generating and managing hypotheses, organizing/structuring massive data (mostly unstructured text), and human interaction with information.” Advocacy groups and some members of Congress have expressed concerns that at least some of the research done as part of the TIA program has continued under NIMD.55 In contrast with TIA, Congress stipulated that technologies developed under the program are to be used only for military or foreign-intelligence purposes against non-U.S. citizens.

I.9 ENTERPRISE DATA WAREHOUSE (EDW)

Status: Operational since 2000 and in use.

This system collects data from CBP transactional systems and subdivides them into data sets for analysis.56 The data sets are referred to as data marts; their creation is predicated on the need for a specific grouping and configuration of selected data.57 EDW acquires and combines data from several customs and other federal databases to perform statistical and trend analysis to look for patterns, for instance, to determine the impact of an enforcement action or rule change.58 EDW uses commercial off-the-shelf technology for its analysis.59 EDW data are treated as read-only; all changes occur in the source systems and are propagated to EDW periodically (every 24 hours).60

I.10 LAW ENFORCEMENT ANALYTIC DATA SYSTEM (NETLEADS)

Status: In use.

This program supports ICE law-enforcement activities and intelligence analysis through searches and pattern recognition based on multiple data sources.61 As with EDW, NETLEADS uses data marts. Link analysis is used to show relationships, such as associations with known criminals. Information analyzed includes criminal-alien information and terrorism, smuggling, and criminal-case information derived from federal and state government law-enforcement and intelligence agencies’ data sources and from commercial sources.62 The technology includes timeline analysis, which allows comparisons of relationships at different times. Trend analysis across multiple cases can also be performed in the context of particular investigations and intelligence operations.

I.11 ICE PATTERN ANALYSIS AND INFORMATION COLLECTION SYSTEM (ICEPIC)

Status: Operating as a pilot program as of July 2006; planned to enter full-scale operation in fiscal year 2008.63

Whereas the NETLEADS focus is on law enforcement, ICEPIC focuses on the goal of disrupting and preventing terrorism.64 Link analysis is performed to uncover nonobvious associations between individuals and organizations to generate counterterrorism leads. Data for analysis are drawn from DHS sources and from databases maintained by the Department of State, DOJ, and the Social Security Administration. ICEPIC uses technology from IBM called Non-Obvious Relationship Awareness (NORA) to perform the analysis.65 ICEPIC, NETLEADS, and two other systems—the Data Analysis and Research for Trade Transparency System

55 “U.S. still minding terror data,” Associated Press, Washington, D.C., February 23, 2004; M. Williams, “The Total Information Awareness Project lives on,” Technology Review, April 26, 2006.
56 U.S. Department of Homeland Security (DHS), “Data Mining Report: DHS Privacy Office Response to House Report 108-774,” DHS, Washington, D.C., July 6, 2006, pp. 20-21.
57 An explanation of the distinction between a data warehouse and a data mart is provided as a footnote in DHS Office of Inspector General, “Survey of DHS Data Mining Activities,” OIG-06-56, DHS, Washington, D.C., August 2006, p. 11.
58 See U.S. Customs and Border Protection, “U.S. Customs data warehousing,” available at http://www.cbp.gov/xp/cgov/trade/automated/automated_systems/data_warehousing.xml; databases used as sources include the Automated Commercial System (ACS), the Automated Commercial Environment (ACE), the Treasury Enforcement Communications System (TECS), Administrative and Financial Systems, and the Automated Export System. See U.S. Customs, “Enterprise Data Warehouse: Where it stands, where it’s heading,” U.S. Customs Today, August 2000, available at http://www.cbp.gov/custoday/aug2000/dwartic4.htm.
59 U.S. Department of Homeland Security (DHS), “Data Mining Report: DHS Privacy Office Response to House Report 108-774,” DHS, Washington, D.C., July 6, 2006, p. 21.
60 Ibid.
61 Ibid., pp. 21-24.
62 Ibid., pp. 22-23.
63 Immigration and Customs Enforcement Fact Sheet, http://www.ice.gov/pi/news/fact-sheets/icepic.htm.
64 U.S. Department of Homeland Security Office of Inspector General, “Survey of DHS Data Mining Activities,” OIG-06-56, DHS, Washington, D.C., August 2006, p. 11.
65 U.S. Department of Homeland Security (DHS), “Data Mining Report: DHS Privacy Office Response to House Report 108-774,” DHS, Washington, D.C., July 6, 2006, pp. 24-26.
(DARTTS) and the Crew Vetting System (CVS)—all use association, the process of discovering two or more variables that are related, as part of the analysis.66

I.12 INTELLIGENCE AND INFORMATION FUSION (I2F)

Status: In development.

Built from commercial off-the-shelf systems, this program provides tools for searching, link analysis, entity resolution, geospatial analysis, and temporal analysis to give intelligence analysts an ability to view, query, and analyze information from multiple data sources.67 The program is focused on aiding in the discovery and tracking of terrorism threats to people and infrastructure. With three other DHS programs—the Numerical Integrated Processing System (NIPS), Questioned Identification Documents (QID), and the Tactical Information Sharing System (TISS)—I2F uses collaboration processes that support the application of cross-organizational expertise and visualization processes that aid in the presentation of analysis results.68 Data may be drawn from both government and commercial sources.

I.13 FRAUD DETECTION AND NATIONAL SECURITY DATA SYSTEM (FDNS-DS)

Status: In use but without analytical tools to support data mining; support for data mining capabilities not expected for at least 2 years.

This program (formerly the Fraud Tracking System) is used to track immigration-related fraud, public-safety referrals to ICE, and national-security concerns discovered during background checks.69 In its present form, FDNS-DS is a case-management system with no analytical or data mining tools. It is planned to add those capabilities to allow identification of fraudulent schemes.

66 U.S. Department of Homeland Security Office of Inspector General, “Survey of DHS Data Mining Activities,” OIG-06-56, DHS, Washington, D.C., August 2006, pp. 9-11.
67 U.S. Department of Homeland Security (DHS), “Data Mining Report: DHS Privacy Office Response to House Report 108-774,” DHS, Washington, D.C., July 6, 2006, p. 26.
68 U.S. Department of Homeland Security Office of Inspector General, “Survey of DHS Data Mining Activities,” OIG-06-56, DHS, Washington, D.C., August 2006, p. 13.
69 U.S. Department of Homeland Security (DHS), “Data Mining Report: DHS Privacy Office Response to House Report 108-774,” DHS, Washington, D.C., July 6, 2006, p. 27.
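Several of the systems described above (NETLEADS, ICEPIC, DARTTS, CVS) rest on the same basic operation: grouping records by a shared attribute value, such as a telephone number or address, and treating the shared value as a link between otherwise unconnected identities. The sketch below illustrates that kind of "nonobvious relationship" analysis in miniature; the record fields, names, and matching rule are invented for illustration and do not reflect the actual NORA implementation.

```python
from collections import defaultdict
from itertools import combinations

def nonobvious_links(records, keys=("phone", "address")):
    """Group records by shared attribute values and report every pair of
    distinct identities that share one -- the core idea behind link
    analysis for 'nonobvious' associations."""
    by_value = defaultdict(set)
    for rec in records:
        for key in keys:
            value = rec.get(key)
            if value:
                by_value[(key, value)].add(rec["name"])
    links = set()
    for (key, _value), names in by_value.items():
        # Any attribute value shared by two or more identities is a link.
        for a, b in combinations(sorted(names), 2):
            links.add((a, b, key))
    return sorted(links)

records = [
    {"name": "A. Smith", "phone": "555-0100", "address": "12 Oak St"},
    {"name": "B. Jones", "phone": "555-0100", "address": "99 Elm Ave"},
    {"name": "C. Brown", "phone": "555-0199", "address": "12 Oak St"},
]
# Links A. Smith to B. Jones (shared phone) and to C. Brown (shared address).
print(nonobvious_links(records))
```

Real systems add entity resolution (recognizing that two slightly different records describe the same person) before this step, which is what distinguishes a product such as NORA from the naive exact match shown here.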
I.14 NATIONAL IMMIGRATION INFORMATION SHARING OFFICE (NIISO)

Status: In use without data mining tools; a pilot project that includes data mining capabilities is being planned.

This program is responsible for fulfilling requests for immigration-related information from other DHS components and from law-enforcement and intelligence agencies.70 The program does not include any data mining tools or techniques, relying instead on manual searches based on specific requests to supply information to authorized requesting agencies. Plans to add such analytical capabilities are being developed. Data for analysis would include data collected by immigration services, publicly available information, and data from commercial aggregators.71

I.15 FINANCIAL CRIMES ENFORCEMENT NETWORK (FinCEN) AND BSA DIRECT

Status: FinCEN in use; BSA Direct withdrawn.

FinCEN applies data mining and analysis technology to data from a number of sources related to financial transactions to identify cases of money-laundering and other financial elements of criminal and terrorist activity. The goal of FinCEN is to promote information-sharing among law-enforcement, regulatory, and financial institutions.72 FinCEN is responsible for administering the Bank Secrecy Act (BSA). As part of that responsibility, it uses data mining technology to analyze data collected under BSA requirements and to identify suspicious activity tied to terrorists and organized crime. In 2004, FinCEN began a program called BSA Direct intended to provide law-enforcement agencies with access to BSA data and to data mining capabilities similar to those available to FinCEN.73 BSA Direct was permanently halted in July 2006 after cost overruns and technical implementation and deployment difficulties.74

70 U.S. Department of Homeland Security (DHS), “Data Mining Report: DHS Privacy Office Response to House Report 108-774,” DHS, Washington, D.C., July 6, 2006, p. 28.
71 Ibid.
72 See the FinCEN Web site at http://www.fincen.gov/af_faqs.html for further details on its mission.
73 Statement of Robert W. Werner before the House Committee on Government Reform Subcommittee on Criminal Justice, Drug Policy, and Human Resources, May 11, 2004, p. 3, available at http://www.fincen.gov/wernertestimonyfinal051104.pdf.
74 FinCEN, “FinCEN Halts BSA Direct Retrieval and Sharing Project,” July 13, 2006, available at http://www.fincen.gov/bsa_direct_nr.html.
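The kind of suspicious-activity analysis FinCEN performs on BSA data can be illustrated with one well-known money-laundering pattern, "structuring": repeated cash deposits kept just under the currency-reporting threshold. The sketch below is a toy rule-based detector; the field names and cutoffs are assumptions for illustration, and FinCEN's actual analytics are not public.

```python
from collections import defaultdict

def flag_structuring(transactions, threshold=10_000, min_count=3):
    """Flag accounts with repeated cash deposits just under the
    currency-reporting threshold -- the classic 'structuring' pattern.
    The 90% band and 3-deposit cutoff are illustrative, not FinCEN's."""
    near_threshold = defaultdict(int)
    for txn in transactions:
        if (txn["type"] == "cash_deposit"
                and 0.9 * threshold <= txn["amount"] < threshold):
            near_threshold[txn["account"]] += 1
    return sorted(acct for acct, n in near_threshold.items() if n >= min_count)

txns = [
    {"account": "X1", "type": "cash_deposit", "amount": 9_500},
    {"account": "X1", "type": "cash_deposit", "amount": 9_900},
    {"account": "X1", "type": "cash_deposit", "amount": 9_200},
    {"account": "X2", "type": "cash_deposit", "amount": 4_000},
]
print(flag_structuring(txns))  # ['X1']
```

A fixed rule such as this is only a starting point; the trade-off discussed throughout this report applies here too, since loosening the band catches more evasion but raises the false-alarm rate.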
I.16 DEPARTMENT OF JUSTICE PROGRAMS INVOLVING PATTERN-BASED DATA MINING

Status: All programs under development or in use.

Responding to requirements of the USA PATRIOT Improvement and Reauthorization Act of 2005,75 DOJ submitted a report to the Senate Committee on the Judiciary that identified seven programs that constitute pattern-based data mining as defined in the act.76 The report carefully scoped what was considered pattern-based data mining on the basis of the act’s definition to determine which programs it was required to report on.77 For each program identified, the report provides a description, plans for use, efficacy, potential privacy and civil-liberties impact, legal and regulatory foundation, and privacy- and accuracy-protection policies.78 The report notes that the scope of the programs and the detail provided vary widely. The following is a summary of the programs drawn from the DOJ report.79

System-to-Assess-Risk (STAR) Initiative. Focused on extending the capabilities of the Foreign Terrorist Tracking Task Force (FTTTF), this program is a risk-assessment software system meant to help analysts set priorities among persons of possible investigative interest. Data used by STAR are drawn from the FTTTF data mart, an existing data repository “containing data from U.S. Government and proprietary sources (e.g., travel data from the Airlines Reporting Corporation) as well as access to publicly available data from commercial data sources (such as ChoicePoint).”80 STAR is under development.

Identity Theft Intelligence Initiative. This program extracts data from the Federal Trade Commission’s Identity Theft Clearinghouse and compares them with FBI data from case complaints of identity theft and with suspicious financial transactions filed with FinCEN. Further comparisons are made with data from private data aggregators, such as LexisNexis, Accurint, and Autotrack. On the basis of the results of the analysis, FBI creates a knowledge base to evaluate identity-theft types, identify

75 U.S. Pub. L. No. 109-177, Sec. 126.
76 U.S. Department of Justice, “Report on ‘Data-mining’ Activities Pursuant to Section 126 of the USA PATRIOT Improvement and Reauthorization Act of 2005,” July 9, 2007, available at http://www.epic.org/privacy/fusion/doj-dataming.pdf.
77 Ibid., pp. 1-6.
78 The report includes a review of only six of the seven initiatives identified; a supplemental report on the seventh initiative is to be provided at a later date.
79 Ibid., pp. 7-30.
80 Ibid., p. 8. ChoicePoint is a private data aggregator; see http://www.choicepoint.com/index.html.
identity-theft rings through subject relationships, and send leads to field offices. The program has been operational since 2003.

Health Care Fraud Initiative. This program is used by FBI analysts to research and investigate health-care providers. The program draws data from Medicare “summary billing records extracted from the Centers for Medicare and Medicaid Services (CMS), supported by the CMS Fraud Investigative Database, Searchpoint [the Drug Enforcement Administration’s pharmaceutical-claims database], and the National Health Care Anti-Fraud Association Special Investigative Resource and Intelligence System (private insurance data).”81 The program has been in use since 2003.

Internet Pharmacy Fraud Initiative. This program’s aim is to search consumer complaints (made to the Food and Drug Administration and the Internet Fraud Complaint Center) involving alleged fraud by Internet pharmacies to develop common threads indicative of fraud by such pharmacies. Data on Internet pharmacies available from open-source aggregators are also incorporated into the analysis. The program began in December 2005 and is operational.

Housing Fraud Initiative. This program, run by the FBI, uses public-source data containing buyer, seller, lender, and broker identities and property addresses, purchased from ChoicePoint, to uncover fraudulent housing purchases. All analysis is done manually by FBI analysts (that is, not aided by computer programs) to identify connections between individuals and potentially fraudulent real-estate transactions. The program first became operational in 1999 and continues to be extended as ChoicePoint makes new real-estate transaction information available.

Automobile Accident Insurance Fraud Initiative. This program, run by the FBI, was designed to identify and analyze information regarding automobile-insurance fraud schemes. Data sources include formatted reports of potentially fraudulent claims for insurance reimbursement as identified and prepared by the insurance industry’s National Insurance Crime Bureau, FBI case-reporting data, commercial data aggregators, and health-care insurance claims information from the Department of Health and Human Services (DHHS) and the chiropractic industry. The program is being run as a pilot in a single FBI field office. No target date has been set for national deployment.

In addition to the programs identified as meeting the definition of pattern-based data mining used by the DOJ report, several programs were identified as potentially meeting other definitions of data mining. The report does not provide details about those programs, but it includes brief

81 Ibid., p. 20.
sketches of them. The programs identified as “advanced analytical tools that do not meet the definition in Section 126” and included in the DOJ report are as follows:82

Drug Enforcement Administration (DEA) initiatives:

SearchPoint. A DEA project that uses prescription data from insurance and cash transactions obtained commercially from ChoicePoint, including the prescribing official (practitioner), the dispensing agent (pharmacy, clinic, hospital, and so on), and the name and quantity of the controlled substance (drug information), to conduct queries about practitioners, pharmacies, and controlled substances and to identify the volume and type of controlled substances being prescribed and dispensed.

Automation of Reports and Consolidated Orders System (ARCOS). DEA uses data collected from manufacturers and distributors of controlled substances and stored in the ARCOS database to monitor the flow of controlled substances from their point of manufacture through commercial distribution channels to the point of sale or distribution at the dispensing or retail level (hospitals, retail pharmacies, practitioners, and teaching institutions).

Drug Theft Loss (DTL) Database. This is similar to ARCOS, but the data source is all DEA controlled-substance registrants (including practitioners and pharmacies).

Online Investigative Project (OIP). OIP enables DEA to scan the Internet in search of illegal Internet pharmacies. The tool searches for terms that might indicate illegal pharmacy activity.

Bureau of Alcohol, Tobacco, Firearms, and Explosives initiatives:

Bomb Arson Tracking System (BATS). BATS enables law-enforcement agencies to share information related to bomb and arson investigations and incidents. The sources of information are the various law-enforcement agencies. Possible queries via BATS include similarities of components, targets, or methods. BATS can be used, for example, to make connections between multiple incidents with the same suspect.

GangNet. This system is used to track gang members, gangs, and gang incidents in a granular fashion. It enables sharing of information among law-enforcement agencies. It can also be used to identify trends, relationships, patterns, and demographics of gangs.

Federal Bureau of Investigation initiative:

Durable Medical Equipment (DME) Initiative. DME is designed to help in setting investigative priorities on the basis of analysis of suspicious claims submitted by DME providers to contractors for CMS. Data

82 Ibid., pp. 31-35. Descriptions are drawn from the report.
sources include complaint reports from the CMS and DHHS Inspector General’s office and FBI databases.

Other DOJ activities:

Organized Crime and Drug Enforcement Task Force (OCDETF) Fusion Center. OCDETF maintains a data warehouse named Compass that contains relevant drug and related financial intelligence information from numerous law-enforcement organizations. As stated in the report, “the goal of the data warehouse is to use cross-case analysis tools to transform multi-agency information into actionable intelligence in order to support major investigations across the globe.”83

Investigative Data Warehouse (IDW). Managed by FBI, this warehouse enables investigators to perform efficient distributed searches of data sources across FBI. IDW provides analysts with the capability to examine relationships between people, places, communication devices, organizations, financial transactions, and case-related information.

Internet Crime Complaint Center (IC3). A partnership between FBI and the National White Collar Crime Center (NW3C), IC3 is focused on cybercrime. It provides a reporting mechanism for suspected violations. Reports are entered into the IC3 database, which can then be queried to discover common characteristics of complaints.

Computer Analysis and Response Team (CART) Family of Systems. This is a set of tools used to support computer-forensics work. CART maintains a database of information collected from criminal investigations. Data can be searched for similarities among confiscated computer hard drives.

Before publication of the report, many of the programs were either unknown publicly or had unclear scopes and purposes. Commenting on the DOJ report shortly after its delivery to the Senate Committee on the Judiciary, Senator Patrick Leahy said that “this report raises more questions than it answers and demonstrates just how dramatically the Bush administration has expanded the use of this technology, often in secret, to collect and sift through Americans’ most sensitive personal information,” adding that the report provided “an important and all too rare ray of sunshine on the Department’s data mining activities and provides Congress with an opportunity to conduct meaningful oversight of this powerful technological tool.”84

83 Ibid., p. 34.
84 Comment of Senator Patrick Leahy, Chairman, Senate Judiciary Committee, on the Department of Justice’s Data Mining Report, July 10, 2007; see http://leahy.senate.gov/press/200707/071007c.html.
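Several of the DOJ systems sketched above, notably IC3 and IDW, support a simpler operation than pattern discovery: querying a central complaint or case database for characteristics common to many records. A minimal illustration of such a cross-record query follows; the complaint fields and values are invented for illustration and do not reflect the actual IC3 schema.

```python
from collections import Counter

def common_characteristics(complaints, field, top=3):
    """Count the most frequent values of one field across complaint
    records -- the kind of cross-complaint query an IC3-style database
    supports to surface common schemes or methods."""
    counts = Counter(c[field] for c in complaints if c.get(field))
    return counts.most_common(top)

complaints = [
    {"scheme": "auction fraud", "payment": "wire"},
    {"scheme": "auction fraud", "payment": "wire"},
    {"scheme": "phishing", "payment": "card"},
]
print(common_characteristics(complaints, "scheme"))
# [('auction fraud', 2), ('phishing', 1)]
```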