Appendix I
Illustrative Government Data Mining Programs and Activity

Several federal agencies have sought to use data mining to reduce the risk of terrorism, including the Department of Defense (DOD), the Department of Homeland Security (DHS), the Department of Justice (DOJ), and the National Security Agency (NSA). Some of the data mining programs have been withdrawn; some are in operation; some have changed substantially in scope, purpose, and practice since they were launched; and others are still in development. This appendix briefly describes a number of the programs, their stated goals, and their current status (as far as is known publicly).1

The programs described vary widely in scope, purpose, and sophistication. Some are research efforts focused on the fundamental science of data mining; others are intended as efforts to create general toolsets and developer toolkits that could be tailored to meet various requirements. Most of the programs constitute specific deployments of one or more forms of data mining technology intended to achieve particular

1 A 2004 U.S. Government Accountability Office (GAO) report provided a comprehensive survey of data mining systems and activities in federal agencies up to that time. See GAO, Data Mining: Federal Efforts Cover a Wide Range of Uses, GAO-04-548, GAO, Washington, D.C., May 2004. Other primary resources: J.W. Seifert, Data Mining and Homeland Security: An Overview, RL31798, Congressional Research Service, Washington, D.C., updated June 5, 2007; U.S. Department of Homeland Security (DHS), "Data Mining Report: DHS Privacy Office Response to House Report 108-774," DHS, Washington, D.C., July 6, 2006; DHS Office of Inspector General, "Survey of DHS Data Mining Activities," OIG-06-56, DHS, Washington, D.C., August 2006.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




operational goals. The programs vary widely in sophistication of the technologies used to achieve operational goals; they also vary widely in the sources of data used (such as government data, proprietary information from industry groups, and data from private data aggregators) and in the forms of the data (such as structured and unstructured). The array of subject matter of the projects is broad: they cover law enforcement, terrorism prevention and pre-emption, immigration, customs and border control, financial transactions, and international trade. Indeed, the combination of the variety of applications and the variety of definitions of what constitutes data mining makes any overall assessment of data mining programs difficult.

The scientific basis of many of these programs is uncertain or at least not publicly known. For example, it is not clear whether any of the programs have been subject to independent expert review of performance. This appendix is intended to be primarily descriptive, and the mention of a given program should not be taken as an endorsement of its underlying scientific basis.

I.1 TOTAL/TERRORISM INFORMATION AWARENESS (TIA)

Status: Withdrawn as such, but see Appendix J for a description.

I.2 COMPUTER-ASSISTED PASSENGER PRESCREENING SYSTEM II (CAPPS II) AND SECURE FLIGHT

Status: CAPPS II abandoned; Secure Flight planned for deployment in 2008.

In creating the Transportation Security Administration (TSA), Congress directed that it implement a program to match airline passengers against a terrorist watch list. CAPPS II was intended to fulfill that directive. It was defined as a prescreening system whose purpose was to enable TSA to assess and authenticate travelers' identities and perform a risk assessment to detect persons who may pose a terrorist-related threat.
However, it went beyond the narrow directive of checking passenger information against a terrorist watch list and included, for instance, assessment of criminal threats. According to the DHS fact sheet on the program, CAPPS II was to be an integral part of its layered approach to security, ensuring that travelers who are known or potential threats to aviation are stopped before they or their baggage board an aircraft.2 It was meant to be a rule-based system that used information provided by the passenger (name, address, telephone number, and date of birth) when purchasing an airline ticket to determine whether the passenger required additional screening or should be prohibited from boarding. CAPPS II would have examined both commercial and government databases to assess the risk posed by passengers.

In an effort to address privacy and security concerns surrounding the program, DHS issued a press release about what it called myths and facts about CAPPS II.3 For instance, it stated that retention of data collected would be limited—that all data collected and created would be destroyed shortly after the completion of a traveler's itinerary. It also said that no data mining techniques would be used to profile and track citizens, although assessment would have extended beyond checking against lists and would have included examining a wide array of databases.

A study by GAO in 2004 found that TSA was sufficiently addressing only one of eight key issues related to implementing CAPPS II.4 The study found that accuracy of data, stress testing, abuse prevention, prevention of unauthorized access, policies for operation and use, privacy concerns, and a redress process were not fully addressed by CAPPS II. Despite efforts to allay concerns, CAPPS II was abandoned in 2004. It was replaced in August 2004 with a new program called Secure Flight.

Secure Flight is designed to fulfill the Congressional directive while attempting to address a number of concerns raised by CAPPS II. For instance, unlike CAPPS II, Secure Flight makes TSA responsible for cross-checking passenger flight information with classified terrorist lists rather than allowing such checking to be done by contracted vendors.

2 U.S. Department of Homeland Security, "Fact Sheet: CAPPS II at a Glance," February 13, 2004, available at http://www.dhs.gov/xnews/releases/press_release_0347.shtm.
Although the possibility of using commercial databases to check for threats was initially included, the use of commercial data is now precluded.5 Other differences between CAPPS II and Secure Flight include limiting screening to checking for terrorism threats, not criminal offenses (although this was initially included), and using only historical data during testing phases. TSA states that the mission of Secure Flight is "to enhance the security of domestic commercial air travel within the United States through the use of improved watch list matching."6 According to TSA, when implemented, Secure Flight would:

• Decrease the chance of compromising watch-list data by centralizing use of comprehensive watch lists.
• Provide earlier identification of potential threats, allowing for expedited notification of law-enforcement and threat-management personnel.
• Provide a fair, equitable, and consistent matching process among all aircraft operators.
• Offer consistent application of an expedited and integrated redress process for passengers misidentified as posing a threat.

However, Secure Flight has continued to raise concerns about privacy, abuse, and security. A 2006 GAO study of the program found that although TSA had made some progress in managing risks associated with developing and deploying Secure Flight, substantial challenges remained.7 After publication of the study report, TSA announced that it would reassess the program and make changes to address concerns raised in the report. The 2006 DHS Privacy Office report on data mining did not include an assessment of Secure Flight; it stated that searches or matches are done with a known name or subject and thus did not meet the definition of data mining used in the report.8 In a prepared statement before the Senate Committee on Commerce, Science, and Transportation in January 2007, the TSA administrator noted progress in addressing those concerns and the intention to make the program operational by some time in 2008.9

Most recently, DHS Secretary Michael Chertoff announced that Secure Flight would no longer include data mining and would restrict information collected about passengers to full name and, optionally, date of birth and sex. Chertoff stated that Secure Flight will not collect commercial data, assign risk scores, or attempt to predict behavior, as was envisioned in earlier versions of the program.10 The information provided will be compared with a terrorist watch list.

I.3 MULTISTATE ANTI-TERRORISM INFORMATION EXCHANGE (MATRIX)

Status: Pilot program ended; no follow-on program started.

This program was an effort to support information-sharing and collaboration among law-enforcement agencies.11 It was run as a pilot project administered by the Institute for Intergovernmental Research for DHS and DOJ.12 MATRIX involved collaborative information-sharing between public, private, and nonprofit institutions. A Congressional Research Service (CRS) report described MATRIX as a project that "leverages advanced computer/information management capabilities to more quickly access, share, and analyze public records to help law enforcement generate leads, expedite investigations, and possibly prevent terrorist attacks."13 The MATRIX system was developed and operated by a private Florida-based company, and the Florida Department of Law Enforcement controlled access to the program and was responsible for the security of the data.14 Although "terrorism" is part of the program name, the primary focus appears to have been on law enforcement and criminal investigation.

Until the system was redesigned, participating states were required to transfer state-owned data to a private company.15 The core function of the system was the Factual Analysis Criminal Threat Solution (FACTS) application used to query disparate data sources by using available investigative information, such as a portion of a vehicle license number, to combine records dynamically to identify people of potential interest. According to the CRS report, FACTS included crime-mapping, association-charting, lineup and photograph montage applications, and extensive query capabilities.16 Data sources available in the system included both those traditionally available to law enforcement—such as criminal history, corrections-department information, driver's license, and motor-vehicle data—and nontraditional ones, such as:17

• Pilot licenses issued by the Federal Aviation Administration,
• Aircraft ownership,
• Property ownership,
• U.S. Coast Guard vessel registrations,
• State sexual-offender lists,
• Corporate filings,
• Uniform Commercial Code filings or business liens,
• Bankruptcy filings, and
• State-issued professional licenses.

Concerns were raised that the data would be combined with private data, such as credit history, airline reservations, and telephone logs; but the MATRIX Web site stated those would not be included. The system initially included a scoring system called High Terrorist Factor (HTF) that identified people who might be considered high-risk, although it was later claimed that the HTF element of the system had been eliminated.18

The pilot program ended in April 2005. Legal, privacy, security, and technical concerns about requirements to transfer state-owned data to MATRIX administrators and continuing costs associated with using the system prompted several states that initially participated or planned to participate in MATRIX to withdraw.19 By March 2004, 11 of the 16 states that originally expressed interest in participating had withdrawn from the program. In an attempt to address some of the concerns, the architecture was changed to allow distributed access to data in such a way that no data transfers from state-controlled systems would be required to share data.20 The CRS report concluded that "it remains uncertain whether the MATRIX pilot project is currently designed to assess and address privacy and civil liberty concerns."21 For instance, there appears to have been no comprehensive plan to put safeguards and policies in place to avoid potential abuses of the system, such as monitoring of activities of social activists or undermining of political activities.22

I.4 ABLE DANGER

Status: Terminated in January 2001.

This classified program, established in October 1999 by the U.S. Special Operations Command and ended by 2001, called for the use of data mining tools to gather information on terrorists from government databases and from open sources, such as the World Wide Web.23 The program used link analysis to identify underlying connections between people.24 Analysis would then be used to create operational plans designed to disrupt, capture, and destroy terrorist cells. Link analysis is a form of network analysis that uses graph theory to identify patterns and measure the nature of a network. The related social-network analysis is now considered a critical tool in sociology, organizational studies, and information sciences. Cohesion, betweenness, centrality, clustering coefficient, density, and path length are some of the measures used in network analysis to model and quantify connections. The combination of complex mathematics and the enormous volumes of data required to gain an accurate and complete picture of a network make the use of information technology critical if useful analysis is to be performed on a large scale. Several network-mapping software packages are available commercially. Applications include fraud detection, relevance ratings in Internet search engines, and epidemiology.

3 U.S. Department of Homeland Security, "CAPPS II: Myths and facts," February 13, 2004, available at http://www.dhs.gov/xnews/releases/press_release_0348.shtm.
4 U.S. Government Accountability Office (GAO), Aviation Security: Computer-Assisted Passenger Prescreening System Faces Significant Implementation Challenges, GAO-04-385, GAO, Washington, D.C., February 2004.
5 U.S. Transportation Security Administration, "Secure Flight: Privacy Protection," available at http://www.tsa.gov/what_we_do/layers/secureflight/secureflight_privacy.shtm.
6 U.S. Transportation Security Administration, "Secure Flight: Layers of Security," available at http://www.tsa.gov/what_we_do/layers/secureflight/index.shtm.
7 U.S. Government Accountability Office (GAO), Aviation Security: Significant Management Challenges May Adversely Affect Implementation of the Transportation Security Administration's Secure Flight Program, GAO-06-374T, GAO, Washington, D.C., February 9, 2006.
8 U.S. Department of Homeland Security (DHS), "Data Mining Report: DHS Privacy Office Response to House Report 108-774," July 6, 2006; DHS Office of Inspector General, "Survey of DHS Data Mining Activities," OIG-06-56, DHS, Washington, D.C., August 2006, p. 20, footnote 25.
9 Prepared statement of Kip Hawley, Assistant Secretary of the Transportation Security Administration, before the U.S. Senate Committee on Commerce, Science and Transportation, January 17, 2007, available at http://www.tsa.gov/press/speeches/air_cargo_testimony.shtm.
10 Michael J. Sniffen, "Feds off simpler flight screening plan," Associated Press, August 9, 2007.
11 U.S. Department of Homeland Security (DHS), "MATRIX Report: DHS Privacy Office Report to the Public Concerning the Multistate Anti-Terrorism Information Exchange," DHS, Washington, D.C., December 2006, p. 1.
12 The Institute for Intergovernmental Research (IIR) is a Florida-based nonprofit research and training organization specializing in law enforcement, juvenile justice, criminal justice, and homeland security. See http://www.iir.com/default.htm.
13 W.J. Krouse, The Multi-State Anti-Terrorism Information Exchange (MATRIX) Pilot Project, RL32536, U.S. Congressional Research Service (CRS), Washington, D.C., August 18, 2004, p. 1, italics original. Note that the official Web site for the MATRIX program cited in this CRS report is no longer available.
14 Ibid., p. 2. The company, Seisint, was acquired by Reed Elsevier subsidiary LexisNexis in July 2004.
15 J. Rood, "Controversial data-mining project finds ways around privacy laws," CQ Homeland Security—Intelligence, July 23, 2004, p. 1.
16 Krouse, op. cit., p. 4.
17 Data sources are identified in the Congressional Research Service report (Krouse, op. cit., p. 6) as referenced from the official MATRIX program Web site, http://www.matrix-at.org, which is no longer available.
18 B. Bergstein, "Database firm gave feds terror suspects: 'Matrix' developer turned over 120,000 names," Associated Press, May 20, 2004, available at http://www.msnbc.msn.com/id/5020795/.
19 See, for instance, Georgia Department of Motor Vehicle Safety, "Department of Motor Vehicle Safety's Participation in MATRIX," September 29, 2003; New York State Police, letter to Chairman of MATRIX, March 9, 2004; Texas Department of Public Safety, letter to Chair, Project MATRIX, May 21, 2003. Those documents and additional information on state involvement are available from the American Civil Liberties Union (ACLU) Web site at http://www.aclu.org/privacy/spying/15701res20050308.html.
20 Rood, op. cit., p. 1.
21 Krouse, op. cit., p. 10.
22 Ibid., p. 8.
23 U.S. Department of Defense (DOD) Office of the Inspector General, "Report of Investigation: Alleged Misconduct by Senior DOD Officials Concerning the Able Danger Program and Lieutenant Colonel Anthony A. Shaffer, U.S. Army Reserve," Case Number H05L97905217, DOD, Washington, D.C., September 18, 2006.
24 "Link analysis" was an informal term used to describe the analysis of connections between individuals rather than any kind of formal "record linkage" between database records.
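The network measures named above (degree centrality, clustering coefficient, density, path length) can be made concrete in a few lines of code. The sketch below uses only the Python standard library on an invented five-person graph; real link-analysis tools operate on vastly larger networks with specialized software, and this example does not represent any system described in this appendix.

```python
from collections import deque

# Toy undirected "association" graph: an edge means two people were
# linked by some record. All names are invented placeholders.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
nodes = sorted({n for e in edges for n in e})
adj = {n: set() for n in nodes}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def density() -> float:
    """Fraction of possible edges that are actually present."""
    n = len(nodes)
    return 2 * len(edges) / (n * (n - 1))

def clustering(v: str) -> float:
    """Clustering coefficient: how close v's neighbors are to a clique."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
    return 2 * links / (k * (k - 1))

def path_length(src: str, dst: str):
    """Shortest path length (hops) between two people, via breadth-first search."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        cur, d = queue.popleft()
        if cur == dst:
            return d
        for nxt in adj[cur]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None  # no connection at all

# Degree centrality: "C" is the best-connected node in this toy graph.
degree = {v: len(adj[v]) for v in nodes}
```

Betweenness and cohesion, also mentioned above, are computed from the same shortest-path machinery and are omitted here for brevity.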
Able Danger was focused specifically on mapping and analyzing relationships within and with Al Qaeda. The program became public in 2005 after claims made by Rep. Curt Weldon that Able Danger had identified 9/11 hijacker Mohammad Atta before the attack surfaced in the mass media. A member of Able Danger, Anthony Shaffer, later identified himself as the source of Weldon's information.25 He further claimed that intelligence discovered as part of Able Danger was not passed on to the Federal Bureau of Investigation (FBI) and other civilian officials. Shaffer said a key element of Able Danger was the purchase of data from information brokers that identified visits by individuals to specific mosques.26 That information was combined with other data to identify patterns and potential relationships among alleged terrorists. Claims made by Shaffer were refuted in a report written by the DOD inspector general.27 The report showed examples of the types of charts produced by link analysis.28 It characterized Able Danger operations as initially an effort to gain familiarity with state-of-the-art analytical tools and capabilities and eventually to apply link analysis to a collection of data from other agencies and from public Web sites to understand Al Qaeda infrastructure and develop a strategy for attacking it.29 The program was then terminated, having achieved its goal of developing a (still-classified) "campaign plan" that "formed the basis for follow-on intelligence gathering efforts."30 An investigation by the Senate Select Committee on Intelligence concluded that Able Danger had not identified any of the 9/11 hijackers before September 11, 2001.31

No follow-on intelligence effort using link-analysis techniques developed by Able Danger has been publicly acknowledged.
However, the existence of a program known as Able Providence, supported through the Office of Naval Intelligence, which would reconstitute and improve on Able Danger, was reported by Weldon in testimony to the U.S. Senate Committee on the Judiciary as part of its investigation.32

I.5 ANALYSIS, DISSEMINATION, VISUALIZATION, INSIGHT, AND SEMANTIC ENHANCEMENT (ADVISE)

Status: Under development (some deployments decommissioned).

This program, being developed by DHS, was intended to help to detect potentially threatening activities by using link analysis of large amounts of data and producing graphic visualizations of identified linkage patterns. It was one of the most ambitious data mining efforts being pursued by DHS. ADVISE was conceived as a data mining toolset and development kit on which applications could be built for deployment to address specific needs. An assessment of ADVISE was not included in the 2006 DHS Privacy Office report on data mining, because it was considered a tool or technology and not a specific implementation of data mining.33 That position was noted in a GAO report on the program that questioned the decision not to include a privacy assessment of the program, given that "the tool's intended uses include applications involving personal information, and the E-Government Act, as well as related Office of Management and Budget and DHS guidance, emphasize the need to assess privacy risks early in systems development."34

The GAO report identified the program's intended benefit as helping to "detect activities that threaten the United States by facilitating the analysis of large amounts of data that otherwise would be very difficult to review," noting that the tools developed as part of ADVISE are intended to accommodate both structured and unstructured data.35 The report concluded that ADVISE raised a number of privacy concerns and that although DHS had added security controls related to ADVISE, it had failed to assess privacy risks, including erroneous associations of people, misidentification of people, and repurposing of data collected for other purposes.36 It called on DHS "to conduct a privacy impact assessment of the ADVISE tool and implement privacy controls as needed to mitigate any identified risks."37 DHS responded to the GAO report, saying that it was in the process of developing a privacy impact assessment tailored to the unique character of the ADVISE program (as a tool kit).

A later DHS Privacy Office report did review the ADVISE program and drew a careful distinction between ADVISE as a technology framework and ADVISE deployments.38 The report first reviewed the technology framework in light of privacy compliance requirements of the DHS Privacy Office described in the report.39 In light of those requirements, it then assessed six planned deployments of ADVISE:40

• Interagency Center for Applied Homeland Security Technology (ICAHST). ICAHST evaluates promising homeland-security technologies for DHS and other government stakeholders in the homeland-security technology community.
• All-Weapons of Mass Effect (All-WME). Originally begun by the Department of Energy, All-WME used classified message traffic collected by the national laboratories' field intelligence elements to analyze information related to foreign groups and organizations involved in WME material flows and illicit trafficking. Deployment has been discontinued.
• Biodefense Knowledge Management System. This was a series of three deployment initiatives planned by the Biodefense Knowledge Center with the overall goal of identifying better methods for assisting DHS analysts in identifying and characterizing biological threats posed by terrorists. All the deployments have ended, and there are no plans for future deployments.
• Remote Threat Alerting System (RTAS). RTAS sought to determine whether the ADVISE technology framework could assist DHS Customs and Border Protection (CBP) in identifying anomalous shipments on the basis of cargo type and originating country. All RTAS activities ended in September 2006.
• Immigration and Customs Enforcement Demonstration (ICE Demo). This deployment was operated by the DHS Science and Technology Directorate and Lawrence Livermore National Laboratory to determine whether the ADVISE technology framework could assist DHS Immigration and Customs Enforcement (ICE) in using existing ICE data better. All activity related to this deployment of ADVISE has ended.
• Threat Vulnerability Integration System (TVIS). TVIS used a series of data sets to identify opportunities to test the capability of the ADVISE technology framework to help analysts in the DHS Office of Intelligence and Analysis. Early pilot deployment phases have been followed by subsequent pilot deployment phases.

The report found that some of the deployments did use personally identifiable information without conducting privacy impact assessments.41 It also recommended short- and long-term actions to address the problems. In particular, it recommended actions that would integrate privacy compliance requirements into project development processes, echoing recommendations made in the GAO report on the program.42 DHS ended the program in September 2007, citing the availability of commercial products to provide similar functions at much lower cost.43

I.6 AUTOMATED TARGETING SYSTEM (ATS)

Status: In use.

This program is used by CBP, part of DHS, to screen cargo and travelers entering and leaving the United States by foot, car, airplane, ship, and rail. ATS assesses risks by using data mining and data-analysis techniques. The risk assessment and links to information on which the assessment is based are stored in the ATS for up to 40 years.44 The assessment is based on combining and analyzing data from several existing sources of information—including the Automated Commercial System, the Automated Commercial Environment System, the Advance Passenger Information System, and the Treasury Enforcement Communications System—and from people crossing the U.S. land border (the Passenger Name Record).

25 Cable News Network, "Officer: 9/11 panel didn't receive key information," August 17, 2005, available at http://www.cnn.com/2005/POLITICS/08/17/sept.11.hijackers.
26 J. Goodwin, "Inside Able Danger—The secret birth, extraordinary life and untimely death of a U.S. military intelligence program," Government Security News, September 5, 2005, available at http://www.gsnmagazine.com/cms/lib/410.pdf.
27 U.S. Department of Defense Office of the Inspector General, "Report of Investigation: Alleged Misconduct by Senior DOD Officials Concerning the Able Danger Program and Lieutenant Colonel Anthony A. Shaffer, U.S. Army Reserve," Case Number H05L97905217, September 18, 2006.
28 Ibid., pp. 8-9.
29 Ibid., p. 14.
30 Ibid.
31 G. Miller, "Alarming 9/11 claim is baseless, panel says," Los Angeles Times, December 24, 2006.
32 Representative Curt Weldon in testimony to the United States Senate Committee on the Judiciary, September 21, 2005, available at http://judiciary.senate.gov/testimony.cfm?id=1606&wit_id=4667. See also P. Wait, "Data-mining offensive in the works," Government Computer News, October 10, 2005.
33 U.S. Department of Homeland Security (DHS), "Data Mining Report: DHS Privacy Office Response to House Report 108-774," DHS, Washington, D.C., July 6, 2006; DHS Office of Inspector General, "Survey of DHS Data Mining Activities," OIG-06-56, DHS, Washington, D.C., August 2006, p. 20, footnote 25.
34 U.S. Government Accountability Office (GAO), Data Mining: Early Attention to Privacy in Developing a Key DHS Program Could Reduce Risks, GAO-07-293, GAO, Washington, D.C., February 2007, p. 3.
35 Ibid., p. 3.
36 Ibid., p. 18.
37 Ibid., from "Highlights: What GAO Recommends." See also p. 23.
38 U.S. Department of Homeland Security, "DHS Privacy Office Review of the Analysis, Dissemination, Visualization, Insight and Semantic Enhancement (ADVISE) Program," DHS, Washington, D.C., July 11, 2007. Page 2 discusses and defines these terms.
39 Ibid., pp. 3-5.
40 Ibid. Definitions and descriptions of these programs are drawn from the report beginning on p. 7.
41 Ibid., p. 1.
42 U.S. Government Accountability Office (GAO), Data Mining: Early Attention to Privacy in Developing a Key DHS Program Could Reduce Risks, GAO-07-293, GAO, Washington, D.C., February 2007. See especially p. 24.
43 M.J. Sniffen, "DHS Ends Criticized Data-Mining Program," Washington Post, September 5, 2007.
44 U.S. Department of Homeland Security, "Notice of Privacy Act system of records," Federal Register 71(212): 64543-64546, November 2, 2006.
ATS compares a traveler’s name with a list of known and suspected 41 Ibid., p. 1. 42 U.S. Government Accountability Office (GAO), Data Mining: Early Attention to Priacy in Deeloping a Key DHS Program Could Reduce Risks, GAO-07-293, GAO, Washington, D.C., February 2007. See especially p. 24. 43 M.J. Sniffen, “DHS Ends Criticized Data-Mining Program,” Washington Post, September 5, 2007. 44 U.S. Department of Homeland Security, “Notice of Privacy Act system of records,” Fed- eral Register 71(212): 64543-64546, November 2, 2006.

OCR for page 218
APPENDIX I

terrorists. It also performs link analysis, checking, for example, the telephone number associated with an airline reservation against telephone numbers used by known terrorists.45 Such checking has been credited by DHS with preventing the entry of suspected terrorists and with identifying criminal activity,46 but some in the technical community and in civil-liberties groups have raised concerns about high numbers of false alarms, the efficacy of the risk assessment, the lack of a remediation process, and the agency's ability to protect and secure the collected data properly.47

I.7 THE ELECTRONIC SURVEILLANCE PROGRAM

Status: Continuing subject to oversight by the Foreign Intelligence Surveillance Court.

This program, also called the Terrorist Surveillance Program, involves the collection and analysis of domestic telephone-call information with the goal of targeting the communications of Al Qaeda and related terrorist groups and affiliated individuals. Details about the program remain secret, but as part of the program the president authorized NSA to eavesdrop on communications of people in the United States without obtaining a warrant when there is a "reasonable basis to conclude that one party to the communication is a member of Al Qaeda."48 Existence of the program first surfaced in a New York Times article published in December 2005.49

Questions as to the legality of the program led a federal judge to declare the program unconstitutional and illegal and to order that it be suspended. That ruling was overturned on appeal on narrow grounds regarding the standing of the litigants rather than the legality of the program.50 In a letter to the Senate Committee on the

45 Remarks of Stewart Baker, Assistant Secretary for Policy, Department of Homeland Security, at the Center for Strategic and International Studies, Washington, D.C., December 19, 2006.
46 Ibid. Baker noted the use of ATS to identify a child-smuggling ring. CBP officers who examined ATS data noticed that a woman with children had not taken them with her on the outbound flight; this led to further investigation.
47 See, for instance, B. Schneier, "On my mind: They're watching," Forbes, January 8, 2007; Electronic Privacy Information Center, Automated Targeting System, http://www.epic.org/privacy/travel/ats/default.html.
48 Press briefing by Attorney General Alberto Gonzales and General Michael Hayden, Principal Deputy Director for National Intelligence, December 19, 2005, available at http://www.whitehouse.gov/news/releases/2005/12/20051219-1.html.
49 J. Risen and E. Lichtblau, "Bush lets U.S. spy on callers without courts," New York Times, December 16, 2005.
50 A. Goldstein, "Lawsuit against wiretaps rejected," The Washington Post, July 7, 2007, p. A1.

PROTECTING INDIVIDUAL PRIVACY IN THE STRUGGLE AGAINST TERRORISTS

Judiciary on January 17, 2007, Attorney General Alberto Gonzales stated that the program would not be reauthorized by the president, although the surveillance program would continue subject to oversight by the Foreign Intelligence Surveillance Court (FISC).

Although the legality of the program has been the primary focus of the press, it is unclear to what extent data mining technology is used as part of the program. Some press reports suggest that such technology is used extensively to collect and analyze data from sources that include telephone and Internet communication, going well beyond keyword searches to use link analysis to uncover hidden relationships among data points.51 The adequacy of the FISC to address technology advances, such as data mining and traffic-analysis techniques, has also been called into question.52 As this report is being written (June 2008), changes to the Foreign Intelligence Surveillance Act are being contemplated by Congress. The final disposition of the changes is not yet known.

I.8 NOVEL INTELLIGENCE FROM MASSIVE DATA (NIMD) PROGRAM

Status: In progress.

NIMD is a research and development program funded by the Disruptive Technology Office,53 which is part of the Office of the Director of National Intelligence. The program, which has many similarities to the Total/Terrorism Information Awareness program, is focused on the development of data mining and analysis tools to be used in working with massive data. According to a "Call for 2005 Challenge Workshop Proposals," "NIMD aims to preempt strategic surprise by addressing root causes of analytic errors related to bias, assumptions, and premature attachment to a single hypothesis."54 Two key challenges are identified: data triage to support decision-making and real-time analysis of petabytes of data, and practical knowledge representation to improve machine processing and

51 E. Lichtblau and J. Risen, "Spy agency mined vast data trove, officials report," New York Times, December 23, 2005; S. Harris, "NSA spy program hinges on state-of-the-art technology," National Journal, January 20, 2006.
52 See, for instance, K.A. Taipale, "Whispering wires and warrantless wiretaps: Data mining and foreign intelligence surveillance," NYU Review of Law and Security, Issue 7, Supplemental Bulletin on Law and Security, Spring 2006.
53 The Disruptive Technology Office was previously known as the Advanced Research and Development Activity (ARDA).
54 Advanced Research Development Activity, "Call for 2005 Challenge Workshop Proposals," available at http://nrrc.mitre.org/arda_explorprog2005_cfp.pdf.

data-sharing among disparate agencies and technologies. The challenge identifies five focus areas for NIMD research, with the overarching goal of building "smart software assistants and devil's advocates that help analysts deal with information overload, detect early indicators of strategic surprise, and avoid analytic errors": "modeling analysts and analytic processes, capturing and reusing prior and tacit knowledge, generating and managing hypotheses, organizing/structuring massive data (mostly unstructured text), and human interaction with information."

Advocacy groups and some members of Congress have expressed concerns that at least some of the research done as part of the TIA program has continued under NIMD.55 In contrast with TIA, Congress stipulated that technologies developed under the program are to be used only for military or foreign intelligence purposes against non-U.S. citizens.

I.9 ENTERPRISE DATA WAREHOUSE (EDW)

Status: Operational since 2000 and in use.

This system collects data from CBP transactional systems and subdivides them into data sets for analysis.56 The data sets are referred to as data marts; their creation is predicated on the need for a specific grouping and configuration of selected data.57 EDW acquires and combines data from several customs and other federal databases to perform statistical and trend analysis to look for patterns, for instance, to determine the impact of an enforcement action or rule change.58 EDW uses commercial off-the-shelf technology for its analysis.59 EDW data are treated as read-only; all changes occur in the source systems and are propagated to it periodically (every 24 hours).60

I.10 LAW ENFORCEMENT ANALYTIC DATA SYSTEM (NETLEADS)

Status: In use.

This program supports ICE law-enforcement activities and intelligence-analysis capabilities through searches and pattern recognition based on multiple data sources.61 As with EDW, NETLEADS uses data marts. Link analysis is used to show relationships, such as associations with known criminals. Information analyzed includes criminal-alien information and terrorism, smuggling, and criminal-case information derived from federal and state government law-enforcement and intelligence agencies' data sources and from commercial sources.62 The technology includes timeline analysis, which allows comparison of relationships at different times. Trend analysis across multiple cases can also be performed in the context of particular investigations and intelligence operations.

I.11 ICE PATTERN ANALYSIS AND INFORMATION COLLECTION SYSTEM (ICEPIC)

Status: Operating as a pilot program as of July 2007; planned to enter full-scale operation in fiscal year 2008.63

Whereas the NETLEADS focus is on law enforcement, ICEPIC focuses on the goal of disrupting and preventing terrorism.64 Link analysis is performed to uncover nonobvious associations between individuals and organizations in order to generate counterterrorism leads. Data for analysis are drawn from DHS sources and from databases maintained by the Department of State, DOJ, and the Social Security Administration. ICEPIC uses technology from IBM called Non-Obvious Relationship Awareness (NORA) to perform the analysis.65 ICEPIC, NETLEADS, and two other systems—the Data Analysis and Research for Trade Transparency System

55 "U.S. still mining terror data," Associated Press, Washington, D.C., February 23, 2004; M. Williams, "The Total Information Awareness Project lives on," Technology Review, April 26, 2006.
56 U.S. Department of Homeland Security (DHS), "Data Mining Report: DHS Privacy Office Response to House Report 108-774," DHS, Washington, D.C., July 6, 2006, pp. 20-21.
57 An explanation of the distinction between a data warehouse and a data mart is provided as a footnote in DHS Office of Inspector General, "Survey of DHS Data Mining Activities," OIG-06-56, DHS, Washington, D.C., August 2006, p. 11.
58 See U.S. Customs and Border Protection, "U.S. Customs data warehousing," available at http://www.cbp.gov/xp/cgov/trade/automated/automated_systems/data_warehousing.xml; databases used as sources include the Automated Commercial System (ACS), the Automated Commercial Environment (ACE), the Treasury Enforcement Communications System (TECS), Administrative and Financial Systems, and the Automated Export System. See U.S. Customs, "Enterprise Data Warehouse: Where it stands, where it's heading," U.S. Customs Today, August 2000, available at http://www.cbp.gov/custoday/aug2000/dwartic4.htm.
59 U.S. Department of Homeland Security (DHS), "Data Mining Report: DHS Privacy Office Response to House Report 108-774," DHS, Washington, D.C., July 6, 2006, p. 21.
60 Ibid.
61 Ibid., pp. 21-24.
62 Ibid., pp. 22-23.
63 Immigration and Customs Enforcement Fact Sheet, http://www.ice.gov/pi/news/factsheets/icepic.htm.
64 U.S. Department of Homeland Security Office of Inspector General, "Survey of DHS Data Mining Activities," OIG-06-56, DHS, Washington, D.C., August 2006, p. 11.
65 U.S. Department of Homeland Security (DHS), "Data Mining Report: DHS Privacy Office Response to House Report 108-774," DHS, Washington, D.C., July 6, 2006, pp. 24-26.

(DARTTS) and the Crew Vetting System (CVS)—all use association, the process of discovering that two or more variables are related, as part of their analysis.66

I.12 INTELLIGENCE AND INFORMATION FUSION (I2F)

Status: In development.

Using commercial off-the-shelf systems, this program provides tools for searching, link analysis, entity resolution, geospatial analysis, and temporal analysis to give intelligence analysts the ability to view, query, and analyze information from multiple data sources.67 The program is focused on aiding the discovery and tracking of terrorism threats to people and infrastructure. With three other DHS programs—the Numerical Integrated Processing System (NIPS), Questioned Identification Documents (QID), and the Tactical Information Sharing System (TISS)—I2F uses collaboration processes that support the application of cross-organizational expertise and visualization processes that aid in the presentation of analysis results.68 Data may be drawn from both government and commercial sources.

I.13 FRAUD DETECTION AND NATIONAL SECURITY DATA SYSTEM (FDNS-DS)

Status: In use, but without analytical tools to support data mining; support for data mining capabilities is not expected for at least 2 years.

This program (formerly the Fraud Tracking System) is used to track immigration-related fraud, public-safety referrals to ICE, and national-security concerns discovered during background checks.69 In its present form, FDNS-DS is a case-management system with no analytical or data mining tools. It is planned to add those capabilities to allow identification of fraudulent schemes.

66 U.S. Department of Homeland Security Office of Inspector General, "Survey of DHS Data Mining Activities," OIG-06-56, DHS, Washington, D.C., August 2006, pp. 9-11.
67 U.S. Department of Homeland Security (DHS), "Data Mining Report: DHS Privacy Office Response to House Report 108-774," DHS, Washington, D.C., July 6, 2006, p. 26.
68 U.S. Department of Homeland Security Office of Inspector General, "Survey of DHS Data Mining Activities," OIG-06-56, DHS, Washington, D.C., August 2006, p. 13.
69 U.S. Department of Homeland Security (DHS), "Data Mining Report: DHS Privacy Office Response to House Report 108-774," DHS, Washington, D.C., July 6, 2006, p. 27.
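The link analysis that NETLEADS and ICEPIC perform, surfacing nonobvious associations such as two people connected only through a shared telephone number or address, can be illustrated with a toy graph search. The sketch below is a minimal illustration in Python, assuming fabricated records and a hypothetical link_path helper; it is not the NORA technology or any deployed government system.

```python
from collections import deque

# Toy records mapping people to shared attributes (phone, address).
# All names and values are fabricated for illustration only.
records = [
    ("alice", "phone:555-0101"),
    ("bob", "phone:555-0101"),    # alice and bob share a phone number
    ("bob", "addr:12 Oak St"),
    ("carol", "addr:12 Oak St"),  # bob and carol share an address
]

def link_path(start, goal, records):
    """Breadth-first search over the person-attribute graph; returns the
    shortest chain of people and attributes linking start to goal, or None."""
    graph = {}
    for person, attr in records:
        graph.setdefault(person, set()).add(attr)
        graph.setdefault(attr, set()).add(person)
    frontier = deque([[start]])
    seen = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
    return None

print(link_path("alice", "carol", records))
# -> ['alice', 'phone:555-0101', 'bob', 'addr:12 Oak St', 'carol']
```

A real system operates over far richer entity types and must first resolve whether two records refer to the same person, which is the entity-resolution step that I2F lists alongside link analysis.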

I.14 NATIONAL IMMIGRATION INFORMATION SHARING OFFICE (NIISO)

Status: In use without data mining tools; a pilot project that includes data mining capabilities is being planned.

This program is responsible for fulfilling requests for immigration-related information from other DHS components and from law-enforcement and intelligence agencies.70 The program does not include any data mining tools and techniques, relying instead on manual searches based on specific requests to supply information to authorized requesting agencies. Plans to add such analytical capabilities are being developed. Data for analysis would include data collected by immigration services, publicly available information, and data from commercial aggregators.71

I.15 FINANCIAL CRIMES ENFORCEMENT NETWORK (FinCEN) AND BSA DIRECT

Status: FinCEN in use; BSA Direct withdrawn.

FinCEN applies data mining and analysis technology to data from a number of sources related to financial transactions to identify cases of money laundering and other financial elements of criminal and terrorist activity. The goal of FinCEN is to promote information sharing among law-enforcement, regulatory, and financial institutions.72 FinCEN is responsible for administering the Bank Secrecy Act (BSA). As part of that responsibility, it uses data mining technology to analyze data collected under the requirements of the BSA and to identify suspicious activity tied to terrorists and organized crime. In 2004, FinCEN began a program called BSA Direct intended to provide law-enforcement agencies with access to BSA data and to data mining capabilities similar to those available to FinCEN.73 BSA Direct was permanently halted in July 2006 after cost overruns and technical implementation and deployment difficulties.74

70 U.S. Department of Homeland Security (DHS), "Data Mining Report: DHS Privacy Office Response to House Report 108-774," DHS, Washington, D.C., July 6, 2006, p. 28.
71 Ibid.
72 See the FinCEN Web site at http://www.fincen.gov/af_faqs.html for further details on its mission.
73 Statement of Robert W. Werner before the House Committee on Government Reform Subcommittee on Criminal Justice, Drug Policy, and Human Resources, May 11, 2004, p. 3, available at http://www.fincen.gov/wernertestimonyfinal051104.pdf.
74 FinCEN, "FinCEN Halts BSA Direct Retrieval and Sharing Project," July 13, 2006, available at http://www.fincen.gov/bsa_direct_nr.html.
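One well-known pattern in BSA-style analysis is "structuring": splitting cash deposits to stay under the $10,000 currency-reporting threshold. The sketch below is a minimal illustration of that kind of rule-based screening; the three-day window, the flagging rule, and the toy transactions are assumptions for the sketch and do not represent FinCEN's actual methods or data.

```python
from collections import defaultdict

# Illustrative only: flag accounts whose sub-threshold cash deposits within a
# short window sum past the $10,000 BSA reporting threshold ("structuring").
THRESHOLD = 10_000
WINDOW_DAYS = 3  # assumed window for this sketch

def flag_structuring(transactions):
    """transactions: iterable of (account, day, amount). Returns the set of
    accounts whose sub-threshold deposits within any WINDOW_DAYS span
    exceed THRESHOLD in total."""
    by_account = defaultdict(list)
    for account, day, amount in transactions:
        if amount < THRESHOLD:  # deposits at or over the threshold are reported anyway
            by_account[account].append((day, amount))
    flagged = set()
    for account, deposits in by_account.items():
        deposits.sort()
        for day_i, _ in deposits:
            total = sum(amt for day, amt in deposits
                        if day_i <= day < day_i + WINDOW_DAYS)
            if total > THRESHOLD:
                flagged.add(account)
                break
    return flagged

txns = [
    ("acct-1", 1, 9_500), ("acct-1", 2, 9_000),   # 18,500 in 2 days: flagged
    ("acct-2", 1, 9_500), ("acct-2", 30, 9_500),  # far apart: not flagged
    ("acct-3", 5, 2_000),
]
print(flag_structuring(txns))  # -> {'acct-1'}
```

Production screening combines many such rules with analyst review; a hard-coded rule like this one is only the simplest form of the pattern-based analysis the BSA data support.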

I.16 DEPARTMENT OF JUSTICE PROGRAMS INVOLVING PATTERN-BASED DATA MINING

Status: All programs under development or in use.

Responding to requirements of the USA PATRIOT Improvement and Reauthorization Act of 2005,75 DOJ submitted a report to the Senate Committee on the Judiciary that identified seven programs that constitute pattern-based data mining as defined in the act.76 The report carefully scoped what was considered pattern-based data mining, on the basis of the definition in the act, to determine which programs it was required to report on.77 For each program identified, the report provides a description, plans for use, efficacy, potential privacy and civil-liberties impact, legal and regulatory foundation, and privacy- and accuracy-protection policies.78 The report notes that the scope of the programs and the detail provided vary widely. The following is a summary of the programs drawn from the DOJ report.79

• System-to-Assess-Risk (STAR) Initiative. Focused on extending the capabilities of the Foreign Terrorist Tracking Task Force (FTTTF), this program is a risk-assessment software system meant to help analysts set priorities among persons of possible investigative interest. Data used by STAR are drawn from the FTTTF data mart, an existing data repository "containing data from U.S. Government and proprietary sources (e.g., travel data from the Airlines Reporting Corporation) as well as access to publicly available data from commercial data sources (such as ChoicePoint)."80 STAR is under development.

• Identity Theft Intelligence Initiative. This program extracts data from the Federal Trade Commission's Identity Theft Clearinghouse and compares them with FBI data from case complaints of identity theft and with suspicious financial transactions filed with FinCEN. Further comparisons are made with data from private data aggregators, such as LexisNexis, Accurint, and Autotrack. On the basis of the results of the analysis, FBI creates a knowledge base to evaluate identity-theft types, identify

75 U.S. Pub. L. No. 109-177, Sec. 126.
76 U.S. Department of Justice, "Report on 'Data-mining' Activities Pursuant to Section 126 of the USA PATRIOT Improvement and Reauthorization Act of 2005," July 9, 2007, available at http://www.epic.org/privacy/fusion/doj-dataming.pdf.
77 Ibid., pp. 1-6.
78 The report includes a review of only six of the seven initiatives identified, saying that a supplemental report on the seventh initiative will be provided at a later date.
79 Ibid., pp. 7-30.
80 Ibid., p. 8. ChoicePoint is a private data aggregator; see http://www.choicepoint.com/index.html.

identity-theft rings through subject relationships, and send leads to field offices. The program has been operational since 2003.

• Health Care Fraud Initiative. This program is used by FBI analysts to research and investigate health-care providers. The program draws data from Medicare "summary billing records extracted from the Centers for Medicare and Medicaid Services (CMS), supported by the CMS Fraud Investigative Database, Searchpoint [the Drug Enforcement Administration's pharmaceutical-claims database], and the National Health Care Anti-Fraud Association Special Investigative Resource and Intelligence System (private insurance data)."81 The program has been in use since 2003.

• Internet Pharmacy Fraud Initiative. This program's aim is to search consumer complaints (made to the Food and Drug Administration and the Internet Fraud Complaint Center) involving alleged fraud by Internet pharmacies to develop common threads indicative of fraud by such pharmacies. Data on Internet pharmacies available from open-source aggregators are also incorporated into the analysis. The program began in December 2005 and is operational.

• Housing Fraud Initiative. This program, run by the FBI, uses public-source data purchased from ChoicePoint containing buyer, seller, lender, and broker identities and property addresses to uncover fraudulent housing purchases. All analysis is done by FBI analysts manually (that is, not aided by computer programs) to identify connections between individuals and potentially fraudulent real-estate transactions. The program first became operational in 1999 and continues to be extended by ChoicePoint as new real-estate transaction information becomes available.

• Automobile Accident Insurance Fraud Initiative. This program, run by FBI, was designed to identify and analyze information regarding automobile-insurance fraud schemes. Data sources include formatted reports of potentially fraudulent claims for insurance reimbursement as identified and prepared by the insurance industry's National Insurance Crime Bureau, FBI case-reporting data, commercial data aggregators, and health-care insurance claims information from the Department of Health and Human Services (DHHS) and the chiropractic industry. The program is being run as a pilot program in use by only one FBI field office. No target date has been set for national deployment.

In addition to the programs identified as meeting the definition of pattern-based data mining used by the DOJ report, several programs were identified as potentially meeting other definitions of data mining. The report does not provide details about those programs, but it includes brief

81 Ibid., p. 20.

sketches of them. The programs identified as "advanced analytical tools that do not meet the definition in Section 126" and included in the DOJ report are as follows:82

• Drug Enforcement Administration (DEA) initiatives:

SearchPoint. A DEA project that uses prescription data from insurance and cash transactions obtained commercially from ChoicePoint, including the prescribing official (practitioner), the dispensing agent (pharmacy, clinic, hospital, and so on), and the name and quantity of the controlled substance (drug information), to conduct queries about practitioners, pharmacies, and controlled substances and to identify the volume and type of controlled substances being prescribed and dispensed.

Automation of Reports of Consolidated Orders System (ARCOS). DEA uses data collected from manufacturers and distributors of controlled substances and stored in the ARCOS database to monitor the flow of controlled substances from their point of manufacture through commercial distribution channels to the point of sale or distribution at the dispensing or retail level (hospitals, retail pharmacies, practitioners, and teaching institutions).

Drug Theft Loss (DTL) Database. This is similar to ARCOS, but the data source is all DEA controlled-substance registrants (including practitioners and pharmacies).

Online Investigative Project (OIP). OIP enables DEA to scan the Internet in search of illegal Internet pharmacies. The tool searches for terms that might indicate illegal pharmacy activity.

• Bureau of Alcohol, Tobacco, Firearms, and Explosives initiatives:

Bomb Arson Tracking System (BATS). BATS enables law-enforcement agencies to share information related to bomb and arson investigations and incidents. The sources of information are the various law-enforcement agencies. Possible queries via BATS include similarities of components, targets, or methods; BATS can be used, for example, to make connections between multiple incidents involving the same suspect.

GangNet. This system is used to track gang members, gangs, and gang incidents in a granular fashion. It enables sharing of information among law-enforcement agencies. It can also be used to identify trends, relationships, patterns, and demographics of gangs.

• Federal Bureau of Investigation initiative:

Durable Medical Equipment (DME) Initiative. DME is designed to help in setting investigative priorities on the basis of analysis of suspicious claims submitted by DME providers and identified by contractors for CMS. Data

82 Ibid., pp. 31-35. Descriptions are drawn from the report.

sources include complaint reports from the CMS and DHHS Inspector General's office and FBI databases.

• Other DOJ activities:

Organized Crime and Drug Enforcement Task Force (OCDETF) Fusion Center. OCDETF maintains a data warehouse named Compass that contains relevant drug and related financial intelligence information from numerous law-enforcement organizations. As stated in the report, "the goal of the data warehouse is to use cross-case analysis tools to transform multi-agency information into actionable intelligence in order to support major investigations across the globe."83

Investigative Data Warehouse (IDW). Managed by FBI, this warehouse enables investigators to perform efficient distributed searches of data sources across FBI. IDW provides analysts with the capability to examine relationships between people, places, communication devices, organizations, financial transactions, and case-related information.

Internet Crime Complaint Center (IC3). A partnership between FBI and the National White Collar Crime Center (NW3C), IC3 is focused on cybercrime. It provides a reporting mechanism for suspected violations. Reports are entered into the IC3 database, which can then be queried to discover common characteristics of complaints.

Computer Analysis and Response Team (CART) Family of Systems. This is a set of tools used to support computer-forensics work. CART maintains a database of information collected from criminal investigations. Data can be searched for similarities among confiscated computer hard drives.

Before publication of the report, many of the programs were either unknown publicly or had unclear scopes and purposes. Commenting on the DOJ report shortly after its delivery to the Senate Committee on the Judiciary, Senator Patrick Leahy said that "this report raises more questions than it answers and demonstrates just how dramatically the Bush administration has expanded the use of this technology, often in secret, to collect and sift through Americans' most sensitive personal information," and that the report provided "an important and all too rare ray of sunshine on the Department's data mining activities and provides Congress with an opportunity to conduct meaningful oversight of this powerful technological tool."84

83 Ibid., p. 34.
84 Comment of Senator Patrick Leahy, Chairman, Senate Judiciary Committee, on Department of Justice's Data Mining Report, July 10, 2007; see http://leahy.senate.gov/press/200707/071007c.html.
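Several of the systems above query complaint databases for common characteristics, an instance of the association analysis defined earlier in this appendix (discovering that two or more variables are related). The sketch below illustrates the idea with a simple co-occurrence count over fabricated complaint records; the attribute names and the min_support cutoff are assumptions for the sketch, and nothing here reflects the schema or tooling of IC3 or any other actual system.

```python
from itertools import combinations
from collections import Counter

# Fabricated complaint records, each a set of attribute=value characteristics.
complaints = [
    {"scheme": "phish", "contact": "email", "payment": "wire"},
    {"scheme": "phish", "contact": "email", "payment": "card"},
    {"scheme": "phish", "contact": "email", "payment": "wire"},
    {"scheme": "auction", "contact": "phone", "payment": "wire"},
]

def common_pairs(records, min_support=2):
    """Count co-occurring attribute pairs across records and keep those
    appearing in at least min_support records."""
    counts = Counter()
    for rec in records:
        items = sorted(f"{k}={v}" for k, v in rec.items())
        counts.update(combinations(items, 2))  # every pair in this record
    return {pair: n for pair, n in counts.items() if n >= min_support}

for pair, n in sorted(common_pairs(complaints, min_support=3).items()):
    print(n, pair)  # prints: 3 ('contact=email', 'scheme=phish')
```

Lowering min_support surfaces weaker associations at the cost of more noise, which mirrors the false-alarm trade-off raised throughout this appendix: the looser the pattern, the more innocent records it sweeps in.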