

The National Academies of Sciences, Engineering, and Medicine
500 Fifth St. N.W. | Washington, D.C. 20001

Copyright © National Academy of Sciences. All rights reserved.




OCR for page 272
11 Best Standards for Future Developments in Computer-Assisted Firearms Identification

The technology of the Integrated Ballistics Identification System (IBIS) provides a significant benefit in reducing the time it takes to identify a match and in increasing the overall capacity of toolmark examiners to find matches and to link crimes committed with the same gun. The committee believes that, properly used, the current National Integrated Ballistic Information Network (NIBIN) can be a valuable investigative tool, providing important leads to law enforcement through searches of ballistics evidence images stored in a database. However, a mature scientific approach is required to improve the reliability of automated searches and, if possible, ultimately to reduce costs, particularly labor costs associated with acquisition and search. Neither the current system nor newer technologies under development have demonstrated the ability to operate with the precision, safety, and cost effectiveness needed for a national reference ballistic image database (RBID).

The current system has been designed to support the traditional task of having a firearms examiner confirm that a particular cartridge was fired from a particular gun or that two or more cartridges were fired from the same gun. Chapter 6 provides a number of recommendations regarding the kinds of operational and technical improvements that are needed to smooth the progress of this task using the current system. However, the enormous number of firearms crimes committed annually in the United States, with their accompanying toll of serious injury and death, would seem to call for a far more robust research enterprise in the area of firearms identification than exists in the nation today. This chapter discusses what the government can do to advance the science in acquisition technology, search, and pattern recognition to improve the specific performance of technologies designed to assist in firearms identification and to address systematically the problems that prevent current technologies from being scaled up.

11–A Verification, search, and the challenge of scale

Forensic analysis of firearms has traditionally been a process in which an expert examiner is charged with the task of matching spent cartridge cases or bullets with a particular firearm or linking evidence from different crimes to a particular weapon. This is fundamentally a process of verification, in which a hypothesis (that the same firearm was used in two firings) is accepted, rejected, or found to be inconclusive. This judgment is made on the basis of physical markings on the cartridge case or bullet, generally observed visually by the firearms examiner with the assistance of a microscope. An examiner must usually support the judgment of a match in court and thus seeks considerable evidence before concluding that a match is definitive.

In considering the development of ballistic image databases, it is critically important to distinguish this traditional process of verification, in which there are external reasons that lead investigators to ask whether two bullets or casings were fired by the same firearm, from the process of search, in which a number of cases are compared with the goal of finding possible reasons to tie them together. In verification, one is validating or rejecting a specific hypothesis on the basis of additional data, in this case forensic evidence. In search, one is trying to come up with potential hypotheses by filtering through potentially large amounts of data. In general, search tasks are considerably more difficult than verification tasks. This same distinction arises in a number of areas other than ballistics, most notably biometrics.
For instance, it is a considerably easier task to determine whether two particular fingerprints match each other than it is to find potentially matching fingerprints from a large database.

A central distinction between verification and search is that in a verification task one can be quite conservative, not accepting a match unless there is overwhelming evidence. In law enforcement this is ensured by the courts and expert testimony. In fingerprint-based security systems, this is ensured by requiring a very high-quality match of an individual’s stored fingerprint to the one read by a scanner, even if that requires several attempts by a user to have the print correctly read. In contrast, for a search task, if a system is too conservative it does not generate any useful potential matches, or hypotheses, to consider. Yet if a search system is not conservative enough, it generates too many useless hypotheses or false leads. Neither extreme is useful. Thus, for search-based tasks, such as a ballistic image database, it is very important that the system have both a low false alarm rate (reporting of incorrect matches) and a high true detection rate (reporting of correct matches).

Simultaneously achieving a low false alarm rate and high true detection rate is well known in the statistics and pattern recognition scientific literatures to be challenging, although for many tasks not insurmountable. A given false alarm rate and true detection rate may even produce acceptable performance for a particular database size but still not scale up effectively to larger databases. For instance, if a database grows by a factor of 100, for a given false alarm rate the number of incorrect matches reported will also be expected to grow by a factor of 100. This may simply be too many potential leads to follow up on. Thus, as a rule of thumb, the false alarm rate often must get better (lower) as the database size increases, while at the same time maintaining the true detection rate.

Early ballistic image “databases” consisted of photographs of bullets and shell casings hanging on the wall. These photographs were taken with a camera attached to a forensic microscope. For unsolved cases these photographs served as reminders in the event that an examiner encountered other evidence that could possibly be tied to these cases. Ballistic image database systems, such as NIBIN, can be viewed as a means of automating this manual process of hanging photos on the wall, enabling investigators to potentially tie cases together based on images of a larger number of bullets and shell casings than can be considered by manual inspection. These systems are now routinely used to handle much larger databases of ballistic images than one could hang on a wall, and in several law enforcement jurisdictions have been effective for finding “cold hits” or links between cases that were not otherwise known.
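The scaling rule of thumb in this section (for a fixed false alarm rate, the number of incorrect matches grows in proportion to the database size) can be made concrete with a little arithmetic. The following Python sketch is illustrative only: it assumes that every database entry is compared against the query independently and with the same per-comparison false alarm rate, and the function names and example values are hypothetical rather than taken from NIBIN or any other system.

```python
def expected_false_leads(database_size: int, false_alarm_rate: float) -> float:
    """Expected number of incorrect matches reported for a single query.

    Assumption (not from the report): each of the `database_size` entries
    is compared with the query independently, and each comparison has the
    same probability `false_alarm_rate` of being reported as a match in error.
    """
    return database_size * false_alarm_rate


def max_false_alarm_rate(database_size: int, lead_budget: float) -> float:
    """Largest per-comparison false alarm rate that keeps the expected
    number of false leads per query at or below `lead_budget`."""
    if database_size <= 0:
        raise ValueError("database_size must be positive")
    return lead_budget / database_size
```

Under these assumptions, a hypothetical false alarm rate of 0.001 gives about 10 expected false leads per query against 10,000 images and about 1,000 against 1,000,000 images. Holding the workload at 10 false leads for the larger database would require the rate to fall to roughly 0.00001, which is the sense in which the false alarm rate must improve as the database grows.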
One can thus view NIBIN as an illustration of the potential that an automated image database search has to increase the capacity to tie cases together in comparison with the manual examination of images (or evidence itself). However, as detailed throughout this report, there is a finite limit on the extent to which such a database can be scaled up and still prove useful. This is both an empirical fact for the particular technologies used by the NIBIN system and a question of both theory and experimentation for other imaging technologies and other pattern recognition techniques. In this chapter we briefly review some of the relevant technologies and techniques and offer suggestions for improving the system.

11–B Visual pattern recognition

The goal of visual pattern recognition methods is to find possible matches between images. Pattern recognition methods can be used as part of either a verification or a search task. As discussed above, the former involves validating a particular hypothesis, in this case assessing a potential match between a particular pair of images, and search involves matching a query or probe image against a potentially large set of other images to find potential matches. Pattern matching techniques used for search are generally designed specifically to consider a large number of images efficiently. Search techniques generally also provide a ranking of how well each potential image matches the query (such rankings for collections of text are now widely familiar in web search engines).

There are typically two parts to the pattern recognition process: how to compare a single pair of a probe and a target image and how to structure a search over a large set of targets. Clearly, the search element incorporates the comparison stage as part of its process. The comparison step further typically involves two key elements: what features, signature, or other representations of the image content are to be used in the actual comparison operation and what measure is used to compare these features. Associated with the measure will often be a set of allowable transformations: for example, objects may be allowed to translate, rotate, or scale without penalty, or they may be allowed to deform in certain other ways without penalty. These transformations are often not only geometric in nature, but also include transformations that might result from other sources of variation, such as changes in lighting.

Search-based pattern recognition methods involve a broad range of possible techniques. The most straightforward are sequential searches and rankings that in effect verify a match between the query image and each image in the dataset. However, this kind of approach only works for relatively small datasets.
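The sequential search and ranking just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how IBIS or any vendor system works: each image is assumed to have already been reduced to a fixed-length numeric feature vector, Euclidean distance stands in for the comparison measure, and the names `feature_distance` and `rank_targets` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def feature_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Comparison measure (assumed here): Euclidean distance between two
    feature vectors of equal length. Smaller means more similar."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_targets(
    probe: Sequence[float],
    targets: Dict[str, Sequence[float]],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Sequential search: compare the probe with every target in turn,
    then return (target id, distance) pairs ordered from best to worst."""
    scored = [(tid, feature_distance(probe, feat)) for tid, feat in targets.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored if top_k is None else scored[:top_k]
```

Because every query touches every stored image, the cost of this approach grows linearly with the size of the database, which is why it is practical only for relatively small collections.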
More sophisticated methods include hierarchical search methods, in which one first uses a coarse set of features to roughly rank the targets and then performs a refined comparison only on the top few selections, and hashing function methods, in which a small set of features is used to index into a precomputed arrangement of the targets, focusing on a small set of most likely matches.

Fingerprints are a good example with which to illustrate these trade-offs. There are many choices of possible features. One could use minutiae (i.e., sets of distinctive local points in the pattern of lines, based on sharp changes in curvature). One could use a broader distribution of the overall orientation of the lines in the pattern, or the density of lines in the pattern, such as histograms of orientation. One could use a learned representation of distinctive features (that is, a set of local features that have been learned as distinctive for this particular print by a series of trials against a large database). Or one could use model-driven features, in which an analysis of the process of generation of fingerprints or an analysis of a particular pattern is used to determine which specific features are distinctive (such as using a local feature focus method).

Thus, in matching fingerprints, images of two fingerprints are not compared directly, pixel for pixel; instead, each fingerprint image is preprocessed to extract certain features. These features are then compared. Human fingerprint experts use features such as minutiae. Pattern recognition systems use features or signatures that are derived mathematically or with machine learning techniques. For automated pattern recognition systems, such formally derived features generally work better than do features that are used by human experts. A recent study by the National Institute of Standards and Technology (NIST) on the accuracy of fingerprint recognition systems found that the best pattern recognition methods are able to achieve a 98.6 percent correct detection rate using a single finger and a 99.9 percent correct detection rate using four fingers, with a false alarm rate of 0.01 percent (Wilson et al., 2004). That is, such a system will correctly match two fingerprints from the same person in the large majority of cases while incorrectly reporting a match (when there is none) only 1 in 10,000 times (for details, see, e.g., http://www.sciencedaily.com/releases/2004/07/040716080142.htm [February 2008]).

When considering possible comparison measures, there is again a broad range of options. Again we use fingerprints as an example. One approach is simply to measure the degree of overlap between two patterns: that is, search over all possible alignments of the features or signature of the query against a particular target and count some quantity, such as the number of pixel pairs in the query and target that have the same value or values within some tolerance. Note that inherent in this definition is the notion of allowable transformations between the probe and the target, which may be abstracted out in the feature extraction process, handled as part of the matching process, or a mixture of both.

11–C Best practices for less mature technologies

Current NIBIN technology has been developed using a single vendor approach.
This kind of approach is common when the technological problem to be solved (in this case, automating the search function in firearms identification) seems to be straightforward and the market for the resulting product is limited. However, any vendor must necessarily choose a particular approach based on its best judgment as to what is most feasible and cost effective. The kinds and scope of empirical questions involved in advancing the technologies and improving performance and scalability are difficult for a single vendor to address. The challenge, then, is how to divide the task so that particular pieces of the application can be addressed through a competitive research and development process.

There are two recent examples of government-mandated large-scale system developments based on (initially) nonmature technologies: fingerprint identification and facial recognition. Both systems required the creation of dedicated pattern recognition algorithms, similar to the requirements of the proposed RBID. Instead of relying on a single system produced by a single vendor, both systems were organized as competitions between vendors. In the following sections, we first describe the two competitions and then extract best practice suggestions from those experiences.

11–C.1  Fingerprint Identification

The statutory mandate of NIST under Section 403c of the USA PATRIOT Act requires that NIST examine and certify biometric technologies that may be used, among others, in the U.S. Visitor and Immigrant Status Indication Technology (VISIT), formerly known as the U.S. entry-exit system. The Fingerprint Vendor Technology Evaluation (FpVTE) 2003 was conducted on behalf of the Justice Management Division of the Department of Justice in the fall of 2003, to evaluate the accuracy of commercial fingerprint matching, identification, and verification systems (Wilson et al., 2004; see also http://fpvte.nist.gov [January 15, 2007]).

FpVTE 2003 was designed to assess the capability of fingerprint systems to meet requirements for both large-scale and small-scale real-world applications. FpVTE 2003 consisted of multiple tests performed with combinations of fingers (e.g., single fingers, 2 index fingers, 4 to 10 fingers) and different types and qualities of operational fingerprints (e.g., flat livescan images from visa applicants, multifinger slap livescan images from present-day booking or background check systems, or rolled and flat inked fingerprints from legacy criminal databases).

FpVTE 2003 was among the most comprehensive evaluations of fingerprint matching systems ever executed, particularly in terms of the number and variety of systems and fingerprints: 18 companies participated, with 34 systems tested.
The test used 48,105 sets of flat, slap, or rolled fingerprints from 25,309 individuals, with a total of 393,370 distinct fingerprint images. The tests revealed that, when four fingerprints were used for matching, the most accurate fingerprint system tested always had a true accept rate that was higher than 99.9 percent with a false accept rate of 0.01 percent.

The evaluations were conducted to (1) measure the accuracy of fingerprint matching, identification, and verification systems using operational fingerprint data; (2) identify the most accurate fingerprint matching systems; (3) determine the effect of a wide variety of variables on matcher accuracy; and (4) develop well-vetted sets of operational data from a variety of sources for use in future research. As such, the fingerprint identification system is considered to be a system in continuous evolution. As better algorithms become available, the system can be updated to improve the identification success rate.

The use of a systematic competitive test between vendors ensures that the best possible algorithms are developed and used. In addition, the effects of various external factors on the accuracy of the identifications can be quantitatively addressed. For instance, it was shown unambiguously that the variables that had the largest effect on system accuracy were the number of fingers used and fingerprint quality. A national RBID would require a similar systematic study of the effect of external variables on the accuracy of matching for both cartridge cases and bullets.

11–C.2  Facial Recognition

The U.S. Department of Defense Counterdrug Technology Development Program Office began the Face Recognition Program (FERET) in 1993 and sponsored it through its completion in 1998. Total funding for the program was a little over $6.5 million. The goal of FERET was to develop automatic face recognition capabilities that could be employed to assist security, intelligence, and law enforcement personnel in the performance of their duties.

FERET consisted of three major elements. First, the program sponsored research that advanced facial recognition from theory to working laboratory algorithms. Many of the algorithms that were developed in FERET form the foundation of today’s commercial systems. Second was the collection and distribution of the FERET database, which contains 14,126 facial images of 1,199 individuals. (The FERET database is currently maintained at NIST.) The development portion of the FERET database has been distributed to more than 100 groups outside the original program. The final, and most recognized, part of the FERET program involved the FERET evaluations that compared the abilities of various facial recognition algorithms using the FERET database.

A standard database of face imagery was essential to the success of FERET, both to supply standard imagery to the algorithm developers and to supply a sufficient number of images to allow testing of these algorithms.
Before the start of FERET, there was no way to accurately evaluate or compare facial recognition algorithms (see http://www.frvt.org/FERET/default.htm [February 2008]). FERET set out to establish a large database of facial images that was gathered independently from the algorithm developers. The database made it possible for researchers to develop algorithms on a common database and to report results in the literature using this database.

The results reported in the standard literature did not provide a direct comparison among algorithms because each researcher reported results using different assumptions, scoring methods, and images. The independently administered FERET evaluations, using well-defined and published evaluation methodologies (Phillips et al., 2000), allowed for a direct quantitative assessment of the relative strengths and weaknesses of different approaches. One of the most important aspects of the use of this database was that the variability of the data could be controlled (e.g., images of a person taken on the same day under different lighting conditions, images taken on different days or a year apart, and so on). It is only after the intrinsic variability of the data is explicitly taken into account that a facial recognition system can function reliably. The FERET database has been used in two face-recognition vendor tests (FRVT), one in 2000 and one in 2002, and a face-recognition grand challenge in 2004–2006. The grand challenge was motivated by advances in computer vision techniques, computer design, and sensor design that held the promise of reducing the error rate of the present systems by an order of magnitude (see http://www.frvt.org/FRGC/ [February 2008]).

The use of a standardized evaluation method also allows for a comparison of different systems, in this case of the accuracy of the fingerprint systems and the facial recognition systems. It was concluded that leading contemporary fingerprint systems are substantially more accurate than the face-recognition systems tested in 2002 (Wilson et al., 2004).

11–D Best Practices

Both of the systems discussed in the preceding sections share several commonalities with the proposed national RBID:

• Fingerprint, facial recognition, and ballistic imaging all use images as input.
• There is considerable variability between the images in each of these areas.
• All three systems have potentially large databases that must be searched with high accuracy and within a reasonable search time.

One important distinction among the three systems is that fingerprint and facial recognition attempt to directly connect an image with a person but the proposed national RBID connects an image to a weapon, not to a person.
A second important distinction emanates from the stochastic nature of ballistics: that is, noise and variation in fingerprints come from acquisition, whereas in ballistics there is the additional factor that the process generating the physical characteristics to be acquired changes each time.

Just as automated fingerprint and facial recognition systems were considered to be nonmature technologies in the 1990s, automated ballistic imaging can today be considered a nonmature technology. The use of large-scale evaluations of the fingerprint and face recognition technologies through controlled competitive vendor tests has advanced those technologies tremendously. The committee believes that it is likely that a similar competitive research program for ballistic imaging, involving university, federal and state agencies, and industrial researchers, would lead to significant improvements in image matching algorithms. The research could be partitioned into separable components that have applicability across a wide range of research applications. For example, image acquisition could be investigated separately from search and pattern recognition. In addition, the competitive vendor tests approach could be used to test the safety, durability, and cost-effectiveness of engraving identifiers on firearms parts and/or bullets and cartridge cases, such as the microstamping approaches discussed in Chapter 10.

Given the cost of the current system, the need for improved performance of the system documented in this report and elsewhere, the costs to society of crimes committed with firearms, and the clear interest of state legislatures and Congress in making improvements in firearms identification, the committee believes that such an investment in research to support the development of technologies to assist in firearms identification is critically important.