5
Forensic DNA Databanks and Privacy of Information

DNA typing in the criminal-justice system has so far been used primarily for direct comparison of DNA profiles of evidence samples with profiles of samples from known suspects. However, that application constitutes only the tip of the iceberg of potential law-enforcement applications. If DNA profiles of samples from a population were stored in computer databanks (databases), DNA typing could be applied in crimes without suspects. Investigators could compare DNA profiles of biological evidence samples with a databank to search for suspects.

In many respects, the situation is analogous to that of latent fingerprints. Originally, latent fingerprints were used for comparing crime-scene evidence with known suspects. With the development of the Automated Fingerprint Identification Systems (AFIS) in the last decade, the investigative use of fingerprints has dramatically expanded. Forensic scientists can enter an unidentified latent-fingerprint pattern into the system and within minutes compare it with millions of people's patterns contained in a computer file. In its short history, automated fingerprint analysis has been credited with solving tens of thousands of crimes.1

This chapter examines whether similar databanks of DNA profiles should be created and, if so, how and when.

COMPARISON OF DNA PROFILES AND LATENT FINGERPRINTS

To identify key issues pertinent to the establishment of DNA databanks, it is instructive to compare DNA profiles and latent fingerprints.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




Latent fingerprints are found at crime scenes much more commonly than are body fluids that contain DNA. Latent-fingerprint analysis can be useful in a wide range of crimes, including many murders, rapes, assaults, robberies, and burglaries. However, the probative value of latent fingerprints is often limited to establishing that a suspect was present at a location, and that does not automatically imply guilt. DNA analysis will be useful in more limited settings, primarily in rapes (because semen is often recovered) and murders (those in which either the perpetrator's blood was spilled at the crime scene or the victim's blood stained the perpetrator's personal effects; only the former will assist in identifying an unknown suspect). Where it exists, DNA evidence will often be more probative than fingerprints, in that the presence of body fluids is harder to attribute to innocuous causes. That is especially true in rape cases, in which positive identification of semen in the vagina is virtual proof of intercourse (although it leaves open the issue of whether it was consensual). Consequently, the potential utility of a DNA profile databank must be evaluated in terms of the particular crimes to which it is primarily suited.

Fingerprints have a defined physical pattern independent of the method of visualization, whereas DNA profiles are derived patterns that can be constructed with various protocols (e.g., different restriction enzymes to cut the DNA and different probes to examine different loci) that produce completely different patterns that cannot be readily interconverted. The advance of DNA technology will see the development of new protocols that offer technical advantages but produce different and incompatible patterns. In a sense, current DNA profiles can be thought of as extremely small bits of a person's fingerprints on all or some of the fingers. Different methods look at different fingers or different locations on a finger.
Only when DNA technology is capable of sequencing the entire three billion basepairs of a person's genome could a DNA pattern be considered to be as constant and complete as a fingerprint pattern. Consequently, the development of DNA databanks is tied to the standardization of methods. A national DNA profile databank can function only if participating laboratories agree on standardized methods. However, the creation of a databank with current methods could discourage the conversion to newer, cheaper, and more powerful methods.

The amount of information provided by latent fingerprints in an evidence sample is essentially fixed (it depends primarily on the portion of the finger(s) or palm found), and the forensic scientist uses all of it. DNA typing of an evidence sample yields information in an amount determined by the number of loci studied, so the forensic scientist has substantial control over the amount of information to be obtained from a sample. Consequently, the creation of a DNA profile databank would require decisions about the extent of the DNA profile to be recorded.

Fingerprints are more highly individualized than DNA profiles based on the RFLP technology being used in forensic laboratories. Consequently, a match between an evidence sample and an entry in a DNA profile databank should not automatically lead to the assumption of identity, but should be confirmed by the examination of additional loci that are not in the databank.

Obtaining an inked fingerprint from a person is much less intrusive, costly, and difficult than drawing a blood sample for DNA typing. Collection of fingerprints from known persons is inexpensive and relatively easily accomplished by someone with minimal technical background and training. In contrast, development of a DNA profile from a blood sample is time-consuming and expensive and requires extensive education, training, and quality-assurance measures. Consequently, the number of people who can be included in a DNA profile databank might be limited by economic considerations. Categories of persons to include must be selected with due consideration of costs and benefits.

The computer technology required for an automated fingerprint identification system is sophisticated and complex. Fingerprints are complicated geometric patterns, and the computer must store, recognize, and search for complex and variable patterns of ridges and minutiae in the millions of prints on file. Several commercially available but expensive computer systems are in use around the world. In contrast, the computer technology required for DNA databanks is relatively simple. Because DNA profiles can be reduced to a list of genetic types (i.e., a list of numbers), DNA profile repositories can use relatively simple and inexpensive software and hardware. Consequently, computer requirements should not pose a serious problem in the development of DNA profile databanks.
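
The point that a DNA profile reduces to a list of numbers can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and does not come from the chapter: the locus names (LOCUS_A, LOCUS_B), the fragment sizes, the 2.5% matching window, and the function names. It is not a description of any laboratory's actual matching rule.

```python
from typing import Dict, List, Tuple

# A profile maps a locus name to the two fragment sizes (in basepairs)
# measured at that locus. Locus names and sizes below are invented.
Profile = Dict[str, Tuple[int, int]]


def sizes_match(a: int, b: int, tolerance: float) -> bool:
    """True if two fragment sizes agree within a relative tolerance."""
    return abs(a - b) <= tolerance * max(a, b)


def profiles_match(evidence: Profile, entry: Profile, tolerance: float = 0.025) -> bool:
    """Compare the loci both profiles share; every shared locus must agree."""
    shared = evidence.keys() & entry.keys()
    if not shared:
        return False
    for locus in shared:
        # Sort so the smaller fragment is compared with the smaller fragment.
        ev = sorted(evidence[locus])
        en = sorted(entry[locus])
        if not all(sizes_match(x, y, tolerance) for x, y in zip(ev, en)):
            return False
    return True


def search(evidence: Profile, databank: Dict[str, Profile], tolerance: float = 0.025) -> List[str]:
    """Return identifiers of databank entries consistent with the evidence."""
    return sorted(k for k, p in databank.items() if profiles_match(evidence, p, tolerance))


databank = {
    "entry-001": {"LOCUS_A": (1520, 3310), "LOCUS_B": (2240, 2240)},
    "entry-002": {"LOCUS_A": (1510, 4100), "LOCUS_B": (980, 2250)},
}
evidence = {"LOCUS_A": (3290, 1530), "LOCUS_B": (2250, 2230)}
print(search(evidence, databank))  # ['entry-001']
```

A hit from a search of this kind is only a lead. As noted above, a databank match should be confirmed by examining additional loci that are not in the databank.
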

Fingerprints provide no information about a person other than identity. DNA typing can, in principle, also provide personal information (concerning medical characteristics, physical traits, and relatedness) that carries with it risks of discrimination. Consequently, DNA typing raises considerably greater issues of privacy than does ordinary fingerprinting.

In short, ordinary fingerprints and DNA profiles differ substantially in ways that bear on the creation and design of a national DNA profile databank.

CONFIDENTIALITY AND SECURITY

Confidentiality and security of DNA-related information are especially important and difficult issues, because we are in the midst of two extraordinary technological revolutions that show no signs of abating: in molecular biology, which is yielding an explosion of information about human genetics, and in computer technology, which is moving toward national and international networks connecting growing information resources.

Molecular geneticists are rapidly developing the ability to diagnose a wide variety of inherited traits and medical conditions. The list already includes simply inherited traits, such as cystic fibrosis, Huntington's disease, and some inherited cancers. In the future, the list might grow to include more common medical conditions, such as heart disease, diabetes, hypertension, and Alzheimer's disease. Some observers even suggest that the list could include such traits as predispositions to alcoholism, learning disabilities, and other behavioral traits (although the degree of genetic influence on these traits remains uncertain). Obviously, such information could lead to discrimination by insurance companies, employers, or others against people with particular traits.

In general, the committee feels that DNA profile databanks should avoid the use of loci associated with traits or diseases. That avoidance is the best guarantee against misuse of such information. Current forensic RFLP typing markers are not known to be associated with particular traits or medical conditions, but they might be in the future. Current PCR typing uses the HLA DQ locus, which is in a gene that controls many important immunological functions and is associated with diseases.

Even simple information about identity requires confidentiality. Just as fingerprint files can be misused, DNA profile identification information could be misused to search and correlate criminal-record databanks or medical-record databanks. Computer storage of information increases the possibilities for misuse.
For example, addresses, telephone numbers, social security numbers, credit ratings, range of incomes, demographic categories, and information on hobbies are currently available for many of the citizens in our society from various distributed computerized data sources. Such data can be obtained directly through access to specific sources, such as credit-rating services, or through statistical disclosure.2 "Statistical disclosure" refers to the ability of a user to derive an estimate of a desired statistic or feature from a databank or a collection of databanks. Disclosure can be achieved through one query or a series of queries to one or more databanks. With DNA information, queries might be directed at attaining numerical estimates of values or at deducing the state of an attribute of a person through a series of Boolean (yes-no) queries to multiple distributed databanks.

Several private laboratories already offer a DNA-banking service (sample storage in freezers) to physicians, genetic counselors, and, in some cases, anyone who pays for the service. Typically, such information as name, address, birth date, diagnosis, family history, physician's name and address, and genetic counselor's name and address is stored with the samples. That information is useful for local, independent bookkeeping and record management. But it is also ripe for statistical or correlative disclosure. Just the existence of a sample from a person in a databank might be prejudicial to the person, independently of any DNA-related information. In some laboratories, the donor cannot legally prevent outsiders' access to a sample, but can request its withdrawal. A request for withdrawal might take a month or more to process. In most cases, only physicians with signed permission of the donor have access to samples, but typically no safeguards are taken to verify individual requests independently. That is not to say that the laboratories intend to violate donors' rights; they are simply offering a service for which there is a recognized market and attempting to provide services as well as they can.

Much has been written on statistical databank systems and associated security issues.3 Guidelines for release of DNA samples and disclosure of DNA typing information must be designed to safeguard the rights of persons who, for one reason or another, become involved in DNA typing (see Chapter 7 for further discussion) without burdening law-enforcement agencies and civil investigative authorities with unnecessarily protective policies.

The need for safeguards of DNA information has not been completely lost on lawmakers considering databank legislation. Some state legislation has addressed the issue. For example, the Virginia law4 establishing a DNA profile databank for convicted offenders states that any person who, without authority, disseminates information contained in the databank shall be guilty of a Class 3 misdemeanor. Any person who disseminates, receives or otherwise uses or attempts to so use information in the databank, knowing that such use is for a purpose other than as authorized by law, shall be guilty of a Class 1 misdemeanor.
Except as authorized by law, any person who, for purposes of having DNA analysis performed, obtains or attempts to obtain any sample submitted to the Division of Forensic Science for analysis shall be guilty of a Class 5 felony.

That passage reflects recognition of the potential for abuse of information derived from a sample (and of the sample itself) and incorporates sanctions to preclude it. In the first legal test5 of the establishment of such databanks on convicted felons, Chief U.S. District Judge James C. Turk upheld the Virginia databank statute, offering the following opinion in regard to the issue of privacy:

The stored information is available only to law enforcement personnel in furtherance of an official investigation of a criminal offense. Va. Code Ann. Section 19.2-310.6 (1990). In addition, the identifying information is disseminated to the law enforcement officer only if the sample provided by the officer matches a sample in the databank. Id. The procedures followed are sufficiently stringent such that no person, including a law-enforcement official, may conduct random searches in the databank.

Although that is a good start, state laws should state explicitly the types of uses that can be authorized. In particular, in addition to the points made in the opinion just quoted, investigation of DNA samples or stored information for the purpose of obtaining medical information or discerning other traits should be prohibited, and violations should be punishable by law. Several states incorporate some of those specific protections into their statutes establishing DNA profile databanks. However, the committee urges all states to be systematic in defining authorized uses of information in DNA databanks.

METHODOLOGICAL STANDARDIZATION

Because of the incompatibility between DNA typing methods, federal, state, and local laboratories that wish to use a national DNA profile databank must all adopt a single standardized method for analyzing samples, both databank specimens and evidence specimens. Accordingly, the development of a national DNA databank has the potential advantage of acting as a driving force for standardization in forensic DNA typing, but the potential disadvantage of ossifying a rapidly moving technology.

It is broadly agreed that current RFLP typing methods constitute simply an initial approach that will be replaced in the next few years by procedures that are much easier to automate, much less expensive, and more informative. Premature development of a national databank based on current RFLP typing methods runs the risk of perpetuating a "dinosaur" technology in the face of better techniques.

The committee believes that it is too early to launch a comprehensive national DNA profile databank. However, it is appropriate to carry out pilot programs based on RFLP technology with the FBI and states that have active DNA typing efforts.
The initial efforts should help to define the problems and issues that will be encountered in the fashioning of a comprehensive program. Such projects should be explicitly viewed as preliminary, with the clear expectation that the databank will be supplanted in the next several years by better methods.

Before even pilot projects can be begun, the degree of interlaboratory reproducibility, which is essential to the success of a databank, should be thoroughly documented. So far, there have been only a few interlaboratory-reproducibility studies to compare the ability of different laboratories to measure the same DNAs accurately under different circumstances. The National Institute of Standards and Technology (NIST), in concert with the Federal Bureau of Investigation (FBI) and the Technical Working Group on DNA Analysis Methods (TWGDAM), sent samples to 22 laboratories in October 1990 (Dennis Reeder, personal communication, 1991); 12 laboratories have reported so far. The greatest differences were reported to be slightly less than 5%. The preliminary results are encouraging, but need to be followed by more extensive reproducibility testing before the efficacy of a national network based on this method can be demonstrated. Moreover, the committee urges that laboratories participating in any national databank be required to participate in continuing proficiency and reproducibility studies (by carrying out blind measurements of samples sent from a common source), to ensure that reproducibility does not drift over time.

COST VERSUS BENEFIT

An analysis of the costs and benefits of establishing DNA databanks is problematic at best. Costs will depend on a number of variables, such as methods, numbers of loci used, and types and numbers of samples to be tested. Benefits will depend on the populations included in the databank and the likelihood of finding matches. Moreover, costs and benefits must be reckoned in both monetary and nonmonetary terms. Nonmonetary costs can include the risk of loss of privacy and the misuse and abuse of genetic information. Nonmonetary benefits can include prevention of future crimes. Those diverse elements cannot be weighed except in the context of societal values.

Concerning monetary costs, it is helpful to recall the comparison between latent fingerprints and DNA profiles. Collection of fingerprints from identified persons is inexpensive and relatively easily accomplished by persons with minimal technical training and background. Samples cost perhaps a few dollars; the cost reflects the personnel time involved in taking and filing the fingerprints. Although sample collection is simple, fingerprint databanks require sophisticated and expensive computer hardware and software.
A typical state automated fingerprint identification system can cost $10 million. In contrast, DNA typing is time-consuming, is expensive, and requires extensive education, training, and quality-assurance measures. With current RFLP methods, blood must be obtained by venipuncture at an estimated cost of $20/sample. Storage methods and costs depend on the number of samples and the form in which they are preserved (liquid or dried blood, extracted DNA pellet, buffy coat, etc.). In any case, freezers, cryotubes, and labor can cost another $20/sample for storage. The cost of RFLP analysis can be estimated from fees charged by private laboratories: about $100-150/sample.6 Thus, a single DNA profile can cost about $120-170, and constructing 10,000 DNA profiles could cost $1.2-1.7 million. However, DNA typing databanks do not require highly sophisticated or expensive computer hardware and software.
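
The per-sample arithmetic above can be written out as a short sketch. The dollar figures are the ones quoted in the text; the function names are hypothetical. One reading, offered here as an assumption rather than something the text states, is that the quoted $120-170 total reflects collection plus analysis and leaves out the separate $20 storage estimate; adding storage would give $140-190 per profile.

```python
# Per-sample figures quoted in the text (US dollars).
COLLECTION = 20          # venipuncture
STORAGE = 20             # freezers, cryotubes, and labor
ANALYSIS = (100, 150)    # RFLP analysis, low and high estimates


def per_profile_cost(include_storage: bool) -> tuple:
    """Low and high cost of one DNA profile, with or without sample storage."""
    extra = STORAGE if include_storage else 0
    return tuple(COLLECTION + extra + a for a in ANALYSIS)


def total_cost(n_profiles: int, include_storage: bool) -> tuple:
    """Low and high cost of constructing n_profiles profiles."""
    return tuple(n_profiles * c for c in per_profile_cost(include_storage))


print(per_profile_cost(False))    # (120, 170), the range quoted in the text
print(total_cost(10_000, False))  # (1200000, 1700000), i.e., $1.2-1.7 million
print(per_profile_cost(True))     # (140, 190) if storage is counted as well
```
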

In short, ordinary fingerprints and DNA profiles have opposite economic characteristics. Ordinary fingerprint databanks have low variable costs and high fixed costs, and DNA typing databanks have high variable costs and comparatively low fixed costs. Those considerations imply that different decisions could be appropriate as to whether, when, and how to develop each kind of databank. For example, because of the high variable cost per sample, considerable thought must be given to whose DNA profiles should be stored. To maximize the "return per sample," one should concentrate on persons convicted of crimes with documented high rates of recidivism, such as rape, as discussed below.

Cost analysis is made more difficult by the rapidity of change in DNA typing technology. For example, PCR-based methods might greatly reduce DNA typing costs: blood samples might be replaced with simple buccal swabs (i.e., cheek scraping); Southern blots might be replaced with non-gel-based formats; complicated scoring of the problematic continuous allele system used in RFLP analysis might be replaced with discrete mechanical allele scoring. Accordingly, today's cost assessments must be viewed as tentative.

WHOSE SAMPLES SHOULD BE INCLUDED?

In deciding whom to include in a DNA profile databank, it is necessary to consider the likely forensic utility of the data and the protection of individual privacy. It is helpful to consider six categories of people.

Samples from Convicted Offenders

DNA profile databanks containing profiles of criminal offenders must be justified on the basis of the likelihood of recidivism. The Bureau of Justice Statistics7 found that, of the 108,580 persons released from prisons in 11 states in 1983, an estimated 63% were rearrested for a felony or serious misdemeanor within 3 years, 47% were reconvicted, and 41% returned to prison or jail (Table 5-1).
They were charged with a total of 326,746 new offenses in the 3-year period; more than 50,000 charges were related to violent offenses, including approximately 2,000 homicides, 1,500 kidnappings, 1,300 rapes, 2,600 other sexual assaults, 17,000 robberies, and 22,600 other assaults. Of the prisoners who had been incarcerated for violent offenses, 60% were rearrested within 3 years for similar offenses. Recidivism rates were highest in the first year. Four of every 10 released prisoners were rearrested in the first year; nearly one-fourth were convicted of new crimes; and nearly one-fifth were returned to prison or sent to jail. Most rearrests occurred in the states in which the prisoners were released, although about 15% occurred in other states.

TABLE 5-1 Recidivism Rates of Prisoners Released in 11 States in 1983, by Most Serious Offense (a)

                          Fraction of     Fraction of Prisoners, %, Who Within 3 Years Were:
Offense                   Prisoners, %    Rearrested    Reconvicted    Reincarcerated
All offenses              100.0           62.5          46.8           41.4
Violent offenses          34.6            59.6          41.9           36.5
  Murder                  3.1             42.1          25.2           20.8
  Negligent manslaughter  1.4             42.5          27.9           21.8
  Kidnapping              0.6             54.5          35.7           31.3
  Rape                    2.1             51.5          36.4           32.3
  Other sexual assault    2.1             47.9          32.6           24.4
  Robbery                 18.7            66.0          48.3           43.2
  Assault                 6.4             60.2          40.4           33.7
  Other                   0.4             50.1          33.2           31.4
Property offenses         48.3            68.1          53.0           47.7
  Burglary                25.8            69.6          54.6           49.4
  Larceny and theft       11.2            67.3          52.2           46.3
  Motor vehicle theft     2.6             78.4          59.1           51.8
  Arson                   0.7             55.3          38.5           32.3
  Fraud                   5.5             60.9          47.1           43.3
  Stolen property         1.7             67.9          54.9           50.5
  Other                   0.8             54.1          37.3           33.9
Drug offenses             9.5             50.4          35.3           30.3
  Possession              1.2             62.8          40.2           36.7
  Trafficking             4.5             51.5          34.5           29.4
  Other and unspecified   3.9             45.3          34.5           29.1
Public-order offenses     6.4             54.6          41.5           34.7
  Weapons                 2.2             63.5          46.7           38.1
  Other                   4.2             49.9          38.9           33.0
Other offenses            1.1             76.8          62.9           59.2

(a) Data from Beck and Shipley.7

Of course, high recidivism rates alone do not demonstrate the utility of a databank of DNA profiles of convicted offenders. One must also ask: What fraction of crimes committed by repeat offenders do not themselves lead to rearrest and reconviction? What fraction would end in rearrest and reconviction if a DNA profile databank were available?

The first question obviously is impossible to answer explicitly. However, the FBI's Uniform Crime Reports states that there are about 20,000 murders and 100,000 forcible-rape cases per year. It is estimated that 30% of all murder cases and 70% of all rape cases are never closed by arrest (John Hicks, personal communication, 1990). It should also be pointed out that only an estimated 50% of rapes are in fact even reported.

The second question is also difficult to answer, but it is clear that crimes of most types will not afford the opportunity to recover relevant biological evidence that will allow the police to identify an unknown suspect (i.e., the perpetrator's own body fluids). They include larcenies, burglaries, and assaults, for which ordinary fingerprints are frequently found. The major exception is rape, for which semen samples can be recovered in many cases and might provide prima facie evidence of sexual intercourse. In a small minority of homicides, blood, hair, or tissue samples from the perpetrator are left at the scene of the crime (e.g., because of a fight at the scene).

A DNA profile databank would thus be valuable primarily in investigating forcible rape, although the databank would be useful for some other investigations. State legislatures considering setting up such databanks should weigh the benefits in terms of solved rape cases and the costs in terms of collecting samples from persons likely to commit rapes (primarily, it seems, convicted sex offenders).

Initial state efforts to develop DNA profile databanks were indeed aimed at sex offenders. Interestingly, some states rapidly expanded their programs to include all convicted offenders, without explicit weighing of the potential benefits of possessing such persons' patterns for solving crimes and the potential costs.

The above discussion justifies the development of a databank of DNA profiles of unknown subjects (open cases) and of offenders convicted of violent sex crimes.
Such a databank would provide law enforcement with a powerful tool in linking sexual-assault cases through DNA profiles and tracking the activities of serial rapists. In light of recidivism and the continuing increase in reported rapes in this country, a databank of convicted sex offenders would provide investigators with a logical first place to look for assistance in solving unknown-offender sexual-assault cases.

Samples from Suspects

DNA typing profiles of suspects might also be useful in associating a person with open or unsolved cases pending in other jurisdictions or states. Although a suspect's DNA profile might ultimately be entered into a convicted-felon databank, there would no doubt be a substantial period during which a suspect might engage in other criminal activities. Thus, in the case of a serial rapist, a person under suspicion and investigation for one offense might be responsible for several later offenses for which he is not suspected. Therefore, if a DNA profile of a suspect is entered into a databank, it would be available to be searched against future unsolved cases.

Samples from Victims

To protect their privacy, victims' DNA profiles should never be entered into a national databank or searched against such a databank, with the possible exception of cases of abduction, in which it might be desirable for the victim's information to be stored and accessible to law-enforcement officials. In any exceptional case, prior permission of the victim, the victim's legal guardian, or a court should be required, and the victim's DNA should be removed from the databank when it can no longer serve the purpose for which it was entered.

Samples from Missing Persons and Unidentified Bodies

This portion of the databank would contain DNA profiles from unidentified bodies, body parts, and bone fragments. These would provide the greatest benefit when DNA profiles from immediate relatives (parents) could be used to reconstruct the DNA profile of a missing person for comparison. Although there would be immediate benefits from the development of these types of data, the actual number of relevant cases would be small, compared with the number of sexual assaults by unknown persons.

Crime-Scene Samples from Unidentified Persons

DNA profile evidence found at the scene of a crime should be stored and accessible to legally authorized investigators. Such samples might be useful for recognizing serial or multiple crimes even before a perpetrator is found and will be equally useful once a perpetrator has been identified. It might be useful to have additional cross-referenced information accessible at the national level, including modus operandi or other attributes for correlation as part of an investigation.

Samples from Members of the General Population

Some observers have suggested that a DNA profile databank should not be limited to criminals, but should aim, at least in the long term, to store DNA profiles from the entire general public. It is argued that many groups in the general public are already required to be fingerprinted for various security and identification purposes and the same justification could be applied to DNA profiles; furthermore, if the databanks contained everyone, rather than just previous offenders, the chance of identifying perpetrators would be much greater.

The committee does not find those arguments persuasive. For identification and security purposes, DNA profiles would add nothing to ordinary fingerprints, because ordinary fingerprints already provide a complete identifier and are far more likely to be recovered in connection with security breaches than are blood samples that are amenable to DNA analysis. As for identifying perpetrators, there is no doubt that the system would have some effect. However, Americans have generally been reluctant to allow the creation of national identification systems, and DNA profiling poses a special risk of invasion of privacy (concerning personal and medical traits). We caution against moving in that direction. Finally, we note that current technology is far too expensive to contemplate the creation of such a large databank.

Samples from Anonymous Persons for Population Genetics

The committee notes that statistical databanks of random population samples are required for estimating allele frequencies, as described in Chapter 3. To protect the privacy of persons whose only role is to make up a statistical sample, their identities should never be retained in a databank, and the databanks should never be searched for matches in connection with investigations.

SAMPLE STORAGE

Another difficult issue is the storage and maintenance of DNA samples themselves (or any reusable products of the typing process), as opposed to DNA profiles. In principle, retention of DNA samples creates an opportunity for misuse, i.e., for later testing to determine personal information. In general, the committee discourages the retention of DNA samples.

However, there is a practical reason to retain DNA samples for short periods. Because DNA technology is changing so rapidly, we expect the profiles produced with today's methods to be incompatible with tomorrow's methods. Accordingly, today's profiles will need to be discarded and replaced with profiles based on the successor methods. It would be extremely expensive and inefficient to have to redraw blood samples for retyping.
We are therefore persuaded that retention of samples after typing should be permitted for the short term, only during the startup phase of DNA profile databanks. As databanks become established and technology stabilizes somewhat, samples should be destroyed promptly after typing.

INFORMATION TO BE INCLUDED AND MAINTAINED IN A DATABANK

It is worth commenting on the nature of the information that should be stored in a DNA profile databank.

Submitting-agency information should include the location of the agency, its telephone number, names of the analysts who conducted the DNA typing, the name of the person who entered the data into the databank, and agency contact information.

Sample information should include entries that describe the type of sample (body-fluid stain, tissue, or known blood sample) and a unique sample identifier, the condition of the sample, unusual handling and storage, and other factors that might affect the quality of the DNA and the evaluation of partial patterns.

The DNA type at a locus must be entered in standard nomenclature. For example, for RFLP typing, fragment-size data from each locus successfully probed should be entered as the number of base pairs determined for each fragment. Sizing data for the human-DNA control should also be entered.

Entries into the convicted-offender files should include the name of the offender, dates of offenses and convictions, and DNA profile data.

Only the profile index should be centrally stored. Case data should be stored locally, and their distribution should be under the control of the local agency.

RULES ON ACCESSIBILITY

Computer security should be ensured through use of the best available practices and technologies. Access to the databank should be limited to a small number of legally authorized persons and should be limited to what is required for specific official investigations. All instances of access should be audited and archived. An excellent discussion of computerized audit-trail systems is available.8

If the computer system and associated databank are to be made available for remote access by cooperating state and federal agencies, such as by telephone or networked by other means, the access mechanism (i.e., the network switch) should be made available only for specific, authorized remote-access sessions; that is, the system should not be continuously available to remote users.
This type of limited access can be achieved either administratively or physically; it is a simple and inexpensive means of safeguarding sensitive information and is common practice in many national security situations. For example, secure computers are virtually never connected to unsecured computers at national defense laboratories; when newspaper headlines report that computers at these facilities have been breached, the breached machines have been unsecured computers that were not connected to the secure ones. In many cases, these unsecured computers have telecommunication connections available to employees for routine use, but they do not contain security information.
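The storage and access rules above can be illustrated with a small sketch. It is only one possible arrangement, not a design taken from the report: the class names, field names, and the single on/off remote-session flag are all assumptions made for illustration. The central index holds only profile data and a pointer to the submitting agency, every access attempt is written to an audit log, and remote lookups succeed only while an authorized session has been explicitly opened.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class ProfileIndexEntry:
    """Hypothetical central index entry; case data stay with the local agency."""
    sample_id: str                                  # unique sample identifier
    agency_id: str                                  # agency that holds the case file
    fragment_sizes_bp: Dict[str, Tuple[int, ...]]   # locus -> fragment sizes (base pairs)


@dataclass
class AuditedDatabank:
    """Hypothetical databank wrapper: authorized users only, every access logged."""
    authorized_users: Set[str]
    entries: Dict[str, ProfileIndexEntry] = field(default_factory=dict)
    audit_log: List[Tuple[str, str, str, bool]] = field(default_factory=list)
    remote_session_open: bool = False  # remote access is off by default

    def _audit(self, user: str, action: str, granted: bool) -> None:
        # Record who attempted what, when, and whether it was allowed.
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((stamp, user, action, granted))

    def _require(self, user: str, action: str, allowed: bool) -> None:
        # Log first, so that denied attempts are archived as well.
        self._audit(user, action, allowed)
        if not allowed:
            raise PermissionError(f"{action} denied for {user}")

    def add_entry(self, entry: ProfileIndexEntry) -> None:
        self.entries[entry.sample_id] = entry

    def open_remote_session(self, user: str) -> None:
        self._require(user, "open_remote_session", user in self.authorized_users)
        self.remote_session_open = True

    def close_remote_session(self, user: str) -> None:
        self._audit(user, "close_remote_session", True)
        self.remote_session_open = False

    def lookup(self, user: str, sample_id: str,
               remote: bool = False) -> Optional[ProfileIndexEntry]:
        # Remote lookups need an open session in addition to authorization.
        allowed = user in self.authorized_users and (
            self.remote_session_open or not remote
        )
        self._require(user, f"lookup:{sample_id}", allowed)
        return self.entries.get(sample_id)


if __name__ == "__main__":
    bank = AuditedDatabank(authorized_users={"analyst_1"})
    bank.add_entry(ProfileIndexEntry("S-001", "LAB-A", {"LOCUS_A": (1520, 2310)}))
    print(bank.lookup("analyst_1", "S-001"))
    print(len(bank.audit_log), "audited access attempt(s)")
```

In this sketch a denied attempt is logged before the error is raised, so the audit trail covers refused requests as well as successful ones. A real system would also need authentication, tamper-resistant log storage, and per-session rather than global remote access, none of which is modeled here.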

STATISTICAL INTERPRETATION OF DATABANK MATCHES

The distinction between finding a match between an evidence sample and a suspect sample and finding a match between an evidence sample and one of many entries in a DNA profile databank is important. The chance of finding a match in the second case is considerably higher, because one does not start with a single hypothesis to test (i.e., that the evidence was left by a particular suspect), but instead fishes through the databank, trying out many hypotheses. If a pattern has a frequency of 1 in 10,000, there would still be a considerable probability (about 10%, since 1 − (1 − 1/10,000)^1,000 ≈ 0.095) of seeing it by chance in a databank of 1,000 people. Although there are statistical methods for correcting for such multiple testing, the committee considers that approach unwise, because it requires that the population frequency estimates of genotypes be accurate to a degree that is unlikely to be achieved (because sample sizes are limited). There is a far better solution: When a match is obtained between an evidence sample and a databank entry, the match should be confirmed by testing with additional loci. The initial match should be used as probable cause to obtain a blood sample from the suspect, but only the statistical frequency associated with the additional loci should be presented at trial (to prevent the selection bias that is inherent in searching a databank). Forensic DNA typing laboratories should recognize that they will require additional loci beyond those used in the databank to prove a case against a suspect. Preparations should begin now to have additional loci characterized and available for general use before any DNA profile databank comes into common use.

STATUS OF DATABANK DEVELOPMENT

There have already been state and federal efforts toward the creation of DNA profile databanks. We review them briefly here.
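As an aside on the databank-search example given under Statistical Interpretation of Databank Matches, the "about 10%" figure can be checked with a short calculation. The sketch below assumes that databank entries are unrelated and that each matches independently with the stated pattern frequency; the function name is hypothetical and the calculation is only an illustration of the multiple-testing effect, not a recommended way of reporting evidence.

```python
def chance_match_probability(pattern_frequency: float, databank_size: int) -> float:
    """Probability that at least one databank entry matches purely by chance.

    Assumes independent, unrelated entries (an idealization).
    """
    no_match_anywhere = (1.0 - pattern_frequency) ** databank_size
    return 1.0 - no_match_anywhere


if __name__ == "__main__":
    # Pattern frequency of 1 in 10,000 searched against 1,000 entries.
    p = chance_match_probability(1 / 10_000, 1_000)
    print(f"{p:.3f}")  # prints 0.095
```

The same function shows how quickly the effect grows with databank size, which is one way to see why the committee prefers confirmation at additional loci over statistical correction.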
State Level

According to a recent FBI survey,9 27% of 177 forensic science laboratories responding indicated that they have legislative authority or a mandate to construct a databank for their own jurisdictions to match DNA profiles. An additional 38% believed that such authority or mandate was likely by 1991. According to an Office of Technology Assessment survey conducted in late 1989,10 at least 17 states had passed or were considering legislation creating statewide DNA databanks. The persons to be included in the databanks range from sex offenders to all felons. Since the time of that survey, the number has no doubt increased. Therefore, it is obvious that many state legislatures recognize the potential benefit of a DNA databank as an important investigative tool and that such databanks will become a reality. Many states are already collecting samples in earnest, although at this writing no databanks are operative.

Federal Level

The FBI and TWGDAM have proposed the creation of a national DNA profile databank system, including one statistical and three investigative databanks. The statistical databank would include DNA profiles of randomly selected unrelated persons and would be built collaboratively and maintained by the FBI for use by all forensic laboratories. The investigative databanks would contain DNA profiles of body fluids from the scenes of crimes for which suspects have been identified, convicted offenders, and bodies, body parts, and bone fragments of unidentified persons.

In the proposed national DNA profile databank system, individual law-enforcement agencies (forensic laboratories) would contribute DNA profiles (without personal information) to a centralized databank, but retain absolute control of their own case records. The national databanks would reference the sources of the profiles, but case data would be secured and controlled by the state and local agencies. In the national program, the FBI would play the lead role.
It would coordinate quality assurance with a technical advisory group to implement appropriate guidelines; coordinate with other agencies that have a law-enforcement interest in the development of the databank; provide hardware and software for the databank server and for state access to the databank; provide hardware to store and back up the databank server; provide training for states in forensic DNA technology, quality control, and databank access; determine formats for databank input and output; update the index with new state and federal submissions; assemble population data for all probes used and calculate and disseminate population frequencies; and modify the system to accommodate new DNA typing methods.

State and local agencies would be responsible for performing DNA analyses of samples with consensus methods; submitting new information in a specified format for incorporation into the databanks; guaranteeing the quality of their new submissions; providing hardware and software for state image-analysis workstations for telephone access to the centralized index; maintaining centrally indexed case files for as long as they remain in the index; and providing, when requested, relevant information from centrally indexed case files to other subscribing law-enforcement agencies.

Just as the Department of Defense keeps dental records and fingerprints (with the FBI) of American soldiers, it is seeking funding to collect blood samples from each soldier and establish a DNA profile databank. When a soldier is killed and cannot be identified with usual methods, a sample of tissue, blood, or bone marrow from the remains would be subjected to DNA analysis for comparison with entries in the databank. There are 3.3 million active and reserve members of the armed forces. Given the costs associated with the current technology, a DNA databank of such scope would not be amenable to RFLP analysis. The Armed Forces Institute of Pathology therefore proposes to begin collecting and storing samples while working on the development of a DNA analysis method that, when perfected, will be much less expensive and time-consuming than existing RFLP methods. A databank of military personnel could also offer ancillary forensic applications: criminal investigations conducted by criminal investigation divisions of the armed forces could be aided in the same manner as those of other law-enforcement agencies, identification of subjects for security purposes could be enhanced, and identification of urine samples from disputed sources for drug testing could be verified. The present committee has not been asked to comment on this program; we simply acknowledge its existence.

MODEL COOPERATIVE INFORMATION RESOURCE

Local autonomy as to databank structure and function is recommended, for several reasons: a databank can be tailored to meet local needs, the local databank administrator will not have to rely on outside entities for maintenance and change, and security can best be managed with smaller, discrete, well-understood databanks. That is not to say that standards and guidelines should be avoided. On the contrary, very strict regulations, standards, and guidelines for all aspects of the operation should be enforced and monitored. Databank requirements involve determining what a system must accomplish; there are typically many alternative implementation details that can accomplish the same goals.
The experimental protocols used to derive DNA profiles will probably continue to change as the associated technologies continue to mature. That presents a problem that is common in databank applications when the underlying science is in flux: maintaining data integrity while keeping the system current with the most appropriate technology. It will be challenging, but necessary to ensure competence. In practice, that means designing for change, which requires partitioning the problem into two domains: one that is relatively stable and one that is relatively dynamic. For example, data within the sample context are relatively stable, whereas those associated with experiments and derived data are relatively dynamic.

Figure 5-1 is a high-level data flow diagram that shows one possible model for the flow of information from state or regional laboratories to a national information resource. Each state or region could have several participating local facilities, but for simplification it is recommended that each state or region have one official clearinghouse for locally derived information. Regional facilities could range from situations where a state has several laboratories serving a high-volume workload, as in California, to a regional group of states that each have only periodic and low-volume workloads. All the locally generated raw data and results would be stored at the state or regional level. Thus, all the information concerning the sample, experimental, and result contexts would be stored at the state or regional level; only data associated with the result context would be accessed at the national level. The box labeled "Data Reduction Process" in the center of the figure ideally represents a standardized method for DNA typing that all laboratories use.

FIGURE 5-1 Hypothetical national information resource. Data flow starts with forensic laboratories in various states that provide raw data. Data reduction process provides information to national information resource databank. Merged and reduced data are provided only to authorized users.

The hypothetical national information resource shown in Figure 5-1 does not necessarily represent a physical entity, but could be simply a view of derived data from all the various regional databanks. A view would be achieved by having access software sitting "on top" of the various state or regional databanks. The software would have distinctly different requirements for level of access to data in the databank. For example, "outside views" would only need access to VNTR profiles and some arbitrary identification number; no further information at this first level of access would be required for initial identification searching.

SUMMARY OF RECOMMENDATIONS

In principle, a national DNA profile databank should be created that contains information on felons convicted of violent crimes with high rates of recidivism. The case is strongest for felons who have committed rape, because perpetrators typically leave biological evidence (semen) that could allow them to be identified. The case is somewhat weaker for violent offenders who are most likely to commit homicide as a recidivist offense, because killers leave biological evidence only in a minority of cases. The wisdom of including other offenders depends primarily on the rate at which they are likely to commit rape, because rape is the crime for which the databank will be of primary use. There are a number of scenarios that illustrate the point that the databank need not be limited to persons convicted of specified crimes.
The databank should also contain DNA profiles of samples from unidentified persons collected at the scenes of violent crimes.

Databanks containing DNA profiles of members of the general population (as exist for ordinary fingerprints for identification purposes) are not appropriate, for reasons of both privacy and economics.

DNA profile databanks should be accessible only to legally authorized persons and should be stored in a secure information resource.

Legal policy concerning access and use of both DNA samples and DNA databank information should be established before widespread proliferation of samples and information repositories. Interim protection and sanctions against misuse and abuse of information derived from DNA typing should be established immediately. Policies should explicitly define authorized uses and should provide for criminal penalties for abuses.

Although the committee endorses the concept of a limited national DNA profile databank, we doubt that existing RFLP-based technology provides a wise long-term foundation for such a databank. We expect current methods to be replaced soon with techniques that are simpler, easier to automate, and less expensive, but incompatible with existing DNA profiles. Accordingly, we do not recommend establishing a comprehensive DNA profile databank yet.

For the short term, we recommend the establishment of pilot projects that involve prototype databanks based on RFLP technology and consisting primarily of profiles of violent sex offenders. Such pilot projects could be worthwhile for identifying problems and issues in the creation of databanks. However, in the intermediate term, more efficient methods will replace the current one, and the forensic community should not allow itself to become locked into an outdated method.

State and federal laboratories, which have a long tradition and much experience with the management of other types of basic evidence, should be given primary responsibility, authority, and additional resources to handle forensic DNA testing and all the associated sample-handling and data-handling requirements.

Private-sector firms should not be discouraged from continuing to prepare and analyze DNA samples for specific cases or for databank samples, but they must be held accountable for misuse and abuse to the same extent as government-funded laboratories and government authorities.

Discovery of a match between an evidence sample and a databank entry should be used only as the basis for further testing using markers at additional loci.
The initial match should be used as probable cause to obtain a blood sample from the suspect, but only the statistical frequency associated with the additional loci should be presented at trial.

REFERENCES

1. Wilson T. Automated fingerprint identification systems. Law Enforc Technol. 1986. Aug-Sep:17-20, 45-48.
2. Dalenius T. Towards a methodology and statistical disclosure control. Statistik Tidskrift. 15:213-225, 1977.
3. Adam NR, Wortmann JC. Security-control methods for statistical databases: a comparative study. ACM Comput Surv. 21:515-556, 1989.
4. Code of Virginia, Title 19.2-310.6. Unauthorized uses of DNA data bank; forensic samples; penalties (1990, c. 669).
5. Jones v. Murray, 763 F. Supp. 842 (W.D. Va. 1990).
6. Cellmark Diagnostics. Proposal for DNA databasing, Division of Criminal Investigation Forensic Laboratory, South Dakota. Dec. 21, 1990.

7. Beck A, Shipley B. Recidivism of prisoners released in 1983. Bureau of Justice Statistics Special Report NCJ-16261. Washington, D.C., 1989.
8. Lunt TF, Tamaru A, Gilham F, Jagannathan R, Neuman PG, Jalali C. IDES: a progress report. Proceedings of the Sixth Annual Computer Security Applications Conference, Tucson, Arizona: ACM Press, 1990.
9. Miller J. The outlook for forensic DNA testing in the United States. Crime Lab Digest. 17(Suppl. 1):1-14, 1990.
10. U.S. Congress, Office of Technology Assessment. Genetic witness: forensic uses of DNA test. OTA-BA-438. Washington, D.C.: U.S. Government Printing Office, 1990.