Reference Guide on
DNA Identification Evidence
DAVID H. KAYE AND GEORGE SENSABAUGH
David H. Kaye, M.A., J.D., is Distinguished Professor of Law, Weiss Family Scholar, and
Graduate Faculty Member, Forensic Science Program, The Pennsylvania State University,
University Park, and Regents’ Professor Emeritus, Arizona State University Sandra Day
O’Connor College of Law and School of Life Sciences, Tempe.
George Sensabaugh, D.Crim., is Professor of Biomedical and Forensic Sciences, School of
Public Health, University of California, Berkeley.
Contents
I. Introduction
A. Summary of Contents
B. A Brief History of DNA Evidence
C. Relevant Expertise
II. Variation in Human DNA and Its Detection
A. What Are DNA, Chromosomes, and Genes?
B. What Are DNA Polymorphisms and How Are They Detected?
1. Sequencing
2. Sequence-specific probes and SNP chips
3. VNTRs and RFLP testing
4. STRs
5. Summary
C. How Is DNA Extracted and Amplified?
D. How Is STR Profiling Done with Capillary Electrophoresis?
E. What Can Be Done to Validate a Genetic System for Identification?
F. What New Technologies Might Emerge?
1. Miniaturized “lab-on-a-chip” devices
2. High-throughput sequencing
3. Microarrays
4. What questions do the new technologies raise?
III. Sample Collection and Laboratory Performance
A. Sample Collection, Preservation, and Contamination
1. Did the sample contain enough DNA?
2. Was the sample of sufficient quality?
Reference Manual on Scientific Evidence
B. Laboratory Performance
1. What forms of quality control and assurance should be followed?
2. How should samples be handled?
IV. Inference, Statistics, and Population Genetics in Human Nuclear DNA Testing
A. What Constitutes a Match or an Exclusion?
B. What Hypotheses Can Be Formulated About the Source?
C. Can the Match Be Attributed to Laboratory Error?
D. Could a Close Relative Be the Source?
E. Could an Unrelated Person Be the Source?
1. Estimating allele frequencies from samples
2. The product rule for a randomly mating population
3. The product rule for a structured population
F. Probabilities, Probative Value, and Prejudice
1. Frequencies and match probabilities
2. Likelihood ratios
3. Posterior probabilities
G. Verbal Expressions of Probative Value
1. “Rarity” or “strength” testimony
2. Source or uniqueness testimony
V. Special Issues in Human DNA Testing
A. Mitochondrial DNA
B. Y Chromosomes
C. Mixtures
D. Offender and Suspect Database Searches
1. Which statistics express the probative value of a match to a defendant located by searching a DNA database?
2. Near-miss (familial) searching
3. All-pairs matching within a database to verify estimated random-match probabilities
VI. Nonhuman DNA Testing
A. Species and Subspecies
B. Individual Organisms
Glossary of Terms
References on DNA
I. Introduction
Deoxyribonucleic acid, or DNA, is a molecule that encodes the genetic information
in all living organisms. Its chemical structure was elucidated in 1953. More
than 30 years later, samples of human DNA began to be used in the criminal
justice system, primarily in cases of rape or murder. The evidence has been the
subject of extensive scrutiny by lawyers, judges, and the scientific community. It
is now admissible in all jurisdictions, but there are many types of forensic DNA
analysis, and still more are being developed. Questions of admissibility arise as
advancing methods of analysis and novel applications of established methods are
introduced.1
This reference guide addresses technical issues that are important when con-
sidering the admissibility of and weight to be accorded analyses of DNA, and it
identifies legal issues whose resolution requires scientific information. The goal is
to present the essential background information and to provide a framework for
resolving the possible disagreements among scientists or technicians who testify
about the results and import of forensic DNA comparisons.
A. Summary of Contents
Section I provides a short history of DNA evidence and outlines the types of
scientific expertise that go into the analysis of DNA samples.
Section II provides an overview of the scientific principles behind DNA typ-
ing. It describes the structure of DNA and how this molecule differs from person
to person. These are basic facts of molecular biology. The section also defines
the more important scientific terms and explains at a general level how DNA
differences are detected. These are matters of analytical chemistry and laboratory
procedure. Finally, the section indicates how it is shown that these differences
permit individuals to be identified. This is accomplished with the methods of
probability and statistics.
Section III considers issues of sample quantity and quality as well as laboratory
performance. It outlines the types of information that a laboratory should produce
to establish that it can analyze DNA reliably and that it has adhered to established
laboratory protocols.
Section IV examines issues in the interpretation of laboratory results. To assist
the courts in understanding the extent to which the results incriminate the defen-
dant, it enumerates the hypotheses that need to be considered before concluding
that the defendant is the source of the crime scene samples, and it explores the
issues that arise in judging the strength of the evidence. It focuses on questions of
statistics, probability, and population genetics.2
1. For a discussion of other forensic identification techniques, see Paul C. Giannelli et al.,
Reference Guide on Forensic Identification Expertise, in this manual. See also David H. Kaye et al., The
New Wigmore, A Treatise on Evidence: Expert Evidence (2d ed. 2011).
Section V describes special issues in human DNA testing for identification.
These include the detection and interpretation of mixtures, Y-STR testing,
mitochondrial DNA testing, and the evidentiary implications of DNA database
searches of various kinds.
Finally, Section VI discusses the forensic analysis of nonhuman DNA. It iden-
tifies questions that can be useful in judging whether a new method or application
of DNA science has the scientific merit and power claimed by the proponent of
the evidence.
A glossary defines selected terms and acronyms encountered in genetics,
molecular biology, and forensic DNA work.
B. A Brief History of DNA Evidence
“DNA evidence” refers to the results of chemical or physical tests that directly
reveal differences in the structure of the DNA molecules found in organisms as
diverse as bacteria, plants, and animals.3 The technology for establishing the iden-
tity of individuals became available to law enforcement agencies in the mid to
late 1980s.4 The judicial reception of DNA evidence can be divided into at least
five phases.5 The first phase was one of rapid acceptance. Initial praise for RFLP
(restriction fragment length polymorphism) testing in homicide, rape, paternity,
and other cases was effusive. Indeed, one judge proclaimed “DNA fingerprinting”
to be “the single greatest advance in the ‘search for truth’ . . . since the advent of
cross-examination.”6 In this first wave of cases, expert testimony for the prosecu-
tion rarely was countered, and courts readily admitted DNA evidence.
In a second wave of cases, however, defendants pointed to problems at two
levels—controlling the experimental conditions of the analysis and interpreting the
results. Some scientists questioned certain features of the procedures for extracting
and analyzing DNA employed in forensic laboratories, and it became apparent
that declaring matches or nonmatches in the DNA variations being compared
was not always trivial. Despite these concerns, most cases continued to find the
DNA analyses to be generally accepted, and a number of states provided for
admissibility of DNA tests by legislation. Concerted attacks by defense experts of
impressive credentials, however, produced a few cases rejecting specific proffers
on the ground that the testing was not sufficiently rigorous.7
2. For a broader discussion of statistics, see David H. Kaye & David A. Freedman, Reference
Guide on Statistics, in this manual.
3. Differences in DNA also can be revealed by differences in the proteins that are made according
to the “instructions” in a DNA molecule. Blood group factors, serum enzymes and proteins,
and tissue types all reveal information about the DNA that codes for these chemical structures. Such
immunogenetic testing predates the “direct” DNA testing that is the subject of this chapter. On the
nature and admissibility of the “indirect” DNA testing, see, for example, David H. Kaye, The Double
Helix and the Law of Evidence 5–19 (2010); 1 McCormick on Evidence § 205(B) (Kenneth Broun
ed., 6th ed. 2006).
4. The first reported appellate opinion is Andrews v. State, 533 So. 2d 841 (Fla. Dist. Ct. App.
1988).
5. The description that follows is adapted from 1 McCormick on Evidence, supra note 3, § 205(B).
6. People v. Wesley, 533 N.Y.S.2d 643, 644 (Alb. County. Ct. 1988).
A different attack on DNA profiling begun in cases during this period proved
far more successful and led to a third wave of cases in which many courts held
that estimates of the probability of a coincidentally matching DNA profile were
inadmissible. These estimates relied on a simple population genetics model for the
frequencies of DNA profiles, and some prominent scientists claimed that the appli-
cability of the mathematical model had not been adequately verified. A heated
debate on this point spilled over from courthouses to scientific journals and con-
vinced the supreme courts of several states that general acceptance was lacking. A
1992 report of the National Academy of Sciences proposed a more “conservative”
computational method as a compromise,8 and this seemed to undermine the claim
of scientific acceptance of the less conservative procedure that was in general use.
In response to the population genetics criticism and the 1992 report came an
outpouring of critiques of the report and new studies of the distribution of the DNA
variations in many populations. Relying on the burgeoning literature, a second
National Academy panel concluded in 1996 that the usual method of estimating fre-
quencies in broad racial groups generally was sound, and it proposed improvements
and additional procedures for estimating frequencies in subgroups within the major
population groups.9 In the corresponding fourth phase of judicial scrutiny of DNA
evidence, the courts almost invariably returned to the earlier view that the statistics
associated with DNA profiling are generally accepted and scientifically valid.
In the fifth phase of the judicial evaluation of DNA evidence, results obtained
with the newer “PCR-based methods” entered the courtroom. Once again,
courts considered whether the methods rested on a solid scientific foundation and
were generally accepted in the scientific community. The opinions are practically
unanimous in holding that the PCR-based procedures satisfy these standards.
Before long, forensic scientists settled on the use of one type of DNA variation
(known as short tandem repeats, or STRs) to include or exclude individuals as
the source of crime scene DNA.
7. Moreover, a minority of courts, perhaps concerned that DNA evidence might be conclusive
in the minds of jurors, added a “third prong” to the general-acceptance standard of Frye v. United
States, 293 F. 1013 (D.C. Cir. 1923). This augmented Frye test requires not only proof of the general
acceptance of the ability of science to produce the type of results offered in court, but also of the
proper application of an approved method on the particular occasion. For criticism of this approach,
see David H. Kaye et al., supra note 1, § 6.3.3(a)(2).
8. National Research Council, DNA Technology in Forensic Science (1992) [hereinafter NRC I].
9. National Research Council, The Evaluation of Forensic DNA Evidence (1996) [hereinafter
NRC II].
Throughout these phases, DNA tests also exonerated an increasing number
of men who had been convicted of capital and other crimes, posing a challenge
to traditional postconviction remedies and raising difficult questions of postcon-
viction access to DNA samples.10 The value of DNA evidence in solving older
crimes also prompted extensions of some statutes of limitations.11
In sum, in little more than a decade, forensic DNA typing made the transition
from a novel set of methods for identification to a relatively mature and well-
studied forensic technology. However, one should not lump all forms of DNA
identification together. New techniques and applications continue to emerge,
ranging from the use of new genetic systems and new analytical procedures to the
typing of DNA from plants and animals. Before admitting such evidence, courts
normally inquire into the biological principles and knowledge that would justify
inferences from these new technologies or applications. As a result, this guide
describes not only the predominant STR technology, but also newer analytical
techniques that can be used for forensic DNA identification.
C. Relevant Expertise
Human DNA identification can involve testimony about laboratory findings,
about the statistical interpretation of those findings, and about the underlying
principles of molecular biology. Consequently, expertise in several fields might be
required to establish the admissibility of the evidence or to explain it adequately to
the jury. The expert who is qualified to testify about laboratory techniques might
not be qualified to testify about molecular biology, to make estimates of popula-
tion frequencies, or to establish that an estimation procedure is valid.12
10. See, e.g., Osborne v. District Attorney’s Office for Third Judicial District, 129 S. Ct. 2308 (2009)
(narrowly rejecting a convicted offender’s claim of a due process right to DNA testing at his expense,
enforceable under 42 U.S.C. § 1983, to establish that he is probably innocent of the crime for which
he was convicted after a fair trial, when (1) the convicted offender did not seek extensive DNA testing
before trial even though it was available, (2) he had other opportunities to prove his innocence after a
final conviction based on substantial evidence against him, (3) he had no new evidence of innocence (only
the hope that more extensive DNA testing than that done before the trial would exonerate him), and
(4) even a finding that he was not the source of the DNA would not conclusively demonstrate his innocence);
Skinner v. Switzer, 131 S. Ct. 1289 (2011); Brandon L. Garrett, Judging Innocence, 108 Colum. L. Rev.
55 (2008); Brandon L. Garrett, Claiming Innocence, 92 Minn. L. Rev. 1629 (2008).
11. See, e.g., Veronica Valdivieso, DNA Warrants: A Panacea for Old, Cold Rape Cases? 90 Geo.
L.J. 1009 (2002).
12. Nonetheless, if previous cases establish that the testing and estimation procedures are legally
acceptable, and if the computations are essentially mechanical, then highly specialized statistical exper-
tise might not be essential. Reasonable estimates of DNA characteristics in major population groups can
be obtained from standard references, and many quantitatively literate experts could use the appropriate
formulae to compute the relevant profile frequencies or probabilities. NRC II, supra note 9, at 170.
Limitations in the knowledge of a technician who applies a generally accepted statistical procedure
can be explored on cross-examination. See Kaye et al., supra note 1, § 2.2. Accord Roberson v. State,
16 S.W.3d 156, 168 (Tex. Crim. App. 2000).
Trial judges ordinarily are accorded great discretion in evaluating the qualifi-
cations of a proposed expert witness, and the decisions depend on the background
of each witness. Courts have noted both the unfamiliarity of academic experts
(who have done respected work in other fields) with the scientific literature on
forensic DNA typing and the extent to which their research or teaching lies in
other areas.13 Although such concerns may affect the persuasiveness of particular
testimony, they rarely result in exclusion on the grounds that the witness simply
is not qualified as an expert.
The scientific and legal literature on the objections to DNA evidence is
extensive. By studying the scientific publications, or perhaps by appointing a spe-
cial master or expert adviser to assimilate this material, a court can ascertain where
a party’s expert falls within the spectrum of scientific opinion. Furthermore, an
expert appointed by the court under Federal Rule of Evidence 706 could testify
about the scientific literature generally or even about the strengths or weaknesses
of the particular arguments advanced by the parties.
Given the great diversity of forensic questions to which DNA testing might
be applied, it is not feasible to list the specific scientific expertise appropriate to all
applications. Assessing the value of DNA analyses in a novel application involving
unfamiliar species can be especially challenging. If the technology is novel,
expertise in molecular genetics or biotechnology might be necessary. If testing
has been conducted on a particular organism or category of organisms, expertise
in that area of biology may be called for. If a random-match probability has been
presented, one might seek expertise in statistics as well as the population biology
or population genetics that goes with the organism tested. Given the penetration
of molecular technology into all areas of biological inquiry, it is likely that indi-
viduals can be found who know both the technology and the population biology
of the organism in question. Finally, when samples come from crime scenes, the
expertise and experience of forensic scientists can be crucial. Just as highly focused
specialists may be unaware of aspects of an application outside their field of exper-
tise, so too scientists who have not previously dealt with forensic samples can be
unaware of case-specific factors that can confound the interpretation of test results.
II. Variation in Human DNA and Its
Detection
DNA is a complex molecule that contains the “genetic code” of organisms as
diverse as bacteria and humans. Although the DNA molecules in human cells are
largely identical from one individual to another, there are detectable variations;
except for identical twins, every two human beings have some differences in the
detailed structure of their DNA. This section describes the basic features of DNA
and some ways in which it can be analyzed to detect these differences.
13. E.g., State v. Copeland, 922 P.2d 1304, 1318 n.5 (Wash. 1996) (noting that defendant’s
statistical expert “was also unfamiliar with publications in the area,” including studies by “a leading
expert in the field” whom he thought was “a ‘guy in a lab somewhere’”).
A. What Are DNA, Chromosomes, and Genes?
The DNA molecule is made of subunits that include four chemical structures
known as nucleotide bases. The names of these bases (adenine, thymine, guanine,
and cytosine) usually are abbreviated as A, T, G, and C. The physical structure of
DNA is often described as a double helix because the molecule has two spiraling
strands connected to each other by weak bonds between the nucleotide bases.
As shown in Figure 1, A pairs only with T and G only with C. Thus, the order
of the single bases on either strand reveals the order of the pairs from one end of
the molecule to the other, and the DNA molecule could be said to be like a long
sequence of As, Ts, Gs, and Cs.
Figure 1. Sketch of a small part of a double-stranded DNA molecule. Nucleotide
bases are held together by weak bonds. A pairs with T; C pairs with G.
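The pairing rule can be illustrated with a short Python sketch. The function name and the example sequence are illustrative assumptions rather than anything drawn from this guide, and the sketch ignores strand direction; it only shows that knowing one strand fixes the bases on the other.

```python
# Pairing rule from the text: A pairs only with T, and G only with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def partner_strand(strand: str) -> str:
    """Return the base-by-base partner of one strand (hypothetical helper)."""
    return "".join(PAIRS[base] for base in strand)

example = "GCAAAATTGCCT"          # illustrative sequence, chosen for the example
partner = partner_strand(example)
print(partner)                    # CGTTTTAACGGA

# Applying the rule twice recovers the original strand.
print(partner_strand(partner) == example)  # True
```

Because each strand determines the other, a DNA molecule can be written down as a single string of letters.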
Most human DNA is tightly packed into structures known as chromo-
somes, which come in different sizes and are located in the nuclei of cells. The
chromosomes are numbered (in descending order of size) 1 through 22, with the
remaining chromosome being an X or a much smaller Y. If the bases are like
letters, then each chromosome is like a book written in this four-letter alphabet,
and the nucleus is like a bookshelf in the interior of the cell. All the cells in one
individual contain identical copies of the same collection of books. The sequence
of the As, Ts, Gs, and Cs that constitutes the “text” of these books is referred to
as the individual’s nuclear genome.
All told, the genome comprises more than three billion “letters” (As, Ts, Gs,
and Cs). If these letters were printed in books, the resulting pile would be as high
as the Washington Monument. About 99.9% of the genome is identical between
any two individuals. This similarity is not really surprising—it accounts for the
common features that make humans an identifiable species (and for features that
we share with many other species as well). The remaining 0.1% is particular to an
individual. This variation makes each person (other than identical twins) geneti-
cally unique. This small percentage may not sound like a lot, but it adds up to
some three million sites for variation among individuals.
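The arithmetic behind the "three million sites" figure can be checked in a few lines of Python. The variable names are illustrative, and the genome size is rounded down to exactly three billion for the purpose of the sketch.

```python
GENOME_LETTERS = 3_000_000_000   # "more than three billion" letters, rounded down
VARIABLE_PER_THOUSAND = 1        # 0.1% of the genome, written as 1 letter in 1,000

# Integer arithmetic avoids floating-point rounding in the percentage.
variable_sites = GENOME_LETTERS * VARIABLE_PER_THOUSAND // 1000
print(f"{variable_sites:,}")     # 3,000,000
```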
The process that gives rise to this variation among people starts with the pro-
duction of special sex cells—sperm cells in males and egg cells in females. All the
nucleated cells in the body other than sperm and egg cells contain two versions of
each of the 23 chromosomes—two copies of chromosome 1, two copies of chromo-
some 2, and so on, for a total of 46 chromosomes. The X and Y chromosomes are
the sex-determining chromosomes. Cells in females contain two X chromosomes,
and cells in males contain one X and one Y chromosome. An egg cell, however,
contains only 23 chromosomes—one chromosome 1, one chromosome 2, . . . , and
one X chromosome—each selected at random from the woman’s full complement
of 23 chromosome pairs. Thus, each egg carries half the genetic information present
in the mother’s 23 chromosome pairs, and because the assortment of the chromo-
somes is random, each egg carries a different complement of genetic information.
The same situation exists with sperm cells. Each sperm cell contains a single copy
of each of the 23 chromosomes selected at random from a man’s 23 pairs, and each
sperm differs in the assortment of the 23 chromosomes it carries. Fertilization of an
egg by a sperm therefore restores the full number of 46 chromosomes, with the 46
chromosomes in the fertilized egg being a new combination of those in the mother
and father. The process resembles taking two decks of cards (a male and a female
deck) and shuffling a random half from the male deck into a random half from the
female deck, to produce a new deck.
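The card-shuffling analogy can be sketched as a small simulation. This is a simplified model under stated assumptions: the labels "a" and "b" for the two members of each chromosome pair are hypothetical, the seed is fixed only to make the run repeatable, and the exchange of material between paired chromosomes is ignored.

```python
import random

N_PAIRS = 23  # chromosome pairs in a nucleated body cell

def make_gamete(rng: random.Random) -> list[str]:
    """Pick one member ('a' or 'b') of each chromosome pair at random."""
    return [rng.choice("ab") for _ in range(N_PAIRS)]

rng = random.Random(0)            # fixed seed, an arbitrary choice
egg = make_gamete(rng)            # 23 chromosomes from the mother
sperm = make_gamete(rng)          # 23 chromosomes from the father

# Fertilization pairs the two sets again: 23 pairs, 46 chromosomes.
fertilized = list(zip(egg, sperm))
print(len(egg), 2 * len(fertilized))   # 23 46

# Number of distinct assortments one parent can produce, before any
# swapping of material between paired chromosomes is considered.
assortments_per_parent = 2 ** N_PAIRS
print(assortments_per_parent)          # 8388608
```

Even under this simplified model, each parent can produce more than eight million different assortments of chromosomes.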
During pregnancy, the fertilized cell divides to form two cells, each of which
has an identical copy of the 46 chromosomes. The two then divide to form four,
the four form eight, and so on. As gestation proceeds, various cells specialize
(“differentiate”) to form different tissues and organs. Although cell differentiation
yields many different kinds of cells, the process of cell division results in each prog-
eny cell having the same genomic complement as the cell that divided. Thus, each
of the approximately 100 trillion cells in the adult human body has the same DNA
text as was present in the original 23 pairs of chromosomes from the fertilized egg,
one member of each pair having come from the mother and one from the father.
A second mechanism operating during the chromosome reduction process in
sperm and egg cells further shuffles the genetic information inherited from mother
and father. In the first stage of the reduction process, each chromosome of a
chromosome pair aligns with its partner. The maternally inherited chromosome 1
aligns with the paternally inherited chromosome 1, and so on through the 22 pairs;
X chromosomes align with each other as well, but X and Y chromosomes do not.
While the chromosome pairs are aligned, they exchange pieces to create new com-
binations. The recombined chromosomes are passed on in the sperm and eggs. As a
consequence, the chromosomes we inherit from our parents are not exact copies of
their chromosomes, but rather are mosaics of these parental chromosomes.
The swapping of material between chromosome pairs (as they align in the
emerging sex cells) and the random selection (of half of each parent’s 46 chromo-
somes) in making sex cells is called recombination. Recombination is the principal
source of diversity in individual human genomes.
The diverse variations occur both within the genes and in the regions of
DNA sequences between the genes. A gene can be defined as a segment of DNA,
usually from 1000 to 10,000 base pairs long, that “codes” for a protein. The cell
produces specific proteins that correspond to the order of the base pairs (the
“letters”) in the coding part of the gene.14 Human genes also contain noncoding
sequences that regulate the cell type in which a protein will be synthesized and
how much protein will be produced.15 Many genes contain interspersed non-
coding, nonregulatory sequences that no longer participate in protein synthesis.
These sequences, which have no apparent function, constitute about 23% of the
base pairs within human genes.16 In terms of the metaphor of DNA as text, the
gene is like an important paragraph in the book, often with some gibberish in it.
Proteins perform all sorts of functions in the body and thus produce observable
characteristics. For example, a tiny part of the sequence that directs the production
of the human group-specific component protein (a protein that binds to
vitamin D and transports it to certain tissues) is
G C A A A A T T G C C T G A T G C C A C A C C C A A G G A A C T G G C A.
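The correspondence between three-base "words" and amino acids can be illustrated with the fragment quoted above. The Python sketch below rests on two assumptions that are not stated in the text: that the fragment is read in frame starting from its first base, and that the standard genetic code applies. The lookup table is deliberately partial, covering only the three-letter words that occur in this fragment, and it uses T where RNA would have U.

```python
# The fragment as printed in the text, with the spaces removed.
FRAGMENT = "G C A A A A T T G C C T G A T G C C A C A C C C A A G G A A C T G G C A".replace(" ", "")

# Partial table from the standard genetic code (only the codons needed here).
CODON_TO_AMINO_ACID = {
    "GCA": "Ala", "GCC": "Ala",
    "AAA": "Lys", "AAG": "Lys",
    "TTG": "Leu", "CTG": "Leu",
    "CCT": "Pro", "CCC": "Pro",
    "GAT": "Asp", "ACA": "Thr", "GAA": "Glu",
}

# Split into consecutive three-base words, assuming the frame starts at base 1.
codons = [FRAGMENT[i:i + 3] for i in range(0, len(FRAGMENT), 3)]
amino_acids = [CODON_TO_AMINO_ACID[codon] for codon in codons]

print(len(FRAGMENT), len(codons))   # 36 12
print("-".join(amino_acids))
# Ala-Lys-Leu-Pro-Asp-Ala-Thr-Pro-Lys-Glu-Leu-Ala
```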
14. The sequence in which the building blocks (amino acids) of a protein are arranged corre-
sponds to the sequence of base pairs within a gene. (A sequence of three base pairs specifies a particular
1 of the 20 possible amino acids in the protein. The mapping of a set of three nucleotide bases to a par-
ticular amino acid is the genetic code. The cell makes the protein through intermediate steps involving
coding RNA transcripts.) About 1.5% of the human genome codes for the amino acid sequences.
15. These noncoding but functional sequences include promoters, enhancers, and repressors.
16. This gene-related DNA consists of introns (which interrupt the coding sequences, called
exons, in genes and which are edited out of the RNA transcript for the protein), pseudogenes (evo-
lutionary remnants of once-functional genes), and gene fragments. The idea of a gene as a block of
DNA (some of which is coding, some of which is regulatory, and some of which is functionless) is
an oversimplification, but it is useful enough here. See, e.g., Mark B. Gerstein et al., What Is a Gene,
Post-ENCODE? History and Updated Definition, 17 Genome Res. 669 (2007).
This gene always is located at the same position, or locus, on chromosome 4.
As we have seen, most individuals have two copies of each gene at a given locus—
one from the father and one from the mother.
A locus where almost all humans have the same DNA sequence is called
monomorphic (“of one form”). A locus where the DNA sequence varies among
significant numbers of individuals (more than 1% or so of the population pos-
sesses the variant) is called polymorphic (“of many forms”), and the alternative
forms are called alleles. For example, the GC protein gene sequence has three
common alleles that result from substitutions in a base at a given point. Where an
A appears in one allele, there is a C in another. The third allele has the A, but at
another point a G is swapped for a T. These changes are called single nucleotide
polymorphisms (SNPs, pronounced “snips”).
If a gene is like a paragraph in a book, a SNP is a change in a letter some-
where within that paragraph (a substitution, a deletion, or an insertion), and the
two versions of the gene that result from this slight change are the alleles. An
individual who inherits the same allele from both parents is called a homozygote.
An individual with distinct alleles is a heterozygote.
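These definitions can be made concrete with a brief Python sketch. The helper names and the two short allele strings are hypothetical, chosen only for illustration, and the comparison handles single-letter substitutions between alleles of equal length, not insertions or deletions.

```python
def zygosity(maternal_allele: str, paternal_allele: str) -> str:
    """Classify a genotype at one locus from the two inherited alleles."""
    return "homozygote" if maternal_allele == paternal_allele else "heterozygote"

def substitution_sites(allele_1: str, allele_2: str) -> list[int]:
    """Return 0-based positions where two equal-length alleles differ."""
    if len(allele_1) != len(allele_2):
        raise ValueError("this sketch compares alleles of equal length only")
    return [i for i, (x, y) in enumerate(zip(allele_1, allele_2)) if x != y]

# Hypothetical alleles that differ by one letter (an A in one, a C in the other).
allele_x = "CCCAAGGAA"
allele_y = "CCCACGGAA"

print(zygosity(allele_x, allele_x))             # homozygote
print(zygosity(allele_x, allele_y))             # heterozygote
print(substitution_sites(allele_x, allele_y))   # [4]
```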
DNA sequences used for forensic analysis usually are not genes. They lie in
the vast regions between genes (about 75% of the genome is extragenic) or
in the apparently nonfunctional regions within genes. These extra- and intragenic
regions of DNA have been found to contain considerable sequence variation,
which makes them particularly useful in distinguishing individuals. Although
the terms “locus,” “allele,” “homozygous,” and “heterozygous” were developed
to describe genes, the nomenclature has been carried over to describe all DNA
variation—coding and noncoding alike. Both types are inherited from mother and
father in the same fashion.
B. What Are DNA Polymorphisms and How Are They
Detected?
By determining which alleles are present at strategically chosen loci, the forensic
scientist ascertains the genetic profile, or genotype, of an individual (at those loci).
Although the differences among the alleles arise from alterations in the order of
the ATGC letters, genotyping does not necessarily require “reading” the full DNA
sequence. Here we outline the major types of polymorphisms that are (or could
be) used in identity testing and the methods for detecting them.
1. Sequencing
Researchers are investigating radically new and efficient technologies to sequence
entire genomes, one base pair at a time, but the direct sequencing methods now in
existence are technically demanding, expensive, and time-consuming for whole-
genome sequencing. Therefore, most genetic typing focuses on identifying only
[. . .]
autosome. A chromosome other than the X and Y sex chromosomes.
band. See autoradiograph.
band shift. Movement of DNA fragments in one lane of a gel at a different rate
than fragments of an identical length in another lane, resulting in the same
pattern “shifted” up or down relative to the comparison lane. Band shift does
not necessarily occur at the same rate in all portions of the gel.
base pair (bp). Two complementary nucleotides bonded together at the match-
ing bases (A and T or C and G) along the double helix “backbone” of the
DNA molecule. The length of a DNA fragment often is measured in numbers
of base pairs (1 kilobase (kb) = 1000 bp); base-pair numbers also are used to
describe the location of an allele on the DNA strand.
Bayes’ theorem. A formula that relates certain conditional probabilities. It
can be used to describe the impact of new data on the probability that a
hypothesis is true. See the chapter on statistics in this manual.
bin, fixed. In VNTR profiling, a bin is a range of base pairs (DNA fragment
lengths). When a database is divided into fixed bins, the proportion of bands
within each bin is determined and the relevant proportions are used in esti-
mating the profile frequency.
binning. Grouping VNTR alleles into sets of similar sizes because the alleles’
lengths are too similar to differentiate.
bins, floating. In VNTR profiling, a bin is a range of base pairs (DNA fragment
lengths). In a floating bin method of estimating a profile frequency, the bin is
centered on the base-pair length of the allele in question, and the width of the
bin can be defined by the laboratory’s matching rule (e.g., ±5% of band size).
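
The floating bin idea can be sketched in Python as follows. The function name, the toy database of fragment lengths, and the default window of ±5% are assumptions for illustration; a laboratory's actual matching rule may define the bin width differently.

```python
def floating_bin_frequency(database_lengths, allele_bp, window=0.05):
    """Proportion of database bands inside a bin centered on allele_bp.

    The bin runs from allele_bp * (1 - window) to allele_bp * (1 + window).
    """
    low = allele_bp * (1.0 - window)
    high = allele_bp * (1.0 + window)
    in_bin = sum(1 for length in database_lengths if low <= length <= high)
    return in_bin / len(database_lengths)


# Hypothetical database of band lengths, in base pairs.
database = [900, 980, 1000, 1030, 1100, 2000]
print(floating_bin_frequency(database, 1000))  # 3 of 6 bands fall in 950-1050
```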
blind proficiency test. See proficiency test.
capillary electrophoresis. A method for separating DNA fragments (includ-
ing STRs) according to their lengths. A long, narrow tube is filled with an
entangled polymer or comparable sieving medium, and an electric field is
applied to pull DNA fragments placed at one end of the tube through the
medium. The procedure is faster and uses smaller samples than gel electro-
phoresis, and it can be automated.
ceiling principle. A procedure for setting a minimum DNA profile frequency
proposed in 1992 by a committee of the National Academy of Sciences. One
hundred persons from each of 15 to 20 genetically homogeneous populations
spanning the range of racial groups in the United States are sampled. For each
allele, the highest frequency among the groups sampled (or 5%, whichever is
larger) is used in calculating the profile frequency. Compare interim ceiling
principle.
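
A minimal Python sketch of the per-allele step described above: take the largest frequency observed among the sampled groups, but never less than the floor. The function name and the sample frequencies are hypothetical.

```python
def ceiling_frequency(group_frequencies, floor=0.05):
    """Allele frequency under the ceiling principle.

    Returns the largest frequency seen in any sampled group,
    or the floor (5% by default), whichever is larger.
    """
    return max(max(group_frequencies), floor)


# Hypothetical frequencies of one allele in three sampled groups.
print(ceiling_frequency([0.01, 0.03, 0.02]))  # floor applies: 0.05
print(ceiling_frequency([0.08, 0.12, 0.02]))  # largest group value: 0.12
```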
chip. A miniaturized system for genetic analysis. One such chip mimics capil-
lary electrophoresis and related manipulations. DNA fragments, pulled by
small voltages, move through tiny channels etched into a small block of glass,
silicon, quartz, or plastic. This system should be useful in analyzing STRs.
Another technique mimics reverse dot blots by placing a large array of oligo-
nucleotide probes on a solid surface. Such hybridization arrays are useful in
identifying SNPs and in sequencing mitochondrial DNA.
chromosome. A rodlike structure composed of DNA, RNA, and proteins.
Most normal human cells contain 46 chromosomes: 22 autosomes and a sex
chromosome (X) inherited from the mother, and another 22 autosomes and
one sex chromosome (either X or Y) inherited from the father. The genes are
located along the chromosomes. See also homologous chromosomes.
coding and noncoding DNA. The sequence in which the building blocks
(amino acids) of a protein are arranged corresponds to the sequence of base
pairs within a gene. (A sequence of three base pairs specifies a particular
one of the 20 possible amino acids in the protein. The mapping of a set of
three nucleotide bases to a particular amino acid is the genetic code. The
cell makes the protein through intermediate steps involving coding RNA
transcripts.) About 1.5% of the human genome codes for the amino acid
sequences. Another 23.5% of the genome is classified as genetic sequence but
does not encode proteins. This portion of the noncoding DNA is involved
in regulating the activity of genes. It includes promoters, enhancers, and
repressors. Other gene-related DNA consists of introns (that interrupt the
coding sequences, called exons, in genes and that are edited out of the RNA
transcript for the protein), pseudogenes (evolutionary remnants of once-
functional genes), and gene fragments. The remaining, extragenic DNA
(about 75% of the genome) also is noncoding.
CODIS (combined DNA index system). A collection of databases on STR
and other loci of convicted felons, maintained by the FBI.
complementary sequence. The sequence of nucleotides on one strand of DNA
that corresponds to the sequence on the other strand. For example, if one
sequence is CTGAA, the complementary bases are GACTT.
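
The base-pairing rule (A with T, C with G) can be written as a short Python lookup. This is an illustrative sketch; the names are not from the manual. It pairs bases position by position, as in the example above, and does not reverse the strand.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(sequence):
    """Return the complementary base for each position in the sequence."""
    return "".join(COMPLEMENT[base] for base in sequence)


print(complement("CTGAA"))  # GACTT
```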
control region. See D-loop.
cytoplasm. A jelly-like material (80% water) that fills the cell.
cytosine (C). One of the four bases, or nucleotides, that make up the DNA
double helix. Cytosine binds only to guanine. See nucleotide.
database. A collection of DNA profiles.
degradation. The breaking down of DNA by chemical or physical means.
denature, denaturation. The process of splitting, as by heating, two comple-
mentary strands of the DNA double helix into single strands in preparation
for hybridization with biological probes.
deoxyribonucleic acid (DNA). The molecule that contains genetic informa-
tion. DNA is composed of nucleotide building blocks, each containing a
base (A, C, G, or T), a phosphate, and a sugar. These nucleotides are linked
together in a double helix—two strands of DNA molecules paired up at
complementary bases (A with T, C with G). See adenine, cytosine, guanine,
thymine.
diploid number. See haploid number.
D-loop. A portion of the mitochondrial genome known as the “control
region” or “displacement loop” instrumental in the regulation and initiation
of mtDNA gene products. Two short “hypervariable” regions within the
D-loop do not appear to be functional and are the sequences used in identity
or kinship testing.
DNA polymerase. The enzyme that catalyzes the synthesis of double-stranded
DNA.
DNA probe. See probe.
DNA profile. The alleles at each locus. For example, a VNTR profile is the
pattern of band lengths on an autorad. A multilocus profile represents the
combined results of multiple probes. See genotype.
DNA sequence. The ordered list of base pairs in a duplex DNA molecule or of
bases in a single strand.
DQ. The antigen that is the product of the DQA gene. See DQA, human
leukocyte antigen.
DQA. The gene that codes for a particular class of human leukocyte antigen
(HLA). This gene has been sequenced completely and can be used for forensic
typing. See human leukocyte antigen.
EDTA. A preservative added to blood samples.
electropherogram. The PCR products separated by capillary electrophoresis
can be labeled with a dye that glows at a given wavelength in response to
light shined on it. As the tagged fragments pass the light source, an electronic
camera records the intensity of the fluorescence. Plotting the intensity as a
function of time produces a series of peaks, with the shorter fragments pro-
ducing peaks sooner. The intensity is measured in relative fluorescent units
and is proportional to the number of glowing fragments passing by the detec-
tor. The graph of the intensity over time is an electropherogram.
electrophoresis. See capillary electrophoresis, gel electrophoresis.
endonuclease. An enzyme that cleaves the phosphodiester bond within a
nucleotide chain.
environmental insult. Exposure of DNA to external agents such as heat, mois-
ture, and ultraviolet radiation, or chemical or bacterial agents. Such exposure
can interfere with the enzymes used in the testing process or otherwise make
DNA difficult to analyze.
enzyme. A protein that catalyzes (speeds up) a chemical reaction.
epigenetic. Heritable changes in phenotype (appearance) or gene expression
caused by mechanisms other than changes in the underlying DNA sequence.
Epigenetic marks are molecules attached to DNA that can determine whether
genes are active and used by the cell.
ethidium bromide. A molecule that can intercalate into DNA double helices
when the helix is under torsional stress. Used to identify the presence of DNA
in a sample by its fluorescence under ultraviolet light.
exon. See coding and noncoding DNA.
fallacy of the transposed conditional. See transposition fallacy.
false match. Two samples of DNA that have different profiles could be declared
to match if, instead of measuring the distinct DNA in each sample, there is
an error in handling or preparing samples such that the DNA from a single
sample is analyzed twice. The resulting match, which does not reflect the
true profiles of the DNA from each sample, is a false match. Some people
use “false match” more broadly, to include cases in which the true profiles of
each sample are the same, but the samples come from different individuals.
Compare true match. See also match, random match.
gel, agarose. A semisolid medium used to separate molecules by electrophoresis.
gel electrophoresis. In RFLP analysis, the process of sorting DNA fragments
by size by applying an electric current to a gel. The different-size fragments
move at different rates through the gel.
gene. A set of nucleotide base pairs on a chromosome that contains the “instruc-
tions” for controlling some cellular function such as making an enzyme. The
gene is the fundamental unit of heredity; each simple gene “codes” for a
specific biological characteristic.
gene frequency. The relative frequency (proportion) of an allele in a population.
genetic drift. Random fluctuation in a population’s allele frequencies from
generation to generation.
genetics. The study of the patterns, processes, and mechanisms of inheritance of
biological characteristics.
genome. The complete genetic makeup of an organism, including roughly
23,000 genes and many other DNA sequences in humans. Over three billion
nucleotide base pairs make up the haploid human genome.
genotype. The particular forms (alleles) of a set of genes possessed by an organ-
ism (as distinguished from phenotype, which refers to how the genotype
expresses itself, as in physical appearance). In DNA analysis, the term is
applied to the variations within all DNA regions (whether or not they con-
stitute genes) that are analyzed.
genotype, multilocus. The alleles that an organism possesses at several sites in
its genome.
genotype, single locus. The alleles that an organism possesses at a particular
site in its genome.
guanine (G). One of the four bases, or nucleotides, that make up the DNA
double helix. Guanine binds only to cytosine. See nucleotide.
haploid number. Human sex cells (egg and sperm) contain 23 chromosomes
each. This is the haploid number. When a sperm cell fertilizes an egg cell, the
number of chromosomes doubles to 46. This is the diploid number.
haplotype. A specific combination of linked alleles at several loci.
Hardy-Weinberg equilibrium. A condition in which the allele frequencies
within a large, random, intrabreeding population are unrelated to patterns of
mating. In this condition, the occurrence of alleles from each parent will be
independent and have a joint frequency estimated by the product rule. See
independence, linkage disequilibrium.
heteroplasmy, heteroplasty. The condition in which some copies of mito-
chondrial DNA in the same individual have different base pairs at certain
points.
heterozygous. Having a different allele at a given locus on each of a pair of
homologous chromosomes. See allele. Compare homozygous.
homologous chromosomes. The 44 autosomes (nonsex chromosomes) in the
normal human genome are in homologous pairs (one from each parent) that
share an identical set of genes, but may have different alleles at the same loci.
homozygous. Having the same allele at a given locus on each of a pair of
homologous chromosomes. See allele. Compare heterozygous.
human leukocyte antigen (HLA). Antigen (foreign body that stimulates an
immune system response) located on the surface of most cells (excluding red
blood cells and sperm cells). HLAs differ among individuals and are associated
closely with transplant rejection. See DQA.
hybridization. Pairing up of complementary strands of DNA from differ-
ent sources at the matching base-pair sites. For example, a primer with
the sequence AGGTCT would bond with the complementary sequence
TCCAGA on a DNA fragment.
independence. Two events are said to be independent if one is neither more
nor less likely to occur when the other does.
interim ceiling principle. A procedure proposed in 1992 by a committee of
the National Academy of Sciences for setting a minimum DNA profile fre-
quency. For each allele, the highest frequency (adjusted upward for sampling
error) found in any major racial group (or 10%, whichever is higher) is used
in product-rule calculations. Compare ceiling principle.
intron. See coding and noncoding DNA.
kilobase (kb). A measure of DNA length (1000 bases).
likelihood ratio. A measure of the support that an observation provides for one
hypothesis as opposed to an alternative hypothesis. The likelihood ratio is
computed by dividing the conditional probability of the observation given
that one hypothesis is true by the conditional probability of the observation
given the alternative hypothesis. For example, the likelihood ratio for the
hypothesis that two DNA samples with the same STR profile originated
from the same individual (as opposed to originating from two unrelated
individuals) is the reciprocal of the random-match probability. Legal scholars
have introduced the likelihood ratio as a measure of the probative value of
evidence. Evidence that is 100 times more probable to be observed when
one hypothesis is true as opposed to another has more probative value than
evidence that is only twice as probable.
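
The computation can be sketched in a few lines of Python. The function name and the random-match probability of one in a million are assumptions for illustration. The same-source example also assumes that a match is certain when the samples share a source, that is, it ignores laboratory error.

```python
def likelihood_ratio(p_obs_given_h1, p_obs_given_h2):
    """Probability of the observation under H1 divided by that under H2."""
    return p_obs_given_h1 / p_obs_given_h2


# Assumed random-match probability for the STR profile.
random_match_probability = 1e-6

# H1: same individual (match taken as certain); H2: unrelated individuals.
lr = likelihood_ratio(1.0, random_match_probability)
print(lr)  # about one million, the reciprocal of the random-match probability
```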
linkage. The inheritance together of two or more genes on the same chromosome.
linkage equilibrium. A condition in which the occurrence of alleles at different
loci is independent.
locus. A location in the genome, that is, a position on a chromosome where a
gene or other structure begins.
mass spectroscopy. The separation of elements or molecules according to their
molecular weight. In the version being developed for DNA analysis, small
quantities of PCR-amplified fragments are irradiated with a laser to form
gaseous ions that traverse a fixed distance. Heavier ions have longer times of
flight, and the process is known as matrix-assisted laser desorption-ionization
time-of-flight mass spectroscopy. MALDI-TOF-MS, as it is abbreviated, may
be useful in analyzing STRs.
match. The presence of the same allele or alleles in two samples. Two DNA
profiles are declared to match when they are indistinguishable in genetic type.
For loci with discrete alleles, two samples match when they display the same
set of alleles. For RFLP testing of VNTRs, two samples match when the
pattern of the bands is similar and the positions of the corresponding bands
at each locus fall within a preset distance. See match window, false match,
true match.
match window. If two RFLP bands lie within a preset distance, called the
match window, that reflects normal measurement error, they can be declared
to match.
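
A Python sketch of a match-window test follows. The way the window is expressed here, as a fraction of the mean of the two band lengths, is an assumption chosen for illustration; laboratories define the preset distance in their own protocols.

```python
def bands_match(length_a, length_b, window_fraction=0.025):
    """True if two band lengths fall within the match window.

    Assumption: the window is window_fraction times the mean band length.
    """
    mean_length = (length_a + length_b) / 2.0
    return abs(length_a - length_b) <= window_fraction * mean_length


# Hypothetical band lengths, in base pairs.
print(bands_match(1000, 1020))  # True: 20 bp apart, window is 25.25 bp
print(bands_match(1000, 1100))  # False: 100 bp apart, window is 26.25 bp
```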
microsatellite. Another term for an STR.
minisatellite. Another term for a VNTR.
mitochondria. A structure (organelle) within nucleated (eukaryotic) cells that
is the site of the energy-producing reactions within the cell. Mitochondria
contain their own DNA (often abbreviated as mtDNA), which is inherited
only from mother to child.
molecular weight. The weight in grams of 1 mole (approximately 6.02 × 10^23
molecules) of a pure, molecular substance.
monomorphic. A gene or DNA characteristic that is almost always found in
only one form in a population. Compare polymorphism.
multilocus probe. A probe that marks multiple sites (loci). RFLP analysis using
a multilocus probe will yield an autorad showing a striped pattern of 30 or
more bands. Such probes are no longer used in forensic applications.
multilocus profile. See DNA profile.
multiplexing. Typing several loci simultaneously.
mutation. The process that produces a gene or chromosome set differing from
the type already in the population; the gene or chromosome set that results
from such a process.
nanogram (ng). A billionth of a gram.
nucleic acid. RNA or DNA.
nucleotide. A unit of DNA consisting of a base (A, C, G, or T) attached to
a phosphate and a sugar group; the basic building block of nucleic acids. See
deoxyribonucleic acid.
nucleus. The membrane-covered portion of a eukaryotic cell containing most
of the DNA and found within the cytoplasm.
oligonucleotide. A synthetic polymer made up of fewer than 100 nucleotides;
used as a primer or a probe in PCR. See primer.
paternity index. A number (technically, a likelihood ratio) that indicates the sup-
port that the paternity test results lend to the hypothesis that the alleged father
is the biological father as opposed to the hypothesis that another man selected
at random is the biological father. Assuming that the observed phenotypes cor-
rectly represent the phenotypes of the mother, child, and alleged father tested,
the number can be computed as the ratio of the probability of the phenotypes
under the first hypothesis to the probability under the second hypothesis. Large
values indicate substantial support for the hypothesis of paternity; values near
zero indicate substantial support for the hypothesis that someone other than
the alleged father is the biological father; and values near unity indicate that
the results do not help in determining which hypothesis is correct.
pH. A measure of the acidity of a solution.
phenotype. A trait, such as eye color or blood group, resulting from a genotype.
point mutation. See SNP.
polymarker. A commercially marketed set of PCR-based tests for protein
polymorphisms.
polymerase chain reaction (PCR). A process that mimics DNA’s own repli-
cation processes to make up to millions of copies of short strands of genetic
material in a few hours.
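
To see how a modest number of cycles can yield millions of copies, the Python sketch below assumes an idealized reaction in which the number of copies doubles on every cycle. The function name and the efficiency parameter are hypothetical; real reactions fall short of perfect doubling.

```python
def copies_after(cycles, starting_copies=1, efficiency=1.0):
    """Copies after a number of PCR cycles.

    Assumption: each cycle multiplies the copy count by (1 + efficiency),
    so efficiency = 1.0 means perfect doubling.
    """
    return starting_copies * (1.0 + efficiency) ** cycles


print(copies_after(20))  # 1048576.0, about one million from a single copy
print(copies_after(30))  # 1073741824.0, about one billion
```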
polymorphism. The presence of several forms of a gene or DNA characteristic
in a population.
population genetics. The study of the genetic composition of groups of
individuals.
population structure. When a population is divided into subgroups that do not
mix freely, that population is said to have structure. Significant structure can
lead to allele frequencies being different in the subpopulations.
primer. An oligonucleotide that attaches to one end of a DNA fragment and
provides a point for more complementary nucleotides to attach and replicate
the DNA strand. See oligonucleotide.
probe. In forensics, a short segment of DNA used to detect certain alleles. The
probe hybridizes, or matches up, to a specific complementary sequence.
Probes allow visualization of the hybridized DNA, either by a radioactive
tag (usually used for RFLP analysis) or a biochemical tag (usually used for
PCR-based analyses).
product rule. When alleles occur independently at each locus (Hardy-Weinberg
equilibrium) and across loci (linkage equilibrium), the proportion of the
population with a given genotype is the product of the proportion of each
allele at each locus, times factors of two for heterozygous loci.
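
The product rule can be sketched in Python as follows. The locus names, allele labels, and allele frequencies are made up for illustration, and the calculation assumes Hardy-Weinberg and linkage equilibrium as stated above.

```python
def profile_frequency(genotype, allele_freqs):
    """Estimated population proportion with the given multilocus genotype.

    genotype: {locus: (allele_a, allele_b)}
    allele_freqs: {locus: {allele: frequency}}
    """
    frequency = 1.0
    for locus, (allele_a, allele_b) in genotype.items():
        p = allele_freqs[locus][allele_a]
        q = allele_freqs[locus][allele_b]
        if allele_a == allele_b:
            frequency *= p * p        # homozygous locus
        else:
            frequency *= 2.0 * p * q  # heterozygous locus: factor of two
    return frequency


# Hypothetical two-locus profile and allele frequencies.
genotype = {"locus1": ("12", "14"), "locus2": ("9", "9")}
allele_freqs = {"locus1": {"12": 0.1, "14": 0.2}, "locus2": {"9": 0.3}}
print(profile_frequency(genotype, allele_freqs))  # (2 * 0.1 * 0.2) * (0.3 * 0.3)
```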
proficiency test. A test administered at a laboratory to evaluate its performance.
In a blind proficiency study, the laboratory personnel do not know that they
are being tested.
prosecutor’s fallacy. See transposition fallacy.
protein. A class of biologically important molecules made up of a linear string
of building blocks called amino acids. The order in which these components
are arranged is encoded in the DNA sequence of the gene that expresses the
protein. See coding and noncoding DNA.
pseudogenes. Genes that have been so disabled by mutations that they can no
longer produce proteins. Some pseudogenes can still produce noncoding
RNA.
quality assurance. A program conducted by a laboratory to ensure accuracy
and reliability.
quality audit. A systematic and independent examination and evaluation of a
laboratory’s operations.
quality control. Activities used to monitor the ability of DNA typing to meet
specified criteria.
random match. A match in the DNA profiles of two samples of DNA, where one
is drawn at random from the population. See also random-match probability.
random-match probability. The chance of a random match. As it is usually
used in court, the random-match probability refers to the probability of a true
match when the DNA being compared to the evidence DNA comes from
a person drawn at random from the population. This random true match
probability reveals the probability of a true match when the samples of DNA
come from different, unrelated people.
random mating. The members of a population are said to mate randomly with
respect to particular genes or DNA characteristics when the choice of mates
is independent of the alleles.
recombination. In general, any process in a diploid or partially diploid cell that
generates new gene or chromosomal combinations not found in that cell or
in its progenitors.
reference population. The population to which the perpetrator of a crime is
thought to belong.
relative fluorescent unit (RFU). See electropherogram.
replication. The synthesis of new DNA from existing DNA. See polymerase
chain reaction.
restriction enzyme. Protein that cuts double-stranded DNA at specific base-
pair sequences (different enzymes recognize different sequences). See restric-
tion site.
restriction fragment length polymorphism (RFLP). Variation among people
in the length of a segment of DNA cut at two restriction sites.
restriction fragment length polymorphism (RFLP) analysis. Analysis of
individual variations in the lengths of DNA fragments produced by digesting
sample DNA with a restriction enzyme.
restriction site. A sequence marking the location at which a restriction enzyme
cuts DNA into fragments. See restriction enzyme.
reverse dot blot. A detection method used to identify SNPs in which DNA
probes are affixed to a membrane, and amplified DNA is passed over the
probes to see if it contains the complementary sequence.
ribonucleic acid (RNA). A single-stranded molecule “transcribed” from
DNA. “Coding” RNA acts as a template for building proteins according to
the sequences in the coding DNA from which it is transcribed. Other RNA
transcripts can be a sensor for detecting signals that affect gene expression, a
switch for turning genes off or on, or they may be functionless.
sequence-specific oligonucleotide (SSO) probe. Also, allele-specific oligo-
nucleotide (ASO) probe. Oligonucleotide probes used in a PCR-associated
detection technique to identify the presence or absence of certain base-pair
sequences identifying different alleles. The probes are visualized by an array
of dots rather than by the electrophoretograms associated with STR analysis.
sequencing. Determining the order of base pairs in a segment of DNA.
short tandem repeat (STR). See variable number tandem repeat.
single-locus probe. A probe that only marks a specific site (locus). RFLP analy-
sis using a single-locus probe will yield an autorad showing one band if the
individual is homozygous, two bands if heterozygous. Likewise, the probe
will produce one or two peaks in an STR electrophoretogram.
SNP (single nucleotide polymorphism). A substitution, insertion, or deletion
of a single base pair at a given point in the genome.
SNP chip. See chip.
Southern blotting. Named for its inventor, a technique by which processed
DNA fragments, separated by gel electrophoresis, are transferred onto a nylon
membrane in preparation for the application of biological probes.
thymine (T). One of the four bases, or nucleotides, that make up the DNA
double helix. Thymine binds only to adenine. See nucleotide.
transposition fallacy. Also called the prosecutor’s fallacy, the transposition
fallacy confuses the conditional probability of A given B [P(A|B)] with that
of B given A [P(B|A)]. Few people think that the probability that a person
speaks Spanish (A) given that he or she is a citizen of Chile (B) equals the
probability that a person is a citizen of Chile (B) given that he or she speaks
Spanish (A). Yet, many court opinions, newspaper articles, and even some
expert witnesses speak of the probability of a matching DNA genotype (A)
given that someone other than the defendant is the source of the crime scene
DNA (B) as if it were the probability of someone else being the source (B)
given the matching profile (A). Transposing conditional probabilities correctly
requires Bayes’ theorem.
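
A small numerical sketch in Python shows how far apart the two conditional probabilities can be. It rests on strong simplifying assumptions that are not from this manual: exactly one person in the population is the true source and is certain to match, every other person is unrelated and equally likely beforehand to be the source, and there is no laboratory error. The function name and the numbers are hypothetical.

```python
def p_other_source_given_match(population_size, random_match_probability):
    """P(someone else is the source | matching profile) under the stated assumptions."""
    others = population_size - 1
    # Expected number of non-source people who would also match by chance.
    expected_other_matches = others * random_match_probability
    # One true source matches for certain; the rest match only by chance.
    return expected_other_matches / (expected_other_matches + 1.0)


rmp = 1e-6  # P(match | someone else is the source), assumed
print(rmp)
print(p_other_source_given_match(1_000_001, rmp))  # about 0.5, not one in a million
```

With these assumed numbers, the probability of a match given another source is one in a million, yet the probability of another source given the match is about one half.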
true match. Two samples of DNA that have the same profile should match
when tested. If there is no error in the labeling, handling, and analysis of the
samples and in the reporting of the results, a match is a true match. A true
match establishes that the two samples of DNA have the same profile. Unless
the profile is unique, however, a true match does not conclusively prove that
the two samples came from the same source. Some people use “true match”
more narrowly, to mean only those matches among samples from the same
source. Compare false match. See also match, random match.
variable number tandem repeat (VNTR). A class of RFLPs resulting from
multiple copies of virtually identical base-pair sequences, arranged in succes-
sion at a specific locus on a chromosome. The number of repeats varies from
individual to individual, thus providing a basis for individual recognition.
VNTRs are longer than STRs.
window. See match window.
X chromosome. See chromosome.
Y chromosome. See chromosome.
References on DNA
Forensic DNA Interpretation (John Buckleton et al. eds., 2005).
John M. Butler, Fundamentals of Forensic DNA Typing (2010).
Ian W. Evett & Bruce S. Weir, Interpreting DNA Evidence: Statistical Genetics
for Forensic Scientists (1998).
William Goodwin et al., An Introduction to Forensic Genetics (2d ed. 2011).
David H. Kaye, The Double Helix and the Law of Evidence (2010).
National Research Council Committee on DNA Forensic Science: An Update,
The Evaluation of Forensic DNA Evidence (1996).
National Research Council Committee on DNA Technology in Forensic Science,
DNA Technology in Forensic Science (1992).
The President’s DNA Initiative, Forensic DNA Resources for Specific Audiences,
available at www.dna.gov/audiences/.