OCR for page 55
Discrimination in the
Criminal Justice System
A Critical Appraisal
of the Literature
Steven Klepper, Daniel Nagin,
and Luke-Jon Tierney
INTRODUCTION
Discrimination in the criminal justice system is an issue
of substantial social concern. The discretionary powers
of the principal actors (the police, prosecutors, and
judges) are considerable and allow ample latitude for
unfair treatment of persons of a specific race or social
background. A large empirical literature has emerged
concerning the extent of discrimination in the criminal
justice process.
These studies examine separately or in
combination the effect of race or social class on the
likelihood of arrest, prosecution, bail, conviction, and
the type and severity of sentence. The findings of the
studies are by no means consistent. Some find evidence
of discrimination while others do not.
In this paper we argue that there are major flaws in
the literature we have reviewed that limit its usefulness
for making inferences about the extent of discrimination
in the criminal justice system. We also suggest research
strategies to remedy these weaknesses. Our critique and
suggestions are prompted by a review of 10 papers, chosen
by the panel on the basis of their salience in the
literature and their quality, as well as a number of
additional papers.
While our paper is based on a review of a small sample
of studies, we are confident that our conclusions apply
generally to the larger literature. First, to some
degree our criticisms apply to all of the studies
reviewed, which makes it unlikely that they do not apply
to the larger literature. Second, implementation of
several of our recommendations requires the use of
statistical methods that have only recently been
developed and are not yet widely employed. Third,
implementation of these statistical procedures requires
the use of modeling approaches that have not been widely
adopted in the criminological and sociological literature.
Our review suggests three major remediable flaws in
the literature:
(1) The Absence of Formal Models of Processing
Decisions in the Criminal Justice System. Case
disposition, whether it is dismissal, acquittal, conviction,
or sentencing, is the consequence of the interplay of a
diverse set of actors, each with individual objectives.
Even if a disposition does not directly involve one of
these actors, expectations about their actions if they
were to become involved may affect decisions. For
example, a prosecutor may choose to dismiss a case based
on the expectation that a judge will do the same if the
case is prosecuted. Similarly, a defendant may choose to
accept a plea bargain on the basis of an expectation of
the likelihood of conviction at a jury trial and the
sentence if convicted.
In order to model decisions at each stage of the
criminal justice system, a theory of the important
decision criteria of each of the major actors and their
interaction is required. Without such a theory,
estimating equations are likely to be misspecified, which
in turn is likely to result in serious biases in the
estimated effects of included variables and an inability
to discern the effects of more subtle influences. The
latter is particularly pertinent to measuring the effects
of social class and race because their influence, while
possibly of sufficient magnitude to warrant concern, is
probably less important in affecting disposition than
clearly legally relevant variables like case quality and
the seriousness of the crime. Perhaps even more
important is that without a well-structured theory,
inferences about the role of social status and other
factors at each processing stage may be extremely
misleading. For example, an observation that social
status affects sentences in negotiated pleas may not
reflect prosecutorial bias but rather the biases of
judges or juries. We regard this point as crucial
because implementation of policies to rectify any
undesirable effects of race and social status on
disposition requires a knowledge of the stage(s) of the
criminal justice system at which these factors are
important.
(2) Sample Selection Biases Resulting from Screening
and Processing Decisions. The criminal justice system has
been likened to a leaky sieve. In Washington, D.C., for
example, of every 100 felony arrests only 13 result in
felony convictions. Of the remaining 87, 16 result in
misdemeanor convictions. Nearly all the rest are
rejected for further processing at an initial screening
or subsequently dismissed by a prosecutor, judge, or
grand jury. Of those convicted only about 32 percent are
incarcerated (Forst et al., 1977). Thus, cases that
reach the sentencing stage are a very select group that
typically represent only a small proportion of the
population of "similar" cases (e.g., same arrest charges)
that originally entered the system. Moreover, even those
cases entering the system via an arrest are themselves a
selected sample of crimes. In most major metropolitan
areas, clearance rates (crimes solved by the police,
typically by the arrest of a suspect) hover around 20
percent. This low clearance rate principally reflects
the absence of any suspect but is also affected by the
exercise of arrest discretion by the police.
By the very nature of the system, analyses of the
determinants of sentence must be executed on a selected
sample of cases, namely those that have resulted in
conviction. Since the selection process is by no means
random, it may induce serious biases in parameter
estimates of included variables. Such biases may, for
example, result in an inappropriate conclusion that
racial considerations influence sentencing decisions when
in fact they do not. Recently developed econometric
procedures can be employed in some circumstances to cope
with the biases induced by sample selection.
(3) Use of Arbitrary Scales to Measure Qualitatively
Different Dispositions. A case may be disposed by
dismissal or acquittal. For convictions, possible
sentences include fines, probation, and prison or some
combination of these at a specified amount or duration.
Many of the papers we reviewed employ arbitrary rules for
measuring these qualitatively different outcomes. The
index that results serves as the basis (e.g., the
dependent variable in a regression model) for an analysis
of the correlates of "severity of outcome." While the
scales that are applied are not patently unreasonable,
serious questions remain about the degree to which
findings are simply an artifact of an artificial scale.
We are particularly concerned that use of these arbitrary
scales may conceal the importance of subtle influences
that could be measured if such scales were not used.
ORGANIZATION OF THE PAPER
While the approaches we suggest for coping with these
problems will improve the quality of statistical
inference about discrimination, we are under no illusion
that their adoption will yield definitive results. The
combination of our relative ignorance about the factors
determining case-processing decisions and the problems of
using nonexperimental data ensures that definitive
findings will not be forthcoming soon. In response to
the inherent limitation of studies based on
nonexperimental data, we have included a section on the use of
experiments to measure discrimination in the criminal
justice system. This section discusses the limitations of
experiments, approaches for minimizing these limitations,
and strategies for combining experimental and
nonexperimental data.
The paper is organized as follows. We begin with a
review of statistical issues that arise in the analysis
of binary data. Next we discuss the so-called sample
selection phenomenon and elaborate on its effects. We
then develop a model of the criminal justice system.
Next we review selected studies in the context of the
sample selection phenomenon and the model developed. We
then discuss alternative models of the sentencing
decision that do not require the use of arbitrary
severity indices. Next we discuss experimental
approaches to measuring discrimination. We conclude with
a summary of our major points.
THE ANALYSIS OF BINARY VARIABLES
Many decisions in the criminal justice system involve
binary outcomes, such as the prosecutor's choice to
dismiss or prosecute a case or the jury's decision to
find the defendant guilty or innocent. It is common
practice to define a binary variable y such that y = 1 if one
outcome (say, a verdict of guilty) occurs and y = 0
otherwise. In a number of the studies we reviewed, the
relationship between the likelihood of the event y = 1
and a vector of variables x is examined by regressing y
on x. The purpose of this section is to point out some
hazards of this approach and to describe an alternative
approach that we employ in subsequent sections.
We begin with a discussion of the classical regression
model. The model assumes that a random variable y can be
related to a vector of variables x by
$$ y_i = x_i'\beta + \varepsilon_i , \qquad i = 1, \ldots, N , \tag{1} $$
where $x_i$ is the K x 1 column vector of regressors for the
ith observation in a sample of size N, $\beta$ is a K x 1
vector of unknown parameters, and $\varepsilon_i$ is the
disturbance or error associated with the ith observation.
The errors $\varepsilon_1, \ldots, \varepsilon_N$ are assumed to be
independent with zero mean and common variance $\sigma^2$. The
regressors $x_1, \ldots, x_N$ are often assumed to be
nonstochastic, although they may also be assumed to be
random variables that are independent of the errors
$\varepsilon_1, \ldots, \varepsilon_N$.
These assumptions imply that the conditional
distribution of y given x is such that
$$ E(y_i \mid x_i) = x_i'\beta \tag{2} $$
and
$$ V(y_i \mid x_i) = \sigma^2 . \tag{3} $$
Equations (2) and (3) state that for each $x_i$, the
distribution of $y_i$ given $x_i$ is such that $E(y_i \mid x_i)$ is
linear in $x_i$ and $V(y_i \mid x_i)$ is constant for all i. Under
these assumptions, it is well known that ordinary least
squares provides consistent and unbiased estimates of the
coefficient vector $\beta$.
The assumptions of the classical regression model are
appropriate in cases in which the dependent variable has
a large, approximately continuous range of possible
values. However, in the case of a binary variable, many
of the assumptions are no longer tenable. Suppose y is a
binary variable that takes on only the values of zero and
one. Let p(x) equal the probability that y = 1 given x.
Then it is easy to demonstrate that
$$ E(y_i \mid x_i) = p(x_i) \tag{4} $$
and
$$ \operatorname{Var}(y_i \mid x_i) = p(x_i)[1 - p(x_i)] . \tag{5} $$
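Both moments follow from the fact that y takes only the values zero and one, so that the square of y equals y itself. A short derivation in the notation used here (the intermediate steps are supplied for illustration and do not appear in the chapter):

```latex
\begin{align*}
E(y_i \mid x_i) &= 1 \cdot p(x_i) + 0 \cdot [1 - p(x_i)] = p(x_i), \\
E(y_i^2 \mid x_i) &= E(y_i \mid x_i) = p(x_i)
    \quad \text{because } y_i^2 = y_i, \\
\operatorname{Var}(y_i \mid x_i)
    &= E(y_i^2 \mid x_i) - [E(y_i \mid x_i)]^2
     = p(x_i) - p(x_i)^2
     = p(x_i)[1 - p(x_i)].
\end{align*}
```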
Equation (4) indicates that the conditional expectation
of y given x is equal to the conditional probability that
y equals one given x. If the range of the observed x
values is very small, then it may be appropriate to
approximate p(x) by $x'\beta$. In this case, ordinary least
squares will consistently estimate $\beta$. However, equation
(5) indicates that $V(y_i \mid x_i)$ is not the same for each i.
This implies that ordinary least-squares estimates are
inefficient and the standard hypothesis tests are
invalid. These problems can be corrected by using
standard techniques for adjusting for heteroskedasticity.
If, however, the range of the x values is large,
then the fact that p(x) can take on only values between
zero and one (it is a probability) implies that it cannot
be approximated by a linear function of x.1 In this
case ordinary least-squares estimates are inconsistent, and
the inconsistency may be very severe.
To illustrate this point, consider the linear
probability model:
$$ p(x) = \begin{cases} x'\beta & \text{if } 0 \le x'\beta \le 1 \\ 0 & \text{if } x'\beta < 0 \\ 1 & \text{if } x'\beta > 1 . \end{cases} \tag{6} $$
Figure 2-1 displays the form of p(x) for the case in
which x is a scalar. Also shown is a set of hypothetical
observations that might arise in which a number of the
x-values fall outside the range where p(x) is linearly
increasing. The dashed line depicts the model that would
be estimated by ordinary least squares. By choosing
enough observations with very high or very low x-values,
the slope of the model estimated by ordinary least
squares can be made arbitrarily small. This difficulty
can be avoided by fitting the linear probability model
specified in equation (6) using nonlinear least squares.
However, this introduces severe computational problems.
Therefore, it is useful to consider alternative models.
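The flattening of the ordinary least-squares slope can be seen in a small simulation. The sketch below is not from the chapter: the intercept and slope (both 0.5), the uniform design for x, the sample size, and the function names are all illustrative assumptions. Data are generated from the truncated linear probability model and fit by ordinary least squares as the range of x widens.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

def simulate_lpm_slope(half_width, n=20000, beta0=0.5, beta1=0.5, seed=0):
    """Draw x uniformly on [-half_width, half_width], generate binary y
    from the truncated linear probability model, and return the OLS slope.
    beta0, beta1, and the uniform design are hypothetical choices."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-half_width, half_width, size=n)
    # p(x) is linear only where beta0 + beta1 * x lies in [0, 1]
    p = np.clip(beta0 + beta1 * x, 0.0, 1.0)
    y = (rng.uniform(size=n) < p).astype(float)
    return ols_slope(x, y)

# The true slope inside the linear range is 0.5. As more x-values fall
# outside that range, the fitted OLS slope moves toward zero.
slopes = {w: simulate_lpm_slope(w) for w in (1.0, 5.0, 25.0)}
for w, s in slopes.items():
    print(f"half-width {w:5.1f}: OLS slope {s:.3f}")
```

With a half-width of 1 every observation lies in the linear range, so the fitted slope should be close to the true value; the wider designs place most observations where p(x) is flat at zero or one.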
Most models for binary variables that have been
proposed in the statistical literature can be written as
$$ p(x) = F(x'\beta) , \tag{7} $$
where F is a continuous, nondecreasing function with
$F(-\infty) = 0$ and $F(\infty) = 1$, i.e., F is a continuous
distribution function. The choice of a particular form
for F is usually somewhat arbitrary and should be taken
in the same spirit as the assumptions of linearity and
normally distributed errors in simple regression models.
[FIGURE 2-1 The Bias in Ordinary Least-Squares Estimation. Elements shown: the linear probability model, the OLS regression line, and observed data points.]
The most popular models of this form are the linear
probability model discussed above, which is obtained by
setting
$$ F(z) = \begin{cases} 0 & \text{if } z < 0 \\ z & \text{if } 0 \le z \le 1 \\ 1 & \text{if } z > 1 , \end{cases} $$
the PROBIT model, where F(z) is set equal to the
cumulative standard normal distribution function, and the
LOGIT model, with $F(z) = e^z/(1 + e^z)$.
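The three choices of F can be written as short functions. This is a minimal sketch using only the standard library; the function names are hypothetical, and the standard normal distribution function is computed from the error function.

```python
import math

def f_linear(z):
    """Linear probability model: F(z) = z, truncated to [0, 1]."""
    return min(1.0, max(0.0, z))

def f_probit(z):
    """PROBIT: cumulative standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def f_logit(z):
    """LOGIT: F(z) = e^z / (1 + e^z), arranged to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Tabulate the three functions at a few points for comparison.
for z in (-3.0, -1.0, 0.0, 0.5, 1.0, 3.0):
    print(z, f_linear(z), round(f_probit(z), 4), round(f_logit(z), 4))
```

All three are nondecreasing and bounded by zero and one, but only the PROBIT and LOGIT forms are smooth and strictly between zero and one for every finite z.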
Models of this form can be motivated in a number of
ways. We employ one such motivation repeatedly in the
following sections. Let y* represent a latent,
unobserved variable that can vary between plus and minus
infinity. The latent variable y* is assumed to be
related to x by the standard regression function
$$ y_i^* = x_i'\beta + \varepsilon_i , \tag{8} $$
where $\varepsilon_i$ is an unobserved disturbance with mean zero and
constant variance $\sigma^2$. The dichotomous variable y is then
assumed to be related to y* by
$$ y_i = \begin{cases} 1 & \text{if } y_i^* > b \\ 0 & \text{if } y_i^* \le b , \end{cases} \tag{9} $$
where b is an unknown cutoff level. Using equation (8),
this implies
$$ P(y_i = 1 \mid x_i) = P(x_i'\beta + \varepsilon_i > b) = P(\varepsilon_i > b - x_i'\beta) , $$
$$ P(y_i = 0 \mid x_i) = 1 - P(\varepsilon_i > b - x_i'\beta) . $$
If F(.) is the distribution function of $\varepsilon$, then this can
be alternatively stated as
$$ P(y_i = 1 \mid x_i) \equiv p(x_i) = 1 - F(b - x_i'\beta) \tag{10a} $$
$$ P(y_i = 0 \mid x_i) \equiv 1 - p(x_i) = F(b - x_i'\beta) . \tag{10b} $$
Note that equation (10) is in essentially the same form
as equation (7).
The unknown parameters of the model are the vector of
coefficients $\beta$, the variance $\sigma^2$, and the cutoff level b.
It can be shown that neither $\sigma^2$ nor b can be estimated
because they are not uniquely defined.2 But the
coefficient vector $\beta$ can be estimated directly from a
sample of observations on the binary variable y and x.
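A numerical check makes the lack of unique definition concrete. The sketch below assumes normally distributed disturbances and a constant term as the first element of x; both are assumptions made for illustration, as are the parameter values and function names. It computes P(y = 1 | x) as the probability that the disturbance exceeds b minus the index, and shows that different parameter sets give the same probability.

```python
import math

def normal_cdf(z):
    """Cumulative standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_y1(x, beta, sigma, b):
    """P(y = 1 | x) = P(eps > b - x'beta) for eps ~ N(0, sigma^2).
    Normality of eps is an assumption of this sketch."""
    xb = sum(xk * bk for xk, bk in zip(x, beta))
    return 1.0 - normal_cdf((b - xb) / sigma)

x = [1.0, 2.0, -0.5]  # first element is a constant term (assumed)

# Baseline parameters (hypothetical values).
base = prob_y1(x, beta=[0.2, 0.7, 1.1], sigma=1.0, b=0.4)

# Multiply beta, sigma, and b by 3: the probability is unchanged,
# so the scale sigma cannot be separated from beta and b.
scaled = prob_y1(x, beta=[0.6, 2.1, 3.3], sigma=3.0, b=1.2)

# Add 1 to both the intercept and b: again unchanged,
# so b cannot be separated from the constant term.
shifted = prob_y1(x, beta=[1.2, 0.7, 1.1], sigma=1.0, b=1.4)

print(base, scaled, shifted)
```

In practice this is handled by a normalization, for example fixing the variance at one and the cutoff at zero, after which the remaining coefficients are estimable.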
One widely employed estimation procedure is called
maximum likelihood estimation. It has a number of
desirable features for cases in which a relatively large
sample of observations is available on (y, x). For the
LOGIT and PROBIT models, specially designed computer
algorithms for calculating the maximum likelihood
estimator and its estimated standard errors in large
samples are available. For a more complete discussion of
estimation and other issues in binary variable models,
see Goldberger (1964:248-251) and Cox (1970).
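As an illustration of what such an algorithm does, the following sketch fits a LOGIT model by Newton-Raphson iteration on the log likelihood. It is a minimal example written for this discussion, not one of the packaged routines referred to above; the simulated data, the true coefficients, and the function name are assumptions.

```python
import numpy as np

def fit_logit(X, y, n_iter=25, tol=1e-10):
    """Maximum likelihood for the LOGIT model by Newton-Raphson.
    Returns the coefficient estimates and large-sample standard errors."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                         # score vector
        info = (X * (p * (1.0 - p))[:, None]).T @ X  # information matrix
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors from the inverse information at the estimate.
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = (X * (p * (1.0 - p))[:, None]).T @ X
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se

# Simulated data with hypothetical true coefficients (-0.5, 1.0).
rng = np.random.default_rng(1)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([-0.5, 1.0])
p_true = 1.0 / (1.0 + np.exp(-X @ true_beta))
y = (rng.uniform(size=n) < p_true).astype(float)

beta_hat, se = fit_logit(X, y)
print("estimates:", beta_hat)
print("standard errors:", se)
```

The estimates should lie within a few standard errors of the values used to generate the data, which is the large-sample behavior the text refers to.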
To get a better idea of how this approach can be used
to model events that occur in the criminal justice
system, consider the example of a jury determining
whether a defendant is guilty. The jury hears the
evidence, which can be summarized in terms of various
attributes of the case, such as the number of
eyewitnesses, whether a weapon belonging to the defendant
was recovered, etc. Suppose that the investigator can
observe some of these attributes, perhaps from court
records, and can quantify them in terms of a numerical
vector $x = (x_1, x_2, \ldots, x_K)'$. Other attributes of
the case, such as the credibility of the witnesses, are
not recorded in court records and hence cannot be
observed by the investigator. Let their composite
influence be represented by $\varepsilon$. The jury then might be
viewed as computing an index $y^* = x'\beta + \varepsilon$ measuring the
strength of the case against the defendant. The
observable factors $x_1, x_2, \ldots, x_K$ are given weights
of $\beta_1, \beta_2, \ldots, \beta_K$, respectively, relative to the
weight assigned to $\varepsilon$. The jury then determines whether to
convict the defendant on the strength of the evidence,
measured by y*, by comparing y* to a level b and
declaring the defendant guilty if y* > b and not guilty
otherwise. The critical level b is presumed to be
determined according to the interpretation of the notion
"beyond a reasonable doubt."
The statistical problem is to determine the factors
the jury takes into account and their relative
importance. Among other matters, the investigator might
be interested in testing whether juries discriminate
against certain types of defendants, in which case the
personal characteristics of the defendant might be
included in x. The problem facing the investigator is
that he or she observes the vector of case attributes x
and whether the defendant is convicted, but not y* and
$\varepsilon$. To estimate $\beta$ using this information, the jury's
decision process can be modeled as
$$ I_i = \begin{cases} 1 & \text{if } y_i^* = x_i'\beta + \varepsilon_i > b \\ 0 & \text{if } y_i^* = x_i'\beta + \varepsilon_i \le b , \end{cases} $$
where I is a binary variable that equals one for
conviction and zero otherwise. The elements of the
coefficient vector $\beta$ are the parameters of interest. This is
in precisely the same form as equations (8) and (9).
Hence the weights $\beta_1, \beta_2, \ldots, \beta_K$ can be estimated
directly using the approach discussed above. A similar
approach is taken in the subsequent sections to model
other decisions in the criminal justice system that
involve binary outcomes.
SELECTION
Selection Bias
The criminal justice process can be thought of as a
series of stages, each involving a different set of
actors. The first stage involves the detection of a
crime, followed by communication of the crime to the
police, arrest, prosecution, trial, and sentencing. The
literature indicates that the various actors involved at
each stage make calculated decisions about the types of
crimes that are processed to the next stage. For
example, studies of the prosecutor indicate that less
serious crimes and those with weak evidence are more
likely to be dismissed following arrest. These same two
characteristics appear to influence the decision by the
police to make an arrest, while the quality of the
evidence certainly affects the likelihood that a jury
will render a verdict of guilty and pass a case on to the
sentencing stage. Other factors, such as the prior
record and socioeconomic status of the criminal, also
appear to play a role in some of the stages.
As a result of deliberate actions of the various
actors in the system, the crimes that reach each
successive stage in the system after the first are not
representative of the broader population of crimes.
Samples used to study the various stages in the system
are thus selected according to certain characteristics.
This does not itself pose a problem for the
investigator. A potential problem does arise, however,
from the combination of the sample selection process and
the fact that some of the features of a case that affect
the way it is processed cannot be observed by the
investigator. For example, prosecutors and judges may
possess a great deal of qualitative evidence about a case
that the investigator cannot observe from court records.
In other instances, the investigator may not observe
other, less qualitative types of evidence, such as
whether the criminal used a weapon. The combination of
screening and incomplete measurement implies that
criminals reaching the later processing stages are not
representative of the unobservable (to the investigator)
as well as the observable features of the population of
cases entering the system. This introduces the
possibility of sample selection bias.
The type of biases that may arise can be illustrated
best with an example. Consider the sentencing of
convicted criminals. Suppose that the various actors in
the system discriminate against individuals with low
socioeconomic status (SES) as well as individuals
committing more serious crimes. (The latter form of
"discrimination" may be socially desirable.) Then
consider high-SES individuals who are convicted of a
crime. Holding the effect of factors that are observable
to the investigator constant, such individuals would
ordinarily have a lower probability of reaching the
sentencing stage (given the hypothetical assumption of
discrimination). If they have been convicted, then,
holding constant the effect of the factors observable to
the investigator, they must be unrepresentative
concerning the factors unobservable to the investigator
that contribute to reaching the sentencing stage. For
example, they may have exhibited a greater degree of
premeditation than low-SES individuals who have been
convicted, or a greater fraction of them may have used a
weapon than low-SES convicted criminals. (The degree of
premeditation and weapon use are assumed to be
unobservable to the investigator.)
This may cause problems when the investigator tries to
determine the factors influencing the sentencing
decision. Suppose that SES does not affect sentencing,
but seriousness of the crime does. By the above
argument, if discrimination exists against low-SES
individuals at the earlier stages of the criminal justice
system, then, ceteris paribus, high-SES convicted
criminals will be above average on both the observable
and unobservable dimensions of the seriousness of a
crime. Judges are assumed to observe both sets of
factors and to take both into account when deciding on a
sentence. The investigator, however, can observe only
one set of dimensions. Even after taking account of the
observable differences in the cases of high- and low-SES
criminals, the investigator will still find that high-SES
criminals receive longer sentences. This will suggest
that judges discriminate against high-SES individuals,
even though there is no discrimination at the sentencing
stage and there is discrimination against low-SES
individuals at the stages preceding sentencing.
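The argument can be checked with a small simulation. The sketch below is not taken from the chapter: the effect sizes, the conviction threshold, the sample size, and all variable names are illustrative assumptions. Conviction depends on observable seriousness, unobservable seriousness, and SES (high-SES defendants are less likely to be convicted), while the sentence depends only on the two seriousness measures.

```python
import numpy as np

def ols(X, y):
    """Ordinary least-squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(42)
n = 200_000
high_ses = rng.integers(0, 2, size=n).astype(float)
s_obs = rng.normal(size=n)      # seriousness seen by the investigator
s_unobs = rng.normal(size=n)    # seriousness seen only by court actors

# Earlier stages: discrimination against low-SES defendants, modeled as
# a penalty of 1.5 on the conviction index for high-SES defendants.
convicted = s_obs + s_unobs - 1.5 * high_ses + rng.normal(size=n) > 1.0

# Sentencing stage: SES has no effect by construction.
sentence = s_obs + s_unobs + rng.normal(size=n)

# The investigator regresses sentence on a constant, observable
# seriousness, and SES, first for everyone and then for the convicted.
X = np.column_stack([np.ones(n), s_obs, high_ses])
coef_all = ols(X, sentence)
coef_convicted = ols(X[convicted], sentence[convicted])

print("SES coefficient, all cases (hypothetical):", round(coef_all[2], 3))
print("SES coefficient, convicted only:", round(coef_convicted[2], 3))
```

In the full population the SES coefficient is close to zero, as constructed. Among the convicted it is positive, because high-SES defendants who reach sentencing are above average on the unobserved dimension of seriousness.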
More generally, this example points out that if there
does exist discrimination against low-SES individuals at
the sentencing stage, then the biases induced by sample
selection might mask the true extent of the
discrimination. It is conceivable that the biases might
even create the illusion of reverse discrimination at the
sentencing stage, when in reality discrimination against
low-SES individuals is present at all stages. The biases
induced by sample selection are of course more general
than the example: they might occur at any stage in the
system following the first screening stage, and they
might distort the effect of any of the observable
features of a crime. This suggests that it is essential
to try to account for the effects of sample selection in
order to make reliable inferences about the various
processing stages in the criminal justice system.
[...]
to act out the hearing. This could be done once and
recorded on video tape or it could be done repeatedly
with each judge in the experiment acting as the presiding
judge.
The drawback of this approach is its high cost, both
in time and money. Case files are less realistic but
also easier and less expensive to use. It is not clear
how much is actually lost if a well-designed case file is
substituted for an actual or reenacted hearing.
Preliminary experiments might be used to determine a
reasonable format for presenting cases. Certain control
questions designed to determine whether the information
presented is adequate might be composed. For example,
judges might be asked whether they felt that any
additional information that might be available in court
would change their decision. Several different questions
might be asked, such as "What decision would you make
based on the information you have?" and "What decision
would you most likely make if you encountered this case
in court?"
If the judges use the available information to
construct subjective probability distributions over the
possible values of the unavailable information, then
these two questions address different aspects of these
distributions. The answer to the second question would
be the sentence associated with the mode of the
subjective distribution, whereas, under quadratic loss,
the first question would be answered with the sentence
corresponding to the mean of that distribution. Thus in
the presence of incomplete information, the answers to
these questions might differ, whereas they would be the
same if the necessary information were provided.
Different answers can therefore be taken as an indication
that the case information was not adequate (Manski and
Nagin, 1981, discuss this point in the context of
consumer choice surveys).
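The distinction between the two answers can be illustrated with a discrete subjective distribution over sentences. The numbers below are hypothetical and the function name is an assumption; the point is only that the mean and the mode coincide when the distribution is concentrated on a single sentence and can differ when it is not.

```python
def mean_and_mode(sentences, probs):
    """Return the mean and the mode of a discrete distribution.
    The mean is the best single answer under quadratic loss;
    the mode is the most likely sentence."""
    mean = sum(s * p for s, p in zip(sentences, probs))
    mode = max(zip(sentences, probs), key=lambda sp: sp[1])[0]
    return mean, mode

sentences = [0, 12, 24, 60]  # months (hypothetical values)

# Incomplete information: the judge is unsure which sentence applies.
mean_inc, mode_inc = mean_and_mode(sentences, [0.5, 0.2, 0.2, 0.1])

# Complete information: all probability on one sentence.
mean_full, mode_full = mean_and_mode(sentences, [0.0, 1.0, 0.0, 0.0])

print("incomplete information:", mean_inc, mode_inc)
print("complete information:", mean_full, mode_full)
```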
If an experiment is based on a subset of judges, then
it is imperative that the subset be representative. In
many sentencing experiments the participating judges are
volunteers. Even when all judges in a particular
jurisdiction participate in an experiment, nonresponse
rates are often so high that participation in the
experiment has to be viewed as essentially voluntary. As
a result, judges who do participate are likely to be more
conscious of existing problems and more interested in
reducing them than the average judge. An experiment
based on such a sample will tend to underestimate the
seriousness of these problems. To protect against this
kind of bias, an experimental format, such as personal
interviews, can be used to minimize the nonresponse rate.
Even if the cases presented to the judges are real
cases presented in their natural setting, the judges will
always be aware of the fact they are participating in an
experiment. Their decisions clearly will not have the
impact of decisions handed down in court, for a prison
sentence handed down in an experiment does not send
anyone to prison. As a result, judges may treat a
decision in an experiment less seriously than a decision
in court. To alleviate this problem, judges must be
provided with an incentive to treat experiments with the
same importance they would treat an actual case. For
example, decisions made by a panel of judges on an actual
case might be provided to the presiding judge before a
decision is rendered. This is done in the sentencing
council experiments discussed in Diamond and Zeisel
(1975).
The most serious problem in sentencing experiments is
the evaluative nature of the experiments themselves.
Most experiments are designed to collect data on a
specific problem, e.g., the extent of discrimination and
disparity in sentencing. This can rarely be concealed
from the judges participating in the experiment. As a
result, individual judges may try to ensure that they do
not deviate too far from perceived norms, thus leading to
an underestimate of the severity of the problem under
study.
This individual sensitivity to evaluation can be
reduced by keeping responses anonymous. However, the
fact that results of the experiments may be used by
critics of the judiciary to support changes in the system
may cause judges who want to maintain the status quo to
adjust their decisions to reduce the apparent severity of
the problems under study. This bias is likely to be
particularly severe if the experiment is an unusual event
rather than a routine matter. It may be reduced if
making decisions on experimental cases is required of all
judges in a jurisdiction on a regular basis.
The reaction to the experimenter's intent may also be
reduced if the experimenter can deceive the judges as to
the purpose of the experiment. This requires a
convincing cover story and a carefully designed
questionnaire that does not reveal the true purpose of
the experiment. Deceptions of this type are often used
for similar reasons in psychological experiments,
although they raise serious ethical questions (see
Rosenthal and Rosnow, 1969). Furthermore, in view of the
narrow range of issues considered in most sentencing
studies, it is not clear whether these deceptions will
succeed. Thus it is unlikely that these biases can be
eliminated completely.
If it is not possible to prevent the judges from
adjusting their answers in an experiment, then it might
be possible to control for these adjustments by modeling
the process that generates them. Suppose, for example,
that experimental cases have been constructed from actual
cases by varying, say, the race of the defendant. In
this case, using equations (33) and (35), for each k
there is one pair i,j (the race of the actual defendant
and the judge who heard the case) for which the actual
decision is available. For the other i,j pairs only
experimental observations exist. Thus for each k,
$$ Y_{ijk} = \mu + v_i + \delta_j + c_k + \varepsilon_{ijk} $$
for one pair i,j, and
$$ Y_{ijk} = \mu^* + v_i^* + \delta_j^* + c_k^* + \varepsilon_{ijk}^* $$
otherwise. These observations might be combined by
assuming that the overall mean sentence and the case
effects are the same for the experimental and
nonexperimental observations, i.e., $\mu^* = \mu$ and $c_k^* = c_k$,
but that the effects of the discriminatory factor and the
disparities in the experimental observations have been
scaled down by the factors $\alpha$, $\beta$, and $\gamma$, respectively.
Thus $v_i^* = \alpha v_i$, $\delta_j^* = \beta \delta_j$, and $\varepsilon_{ijk}^* = \gamma \varepsilon_{ijk}$, where
$\alpha, \beta, \gamma > 0$ (and probably less than one). The observed
court cases can then be used to calibrate the experimental
responses.
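The effect of the scale factors can be sketched numerically. The simulation below is an illustration only: it generates experimental responses from the scaled-down model for two defendant groups, with made-up values for every parameter, and compares the average experimental gap between groups with the gap that would hold in actual decisions. It does not implement the calibration step itself, which would also require the observed court decisions.

```python
import numpy as np

rng = np.random.default_rng(7)
J, K = 20, 500                         # judges and cases (hypothetical)
mu = 24.0                              # overall mean sentence
v = np.array([-3.0, 3.0])              # group effects; actual gap is 6
delta = rng.normal(0.0, 4.0, size=J)   # judge effects
c = rng.normal(0.0, 8.0, size=K)       # case effects
alpha, beta_scale, gamma = 0.5, 0.7, 0.8   # assumed scale factors

# Experimental responses for every group i, judge j, and case k:
# Y*_ijk = mu + alpha*v_i + beta*delta_j + c_k + gamma*eps_ijk
eps = rng.normal(0.0, 5.0, size=(2, J, K))
y_exp = (mu
         + alpha * v[:, None, None]
         + beta_scale * delta[None, :, None]
         + c[None, None, :]
         + gamma * eps)

# Differencing within judge and case removes mu, delta_j, and c_k,
# leaving alpha * (v_1 - v_0) plus scaled noise.
exp_gap = (y_exp[1] - y_exp[0]).mean()
actual_gap = v[1] - v[0]

print("actual gap:", actual_gap)
print("mean experimental gap:", round(exp_gap, 3))
```

Under these assumptions the experimental gap is close to alpha times the actual gap, which is the sense in which uncalibrated experimental responses understate the effect of the discriminatory factor.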
This point illustrates one way in which experiments can be
used in conjunction with nonexperimental data. Experiments
can be used to validate results obtained from nonexperimental
data or to provide alternative estimates with different
biases. In particular, as noted above, observed court cases
provide only an upper bound on the disparity within judges,
whereas experiments (before adjustment) tend to underestimate
this quantity. Simultaneous use of experiments and courtroom
observations can thus provide bounds on the severity of
disparity.
Experiments might also be used to deal with the selection
problem. Wilkins et al. (1973) and others use experiments to
analyze the details of the judges' decision processes,
including the variables they use in making decisions and the
order in which these variables are considered. Similar
experiments could be performed with other members of the
criminal justice system, such as the prosecutor. The results
might provide information about the factors that contribute
to the correlation between the unobserved variables in the
different stages of the selection process. This information
might help the investigator assess the magnitude of the
correlation and determine which, if any, additional variables
should be measured.
Experiments can be used to address a number of questions
that cannot be answered using observational data. For
example, judges might be asked to choose both a determinate
sentence and a minimum and a maximum sentence for
hypothetical cases. Their responses could be used to
evaluate the implications of laws on determinate sentencing.
Experiments can also provide information about cases that
occur too infrequently in court for observational data to
provide accurate results. Many studies, for example, have
found it impossible to investigate the relationship between
sentence and the defendant's sex because the number of women
in their samples was negligible.
So far our discussion has been concerned with experiments
for analyzing the behavior of judges. Other aspects of the
criminal justice system can also be analyzed with
experiments. For example, experiments could be designed to
determine whether prosecutors act in a discriminatory fashion
when deciding whether to prosecute a case. Experiments might
also be useful aids for constructing models of the plea
bargaining process.
In addition to providing data for analysis, experiments
may also have a beneficial side effect, especially if they
are conducted on a regular basis. Many judges and other
members of the criminal justice system are sensitive to the
problems of disparity and discrimination in sentencing. The
results of regular controlled experiments might reduce
disparity and discrimination by helping judges understand and
calibrate their own decisions.
The major drawback to experiments is their cost. The
problems associated with experimental data may seem easier to
solve than the problems of observational data, but the cost
of running experiments, both in money and in the demands they
place on the judge's time, makes it difficult to obtain
samples that are large enough to provide very precise
estimates of the parameters of interest. Thus it is unlikely
that observational data, in which sample sizes are typically
large, can be dispensed with entirely. The simultaneous use
of both approaches, in which the particular advantages of
each approach can be exploited, is an avenue that deserves
more attention in future work.
CONCLUSIONS
We argued that the studies of discrimination in case
disposition generally suffer from at least one of three major
shortcomings: (1) the absence of formal models of the
processing decisions in the criminal justice system, (2)
failure to consider the sample selection biases that result
from the many screening decisions in the criminal justice
system, and (3) the use of arbitrary scales for scaling
qualitatively different dispositions.
Most of our discussion of these problems focused on ways
in which they can lead to underestimates of the severity of
discrimination in the criminal justice system. Despite these
problems, some studies do find evidence of discrimination.
However, this should not be interpreted as suggesting that
discrimination is actually present. There are many other
problems, such as the omission of important variables
possibly correlated with race or social status, that can lead
to overestimates of the severity of discrimination. Some of
these points are discussed in detail in Garber et al. (in
this volume).
Each of the shortcomings enumerated above is, in
principle, remediable. However, correcting them will require
a formidable research agenda. Carefully specified models
reflecting the essential motivations of the principal actors
in the criminal justice system and the dynamics of their
interplay are required. Furthermore, the data sets to be
considered will have to be carefully chosen and perhaps
combined with the results of designed experiments in order to
mitigate the effects of sample selection. Novel and complex
statistical techniques will be needed for the analysis.
While these obstacles are formidable, we see no alternative
to addressing these problems. If they continue to be
neglected, then the extent of discrimination in the criminal
justice system will continue to be mired in uncertainties so
great that no generally accepted resolution will ever be
reached.
APPENDIX
Proposition: If $x$ is uniformly distributed then $t_1 = t_2$,
where

$$t_1 \equiv E[x \mid x + w_1 > (\alpha + \beta)] - E(x \mid x + w_1 > \alpha) \tag{A1}$$

$$t_2 \equiv E[x \mid x + w_1 > (\alpha + \beta),\; x + w_2 > (\alpha + \beta)]
 - E(x \mid x + w_1 > \alpha,\; x + w_2 > \alpha). \tag{A2}$$

Proof: Equations (A1) and (A2) can be rewritten as

$$t_1 = E[x \mid x > (\alpha + \beta) - w_1] - E(x \mid x > \alpha - w_1) \tag{A3}$$

$$t_2 = E[x \mid x > (\alpha + \beta) - w_1,\; x > (\alpha + \beta) - w_2]
 - E(x \mid x > \alpha - w_1,\; x > \alpha - w_2). \tag{A4}$$

Let $f_1(r) \equiv P(-w_1 = r)$ and $f_2(r) \equiv P[\max(-w_1, -w_2) = r]$.
Note that given $w_1$ and $w_2$, one of the two conditioning
arguments in each of the two terms on the right-hand side
of equation (A4) is redundant. Using this and $f_1(r)$ and
$f_2(r)$ to integrate out $w_1$ and $w_2$, equations (A3) and
(A4) can be written as

$$t_1 = \int \{E[x \mid x > (\alpha + \beta) + r] - E(x \mid x > \alpha + r)\}\, f_1(r)\, dr$$

$$t_2 = \int \{E[x \mid x > (\alpha + \beta) + r] - E(x \mid x > \alpha + r)\}\, f_2(r)\, dr,$$

which implies

$$t_1 - t_2 = \int \{E[x \mid x > (\alpha + \beta) + r]
 - E(x \mid x > \alpha + r)\}\,[f_1(r) - f_2(r)]\, dr. \tag{A5}$$

Using the fact that if $x$ is uniformly distributed,
$E(x \mid x > \lambda) = (a + \lambda)/2$, where $a$ is the maximum value $x$
can assume, equation (A5) implies

$$t_1 - t_2 = \frac{\beta}{2} \int [f_1(r) - f_2(r)]\, dr = 0,$$

where the second equality follows from the fact that
$f_1(r)$ and $f_2(r)$ are proper probability density
functions. This result generalizes trivially if $x$ is
multiplied by any scalar in the conditioning arguments in
equations (A1) and (A2).
This establishes the assertion in the text that if $x$
is uniformly distributed and $\gamma_1 = \gamma_2 = \gamma_3$ then $\beta_2 = \beta_3$.
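The proposition can be illustrated numerically. The Python sketch below is not part of the original argument; it evaluates the two integral expressions that precede equation (A5) by simulating the screening errors. Everything specific in it is an assumption made for illustration: the errors w1 and w2 are taken to be independent and uniform on (-1, 1), x is uniform with maximum 10, the cutoff is 4 (0 in the normal case), the shift is 0.5, and the function names are hypothetical. A standard normal x is included only as a contrast case in which the conditional mean is not linear in the cutoff.

```python
import math
import random


def uniform_tail_mean(lam, a=10.0):
    """E(x | x > lam) for x uniform with maximum a (lam inside the support)."""
    return (a + lam) / 2.0


def normal_tail_mean(lam):
    """E(x | x > lam) for a standard normal x: pdf(lam) / P(x > lam)."""
    pdf = math.exp(-0.5 * lam * lam) / math.sqrt(2.0 * math.pi)
    tail = 0.5 * math.erfc(lam / math.sqrt(2.0))
    return pdf / tail


def shift_effect(tail_mean, cutoff, shift, n_screens, n_draws=100_000, seed=0):
    """Average, over simulated screening errors, of
    E[x | x > (cutoff + shift) + r] - E(x | x > cutoff + r),
    where r = max(-w_1, ..., -w_k) is the binding adjustment to the cutoff."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        # Assumed error distribution: independent uniform(-1, 1) draws.
        r = max(-rng.uniform(-1.0, 1.0) for _ in range(n_screens))
        total += tail_mean(cutoff + shift + r) - tail_mean(cutoff + r)
    return total / n_draws


# One screening stage (t1) versus two screening stages (t2).
t1_unif = shift_effect(uniform_tail_mean, cutoff=4.0, shift=0.5, n_screens=1)
t2_unif = shift_effect(uniform_tail_mean, cutoff=4.0, shift=0.5, n_screens=2)
t1_norm = shift_effect(normal_tail_mean, cutoff=0.0, shift=0.5, n_screens=1)
t2_norm = shift_effect(normal_tail_mean, cutoff=0.0, shift=0.5, n_screens=2)

print("uniform x:", t1_unif, t2_unif)
print("normal x: ", t1_norm, t2_norm)
```

Under these assumptions the two uniform values coincide at half the shift, as the proposition states, while the two normal values differ, which suggests that the equality depends on the conditional mean being linear in the cutoff.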
NOTES
1. This is because a linear function of $x$ is not
constrained to lie between zero and one.
2. In the jargon of statistics, neither $\sigma^2$ nor $b$ is
identified (assuming, in the case of $b$, that $x$ contains a
constant regressor). This can be seen as follows.
Multiply $\sigma$, $b$, and $\beta$ by the same positive constant.
Then $P(Y_i = 1 \mid x)$ is unchanged. Hence it is not possible
to estimate the levels of both $\beta$ and $\sigma^2$. Instead, $\sigma^2$ is
typically set equal to one for estimation purposes and $\beta$
is effectively estimated relative to the arbitrary value
assigned to $\sigma^2$. As for $b$, suppose that $x$ contains a
constant regressor. Then if $\beta_1$, the constant term in the
regression, and $b$ are changed by the same amount, $b - x_i'\beta$,
and hence $F(b - x_i'\beta)$, remains unchanged. As a result,
for estimation purposes, $b$ is typically set equal to zero
and the cutoff level is subsumed into the constant.
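The identification argument in this note can be checked directly. The following Python sketch is an illustration only: it assumes a normally distributed error (a probit specification, which the note itself does not require), and the function names and numerical values are hypothetical. It computes the response probability for a baseline parameter set, for a set in which the coefficients, the cutoff, and the error standard deviation are all multiplied by the same positive constant, and for a set in which the constant term and the cutoff are shifted by the same amount.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_one(x, beta, b, sigma):
    """P(y = 1 | x) when y = 1 if x'beta + u > b, with u ~ N(0, sigma^2)."""
    index = sum(xj * bj for xj, bj in zip(x, beta))
    return normal_cdf((index - b) / sigma)


x = [1.0, 0.7, -1.2]       # first element is the constant regressor
beta = [0.3, 1.1, -0.4]    # illustrative coefficients
b, sigma = 0.5, 2.0        # illustrative cutoff and error standard deviation

p_base = prob_one(x, beta, b, sigma)

# Multiply the coefficients, the cutoff, and sigma by the same constant c > 0.
c = 3.0
p_scaled = prob_one(x, [c * bj for bj in beta], c * b, c * sigma)

# Shift the constant term and the cutoff by the same amount d.
d = 1.7
p_shifted = prob_one(x, [beta[0] + d] + beta[1:], b + d, sigma)

print(p_base, p_scaled, p_shifted)
```

All three probabilities agree, so data on the binary outcome alone cannot distinguish among the three parameter sets. This is why the error variance is normalized to one and the cutoff to zero in estimation.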
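3. The coefficient of $u_i$ in this expression follows from
the fact that if $E(y \mid z)$ is linear in $z$ then $y$ can be
expressed as

$$y = \eta_y + [\mathrm{Cov}(y,z)/\mathrm{Var}(z)](z - \eta_z) + v,$$

where $\eta_y \equiv E(y)$, $\eta_z \equiv E(z)$, $V(v) = \sigma_y^2(1 - \rho^2)$,
$\rho = \mathrm{Cov}(y,z)/(\sigma_y \sigma_z)$, $V(z) \equiv \sigma_z^2$, and $V(y) \equiv \sigma_y^2$.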
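The decomposition in this note, with slope Cov(y,z)/Var(z) and a residual whose variance is the variance of y times one minus the squared correlation, can be verified on simulated data. The Python sketch below is a minimal illustration; the data-generating values and helper names are assumptions, not taken from the text. It uses sample moments, for which the variance identity holds exactly up to rounding.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance (divisor n); cov(a, a) is the sample variance."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


rng = random.Random(1)
# Illustrative data in which E(y | z) is linear in z.
z = [rng.gauss(2.0, 1.5) for _ in range(5000)]
y = [1.0 + 0.8 * zi + rng.gauss(0.0, 1.0) for zi in z]

mean_y, mean_z = mean(y), mean(z)
slope = cov(y, z) / cov(z, z)                      # Cov(y,z) / Var(z)
v = [yi - mean_y - slope * (zi - mean_z) for yi, zi in zip(y, z)]

rho = cov(y, z) / (cov(y, y) ** 0.5 * cov(z, z) ** 0.5)
var_v = cov(v, v)
var_v_formula = cov(y, y) * (1.0 - rho ** 2)       # Var(y) * (1 - rho^2)

print(slope, var_v, var_v_formula, cov(v, z))
```

The residual is uncorrelated with z by construction, and its variance matches the expression given in the note.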
4. The selection that occurs as a result of the
imprisonment decision is somewhat different from other
selection mechanisms we have discussed. The imprisonment
decision is made by the judge who also determines the
length of the sentence. The formal distinction between
the imprisonment decision and the determination of the
sentence length is thus somewhat artificial.
Nevertheless, if the two decisions are viewed as
separable, which is implicit in studies that investigate
the sentence length for individuals that have been sent
to prison, then the appropriate mathematical formulation
of this process is the same as the one that would be
appropriate if the decisions were made by separate
individuals. As a result, the same model applies.
5. We do not distinguish between jury and bench trials.
The model could easily be generalized to include this
option, but such a generalization would only complicate
the discussion without further illuminating the points we
wish to make.
6. Another relevant factor is time spent in pretrial
detention. Conditions in jail are frequently worse than
in prison. If the defendant opts for a trial, the time
spent in pretrial detention is likely to be increased.
7. The decision to charge includes the choice of whether
to prosecute and the choice of which charges to file
given prosecution. We consider only the former choice.
8. Dismissal can occur before or after charges have been
filed. We treat dismissals that occur after charges have
been filed as decisions not to charge. The term
dismissal is restricted to instances in which the
prosecutor declines to prosecute after an arrest has been
made.
9. The factors giving rise to selection bias involve the
stages preceding the sentence length decision and thus
are not related to the true extent of discrimination in
the sentence length decision of each judge.
10. However, we argue below that this finding may
actually be the result of discrimination at the
prosecution and/or conviction stage rather than in
sentencing.
11. The purpose of introducing this model is merely to
fix ideas. The discussion could equally well be based on
a more complicated ANOVA model, one in which the effects
of the discriminatory factors are viewed as nested within
judges, a binary model, a binary plus a conditional
continuous model, or an ordered multiple response model.
REFERENCES
Administrative Office of the United States Courts
1973 Federal Offenders in United States District
Court, 1971. Washington, D.C.: Administrative
Office of U.S. Courts.
Altman, E. I., R. A. Avery, R. A. Eisenbeis, and J. F.
Sinkey, Jr.
1981 Application of Classification Techniques in
Business, Banking, and Finance. Greenwich,
Conn.: JAI Press.
Chiricos, T. G., and G. P. Waldo
1975 Socioeconomic status and criminal sentencing:
an empirical assessment of a conflict
proposition. American Sociological Review
40(December):753-772.
Clarke, S. H., and G. G. Koch
1976 The influence of income and other factors on
whether criminal defendants go to prison. Law &
Society Review 11(1):59-92.
Cook, P. J., and D. S. Nagin
1979 Does the Weapon Matter? An Evaluation of a
Weapon-Emphasis Policy in the Prosecution of
Violent Offenders. Washington, D.C.: Institute
for Law and Social Research.
Cox, D. R.
1970 Analysis of Binary Data.
London: Methuen & Co.
Diamond, S. S., and H. Zeisel
1975 Sentencing councils: a study of sentence
disparity and its reduction. University of
Chicago Law Review 43:109-149.
Farrell, R. A., and V. L. Swigert
1978 Prior offense record as a self-fulfilling
prophecy. Law and Society Review 12(Spring):437-453.
Forst, B., and K. Brosi
1977 A theoretical and empirical analysis of the
prosecutor. Journal of Legal Studies
6(1):177-191.
Forst, B., J. Lucianovic, and S. J. Cox
1977 What Happens After Arrest? A Court Perspective
of Police Operations in the District of
Columbia. Washington, D.C.: Institute for Law
and Social Research.
Frase, R. S.
1978 The decision to prosecute federal criminal
charges: a quantitative study of prosecutorial
discretion. University of Chicago Law Review
47:246-330.
Gibson, J. L.
1978 Race as a determinant of criminal sentences: a
methodological critique and a case study. Law
and Society Review 12(Spring):455-478.
Goldberger, A. S.
1964 Econometric Theory. New York: John Wiley &
Sons.
1980 Abnormal Selection Bias. Unpublished
manuscript. University of Wisconsin.
Greenwood, P., et al.
1973 Prosecution of Adult Felony Defendants in Los
Angeles County: A Policy Perspective. Santa
Monica, Calif.: Rand Corporation.
Hagan, J.
1975 Parameters of criminal prosecution: an
application of path analysis to a problem of
criminal justice. Journal of Criminal Law &
Criminology 65(4):536-544.
Heckman, J. J.
1979 Sample selection bias as a specification error.
Econometrica 47(1):153-161.
LaFree, G. D.
1980 The effect of sexual stratification by race on
official reactions to rape. American
Sociological Review 45(October):842-854.
Landes, W. M.
1971 An economic analysis of the courts. Journal of
Law and Economics 14:61-106.
Lizotte, A. J.
1977 Extralegal factors in Chicago's criminal
courts: testing the conflict model of criminal
justice. Social Problems 25(5):564-580.
Manski, C. F., and D. S. Nagin
1981 Behavioral Intentions and Revealed Preference.
Unpublished manuscript. Carnegie-Mellon
University.
Olsen, R.
1980 A least squares correction for selectivity
bias. Econometrica 48:1815-1820.
Partridge, A., and W. G. Eldridge
1974 The second circuit sentencing study: a report
to the judges of the second circuit. Federal
Judicial Center No. 74-4.
Reiss, A. J.
1975 Public prosecutors and criminal prosecution in
the United States of America. Juridical
Review:1-21.
Rosenthal, R., and R. L. Rosnow, eds.
1969 Artifacts in Behavioral Research. New York:
Academic Press.
Swigert, V. L., and R. A. Farrell
1977 Normal homicides and the law. American
Sociological Review 42(February):16-32.
Tiffany, L. P., Y. Avichai, and G. W. Peters
1975 A statistical analysis of sentencing in federal
courts: defendants convicted after trial,
1967-1968. The Journal of Legal Studies 4:397-417.
Wilkins, L. J., D. U. Gottfredson, J. O. Robinson, and
C. A. Sadowsky
1973 Information Selection and Use in Parole
Decision-Making. NCCD Research Center, National
Council on Crime and Delinquency, Davis, Calif.
Wolfgang, M. E., and M. Reidel
1973 Race, judicial discretion, and the death
penalty. The Annals of the American Academy of
Political and Social Science 407(May):119-133.
Zimring, F. E., J. Eigen, and S. O'Malley
1976 Punishing homicide in Philadelphia:
perspectives on the death penalty. University
of Chicago Law Review 43(2):227-252.