Suggested Citation:"6. Using Licensure Tests to Improve Teacher Quality and Supply." National Research Council. 2001. Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality. Washington, DC: The National Academies Press. doi: 10.17226/10090.

6
Using Licensure Tests to Improve Teacher Quality and Supply

Licensure tests are only one factor that influences the overall quality of teachers and teaching. Changes in the quality and effectiveness of teachers depend on many things. Salaries and working conditions affect who enters teaching, as do schooling conditions. The quality of teacher education and of professional development influences teachers’ knowledge and skills. Furthermore, as noted earlier, teaching rests on more than teachers; school organizational factors such as use of time, quality of curriculum materials, and student/teacher ratios affect the quality of teaching.

The belief that testing can improve the quality of the teaching force is based on an assumption that the tests used are good measures of the competencies needed for effective teaching and that their salutary effects on training and selection are not outweighed by negative consequences for supply (including, for example, eliminating competent teachers from the pool and dissuading some from considering teaching). As discussed below, some tests measure qualities that are reasonably related to aspects of teacher effectiveness. However, there are questions about the extent to which different tests capture the way this knowledge is actually used in teaching. There is a paucity of evidence concerning the ability of teacher licensure tests to distinguish minimally competent candidates from those who are not.

This chapter presents a theoretical model suggesting that the quality of prospective beginning teachers depends on a number of factors, including the accuracy of licensure tests in distinguishing between those who would be competent and those who would not; the actual and perceived opportunity costs to applicants of licensure testing; the level of teachers’ salaries and working conditions; and the attractiveness of labor market alternatives. These effects are discussed by building a logical argument based on an economic model of occupational choice. A discussion of the evidentiary base for the relationship between licensure tests and teacher competence follows the description of the model. The measurement and research design challenges that mark this field of research are discussed, and some empirical findings are reviewed.

LICENSING TESTS AND THE QUANTITY AND QUALITY OF TEACHERS

This section is based on an economic model of supply and demand for teachers.¹ The theory is used to both understand the potential consequences of licensure testing for the quality and quantity of beginning teachers and provide guidance as to the kind of information and empirical analysis needed to conduct a quantitative assessment of those consequences. The analysis assumes that beginning teachers have met whatever other licensing requirements exist (e.g., completion of an accredited teacher education program) prior to attempting to meet the testing requirement. The counterfactual, in which passing a test is not a requirement for licensure, assumes that in the absence of a licensure test the hiring practices of school districts would lead to a teacher work force with a higher proportion of “unqualified” teachers.²

As already noted, teacher licensure testing is intended to distinguish between those who are competent to enter the classroom in terms of the skills measured by the test and those who are not. Ideally, tests would do this, as in other professions, by limiting the supply of teachers only to those who are competent.

The supply side of the model assumes that individuals choose between teaching and other occupations according to which provides the larger expected net benefit, that is, wages and nonmonetary forms of compensation after education and other training costs are paid.³ As a baseline case, consider the situation where there is no test; in that case the model assumes that individuals who are potentially competent teachers are indistinguishable from those who are potentially incompetent. The net benefit to teaching in any given labor market is thus taken to be the same for all individuals independent of their potential competency.⁴ However, individuals are assumed to differ in the net benefits they receive in alternative occupations. Increasing the compensation of teachers would thus lead to an increase in the supply of both competent and incompetent individuals.⁵ To the extent that those who would be competent as teachers can obtain higher wages in alternative occupations than those who are not competent, the incentive for those who are competent to enter teaching will be less than for those who are not competent at any given level of teacher wages.⁶ The proportion of those choosing teaching who would be competent would depend on the difference in the distributions of net benefits in alternative occupations between potentially competent and potentially incompetent teachers.

¹	A full presentation and discussion of the model are provided in Appendix E.
²	Ballou (1996) provides some evidence that school districts do not do a particularly good job of screening candidates under current accountability systems. However, it is not known whether the same would be true under a different accountability system (e.g., one based on student performance). It is beyond the scope of this report to consider alternatives to the current licensure system.
³	The net benefits may also include psychic rewards.
⁴	To simplify the analysis, the possibility that psychic benefits to teaching might differ according to potential competency is ignored.

By its nature, licensure testing increases the costs of entering an occupation. Licensure tests require payment of testing fees and allocation of time and effort to prepare for the tests, and, given a nontrivial failure rate, they create uncertainty about obtaining employment in teaching. Moreover, the cost of failure is increased by specialized coursework required for licensure in teaching. To the extent that these education courses have a lower market payoff outside teaching than would alternative courses an individual might have completed had the teaching occupation not been chosen, an opportunity cost is incurred. Individuals who fail licensure tests, and thus do not get teaching jobs, will receive lower wages in alternative jobs compared to the wages they would have received had they taken courses in pursuit of alternative occupations. The total cost of the licensure test thus includes this difference in wages.

The direct cost of a licensure test, as well as the opportunity cost that arises in the case of failure, makes the teaching occupation less attractive relative to alternative occupations than it would be in the absence of a test.⁷ In general, if all else is equal, the greater the cost that licensure tests impose on teacher candidates, the smaller will be the supply of both potentially competent and potentially incompetent teachers.

Due to errors in measurement, tests are not perfectly accurate and reliable. In theory a “perfect” test is one with a passing score set such that every candidate who scores at or above that level is truly competent in the skills measured and every candidate who scores below that level is not. An “imperfect” test, however, does not have such a passing score. Instead, for any given passing score on an imperfect test, some who are not competent will still score above the passing level and therefore will be misclassified as competent; this type of error is called the type 1 error of the test. Some candidates who are truly competent will score below the passing level and will be mistakenly classified as incompetent, which is known as the type 2 error. A perfect test classifies everyone correctly and has no type 1 or type 2 errors.

⁵	There is considerable empirical evidence that the supply of teachers increases with the wage that is offered (see, e.g., Manski, 1987; Ballou and Podgursky, 1997; and Stinebrickner, forthcoming).
⁶	Generally, this difference would lead to a greater proportion of incompetent than competent persons choosing teaching. However, the fact that competence in teaching may be positively related to wages in alternative occupations does not imply that increasing teachers’ wages will attract relatively more competent than incompetent people, although the supply of both would increase. For technical reasons the model assumes that the proportion of competent to incompetent people who choose teaching is invariant to increases in teacher wages.
⁷	This conclusion ignores the argument that the prestige of the profession may be augmented by its becoming more selective, which may increase its attractiveness. It is assumed that to the extent this positive effect exists, it does not dominate the negative effect of the direct costs of the test.

The inability of a test to determine competency with perfect accuracy magnifies the actual cost of the test. The greater the probability that a candidate will fail the test (regardless of competence), the greater will be the perceived cost of the test.⁸ For example, in the model, if the direct cost of the test is $500 (including the monetary and test preparation costs) and the probability of passing is 0.5, the perceived cost of the test will be equivalent to that of a perfect test in which the direct cost is $1,000.⁹ For truly incompetent individuals, a higher failure rate (that is, a lower type 1 error) raises the perceived cost of the test and so reduces the supply of incompetent teacher candidates in the total pool. For truly competent individuals, the higher the probability that they will mistakenly be classified as not competent (the higher the type 2 error), the more they will be discouraged from entering the teaching profession.
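The arithmetic in the $500 example can be sketched as follows. This is an illustrative reading rather than the formal model of Appendix E: it assumes that the perceived direct cost equals the direct cost divided by the probability of passing, which reproduces the $500 and $1,000 figures in the text, and it takes the odds-of-failure multiplier for the opportunity cost from the accompanying footnote. The function and parameter names are hypothetical.

```python
def failure_odds(pass_probability: float) -> float:
    """Odds of failure: P(fail) / P(pass)."""
    if not 0.0 < pass_probability <= 1.0:
        raise ValueError("pass_probability must be in (0, 1]")
    return (1.0 - pass_probability) / pass_probability


def perceived_direct_cost(direct_cost: float, pass_probability: float) -> float:
    """Assumed form: direct cost divided by the probability of passing."""
    if not 0.0 < pass_probability <= 1.0:
        raise ValueError("pass_probability must be in (0, 1]")
    return direct_cost / pass_probability


# The example from the text: a $500 test with a 0.5 chance of passing.
print(perceived_direct_cost(500.0, 0.5))  # 1000.0
# Odds of failure are below one when the failure rate is under 0.5 ...
print(failure_odds(0.75))
# ... and above one when the failure rate exceeds 0.5.
print(failure_odds(0.25))  # 3.0
```

Under this assumed form, a test that everyone passes carries only its direct cost, and the perceived cost grows without bound as the chance of passing approaches zero.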

A perfect test would discourage only incompetent individuals from entering the teaching occupation. In that case the supply of beginning teachers would all be competent, although the number of competent individuals who choose to enter teaching will be reduced if the direct costs of the test are substantial.¹⁰ An imperfect test discourages both incompetent and competent individuals from choosing teaching. If the failure rate is higher for those who are incompetent (than it is for the competent) and the cost of the test is not greater for the competent (than for the incompetent), the test will tend to increase the proportion of competent individuals in the total supply.

In addition to the effect of test costs on the potential supply, licensing tests affect the actual supply of teachers after they have completed their educational preparation. Depending on the accuracy of the test (the extent to which it correctly distinguishes between competent and incompetent individuals), the share of competent individuals excluded or incompetent individuals admitted will vary.

⁸	In the case of those who are incompetent, this probability is one minus the type 1 error; for those who are competent, it is the type 2 error.
⁹	The opportunity cost is multiplied by the odds of failure (the probability of failure divided by the probability of passing), which are less than one if the failure rate is below 0.5 and greater than one if it is above 0.5 (see Appendix E).
¹⁰	In the formal model it is assumed that people know whether they are competent as well as what the true failure probability is (i.e., the type 1 and type 2 errors of the test). These assumptions, although clearly too strong, may be considerably weakened without affecting the main conclusions of the model.

To the extent that some competent individuals are misclassified by an imperfect test (type 2 error), the supply of competent teachers will be reduced.
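How the two error rates shape the pool of candidates who pass can be illustrated with a small calculation. It uses the chapter's own definitions (type 1 error: the share of incompetent candidates who pass; type 2 error: the share of competent candidates who fail) and hypothetical head counts, and it deliberately ignores the behavioral response to test costs discussed earlier. The function name and the numbers are assumptions for illustration only.

```python
def passing_pool(n_competent, n_incompetent, type1_error, type2_error):
    """Return (competent passers, incompetent passers, share competent among passers)."""
    competent_pass = n_competent * (1.0 - type2_error)   # competent and correctly passed
    incompetent_pass = n_incompetent * type1_error       # incompetent but passed anyway
    total = competent_pass + incompetent_pass
    share = competent_pass / total if total > 0 else float("nan")
    return competent_pass, incompetent_pass, share


# Hypothetical pool: 800 competent and 200 incompetent candidates (80 percent competent).
competent, incompetent, share = passing_pool(800, 200, type1_error=0.5, type2_error=0.25)
print(competent, incompetent)  # 600.0 100.0
print(round(share, 3))         # 0.857

# A perfect test admits every competent candidate and no one else.
print(passing_pool(800, 200, type1_error=0.0, type2_error=0.0))  # (800.0, 0.0, 1.0)
```

In the imperfect case the share of competent teachers rises from 80 to about 86 percent, but 200 competent candidates are lost along the way, which is the trade-off the text describes.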

Reducing the supply of teachers may be a desirable outcome of licensure testing as long as the proportion of competent teachers in the total supply increases sufficiently. The passing score of a test will directly affect the overall supply of teachers as well as the proportion who are competent. Setting a low passing score will tend to have a small effect on supply both because many candidates pass the test and because the perceived cost of the test differs little from the actual cost due to the high chance of passing. Relative to no licensure test, a low passing score will tend not to alter the proportion of competent teachers by very much. Raising the passing score will tend to reduce the supply of both competent and incompetent teachers, as a higher proportion of both are labeled by the test as incompetent and as the perceived cost rises. As passing scores are raised further, it becomes more and more likely that the supply of competent teachers will decrease more than the supply of incompetent teachers because fewer incompetent people presumably score at higher levels.¹¹
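The effect of moving the passing score can be sketched numerically. The example assumes, purely for illustration, that test scores are normally distributed with a mean of 70 for competent candidates and 55 for incompetent candidates (standard deviation 10 in both groups) and that the pool holds 800 competent and 200 incompetent candidates; none of these figures comes from the report.

```python
from statistics import NormalDist

# Hypothetical score distributions for the two groups.
COMPETENT = NormalDist(mu=70, sigma=10)
INCOMPETENT = NormalDist(mu=55, sigma=10)


def error_rates(cut):
    """Type 1 and type 2 errors, as defined in the text, at a given passing score."""
    type1 = 1.0 - INCOMPETENT.cdf(cut)  # incompetent candidates who pass
    type2 = COMPETENT.cdf(cut)          # competent candidates who fail
    return type1, type2


def share_competent_among_newly_failed(old_cut, new_cut,
                                       n_competent=800, n_incompetent=200):
    """Of the candidates eliminated by raising the cut score, the share who are competent."""
    lost_competent = n_competent * (COMPETENT.cdf(new_cut) - COMPETENT.cdf(old_cut))
    lost_incompetent = n_incompetent * (INCOMPETENT.cdf(new_cut) - INCOMPETENT.cdf(old_cut))
    return lost_competent / (lost_competent + lost_incompetent)


for cut in (40, 55, 70, 85):
    type1, type2 = error_rates(cut)
    print(f"cut={cut}: type 1 = {type1:.3f}, type 2 = {type2:.3f}")

# Raising a low passing score removes mostly incompetent candidates;
# raising an already high passing score removes mostly competent ones.
print(f"{share_competent_among_newly_failed(40, 50):.2f}")
print(f"{share_competent_among_newly_failed(70, 80):.2f}")
```

Under these assumptions the type 1 error falls and the type 2 error rises as the passing score goes up, and the candidates eliminated by each further increase are increasingly the competent ones.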

Some states require tests at many points in the process of teacher preparation and some require more than one subject matter test. Some require additional assessments of teaching knowledge and skill in the first year or two of teaching. If just 10 percent of test takers fail each test in a series of, say, five, it would be possible to eliminate roughly 40 to 50 percent of the potential teaching force from the pool of license-eligible individuals. At any of these junctures, if most of the remaining teachers are competent, raising the passing score will eliminate mostly teachers who are competent.
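The cumulative effect of a series of tests depends on how failures overlap. The sketch below works through the five-test, 10 percent case under two simple assumptions that are not stated in the report: either each test fails 10 percent of whoever is still in the pool, or each test fails a different 10 percent of the original pool. The function names are hypothetical.

```python
def eliminated_if_independent(fail_rate: float, n_tests: int) -> float:
    """Share eliminated when each test fails the same share of those still in the pool."""
    return 1.0 - (1.0 - fail_rate) ** n_tests


def eliminated_if_disjoint(fail_rate: float, n_tests: int) -> float:
    """Share eliminated when each test fails a different share of the original pool."""
    return min(1.0, fail_rate * n_tests)


print(f"{eliminated_if_independent(0.10, 5):.1%}")  # 41.0%
print(f"{eliminated_if_disjoint(0.10, 5):.1%}")     # 50.0%
```

If the same candidates tend to fail every test, the share eliminated is smaller still, approaching 10 percent when the failures overlap completely.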

The number of new teachers employed and the resulting number that are competent depend not only on the supply of teachers but also on the demand for new teachers. The model of demand assumes that communities care about the achievement of their children, which is positively related to the number of competent teachers that are employed, but that they face alternative uses for their scarce resources. Assuming that communities face a competitive labor market for teachers and that their total expenditures are constrained by their tax revenues, it is shown, as is standard, that the demand for teachers falls with the level of compensation. In addition, it is shown that the demand for teachers at any given level of compensation depends on the proportion of competent teachers in the supply. Indeed, the theory implies that an increase in the proportion of competent teachers in the supply will never lead to a fall in the demand for teachers (at the same wage) sufficient to reduce the number of competent teachers that would be employed.

¹¹	Consider a test that is perfect at one unique (optimal) passing score. If the passing score is set below that point, there will be a nonzero type 1 error. Raising it from that point up to the optimal passing score will eliminate only incompetent people, and the type 1 error will fall. Raising the passing score beyond the optimal point will eliminate only competent people, creating a nonzero type 2 error.

To understand the implications of the supply and demand models for teacher employment, consider again a perfect test. If the cost of the test is small, the supply of teachers is reduced exactly to the number of competent people who would choose teaching absent the test. The increase in the proportion of beginning teachers who are competent to unity may either increase or decrease the overall demand for teachers. However, the change in demand, regardless of whether it increases or falls, will never be such that the number of competent teachers employed falls below what it was without the test. If instead test takers bear a substantial cost, some competent people will be induced to choose an alternative occupation. The proportion of competent people must still increase to unity, given that the test is perfect, but the number of competent teachers employed may actually fall relative to the case in which there is no test.

Restrictions on supply also increase wages and other forms of compensation. However, if a test is highly accurate, so that almost all teachers who pass it are competent, and if it is of low cost, so that the attractiveness of teaching is not unduly adversely affected, the resulting increase in wages will reflect the true scarcity of competent teachers. Furthermore, as described above, the combined effect of the test on supply and demand will be to increase the number of competent beginning teachers and thus to increase student learning. On the other hand, if the test is highly imperfect, so that the proportion of competent teachers in the total (smaller) supply is not altered much by the test, or the test is viewed by teacher candidates as especially onerous and costly (e.g., because the failure rate is high even for competent teachers), the resulting increased wages will reflect mainly an artificially created scarcity. An artificial scarcity can also be created even with a highly accurate test if the passing score is raised to a point where most of those being eliminated from the supply are competent. In either case, there could be fewer competent teachers relative to having no test and student learning may be diminished.

The theoretical model is ambiguous as to whether licensure tests are efficacious in improving teacher competency. To determine the quantitative effects of licensure tests on the overall supply of new teachers and on their competency requires a great deal of information. It is necessary to know not only about the accuracy of the test (i.e., its type 1 and type 2 errors) but also about the direct and opportunity costs to the test takers, the alternative market opportunities of potential teachers, the constraints on the tax revenues of school districts, and the effects on student learning of alternative uses of school funds.

As is clear, establishing what constitutes optimal licensure testing in a given state is a complex issue. This complexity is multiplied when taking a national perspective that considers the effect that one state’s licensure testing requirement can have on teacher supply and competency and thus on student learning in other states. Reciprocity across states in the licensing of teachers is quite limited, because different states require different tests or have established different passing scores for the same test. This lack of reciprocity has several consequences. First, it reduces the attractiveness of teaching as an occupation because it increases the cost of changing jobs across states. Second, it creates barriers to mobility that impede the responsiveness of teachers to changes in the demand for teachers across states.¹² Finally, individual states, by taking independent uncoordinated actions, can affect the labor market for teachers in other states without knowledge or consideration of those effects. On the other hand, states have intentionally different objectives in their testing policies and requirements for licensure, which may be one reason that voluntary reciprocity agreements are limited. The extent to which coordination in state policies should be fostered is an important issue for examination.

This analysis has assumed that only licensed teachers are employed in public schools. As discussed in Chapter 3, though, almost all states permit waivers of their licensure rules to allow school districts to hire teachers on an emergency basis under certain circumstances. To the extent that those waivers are used, the restriction that licensure tests impose on the supply of teachers will be loosened and the effects discussed above will be mitigated (i.e., the potential gains from accurate licensure testing will be reduced as might be the potential losses from inaccurate and costly testing).

RESEARCH ON TEACHER LICENSING TESTS AND TEACHER COMPETENCE

Questions about test validity are key to the analysis described above. The extent to which teacher licensure tests identify candidates with the knowledge and skills minimally needed for competent practice is a key concern. The content of teacher tests generally is determined through logical and empirical processes (Educational Testing Service, 1999a, 1999e). Educators are asked to identify the knowledge, skills, abilities, and dispositions that are minimally needed for teaching. Tests are constructed to align with these specifications. Standards are set for performance on tests in order to differentiate those candidates who have sufficient levels of competence to practice from those who do not. Scores on the tests are used by policy makers to help decide which candidates are licensed.

¹²	While K-12 enrollments are anticipated to increase by more than 10 percent in many states in the West and South by the year 2007, in most parts of the Northeast and Midwest enrollments are expected to decline (U.S. Department of Education, 1996a). This may be a particularly acute concern as student enrollments have been growing in some parts of the country and shrinking in others, while the distribution of trained teachers is uneven across states.

Whether these practices result in tests that actually identify individuals who will become minimally competent beginning teachers is an important question. Research on the relationship between scores on teacher licensure tests and measures of teacher performance should provide some answers. To examine this question, the committee commissioned a review of the literature on the relationships between teacher licensure tests and teacher competence (Youngs, 2000). Electronic databases in education, psychology, and economics were searched; meta-analyses and literature reviews were examined to identify other potentially relevant articles, books, and chapters; and researchers were contacted to learn about current or other recent research.

Initially, the search was limited to evidence about teacher licensure tests currently in use. This search strategy uncovered no relevant research. Some currently used tests are newly introduced and have not been in place long enough to support research. Next, the literature was examined for research on retired teacher licensure tests. This yielded a small body of studies, including work by Ferguson (1991), Ferguson and Brown (2000), Strauss and Sawyer (1986), Summers and Wolfe (1975), Sheehan and Marcus (1978), and Ayers and Qualls (1979). Finally, the search criteria were expanded to include research on the relationship between teacher performance and tests of the content domains currently measured by teacher licensure tests. This approach yielded additional information and expanded the committee’s analysis to include research that might inform questions about the relationship between performance on teacher licensure tests and teacher competence (Ehrenberg and Brewer, 1995; Ferguson and Ladd, 1996; Bassham, 1962; Rothman, 1969; Begle, 1972; Rowan et al., 1997; Clary, 1972). Thirteen studies were found.

The search could have been expanded to include studies of the relationship between teacher performance and teachers’ knowledge and skills in the areas that licensure tests examine, regardless of how the knowledge and skills are measured. This approach would have allowed the broadest possible range of indicators of teacher characteristics, including the number of courses that candidates took, whether candidates majored or minored in the subject taught, and the highest degree level obtained. The committee elected not to undertake this search; readers are referred to reviews in Teacher Quality and Student Achievement: A Review of State Policy Evidence (Darling-Hammond, 2000), A License to Teach (Darling-Hammond et al., 1999), and elsewhere.

There were substantial interpretive problems with the body of evidence uncovered. This type of research is very difficult to mount because of measurement and research design issues. Although it is difficult to examine the relationship between scores on teacher licensure tests and teaching quality, it is certainly possible and important to do so. Analyses of the relationships between scores on teacher licensure tests and effectiveness in the classroom would provide a better understanding of what the tests do and do not measure.

To understand the difficulties involved in determining the extent to which teacher characteristics affect student achievement, it is useful to describe a paradigm within which the existing analyses fall. One can think of student achievement as measured at any grade level as the outcome of the school and family resources provided to children over their lifetimes, not just within the grade level at which the measurement is taken. Because the qualities of the educational institutions and the availability of family-supplied resources that are complementary to achievement (e.g., books, computers, tutors) are generally related to each other, researchers have recognized the importance of obtaining measures of “inputs” into the production of student achievement from all of these sources even when their interest is in estimating the impact of only a single input, such as teacher subject matter knowledge. However, data are limited; researchers do not have measures of all of these inputs and measures of the ones they do have are often imperfect.

For instance, many studies compare district-wide or school-wide average teacher test scores with comparable averages in students’ test scores. Often the data on which these studies are based have only crude measures of family inputs. Moreover, researchers tend not to include in their analyses inputs, both of schools and families, prior to the grade level at which achievement is measured. It is, therefore, likely that districts or schools whose students score better than would be predicted given the observed school and family inputs are also districts or schools whose students have available to them important inputs that were not measured. To the extent that districts that serve higher scoring students, beyond that of their measured inputs, also employ higher scoring teachers (whatever the reason), the relationship between teacher test scores and student achievement will be confounded. That is, some of the effect of the omitted inputs that cause student achievement to be higher will be attributed to higher teacher test scores.
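The confounding described above can be illustrated with simulated data. The sketch below is a hypothetical data-generating process, not an analysis from any of the studies reviewed: each district has an unmeasured input that raises student achievement and is also correlated with average teacher test scores, and the true effect of teacher scores is set to 0.2. All coefficients and names are assumptions for illustration.

```python
import random


def slope(x, y):
    """Ordinary least squares slope from a simple regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after a simple regression on x."""
    b = slope(x, y)
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return [yi - mean_y - b * (xi - mean_x) for xi, yi in zip(x, y)]


def simulate(n_districts=5000, true_effect=0.2, seed=0):
    rng = random.Random(seed)
    # Unmeasured district input (e.g., family resources), unknown to the researcher.
    unmeasured = [rng.gauss(0, 1) for _ in range(n_districts)]
    # Districts with more of the unmeasured input also employ higher scoring teachers.
    teacher = [0.6 * u + rng.gauss(0, 1) for u in unmeasured]
    student = [true_effect * t + 1.0 * u + rng.gauss(0, 1)
               for t, u in zip(teacher, unmeasured)]
    naive = slope(teacher, student)
    # What the estimate would be if the omitted input could be measured and controlled.
    adjusted = slope(residuals(unmeasured, teacher), residuals(unmeasured, student))
    return naive, adjusted


naive, adjusted = simulate()
print(f"naive slope: {naive:.2f}, slope controlling for the omitted input: {adjusted:.2f}")
```

In this setup the naive slope is well above the true effect because part of the omitted input's influence is attributed to teacher test scores, while the adjusted slope is close to 0.2.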

Other studies examine the issue using matched student-teacher data at the individual student level, rather than relying on district- or school-wide averages. The advantage of these data is that, with observations of many students within a district or school, district- or school-level inputs that are not measured can nevertheless be accounted for by making use of within-school (or within-district) variation. Thus, the problem that teachers are not randomly employed in districts (or in schools within districts) with respect to unmeasured inputs that influence student achievement is circumvented with such data. However, it may still be true that teacher assignments within districts, or even schools, are related to student achievement that is due to unmeasured inputs or student abilities, and such nonrandom assignment would again confound the effect of teacher test scores. Although the availability of such data generally reduces the extent to which there are omitted factors that affect student achievement, it is not possible to say that the bias due to nonrandom teacher assignment is lessened by the use of matched student-teacher data without further assertions about the teacher assignment process.
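The use of within-school variation, and its limits, can be sketched with simulated matched data. In this hypothetical setup each school has an unmeasured input shared by all of its classrooms, and a `within_sorting` parameter (an assumption introduced here) controls whether higher scoring teachers are also assigned to classrooms with more unmeasured classroom-level input. The true effect of teacher scores is set to 0.2.

```python
import random


def slope(x, y):
    """Ordinary least squares slope from a simple regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def demean_by_group(values, groups):
    """Subtract each group's mean, leaving only within-group variation."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def simulate(n_schools=1000, classes_per_school=5, true_effect=0.2,
             within_sorting=0.0, seed=0):
    rng = random.Random(seed)
    school_id, teacher, student = [], [], []
    for s in range(n_schools):
        school_input = rng.gauss(0, 1)       # unmeasured, shared within the school
        for _ in range(classes_per_school):
            class_input = rng.gauss(0, 1)    # unmeasured, specific to the classroom
            t = 0.6 * school_input + within_sorting * class_input + rng.gauss(0, 1)
            y = true_effect * t + school_input + class_input + rng.gauss(0, 1)
            school_id.append(s)
            teacher.append(t)
            student.append(y)
    pooled = slope(teacher, student)
    within = slope(demean_by_group(teacher, school_id),
                   demean_by_group(student, school_id))
    return pooled, within


print(simulate(within_sorting=0.0))  # within-school slope is near the true effect
print(simulate(within_sorting=0.5))  # nonrandom assignment within schools biases it again
```

With random assignment inside schools, demeaning removes the school-level confound; once assignment within schools is related to unmeasured classroom inputs, the within-school estimate is biased as well, which is the caveat raised in the text.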

Having matched student-teacher data does not remove the need to account for the cumulative nature of student learning. Test score measurements at a single point in time (grade level) would be related not only to the characteristics of the current teacher and current family-provided inputs but also to the characteristics of past teachers and past family inputs (even prior to school attendance and arguably back to a child’s conception). There is no source of data that meets that requirement. One way researchers have attempted to circumvent this data limitation is by exploiting the availability of achievement measures at more than one point in time. Such studies look at the relationship of changes in achievement between two grade levels to school and family inputs applied between the measurements, the so-called value-added approach found in the education production function literature (Hanushek, 1986). The seemingly plausible argument that, because the teacher’s task is to add knowledge to what students already know, one can ignore teacher and other inputs prior to the initial measurement is in fact valid only under additional assumptions about how rapidly the effects of prior inputs diminish over time. Moreover, while measuring initial achievement “controls” for all of the inputs that went into the determination of initial achievement (in a specific way), it is still possible that there are omitted school and family inputs that affect (the change in) achievement, that are known to the school, and that are related to teacher assignment.
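The dependence of the value-added approach on how quickly the effects of prior inputs fade can be illustrated with a simulation. The sketch assumes a hypothetical process in which current achievement equals `persistence` times prior achievement plus a true teacher effect of 0.2, and in which higher scoring teachers tend to be assigned to students with higher prior achievement. A simple gain-score regression implicitly treats persistence as equal to one. All names and coefficients are assumptions.

```python
import random


def slope(x, y):
    """Ordinary least squares slope from a simple regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def gain_score_estimate(persistence, n_students=5000, true_effect=0.2,
                        sorting=0.5, seed=0):
    """Slope of achievement gains on teacher scores under the assumed process."""
    rng = random.Random(seed)
    prior = [rng.gauss(0, 1) for _ in range(n_students)]
    # Teacher scores are related to students' prior achievement (nonrandom assignment).
    teacher = [sorting * a + rng.gauss(0, 1) for a in prior]
    current = [persistence * a + true_effect * t + rng.gauss(0, 1)
               for a, t in zip(prior, teacher)]
    gains = [c - a for c, a in zip(current, prior)]
    return slope(teacher, gains)


print(f"{gain_score_estimate(persistence=1.0):.2f}")  # prior effects do not fade
print(f"{gain_score_estimate(persistence=0.5):.2f}")  # prior effects fade by half
```

When prior effects persist fully, the gain-score slope is close to the true effect; when they fade, the same regression understates it in this setup, because the fading of prior achievement is correlated with teacher assignment.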

Measurement problems also complicate this research. Much is said in this report about the difficulty of measuring teachers’ effectiveness in the classroom; it has been noted that there is no commonly accepted valid and fair measure of effective teaching. Research is further hampered by the difficulty of accurately distinguishing minimally competent from minimally incompetent classroom practice. In Chapter 2 it is said that most current teaching standards do not specify whether the knowledge and skills they describe are to be demonstrated by minimally competent or more proficient beginning teachers. It is difficult to measure minimally competent performance in the absence of a clear definition. Most of the studies described below use student achievement as a proxy for teacher performance. Even comprehensive, highly reliable measures of student learning are incomplete indicators of teaching effectiveness. Available student achievement measures are considered narrow by some researchers and lacking in detail (Porter et al., 2000). One study uses principals’ ratings of teacher performance.

A final design obstacle in this field of research follows from discussion of measurement problems. The absence of job performance information for unlicensed examinees is a notable limitation. The criterion of greatest interest to research on the relationship between licensure test results and job performance is a measure of minimally competent beginning teaching. Because candidates who fail licensure tests generally are ineligible for licensure and employment as classroom teachers, job performance information is unavailable for them. However, the fact that unlicensed individuals are now hired in relatively large numbers by private schools and by public schools on an emergency basis makes possible the collection of job performance information for candidates scoring below passing levels. As noted earlier, well over half of the states permit individuals to begin teaching without meeting licensure test requirements, some for as long as the district says it cannot find qualified applicants (U.S. Department of Education, 2000a; Blair, 1999; Education Week, 2001). Job performance information is obtainable for these individuals. In addition, as states raise the passing scores on their tests, it will be possible to compare the performance of teachers who failed under the new higher passing score with that of teachers who did not.

These important measurement and research design problems limit the inferences that can be drawn from the existing research. The studies draw on different sources of data: district averages, school averages, cross-sectional matched student-teacher data, and longitudinal matched student-teacher data. For each of these sources, the degree of bias in estimates of teacher effects that is likely to arise from nonrandom teacher assignment is unknown. The degree of error associated with incomplete and imperfect measurement also is unknown. All of the studies that the committee uncovered have one or more design and measurement limitations. Table 6.1 describes the studies and documents their measurement and design characteristics.

Table 6.1 describes for each study the measure of teacher performance used by the researchers. Eleven of the studies used student achievement tests as measures of teacher competence; all reported student test results on a continuous scale. One study used principals’ evaluations of teachers’ job performance. The table also describes the teacher licensure or other tests of teachers used by the researchers. Five studies examined teacher licensure tests, and seven looked at teachers’ performance on tests measuring some of the same knowledge and skills examined by licensure tests (e.g., ACT test). As with the student test data, all but one set of teacher results were reported on a continuous scale. The table additionally notes whether researchers included other teacher data, like degree type or racial/ethnic status, as measures of other characteristics potentially related to teacher performance. The table shows whether baseline data were available to describe students’ academic achievement prior to the teachers’ work with them. Likewise, it shows whether other student, school, or family data, such as teacher/pupil ratios, poverty levels, or language status, were available for study. The table also shows whether teachers, schools, or districts were the unit of analysis in the research and gives sample sizes for the studies. It notes the amount of time that elapsed between teacher testing and measurement of their performance in the classroom. Finally, the table documents the statistical procedures used by the researchers and records their interpretations of the data.

The remainder of this chapter provides a description of the findings from this body of evidence as the original authors presented them. Some readers will find problematic some of the inferences these researchers drew from the data, as does the committee. However, the committee would like to provide the reader with a sense of the research and the research findings it uncovered. What follows is a review of findings from research on tests of the knowledge and skills examined by current teacher licensure tests: (a) basic skills and general knowledge, (b) subject matter knowledge, (c) pedagogical knowledge, and (d) pedagogical content knowledge.

TABLE 6.1 Research on Tests of Teachers and Teaching Outcomes

	Outcome Variables			Other Variables					Study Characteristics					Findings
Study	Student Achievement Measure	Teacher Evaluation by Principal	Teacher Test	Other Teacher Variables	Baseline Student Achievement Measure	Other Student Variables	School Variables	Family Variables	Unit of Analysis	Sample Characteristics	Sample Size	Time Between Teacher Test and Outcome Measure	Statistical Procedure Used	Estimated Relationship Between Teacher Test Data and Student Test Data or Principal’s Evaluation
Basic Skills or General Knowledge
Ehrenberg and Brewer (1995)	Verbal and nonverbal test, reading and math test (test names not provided)		Verbal aptitude test (test name not provided)	√		√	√	√	School	Schools that had grades 3 and 6 or 9 and 12	969 elementary schools, 256 secondary schools		Synthetic gain scores (mean test scores of upper grades in a school minus mean test scores of lower grades in same school)	Positive and statistically significant for both elementary and secondary schools (higher verbal scores of teachers were associated with higher gains in scores for white, but not black, high school students)
Ferguson (1991); Ferguson and Brown (2000)	Texas Educational Assessment of Minimum Skills (reading, math)		Texas Examination of Current Administrators and Teachers (reading, writing, professional knowledge)	√		√	√	√	District	All Texas school districts: Houston, Dallas, and very small districts	Ranged from 857 districts for grade 11 to 890 districts for grade 1	Within academic year and then 2 and 4 years later	Multiple regression	Positive and statistically significant
Ferguson and Ladd (1996); Ferguson and Brown (2000)	Basic Competency Test, Stanford Achievement Test (reading, math)		American College Test (English, reading, math, science)	√	For school analysis		√	√	School	Schools with both third and fourth grades in same school	Grade 4 cohort students, data available for only 1/4 of teachers across schools (690 schools); only 35 schools (of 690) with full data for teachers	ACT (from entrance to college, so time since ACT varies among teachers)		Positive and statistically significant for teachers’ scores on reading at school level; positive but not significant for math
	Basic Competency Test, Stanford Achievement Test (reading, math)		American College Test (English, reading, math, science)		Used data for third and fourth graders as baseline for district analysis				District		127 school districts			Positive and statistically significant results for students’ math test scores
Subject Matter Knowledge
Bassham (1962)	California Achievement Test (reading, math)		Test of Basic Mathematical Understanding		√	√	√		Teacher	Grade 6 teachers in an urban school district (14,000 students; 28 teachers)	28 grade 6 teachers in an urban school district (14,000 students)	Within academic year	Multiple regression	Positive and statistically significant for above average, but not below average, students
Rothman (1969)	Test on Understanding Science, Project Physics Achievement Test, Science Welch Process Inventory		Test of Selected Topics in Physics, Test on Understanding Science	√	√				Teacher	All students of participating teachers (number of students not provided)	51 high school physics teachers randomly selected from a list of 17,000	Within academic year	Canonical correlation	Positive and statistically significant relationship between teachers’ test scores and students’ scores
Begle (1972)	Mathematics Inventory (I–IV)		Algebra Inventory (Forms A and B) and Abstract Algebra Inventory (Form C)		√				Teacher	Teachers from across the country who participated in National Science Foundation summer institutes and grade 9 students	308 math teachers	Within a calendar year	Multiple regression	Significant positive but modest relationship between teachers’ test scores and students’ understanding of algebra, but not for student achievement
Pedagogical Knowledge
Clary (1972)	Science Research Associates Achievement Series in Reading		Inventory of Teacher Knowledge of Reading	√					Teacher	Most (23 of 25) grade 4 teachers in one district	23 grade 4 teachers		Multiple regression	Positive and statistically significant
Summers and Wolfe (1975)	Iowa Test of Basic Skills		National Teacher Examination: Common Examination (general and professional knowledge)	√	√	√	√	√	School		Urban elementary schools (103 schools)	Used a school average of grade 6 teachers’ scores; time since test varies based on experience of teacher	Multiple regression	Negative and statistically significant but small
Sheehan and Marcus (1978)	Metropolitan Reading Test (vocabulary, math), Iowa Test of Basic Skills (vocabulary, math)		Weighted Common Examinations Total (NTE; general and professional knowledge)	√	√		√ (controlled for background factors by entering pretest measures first)	√ (controlled for background factors by entering pretest measures first)	Teacher	Students not randomly selected; class average of students	119 teachers, 1,836 students	Range of teacher experience (1–40 years); time since taking NTE varied; generally one year	Stepwise regression	Positive and significant association with students’ math and reading; relationships disappeared when teachers’ race was considered
Ayers and Qualls (1979)		Evaluation by Supervisor form	Weighted Common Examinations Total (NTE; general and professional knowledge); Education in the Elementary School	√					Teacher	Elementary and secondary teachers	84 elementary and 49 secondary teachers		Correlational (compared means and standard deviations)	Mixed, small correlations
Strauss and Sawyer (1986)	Norm-Referenced Achievement (reading, math)		National Teacher Examination (plus five other non-teacher-related variables)		√	√	√	√	District	High school juniors	145 districts in North Carolina (105 districts with capital stock info)		Production function	Significant but modest relationship between teacher test scores and student achievement

Tests of Basic Skills and General Knowledge

Three sets of researchers examined the relationships between teachers’ performance on basic skills or general knowledge tests and student achievement: Ehrenberg and Brewer (1995), Ferguson (1991), and Ferguson and Ladd (1996).

Ferguson and Brown (2000) reexamined data from the two earlier Ferguson studies. One study examined teacher licensure test results (Ferguson, 1991), and the others examined other tests of basic skills. Ferguson (1991) studied teachers’ performance on the Texas licensing test, the Texas Examination of Current Administrators and Teachers (TECAT), which measures reading and writing skill, including verbal ability and research skills, as well as a limited body of professional knowledge. Ferguson found that the following four district average teacher and school variables were related to student performance on the Texas Educational Assessment of Minimum Skills examinations in reading and mathematics: TECAT scores, teachers’ experience, number of students per teacher, and percentage of teachers with master’s degrees. TECAT scores were found to account for 20 to 25 percent of all variation across districts in student average scores.

Ferguson and Ladd (1996) conducted similar district-level analyses in Alabama but used ACT scores (not scores on a licensing examination) as measures of teacher ability. School average scores for teachers on the ACT test were related to student achievement but less so than for the earlier TECAT study. Ehrenberg and Brewer (1995) also found positive relationships between teachers’ performance on basic skills tests and student achievement, though results varied for elementary and high school students and by students’ racial/ethnic status.

Subject Matter Knowledge

Research has also examined the relationships between teachers’ subject matter knowledge and their competence. Four sets of researchers looked at the relationship between tests of teachers’ subject matter knowledge and student achievement: Bassham (1962), Rothman (1969), Begle (1972), and Rowan et al. (1997). Bassham studied the relationship between teachers’ performance on a Test of Basic Mathematical Understandings and students’ mathematics gains on pre- and posttests over the course of a year. This researcher found a significant relationship between teachers’ and students’ scores only for students of above-average achievement. Rothman (1969) reported a significant positive relationship between teachers’ and students’ performance on some measures of science and physics knowledge. Begle found different relationships from year to year and class to class between teacher scores on the algebra inventory test and student achievement. Rowan et al. reported a positive and significant relationship between students’ performance on the 1988 National Education Longitudinal Study (NELS) math achievement test and their teachers’ responses to a one-item measure of mathematics knowledge on the NELS teacher questionnaire.

Pedagogical Knowledge

Five studies examined the relationship between teacher pedagogical knowledge, as measured by paper-and-pencil tests, and their performance in the classroom: Clary (1972), Summers and Wolfe (1975), Sheehan and Marcus (1978), Ayers and Qualls (1979), and Strauss and Sawyer (1986). Four studies used student achievement test data as measures of teacher competence, and one used supervisor evaluations. Clary examined teachers’ understanding of how to teach reading, as evaluated by an Inventory of Teacher Knowledge of Reading, in relation to students’ reading achievement on pre- and posttests from the Science Research Associates Reading Achievement Series. The author reported statistically significant relationships between pedagogical knowledge and student performance, concluding that “there is a direct relationship between the person who exhibits proficient knowledge about teaching reading and that person’s success in producing students who make an appreciable amount of progress in reading achievement” (p. 15).

Four sets of researchers (Summers and Wolfe, 1975; Sheehan and Marcus, 1978; Ayers and Qualls, 1979; and Strauss and Sawyer, 1986) examined pedagogical knowledge as tested by the National Teacher Examinations (NTE), precursors to the current Praxis tests. Strauss and Sawyer looked at the relationships between students’ test scores and teachers’ performance on the NTE. The authors included data on six inputs in their examination of these relationships: the number of teachers in each of 145 school districts, the number of students per district, the number of high school students interested in postsecondary education in each district, the racial/ethnic composition of the schools, the value of the districts’ capital stock, and teachers’ test scores. A modest positive and statistically significant relationship was found between district average NTE scores and student test scores.

Two other studies looked at the relationship between NTE scores and student achievement test data. One found a small negative and statistically significant relationship between school average teacher and student scores (Summers and Wolfe, 1975). The other reported a positive significant relationship between teachers’ and students’ scores, but when teacher race was used as a control variable, teacher scores showed no effect on student achievement (Sheehan and Marcus, 1978). Ayers and Qualls (1979) reported small positive and small negative correlations between teachers’ NTE scores and principals’ ratings of teacher competence.

Pedagogical Content Knowledge

As noted earlier, the idea of pedagogical content knowledge is relatively new to education discourse. Prior to the mid-1980s, discussions of teacher knowledge tended to distinguish subject matter knowledge from knowledge of teaching or of students. No studies were found that examined licensure tests of pedagogical content knowledge. Other researchers, including Andrews et al. (1980) and Hanushek (1986), have examined questions similar to those addressed by the studies discussed above.

DIRECTIONS FOR RESEARCH

Teaching is a public enterprise. The public has an important interest in the quality of its teaching force and in current initiatives to improve teaching and learning. The committee encourages the federal government and others to conduct research that has the potential to improve the quality of licensure tests and, possibly, the capabilities of the beginning teacher work force.

In Chapters 4 and 5 the committee discussed the kinds of data that might provide supportive empirical evidence for the validity of teacher licensure tests. These include data on the relationships between test results and other measures of candidate knowledge and skill and data on the extent to which licensure tests distinguish minimally competent candidates from those who are not. The committee also described several licensing and employment conditions that permit observations of job performance for candidates who fail licensing tests. The committee explained that job performance data are now available for unlicensed candidates who are teaching with emergency licenses. Data are also available for candidates who passed licensure tests under different passing standards. These are fairly recent conditions for entering the teaching profession, and they provide an important opportunity to collect job-related evidence for candidates scoring above and below passing scores on the tests. This chapter describes and illustrates the difficulty of mounting this research. However, the measurement and research design problems that mark this research are not unique to teacher licensure tests. They characterize research on many other social science questions as well.

Given the complexity of these issues, it would be valuable to undertake an interagency study to define needed research. Representatives from the U.S. Department of Education, National Science Foundation, U.S. Department of Health and Human Services, National Institute of Child Health and Human Development, U.S. Department of Labor, and the Census Bureau should be appointed to define research aimed at examining and improving the quality of teacher licensure tests, teacher licensing, and, potentially, the capabilities of the new teacher work force. Representatives should include educators, child development specialists, labor economists, statisticians, demographers, anthropologists, and others.

These individuals should be charged with defining a multidisciplinary, multiple-methods research program. Representatives should specify the primary and secondary research questions, sampling designs, measurement tools, data collection methods, and data triangulation and analysis techniques to be used. They should specify a broad-based omnibus research program that begins collecting data on students and their families at a very early age; collects information on students’ physical and intellectual development, family characteristics, and school achievement; and follows students over time. The research should track students in and out of classrooms and schools, collecting relevant data on teacher and school characteristics. It should collect information about teachers’ backgrounds, education, and licensure; it should catalog school resources.

The research should examine licensure testing, beginning teacher performance, and student learning. Representatives should look at existing data sources, such as the National Center for Education Statistics Early Childhood Longitudinal Survey (www.nces.ed.gov/ecls) and the Bureau of Labor Statistics National Longitudinal Surveys of Youth-Child Data (www.states.bls.gov.nlsy79ch.htm), to evaluate their utility and build on any useful data collection systems.

CONCLUSION

Initial licensure tests are only one factor influencing the supply of new teachers. The quality and size of the pool of new teachers depend on many things, including recruiting efforts, other licensing requirements, labor market forces, licensing reciprocity, teacher salaries, and the conditions under which teachers work.

The committee’s analysis of teacher quality and supply issues leads to the following conclusions:

To the extent that the tests provide accurate measurements, setting higher passing scores would be expected to increase the proportion of teacher candidates in the hiring pool who are competent in the knowledge and skills measured by the tests, although higher passing scores will tend to lower the number of candidates who pass the tests. To the extent that test scores have measurement error, setting higher passing scores could eliminate competent candidates.
Reducing the number of newly licensed teachers could require districts to make difficult choices, such as hiring uncredentialed teachers, increasing class sizes, or increasing salaries to attract licensed teachers from other districts and states.
Setting substantially higher passing scores on licensure tests is likely to reduce the diversity of the teacher applicant pool, further adding to the difficulty of obtaining a diverse school faculty.
Little research has been conducted on the extent to which scores on current teacher licensure tests relate to other measures of beginning teacher competence. Much of the research that has been conducted suffers from methodological problems that interfere with making strong conclusions about the results. This makes it hard to determine what effect licensure tests might have on improving the actual competence of beginning teachers.
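
The first conclusion above, on passing scores and measurement error, can be illustrated with a small simulation. The Python sketch below is hypothetical and is not drawn from the committee's data. It assumes each candidate has a true skill level, that the observed test score equals true skill plus normally distributed measurement error, and that a candidate is competent when true skill is at or above an assumed `competence_level`. The function names, distributions, and numbers are illustrative assumptions only.

```python
import random


def simulate_candidates(n, noise_sd, seed=0):
    """Return (true_skill, observed_score) pairs for n hypothetical candidates.
    Observed score = true skill + normally distributed measurement error."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n):
        true_skill = rng.gauss(50.0, 10.0)
        observed = true_skill + rng.gauss(0.0, noise_sd)
        candidates.append((true_skill, observed))
    return candidates


def screening_summary(candidates, passing_score, competence_level=45.0):
    """Count passers, and competent candidates who fail, at a passing score."""
    passed = sum(1 for _, score in candidates if score >= passing_score)
    competent_rejected = sum(
        1 for skill, score in candidates
        if skill >= competence_level and score < passing_score
    )
    return {"passed": passed, "competent_rejected": competent_rejected}


# Same simulated pool screened at a lower and a higher passing score.
pool = simulate_candidates(n=1000, noise_sd=5.0)
lower = screening_summary(pool, passing_score=45.0)
higher = screening_summary(pool, passing_score=55.0)
print(lower, higher)
```

For any fixed pool, raising the passing score cannot increase the number of passers and cannot decrease the number of competent candidates who are rejected. How large those effects are depends entirely on the assumed score distribution and error size.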