
Designing Biometric Evaluations and Challenge Problems for Face-Recognition Systems

P. JONATHON PHILLIPS

National Institute of Standards and Technology

Gaithersburg, Maryland


Automatic face recognition is the only area of computer vision and pattern recognition with more than a decade's history of challenge problems and independent evaluations. These challenge problems have provided the face-recognition community with a large corpus of data for algorithm and technology development. Periodic evaluations have measured progress in the performance of these algorithms and technologies. As face-recognition technology has matured, so has the sophistication of the challenge problems and evaluations.

Prior to the first face-recognition evaluations, researchers reported performance on small proprietary databases, usually of fewer than 100 people, for partially automatic algorithms. Partially automatic algorithms require that the locations of the eyes (a "ground truth") be provided; fully automatic algorithms process facial images without manual intervention or ground truths. From self-reported results on proprietary data sets, it was not possible to make objective comparisons of different approaches or to assess the best techniques.

The tradition of challenge problems and evaluations in face recognition began with the FERET Program, which ran from 1993 to 1997. Under this program, a large data set was collected and then partitioned into two sets—a development set and a sequestered set. The development set of images was made available to researchers for algorithm development. The sequestered set, as the name implies, was withheld and used to make independent evaluations of algorithm performance on images the algorithms had not seen before. Because all algorithms are tested on exactly the same data, it is possible to make direct comparisons of the performance of different algorithms.

The initial goal of the FERET Program was to determine if automatic face recognition was possible. This question was answered in the affirmative by the August 1994 evaluation (Phillips et al., 1998). Two more FERET evaluations were conducted, in March 1995 and September 1996, to measure progress under the FERET Program (Phillips et al., 2000). The last FERET evaluation, in September 1996, measured performance on a data set of 1,196 people and 3,323 images.

The FERET evaluations showed significant advances in the development of face-recognition technology. The FERET evaluations addressed the following basic problems: (1) the effect on performance of gallery size and (2) the effect on performance of temporal variations. At the conclusion of FERET, state-of-the-art algorithms were fully automatic, could process 3,815 images of 1,196 people, and could recognize the faces of people from pairs of facial images taken 18 months apart.

In biometrics, including face recognition, performance is reported for three types of tasks—verification, identification, and watch list tasks. A verification task asks, “Am I who I say I am?” An identification task asks, “Who am I?” And a watch list task asks, “Am I someone you are looking for?”

In a verification task, a person presents his or her biometric and an identity claim to a face-recognition system. The system then compares the presented biometric with a stored biometric of the claimed identity. Based on the results of the comparison between the new and stored biometric, the system either accepts or rejects the claim. There are two types of system users—legitimate users and persons who attempt to impersonate legitimate users. Verification performance is characterized by two performance statistics that show the success rate for the two types of users. The verification rate is the rate at which legitimate users are granted access. The false accept rate (FAR) is the rate at which imposters are granted access. An ideal system has a verification rate of 100 percent and a FAR of 0 percent.

Unfortunately, no ideal system exists. In real-world systems, there is always a trade-off between the verification rate and the FAR. Therefore, it is critical that FARs and verification rates be considered together in determining the performance capabilities of a face-recognition system. It is easy to build a system that always grants access to a subject. This system will have a 100 percent verification rate because access will always be granted in response to a legitimate user’s request. Conversely, this system will also have a 100 percent FAR because it also grants access to imposters. The best system is one that balances the verification rate with the FAR in a manner consistent with operational needs.
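
To make the trade-off concrete, the sketch below computes both statistics from synthetic similarity scores under a simple assumed decision rule: a claim is accepted when its score meets a threshold. The Gaussian score distributions are purely illustrative and do not model any evaluated system.

```python
import numpy as np

def verification_rates(genuine, impostor, threshold):
    """Verification rate and FAR at a given decision threshold.

    genuine:  similarity scores from legitimate-user claims
    impostor: similarity scores from impostor claims
    """
    vr = float(np.mean(np.asarray(genuine) >= threshold))    # legitimate users accepted
    far = float(np.mean(np.asarray(impostor) >= threshold))  # impostors accepted
    return vr, far

# Synthetic scores: genuine comparisons tend to score higher.
rng = np.random.default_rng(0)
genuine = rng.normal(0.7, 0.1, 10_000)
impostor = rng.normal(0.4, 0.1, 10_000)

# Sweeping the threshold traces the trade-off: lowering it raises the
# verification rate and the FAR together.
for t in (0.35, 0.50, 0.65):
    vr, far = verification_rates(genuine, impostor, t)
    print(f"threshold {t:.2f}: verification rate {vr:.1%}, FAR {far:.1%}")
```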


FRVT 2002

The successor to FERET is the Face Recognition Vendor Test (FRVT) series of evaluations. The primary objective of FRVT 2002, a large-scale evaluation of automatic face-recognition technology, was to provide performance measures against real-world requirements. FRVT 2002 measured the performance of the core capabilities of face-recognition technology and assessed the potential of the technology to meet the requirements of operational applications (Phillips et al., 2003).

FRVT 2002 was independently administered to ten participants that were evaluated under the direct supervision of the FRVT 2002 organizers at a U.S. government facility in Dahlgren, Virginia, in July and August 2002. Participants were tested using data that they had not seen previously. The heart of FRVT 2002 was the high computational intensity test (HCInt), which consisted of 121,589 operational images of 37,437 people from the U.S. Department of State Mexican Non-Immigrant Visa Archive. From these data, real-world performance figures on a very large data set were computed for verification, identification, and watch list tasks.

The most likely application of face-recognition technology would use images taken indoors. FRVT 2002 results show that normal changes in indoor lighting do not significantly affect the performance of the top systems. In FRVT 2002, the results obtained using two indoor data sets with different lighting were approximately the same. In both experiments, the best performer had a 90 percent verification rate at a FAR of 1 percent.

For the best face-recognition systems, the verification rate for faces captured outdoors, at a FAR of 1 percent, was only 50 percent. Thus, face recognition from outdoor imagery remains a research challenge. The FRVT 2002 database also included images of the same person taken on different days. The performance results for indoor images showed that the capabilities of face-recognition systems had improved over similar experiments conducted two years earlier in FRVT 2000; the FRVT 2002 results indicated a 50 percent reduction in error rates.

A very important question for real-world applications is the rate of decrease in performance as the time interval increases between the acquisition of the database images and new images presented to a system. For the top systems, performance degraded at approximately 5 percentage points per year.

One open question is how the size of the database and the size of the watch list affect performance. Because of the large number of people and images in the FRVT 2002 data set, we were able to report the first large-scale results on this question. For the best system, the top-rank identification rate was 85 percent on a database of 800 people, 83 percent on a database of 1,600, and 73 percent on a database of 37,437. For every doubling of the size of the database, performance decreased by 2 to 3 percentage points overall. In mathematical terms, identification performance decreased linearly with respect to the logarithm of the database size.
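
That log-linear trend can be checked against the three rates reported above. The short sketch below fits a straight line in log2 of the database size, so the slope is the change per doubling; the data points come from the text, but the fit itself is only an illustration.

```python
import numpy as np

# Top-rank identification rates for the best system, from the text.
sizes = np.array([800, 1600, 37437])
rates = np.array([0.85, 0.83, 0.73])

# Fit rate = intercept + slope * log2(size); the slope is the change
# in identification rate per doubling of the database size.
slope, intercept = np.polyfit(np.log2(sizes), rates, 1)
print(f"change per doubling: {slope:+.1%}")  # about -2.2 percentage points
```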

A similar effect was observed for the watch list task. As the watch list size increased, performance decreased. For the best system, the identification and detection rate was 77 percent at a FAR of 1 percent for a watch list of 25 people. For a watch list of 300 people, the identification and detection rate was 69 percent at a FAR of 1 percent. In general, systems performed better with a watch list of 25 to 50 people than with a longer watch list.
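
A minimal sketch of how these watch list statistics can be computed from a similarity matrix follows. The conventions here (one row per probe, one column per watch-list entry, `None` marking probes of people not on the list) are illustrative assumptions, not the FRVT 2002 protocol.

```python
import numpy as np

def watch_list_rates(sim, probe_ids, watch_ids, threshold):
    """Detection-and-identification rate and FAR for a watch list task.

    sim[i, j]: similarity of probe i to watch-list entry j. A probe
    raises an alarm when its best score meets the threshold; a hit
    additionally requires the top match to be the correct identity.
    """
    best = sim.argmax(axis=1)             # top match per probe
    alarm = sim.max(axis=1) >= threshold  # alarms raised
    on_list = np.array([p is not None for p in probe_ids])
    hit = np.array([p is not None and watch_ids[b] == p
                    for b, p in zip(best, probe_ids)])
    det_id_rate = float(np.mean((alarm & hit)[on_list]))  # hits among on-list probes
    far = float(np.mean(alarm[~on_list]))                 # alarms among off-list probes
    return det_id_rate, far
```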

Previous evaluations had reported face-recognition performance as a function of imaging properties; for example, they compared performance for indoor versus outdoor images, or frontal versus non-frontal images. FRVT 2002, for the first time, considered the effects of demographics on performance. This analysis revealed two major effects. First, recognition rates for male images were higher than for female images. For the top systems, identification rates for male images were 6 to 9 percentage points higher than for female images; for the best system, the identification rate was 78 percent for males and 69 percent for females. Second, recognition rates for images of older people were higher than for images of younger people. For 18 to 22 year olds, the average identification rate for the top systems was 62 percent; for 38 to 42 year olds, it was 74 percent. For every ten-year increase in age, average performance improved by approximately 5 percentage points through age 63. All identification rates were computed from a database of 37,437 individuals.

Since FRVT 2000, new techniques and approaches to face recognition have emerged, two of which were evaluated in FRVT 2002. The first was a three-dimensional morphable-model technique developed by Blanz and Vetter (1999) to improve recognition of non-frontal images. We found that Blanz and Vetter's technique significantly improved recognition performance. The second was recognition from video sequences. Using FRVT 2002 data sets, we found that recognition performance using video sequences was the same as with still images.

In summary, several key lessons were learned from FRVT 2002:

  • Given reasonable, controlled indoor lighting, the current state of the art in face recognition is 90 percent verification at a 1 percent FAR.

  • The use of morphable models can significantly improve non-frontal face recognition.

  • Watch list performance decreases as a function of size—performance for smaller watch lists is better than performance for larger watch lists.

  • Demographic characteristics, such as age and sex, can significantly affect performance. Therefore, performance estimates and system design should take demographic information into account.


FACE RECOGNITION GRAND CHALLENGE

In the last few years, researchers have been developing new techniques fueled by advances in computer vision, computer design, and sensor design, and by growing interest in fielding face-recognition systems. Proposed new techniques include recognition from three-dimensional (3-D) scans, recognition from high-resolution still images, recognition from multiple still images, multi-modal face recognition, fusion of multiple algorithms, and preprocessing algorithms that correct for variations in illumination and pose. The hope is that these advances will reduce the error rate of face-recognition systems by an order of magnitude relative to FRVT 2002 (Phillips et al., 2003).

The Face Recognition Grand Challenge (FRGC), a technology-development project at the National Institute of Standards and Technology, is designed to achieve this performance goal by pursuing the development of algorithms for all of the proposed methods. To facilitate the development of new algorithms, a data corpus of 50,000 recordings, divided into training and validation partitions, was provided to researchers. The corpus consists of 3-D scans and high-resolution still images taken under controlled and uncontrolled conditions; the 3-D scans include both shape and texture channels.

A primary goal of the upcoming FRVT 2006 is to determine if the goals of FRGC have been met.1 The starting point for measuring improvements in performance is the HCInt of FRVT 2002, which used images taken indoors under controlled lighting conditions. The performance point selected as the reference is a verification rate of 80 percent (error rate of 20 percent) at a FAR of 0.1 percent (the performance level of the top three FRVT 2002 participants). An order-of-magnitude increase in performance would be a verification rate of 98 percent (2 percent error rate) at the same fixed FAR of 0.1 percent.

1 FRVT 2006, which is scheduled to begin on January 30, 2006, is open to academia, research institutions, and companies.

SUMMARY OF GRAND CHALLENGE PERFORMANCE

Participants in FRGC submitted raw similarity scores to FRGC organizers on January 14, 2005 (for a detailed description of the FRGC challenge problem, data, and experiments, see Phillips et al., 2005). The experiments in FRGC ver2.0 are designed to advance face recognition in general, with an emphasis on 3-D and high-resolution still imagery. Ver2.0 consists of six experiments.

Experiments

Experiment 1 measures performance on the classic face-recognition problem—recognition of faces from frontal images taken under controlled illumination. To encourage the development of high-resolution recognition, all controlled still images in this experiment are high resolution. In biometric evaluations, the set of images known to a system is called the target set, and the set of unknown images presented to a system is called the query set. In Experiment 1, the biometric samples in the target and query sets consist of a single, controlled still image.
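
The sketch below shows the shape of the data a participant produces for an experiment like this: a full similarity matrix scoring every query sample against every target sample. Cosine similarity over feature vectors stands in for a real matcher here; FRGC participants computed scores with their own algorithms.

```python
import numpy as np

def similarity_matrix(query_feats, target_feats):
    """All-pairs similarity between query and target samples.

    query_feats:  (n_queries, d) feature vectors for unknown images
    target_feats: (n_targets, d) feature vectors for known images
    Entry [i, j] scores query i against target j.
    """
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    t = target_feats / np.linalg.norm(target_feats, axis=1, keepdims=True)
    return q @ t.T  # cosine similarity, an illustrative choice
```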

Experiment 2 is designed to examine the effect on performance of multiple still images. In this experiment, each biometric sample consists of four controlled images of a person taken in a subject session.
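
One simple way to score a pair of such multi-image samples, sketched below, is to fuse the 4 × 4 pairwise scores between their images; averaging is an illustrative rule, and participants were free to fuse scores however they chose.

```python
import numpy as np

def multi_still_score(pairwise):
    """Fuse pairwise scores between two four-image biometric samples.

    pairwise: (4, 4) matrix of single-image scores between the query
    sample's images and the target sample's images.
    """
    return float(np.mean(pairwise))  # simple average; one rule among many
```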

Experiments 3, 5, and 6 look at different potential implementations of 3-D face recognition. Experiment 3 measures performance when both the target and query images are 3-D. There are three versions of Experiment 3. The main version compares both the shape and texture channels of the 3-D images, Experiment 3t compares just the texture channels, and Experiment 3s compares just the shape channels. In all versions of Experiment 3, the target and query sets consist of 3-D facial images. One potential scenario for 3-D face recognition is that the target images are 3-D and the query images are two-dimensional (2-D) still images.
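
For the three versions of Experiment 3, a hypothetical per-channel fusion rule is sketched below: a weighted sum of shape- and texture-channel scores, where the weight selects shape only (3s), texture only (3t), or a blend (the main version). FRGC prescribed no particular fusion method.

```python
def fused_3d_score(shape_score, texture_score, w=0.5):
    """Combine shape- and texture-channel similarity scores.

    w=1.0 uses shape only (Experiment 3s); w=0.0 uses texture only
    (Experiment 3t); intermediate weights blend both channels, as in
    the main version of Experiment 3. The weighted sum is only an
    illustrative rule.
    """
    return w * shape_score + (1.0 - w) * texture_score
```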

Experiment 4 is designed to measure progress on recognition from uncontrolled frontal still images. In this experiment, the target set consists of single controlled still images, and the query set consists of a single uncontrolled still image.

Experiment 5 examines this scenario with controlled query images, and Experiment 6 examines it with uncontrolled query images. In both experiments, the target set consists of 3-D images.

Table 1 and Figure 1 summarize our results. Table 1 shows the number of similarity matrices analyzed for each experiment. The bar graph in Figure 1 summarizes performance for each experiment by the verification rate at a FAR of 0.001 (the vertical axis). Three statistics are reported for each experiment: the performance of the baseline algorithm (left bar), the median performance over submitted results (center bar), and the best performance among the submitted similarity matrices (right bar). For Experiments 5 and 6, no baseline algorithm was provided and only one result was submitted; that single result is reported.
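
The statistic on the vertical axis can be computed as sketched below: set the decision threshold from the impostor scores so the FAR equals the target, then measure the fraction of genuine scores that pass. This is a standard construction; the exact thresholding used in the FRGC analysis may differ in detail.

```python
import numpy as np

def vr_at_far(genuine, impostor, far_target=0.001):
    """Verification rate at a fixed false accept rate.

    The threshold is the (1 - far_target) quantile of the impostor
    scores, so approximately far_target of impostors are accepted.
    """
    threshold = np.quantile(impostor, 1.0 - far_target)
    return float(np.mean(np.asarray(genuine) >= threshold))
```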

The maximum score for Experiment 1 was 99 percent, and the median was 91 percent. The comparable scores for Experiment 2 were 99.9 percent and 99.9 percent.

TABLE 1 Number of Results Submitted for Each Experiment

Experiment           1    2    3    3t   3s   4    5    6
Number of results    17   11   10   4    5    12   1    1


FIGURE 1 Summary of performance results for FRGC Experiments 1, 2, 3, 3t, 3s, 4, 5, and 6.

Because FRGC is a challenge problem and the results are based on raw similarity scores submitted by participating groups, the results do not conclusively show that the performance goals of FRGC have been met. However, they do provide evidence that the goals are likely to be met. The difference in performance between the results for Experiments 1 and 2, especially the median scores, indicates that having multiple still images of a person can potentially improve performance.

FRGC marks the first time a large set of 3-D facial imagery has been made available. The maximum score of 97 percent for Experiment 3 shows the potential of using 3-D facial imagery for face recognition. The results for Experiment 3 were obtained only three months after the first release of a large 3-D data set. By comparison, the results for still images are based on more than a decade of intensive research after the release of the first large still-image data sets.


CONCLUSION

The FERET, FRVT, and FRGC projects and evaluations have been instrumental in advancing automatic face-recognition technology. Prior to FERET, it was not possible to make direct, objective comparisons of the effectiveness of competing algorithms. FRVT 2002, which reported performance rates on a large data set of operational images, serves as the baseline for measuring progress under FRGC. FRGC is facilitating the development of the next generation of face-recognition technology, and progress under FRGC will be measured by FRVT 2006.

REFERENCES

Blanz, V., and T. Vetter. 1999. A Morphable Model for the Synthesis of 3D Faces. Pp. 187–194 in SIGGRAPH ’99, Proceedings of the Annual Conference on Computer Graphics, August 8–13, 1999, Los Angeles, California. New York: Association for Computing Machinery.


Phillips, P.J., H. Wechsler, J. Huang, and P. Rauss. 1998. The FERET database and evaluation procedure for face-recognition algorithms. Image and Vision Computing 16(5): 295–306.

Phillips, P.J., H. Moon, S. Rizvi, and P. Rauss. 2000. The FERET evaluation methodology for face-recognition algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(10): 1090–1104.

Phillips, P.J., P.J. Grother, R.J. Michaels, D.M. Blackburn, E. Tabassi, and J.M. Bone. 2003. Face Recognition Vendor Test 2002: Evaluation Report. Technical Report NISTIR 6965. Gaithersburg, Md.: National Institute of Standards and Technology. Available online at: http://www.frvt.org.

Phillips, P.J., P.J. Flynn, T. Scruggs, K.W. Bowyer, J. Chang, K. Hoffman, J. Marques, J. Min, and W. Worek. 2005. Overview of the Face Recognition Grand Challenge. Pp. 947–954 in IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York: IEEE.
