INTRODUCTION

DR. NORWOOD: I would like to welcome you all. We have a purpose today. Our purpose, after a brief update, which we will get from John Thompson, about the fact that the census is still alive and running and, from what I hear, doing well, is to have a very careful review of the design for the Accuracy and Coverage Evaluation Survey, A.C.E.

This is not going to be a workshop to review dual-systems estimation. That is not because we do not realize that that is related very much to A.C.E., but rather because this panel plans to have a separate workshop just on dual-systems estimation late this year or in January. At that meeting, we want to be certain that we have people representing all sides of that issue. As you all know, in 1990 at least, there were people who were very strongly for adjustment through dual-systems estimation and there were many people who were professionally very much opposed to it. I would like to have a discussion of dual-systems estimation with both perspectives available to the panel. We intend to do that so that we can have a technical discussion with people of all views presented. We hope to do that in mid-January.1

Today, insofar as possible, I would like to keep the discussion to the specific elements of the A.C.E. design. I should tell you that I am not naïve enough to believe that I should forget all that I have learned about the fact that when you look at the design of the survey, you have to think about its uses. I am quite aware of that. But I do feel that we need to have a very careful review of dual-systems estimation and that, in order to do that, we need to have a group of people who are critical of it, as well as a group of people who favor it. I think we will have ample time to do that in January.

Today we have invited a group of people who are skilled in survey and sample design. What we plan to do is, first, hear from John Thompson, who will give us an update on where they are. I am very pleased to see that John seems relaxed and comfortable. He is the man with all of the difficult responsibilities of seeing that all these pieces fit together and that the Census Bureau acquits itself well.

UPDATE ON CENSUS 2000

MR. THOMPSON: Let me just hit a few of the highlights of where we are right now. I will start with the decennial budget. We are operating under a continuing resolution. Thanks to bipartisan support in Congress, we have the money we need to keep operating. So, as last year, thanks to the Congress for understanding the census and realizing its importance and making sure that we have the money that we need to keep operating.

A couple of reports have come out recently from the U.S. General Accounting Office. They are available. One is a review of our Local Update of Census Addresses (LUCA) process. I will talk about that in a little bit. We also spent the summer with some General Accounting Office auditors. They went over our 2000 budget in great detail.

1. This workshop was held in February 2000 (see National Research Council, 2001b).



Their report is also out. It documents the budget fairly well. It also describes what is in it. It is a very interesting report, and I recommend it.

We have had a couple of hearings recently at which Kenneth Prewitt testified. One was on how we tabulate data for Puerto Rico. The other one was on the LUCA program.

The Census Monitoring Board has also issued two reports recently. The first one was a joint report that both the presidential and congressional side issued on advertising. It is also a pretty interesting report. Another one came out on the congressional side, discussing the statistics of the 1990 Post-Enumeration Survey [PES] block-level data.

We have issued a decision memo recently, which would be of interest to everyone, I believe. We have described how we are going to tabulate the data for purposes of redistricting under Public Law 94-171. Basically, we have looked at how we tabulate racial data. This is the first time in the census that we are allowing respondents to report more than one race. We had some initial ways to tabulate the data in the dress rehearsal. Basically, we looked at that, our users looked at it, the Department of Justice looked at it, and we have come to the conclusion that the best way for us to support the needs of the country is to provide the multiracial data as collected. There are basically 63 different ways respondents can report race in census 2000. We are going to tabulate those at the block level and higher, so that the users of the redistricting data will have the data as reported, and that will meet the needs of everyone.

We have just finished a big year for the address list. We call fiscal year 1999 the year of the address list. We have done a lot of work on developing the address list. We talked about this a little bit before. We started in the fall, where we listed all the addresses in rural areas. We continued in the winter and spring by doing what we call the 100 percent block canvass, where we took our city-style addresses and went over the ground with our own people. We also allowed state and local governments to review the address list. This was our Local Update of Census Addresses program. We did that for both city-style and non-city-style areas. We have gotten very good participation. We put all of the results of the address list together and have prepared a computer file that we are using to address our questionnaires.

We are basically finishing our address work with the final stages of LUCA. We got addresses from local governments, we matched them to our files, and we are now feeding back to them the results of the addresses that we cannot verify. The next stage will be for the local governments to appeal, if they desire. We have finished all the process for the rural areas, the non-city-style address areas. The local governments are receiving their feedback. We are in the final stages of doing field reconciliation for the city-style addresses. We will finish that next week, and then we will be starting to feed back to the city-style governments the results of what we did with the addresses, so they can decide whether they want to appeal or not.

We are opening up our data-capture centers. We opened Baltimore in June. We opened up the National Processing Center in the Jeffersonville area. We are opening up the Pomona, California, site next week, and then, in November, we are opening up our Phoenix site.

All that is going extremely well. The sites are functioning. We are doing an operational test in the Baltimore site. We are waiting to get the results from that. We processed several million questionnaires through Baltimore to look at various aspects of how the system will work.

We are very busy right now printing census questionnaires. Basically, we are printing 24 hours a day, seven days a week. We have 34 printing contracts out there to print over 426 million questionnaires. We have printed about 300 million questionnaires, and, as I said, we have started addressing the questionnaires that we are going to mail or deliver. That process has started.

We have had some experience so far with recruiting and hiring. We have hired about 141,000 temporary persons, mostly for address-list development. We are very happy that we have hired over 5,000 welfare-to-work people. That exceeds our goal.

Our promotion outreach program is also under way. We have gotten over 7,000 complete-count committees formed. A complete-count committee is a local government with some partnering local organizations that will agree to work with us to promote the census locally. We are very pleased that we have 7,000 already. We have also gotten over 29,000 regional partners. These are local organizations that have signed up with the Census Bureau to help us promote the census at a local level. We are very pleased with that. We have also hired over 600 of our total 642 staff we are calling partnership specialists, who will be out there in the communities working with these organizations.

The Census to Schools Program is well under way. We mailed out over 900,000 invitations for teachers to participate in the program. We have gotten back over 300,000 requests for materials. We are very pleased with that as well.

That is basically a synopsis of where the census is. Right now we are on schedule. We are in a little bit of a lull right now. We are opening up our local census offices. We are finishing up the address-list work. We are getting ready for the next big stage, which will be the mail-out and recruitment for nonresponse follow-up.

DR. NORWOOD: We will move on now to our workshop. But before I ask Howard to begin his presentation, I am going to take my prerogative as chair and say a couple of things that have been bothering me a great deal. I think it is important for us, as we begin a workshop on A.C.E., to recognize that everyone in this city is running around worrying about the political uses of the census. I recognize that that is extremely important. But I would point out to you all that today, as we look at the design of A.C.E., we should recognize that there are a lot of other uses for A.C.E., quite apart from whether you adjust or do not adjust. Much of the discussion really ought to focus on the fact that even when there are not problems found, there are a lot of uses—trying, first of all, to know where we are—and many of the uses do not get down to the block level. There are many uses of the census which are national in scope, which are regional, which are state, and then, within states, a variety of different kinds of configurations—including, of course, election districts. Census data are used for program allocation at all levels of government. I hope I can put in a plug for the fact that they are also used for analysis of where this country has been and projections of where it is heading.

So as we consider the issues that are going to be discussed today, I think we should keep in mind a very broad framework for the uses of the census during the decade. These are the uses that are always overlooked, because people in this country tend always to focus on the particular political issue of the day. Important as that is, having spent a good bit of time in the federal statistical system, I can tell you that there are a lot of other uses of the census data that, in my view, are, in the last analysis, equally important. So let us try to have a broad perspective of where we are heading.

Howard Hogan is going to present to us the current plans for this survey [A.C.E.]. For each of the topics, we have invited some guests to comment and give us their views. Our panel members always have something to say, and we will all participate.

Before Howard begins, I would like to take the opportunity to say that I have spent a lot of time looking at the materials the Census Bureau has provided. I am delighted, really, and a little surprised, that we have received so much so quickly, and in such detailed form. I have worked with the Census Bureau a long time, and I do feel that all of the people at the Census Bureau should be commended for having provided us with as much information as possible, at a stage when, knowing how statistical agencies in general operate, it is very difficult for them to do this. I want to thank Rajendra Singh for his work in liaison with us and for seeing to it that everything happened quickly and on time. I want to thank John for seeing to it that we had the material we needed, and Jay Waite and everybody else. Howard, I know (because I have been hearing about all this on a daily basis), has done an enormous amount. I think we should recognize that this workshop is, in many ways, extremely important, because we have a lot of information, and will have—you will hear it all presented—and I do think the Census Bureau deserves commendation for having been as cooperative as it has been. Many of you who know me know that that is quite a statement from me.

OVERVIEW OF A.C.E.

DR. HOGAN: Many of the people who wrote the background materials are in the audience, so you were talking to the people who did the real work. I will convey your message to the ones who could not make it here today. I want to begin by thanking you and the panel and the discussants and guests and the CNSTAT staff and the Census Bureau staff for coming. Looking over the agenda, I had two emotions, one of which was fright and the other of which was to feel extremely flattered that anybody could possibly look at this agenda and still show up. Thank you very much.

In timing these discussions and choosing topics, we have tried to get a delicate balance between having some real results to present and some substance to discuss and, on the other hand, getting to the point where everything is all sewn up and there is nothing left to discuss. I think Janet and the panel will find us pretty well in that process where we have a lot of stuff here, but we are still at a stage where it will be useful to hear from the panel and get their comments and their insights.

The first part of the agenda is an overview of status and plans. I am going to use this as an opportunity to review all sorts of things. For the panel members, who have been keeping up with this and hearing this often, this will be a review, but there are other people in the audience for whom I think this probably is worthwhile, to sort of set the stage for our more detailed discussions later.

Where are we in terms of the Accuracy and Coverage Evaluation Survey? First, we have drawn what we call the listing sample. That sample was designed before the Supreme Court decision, back when we thought we were going to take A.C.E., or Integrated Coverage Measurement (ICM) in those days, to 750,000 housing units. So it is a very large sample. Indeed, to support a sample of 750,000 housing units, we would actually do a sample of close to 2 million housing units. That sample was drawn last summer. It has essentially a few strata worth mentioning here at the beginning, sort of general strata, based on the size of the blocks—that is, blocks that have more than two housing units [in the A.C.E. listing] or that have more than two housing units listed by the census Master Address File [MAF]. That is sort of the general sample. We divide that into two groups of medium blocks, 2 to 30, and large blocks, 30 and above. (I do not know if I got that cutoff exactly right.) Then we have also a stratum of small blocks, blocks with zero, one, or two housing units in them, and a stratum of blocks on American Indian reservations.

That sample was sorted out last summer, based on the address files that existed at the beginning of the census. We also sorted that sample within states, based on 1990 demography, the most recent we have, to make sure the sample is spread out proportionately within the states. We allocated the sample to the states based on our plans for the ICM, which really was the sample we had designed for supporting state estimates, and drew the sample. Then we printed our maps, printed our listing books, hired and trained interviewers, and sent them out in the field. We have about 5,000 interviewers out in the field. We started address listing in the beginning of September. It is going as well as anybody who has ever run a real survey can expect. If you heard nothing amiss, then you would know they simply were not telling you what was going on. It is going pretty well, we think.

We also have hired and trained our matching technicians at the Jeffersonville National Processing Center. All the A.C.E. matching will be done in one location. We do not talk about it, but that is a huge advantage, made possible because of computerization of the census. We have hired about 50 technicians, and we are training them. We will be training them—we started in September—all the way through the end of the process. This will be a core staff. At each stage of the matching—and we have many stages that I will talk to you about in a minute—we have essentially computer matching, followed by clerical, followed by having technicians doing quality control and problem cases. We have about eight permanent census people out in Jeffersonville that have been matching, some of them, for 30 years, who handle, basically, pathological cases. The technicians are, in a very real sense, the core of this, because they are the quality control of the large body of clerks. It is an excellent group. Many of them have done matching, either in 1990 or our various dress rehearsals. We are quite pleased with our recruitment of those people.
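
[Editor's note: a minimal Python sketch of the listing-sample stratification Dr. Hogan describes above. The exact cutoffs and field names are illustrative assumptions (Hogan himself notes he may not have the medium/large cutoff exactly right), not the Bureau's specification.]

```python
# Hypothetical sketch of assigning blocks to A.C.E. listing-sample strata.
# Field names and cutoffs are assumptions for illustration only.

def listing_stratum(block):
    """Return a listing stratum for a block based on housing-unit counts."""
    if block["american_indian_reservation"]:
        return "reservation"
    # "More than two housing units" on either the A.C.E. listing or the census MAF
    # puts a block in the general (medium/large) sample.
    hus = max(block["ace_listing_hus"], block["census_maf_hus"])
    if hus <= 2:
        return "small"    # zero, one, or two housing units
    if hus < 30:
        return "medium"   # roughly 3 to 30 housing units
    return "large"        # roughly 30 and above

blocks = [
    {"ace_listing_hus": 1,   "census_maf_hus": 0,   "american_indian_reservation": False},
    {"ace_listing_hus": 18,  "census_maf_hus": 22,  "american_indian_reservation": False},
    {"ace_listing_hus": 450, "census_maf_hus": 510, "american_indian_reservation": False},
]
print([listing_stratum(b) for b in blocks])  # ['small', 'medium', 'large']
```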

We have also designed the methodology to allocate the interviewing sample to the states. We discussed this with the panel, I think, last time. We have to reduce our sample to support 750,000 down to a sample to support 300,000 housing units. We have developed a methodology to allocate that to the states. Based at least in part on some of the ideas the panel recommended, we have implemented the panel's suggestion of a minimum state sample size of 1,800 housing units, except in Hawaii, where it is about 3,700 housing units. The reason for that is to try to get enough Hawaiians in the race group. Essentially, we assumed proportional allocation within states and simulated various designs to give us measures of reliability when aggregated up to the state. That sample has been drawn. We are now ready to move on.

The upcoming operations—and this will, again, be important for some of the things we are going to discuss today—we are going to have the listing for A.C.E. going out with the block maps and listing the housing units. Then we are going to have the block-sample reduction. I will talk in a moment about the various kinds of sample reduction, the various steps to get from the 2 million housing units down to the 300,000. Then we will do housing unit matching. That will be done in the early winter. That has several stages—before follow-up, after follow-up. Each one has a quality assurance operation on it, so it is a multi-step process. Then we will do large block sub-sampling. Then right after those, when the census returns start coming back, we will be actually doing personal interviewing—telephone interviewing for a handful of cases, and then personal visit interviewing. Then we will have person matching, person follow-up, after-follow-up person matching, missing data estimation. I think that is on the agenda for later in the day—what we call the population sample [P-sample], the sample designed to figure out who was missed, and the enumeration sample [E-sample], the sample designed to see whether census records are correct or not. Then we will finally get to the topic of the upcoming workshop, the actual dual-systems estimator and dual-systems estimation. Then finally, we will be carrying that down and adjusting the actual data file. So those are the steps.

The sequence of the steps is important for the way we handle the sample and the sampling issues. Essentially, in getting from the 2 million we have listed to the 300,000 that we are going to interview, we have three steps. One is what we call block-sample reduction. It is sort of an arbitrary term. It helps us keep our words straight from other stages. That is pretty much what we are going to be discussing today when we get to this afternoon on remaining issues for A.C.E. sample design. We have too many clusters, because we selected them for a much larger number of housing units. Which ones to keep and which ones not to keep—that is a new operation made necessary because of the new design following the Supreme Court decision. That operation has the advantage of doing this in two steps rather than one. We were sort of forced into the one because of the timing, when we had to draw our samples, print them out, hire the interviewers. But, in addition, we drew the listing sample back in June. As John said, the census has gone on and updated their address list and has more recent information about how many housing units in these blocks are on the decennial Master Address File.
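
[Editor's note: a rough Python sketch of the kind of state allocation Dr. Hogan describes above: a proportional allocation of roughly 300,000 housing units to states, with a floor of 1,800 housing units per state and about 3,700 for Hawaii. The rescaling mechanics and the toy numbers are assumptions for exposition, not the Bureau's actual procedure.]

```python
# Illustrative proportional-with-floors allocation; not the Bureau's method.

def allocate_to_states(state_hus, total=300_000, floor=1_800, special_floors=None):
    """Allocate `total` sample housing units proportionally, honoring per-state floors."""
    special_floors = special_floors or {}
    floors = {s: special_floors.get(s, floor) for s in state_hus}
    grand = sum(state_hus.values())
    alloc = {s: max(floors[s], round(total * h / grand)) for s, h in state_hus.items()}
    # States held at their floor keep it; redistribute the rest proportionally.
    at_floor = {s for s in alloc if alloc[s] == floors[s]}
    remaining = total - sum(alloc[s] for s in at_floor)
    free = sum(state_hus[s] for s in alloc if s not in at_floor)
    for s in alloc:
        if s not in at_floor:
            alloc[s] = max(floors[s], round(remaining * state_hus[s] / free))
    return alloc

# Toy example with three "states" and a reduced total:
print(allocate_to_states({"CA": 11_000_000, "WY": 200_000, "HI": 450_000},
                         total=20_000, special_floors={"HI": 3_700}))
```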

In addition, as I mentioned, we have gone out and done our own listing. So now we have two pieces of information on all of our sample blocks. We have the updated census MAF, more accurate than was available initially, and the A.C.E. address file. Of course, we have the difference between those. When we decide how to reduce the sample, that is a very important piece of information.

We also have a stage of sample reduction that we have long planned. We did this in 1990. That is the large block sub-sample. Try to keep that separate from the block-sample reduction. This is where you go out and get blocks of 500 to 1000, 2000 housing units, for all sorts of reasons. We do not want to go out and interview 500 or 1000 housing units, so we are going to sub-sample that down to 30 housing units in a cluster. We do that after the housing-unit matching, because we want to segment and sub-sample these large blocks in a way that the housing units that stay in from the population side, our independent A.C.E. listing side, overlap with the housing units that we retain on the census side. As in our nonsub-sample blocks, we have the same housing units from the population side and the census side, so we can match easily and resolve easily. When we sub-sample these large blocks, we want to retain the same segments in both sides. We do that after we have done the housing-unit matching.

Finally, we have our second stage of the small block sampling. There are just millions of blocks out there with zero, one, two housing units. They are very expensive to list or interview. We have done two things this time, one new. When possible, we have associated the small blocks with a medium or large block, thus cutting down the small-block universe at very little additional cost. The universe left over of these small blocks is now smaller than it has been in the past. But our methodology of handling that is unchanged. We select a large sample. We go out and list. Many of them will have nothing there. Some will have one or two. Some will have 100; some will have 500. As I said, we drew our sample based on the best information we had back in June. Using that information, we will then take a second-stage sample. Essentially, in our sampling, we have the three stages, the block-sample reduction, the large block sub-sample, and the second stage of the small block sample. That will get us from our big sample to our small sample—well, 300,000 is not exactly a small sample, but if you had been looking at 750,000, you would think it was small.

Sampling is one stage, but then we have the other topic of today, which is how we define our post-strata. With the sampling, what we need to do—we have already allocated, in terms of the block-sample reduction, to the states—we now need to figure out how to allocate that within the states. We would like to retain adequate sample to make sure we can support good post-strata. It is virtually certain that amongst our post-stratification variables will be some sort of race variable. So in allocating the sample within state, we want to take into account a couple of things, one of which is the racial makeup of the blocks within state to make sure we have an adequate representation of the various groups. Unfortunately, our most recent information, as I will remind you several times, is the 1990 census. So we have to go with that.

Second, we now have more information. We can differentiate between blocks where the A.C.E. lists more addresses than the census and blocks where the census lists more than the A.C.E. We have a lot of information on the housing units in the block that we did not have initially. We can take that into account to select better measures of size in a traditional sample kind of context. Also, if the census is a lot bigger than the A.C.E., that might be an indication of either a coverage problem or a geocoding problem that would have a huge variance implication. If the A.C.E. is much bigger than the census, clearly that would also have huge variance implications. So in allocating the sample within the states, we are looking at how to take into account our demographic information and how to take into account these new measures of size. That is the topic of this afternoon.
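
[Editor's note: a small Python sketch of the large-block sub-sampling idea described above: after housing-unit matching, a big block is cut into segments and the same randomly selected segments are retained on both the P-sample (A.C.E. listing) side and the E-sample (census) side, so matching stays within retained segments. The data structures are illustrative assumptions.]

```python
# Hypothetical large-block sub-sampling that keeps identical segments on both sides.
import random

def subsample_large_block(p_units, e_units, n_keep_segments, seed=0):
    """p_units/e_units: dicts with a 'segment' key assigned after housing-unit matching."""
    segments = sorted({u["segment"] for u in p_units} | {u["segment"] for u in e_units})
    rng = random.Random(seed)
    kept = set(rng.sample(segments, min(n_keep_segments, len(segments))))
    p_kept = [u for u in p_units if u["segment"] in kept]
    e_kept = [u for u in e_units if u["segment"] in kept]
    return p_kept, e_kept

# Toy block of ~500 units split into 10-unit segments, cut down toward ~30 units:
p_side = [{"id": i, "segment": i // 10} for i in range(500)]
e_side = [{"id": i, "segment": i // 10} for i in range(480)]
p_kept, e_kept = subsample_large_block(p_side, e_side, n_keep_segments=3)
print(len(p_kept), len(e_kept))  # roughly 30 units retained on each side
```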

The next stage—and we will be spending, I think, most of the morning on this—is how we define our post-strata. That will be very important in our design. The post-strata serve essentially two purposes. One, that is how we form the dual-systems estimator. For that we want sort of homogeneous capture probabilities. We are also bringing in the E-sample, the probability of the census record being miscoded or being erroneous. We would like that to be uniform within post-strata and as different between post-strata as we can. But, in addition, we use the post-strata for the carrying down, for the distribution of the sample to the small areas. So our choice of post-strata is very important, first in terms of the DSE [dual-systems estimate], correlation bias kinds of arguments, but also in the ability to set coverage patterns of the local areas. We are spending a lot of time on this and are certainly seeking advice from a number of groups, including this one, on the best way of going about that.

The strategy that we have been using so far is—and we are really going back in history now—if you remember, in 1990, the first set of estimates that we cranked out had 1,392 post-strata. But after the dust had settled and we looked at it, we came up with the set of post-strata that we have been using for our intercensal work, including for the controls to the CPS [Current Population Survey], which we refer to as the 357 post-strata design. That is the one, probably, that people are most familiar with, post-stratifying on race, tenure, age, sex, region of the country, and three measures of size. That is the 357. We developed that around 1992. It seems to have withstood the test of time. People understand it; we understand it. So that is sort of where we have been going in a lot of our thinking. Just to keep your numbers straight, if you take 357 post-strata divided by the seven age/sex groups, I think you get 51 post-strata groups. For the dual-systems estimate, the age/sex is fairly important, because it makes things a little more homogeneous. Since almost all areas tend to have males and females, old and young, the age/sex has very little predictive ability in carrying down the estimates. So in a lot of our research we focus on the 51 groups, as opposed to the 357.

Beginning with the variables [used for the 357 post-strata] I have just mentioned, we threw in a wide possibility of other variables and asked all sorts of people, “What is your favorite variable,” in a very broad exploratory kind of approach. I will be talking about that, including some variables that not everybody agreed on, but we thought we would throw them in and see what happens.
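
[Editor's note: for readers new to the terminology, the dual-systems estimator Dr. Hogan refers to has, within a post-stratum, the standard capture-recapture form sketched below. The notation is a generic textbook version offered only as orientation; the Bureau's exact estimator is the subject of the later workshop. The 357/51 arithmetic follows the text above.]

```latex
% Generic dual-systems (capture-recapture) form for one post-stratum j;
% illustrative notation, not the Bureau's precise specification.
\[
  \widehat{N}_j \;=\; \mathrm{CE}_j \times \frac{P_j}{M_j}
\]
% CE_j: estimated correct census enumerations in post-stratum j (from the E-sample)
% P_j : estimated P-sample population in post-stratum j
% M_j : estimated P-sample persons matched to census enumerations
%
% 1990-style design arithmetic: 357 post-strata / 7 age-sex groups = 51 post-strata groups.
```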

We ran some regressions on 1990 to see how well any of these predict, and we studied their properties in terms of not just ability to predict the undercount, but also their consistency and usability for use in post-stratification. After we had done this for a while, we came up with a set of post-stratification variables that looked reasonable, and we are going to start simulating them. We selected some candidate post-stratifications, and we are going to simulate them, based on, again, 1990 data. For each of our designs, we are going to compute a predicted value, map that back to the fifty-one 1990 post-strata groups—in the future, we would map that back to the state or the city or something—and calculate the variance contributions, trying to get a feel for the synthetic bias. Obviously, that is an exceedingly difficult task. We cannot predict the synthetic bias, but we can perhaps scale it so that we make adequate allowance for it. Then we will estimate, in a very broad sense of the word, the mean squared error and variance for various proposals on the table.

That work and a lot of the work I will be talking about today is based on the 1990 census, some of it based on the dress-rehearsal data. But then in all this work, we have to translate what we have learned from 1990 to what we might expect in 2000. Something that might predict very well in 1990 may not predict as well in 2000. It is a different census. We have made improvements in a number of areas, which John will be happy to tell you about. We have an advertising campaign, our “Be Counted” forms, other things. So it would be hubris, at best, to assume that the five variables that had the highest correlation coefficients in 1990 would be the best variables for 2000. Things are different. We have to think about what we can infer from 1990, but also what we know about 2000 in making our choices in post-stratification variables. Then we will have our post-strata.

The other research that is going on: We need to work on missing data for the population sample and the census E-sample. In 1990, we had a logistic regression hierarchical model, done by Tom Belin and Greg Diffendal, among other people. But when we went to the 51 separate estimates that we used for the ICM—in the ICM, each state was going to stand alone—we did not think we could support 51 logistic models. We would have to have 51 teams of highly talented statisticians. (I know Alan Zaslavsky does the work of five, but that still leaves us 46 short.) So we went to a much simpler model, a basic ratio estimate model. When we went from the 51 to our current design, the design where we can share information across states—we now have some choices that we want to research: Can and should we go back towards the logistic regression? What does that gain us? Even if we stay with our ratio estimator, we certainly can support more variables now, more slices, more cells. Which are the most important ones to slice? There are some other issues that we can now think about that we could not under the ICM design. So we have some research going on in that. I am happy to say we now have Tom Belin back working with us.

We have some other research, which we can talk about later. We have decided, for the 100 percent data file, that we will only add person records. The discussion we had at the Census Bureau a year or two ago, where we would add families and households to the 100 percent data—it is only person records.
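
[Editor's note: a highly simplified Python sketch of the evaluation loop Dr. Hogan outlines above for candidate post-stratifications: compute cell-level coverage-correction factors from 1990-style data, carry them down synthetically, and summarize error against evaluation groups such as the fifty-one 1990 post-strata groups. The record fields, the factor definition, and the error summary are assumptions for illustration only.]

```python
# Hypothetical comparison of candidate post-stratifications on historical data.
from collections import defaultdict

def coverage_factors(records, cell_of):
    """Cell factor = (DSE-style estimated true count) / (census count), per cell."""
    true_tot, census_tot = defaultdict(float), defaultdict(float)
    for r in records:
        c = cell_of(r)
        true_tot[c] += r["estimated_true"]
        census_tot[c] += r["census_count"]
    return {c: true_tot[c] / census_tot[c] for c in census_tot if census_tot[c] > 0}

def synthetic_mse(records, cell_of, eval_group_of):
    """Squared error of synthetic estimates aggregated to evaluation groups
    (e.g., the fifty-one 1990 post-strata groups, or states)."""
    factors = coverage_factors(records, cell_of)
    est, truth = defaultdict(float), defaultdict(float)
    for r in records:
        g = eval_group_of(r)
        est[g] += r["census_count"] * factors.get(cell_of(r), 1.0)
        truth[g] += r["estimated_true"]
    return sum((est[g] - truth[g]) ** 2 for g in truth) / len(truth)

# Candidate designs are just different cell-assignment functions, for example:
candidate_a = lambda r: (r["race"], r["tenure"])
candidate_b = lambda r: (r["race"], r["tenure"], r["mailback_tercile"])
```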

However, when we get to the sample data, by which I mean long-form sample—we always have to rake it 10 ways; John was one of the world’s experts on that—that is where we are going to bring in the results not only of the dual-systems estimate I talked about this morning, but we also have the housing-unit-coverage study, trying to figure out the coverage of housing units using a dual-systems methodology. So when we get to the sample data, we will have the results of both the housing-unit coverage and the person coverage. We will try to bring all of that together.

Finally—this has more to do with our DSE estimator—we are working on some research that you have been given on defining the search area, how far around the sample block we look to determine whether someone was counted or not counted. The flip side of that is, how far from the correct block can a census enumeration be and still be considered a correct enumeration? We had some rules in 1990 that were fairly expansive. We are looking at ways of doing it only where it is necessary, but doing it in a way that minimizes variance and bias.

There are some other topics, but I think that is everything that I wanted to say in the time that I have been given.

DR. NORWOOD: You have heard a quick but quite good overview of all of the pieces of this. What we are going to do is go into detail on several of them.

DEFINITION OF POST-STRATIFICATION

DR. HOGAN: The post-stratification plays two roles, how we are going to form our dual-systems estimates, probabilities in terms of capture probabilities and erroneous enumeration probabilities. It is also how we carry down the estimates. We want a similar overall coverage rate. That is really what we are looking for in terms of our post-strata. We use race or age or other things as tags or markers to try to predict these coverage probabilities. But to the extent we are able to predict them, then our estimators will be accurate for all other dimensions. So I think it is less of a concern with this group. Some other groups get our post-stratification for the A.C.E. mixed up with some questions about how we tabulate the census or what groups are important in American society. Our goal is really to group together people who have similar experience.

As we began looking at the A.C.E. post-stratification, essentially, our first step was rejecting the state boundaries that had prescribed the post-strata for the ICM. For the ICM, we were going to develop 51 separate state estimates. We set that aside in looking at our A.C.E. post-strata because we did not feel that state boundaries carried any real information in terms of chances of being counted in the census, response to the census, linguistic isolation—anything that would relate directly to coverage probabilities. So we set that aside and went back to the 1990 357 groupings. As I mentioned earlier, we took that group, expanded it, and did some exploratory work using logistic regression, seeing which variables predicted capture probabilities. Then we looked at some other properties of the variables and got them down to a handful.

Now we have started simulating the properties of various post-stratification approaches using essentially 1990 data, trying to calibrate the predicted variance for 2000 and also get some handle on the predicted bias for 2000.

As I said, we began by casting our net rather widely, including some things that are a little bit different than things we have tried before. I assume the panel members have their notebooks. (I will do this like I do a graduate seminar. I am used to having students who are far brighter than me, so that will not bother me.) In Q-9 [Haines, 1999b], we go over some of the variables. The first variable, which will be familiar to many of you, is the race variable. These are the categories we used for 1990, which were analyzed in the 1990 results here as part of this exploratory work. For example, you will see Asian and Pacific Islanders as one group because we are working with 1990. This is part of the issue I will be talking about later. We have to translate these into 2000 concepts.

DR. NORWOOD: Howard, may I just interrupt you? You keep saying that you are using 1990, but I assume that when you get to 2000, it is possible that you will use 2000.

DR. HOGAN: Yes. When I say I am using 1990, I should be very clear on this. As I said, we expanded our range of variables we are going to look at. We are going to do some exploring of what we can learn about the properties of those variables. In that exploring, the data that we have available are 1990. However, at the end of this exploratory process, we are going to define a set of 2000 variables using data gathered from 2000. So for any real definition of a post-stratum for the 2000 A.C.E., we will use race as reported in census 2000, age as reported in census 2000. But for this first very preliminary step of exploring the properties, we are stuck with the only data that we have, 1990. For that, we are stuck with the race variables for 1990. I will discuss how we have modified those in dress rehearsal and some of the issues in terms of how we may have to modify them. Race, with the new Office of Management and Budget [OMB] directive, is defined differently in 2000 than it was in 1990. So even if these were the best, we would still have to change them.

Age/sex is defined. Tenure I think you are all familiar with. Household composition is one of those dark horse variables that we threw in to see where it would take us. It is fairly complex. I will not walk you through it. But it is trying, with some ideas that came out of our Population Division, to use relationship to head of household to figure out who within a household is part of the count and who is not. Relationship, very simply, “Are you directly related to the head of the household, or the person in column 1, or not,” again exploring the idea that people who are less directly attached to the person in column 1....

DR. NORWOOD: To the reference person.

DR. HOGAN: Yes, the reference person. The next one—and this is a 1990 variable we are using for exploratory purposes—is urban size. This is what we used in the 357—urbanized areas over 250,000, other urbanized areas, towns and cities, and then non-urban, rural areas. This,

[...]

increase your adjustment factors and get rid of this part of the bias from variability in the capture probabilities. It can be used for movers/non-movers. It can be used for any other covariate that you could measure in A.C.E. that you cannot measure on the short form. That opens up a lot of potential. So that is one thing.

The second has to do with using some lessons that the Bureau has learned from the Living Situation Survey, the LSS. There is a paper by Betsy Martin in the summer Public Opinion Quarterly, where she talks about how the LSS was designed to probe more deeply than the traditional census interview or P-sample interview, and really include on the roster many of the people who are missed in the census—people with transient relationships, for example. There is a lot of probing. If these people are missed by the census and tend to be missed by the A.C.E. as well, they are going to give rise to correlation bias. If you could modify the A.C.E. interview, you might pull in more of these and come up with an improved dual-systems estimator. I am not suggesting that you modify the A.C.E. interview. I understand that the evaluation program is already set. What would be interesting would be, since you have some extra cases in the A.C.E. sample that you are subsampling anyway, to treat these as a separate evaluation sample. I will call it A.C.E.-Star. In this, you use the LSS methods for the A.C.E. interview and see whether you pull together additional people. If you could embed your traditional personal interview within the LSS, then you could see how many people you are adding, household by household. Even so, you are doing this on a probability sample of blocks, so you can still compare the results between the A.C.E.-Star and the usual A.C.E. interview. There are some other details of this, but if you could do this, you would then have a means of seeing whether you could estimate correlation bias at the block-by-block level. That would then provide a means for testing models that Bill Bell has explored for bringing the national estimates of sex ratios down to the local level and would provide some direct evidence also for block-level estimates that you could use for evaluating the accuracy of the census and the DSE at the block level, by coming up with really good counts, at the block level, of whom you found. I am sure there are statistical problems with it, but it is a suggestion.

DR. NORWOOD: Thank you. Alan?

DR. ZASLAVSKY: I will just say a couple of things about estimation because I think I have said what I had to say about the other topics as we went through them. In terms of the unresolved cases, I guess the main message is, try to be as conditional as possible when you do the imputation for them or the estimation of the probabilities for them, which means, for the individuals within households, doing some modeling that will allow you to see whether they have different characteristics that predict different average omission rates.

I think, also, for the household non-interview weighting—you mentioned this and sort of passed over it in about two sentences in the memorandum on that topic—probably a lot more could be said about that in terms of how you define weighting classes. We know that there are household characteristics, structural characteristics of households, which are quite predictive of whether a household is correctly enumerated or erroneously enumerated.
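
[Editor's note: a minimal Python sketch of the weighting-class noninterview adjustment Dr. Zaslavsky is discussing: within classes defined by household characteristics that predict enumeration, the weights of interviewed households are inflated to account for noninterviewed households in the same class. Class definitions and field names are assumptions for illustration.]

```python
# Hypothetical weighting-class noninterview adjustment.
from collections import defaultdict

def noninterview_adjust(households, class_of):
    """Inflate interviewed households' weights within each weighting class."""
    eligible = defaultdict(float)
    interviewed = defaultdict(float)
    for h in households:
        c = class_of(h)
        eligible[c] += h["weight"]
        if h["interviewed"]:
            interviewed[c] += h["weight"]
    adjusted = []
    for h in households:
        if h["interviewed"]:
            factor = eligible[class_of(h)] / interviewed[class_of(h)]
            adjusted.append({**h, "weight": h["weight"] * factor})
    return adjusted

# Classes could cross structural characteristics, e.g. structure type by household size:
class_of = lambda h: (h["structure_type"], min(h["household_size"], 4))
```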

This is different from the issue this morning, which is imputing race or other characteristics. You have a group of cases in the same block, some of which you could resolve on the E-sample side and some of which you could measure whether they get enumerated or not—I am thinking of size of households and things like that—which are some of the things that you considered in your post-stratification. But since we know that a lot of these things that are predictive will not make it into the post-stratification, you could use those in forming the weighting classes for the non-interview adjustment and get better estimates. Again, if the non-interviews are related to some of the same characteristics that are related to census omission or enumeration, then you will get more accurate estimates that way. Those are the points I wanted to make about that topic.

The other topic we did not get to was the extended search, which I think is a great idea. I have some detailed suggestions about it, which I do not think there is any point in going over here. But it is clear that not doing the extended search not only increases the low-level variance from the individual households, but also it puts you in the situation you were in in the last census, where you have these huge omissions that contribute a half a million people to the undercount, and you know that it is wrong and you just cannot do anything about it except on an ad hoc basis. If anything, you might want to go a little further with the extended search idea and have maybe another stratum of really interesting cases, for which you would be willing to go really far to find them. I suspect, if a case is really, really interesting, there is a lot of information there. We are talking about cases where 500 people are outside the search area. There is probably enough information among 500 people to figure out where they really live. So if you can extend that idea a little bit further, to figure out what rules would have picked up some of the worst cases in 1990, you may save yourselves some real problems down the road.

DR. NORWOOD: Thank you. Joe?

MR. WAKSBERG: I am not going to repeat some of the comments I made earlier. Let me pick up a few additional things. First of all, Alan left off of the extended search that you had included a factor of 1.56 of the variance for not doing the full extent of the search. It seems like a big price to pay. In addition, if I understood this memo we got yesterday, there seems to be a bias. You say that simulating the effect of limiting the search areas to block clusters found that the direct DSE of the total population was 1.5 percent higher than the 1990 DSE. If I understand this, you are understating your estimate of undercoverage to the extent of 1.5 percent. That is probably half of the total of the undercoverage. Did I misunderstand something?

DR. HOGAN: I will not say you misunderstood. I think this is one of the drawbacks of doing a lot of our research on 1990. Our research on 1990 has taught us a lot about how to design targeted extended searches and variance properties. But the 1990 data are very limited in terms of what we can infer about the bias properties of our models. I think some of the results that we discuss there—even though that is a very recent memo, we continue to work on that very issue.

There are a number of things about 1990 that do not carry over to 2000, one of which is that we move from the PES-B treatment of movers to the PES-C, and that has some implications about the relative bias of the extended search. In addition—this goes back to one of the 1990 evaluation studies, one of the so-called P studies—they looked on the E-sample side at how well they coded, whether it was correct because it was in the block or correct because it was in a surrounding block, and found out that, since that was not really very important to the 1990 PES design, it was not really done very accurately. So I think some of the stuff in that memo, in terms of directly quantifying the bias of extended search or not-extended search, has to be taken with more than a grain of salt. I think what we learned there in terms of the variance properties, and what we learned there in terms of some of the theoretical issues that we had to think through in building a model, was very important. But there is no reason I can think of that the kinds of models we are dealing with would likely cause this kind of bias. I have not been able to think of a reason, except that the data set from 1990 is limited because of the way we treated movers, and, finally, the coding. We need to continue to think about what we can infer from the 1990 data about probable 2000 biases.

MR. WAKSBERG: I want to echo a point that the other Joe [Sedransk] made before. If you want to talk about regions, think of defining regions differently. For example, it does not make sense to me to include California with Washington, Oregon, Alaska, as compared to including it with, say, Texas, with a high Hispanic population. Certainly a plausible region might be the southwestern states that have high Hispanic populations. You can think of other, similar situations.

DR. SPENCER: They could use 1990 estimates of coverage rates for defining a new kind of region.

MR. WAKSBERG: That is another way of doing it, yes. The classifications that you have for minorities—black and Hispanic, under 10 percent, over 25 percent—just having two categories seems skimpy to me. Maybe for post-stratification, you are more constrained by the number of cells, but for sampling you certainly do not need to think in terms of very gross classifications. You can have much finer ones. You can select systematic samples at the same rate or at different rates.

Bruce earlier sent me a little note about demographic analysis. I echo some of the other comments made. You should explore more uses of demographic analysis than simply thinking in terms of sex ratios, black and white. I cannot be any more explicit because I do not know what I am talking about, but I just think it is something that should be considered. For example, you are going to use sex ratios based on black females. What happens if the black female count in the census or the undercount estimate differs seriously from the demographic estimate? For blacks, the demographic estimate should be very good, and probably for the non-Hispanic white and other.

DR. NORWOOD: Thank you. Graham?

DR. KALTON: Until this meeting, I was an interested spectator of all of this. I knew it was complex. Now I really know it is complex. So I sort of feel like a learner in a lot of this.

But the kinds of suggestions that I have drawn out of the meeting are: First of all, we spent quite a bit of time talking about the post-strata and the one sort of model, which is to take the cross-classification of all the variables and do a little bit of collapsing down when you have to, to the logistic regression models where you can put in as many terms as you would like, including interactions. I feel that the cross-classification is too rigid. I basically favor the post-stratification approach of doing this, ending up with some cells that you operate on. But some sort of more flexible approach to that—I had not thought of the idea that Rod brought up of a sort of propensity score/linear combination approach for putting some of these variables together. But that is certainly one way of going about it. The other way, which I have thought about, is the kind of way that some of you may know as automatic interaction detection, when you split up cells and you do things differently in different subgroups; as you keep splitting, it goes in different ways. Those kinds of ideas of a more flexible determination of the cells, I think, are worth thinking about. It is not clear to me, if you go these kinds of routes, how important some of the later variables in this whole process are. I think the work certainly should be looking at what the consequences are of adding in extra variables, in terms of what they do to variances and what they do to try to improve the adjustments. So that is one area.

A second area that we talked about this morning was this issue that is related to all of that, which is the comparability of the A.C.E. data and the census responses, and the inconsistencies there. One possible explanation of that, which Rod commented on a couple of minutes ago, is imputation. That is certainly, I think, something you could separate out and have a look and see what that is. If that seems to be a really important thing, you may be able to find ways to improve that imputation. I am not sure if you will, but I think it would push you to look at that question. I raised a question this morning. I did not quite understand how it was answered. I decided I did not want to delay everybody else on this one. But it still seems to me that the key issue is, do you have systematic error in this—let’s say the household composition variables? I do not know. I still think that I am not terribly worried about a random error issue to that. So that may not concern me. I think I want to concentrate on that particular systematic bias aspect of it.

With regard to the sample design, first of all, there is the point Joe Waksberg made earlier, which is that you are in a unique situation of having the very large sample, 750,000, from which you are subsampling. That is a natural image of a two-phase design. The question is, how do you use that most effectively? You can use it in design, you can use it in analysis, or you can use it in both. One of the things you can do is oversample some groups rather than others, or you can do ratio adjustments or whatever. Obviously, you are exploring some of those things, and that is an important thing. The oversampling by race seems reasonable, but remembering that, as is noted in the papers, the race data are 10 years out of date. Therefore, you do not want to go as far as might be suggested by the optimum kinds of formulae.

There were a number of questions I had left that we did not have time to cover. I was not sure how you were going to do the oversampling when you found that the measures of size differed markedly, what sorts of operational procedures would be applied for that. There were a variety of other things that we did not have time for. I do not know what constraints you are under, but with the large blocks you segment them and you take a segment. The natural way to do it would be to just take a systematic sample through the whole of that. But there is presumably a reason why you do it this way. I would have asked the question and we would have discussed it, and you would have probably explained to me why what you do is right. There are the boundaries to those segments that create the question of what you do about this extended search. Crossing those boundaries might be much more significant than crossing boundaries across “other.” Do you go in the whole of the block?

DR. HOGAN: Yes.

DR. KALTON: You do, okay. I was not clear on that. Then the relationship to the E-sample—I again was not clear on how that worked out. So I had a number of uncertainties about sample design, which we did not have time to go over.

DR. NORWOOD: Joe?

DR. SEDRANSK: These are all rather broad things. There are several uses of post-stratification. Reading through the documentation, it was never clear to me what importance various pieces of these have. I do not think you need to clarify them for us. The workshop is ending at 5 o’clock, and it might help you to articulate as clearly as possible these alternative uses of post-stratification, some of which are obviously contradictory. I am sure that is not a simple task. I am sure it is in the back of your mind, Howard, and is in the back of everybody’s mind. But if you can articulate it, you may get a better solution. By the way, these are all comments about post-stratification.

Another thing that I am sure you are doing—but it has not gotten that far—the issue is, what are the additional gains from using, for example, another post-stratification variable or substituting one for another? In other words, looking at variances by themselves or mean squared errors does not seem to be the whole answer. Is it worthwhile adding another variable? What sort of reduction do you get? That is the second thing.

The third thing, which is much more substantive, is testing models. At the stage you are now doing, you have some candidate models you are testing. Suggestions were made here about checking them against domains which were not used in the post-stratification. One suggestion was geography—states, large cities. Then Graham had a very good idea: how about something like growth areas, something that is not connected with the usual thing? Then I thought, as I was getting up for the break, what about surprises—that is, things you could not think about in the first place? My suggestion about that would be some kind of cross-validation—just drop out some observations and see how well you predict. I do not know if this is particularly useful. I am just thinking, is there a factor or a type of factor that you are not capturing and you do not know about?
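
[Editor's note: a small Python sketch of the drop-a-variable check Dr. Sedransk suggests here, using ordinary least squares on simulated block-level data as a stand-in for the Bureau's actual models. The variable names, the simulated data, and the fit statistic are assumptions for illustration only.]

```python
# Hypothetical "knock out a variable and watch the fit" check.
import numpy as np

def r_squared(X, y):
    """R-squared from an ordinary least squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(0)
n = 1000
renter = rng.integers(0, 2, n).astype(float)   # tenure indicator (assumed)
mailback = rng.uniform(0.4, 0.95, n)           # block mail-back rate (assumed)
undercount = 0.05 * renter - 0.06 * mailback + rng.normal(0.0, 0.02, n)

full = r_squared(np.column_stack([renter, mailback]), undercount)
no_tenure = r_squared(mailback.reshape(-1, 1), undercount)
print(f"R^2 with tenure: {full:.3f}   without tenure: {no_tenure:.3f}")
```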

OCR for page 1
PROCEEDINGS, FIRST WORKSHOP Panel to Review the 2000 Census a factor or a type of factor that you are not capturing and you do not know about? Graham thought of growth. I had not thought of growth, but there may be some others like it that we do not know, and the only way I know to do it is to drop out some observations. Another thing that I would never have thought of before, except in predicting mortality rates for chronic obstructive pulmonary disease—you are using kind of a random-effects model, and random effects for areas were all very small. I thought, does this make any difference? It is kind of a small-area analysis. It was very revealing to drop them out of the model and see what would happen. It turned out that the model just did terribly without these little effects in them. The reason I am saying that in this context is—this is, again, a matter of time—if tenure is an issue, one of the variables that you are thinking of including, I might be more convinced about tenure if you dropped it out of the model and saw how well the model performed. If it is really good, you ought to see a real decline in performance by knocking it out. So that is the idea—even something that is sort of obvious, to see how it goes. Two more general things. In some of the modeling exercises, it seemed to me there were rather strong assumptions made—the independence assumption. I do not know if you can relax it, but I am suggesting that, if there is a key analysis that depends on some assumptions, you still try to check the sensitivity to it. The very last thing is—since I might be the only person here from the 2010 panel—it seems to me that using 1990 data mostly (although not completely)—it would be really good to analyze after the census, if you had the 2000 PES rather than the 1990, would you have drawn very different conclusions? If the answer to this is yes, and then in 2010 you come up to this—2010 is projected to be very different from 2000—maybe you should not be spending this much time using the 2000 census. DR. NORWOOD: Thank you. Bruce? MR. PETRIE: I do not think I could add anything of value to the technical discussion on the various aspects of A.C.E. But based on my reading of the documents and the discussion that we have had here today, I did form an impression or two of the program. It really boils down to the issue of complexity. Notwithstanding Howard’s and his colleagues’ attempts and assurances earlier that efforts are made, where possible, to keep things simple and robust, the fact of the matter is that this is a complex initiative. That has implications from a couple of points of view. One is in terms of explaining to the public just what is going on here, how the second set of real census results was produced. It is not going to be easy to understand. There is room for legitimate differences of opinion among experts about the choices that are being made, the decisions that have been made, that will be made, and a debate about whether some of the decisions were the best ones, or indeed even proper ones. So it is going to be a difficult program to explain when the census results are released, particularly if the results are considerably at odds, in some areas, with the census counts. So that is one aspect of the complexity that would be of concern.
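[A minimal sketch of the drop-one-variable check Dr. Sedransk suggests above: fit a capture model with and without a candidate post-stratification variable (tenure, in this hypothetical example) and compare cross-validated predictive performance. The data are simulated, and the logistic model is only a stand-in for whatever model is actually used.]

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(1)
    n = 20_000
    X = pd.DataFrame({
        "tenure_owner": rng.integers(0, 2, n),
        "minority":     rng.integers(0, 2, n),
        "mail_back":    rng.uniform(0.4, 0.95, n),
    })
    # Simulated capture indicator in which tenure genuinely matters.
    logit = 2.5 + 1.0 * X.tenure_owner - 0.8 * X.minority + 2.0 * (X.mail_back - 0.7)
    y = rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-logit))

    def cv_logloss(cols):
        """Five-fold cross-validated log-loss for a logistic capture model."""
        model = LogisticRegression(max_iter=1000)
        scores = cross_val_score(model, X[cols], y, cv=5, scoring="neg_log_loss")
        return -scores.mean()

    full    = cv_logloss(["tenure_owner", "minority", "mail_back"])
    dropped = cv_logloss(["minority", "mail_back"])
    print(f"CV log-loss with tenure:    {full:.4f}")
    print(f"CV log-loss without tenure: {dropped:.4f}")
    # A clear rise in log-loss without tenure is evidence the variable earns its place.

[The same held-out comparison can be run over domains not used in the post-stratification, such as states, large cities, or high-growth areas, to look for the "surprises" mentioned above.]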

DR. NORWOOD: Thank you. Bruce?

MR. PETRIE: I do not think I could add anything of value to the technical discussion on the various aspects of A.C.E. But based on my reading of the documents and the discussion we have had here today, I did form an impression or two of the program. It really boils down to the issue of complexity. Notwithstanding Howard's and his colleagues' attempts and assurances earlier that efforts are made, where possible, to keep things simple and robust, the fact of the matter is that this is a complex initiative. That has implications from a couple of points of view.

One is in terms of explaining to the public just what is going on here—how the second set of real census results was produced. It is not going to be easy to understand. There is room for legitimate differences of opinion among experts about the choices and decisions that have been made and that will be made, and for debate about whether some of those decisions were the best ones, or indeed even proper ones. So it is going to be a difficult program to explain when the census results are released, particularly if the results are considerably at odds, in some areas, with the census counts. That is one aspect of the complexity that would be of concern.

The second is that there still are, as I read it and as I listen, a fair number of decisions that have to be made and a fair bit of analysis and work that has to be done before the program is in the field and before the analysis can take place. Those various steps that are yet to be taken simply will never be tested in an integrated way. It was not possible in the dress rehearsal, and there is not going to be another occasion to do it. The bottom line is that there will have to be a combination of good luck and good management to ensure that the outstanding issues, the decisions that have to be taken, and the work that has to be done are, in fact, completed properly and can be implemented. The schedule just does not leave much room for any significant second-guessing or rethinking of the plan. So there is an operational challenge here that I think is quite substantial. I know that the folks at the Bureau appreciate that and are keeping it in mind in the decision process. But it is one that I certainly would emphasize, as somebody who used to be concerned about running censuses, and it is a major concern that I would have with this set of proposals that is on the table. That is generally it.

DR. NORWOOD: Norman?

DR. BRADBURN: I do not have a lot to add, except that I would like to stress a perspective on these post-stratification issues which I think has been implicit, but which I would like to make more explicit. It looks to me as if everything we have been talking about is the kinds of variables that are associated with errors in the census, and you concentrate on trying to pick post-strata that reflect that. I would say that the perspective should be to think about what we know about the processes that actually produce errors in the census. Many of the [variables] that we use are proxies for those processes, and they may be good or bad proxies.

There are two kinds of things we talk about: unit errors, where households are missed, and then errors where individuals are missed. What is associated with missing units? The big one has always been the address list. You have done a lot to improve that. But it seems to me—and we have talked a little bit about this—that places where there are big mismatches or errors between the initial address lists and the updates, or various kinds of things like that, might be areas that you want to concentrate on. The mail-back rate is very appealing to me, because it seems to capture a lot of what the problems are—even though I was not quite sure, from these kinds of models, whether it really did do much. But it seems to capture so much of what we know about the difficulties. There are some others that were mentioned—areas where there are a lot of multiunit structures. We know that those are problems, but that may be captured in the mail-back rate. On the individual coverage problems, one thing that we had not talked about, and that struck me as possibly useful, is the number of forms that are returned with a lot of missing data. Again, that would indicate areas where there are a lot of problems.

Ken mentioned thinking through—and this, I think, we have not given as much thought to as we should—what the changes are this time compared to 1990, since we have been relying so much on the analysis of 1990. I think this reflects, probably, all of our thinking: do not count your improvements before they are proved. On the other hand, we should look at the other side of it, because there are some changes that I think are going to make the gross error rate worse, at one level at least. That we have not talked about. As you know, I am probably the only person—certainly on the panel, but probably the only person in the world—who worries about overcounting rather than undercounting. If we do not get there this time, I think by 2010 we will be thinking about net overcounts rather than net undercounts. But the big thing that is happening this time that worries a lot of us on the operational side is—not to put it too pejoratively—the loss of control over the forms. There are going to be a lot of forms. I think we heard a number yesterday, 425 million or something being printed for 125 million households. That suggests that there are three or four forms per household that are going to be....

MR. THOMPSON: Let me comment on that. The biggest number of forms are forms that we print for our nonresponse follow-up enumerators. That is based on past experience. We only take one form per household, but enumerators tend to quit, and when they quit they usually walk off with a bunch of forms. Instead of trying to track them down and get the forms back, we send somebody else out with a new batch of forms.

DR. BRADBURN: Okay, but there are going to be a lot of forms left around, presumably, for the "Be Counted" program. One of the big things is that you want to have a lot of ways for people to report their census information, other than the one form that is mailed to the house. We have talked in other meetings about the fact that those do not have the printed labels, so there are ways of distinguishing them and so forth. However, the gross effect is that there are going to be a lot more matching problems, and there is going to be a lot more chance for two or more forms. As I always say, coming from Chicago, we know about multiple voting and other kinds of things, so we always worry about these things. So I think that is something one needs to look at. If you know, by the time you are doing the post-stratification, something like how many duplicate or non-standard forms came in from an area, that might be something you want to look at as a kind of post-stratification variable.

Rod mentioned the matching problem. This is, I think, a very important problem. Again, if there is some way you can get some probability of accurate matching of the data into the post-stratification, or at least into the estimates, that would be something you would want to give a lot of serious attention to. A lot of these ideas depend on being able to have some kind of information that comes from the actual operation of the census in time to be useful for the A.C.E. I think we have been concentrating a little too much on things we thought would reduce undercoverage. We ought to think about things that might increase gross errors and take that into consideration.

DR. NORWOOD: Bill?

DR. EDDY: I have the great advantage, or disadvantage, of being very near the end here. Everybody has already said all of the things that I wanted to say. I just have one small thing to add, which is to echo this notion of using methods other than logistic regression. We can name these methods, and I think in this situation they are clearly going to be superior. I do not think there is any doubt that they are going to be superior to the regression method. They have the great advantage that you do not have to decide what size of city makes it urban, or what the right number of categories is for your urbanicity measure. When you are done, you have let the data make those definitions for you. If it turns out that 149,000 is the right-sized city, then 149,000 is the right-sized city. So I just want to really pound on that.

DR. NORWOOD: Ken, would you like to make any comments about the day?

DR. PREWITT: I do not, other than, obviously, to express appreciation. I would say one general thing, which partly builds from what Norman just said. We actually do believe we have a more robust operational census for the basic enumeration than we had in 1990, by some measurable amount. I cannot measure it, but we are really confident that the promotional effort is catching on, the paid advertising is of good quality, and the Census in the Schools program is certainly catching on. Complete-count committees are now out there, in the neighborhood of 7,000 or 8,000 and growing every day, et cetera. You have begun to see a little bit in the press already, but you will see a lot more of it. There is a lot of individual initiative being taken by lots and lots of groups.

That creates for the Census Bureau a particular kind of problem. We really are trying to share the operation, if you will, or the ownership, of the census with "the public." That creates all kinds of problems of quality control, of balancing pressures on our regional directors and our local offices. We have people making demands on us other than ourselves. That also feeds into Norman's concern about certain kinds of overcounting, pockets of overcount, where you get a whole lot of mobilization in a community.

Nevertheless, setting aside that dimension of it, I do think that we have a strong operational system. We are extremely pleased that LUCA came and went on schedule. That was a big test for us. In fact, I would go back to Bruce's point, a quite important point: this is probably the most complicated census we have ever fielded that has never been tested. That is a result, of course, of the way the Supreme Court ruling happened. None of the field tests, none of the dress-rehearsal sites, were run the way we are now about to run the census, with a 12-month frame instead of a 9-month frame, trying to get the adjusted numbers for the redistricting data, and so forth. So there is, I think, a kind of complicated operational anxiety that is simply associated with the fact that we have not run the whole system through any kind of field test. That sits there. On the one hand, it is more robust; on the other hand, it is not tested. On the one hand, you have more public engagement; on the other hand, that creates other kinds of complicated operational challenges for us. How all that is going to balance out is extremely difficult to know. There are now serious people—and I would invite anyone in this room to join in this—laying bets on what the response rate is going to be. There are serious people who are now willing to bet we are going to do better than our 61 percent target, and other serious people who say the demography is running against it. Anyway, all of that is to say that we do feel reasonably confident about the basic enumeration census, barring some sort of unforeseen this, that, or the other—on the budget front or the political front or the public relations front, or natural disasters.

However, at the end of the day, we cannot completely count our way out of the undercount problem. If we thought we could, we would simply go try to count our way out of it. We actually do not believe we can count our way out of the differential undercount problem. Therefore, we are extremely pleased that we got an A.C.E. I think, at the end of the day, if all goes reasonably according to the current plan and design, we may, for the first time since the real discovery of and the beginning of early work on differential undercount issues, be able to tell the country, based on data, how far you can get trying to count your way out of the undercount problem, and therefore how much you need an A.C.E.—however the data are used—at least to know, at the end of the day, how well you did. So getting the A.C.E. right, technically and operationally, is extremely critical, just in terms of giving the country an answer to what has been the albatross around the decennial census now for a half-century, in some respects. That is why this meeting, and the other one on dual-systems estimation, are so important.

We are, by a funny confluence of political circumstances, in a position in which we have a good budget, a good operational plan on enumeration, and the capacity to do an A.C.E. largely according to our statistical design: 300,000 cases out of the 750,000, not trying to make state estimates, a 12-month frame instead of a 9-month frame. All kinds of properties of the A.C.E. are closer to what the Bureau would have wanted, if you had asked us 8 or 9 years ago what the ideal way to do an A.C.E. is, than you might imagine. So at least we have the capacity to say something fairly serious to the country, when this is all over, about what kind of basic decennial census you ought to be running. That is why your help in making sure the A.C.E. is as close as possible to a strong and defensible design matters so much—to say nothing of the importance of the National Academy in general, and this committee in particular, in helping to make the case that this is a very transparent process. We are not hiding anything. Here are all of our problems; here is where our current thinking is; here are all the papers. We constantly want to keep using whatever mechanisms we have to create, we hope, a level of political confidence that this is a transparent, open set of decisions, and we will pre-specify as much as we can, so that nobody will think we are down in the basement fiddling with this, that, or the other thing next spring. Anyway, that is why we think this meeting is so very important.

DR. NORWOOD: Thank you. Andy, do you have anything to tell us?

DR. WHITE: I would like to draw a simple analogy. It is really easy to explain to people outside of this room how a go-cart works. You can show them a frame, an engine, and a chain and say, this is how it works. We all know it is very hard to explain to people outside this room, and inside this room probably, how a modern automobile system works. Just look under that hood, guys; it is tough. That does not mean that the modern automobile does not work. I also feel that those of us who have varying degrees of understanding of what has gone on today in this room might want to compare it to sitting in on a design session of technical engineers for a new automotive engine. I think a lot would have been said that was very complicated that not everyone could take out of the room and explain to somebody else. But you wait for the results: does the engine work? What counts, I think, is not the complexity per se; it is how well controlled the complexity is, how well thought-out it is, how well it is executed, and what the result is. I kind of hope that the complexity we witnessed today ends up giving us a Cadillac and not a go-cart. It is hard to explain some of this stuff.

DR. NORWOOD: We have a couple of minutes. If anyone in the audience has anything to say, we will entertain an opportunity for you to do that now.

[No response]

Let me say that I think this has been a good day. It has not surprised me that it has been complex. What I have been extremely pleased with is the cooperation we have had from the Census Bureau. Even more than that—because the Census Bureau has always been cooperative—what has been unusual is the production of papers, even internal papers. I again want to compliment the staff. When you think about all the criticisms that a lot of people in the press and otherwise make of people who work for government agencies, it is quite clear that there are very few issues that have not been thought of by the people inside the Census Bureau and in which they have not done very high-quality work. That does not mean they have all the answers. I do not think any statistical agency ever does or ever will have all the answers. But I do want to commend you all for the efforts that you are making.

Having said that, Ken, I really do not envy you for having to explain all of this in very simple terms. But I think it can be done, and if anybody can do it, I think you can.

I want to thank you all for coming and adjourn the meeting.