5
End of Day 1:
Discussant Remarks and Floor Discussion
DISCUSSANT REMARKS
Prior to the floor discussion, Alan Zaslavsky (Harvard Medical School)
summarized some of the salient points from the first day of the workshop.
Referring to the series of presentations on other countries’ survey systems, he
noted that what is impossible to implement in one country might be the only
way to do things someplace else. In the same manner, what is impossible in the
United States today could be a research project in 5 years, and in 10 years it
might become obvious that this once-impossible strategy is now the only way
to operate. In other words, persistence can pay off.
He went on to say that the reasons for some of the differences across
countries go beyond the realm of scientific considerations to areas in which
participants at this workshop do not necessarily specialize: history, politics, and
culture. The degree of centralization characterizing administrative structures is
another important factor contributing to differences. Nevertheless, the
presentations can serve as a wake-up call for the statistical community in the United
States to consider household survey systems in other countries and to aspire to
learn from the experience of others.
Zaslavsky mentioned that there was a lot of discussion about innovation.
Now, he said, the question is how the statistical system can convince itself,
and then others as well, that many of the ideas mentioned today are worth
pursuing. In the case of the U.K. survey, validation was carried out by comparing
the new series with the previous series, which from a statistical point of view
is a fairly clear-cut process. But if members of the statistical system are truly
interested in innovation, then they must be prepared for situations in which
56 THE FUTURE OF FEDERAL HOUSEHOLD SURVEYS
the new measures will not be consistent with what was done before. Although
changes in methodology will make some data users unhappy, a new
methodology may be equally or perhaps more fit for use and more practical to implement.
This may mean that agencies and decision makers will have to think hard about
who the key data users are, as well as what information and policy needs have
to be satisfied.
An example of a transition to a new methodology in the U.S. federal
statistical system is the transportation research community’s transition from using
the census long-form sample to using the American Community Survey as a
source of transportation data. At the start of this process, they were reportedly
quite unsure about the idea of using data that were based on a rolling sample
and that would usually be 2 or 3 years old, as opposed to the data from the
census long-form sample, which could be up to 10 years old. This is a good
example of breaking away from the way things have been done with the goal
of improving the fitness for use, and now they may have something better than
what they had before.
Another way of thinking about the issue of acceptability is to question what
are considered official statistics. Some people argue that an actual enumeration
is the only legitimate way to count the population, but the statistical
community knows that this is not the best approach to obtain most of the data. The
question is how far the statistical and survey community is really willing to go
to innovate. When will model-based estimates be widely accepted as official
statistics? There have been and continue to be challenges to almost all forms
of statistical methodology applied to the census. But the statistical system is
in a position to release many more official numbers that are
model-based, and indeed there are some areas in which model-based estimates
are well accepted, such as unemployment statistics that are adjusted through a
sophisticated time-series model.
There has been considerable talk of Google’s consumer price index (CPI)
recently. If Google develops a method that tracks the online sales of groceries,
it will probably reflect the price of groceries in stores fairly well. The index
will, of course, be based on a biased sample, with not nearly the right coverage
of grocery stores, but if there is demand to get a leading indicator of the CPI
without having to wait for data to arrive from an agency whose field
representatives are visiting stores or calling people and asking what they paid for a gallon
of milk, the Google CPI, or a more disaggregated version of it, can be useful
for statistical modeling.
However, this does not mean that the statistical community should be
accepting all new methodologies that come along. There is still an important
role for statistical agencies, perhaps as gatekeepers, because raw administrative
data and unvetted Internet surveys will not necessarily yield very good
statistics.
Zaslavsky also reflected on the discussions about the use of different modes
for data collection, which may require the use of different sampling frames.
There are some purposes for which Internet panels may be a useful tool—for
example, they are widely used in market research. Few researchers believe that
these panels are efficient, representative, or accurate as a simple statistical
estimation tool. However, they are quite consistent from month to month, because
respondents are on the same panel for a few years or even longer. If the research
interest is to look at trends or change over time, the data from these panels may
be quite useful, although only in modeling. This is another area in which the
statistical community must consider how far it is willing to stretch the concept
of official statistics in order to make use of tools like this.
In the day’s presentations there was a good deal of discussion about the use
of surveys as sampling frames for other surveys. There are obviously substantial
efficiencies resulting from collaborations of this type, but there are also
substantial challenges related to making these arrangements work well, Zaslavsky said.
There is the problem of the second-phase survey inheriting the limitations of
the first-phase survey. Beyond this, there are significant administrative barriers
that exemplify many of the problems occurring in the statistical system more
generally, especially different objectives that come along with different sources
of funding.
Some of the important underlying issues are those of privacy and
confidentiality. These concerns are very ill defined. What exactly does privacy mean?
Jean-Louis Tambay gave an excellent example of how a confidentiality scandal
can be created by simply informing the public of an existing data collection
practice, even if there have been no breaches of confidentiality. A scandal on
this topic is easy to create at any time.
One could argue that, in the past, the protection of privacy was guaranteed
primarily through inefficiency and inaccessibility. For example, a great deal of
public data are unalphabetized and moldering in the basements of courthouses
in over 3,000 different counties. In some sense, those data are private, and it
does not matter that they are actually public. Today a lot of information is easily
accessible over the Internet, and as the inefficiencies are fading, organizations
are finding that they must establish official policies about storing public records
that were once much less obviously public. A national policy conversation is
required to think about what the rational trade-offs are and the obligations of
individual citizens and the polity toward each other. Zaslavsky added that it is
also worth mentioning that the greatest threats to privacy and consequences of
breaches are from the commercial sector, not government data collections. For
example, being denied a home loan because someone stole your credit card is
a scenario that is a lot more likely than confidential data being released by a
government agency.
For years there has been talk of using administrative records, especially
for the census, but in every case it was decided that it was not the right time.
Zaslavsky has always believed that taking small steps and making incremental
progress is important to move the statistical system forward in this area. If there
had been more persistent efforts in the early days, the system would be much
further ahead now. Julie Trépanier presented a good list of alternative uses for
administrative data and of programs actually being implemented at Statistics
Canada, incremental as they may be.
Zaslavsky said that the current work in this area, described by Rochelle
Martinez, is perhaps one of the most optimistic developments in years for the
federal statistical system. But one question that arises in response to these
initiatives is whether the opportunities for sharing will be adequate for everything
that is needed. As an example, there is clearly a role for those who work with
the Statistics of Income Division (SOI) to work with data from the Internal
Revenue Service (IRS). The SOI can collect a sample and clean it, thus making
it a much better data system than just the raw tax returns would be. These
analysts can then cooperate with other agencies for data matching. However,
there are some situations in which there really is a need to have access to the
entire IRS database, and a statistical agency may or may not be able to gain that
access. The point, Zaslavsky said, is that broader support is needed to carry out
linkage projects.
FLOOR DISCUSSION
The topics covered during the floor discussion at the end of the first day
were as varied as the day’s presentations. Cynthia Clark commented that as
part of the thinking about the sharing of sampling frames across agencies, it
would be useful to consider the development of a frame that contained both
households and establishments in a comprehensive geographic system. She
recalled that a suggestion similar to this was made as part of the work of a
United Nations commission developing a global strategy for agricultural and
rural statistics. The goal of the initiative was to develop a system that enables
the collection of comparable data across countries and to build a master
sampling frame that would allow linkages to occur. She added that, in the National
Agricultural Statistics Service, which focuses on rural statistics, access to a
household sampling frame would enable the agency to better meet some of
its data needs than what is currently feasible given the design of the American
Community Survey.
Trivellore Raghunathan (University of Michigan) noted that, with the
advent of mixed-mode designs, there needs to be an effort to understand what
is really being measured, because context matters for survey participation.
Research has shown that if the same question is asked in two different ways,
different answers will result. Perhaps the differences should be modeled to
create some sort of population-level equivalence. Jelke Bethlehem agreed, saying
that in the Netherlands, much of the survey data can be collected via the web,
making mixed-mode surveys cheaper. However, it is difficult to disentangle
mode effects and selection effects, and there are concerns about the estimates
as a result. Developing models to examine these questions would be interesting.
Phillip Kott noted that as long as there is nonresponse in a survey,
model-based methods will have to be applied. Many of the participants at the workshop
recognize that models are already being used in multiple ways. For example,
model-assisted methods are used to get a good sense of probability sampling
properties, to carry out small-area estimation, and to create synthetic estimates.
Furthermore, data users generally do not care how the data are produced; they
just want them. So perhaps it is worth considering how much of the resistance
to model-based estimation comes from the statistical community itself.
Roderick Little (Census Bureau) agreed that much of what is done now is
model-based. The issue is the robustness of the models and how they
represent the data. Regarding administrative records, he added that their role may
be different depending on the intended analysis. In many cases,
administrative records may be most useful for descriptive statistics, such as an income
distribution, given that the records do not usually contain information about
relationships.
Zaslavsky responded that in some cases it is possible to imagine
administrative records being more useful for analytic purposes than survey data. An
example of this would be longitudinal data, such as income tax records that
go back 30 years. Survey data are rarely available for a similar time period.
However, producing model-based estimates designed for descriptive purposes
and then using these in analytic studies could be problematic. In an analytic
study that involves a model-based estimate with a large regression component,
relationships may be discovered that are primarily due to the way the model
was specified. So it is important to go back to the original data and understand
how they were put together in order to be able to use them in an analytic study.
Bethlehem provided an example from Statistics Netherlands to illustrate
how relationships can be studied using administrative data. Statistics
Netherlands combined police register data with population register data to examine
relationships between ethnic background and crimes committed. He added
that sometimes it is possible to study relationships that could not have been
examined with survey data alone, but he acknowledged that a major limitation
is that these types of data are not necessarily accessible to outside researchers
because of disclosure concerns.
Frauke Kreuter (University of Maryland) said that the German Department
of Labor Statistics has permission to link indicators, such as nonresponse and
linkage consent indicators, to an administrative database on the grounds that
they are survey production data that do not reveal personal information. This
could be described as an incremental step that allows researchers to use the
administrative data for modeling in various forms. It may be interesting to
consider whether such a step could be within reach in the United States, she said.
Katherine Wallman said that it is time to have a conversation with the
American public about the issue of privacy. Prior to the release of the memo
by the Office of Management and Budget (OMB) that outlined several pilot
programs for the use of administrative records, OMB staff met with privacy
advocates. Despite these conversations, it remains unclear whether many of
these privacy issues have been fully parsed out with this community, and they
have definitely not been parsed out with the public. She said that the federal
statistical community needs to take some risks in this area and to have a
carefully constructed conversation about privacy, and in her view the time to do
that is now.
Wallman said that there is frequent miscommunication on the topic of
administrative records, because often assumptions are made about how the data
will be used without the specifics being discussed. She was reminded of this
during Trépanier’s very clear presentation, which made her realize that she and
her Canadian colleagues have been talking past one another about the use of
tax data for the past few years. She clarified that the Census Bureau does have
access to tax data for most of the functions that Statistics Canada does, short
of actually using the records to replace missing data. Another example recalled
by Wallman involved the discussions of extending authority to the Bureau of
Labor Statistics to use tax records, and this dialogue was also hindered by
miscommunications related to the type of use. Wallman ended by saying that she
plans to advocate for more conversations about data sharing.