| Copyright © 2009. National Academy of Sciences. All rights reserved. Terms of Use and Privacy Statement |
Data Gaps and Ways to Fill Them
INTRODUCTION
In Chapter 3 we reviewed the kinds of data that are needed for
legislators, program managers, and program staff to design and implement
immigration policy. Chapters 4 through 7 described the data that are
actually available, and the processes by which they are collected. In
this chapter we compare the two in order to determine major data
gaps--that is, areas in which data are needed by policy makers or by
analysts examining the consequences of immigration policies but are not
currently available. The treatment inevitably must be rather general.
It is not possible to foresee future needs in full detail, nor to define
every last piece of information that should be collected from a
particular alien, because the exact nature of future policy issues cannot
be predicted with precision. Such details must be left to the design
stage of a data collection initiative. The planning of such an
initiative should aim to incorporate the demographic, social, and
economic information likely to be of general relevance to policy issues
in a format sufficiently flexible to accommodate future needs as they
arise. It is possible, however, to identify both major areas for which
data are currently needed and general approaches by which such data can
be obtained. This chapter has two major sections, the first discussing
data gaps and the second discussing approaches to filling the gaps. We
start, however, with a brief discussion of the costs and benefits of data
improvements to set the stage for the lengthier discussion of the gaps
and ways to plug them.
COSTS AND BENEFITS OF DATA IMPROVEMENTS
All improvements have some cost attached to them, and at a time of acute
concern with government spending it is important to weigh the costs of
different improvements against the value of the expected improvement in
data quality or quantity. In this context, approaches can be listed in
ascending order of their likely cost. All the approaches listed require,
of course, that the basic data are of good quality. The first essential
for any improvement of immigration statistics is thus the implementation
of quality-control processes at the data generation stage; in the absence
of such quality control, the returns to implementing any of the further
approaches listed, however sophisticated they may be, will be
disappointing. Given this overriding need for emphasis on data quality,
the least expensive way to improve immigration statistics is to improve
the presentation of data already collected and available in
machine-readable form; costs are limited to initial computer programming
time and recurrent marginal computer execution time. The next least
expensive way is to process data that are collected but not used; the
costs are higher because of the inclusion of recurrent data entry. The
third way is to integrate existing data sets; even if the data sets are
already in machine-readable form, system planning, interagency
coordination, data set preparation, and final execution all have
substantial and, except for planning, recurrent costs attached to them.
The fourth way is to modify existing data collection procedures;
planning, testing, and processing design are the one-time costs, while
data collection, preparation, and tabulation are recurrent costs.
Finally, the most expensive way to improve immigration statistics is to
undertake new data collection initiatives; this approach requires major
additional costs, including questionnaire, sample, and data processing
design, testing, and implementation.
Evaluating the Benefits of Better Data
The information gains from each approach must be weighed against the
relative costs of putting them into effect, to facilitate selecting those
that offer the best value. Unfortunately, it is much more difficult to
determine the value of a data improvement, or even rank order the values
of such improvements, than it is to estimate their likely costs. The
best we can do is to indicate the nature of the improvement that would
result from a particular strategy and to state that in our collective
judgment the potential benefits of our recommendations more than justify
their modest costs. The judgment of those to whom we direct our
recommendations--the Congress and several executive agencies--must be
based on their assessment of the benefits to them of the improved data
that our strategies offer.
The Costs of Data Improvements
Costs here should not be interpreted narrowly as merely dollars and cents
of government expenditure. Data collection exercises involve costs to
those providing the data both in terms of the time spent answering
questions or filling in forms and in terms of concerns regarding
confidentiality of sensitive information. Public goodwill toward data
collection activities will wear thin very rapidly, with adverse effects
on data quality, if demands for data are perceived as excessive.
Immigrants may be more tolerant than other groups of the time costs of
data collection activities, although possibly more suspicious of
motivation and official interference, but goodwill toward the INS has
already worn thin because of the number, complexity, and repetitiveness
of the forms to be filled in and because of the excessive waiting time
people spend when dealing with the agency.
Issues of confidentiality and civil liberties are still more thorny.
The INS already imposes conditions on the alien population that would be
unacceptable to the public at large: permanent residents are required to
carry "green cards" at all times, and the INS maintains both
machine-readable and hard-copy files on aliens with very limited
restrictions on accessibility. Public concern with privacy is probably
the major barrier to the linkage of data files between agencies.
Ultimately it does not matter whether such concerns are well founded or
not (experience over the last decade or so suggests they may be): if a
majority of the public regards the construction of "super files" on
individuals as an unwarranted intrusion on their civil liberties, the
construction of such files will be politically unacceptable.
Furthermore, if the development of such a system is opposed by the
population at large, it is highly questionable whether a similar system
should be imposed on the politically underprivileged population of
aliens. This conclusion does not mean that no data set linkages can or
should be attempted, but rather that they should be made with due regard
for legitimate concerns, with adequate safeguards of privacy, and with
adequate protection against use for other than statistical purposes.
Data Generation
The vast majority of the data available about aliens is generated when
they come into contact with U.S. officials. Thus information about a
permanent immigrant is obtained either at the time of applying for a visa
and at first entry to the United States or when a nonimmigrant applies
for adjustment to permanent resident status. Further information is
obtained at subsequent contacts: in theory at every address change
(although in practice such changes probably go unreported quite
frequently); when applying for naturalization or other immigration
benefits; through income tax returns and social security benefits or
contributions; at census enumerations or survey interviews; and when
registering births or deaths. The number of observations depends on the
number of contacts, which may be with a wide range of government
agencies, including the INS, the Internal Revenue Service, the Social
Security Administration, the Bureau of the Census, and the National
Center for Health Statistics. Some of these contacts will happen for all
immigrants (application for status, first entry to the United States,
census enumeration); some will happen for a large majority (income tax
filing, social security contributions); and the remainder (address
change, application for naturalization or other benefit, social security
benefit, registration of births or deaths) will depend on events in the
immigrant's life in the United States.
Data Linkages
If it were possible to link together the information from all these
contacts, our knowledge of what happens to immigrants would be greatly
expanded (but there would still be gaps and uncertainties arising from
noncoverage of departures from the United States, from the
less-than-universal coverage of other systems, and from the inability to
collect all the desirable information for each contact; the census, for
example, cannot reasonably ask about the visa status of noncitizens). In
practice, it is often not possible to link records across agencies,
either because of confidentiality restrictions or because of a lack of
suitable and accurate identifiers. Even within agencies, notably the
INS, opportunities for record linkage--for example between immigrant and
naturalization applications or between entries and annual address
reports--have not been exploited. However, since not all unmet data
needs could be met even by complete linkage, and since complete linkage
is not a politically acceptable proposition, we must examine carefully
what the most pressing unmet data needs are and how they can be met
acceptably.
The next section discusses the major unmet data needs of policy
makers in the area of immigration. This discussion provides the
framework for the third section, which explores what can be obtained by
implementing different improvements.
UNMET DATA NEEDS
There are five groups of aliens that are of major importance for policy
formation: permanent resident aliens, refugees, asylees, temporary
workers, and illegal residents. Minor policy issues arise for some other
groups (such as the Simpson-Mazzoli bill's visa waiver scheme to make it
easier for visitors to enter the country), but by and large there is no
dispute about either the principle (whether they should be admitted) or
the magnitude (how many should be admitted) of entries of temporary
visitors, employees of international organizations, crew members, treaty
traders, intracompany transfers, full-time students in higher education,
and the like. The information needed for the five important groups
shares common elements but also differs in key respects, reflecting the
different policy questions relevant to each group.
Immigrants
For immigrants, the most obvious issues are how many should be admitted
each year and what criteria should be used to decide which applicants to
admit. Both issues involve judgments that are not immediately amenable
to quantitative assessment--for instance, to answer the question of
whether a higher level of immigration, though beneficial overall, would
impose unacceptable costs on the poor requires not only data to estimate
the possible effects but also a definition of what is unacceptable--and
are also too broad for determining information needs. What are needed
are enough data to evaluate current policy rather than all possible
policies and to answer specific questions. For example: do new
immigrants put legal residents out of work or do they create additional
jobs? To answer this question, data are needed on where immigrants first
settle and on their initial labor market experience (activity, type of
employment, wage rate or earnings, type of employer, nature of work), and
parallel data are needed for the existing resident population (both
citizen and noncitizen). Many such data exist, at least with regard to
participation in the formal economy, in IRS, Social Security
Administration, or Bureau of the Census records; however, there is
insufficient detail, particularly to distinguish between new permanent
residents, existing permanent immigrants, nonimmigrants, and illegal
residents. Given the lack of INS data on settlement and secondary
migration patterns of immigrants, the necessary data would be difficult
to construct even if perfect interagency data linkage were possible. Are
immigrants net contributors to, or recipients from, public revenues?
Again, many relevant data exist, but it is not possible to link records
for a particular individual or even group, such as all nonimmigrants.
Whatever the question, the data gaps are similar--detail, individual
identifiers for record linkages, visa history, history of life in the
United States, and history of life before coming to the United States.
Turning to admission criteria, the policy questions are rather more
concrete. Since 80 percent of immigrants are admitted under family
reunification preferences, one can examine the underlying rationale for
such a policy by examining the results in terms of the family structure
of such admissions. Do the families remain united? One could find out
whether emigration rates of principal aliens whose spouses or children
are admitted under the second preference are higher or lower than those
of other aliens; whether emigration or secondary migration rates are
higher or lower for those admitted under the family reunification
preferences than for other immigrants and how they vary by preference
category; whether naturalization rates are higher or lower for some
preference admissions than for others. One could also determine whether
immigrants admitted under the various family reunification preferences
perform better or worse than other immigrant groups in terms of income,
assimilation, naturalization, and the like. For immigrants admitted with
occupational preferences, policy makers would probably like to know
whether the immigrants so admitted actually alleviate labor market
shortages, whether they continue to work in the same field after
admission, how well they perform relative to native-born workers in that
field and to earlier cohorts of immigrants, and what proportions become
naturalized or emigrate. What are needed are data on the history of life
in the United States by the preference category of entry and country of
origin of the immigrant; existing sources provide very little, since it
is not possible to link data across data sets and agencies for given
individuals.
Refugees
Data needs for refugees are somewhat different, since the admissions
policy is at least partly altruistic, numbers of admissions are set by
the perceived world refugee pressure, and refugee admissions have an
immediate cash cost in terms of resettlement assistance. Selection,
however, is based only partly on need and partly on family ties or other
connections in the United States. Data on the world refugee situation
come largely from the U.N. High Commissioner for Refugees; although the
data are of limited scope, covering mainly refugees living in camps, and
of limited accuracy (as detailed in Chapter 7), they provide a broad
indication of the numbers and geographical concentrations of refugees
throughout the world. Improvement of that data system, though useful, is
not essential for U.S. policy purposes and would require an international
cooperative program that the United States could stimulate but could not
run.
Given that the number of refugees who need resettlement is known with
adequate accuracy, the question of how many the United States should
admit depends in part on how much they cost in terms of cash and program
assistance; how quickly they become self-supporting; how well they
assimilate; whether on a lifetime (and suitably discounted) basis their
contributions to public revenues exceed their receipts; whether they
displace domestic workers; how much impact they have on local communities
in which they settle; and so on--much the same subquestions, with a few
additions about cash assistance, as for permanent immigrants. Data
availability is substantially higher, however, for refugees than for
immigrants, with a tracking system for the 3-year period during which
they are eligible for benefits and a regular though small follow-up
survey by telephone. After the 3-year eligibility period, responsibility
for refugees passes from the Office of Refugee Resettlement to the INS,
and data availability declines drastically; it ceases to be possible to
track individual performance or to distinguish refugees from other
foreign-born residents. Thus, the most important unmet data needs relate
to the long-term performance in the United States of those admitted as
refugees and the impact on future immigration of refugees who become
permanent residents or naturalized citizens and apply for family
reunification benefits. Reasonable data, though lacking depth of detail,
already exist for most policy and program purposes for the early stages
of the resettlement process thanks to the efforts of the Office of
Refugee Resettlement.
Asylees
The data needs for establishing policy concerning the granting of asylum
share some common elements with the data needs concerning the admission
of refugees, since the justification for asylum is largely altruistic,
although the issue of cash benefits, to which asylees are not entitled,
does not arise. Asylum is granted on the grounds of well-founded fear of
persecution or discrimination in the country of origin, so information is
needed to establish how well-founded such fears are in particular cases.
However, asylees have a social and economic impact on the United States,
so the question of how many applications to grant depends not only on the
numbers meeting the formal requirements but also on their costs and
benefits to society, implying data needs similar to those for
immigrants.
Temporary Workers
Temporary workers are admitted to the United States for short, specified
periods to meet temporary labor shortages or for such special purposes as
musical or sporting events. The number of people thus admitted is small,
about 40,000 in fiscal 1981, and their long-term economic and social
impact probably is also small. The policy issues involved are whether
labor shortages really justify the admission of such workers or the
workers thus admitted are taking jobs that legal residents would
otherwise take. This question is not as simple as it sounds: residents
may not be willing to take such jobs for the minimum wages offered, but
might take them at the higher wage levels that would have to be offered
if temporary workers were not available, thus increasing costs and
prices, but also increasing domestic employment and reducing losses from
remittances abroad. There is also the question of whether the workers
actually leave the country when their work is completed (or the admission
period runs out) or stay on illegally.
The first issue requires estimates of the wage elasticity of the
supply of domestic labor and of the wages paid to the temporary workers,
as well as information on the potential for substituting capital for
labor; such information is best provided by micro-level studies of
particular industries rather than by a national immigration statistics
system. The second issue, of compliance with terms of entry, requires
the sort of linkage within the INS of arrival, departure, and location of
deportable alien records that will become available when the INS
long-range ADP plan is fully implemented. Thus, apart from a need for
small-scale industry studies and a need for more complete coverage of
departing aliens, the data needs for this group are in the process of
being met.
Illegal Aliens
Illegal aliens are important for a number of policy reasons. First, they
attract more political attention and generate more political passion than
any other group of noncitizens. The presumed ill effects, both social
and economic, of the presence of illegal aliens in the United States also
affect public attitudes to, and debate about, broader issues of
immigration and refugee policy. The policy questions related to illegal
aliens are very similar to those about legal immigrants. Do they take
jobs that legal residents would otherwise fill, or do they take jobs that
legal residents do not want at the going wage rates? Do they hold down
wage rates for menial jobs and slow productive investment? Do they take
more in services than they contribute to revenue, and at which levels of
government? Do heavy concentrations of illegal aliens increase crime
rates, either as perpetrators or as victims? Do they overburden
education and health services? Do they come to work temporarily or to
settle permanently? Since it costs money to keep illegal aliens out and
would cost a very large amount of money to reduce illegal immigration to
a trickle, policy makers have to decide how much should be spent on the
Border Patrol and other INS activities in trying to keep illegal aliens
out: if a steady stream of illegal migrants is beneficial overall, then
legal immigration limits could be increased and enforcement activities
could be cut back.
Although the data needs for illegal aliens are much the same as those
for legal immigrants, virtually no large-scale data sets are available
about illegal aliens, and the official collection of such data, with
illegal aliens voluntarily identifying themselves as such, is
impossible. Some data are collected involuntarily, for instance by the
INS from located deportable aliens, but there is no information about
either how representative located aliens are of all illegal aliens or
what the location rate is. Some illegal aliens are included in official
statistics--for example, in the 1980 census results and in birth and
death registration--but are not directly identifiable as such.
So-called informed guesses of the number of illegal aliens in the
country made in the early 1970s have given way in recent years to
estimates derived from a variety of empirical bases; these estimates,
reviewed in more detail in Appendix B, are all indirect and rely on
numerous assumptions; in general, however, they suggest a range of
between 2 and 4 million illegal immigrants in the United States around
1980. Furthermore, there is no evidence to support the view that the
illegal population has grown rapidly since 1980, and INS locations data
by duration of illegal stay suggest little general change. These
estimates of the number of illegal aliens include their distribution by
age, sex, and country of origin (though the estimates may be wrong by a
factor of two), but little else is known about this 1 to 2 percent of the
U.S. population. What is known comes from small-scale, often
ethnographic studies carried out by nongovernment researchers, and it is
of uncertain generalizability to the total illegal population. An
ethnographic study of Mexican immigrants described by Massey in Appendix
C illustrates the information that can be obtained from such an approach.
Program Needs
There are also program, as opposed to policy, needs for immigration
data. The Bureau of the Census, for instance, is a major user as well as
a producer of data on immigration. Current data on international
migration are needed to derive postcensal population estimates that are
used, among other purposes: as independent controls for the monthly
Current Population Survey; for evaluating the coverage of decennial
censuses; for the distribution of revenue-sharing funds; and in the
computation of widely used and important ratios, ranging from birth and
death rates to life insurance survival probabilities. The immigration
data used to derive population estimates for the United States have
serious deficiencies in addition to the lack of timeliness already
mentioned. No reliable information is available on the flow of illegal
immigrants to the United States or on emigration from the country.
Furthermore, estimates of the migration between the United States and
Puerto Rico are computed annually as the residual between the arrival and
departure of millions of people to and from Puerto Rico. Finally, the
estimates of international migration used by the Census Bureau to derive
population estimates exclude any allowance for migration of civilian
citizens who are not affiliated with the U.S. government (e.g., employees
of international corporations, university personnel, students, retirees,
etc.). These needs are for information on the international migration of
all U.S. residents, rather than just immigrants or the foreign-born.
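The weakness of a residual estimate of this kind can be illustrated with a short calculation. The sketch below is not the Census Bureau's procedure; it simply assumes that net movement is arrivals minus departures, and every figure in it is hypothetical. It shows how a small proportional error in one of two large counts becomes a large proportional error in their difference.

```python
def net_residual(arrivals, departures):
    """Net movement taken as the residual of two gross passenger counts."""
    return arrivals - departures


def relative_error_in_net(arrivals, departures, arrival_error_rate):
    """Proportional error in the residual when arrivals are overcounted.

    arrival_error_rate is the assumed proportional overcount of arrivals
    (0.01 means arrivals are recorded 1 percent too high).
    """
    true_net = net_residual(arrivals, departures)
    if true_net == 0:
        raise ValueError("relative error is undefined when the true residual is zero")
    observed_net = net_residual(arrivals * (1 + arrival_error_rate), departures)
    return (observed_net - true_net) / true_net


# Hypothetical counts: 2.00 million arrivals and 1.98 million departures.
true_net = net_residual(2_000_000, 1_980_000)
error = relative_error_in_net(2_000_000, 1_980_000, 0.01)
print(true_net, round(error, 6))
```

With these illustrative figures, a 1 percent overcount of arrivals roughly doubles the estimated net movement, because the residual is only 1 percent of either gross count.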
APPROACHES TO DATA IMPROVEMENTS
Unmet data needs of immigration policy and program management can be seen
to range from a lack of timeliness and quality of data that are produced
to data that are not, and never have been, available or even collected.
Approaches to improvement, ranked in cost from improved tabulation
through improved quality control and broadened scope to new data
collection processes, have already been outlined above. We now turn to a
consideration of what each of these approaches can be expected to
contribute to meeting unmet needs for data.
Improved Data Tabulation
The cheapest and quickest way of increasing the usefulness of data is by
improving the tabulation of machine-readable data sets or by preparing
public-use data tapes. However, the potential for this method of
improvement is limited by what exists; one cannot tabulate what is not
there. The most important improvement that can be made is speed, since
the more up-to-date the information, the more useful it is. The INS
statistical yearbook for fiscal 1980 was issued in early 1984 and that
for fiscal 1981 was issued in mid-1984; these time lags compromise the
value of the data. The ADP systems now being implemented make an
improvement in timeliness readily attainable. No obvious improvements in
data tabulation are necessary, but some tables in the statistical
yearbook could be simplified to reduce both detail and the number of
empty cells by grouping countries, could have revised layouts to improve
readability, and could make use of fuller, more comprehensible
footnotes. The addition of a glossary to the 1981 yearbook represented a
major improvement. Public-use tapes of samples of both immigrants and
nonimmigrants admitted should be prepared each year as a matter of
routine.
The panel therefore recommends that the INS:
o Maintain its efforts to bring the statistical yearbook up to date;
o Reinstate the publication of figures on temporary entrants;
o Review the content of each table;
o Publish the statistical yearbook no later than 6 months after the
end of the fiscal year; and
o Prepare and release public-use samples covering both immigrants
and nonimmigrants.
The Bureau of the Census is to be commended for meeting United
Nations recommendations for tabulations of data on the foreign-born and
on households including foreign-born members from the 1980 census.
However, the gain has been eroded by the excessive time lag involved; the
tables were not available until mid-1984. The Bureau should ensure that
comparable tables are prepared more quickly from the 1990 census. Given
the data collected and the form in which it was collected, there are no
clear ways to improve the tabulation program. However, the collection
method could be improved by, for example, using precoded periods of entry
for the foreign-born that correspond to the periods used for the 1970
census.
The panel therefore recommends that the Bureau of the Census:
o Ensure speedier tabulation of data on the foreign born from the
1990 census; and
o Ensure the maximum comparability with data from earlier censuses,
particularly concerning period of entry.
The Office of Refugee Resettlement collects considerable amounts of
cross-sectional and longitudinal data concerning refugees, but staff time
constraints have limited the amount of data published or made available
for outside analysis. Substantially better use could be made of the data
through more extensive tabulation or through the release of public-use
tapes, to permit analysis of the data beyond the bare reporting
requirements specified by Congress. Such improvements cannot be achieved
given current ORR staffing levels and would thus require either some
increase in staff or collaborative arrangements with outside
organizations, either of which could be readily justified given the
relative costs of data collection on the one hand and of data processing
on the other.
The panel therefore recommends that the Office of Refugee
Resettlement:
o Allocate the additional resources necessary to ensure the adequate
dissemination of existing data in both tabular and machine-readable
form.
The Social Security Administration is an agency that offers some
potential for improved data tabulation. It is not primarily interested
in statistics--and still less in statistics about immigrants--but it
collects information that could be useful for statistical studies of
immigration. Systematic tabulation of data from the NUMIDENT file (new
applications for social security numbers) for foreign-born people could
provide revealing information about patterns of first settlement. We
note that tabulations of beneficiaries receiving payments abroad have
been used to study the extent of return migration of elderly immigrants.
The Internal Revenue Service also processes some data of potential
value for estimating flows of U.S. citizens out of, and back into, the
country. Citizens living abroad can, under certain conditions, claim tax
allowances for foreign residence. A minimum figure for gross outflow in
a year can be obtained as the number of new claims for foreign residence
allowances, weighted by number of dependents claimed, while a minimum
figure for gross inflow in a year can be obtained as the number of
returning residents, with no claims to foreign residence allowances when
such a claim had been made the year before (again weighted by number of
dependents). Though the policy value of data on inflows and outflows of
citizens is low, and the estimates would be affected by changes in tax
law, by filing delays, or by citizens not filing at all, the costs of
producing suitable tabulations, by country of residence, would not be
high, and the program value of the information would be substantial.
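The two minimum figures described above can be sketched as a comparison of returns for consecutive tax years. The sketch is illustrative only and is not an IRS procedure: the record layout, the field names, and the filer identifier used to match returns across years are all hypothetical. It also makes two assumptions the text leaves open: each return counts as the filer plus the dependents claimed, and a claim by a filer with no return in the previous year counts as a new claim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxReturn:
    filer_id: str                   # hypothetical key for matching returns across years
    claims_foreign_residence: bool  # whether a foreign residence allowance is claimed
    dependents: int                 # dependents claimed on the return


def persons_covered(ret):
    """Weight a return by the filer plus dependents claimed (an assumption)."""
    return 1 + ret.dependents


def minimum_gross_flows(previous_year, current_year):
    """Return (minimum gross outflow, minimum gross inflow) in persons."""
    claimed_last_year = {
        r.filer_id for r in previous_year if r.claims_foreign_residence
    }
    outflow = 0
    inflow = 0
    for ret in current_year:
        if ret.claims_foreign_residence and ret.filer_id not in claimed_last_year:
            # New claim for a foreign residence allowance.
            outflow += persons_covered(ret)
        elif not ret.claims_foreign_residence and ret.filer_id in claimed_last_year:
            # No claim this year although one was made the year before.
            inflow += persons_covered(ret)
    return outflow, inflow


# Hypothetical returns for two consecutive years.
previous_year = [
    TaxReturn("A", False, 2),
    TaxReturn("B", True, 1),
    TaxReturn("C", True, 0),
]
current_year = [
    TaxReturn("A", True, 2),   # new claim: filer plus 2 dependents
    TaxReturn("B", False, 1),  # returning resident: filer plus 1 dependent
    TaxReturn("C", True, 0),   # continuing claim: counted in neither flow
    TaxReturn("D", True, 0),   # no return last year: treated as a new claim
]
print(minimum_gross_flows(previous_year, current_year))  # (4, 2)
```

Both figures are minimums because, as noted above, citizens who file late or do not file at all never appear in either year's returns.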
Processing of Data Already Collected
Some data collected for administrative purposes may have a statistical
value that goes unrealized. Processing and tabulation of such data may
be a cost-effective way of increasing data availability. A case in point
is the INS form I-213, record of a deportable alien located. While very
little is known about the population of deportable or illegal aliens, a
considerable amount of information of uncertain quality is collected,
supposedly for administrative purposes, for each such person located by
the INS. With somewhat more emphasis on data quality and with regular
processing, insights into the structure, economic activity, and even size
of the illegal alien population could be obtained with very little
increase in workload. Indeed, workload might not be increased at all,
since the regular processing of I-213 forms would eliminate the need for
hand tallies of locations of deportable aliens for summarized reporting
on form G-23 (see Chapter 4).
The panel therefore recommends that the INS:
o Process and tabulate data on a regular basis from at least a
substantial sample of I-213 forms, and put more emphasis on the quality
of the basic data collected.
Improved Record Linkage
Record linkages across and within agencies offer tremendous potential for
improving the statistical base for studies of migration. Linkages across
agencies would be most valuable. If it were possible to link INS records
on immigrant admissions with decennial census data on residence, current
and past occupation, income, and recent internal migration, and with
Social Security Administration or IRS data on income (or covered
earnings) and residence, much of what policy makers need to know about
immigrants, nonimmigrants, and even illegal immigrants would become
available at modest cost. Unfortunately, such linkages have never been
made; the INS has never participated in such an interagency data linkage
project, perhaps because of an understandable modesty about its own data
sets.
One stumbling block to attempts to link files across agencies is the
rules concerning the confidentiality of the respective files. Each
agency that collects and maintains data from or about individuals or
business establishments, whether for administrative, program, or
statistical purposes, strictly limits its release of information to
ensure that the persons (or firms) cannot be identified. In many
instances, release of individual information beyond the collecting agency
is prohibited by statute (as in the case of the Census Bureau); in
others, it reflects an administrative decision consistent with
maintaining credibility for the program. As a general rule, adherence to
confidentiality has been accomplished by deleting the name and specific
address of individuals from any publicly released files, by limiting
geographic detail to a sufficiently high level (such as a city with
250,000 or more people) to eliminate any possibility of individuals'
being identifiable, or, in some instances, by deleting what might be
perceived as unique information from the file (such as exact dollar
amounts for people with incomes in excess of $100,000).
The confidentiality issue, and the responses to it in terms of the
record file structures of various agencies, raise a number of problems
related to the linking of files produced by two or more agencies. The
necessity for a high degree of accuracy in the matching process requires
the presence of a common, unique characteristic in each file; name, for
example, is insufficient, since there may be many John Smiths in any
file. Adding other characteristics, such as address, date of birth,
wife's maiden name, number of children, will improve matching precision
but at the same time will inevitably increase the risk that a particular
record in the file can be identified subsequently as that of a particular
individual. Thus files that in themselves do not violate confidentiality
become suspect in the matching process as the number of characteristics
expands. Unique identifiers such as social security number,
by their very nature, permit the unique identification of an individual.
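
The trade-off described above, in which each added characteristic improves matching precision but also makes individual records easier to single out, can be illustrated with a small sketch. The records below are entirely made up for illustration, and the field names are hypothetical; the measure used (share of records whose key combination occurs exactly once) is one simple proxy, not a method attributed to any agency.

```python
from collections import Counter

# Fabricated toy records, for illustration only.
RECORDS = [
    {"name": "John Smith", "birth_year": 1950, "city": "Chicago"},
    {"name": "John Smith", "birth_year": 1950, "city": "Boston"},
    {"name": "John Smith", "birth_year": 1962, "city": "Chicago"},
    {"name": "Mary Jones", "birth_year": 1950, "city": "Chicago"},
]


def unique_share(records, key_fields):
    """Share of records whose combination of key_fields occurs exactly once."""
    keys = [tuple(r[f] for f in key_fields) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(records)


by_name = unique_share(RECORDS, ["name"])
by_name_year = unique_share(RECORDS, ["name", "birth_year"])
by_all = unique_share(RECORDS, ["name", "birth_year", "city"])
print(by_name, by_name_year, by_all)
```

A higher share means more records can be matched unambiguously across files, and by the same token more records could be tied back to a particular individual.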
In recent years, serious discussion has taken place about the issues
of privacy and confidentiality. Studies have been undertaken to explore
public perception of the meaning of confidentiality and public concerns
with the issues (see, for example, National Research Council, 1979).
Debate also has taken place on how confidentiality can be maintained and
individual privacy protected while, at the same time, data are provided
for important policy purposes. One approach that has been proposed would
recognize the major federal statistical agencies as a single entity
within which data files could be exchanged for linkage or other
statistical use while still honoring the requirements for
confidentiality. Research also is under way on methods by which
individual data can be modified sufficiently to ensure the
confidentiality of the individual, without harming the data for analytic
or linkage purposes.
Given the ever-growing resource of administrative data, the large
savings to be had in terms of cost and respondent burden, and the gains
to be made in analytic terms from linking files, it is essential that
efforts continue to develop acceptable solutions to the problem.
The potential of interagency linkage may at present be limited by a
lack of suitable identifiers. Although the Social Security
Administration, the Internal Revenue Service, and some Census Bureau
surveys all collect social security number, all machine-readable data
sets suffer from some nonresponse, reporting error, or keying error,
which reduces match rates and increases mismatches. The INS data sets do
not include social security number in general, more commonly using the
A-file number, so linking INS files with records from other agencies
would not in practice be easy. Thus, although the potential benefits of
interagency linkage are obvious, the practical obstacles make its
implementation doubtful. However, intra-agency linkages are feasible and
offer solid though less spectacular benefits.
In the past, INS data systems have been designed and operated as
discrete entities, not surprisingly given their administrative rather
than statistical origins. The new ADP systems being implemented now
represent a major change of direction, with data sets generated by each
INS process being viewed as modules of a grand, integrated system linked
through the Central Index. Once operational, the new systems will make
it straightforward to link records of immigration or adjustment of status
with subsequent naturalizations; to link apparent overstayers from the
I-94 form (arrival records with no matching departure record) with I-213
records of deportable aliens located; to link petitions for immigration
benefits with characteristics of the principal alien; and to link
notifications of change of address to other records of an alien. It is
important that the INS recognize not only the statistical but also the
program value of such linkages, and implement regular, routine data
tabulation across functionally independent data sets.
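
One of the linkages mentioned above, from apparent overstayers (arrival records with no matching departure record) to I-213 records, might look like the following in outline. This is a minimal sketch, not a description of the INS systems: the field names `admission_no` and `alien_id` are hypothetical, with `alien_id` standing in for whatever common identifier an integrated index would supply.

```python
def apparent_overstayers(arrivals, departures):
    """Arrival records for which no matching departure record exists."""
    departed = {d["admission_no"] for d in departures}
    return [a for a in arrivals if a["admission_no"] not in departed]


def link_to_located(overstayers, located):
    """Pair each apparent overstayer with a located-alien record, or None."""
    by_alien = {rec["alien_id"]: rec for rec in located}
    return [(o, by_alien.get(o["alien_id"])) for o in overstayers]


# Made-up records for illustration.
arrivals = [
    {"admission_no": 1, "alien_id": "X1"},
    {"admission_no": 2, "alien_id": "X2"},
    {"admission_no": 3, "alien_id": "X3"},
]
departures = [{"admission_no": 2}]
located = [{"alien_id": "X3"}]

overstayers = apparent_overstayers(arrivals, departures)
linked = link_to_located(overstayers, located)
print(linked)
```

Unlinked overstayers (paired with `None`) are kept rather than dropped, since a missing departure record may reflect a lost form rather than an actual overstay.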
The panel thus recommends that the INS:
o Examine and implement procedures to exploit the potential of
linking data sets for statistical and program management purposes as an
integral part of the long-range ADP plan.
The Social Security Administration is another agency with data sets
that could usefully be linked. Current records of contributions and
benefits provide information on area of residence, employment, and
earnings, while records of initial applications for social security
numbers provide background information on age, sex, year of application
(a potential surrogate for year of admission), and country of birth.
Though gaps in the record would be impossible to interpret (such gaps
might result from absence from the United States, low income, or
employment not covered by the system), the linkage of data sets internal
to the agency would still provide a substantial amount of information
about the economic activity of foreign-born residents, and make possible
a direct assessment of contributions paid in against benefits paid out.
Modification of Existing Data Collection Procedures
Existing data collection procedures can be improved by raising data
quality and by collecting additional pieces of useful information. Data
that fail to meet minimum quality standards waste resources devoted to
their collection, processing, and analysis and, worse, can result in
misleading analytical conclusions and poor policy decisions. As detailed
in Chapter 4, many of the INS data collection activities suffer from
shortcomings of design, standardization, supervision, and
quality control. These shortcomings are particularly serious for data
provided by INS administrative data sets--for example, data for border
crossers--but have also affected the timeliness and quality of data on
immigrants, temporary admissions, and naturalizations. The highest
priority must be given to instituting sound collection and processing
procedures incorporating step-by-step quality control, without which the
collection of additional data would be pointless. Specific
recommendations for necessary improvements are presented in Chapter 4
(and repeated in Chapter 9), and in fairness to the INS, some progress
has already been made through the introduction of new ADP systems.
At the level of particular data elements, emphasis must be put on the
quality of occupational data for immigrants, by ensuring that INS
interviewing officers probe the type of work performed by the applicant;
at present, these data are virtually useless. There are also some items
that could usefully be added to existing collection processes:
applications for immigrant status should include a question on formal
education; the I-94 arrival and departure form should reinstate questions
on gender and port of embarkation or disembarkation; petitions to
naturalize should also include questions on formal education. This list
is meant to be illustrative rather than exhaustive; a thorough review of
the content of all INS forms is overdue and recommended in Chapter 4.
Other agencies have traditionally paid more attention to data quality
than has the INS, but they could still improve the usefulness of their
data for purposes of U.S. immigration policy by modifying or adding to
questions included in existing collection systems. As recommended in
Chapter 5, the Bureau of the Census should continue to include questions
relevant to the foreign-born consistent with earlier censuses--in
particular, should reinstate questions on birthplace of parents in the
1990 census--and should clarify the question on date of entry to the United
States to refer clearly to date of entry to take up residence, coding the
responses to be consistent with periods used in previous censuses. A
module to measure emigration of both immigrants and native-born citizens
should also be included in the Current Population Survey, since little is
known about emigration levels or patterns and the cost would be modest.
New Data Collection Initiatives
The modifications to data tabulation, processing, linkage, and collection
procedures outlined in the previous four sections represent
cost-effective improvements of the statistical base available for policy
formation, but they cannot fill the largest single lack: good-quality
longitudinal data on the process of settlement in the United States by
immigrants and refugees, and on the social and economic impact of such
settlement on the existing resident population. Even an automated system
of record linkage, in which each contact of an immigrant with any
official agency would be added to a historical file for the individual,
would go only part of the way toward meeting the longitudinal data need,
since the individual records would include gaps for periods without
official contact and omit important occurrences such as further
education, short- or long-term absence from the country, and changes in
family and household relationships.
To meet such needs, the panel strongly recommends that Congress
mandate that the INS be the lead agency in:
o The establishment of a longitudinal panel survey of a sample of
aliens entering the United States or changing visa status during a 1-year
period. This sample of an entry cohort would be followed up for a
minimum period of 5 years. The survey should be repeated by drawing a
new sample of entrants every 5 years thereafter. The sample would
consist of:
(a) Those admitted to permanent resident status, both new immigrants
and those changing status;
(b) Those admitted as temporary residents under educational,
training, and short-term work visas; and
(c) Illegal aliens given legal status under amnesty provisions
included in any future amendments to the INA.
For each participant, data would be collected on:
(a) Initial characteristics: sex, age, country of birth, education,
occupational history, year of entry, marital status, visa status
and admission preference, family ties in the United States,
place of initial settlement, and household structure;
(b) Demographic changes, including: marital status, births, death,
internal migration, temporary absence from the United States,
emigration, formal or vocational education, and household
characteristics;
(c) Income and labor force experience in the United States; and
(d) Program participation and service use, including educational and
health costs of children; local, state, and federal taxes paid;
and social security benefits and contributions.
We recommend that the survey be funded by the INS but conducted under
contract by a recognized survey research organization, either public or
private, experienced in longitudinal panel design and execution. The
sample should be selected from a 1-year cohort of entrants or those
changing status, to ensure that the sampling frame is complete and that
potential respondents can be located (at time of entry or change of
status). Every effort, including the collection of social security
numbers and names and addresses of close relatives or friends and the
provision of incentives to respondents, should be incorporated as part of
the survey approach in order to minimize the dropout rate and to help to
locate those who migrate during the life of the study. The study design
should incorporate the use of administrative data sets, partly to obtain
data and partly for mutual evaluation. To obtain broad support for the
study, as well as to identify key data items and to ensure sound design,
an advisory panel of representatives of key agencies and experts in the
field of immigration research and immigration policy should be
established. Implementing this survey will not be inexpensive--we
estimate a cost of around $5.5 million over 5 years for a sample of about
6,000 cases--although this cost is small relative to the $58 million
budgeted by the INS for fiscal 1984 alone on ADP development and data
processing. Such a survey is overdue and data needs are pressing, so
work on the survey should start as soon as possible.
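
The scale of the estimate above can be checked with simple arithmetic. The sketch below uses only the figures quoted in the text; the per-case and per-year breakdowns are derived here for illustration and assume, hypothetically, that spending is spread evenly over the 5 years.

```python
survey_cost = 5.5e6        # estimated total cost over 5 years, in dollars
years = 5
cases = 6000               # approximate sample size
adp_budget_fy1984 = 58e6   # INS ADP and data processing budget, fiscal 1984

cost_per_case = survey_cost / cases
cost_per_case_year = cost_per_case / years   # assumes even spending per year
annual_survey_cost = survey_cost / years     # assumes even spending per year

# Entire 5-year survey cost as a percentage of one year's ADP budget.
total_share_pct = 100 * survey_cost / adp_budget_fy1984
# One (assumed) year of survey cost as a percentage of that budget.
annual_share_pct = 100 * annual_survey_cost / adp_budget_fy1984

print(round(cost_per_case), round(cost_per_case_year))
print(round(total_share_pct, 1), round(annual_share_pct, 1))
```

On these figures the survey costs roughly $917 per case over its life, and its full 5-year cost is under one-tenth of the single-year ADP budget.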
A longitudinal sample survey such as that outlined above will meet
many data needs, but it cannot be expected to meet all data needs,
particularly for small-area or small-group data for which the sample
would be too small. There will remain a need for continued analysis of
other data and for in-depth studies of particular areas, issues, or
groups; such work is best left to universities and other nongovernment
research organizations. It should also be stressed that the proposed
survey is complementary to other administrative data collection
activities. It cannot tell policy makers how many entries of particular
categories of aliens there are in a year, but it will provide a basis for
predicting what the effects of such entries will be, and what the
effects of changing the numbers in each category would be.
IMPLEMENTATION OF RECOMMENDATIONS
Immigration is an important and emotional area of public policy, yet, as
we have seen in this report, the statistics on which informed debate and
policy formation are based are woefully inadequate. Two of the major
reasons for this inadequacy have been a lack of interest in or commitment
to the production of relevant, high-quality statistics by the agencies
having contact with aliens, and the failure of any one agency to take the
lead in fostering a governmentwide coordinated system for collecting,
processing, and analyzing the necessary data. This leadership role
belongs by right to the INS as the agency primarily concerned with
immigration policy and process. However, the INS has consistently failed
to look beyond its immediate management needs for information, an
attitude clearly expressed in its mission plan and in the assumptions
underlying its ADP program, and it has on occasion actually impeded
existing collaborative interagency agreements by introducing process
changes without consultation or regard for outside needs. Blame for the
lack of enthusiasm for producing immigration statistics shown by the
agencies involved lies partly with the agencies themselves, but some part
of the blame must also be borne by the executive branch and Congress:
the agencies have not been told clearly enough to produce useful data.
Two courses of action are necessary to improve the present
unsatisfactory situation. One is a congressional initiative to mandate
specific reporting requirements, particularly for the INS. The Refugee
Act of 1980 shows what Congress can do in the area of data production by
legislation, and the Simpson-Mazzoli bill also was a clear movement in
the right direction.
The panel therefore recommends to Congress that:
o Specific language covering data collection, analysis, and
reporting requirements be incorporated into an amendment to the INA and
into all other legislation dealing with immigrants, refugees, and other
aliens.
The second is for the executive branch to establish an interagency
review group to ensure coordination between agencies, action on necessary
new initiatives, and due regard for quality control within agencies.
Accordingly, the panel recommends:
o That an interagency review group for immigration statistics be
established under the aegis of the Statistical Policy Office of the
Office of Management and Budget.
This group would be charged with ensuring the coordination across
agencies of data collection and processing in the area of migration and
refugees and with overseeing the implementation of improvements within
agencies, particularly with reference to timeliness, quality control, and
responsiveness to changing data needs. An early task for the group would
be to examine the recommendations on statistics of international
migration of the United Nations (1980), with a view to proposing changes
leading to greater conformity with the recommendations. The group would
thus provide the leadership that has been so lacking in the past.
Improved coordination alone will go some way toward remedying the past
neglect of immigration statistics, but the interagency review group must
go further, to bear responsibility for the testing and implementation of
the panel's specific, agency-directed recommendations.
With the exception of the proposed longitudinal survey, we have not
provided detailed cost estimates for the recommendations given in this
chapter. The reason for this omission is that we believe that the costs
of the proposed measures, again with the exception of the longitudinal
survey and the reorganization within the INS, are small enough to be met
within existing budgets, at least in the initial stages of
implementation. Some reallocation within current appropriations will be
necessary to effect these actions, but such changes fall well within the
scope of normal managerial discretion. Estimates of the cost of
reorganizing statistical activities in the INS are given in Chapter 4; as
noted, we estimate additional recurrent expenditures of some $2.5 million
per year once the proposed system is fully operational. This money must
be spent to reverse past neglect, but the returns in terms of improved
policy making and program monitoring fully justify the additional
expenditure.
REFERENCES
National Research Council
1979 Privacy and Confidentiality as Factors in Survey Response.
Committee on National Statistics, Assembly of Behavioral and
Social Sciences. Washington, D.C.: National Academy of
Sciences.
United Nations
1980 Recommendations on Statistics of International Migration.
Department of International Economic and Social Affairs.
Statistical Office. Statistical Papers Series M No. 58
(ST/ESA/STAT/SER.M/58) New York: United Nations.