Record Linkage Techniques -- 1997: Proceedings of an International Workshop and Exposition (1999)

Suggested Citation:"Chapter 8 Invited Session on Confidentiality." National Research Council. 1999. Record Linkage Techniques -- 1997: Proceedings of an International Workshop and Exposition. Washington, DC: The National Academies Press. doi: 10.17226/6491.

Chapter 8

Invited Session on Confidentiality

Chair: Laura Zayatz, Bureau of the Census

Authors:

Eleanor Singer and John VanHoewyk, University of Michigan and Stanley Presser, University of Maryland

Arthur B. Kennickell, Federal Reserve Board

Katherine K. Wallman and Jerry L. Coffey, Office of Management and Budget


Public Attitudes Toward Data Sharing by Federal Agencies

Eleanor Singer and John VanHoewyk, University of Michigan; Stanley Presser, University of Maryland

Abstract

Very little information exists concerning public attitudes on the topic of data sharing among Federal agencies. The most extensive information prior to 1995 comes from questions on several IRS surveys of taxpayers, from questions on a series of Wisconsin surveys carried out in 1993–95, and from scattered other surveys reviewed by Blair (1995) for the National Academy of Sciences panels. From this review it is clear that the public is not well informed about what data sharing actually entails, nor about the meaning of confidentiality. It seems likely that opinions on this topic are not firmly held and are liable to change depending on other information stipulated in the survey questions as well as on other features of the current social climate.

In the spring of 1995, the Joint Program in Survey Methodology (JPSM) at the University of Maryland carried out a random digit dialing (RDD) national survey focused on the issue of data sharing. The Maryland survey asked questions designed to probe the public's understanding of the Census Bureau's pledge of confidentiality and their confidence in that pledge. Respondents were also asked how they felt about the Census Bureau's obtaining some information from other government agencies in order to improve the decennial count, reduce burden, and reduce cost. In addition, in an effort to understand responses to the data sharing questions, the survey asked about attitudes toward government and about privacy in general.

Then, in the fall of 1996, Westat, Inc. repeated the JPSM survey and, in addition, added a number of split-ballot experiments to permit better understanding of some of the responses to the earlier survey. This paper examines public attitudes toward the Census Bureau's use of other agencies' administrative records. It analyzes the relationship of demographic characteristics to these attitudes as well as the interrelationship of trust in government, attitudes toward data sharing, and general concerns about privacy. It also reports on trends in attitudes between 1995 and 1996 and on the results of the question-wording experiments embedded in the 1996 survey. Implications are drawn for potential reactions to increased use of administrative records by the Census Bureau.

Introduction

For a variety of reasons, government agencies are attempting to satisfy some of their needs for information about individuals by linking administrative records which they and other agencies already possess. Some of the reasons for record linkage have to do with more efficient and more economical data collection, others with a desire to reduce the burden on respondents, and still others with a need to improve coverage of the population and the quality of the information obtained.


The technical problems involved in such record linkage are formidable, but they can be defined relatively precisely. More elusive are problems arising both from concerns individuals may have about the confidentiality of their information and from their desire to control the use made of information about them. Thus, public acceptance of data sharing among Federal and state statistical agencies is presumably necessary for effective implementation of such a procedure, but only limited information exists concerning public attitudes on this topic.

A year and a half ago, the Joint Program in Survey Methodology (JPSM) at the University of Maryland devoted its practicum survey to examining these issues. The survey asked questions designed to probe the public's understanding of the Census Bureau's pledge of confidentiality and their confidence in that pledge. It also asked how respondents felt about the Census Bureau's obtaining some information from other government agencies in order to improve the decennial count or to reduce its cost. In addition, in an effort to understand responses to the data sharing questions, the survey asked a series of questions about attitudes toward government and about privacy in general.

Most of these questions were replicated in a survey carried out by Westat, Inc. in the fall of 1996, a little more than a year after the original survey. The Westat survey asked several other questions in addition—questions designed to answer some puzzles in the original survey, and also to see whether the public was willing to put its money where its mouth was—i.e., to provide social security numbers (SSN's) in order to facilitate data sharing. Today, I will do four things:

  • Report on trends in the most significant attitudes probed by both surveys;

  • Discuss answers to the question about providing social security numbers;

  • Report on progress in solving the puzzles left by the JPSM survey; and

  • Discuss the implications of the foregoing for public acceptance of data sharing by Federal agencies.

Description of the Two Surveys

The 1995 JPSM survey was administered between late February and early July to a two-stage Mitofsky-Waksberg random digit dial sample of households in the continental United States. In each household, one respondent over 18 years of age was selected at random using a Kish (1967) procedure. The response rate (interviews divided by the total sample less businesses, nonworking numbers, and numbers that were never answered after a minimum of twenty calls) was 65.0 percent. The nonresponse consisted of 23.4% refusals, 6.5% not-at-home, and 5.1% other (e.g., language other than English and illness). Computer-assisted telephone interviewing was conducted largely by University of Maryland Research Center interviewers, supplemented by graduate students in the JPSM practicum (who had participated in the design of the questionnaire through focus groups, cognitive interviews, and conventional pretests). The total number of completed interviews was 1,443.
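The response-rate arithmetic described above can be sketched as follows. Only the 1,443 completed interviews and the reported rates (65.0% response, 23.4% refusals, 6.5% not-at-home, 5.1% other) come from the paper; the remaining disposition counts are back-calculated for illustration and should be read as assumptions.

```python
# Sketch of the response-rate computation described in the text: interviews
# divided by the eligible sample (total sample less businesses, nonworking
# numbers, and never-answered numbers). Only the 1,443 interviews and the
# reported percentages come from the paper; the other counts are illustrative.
def disposition_rates(interviews, refusals, not_at_home, other):
    eligible = interviews + refusals + not_at_home + other
    return {
        "response": interviews / eligible,
        "refusal": refusals / eligible,
        "not_at_home": not_at_home / eligible,
        "other": other / eligible,
    }

# Hypothetical counts consistent with the reported 65.0 / 23.4 / 6.5 / 5.1 split.
rates = disposition_rates(interviews=1443, refusals=519, not_at_home=144, other=114)
```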

The Westat survey (Kerwin and Edwards, 1996) was also conducted with a sample of individuals 18 or older in U.S. households from June 11 to mid-September. The response rate, estimated in the same way as the JPSM sample, was 60.4%[1]. The sample was selected using a list-assisted random digit dial method. One respondent 18 or over was selected at random to be interviewed.
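Both surveys select one adult per household at random. A minimal sketch of that selection step, assuming a simple software RNG in place of the preassigned selection tables the actual Kish procedure uses (the household roster here is hypothetical):

```python
import random

# Simplified stand-in for Kish-style within-household selection: list the
# eligible adults (18 or older) and draw one uniformly at random. The real
# procedure uses preassigned selection tables rather than an RNG.
def select_respondent(adults, rng=None):
    rng = rng or random.Random()
    return rng.choice(list(adults))

# Hypothetical three-adult household; fixed seed for reproducibility.
chosen = select_respondent(["Pat", "Lee", "Sam"], rng=random.Random(42))
```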

Trends in Public Attitudes Toward Data Sharing

The most significant finding emerging from a comparison of the two surveys was the absence of change with respect to attitudes relating to data sharing. Indeed, if we are right that there has been little change on these matters, the new survey is testimony to the ability to measure attitudes reliably when question wording, context, and procedures are held reasonably constant—even on issues on which the public is not well informed and on which attitudes have not crystallized. In 1996 between 69.3% and 76.1%, depending on the agency, approved of other agencies sharing information from administrative records with the Census Bureau in order to improve the accuracy of the count, compared with 70.2% to 76.1% in 1995[2]. Responses to the Immigration and Naturalization Service, asked about in 1995, and the Food Stamp Office, asked about in 1996, are comparable to those to the Social Security Administration (SSA). Responses are consistently least favorable toward the Internal Revenue Service (IRS).

Westat documents five significant changes (p < .10) among 22 questions asked about the Census Bureau on both surveys. First, there is more awareness of the fact that census data are used to apportion Congress and as a basis for providing aid to communities; but second, there is less awareness that some people are sent the long census form instead of the short form. (Both of these changes make sense in retrospect. In the election year of 1996, apportionment was very much in the news; at the same time, an additional year had elapsed since census forms, long or short, had been sent to anyone.) Third, fewer people in 1996 than 1995 said that the five questions asked on the census short form are an invasion of privacy—a finding at odds with others, reported below, which suggest increasing sensitivity to privacy issues between the two years. This issue will be examined again in the 1997 survey. Fourth, there was a modest increase in the strength with which people opposed data sharing by the IRS. This finding (not replicated with the item about data sharing by SSA) may have less to do with data sharing than with increased hostility toward the IRS. These changes are mostly on the order of a few percentage points. Finally, among the minority who thought other agencies could not get identifiable Census data there was a substantial decline in certainty, although the numbers of respondents being compared are very small.

Trends in Attitudes Toward Privacy

In contrast with attitudes toward data sharing and the Census Bureau, which showed virtually no change between 1995 and 1996, most questions about privacy and alienation from government showed significant change, all in the direction of more concern about privacy and more alienation from government. The relevant data are shown in Table 1.

There was a significant decrease in the percentage agreeing that “people's rights to privacy are well protected” and an insignificant increase in the percentage agreeing that “people have lost all control over how personal information about them is used.” At the same time, there was a significant decline in the percentage disagreeing with the statement, “People like me don't have any say about what the government does,” and a significant increase in the percentage agreeing that “I don't think public officials care much what people like me think” and in the percentage responding “almost never” to the question, “How much do you trust the government in Washington to do what is right?” The significant decline in trust and attachment to government manifested by these questions is especially impressive given the absence of change in responses to the data sharing questions. We return to the implications of these findings in the concluding section of the paper.


Table 1. —Concerns about Privacy and Alienation from Government, by Year

(Entries are the percentage agreeing strongly or somewhat, except the last row, which shows the percentage answering “almost never”; N in parentheses.)

Attitude/Opinion                                                      1995           1996
People's rights to privacy are well protected                         41.4 (1,413)   37.0 (1,198)
People have lost all control over how personal information
  about them is used                                                  79.5 (1,398)   80.4 (1,193)
People like me don't have any say about what the government does      59.2 (1,413)   62.9 (1,200)
I don't think public officials care much what people like me think    65.4 (1,414)   71.1 (1,202)
How much do you trust the government in Washington to do what
  is right? (Almost never)                                            19.2 (1,430)   25.0 (1,204)
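The “significant change” judgments above are comparisons of two independent proportions. A minimal sketch of such a test, using the trust-in-government row of Table 1 (19.2% of 1,430 in 1995 vs. 25.0% of 1,204 in 1996); the pooled two-proportion z-test is an assumption about method, since the paper does not state which test Westat used:

```python
import math

# Pooled two-proportion z-test, applied to the trust-in-government row of
# Table 1. A |z| above 1.96 corresponds to significance at the .05 level.
def two_prop_z(p1, n1, p2, n2):
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se

z = two_prop_z(0.192, 1430, 0.250, 1204)  # change from 19.2% to 25.0%
```

Note that this treats the weighted survey data as a simple random sample; design effects from the weighting would shrink the effective sample sizes somewhat.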

Willingness to Provide Social Security Number to Facilitate Data Sharing

One question of particular importance to the Census Bureau is the extent to which people would be willing to provide their social security number to the Census Bureau in order to permit more precise matching of administrative and census records. Evidence from earlier Census Bureau research is conflicting in this regard. On the one hand, respondents in four out of five focus groups were overwhelmingly opposed to this practice when they were asked about it in 1992 (Singer and Miller, 1992). On the other hand, respondents to a field experiment in 1992 were only 3.4 percentage points less likely to return a census form when it requested their SSN than when it did not; an additional 13.9 percent returned the form but did not provide a SSN (Singer, Bates, and Miller, 1992).

To clarify this issue further, the Bureau asked Westat to include a question about SSN on the 1996 survey. The question (Q21) read as follows:

“The Census Bureau is considering ways to combine information from Federal, state, and local agencies to reduce the costs of trying to count every person in this country. Access to social security numbers makes it easier to do this. If the census form asked for your social security number, would you be willing to provide it?”

About two thirds (65.9%) of the sample said they would be willing to provide the number; 30.5% said they would not; and 3.5% said don't know or did not answer the question.

The question about SSN was asked after the series of questions asking whether or not people approved of other administrative agencies sharing data with the Census Bureau. Therefore, it is reasonable to


assume that responses to this question were influenced by opinions about data sharing, which the preceding questions had either brought to mind or helped to create. And, not surprisingly, there is a relationship between a large number—but not all—of the preceding questions and the question about providing one's SSN.

For example, those who would provide their SSN to the Bureau are more likely to believe the census is extremely or very important and more likely to be aware of census uses. They are more likely to favor data sharing. Those who would not provide their SSN to the Bureau are more concerned about privacy issues. They are less likely to trust the Bureau to keep census responses confidential; they are more likely to say they would be bothered “a lot” if another agency got their census responses; they are less likely to agree that their rights to privacy are well protected; less likely to believe that the benefits of data sharing outweigh the loss of privacy this would entail, and more likely to believe that asking the five demographic items is an invasion of privacy. All of these differences are statistically significant.

Table 2. —Willingness to Provide SSN and Attitudes to Census Bureau

(Percent holding each attitude, by willingness to provide SSN)

Attitude/Opinion                                                   Would Not      Would
                                                                   Provide SSN    Provide SSN
Believes counting population is “extremely” or “very” important       63.8           79.7
Is aware of census uses                                               43.1           54.8
Would favor SSA giving Census Bureau short-form information           56.3           85.0
Would favor IRS giving Census Bureau long-form information            30.4           61.2
Would favor “records-only” census                                     45.6           60.0
Trusts Bureau to not give out/keep confidential census responses      45.0           76.7
Would be bothered “a lot” if other agency got census responses        54.1           29.9
Believes benefits of record sharing outweigh privacy loss             36.0           51.1
Believes the five items on short form are invasion of privacy         31.3           13.4

There are also significant relationships between political efficacy, feelings that rights to privacy are well protected, feelings that people have lost control over personal information, and trust in “the government in Washington to do what is right” (Q24a-d) and willingness to provide one's SSN. These political attitude questions, it should be noted, were asked after the question about providing one's SSN, and so they could not have influenced the response to this question.


Of the demographic characteristics, only two—gender and education—are significantly (for gender, p<.10; for education, p<.05) related to willingness to provide one's SSN. Almost three quarters (71.4%) of men, but only 65.5% of women, are willing to provide their SSN. By education, willingness is expressed by 71.2% of those with less than a high school education, 63.9% of high school graduates, 68.7% of those with some college, and 76.8% of college graduates. The same curvilinear relationship is apparent for income: 75.4% of those with family incomes of less than $20,000, 69.6% of those with incomes between $20,001 and $30,000 and between $30,001 and $50,000, 68.6% of those between $50,001 and $75,000, and 75.4% of those with incomes over $75,000 say they would be willing to provide their SSN if asked by the Census Bureau to do so.

Table 3. —Willingness to Provide SSN, by Concerns about Privacy and Alienation from Government

(Percent expressing each concern, by willingness to provide SSN)

Concern/Alienation                                                        Would          Would Not
                                                                          Provide SSN    Provide SSN
Disagrees strongly that rights to privacy are well protected                 24.2           45.6
Agrees strongly people have lost control over personal information           37.9           54.2
Agrees strongly “people like me” have no say about what government does      27.7           43.7
Agrees strongly public officials don't care much about “what people
  like me think”                                                             31.2           45.4
Almost never trusts government in Washington to do what's right              19.5           37.8
Privacy loss outweighs economic benefit of data sharing                      47.1           56.0
Economic benefit of data sharing outweighs privacy loss                      47.9           30.4

From the foregoing, it appears that there are two reasons underlying reluctance to provide one's SSN. First, there are reasons associated with beliefs about the census: People who are less aware of the census, who consider it less important, and who are less favorable toward the idea of data sharing are significantly less willing to provide their SSN. Low levels of education are also associated with these characteristics. Second, however, is a set of beliefs and attitudes concerning privacy, confidentiality, and trust: People who are more concerned about privacy, who have less trust in the Bureau's maintenance of confidentiality, and who are less trusting of government in general are much less likely to say they would provide their SSN to the Census Bureau. Women are more likely to be concerned about privacy issues than men, and they are also less willing to say they would provide their SSN to the Bureau. In earlier analyses (Singer and Presser, 1996) we found that importance attached to the census, knowledge about the census, and attitudes about privacy were independent factors predicting willingness to have other agencies share data with the Bureau. Though we have not carried out a factor analysis of attitudes toward willingness to provide one's SSN, the relationships described above suggest that the same clusters of beliefs are relevant for this attitude, as well.


We should point out that the question asked on the 1996 survey, about whether or not respondents would be willing to provide their SSN, is not equivalent to a field experiment. The number of people who would provide their SSN if asked to do so in an actual census might very well be higher than the two thirds who said they would do so on this survey, as suggested by the field experiment cited at the beginning of this section. On the other hand, if the issue of privacy became salient prior to the census, the number complying might well be less. Arguing for the second, more cautious, inference is the fact that more than a third of those approached for the survey did not participate, and, since the introduction to the survey informed potential respondents about the topic, the nonparticipants may well have included those more suspicious of government and less inclined to cooperate with any request from government agencies, including the Census Bureau[3].

What Does Confidentiality Mean?

A number of question wording experiments were included in the 1996 Westat survey. The most important of these, from the perspective of understanding data sharing attitudes, had to do with the meaning of the Census Bureau's assurance of data confidentiality to respondents. The short answer to the question, “What does confidentiality mean to the public?” is, “We don't know.” However, in the rest of this paper, we try to summarize what we think we learned.

The 1995 JPSM survey resulted in one very puzzling finding. When asked whether other agencies could get their answers to census questions, identified by name and address, 41% said they did not know; of the rest, about 90% said other agencies could get such information (Presser and Singer, 1995). To make things even more puzzling, the better educated were more likely to believe, erroneously, that other agencies could get such data—virtually the only time, so far as we know, that more education has been associated with more error (Hyman, Wright, and Reed, 1975). Furthermore, the belief that other agencies could get such data was associated with more favorable attitudes toward data sharing.

It thus seemed fairly clear that our attempt to provide a neutral definition of “confidentiality” in the 1995 instrument had not had the intended effect. Accordingly, we incorporated a four-way split ballot experiment into the 1996 survey.

One quarter of the sample were asked the 1995 question; one quarter, the 1995 question without the DK filter. One quarter were asked, “Do you think the Census Bureau does or does not protect the confidentiality of this (household demographic) information, or don't you know (DK)?” And the final quarter were asked the confidentiality question without the DK filter.

The results are shown in Table 4. The most striking thing about the table is simply the variation in responses, depending on the wording of the question. But the next most startling finding is the difference in responses to the questions asking whether other agencies can get identified data, and whether the Bureau keeps data confidential. Omitting those who answer DK, the percentage who believe responses are NOT shared (or data ARE kept confidential) ranges from 11.5% in Q7–1 to 69.2% in Q7–4. Omission of the DK filter reduces the size but does not change the basic form of the relationship. Majorities of the public believe that other agencies can get identified data; they also believe that the Bureau maintains data confidentiality.


Table 4. —The Effects of Question Wording on Beliefs Regarding Sharing of Responses by Census Bureau

(Percent. The two left-hand columns are versions of “Do you think other government agencies…can or cannot get people's names and addresses along with their answers to the census?”; the two right-hand columns are versions of “Do you think the Census Bureau does or does not protect the confidentiality of this [household demographic] information?”)

                                            Can/cannot get wording       Does/does not protect wording
Response                                    Explicit      No Explicit    Explicit      No Explicit
                                            “Not Sure”    “Not Sure”     “Not Sure”    “Not Sure”
Believe that census responses are shared       47.1          76.9            9.6          20.9
Believe that census responses are not
  shared                                        6.1          15.4           12.9          47.0
Not Sure/Don't Know                            46.8           7.7           77.5          32.1
N (unweighted)                                  310           296            294           315
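The “omitting DK” figures quoted in the text can be recomputed directly from the Table 4 column percentages; a quick sketch:

```python
# Percent believing responses are NOT shared (or ARE kept confidential),
# among those giving a substantive answer, from the Table 4 percentages.
def pct_not_shared_excluding_dk(shared, not_shared):
    return 100 * not_shared / (shared + not_shared)

q7_1 = pct_not_shared_excluding_dk(47.1, 6.1)    # can/cannot wording, explicit "Not Sure"
q7_4 = pct_not_shared_excluding_dk(20.9, 47.0)   # confidentiality wording, no "Not Sure"
```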

In passing, we should note that the distribution of answers to the version of the question which is identical to the 1995 question does not differ significantly from the 1995 distribution; and, as in 1995, people who said other agencies CAN get data were significantly more likely to favor data sharing in 1996 as well.

In another effort to understand the meaning of confidentiality to respondents, we asked a second split-ballot question near the end of the 1996 survey. One version asked whether the Census Bureau was required by law to keep census information confidential; the other, whether the Bureau was forbidden by law from giving identified census information to other agencies.

The responses to the two versions of this question are shown in Table 5. Majorities of those who have an opinion give the correct answer to both questions; but the proportion answering DK is larger, and the proportion giving the correct answer smaller, when the question asks about giving other agencies identified information than when it asks about maintaining confidentiality.

As a follow-up to both questions, we asked those who said the Bureau is required to protect the information or forbidden from disclosing it, whether or not they trusted the Bureau to uphold the law—that is, to keep the information confidential, or to refrain from disclosing it to other agencies. Regardless of which version of Q22 they got, two thirds of those who answered Yes to the factual question about legal requirements said they trusted the Bureau to comply with the law. However, those who not only say the Bureau is required to keep information confidential but who also trust the Bureau to do so, are significantly more likely to say both that other agencies cannot get the data and that the Bureau keeps data confidential. Thus, not only knowledge of the law, but also trust in the Bureau's compliance with the law,


is implicated in responses to the factual questions about whether the Bureau does or does not protect the data in its possession.

Table 5. —The Effect of Question Wording on Knowledge of Laws Regarding Sharing of Census Information

(Percent. Column headings abbreviate “Is the Census Bureau forbidden by law from giving other government agencies census information identified by name or address?” and “Is the Census Bureau required by law to keep census information confidential?”)

Response          Forbidden by law?    Required by law?    Total
Yes                    28.3                 51.1            40.2
No                     17.1                 11.6            14.2
Don't Know             54.6                 37.3            45.5
N (unweighted)          591                  624            1215

What differentiates those who trust the Bureau to keep information confidential from those who do not?

We found only two demographic characteristics that seemed to make a difference. Women are considerably more likely to say they trust the Bureau than men, and younger respondents are more likely to express trust than older respondents are. Whether this is an effect of age or of cohort is impossible to tell from this cross-sectional survey. None of the other demographic characteristics we examined—education, race, or income—make a consistent difference in attitudes of trust.

Finally, we looked at the relation of the beliefs about legal requirements to attitudes about data sharing. People who believe the Bureau is required by law to keep data confidential are significantly more likely to favor data sharing than those who do not. On the other hand, people who believe the Bureau is forbidden from sharing data with other agencies are significantly more likely to oppose data sharing by other agencies. Whether this results from confusion, or from an application of the norm of reciprocity, or from opposition to all data sharing, is impossible to tell.

Conclusion and Implications

The following conclusions seem to follow from comparison of the 1995 and 1996 surveys:

  • Beliefs about the Census Bureau and attitudes toward data sharing have undergone little change since 1995.

  • Beliefs about privacy and trust in government have deteriorated since 1995.

  • To the public, the belief that the Bureau protects confidentiality does not seem to mean that other agencies cannot get data identified by name and address. What it does mean, we cannot tell from these data.

  • In contrast to an implicit Census Bureau hypothesis, knowledge about legal requirements for confidentiality is not enough to convince the public that the Bureau actually protects confidentiality. In order for knowledge to translate into belief, trust in the Bureau is required. The number of people who both know about legal requirements and trust the Bureau is only about two thirds as great as the number whose factual information is correct. (However, both knowledge and trust are independently related to attitudes toward data sharing.)

We believe these findings have two major implications for future data collection:

  • First, in 1997 we plan to ask one third of the sample both whether other agencies can get data and whether the Census Bureau maintains confidentiality. Then, everyone will be asked what confidentiality means to them. Only when the sources of misunderstanding are known can the Bureau better communicate its message about data protection to the public.

  • Second, future surveys should be used to experiment with arguments that might be presented to the public in favor of data sharing. For example, there is evidence from the 1995 and 1996 surveys that the quality of the data is a more important consideration than cost. Are there other arguments that are even more persuasive? How can the argument about quality be made even more compelling?

We hesitate to make substantive predictions about the public's acceptance of data sharing at the time of the next census. On the one hand, about two thirds of the public currently favor this practice, this proportion has remained stable over at least a year, and two thirds say they would be willing to provide their SSN to the Bureau to facilitate such sharing. On the other hand, opposition to data sharing, and to making the SSN available, is strongly related to privacy concerns, and such concerns showed a small but significant increase between 1995 and 1996. Thus, it seems possible that if privacy concerns continue to increase, they may erode the support for data sharing that currently exists. The same implication can be drawn from our findings concerning belief in the Census Bureau's assurance of confidentiality. Information about the law is apparently not enough; trust is also required. And the latter is a much more difficult message to communicate effectively.

Acknowledgments

We would like to express our thanks to Jeff Kerwin and Sherman Edwards at Westat for help with some aspects of data analysis and to Randall Neugebauer and his colleagues at the Census Bureau for helpful comments on an earlier draft.

Note

Follow-up to this research appears in Singer, Eleanor and Presser, Stanley (1997). Public Attitudes Toward Data Sharing by Federal Agencies: Trends and Implications, Survey Research Methods Section Proceedings, American Statistical Association (in process).

Footnotes

[1] The Westat report gives a response rate of 64.4%, which is based on excluding the number of respondents with language problems (n=126) from the denominator. This group is included in the JPSM count of eligibles. The introduction to the Westat survey differed somewhat from that used by JPSM. It read as follows:

“My name is______. I'm calling from Westat on behalf of the U.S. Census Bureau in Washington, D.C. We're doing a study of people's opinions on whether government agencies keep information about them private. You were randomly selected for this study from the adults in your household. This survey has been approved by the Office of Management and Budget, Number 0607–0822. Without this approval, we could not conduct this survey. Any questions or comments about the survey may be directed to the Census Bureau. If you would like, when we are done, I will provide you with the address.”

The JPSM introduction omitted all references to OMB or the Census Bureau, as well as the sentence about random selection, and introduced the interviewer as calling from the University of Maryland. The sentence about the topic of the study was identical to that in the Westat introduction.

[2] Text and tables use data weighted for number of residential phone numbers in the household and number of persons in the household, poststratified to Census estimates of sex, race, age, education, and region.

[3] If the Bureau used a less specific introduction, the overall response rate to the survey might not change, but nonrespondents might be more representative (less biased) with respect to their attitudes toward government and the Census Bureau.

References

Blair, Johnny (1995). Ancillary Uses of Government Administrative Data on Individuals: Public Perceptions and Attitudes, a paper commissioned by the Panel to Evaluate Alternative Census Methods, National Academy of Sciences, College Park, MD: Survey Research Center, University of Maryland.

Hyman, Herbert H.; Wright, Charles; and Reed, John (1975). The Enduring Effects of Education, Chicago: University of Chicago Press.

Kerwin, Jeffrey and Edwards, Sherman (1996). The 1996 Survey on Privacy and Administrative Records Use, staff paper, Rockville, MD: Westat, Inc.

Kish, L. (1967). Survey Sampling, New York: Wiley.

Presser, Stanley and Singer, Eleanor (1995). Public Beliefs about Census Confidentiality, paper presented at the 1995 meetings of the American Sociological Association.

Singer, Eleanor and Miller, Esther R. (1992). Report on Focus Groups, Center for Survey Methods Research, U.S. Bureau of the Census.

Singer, Eleanor; Bates, Nancy; and Miller, Esther R. (1992). Memorandum for Susan Miskura, Bureau of the Census, July 15, 1992.

Singer, Eleanor and Presser, Stanley (1996). Public Attitudes toward Data Sharing by Federal Agencies, paper presented at the Annual Research Conference, Bureau of the Census.


Multiple Imputation and Disclosure Protection: The Case of the 1995 Survey of Consumer Finances

Arthur B. Kennickell, Federal Reserve Board

Abstract

Donald Rubin has suggested many times that one might multiply impute all the data in a survey as a means of avoiding disclosure problems in public-use datasets. Disclosure protection in the Survey of Consumer Finances (SCF) is a key issue driven by two forces. First, there are legal requirements stemming from the use of tax data in the sample design. Second, there is an ethical responsibility to protect the privacy of respondents, particularly those with small weights and highly salient characteristics. In the past, a large part of the disclosure review of the survey required tedious and detailed examination of the data. After this review, a limited number of sensitive data values were targeted for a type of constrained imputation, and other undisclosed techniques were applied. This paper looks at the results of an experimental multiple imputation of a large fraction of the SCF data using software specifically designed for the survey. In this exercise, a type of range constraint is used to limit the deviations of the imputations from the reported data. The paper discusses the design of the imputations and provides a preliminary review of the effects of imputation on subsequent analysis.

Introduction

Typically, in household surveys there is the possibility that information provided in confidence by respondents could be used to identify the respondent. This possibility imposes an ethical, and sometimes a legal, burden on those responsible for publishing the survey: it is necessary to review the data for items that could be highly revealing of the identity of individuals, and to filter the data made available to the public to minimize the degree of disclosure[1]. A recent issue of the Journal of Official Statistics (vol. 9, no. 2, 1993) deals with many aspects of this problem.

The Survey of Consumer Finances (SCF) presents two particularly serious disclosure risks. First, the survey is designed to measure the details of families' balance sheets and other aspects of their financial behavior. Second, the SCF oversamples wealthy families. Because of the sensitive nature of the data collected, and because the sample contains a disproportionate number of people who may be well-known, at least in their localities, disclosure review of the SCF is particularly stringent[2].

There is a growing belief that publicly available records, such as credit bureau files, real estate tax data, and similar files, make it increasingly likely that an unscrupulous data user might eventually come close to identifying an SCF respondent[3]. Several protective strategies have been proposed, but many proposals (truncation, simple averaging across cells, random reassignment of data, etc.) raise serious obstacles for many of the analyses for which the SCF is designed. The prospect of either being unable to release any information, or having to alter the data in ways that severely restrict their usefulness, makes it imperative that we explore alternative approaches to disclosure limitation.


Most disclosure limitation techniques attempt to release some transformation of the data that preserves what is deemed to be the important information. Taking this idea to one farsighted conclusion, Donald Rubin has suggested on several occasions creating an entirely synthetic dataset based on the real survey data and multiple imputation (see, e.g., Rubin, 1993)[4]. My impression is that most people have viewed the idea of completely simulated data with at least some suspicion[5]. Such an exercise presents considerable technical difficulties. However, even if it is not possible to create an ideal simulated dataset, we may learn something from the attempt to create one. This paper describes several explorations in this direction.

Multiple imputation has played an important role in the creation of the public datasets for the SCF since 1989. In both the 1989 and 1992 surveys, a set of sensitive monetary variables was selected for a set of cases; the responses to those variables were treated as range responses (rather than exact dollar responses) and were multiply imputed using the standard FRITZ software developed for the SCF (see Kennickell, 1991). This approach has been broadened in the 1995 survey based on the work reported here. In the experiments discussed in this paper, several approaches are taken to imputing all of the monetary values in the 1995 SCF.

The first section of the paper provides some general information on the content of the SCF and the sample design and gives a review of the past approach to disclosure review. Because of the importance of imputation in the work reported here, the second section reviews the FRITZ imputation model. The third section discusses the special manipulation of the data for this experiment and presents some descriptive results. A final section summarizes the findings of the paper and points toward future work.

The 1995 Survey of Consumer Finances

The SCF is sponsored by the Board of Governors of the Federal Reserve System in cooperation with the Statistics of Income Division of the IRS (SOI). Data collection for the 1995 SCF was conducted between July and December of 1995 by the National Opinion Research Center (NORC) at the University of Chicago. The interviews, which were performed largely in person using computer-assisted personal interviewing (CAPI), required an average of 90 minutes—though some took considerably longer.

Because the major focus of the survey is household finances, the SCF includes questions about all types of financial assets (checking accounts, stocks, mutual funds, cash value life insurance, and other such assets), tangible assets (principal residences, other real estate, businesses, vehicles, and other such assets) and debts (mortgages, credit card debt, debt to and from a personally-owned business, education loans, other consumer loans, and other liabilities). To meet the analytical objectives of the survey, detailed information is collected on every item. For example, for up to six checking accounts, the SCF asks the amount in the account, the owner of the account, and the institution where the account is held. The actual name of the institution is not retained, but a linkage is established to every other place in the interview where the institution is referenced, and detailed questions are asked about the institution. For automobiles, the make, model, and year of the car are requested along with the details of the terms of any loan for the car. Detailed descriptions of types of properties and business that the household owns are collected, along with information on the financial flows to and from the household and the businesses.

To provide adequate contextual variables for analysis, the SCF also obtains data on the current and past jobs of respondents and their spouses or partners, their pension rights from current and past jobs, their marital history, their education, the ages of their parents, and other demographic characteristics. Data are also collected on past inheritances, future inheritances, charitable contributions, attitudes, and many other variables.


Although the combination of such a broad array of variables alone is sufficient cause to warrant intensive efforts to protect the privacy of the individual survey participants, a part of the SCF sample design introduces further potential disclosure problems. The survey is intended to be used for the analysis of financial variables that are widely distributed in the population (e.g., credit card debt and mortgages) and variables that are more narrowly distributed (e.g., personal businesses and corporate stock). To provide good coverage of both types of variables, the survey employs a dual-frame design (see Kennickell and Woodburn, 1997). In 1995, a standard multi-stage area-probability sample was selected from 100 primary sampling units across the United States (see Tourangeau, et al., 1993). This sample provides good coverage of the broadly-distributed variables. A special list sample was designed to oversample wealthy households. Under an agreement between the Federal Reserve and SOI, data from the Individual Tax File (ITF), a sample of individual tax returns specially selected and processed by SOI, are made available for sampling[6].

The area-probability design raises no particularly troubling issues beyond the need to protect geographic identifiers that is common to most surveys. However, the list sample raises two distinct problems. First, it increases the proportion of respondents who are wealthy. Such people are likely to be well-known at least in their locality, and because of the relatively small number of such people, it is more likely that data users with malicious intent could match a respondent to external data if sufficient information were released in an unaltered form. Second, because SOI data have been used in the design of the sample, there is a legal requirement that SCF data released to the public be subjected to a disclosure review similar to that required before the release of the public version of the ITF.

Generally, the SCF data have been released to the public in stages. This strategy has allowed us to satisfy some of the most immediate demands of data users, while allowing time to deal with more complex disclosure issues. Once a variable has been released, no amount of disclosure review can retrieve the information, and it can be much more difficult to add variables later because of the possible interactions of sensitive variables. In the past, staged release has allowed users to build a case for including additional variables, and we have been able to accommodate many such requests.

In 1992, the last year for which final data were released at the time this paper was written, the internal data were altered in the following ways for release[7]. First, geography, which was released at the level of the nine Census regions, was altered systematically; observations were sorted and aligned by some key indicators, and geography was swapped across cases.

Second, unusual categories were combined with other related categories; e.g., among owners of miscellaneous vehicles, the categories “boat,” “airplane,” and “helicopter” were combined. Third, a set of cases with unusual wealth or income was chosen, and a random set of other cases was added to the group. For these cases, key variables (for which complete responses were originally given) were multiply imputed subject to range constraints that ensured that the outcomes would be close to the initially reported values. Fourth, a set of other unspecified operations was performed to increase more broadly the perceived uncertainty associated with all variables in every observation; these operations affect both actual data values and the “shadow” variables in the dataset that describe the original state of each variable[8]. As a final step, all continuous variables were rounded as shown in Table 1. Generally, it is impossible for a user of the public dataset to tell with certainty which variables may have been altered and how they were altered.


Table 1.—Rounding of Continuous Variables

    Data Range               Rounded to Nearest
    >1 million               10,000
    10,000 to 1 million      1,000
    1,000 to 10,000          100
    5 to 1,000               10
    -5 to -1,000             10
    -1,000 to -10,000        100
    -10,000 to -1 million    1,000

Note: Negative numbers smaller than -1 million are truncated at -1 million; negative numbers between -1 and -5 are unaltered.
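As a concreteness check, the Table 1 rounding scheme can be sketched as a small function. This is only an illustration: the table does not pin down which band a boundary value falls in, nor how small positive values are treated, so those choices (and the function name) are assumptions here.

```python
def round_scf(value):
    """Round a continuous value by magnitude band, per the Table 1 scheme.

    Assumptions: band boundaries are assigned to the finer rounding unit,
    and values with magnitude below 5 are left unaltered.
    """
    if value < -1_000_000:
        return -1_000_000               # truncate extreme negatives
    magnitude = abs(value)
    if magnitude > 1_000_000:
        unit = 10_000
    elif magnitude >= 10_000:
        unit = 1_000
    elif magnitude >= 1_000:
        unit = 100
    elif magnitude >= 5:
        unit = 10
    else:
        return value                    # small values unaltered
    return unit * round(value / unit)
```

For example, a reported value of 12,345 would be released as 12,000, and -2,000,000 would be truncated to -1,000,000.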

A similar strategy is being followed for the 1995 SCF. The one significant change is in the imputation of data for the cases deemed “sensitive” and the random subset of cases described in step three. For the 1995 survey, all monetary data items in the selected cases will be imputed. Depending on the reception of the data by users, this approach may be extended in the 1998 SCF.

FRITZ Imputation Model

Because the principal evidence reported in this paper turns critically on the imputation of monetary variables, it is important to outline some of the more important characteristics of the FRITZ model, which was originally developed for the imputation of the 1989 SCF and has been updated for each round of the survey since then. This discussion focuses on the imputation of continuous variables (see Kennickell, 1991).

Figure 1 shows a hypothetical set of observations with various types of data given. In the figure, “X” represents complete responses, “R” symbolizes responses given as a type of range, and “O” indicates some type of missing value. In the SCF, there is a lengthy catalog of range and missing data responses, and this information is preserved in the shadow variables. Respondents in the 1995 SCF had the option of providing ranges in many ways: as an arbitrary volunteered interval (e.g., between 2,546 and 7,226), as a letter from a range card containing a fixed set of intervals (e.g., range “G” means 5,001 to 7,500), or as the result of answering a series of questions in a decision tree whose intervals varied by question[9]. Data may be missing because the respondent did not know the answer or refused to answer, because the respondent did not answer a higher-order question in a sequence, because of recording errors, or for other reasons.

The FRITZ system is an iterative multiple imputation model based on ideas of Gibbs sampling. The system acts on a variable-by-variable basis, rather than simultaneously drawing a vector of variables[10]. Within a given iteration, the most generally applied continuous variable routine is, in essence, a type of randomized regression, in which errors are assumed to be normally distributed[11].


One factor that distinguishes the model from the usual description of randomized regression imputation models is the fact that the FRITZ model is tailored to the missing data pattern of each observation. In Figure 1, all of the missing data patterns shown are different, and they are not monotone (Little, 1983). For most continuous variables, the program generates a covariance matrix for a maximal set of variables that are determined to be relevant as possible conditioning variables. For a given case, the model first determines whether a particular variable should be imputed. Given that the variable should be imputed, the FRITZ model computes a regression for the case using the variables in the maximal set that either are not originally missing or are already imputed within the particular iteration for the case. Finally, the model draws from the estimated conditional distribution until an outcome is found that satisfies any constraints that may apply. Constraints may take several forms. When a respondent has given a range response to a question, FRITZ uses the range to truncate the conditional distribution. Constraints may also involve cross-relationships with other variables, or simply prior knowledge about allowable outcomes. Specification of the constraints is very often the most complex mechanical part of the imputations.
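The per-case draw just described (sample from the estimated conditional distribution until an outcome satisfies the constraints) might be sketched roughly as follows. The function name, the normal-error form of the draw, and the fallback to the nearest admissible bound after 400 failed draws (a limit mentioned later in the paper) are illustrative, not the actual FRITZ code.

```python
import random

def draw_constrained(x_obs, beta, sigma, lo=float("-inf"), hi=float("inf"),
                     max_draws=400, rng=None):
    """Draw one imputation from an estimated conditional distribution
    N(x_obs . beta, sigma^2), rejecting draws outside [lo, hi]; after
    max_draws failures, fall back to the admissible value nearest the mean.
    """
    rng = rng or random.Random()
    mu = sum(x * b for x, b in zip(x_obs, beta))   # regression prediction
    for _ in range(max_draws):
        y = rng.gauss(mu, sigma)
        if lo <= y <= hi:
            return y
    return min(max(mu, lo), hi)                    # clamp to nearest bound
```

When a respondent gave a range response, `lo` and `hi` would be the endpoints of that range, so the draw is effectively from a truncated conditional distribution.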

Figure 1.—Hypothetical Missing Data Patterns

[Figure: a grid of observations (rows) by variables (columns), in which X = reported value, R = range value, and O = missing value.]

As noted, once a variable has been imputed, its value is taken in later imputations as if it had originally been reported by the respondent. In a given imputation, variables that were originally reported as a range but are not yet imputed within the iteration are given special treatment. Range reports often contain substantial information on the location of related variables, and one would like to use this knowledge in imputation. In principle, it is not difficult to write down a general model that would incorporate many types of location indicators. In practice, however, such a model would quickly exhaust the degrees of freedom available in a modestly sized survey like the SCF. We therefore adopt a compromise: values reported originally as ranges are initialized at their midpoints, and these values are used as conditioning variables for other imputations until the final choice within the range is imputed.

The FRITZ model produces multiple imputations. For simplicity, the strategy adopted is to replicate each observation five times and to impute each of these “implicates” separately. Because different implicates may be imputed to take very different paths through the data, this arrangement allows users to apply standard software to the data.
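The text notes that the five-implicate structure lets users apply standard software to each implicate. Although the paper does not spell out the analysis step, estimates computed separately on each implicate are conventionally pooled with the standard multiple-imputation combining rules; a minimal sketch:

```python
def combine_implicates(estimates, variances):
    """Pool an estimate computed on each of M implicates using the
    standard multiple-imputation combining rules: the point estimate
    is the mean across implicates, and the total variance adds the
    between-implicate variance (inflated by 1 + 1/M) to the average
    within-implicate variance.
    """
    m = len(estimates)
    q_bar = sum(estimates) / m                              # pooled estimate
    w_bar = sum(variances) / m                              # within variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between variance
    return q_bar, w_bar + (1 + 1 / m) * b                   # total variance
```

The `(1 + 1/m)` factor reflects the extra uncertainty from using a finite number of implicates (here, five).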


The iteration process is fairly straightforward. In the first iteration, all the relevant population moments for the imputation model are computed using all available data, including all non-missing pairs of data for the covariance calculations. As imputations progress in that iteration, the covariance estimation is based on increasingly “complete” data. In the second iteration, all population moments are computed using the first iteration dataset, and a new copy of the dataset is progressively “filled in.” In each successive iteration, the process is similar. Generally, the distribution of key imputations changes little after the first few iterations. Because the process is quite time-consuming, the model for the 1995 SCF was stopped after six iterations[12].
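The iteration scheme just described can be summarized as a short loop. Here `estimate_moments` and `impute_pass` are hypothetical callables standing in for FRITZ's moment (covariance) estimation and per-variable imputation steps; this is a skeleton of the control flow, not the model itself.

```python
def iterate_imputations(data, estimate_moments, impute_pass, n_iter=6):
    """Each iteration re-estimates the population moments from the
    previous iteration's completed dataset, then progressively fills
    in a fresh copy of the original data (six iterations in 1995).
    """
    completed = data
    for _ in range(n_iter):
        moments = estimate_moments(completed)   # e.g., covariance matrix
        completed = impute_pass(data, moments)  # fill in a new copy
    return completed
```

In the first pass, the moments are computed from the available (partly missing) data; in later passes they come from a fully imputed dataset, which is why the key distributions tend to settle after a few iterations.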

Experiments in Imputation for Disclosure Limitation

In this section, I report on three experiments in using multiple imputation for disclosure avoidance (summarized in Figure 2). In these experiments, every monetary variable for every observation in the survey was imputed[13]. In the first experiment, all complete reports of dollar values were imputed as if the respondent had originally reported ranges running from 10 percent below to 10 percent above the actual figure. In keeping with our usual practice of using midpoints of ranges as proxies for location indicators in imputation, the original values were retained until the variable was imputed. The second experiment also retained the reported value for conditioning, but imposed no range constraints on the allowed outcomes other than those required for cross-variable consistency. The third experiment treated the original values as if they were completely missing (that is, they were unavailable as conditioning variables) and, like the second experiment, imposed no prior bounds on the imputations; other monetary responses that were originally reported as ranges were also treated as completely missing for purposes of conditioning, but their imputed values were constrained to lie within the reported ranges.
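The pseudo-range construction of experiment one amounts to bracketing each completely reported dollar value; a trivial sketch (the function name is illustrative):

```python
def pseudo_range(reported, pct=0.10):
    """Experiment-1 style constraint: treat a completely reported dollar
    value as if the respondent had given a range of +/- pct around it.
    The min/max ordering keeps the bounds correct for negative amounts
    (e.g., debts recorded as negative values), an assumption here.
    """
    a, b = reported * (1 - pct), reported * (1 + pct)
    return (min(a, b), max(a, b))
```

The resulting interval is then used exactly like a respondent-supplied range: it truncates the conditional distribution from which the imputation is drawn.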

Figure 2.—Design of Experiments

    Experiment   Range Constraints   Use Original Value as Initial Location Indicator
    1            ±10%                Yes
    2            None                Yes
    3            None                No

For several reasons, these experiments fall short of Rubin's ideal of imputing an entire dataset conditioning only on general information—possibly even using only distributional data external to the actual sample. First, the experiments deal only with the dollar variables in the SCF. Second, all complete responses other than monetary responses are used as conditioning variables. Third, the imputations of range responses are constrained to lie within the reported ranges, even in experiment three. Finally—and probably most importantly—the results are specific to the particular specification of the FRITZ model. Inevitably, deep compromises of theory are made in implementing almost any empirical system. For imputation, such compromises may be less pressing when the proportion of missing data is relatively small, as is usually the case in the SCF; they may cause larger distortions when much larger fractions of the data are imputed. A key question in evaluating the results here is how well the system performs under this more extreme condition. Because we also have the originally reported values, it is possible to make a direct evaluation of the performance of the model.


Despite the shortcomings of the three experiments, they seem very much in the spirit of Rubin's proposal. Because the experiments show the effects of progressively loosening the constraints on imputation, I believe the results should provide useful evidence in evaluating the desirability of going further in developing fully simulated data.

The mechanical implementation of these experiments was reasonably straightforward. In the first experiment, the shadow variables of all complete reports of dollar values were set to a value that would normally indicate to the FRITZ model that the respondents had provided distinct dollar ranges. Values 10 percent above and 10 percent below the reported value were placed in the appropriate positions in a file that the model normally assumes contains such information. In the second and third experiments, a special value was given to the shadow variable to indicate that there were no range constraints on the imputations other than those that enforce cross-variable consistency. In experiments one and two, the initial values of complete responses were left in the dataset at the beginning of imputation; during the course of imputation, these values were used for conditioning until they were replaced by an imputed value, which was then used to condition subsequent imputations. In experiment three, values originally reported completely were set to a missing value, and the usual midpoints of range responses were also set to a missing value. Thus, no dollar variables in the third experiment were available for conditioning until they were imputed. In each of the experiments, the imputations were treated as if they were the seventh iteration of the SCF implementation of FRITZ. Thus, estimates of the population moments needed for the model were computed using the final results of the sixth iteration.

In the absence of technical problems—far from the case in this work, which subjected the imputation system to much greater stress than normal—each version of the experiment would require approximately three weeks to run through the entire dataset on a fast dedicated Sun server. More importantly, each execution would also require about 2 gigabytes of disk space for the associated work files. The process could probably be made at least somewhat more efficient, but the time available for debugging such a potentially complex change was limited. A compromise was adopted here: the first of the eight modules of the SCF application of FRITZ was run for all of the experiments. This module deals largely with total household income and various financial assets.

Figures 3 through 6 show descriptive plots of data from the three experiments for the following four variables: total income, the amount in the first savings account, the amount of Treasury bills and other Federal bonds (referred to hereafter as “T-bills”), and the total value of financial assets[14]. The first three of these variables are intended to span a broad set of distribution types; total financial assets, a variable constructed from many components, is included to show the effects of aggregating over the potentially large number of responses to questions about the underlying components. The impression from looking at a broader set of variables is very similar. Each of the figures is divided into two sets of three panels. The top three panels show, for experiments one through three, the distribution of the (base-10) logarithm of the originally reported value less the average across the five implicates of the logarithm of the corresponding imputed values (“bias”), where the distribution is estimated as an unweighted average shifted histogram (ASH). The bottom three panels are ASH plots, for the three experiments, of the distribution over all cases of the standard deviation of the multiply-imputed values within observations.
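The per-observation quantities plotted in these figures can be computed directly; a sketch, assuming strictly positive dollar values (the logarithm is undefined otherwise) and with hypothetical function names:

```python
import math
import statistics

def bias_and_spread(actual, implicates):
    """Per-observation diagnostics of the kind plotted in Figures 3-6:
    bias   = log10(actual) - mean over implicates of log10(imputed value)
    spread = standard deviation of log10(imputed value) within the observation
    """
    logs = [math.log10(v) for v in implicates]
    bias = math.log10(actual) - statistics.mean(logs)
    spread = statistics.stdev(logs)
    return bias, spread
```

On this log-10 scale, a bias of ±0.02 corresponds to roughly a 5 percent deviation, and ±0.04 to roughly 10 percent, matching the units quoted in the discussion below.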

For experiment one, the distribution of bias has a mode at approximately zero for all the variables. This is not surprising given that the outcome is based on models estimated using reported data for these observations. In the case of income, savings balances, and T-bills, the distribution of bias is fairly concentrated, with the 10th and 90th percentiles of the distribution corresponding to a bias of only about 5 percent (±0.02 on the scale shown). The distributions of bias for savings accounts and T-bills are


Figure 3a: ASH Plots of Distribution over Observations of Log10(Actual Value)-Mean(Log10(Imputation)) within Observations, Total Household Income, Experiments 1–3.

Figure 3b: ASH Plots of Distribution over Observations of the Standard Deviation of Log10(Imputation) within Observations, Total Household Income, Experiments 1–3.


Figure 4a: ASH Plots of Distribution over Observations of Log10(Actual Value)-Mean(Log10(Imputation)) within Observations, Balance in First Savings Account, Experiments 1–3.

Figure 4b: ASH Plots of Distribution over Observations of the Standard Deviation of Log10(Imputation) within Observations, Balance in First Savings Account, Experiments 1–3.


Figure 5a: ASH Plots of Distribution over Observations of Log10(Actual Value)-Mean(Log10(Imputation)) within Observations, Face Value of T-Bills, Experiments 1–3.

Figure 5b: ASH Plots of Distribution over Observations of the Standard Deviation of Log10(Imputation) within Observations, Face Value of T-Bills, Experiments 1–3.


Figure 6a: ASH Plots of Distribution over Observations of Log10(Actual Value)-Mean(Log10(Imputation)) within Observations, Total Financial Assets, Experiments 1–3.

Figure 6b: ASH Plots of Distribution over Observations of the Standard Deviation of Log10(Imputation) within Observations, Total Financial Assets, Experiments 1–3.


relatively “lumpy,” largely reflecting the smaller samples used to estimate these distributions: about 1,200 observations were used for the savings account estimate and only about 110 for the T-bill estimate, whereas about 2,900 were used to estimate the distribution for total income. Reflecting the integration over possibly many imputations, the distribution of bias for total financial assets is quite smooth. In every case shown, there is some piling up of cases at the outer bounds corresponding to ±10 percent (about ±0.04 on the log scale). The FRITZ model is allowed to draw as many as 400 times from the predicted conditional distribution of the missing data before selecting the nearest endpoint of the constraint. Thus, it is likely that these extreme observations are ones for which the models do not fit very well. Not surprisingly, examination of selected cases suggests that these observations are more likely to have unusual values for some of the conditioning variables in the imputation models. The median variability of the imputations within implicates, shown by the ASH plots of the distributions of standard deviations, is about ±6 percent for income, savings accounts, and T-bills. The variability within implicates is substantially lower for the sum of financial assets, reflecting offsetting errors in imputation.
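The within-observation summaries behind these ASH plots are straightforward to compute. The sketch below uses simulated stand-in data (not SCF values); the array shapes and variable names are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: one actual value and five imputations per observation,
# all positive dollar amounts (illustrative stand-ins, not SCF data).
n_obs, n_implicates = 1000, 5
actual = rng.lognormal(mean=10.0, sigma=1.0, size=n_obs)
# Imputations scattered around the actual value on the log scale.
imputations = actual[:, None] * rng.lognormal(0.0, 0.05, size=(n_obs, n_implicates))

log_imp = np.log10(imputations)
# Within-observation bias: log10(actual) minus the mean log10 imputation.
bias = np.log10(actual) - log_imp.mean(axis=1)
# Within-observation variability of the imputations.
spread = log_imp.std(axis=1, ddof=1)

# A +/-10 percent range constraint corresponds to roughly +/-0.04 in
# log10 units, since log10(1.1) is about 0.0414.
bound = np.log10(1.1)
share_at_bound = np.mean(np.abs(bias) >= bound)
print(f"median within-observation SD: {np.median(spread):.3f}")
print(f"share of cases at/past the +/-10% bound: {share_at_bound:.3f}")
```

Histograms (or ASH smooths) of `bias` and `spread` over observations would correspond to the "a" and "b" panels of the figures.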

In the second experiment, relaxing the simple range constraint of experiment one has the expected effect of increasing both the variability of the bias and the standard deviation of imputations within implicates. In the case of total household income, the bias corresponding to the 90th percentile of the bias distribution jumps to about 25 percent. The effect is even larger for the other variables (the bias is nearly 300 percent at the 90th percentile for total financial assets). It is somewhat surprising just how much these values increase, given that the imputations are potentially conditioned on a large number of reported values[15].

In the third experiment with the removal of the reported values used for conditioning in experiment two, the range of the bias rises further. The 90th percentile of the bias distribution is about 140 percent for total income, and about 400 percent for total financial assets.

Because these results are reported on a logarithmic scale, it is possible that they could be unduly influenced by changes that are small in dollar amounts but large on a logarithmic scale. The data do not provide strong support for this proposition. For income, scatterplots reveal that the logarithmic bias appears to be approximately equally spread at all levels of income for experiments one and two[16]. In the third experiment, the dominant relationship is similar, but two smaller groups deviate from the pattern: a few dozen observations with actual incomes of less than a few thousand dollars are substantially over-imputed on average, and a somewhat larger number of observations with actual incomes of more than $100,000 are substantially under-imputed. The data suggest a similar relationship across the experiments for the other variables as well.

To gauge the effects of the experiments on the overall univariate distributions of the four variables considered, Figures 7–10 show quantile-quantile (Q-Q) plots of the mean imputations against the reported values on a logarithmic scale. Across these variables, the distribution is barely affected by experiment one. In the second experiment, the results are a bit more mixed. For total income and total financial assets, there is some over-imputation of values less than a few thousand dollars and slight under-prediction at the very top. For T-bills, the relationship is much noisier, but not strikingly different. For savings accounts, however, the Q-Q plot is rotated clockwise, indicating under-imputation at the top of the distribution and over-imputation at the bottom. All of the simulated distributions deteriorate in the third experiment, though the distribution of total financial assets appears the most resilient[17].
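On a logarithmic scale, a Q-Q comparison amounts to pairing the sorted quantiles of the two distributions. A minimal sketch with simulated stand-in data (none of these values come from the SCF):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins for reported values and the across-implicate
# mean of simulated values for the same observations.
reported = rng.lognormal(10.0, 1.2, size=2000)
simulated_mean = reported * rng.lognormal(0.0, 0.3, size=2000)

# On the log scale, the Q-Q plot pairs the sorted log10 quantiles of the
# two distributions; points on the 45-degree line mean the distributions match.
q_reported = np.sort(np.log10(reported))
q_simulated = np.sort(np.log10(simulated_mean))

# A clockwise rotation of the Q-Q plot shows up as a fitted slope below 1
# when regressing simulated quantiles on reported quantiles: the top of the
# distribution is under-imputed and the bottom over-imputed.
slope = np.polyfit(q_reported, q_simulated, 1)[0]
print(f"fitted Q-Q slope: {slope:.2f}")
```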


Figure 7: Q-Q Plots of Imputed Distribution vs. Actual Distribution, Total Household Income, Experiments 1–3.

Figure 8: Q-Q Plots of Imputed Distribution vs. Actual Distribution, Balance in 1st Savings Account, Experiments 1–3.


Figure 9: Q-Q Plots of Imputed Distribution vs. Actual Distribution, Face Value of T-Bills, Experiments 1–3.

Figure 10: Q-Q Plots of Imputed Distribution vs. Actual Distribution, Total Financial Assets, Experiments 1–3.


Univariate and simple bivariate statistics are important for many illustrative purposes, but for the SCF, as for many other surveys, the most important uses of the data over the long run are in modeling. Table 3 presents the coefficients of a set of simple linear regressions of the logarithm of total household income on dummy variables for ownership of various financial assets and the log of the maximum of one and the value of the corresponding asset. This model has no particular importance as an economic or behavioral characterization; it is intended purely as a descriptive device for examining the effects of the variation across the experiments on the partial correlations of a set of variables imputed in all the experiments. Two types of models are shown: one set includes all observations regardless of whether the variables included were originally reported completely by the respondent, and the other includes only cases for which every variable in the model was originally reported completely. The regressions were run using data from each of the three experiments, as well as data from the sixth (final) iteration of the imputation of the main dataset[18].

Experiments one and two perform about equally well in terms of determining the significance of coefficients in both variations on the basic model. However, data from the first experiment misclassify one variable as not significant, and data from the second experiment misclassify some variables as significant. The third experiment yields both types of error. The R2 of the regressions changes little except in the third experiment, where this value drops about 10 percent. Overall, none of the experiments does dramatically worse than the original data. Given the structure of the FRITZ model and the degree to which the variables in these regression models were mutually interdependent, it would be very surprising if the outcome were otherwise. However, such regressions are only the beginning of what many economists would consider applying to the data, and it is possible that more complex models or methods of estimation would give different results.
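The form of the regressors in Table 3 (an ownership dummy plus the log of the maximum of one and the asset value) can be sketched as follows on simulated data; the variable names and coefficient values here are illustrative assumptions, not SCF estimates.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3000

# Hypothetical asset holdings: a 0/1 ownership indicator and a dollar
# amount that is zero for non-owners (illustrative, not SCF data).
has_asset = rng.random(n) < 0.6
amount = np.where(has_asset, rng.lognormal(8.0, 1.5, n), 0.0)
log_income = (2.5 + 0.2 * has_asset
              + 0.25 * np.log(np.maximum(1.0, amount))
              + rng.normal(0.0, 0.7, n))

# Design matrix in the Table 3 form: intercept, ownership dummy, and
# log of max(1, dollar amount), which is zero for non-owners.
X = np.column_stack([
    np.ones(n),
    has_asset.astype(float),
    np.log(np.maximum(1.0, amount)),
])
beta, *_ = np.linalg.lstsq(X, log_income, rcond=None)

resid = log_income - X @ beta
r2 = 1.0 - resid.var() / log_income.var()
print("coefficients:", np.round(beta, 2))
print(f"R2: {r2:.2f}")
```

The max(1, value) device keeps the log regressor defined for non-owners while the dummy absorbs the ownership effect itself.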

Summary and Future Research

By design, experiment one is virtually guaranteed to induce minimal distortions, but it also leaves the outcomes near the original values. Unfortunately, just knowing that an outcome is in a certain range may already be sufficient information to raise unacceptably the probability of identifying some of the very wealthy respondents in the SCF. My ex ante choice among the experiments was the second one, in which imputations condition on actual values but there is no prior constraint on the outcome that is connected to the original value. Ex post, I find the results relatively disappointing. Certainly, the reported outcomes of the third experiment look least attractive. There may be ways of more globally constraining or aligning the outcomes of experiments two and three, but I suspect the choice of method would depend critically on a ranking of the importance of the types of analyses to be performed with the data. I hope that someone in the SCF group or elsewhere will be able to take the next step.

One technical question that appears potentially troublesome is how to estimate sampling error in a fully simulated dataset[19]. It is possible, in theory, to simulate records for the entire universe, but even in this case there would still be sampling variability in the imputations. This variation may be a second-order effect in normal imputation, but we need to deal with the issue carefully if we expect to simulate all the data. Perhaps we could find an approximate solution by independently multiply imputing each of a manageably small number of replicates—implicates of replicates; each replicate would require population estimates from a corresponding replicate selected from the actual data in a way that captures the important dimensions of variability in the sample. Another possibility might be to compute variance functions from the actual data.
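For the implicates themselves, the standard multiple-imputation combining rules give one way to fold imputation variability into a variance estimate; extending them to implicates of replicates is the open question raised here. A sketch of the basic combining step, with made-up numbers:

```python
import numpy as np

# Standard multiple-imputation combining rules: q_m are point estimates
# and u_m their sampling variances, one per implicate (values made up).
q_m = np.array([10.1, 9.8, 10.3, 10.0, 9.9])    # illustrative estimates
u_m = np.array([0.04, 0.05, 0.04, 0.05, 0.04])  # illustrative variances

m = len(q_m)
q_bar = q_m.mean()            # combined point estimate
u_bar = u_m.mean()            # average within-imputation variance
b = q_m.var(ddof=1)           # between-imputation variance
t = u_bar + (1 + 1 / m) * b   # total variance

print(f"estimate {q_bar:.2f}, total variance {t:.4f}")
```

In the replicate scheme sketched above, the within component would itself come from replicate-based estimates rather than analytic variances.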


Table 3. —Regression of Logarithm of Total Household Income on Various Variables, Original Data and Experiments 1–3, Using All Observations and Using Only Observations Originally Giving Complete Responses to All Variables in the Model

                              All Observations Included        Only Complete Responders Included
                              Orig.   Exp. 1  Exp. 2  Exp. 3   Orig.   Exp. 1  Exp. 2  Exp. 3
Intercept                     2.64*   1.92*   2.56*   3.76*    2.83*   2.87*   3.43*   6.60*
                             (0.75)  (0.75)  (0.74)  (0.69)   (1.09)  (1.09)  (1.02)  (1.09)
Have checking                 0.18*   0.20*   0.25*   0.21*    0.17*   0.18*   0.18*   0.15*
                             (0.03)  (0.03)  (0.03)  (0.03)   (0.04)  (0.04)  (0.04)  (0.04)
Ln($ checking)                0.25*   0.27*   0.30*   0.26*    0.26*   0.27*   0.27*   0.23*
                             (0.01)  (0.01)  (0.01)  (0.01)   (0.02)  (0.02)  (0.02)  (0.02)
Have IRA/Keogh                0.16*   0.18*   0.18*   0.17*    0.07    0.06    0.12    0.08
                             (0.05)  (0.05)  (0.05)  (0.05)   (0.07)  (0.07)  (0.07)  (0.07)
Ln($ IRA/Keogh)               0.10*   0.11*   0.11*   0.10*    0.07*   0.07*   0.10*   0.08*
                             (0.02)  (0.02)  (0.02)  (0.02)   (0.03)  (0.03)  (0.05)  (0.05)
Have savings acct.            0.01    0.02    0.01    0.01    -0.03   -0.03   -0.02   -0.03
                             (0.04)  (0.04)  (0.04)  (0.04)   (0.04)  (0.05)  (0.05)  (0.05)
Ln($ savings acct.)           0.03    0.03    0.03    0.04     0.00    0.00    0.01    0.01
                             (0.02)  (0.02)  (0.02)  (0.02)   (0.02)  (0.02)  (0.02)  (0.02)
Have money market acct.       0.02    0.03   -0.04   -0.11     0.11    0.12    0.01   -0.07
                             (0.07)  (0.07)  (0.07)  (0.07)   (0.09)  (0.10)  (0.10)  (0.10)
Ln($ money market acct.)      0.03    0.03    0.00   -0.02     0.05    0.05    0.01   -0.02
                             (0.03)  (0.03)  (0.03)  (0.03)   (0.04)  (0.04)  (0.04)  (0.04)
Have CDs                      0.24*   0.26*   0.31*   0.27*    0.22*   0.22*   0.27*   0.23*
                             (0.08)  (0.08)  (0.08)  (0.08)   (0.10)  (0.11)  (0.11)  (0.77)
Ln($ CDs)                     0.07*   0.07*   0.09*   0.08*    0.07    0.07    0.09*   0.07
                             (0.03)  (0.03)  (0.03)  (0.03)   (0.04)  (0.04)  (0.04)  (0.04)
Have savings bonds           -0.02   -0.01   -0.05   -0.09    -0.10   -0.10   -0.12   -0.13
                             (0.04)  (0.04)  (0.05)  (0.04)   (0.06)  (0.06)  (0.06)  (0.05)
Ln($ savings bonds)           0.02    0.02    0.00   -0.02    -0.03   -0.03   -0.05   -0.04
                             (0.02)  (0.02)  (0.02)  (0.02)   (0.05)  (0.05)  (0.05)  (0.05)
Have other bonds              0.62*   0.65*   0.51*   0.63*    0.68*   0.66*   0.54*   0.35*
                             (0.09)  (0.09)  (0.08)  (0.09)   (0.14)  (0.14)  (0.13)  (0.14)
Ln($ other bonds)             0.26*   0.27*   0.22*   0.25*    0.27*   0.26*   0.22*   0.15*
                             (0.03)  (0.03)  (0.03)  (0.03)   (0.05)  (0.05)  (0.04)  (0.05)
Have mutual funds             0.06    0.07    0.09   -0.02     0.18    0.17    0.20*   0.00
                             (0.07)  (0.07)  (0.07)  (0.05)   (0.09)  (0.09)  (0.09)  (0.06)
Ln($ mutual funds)            0.04    0.05    0.05*   0.01     0.10*   0.09*   0.10*   0.03
                             (0.02)  (0.02)  (0.02)  (0.02)   (0.05)  (0.05)  (0.05)  (0.05)
Have annuity/trust            0.02    0.02    0.03    0.01    -0.04   -0.04   -0.07   -0.07
                             (0.04)  (0.04)  (0.04)  (0.02)   (0.05)  (0.05)  (0.05)  (0.05)
Ln($ annuity/trust)           0.04*   0.04*   0.04*   0.02     0.01    0.01    0.01   -0.29
                             (0.01)  (0.01)  (0.01)  (0.02)   (0.02)  (0.02)  (0.02)  (0.24)
Have whole life insurance    -0.70    0.11    0.14*   0.19*   -0.61   -0.63    0.17*   0.20*
                             (0.17)  (0.08)  (0.05)  (0.05)   (0.25)  (0.25)  (0.07)  (0.06)
Ln($ cash value life ins.)    0.10*   0.01    0.02*   0.01     0.09*   0.09*   0.03    0.02
                             (0.02)  (0.01)  (0.01)  (0.01)   (0.05)  (0.05)  (0.04)  (0.05)
R2                            0.40    0.39    0.40    0.37     0.43    0.43    0.42    0.36

* = significant at the 95% level of confidence.

Simple regression standard errors are given in parentheses below each estimate.


The experimental results reported in this paper say at least as much about the nature of the SCF imputations as they do about the possibility of creating a fully simulated dataset. Although the imputation models have been refined over three surveys now, the results of experiments two and three, in particular, suggest that there is room for improvement. Indeed, a number of changes were instituted in the process of getting the experiments to produce meaningful data, and other changes will be implemented during the course of processing the 1998 SCF. Other changes, including the possibility of using empirical residuals, deserve further attention. However, I am not optimistic that there are many major improvements in our ability to impute the SCF data waiting to be discovered. There is a difference in what one can accept in imputing a relatively small fraction of the data and what is acceptable for the whole dataset. With fully simulated data, we are left with a difficult tradeoff between noise (however structured) and potential disclosure.

Disclosure limitation techniques have a Siamese twin in record linkage techniques. As one side progresses, the other side uses closely related ideas to follow. This conference has played an important part in highlighting this relationship and the need for coordination. Perhaps if we work hard together, there may be a chance that we will find a way to allow users to analyze disclosure-limited data using record linkage ideas to sharpen inferences. There may also be a payoff in more routine statistical matching, which is really just another form of imputation.

A large problem in planning all disclosure reviews is how to accommodate the needs (but not necessarily all the desires) of data users. I expect that users will express considerable resistance to the idea of completely simulated data. Some statisticians may be troubled about how to address questions of estimating sampling error with such data. Among economists, there are substantial pockets of opposition to all types of imputation, and some researchers have raised carefully framed questions that need to be addressed equally carefully. For example, if unobserved effects are a serious issue (and they often are in econometric modeling), then imputation must consider the distortions it may induce if such latent models are ignored; the question becomes much more pressing if all of the data are imputed. Given the choice between having no data or having data that are limited in some way, most analysts will likely opt for some information. However, to avoid developing disclosure strategies that yield data that do not inform interesting questions for users, it may be important to engage users in the process where possible.

Acknowledgments

The author wishes to thank Kevin Moore and Amy Stubbendick for a very high level of research assistance. The author is also grateful to Gerhard Fries, Barry Johnson, and R. Louise Woodburn for comments, and to Fritz Scheuren for encouragement in this project.

Footnotes

[1] As Fienberg (1997) argues, releasing any information discloses something about the respondent, even if the probability of identification is minuscule.

[2] See Fries, Johnson and Woodburn (1997a) for a summary of the disclosure strategies that have been developed for the survey.

[3] Ivan Fellegi emphasized a similar point in his address to this conference.


[4] For example, Rubin (1993) says “Under my proposal, no actual unit's confidential data would ever be released. Rather, all actual data would be used to create a multiply-imputed synthetic microdata set of artificial units…”

[5] However, Fienberg and Makov (1997) have proposed creating simulated data for the purpose of evaluating the degree of disclosure risk in a given dataset, and Fienberg, Steele and Makov (1996) have examined the problem of simulating categorical data.

[6] Use of the ITF for the SCF is strictly controlled to protect the privacy of taxpayers. For the 1995 SCF, SOI provided NORC with the names and addresses of a sample selected from a copy of the ITF purged of name and address information at the Federal Reserve. NORC contacted respondents, but had no means of linking to the tax data. The SCF group alone at the Federal Reserve is allowed access to both survey data and tax data, but no names were available, and use of these tax data at the Federal Reserve is strictly limited to activities connected with sampling, weighting, and other such technical issues.

[7] See Fries, Johnson, and Woodburn (1997b) for details and information about the effects of the alterations on the data.

[8] The shadow variables are used as a formal device in documentation, and they inform the imputation software about which variables should be imputed. The shadow variables contain information about various types of editing that may have been performed to reach the final value, whether it was reported as one of a large number of types of range outcomes, whether it was missing for various reasons, or whether its outcome was affected by other processes.

[9] The collection of range data in the 1995 SCF is described in detail in Kennickell (1997).

[10] For an excellent example of a simultaneously determined system, see Schafer (1995). Geman and Geman (1984) discuss another type of structure involving data “cliques.”

[11] In general, continuous variables are assumed to follow a conditional lognormal distribution. For continuous variables, the program assumes by default that errors should be drawn within a bound of 1.96 standard errors above and below the conditional mean.
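As a sketch of the kind of bounded draw this footnote describes (the function name, fallback rule, and parameter values are illustrative assumptions, not the actual FRITZ code):

```python
import numpy as np

rng = np.random.default_rng(3)

# Draw on the log scale, accepting only values within +/-1.96 conditional
# standard errors of the conditional mean; after repeated failures, fall
# back to the nearest endpoint (illustrative sketch only).
def bounded_lognormal_draw(cond_mean_log, cond_se_log, max_tries=400):
    for _ in range(max_tries):
        z = rng.normal(cond_mean_log, cond_se_log)
        if abs(z - cond_mean_log) <= 1.96 * cond_se_log:
            return 10 ** z
    # Nearest endpoint of the bound, in the direction of the last draw.
    return 10 ** (cond_mean_log + np.sign(z - cond_mean_log) * 1.96 * cond_se_log)

draw = bounded_lognormal_draw(4.0, 0.3)
print(f"bounded draw: {draw:.0f}")
```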

[12] For the 1995 data, the process required about ten days per iteration, which is down from about four weeks per iteration in 1989.

[13] There are 480 monetary variables in the SCF, but it is not possible for a given respondent to be asked all of the underlying questions.

[14] The sets of observations underlying the charts include only respondents who gave a complete response for the variable, or, in the case of financial assets, who gave complete responses for all the components of financial assets. For many sub-models of the SCF implementation of FRITZ, general constraints are imposed for all imputations to ensure values that are reasonable (e.g., amounts owed on mortgage balloon payments must be less than or equal to the current amount owed); in the actual data, these constraints are occasionally violated for reasons that are unusual, but possible. When reimputing these values subject to dollar range constraints in experiment one, a small number of imputations violated the bounds imposed. To avoid major restructuring of the implementation of the FRITZ model for the experiments, these instances are excluded from the comparisons reported here.


In each of the figures, the set of observations is the same across all six of the panels. For the income plots, households reporting negative income have been excluded.

[15] For example, total income is the first variable imputed, and all reported values (or midpoints of ranges) for variables included in the model for that variable are used to condition the imputation.

[16] For disclosure reasons, the scatterplots supporting this claim cannot be released.

[17] In the cases examined, this result also holds if the data are separated by implicates rather than averaged across implicates.

[18] The five implicates were pooled for these regressions. Standard errors shown in the table are simple regression standard errors that take no account of imputation or sampling error; the degrees of freedom were altered in the standard error calculation to reflect the fact that there were five times as many implicates as observations.

[19] Fienberg, Steele and Makov (1996) also address this question.

References

Fienberg, Stephen E. (1997). Confidentiality and Disclosure Limitation Methodology: Challenges for National Statistics and Statistics Research, working paper, Department of Statistics, Carnegie Mellon University, Pittsburgh, PA.

Fienberg, Stephen E. and Makov, Udi E. (1997). Confidentiality, Uniqueness, and Disclosure Limitation for Categorical Data, working paper, Department of Statistics, Carnegie Mellon University, Pittsburgh, PA.

Fienberg, Stephen E.; Steele, Russell J.; and Makov, Udi E. (1996). Statistical Notions of Data Disclosure Avoidance and their Relationship to Traditional Statistical Methodology: Data Swapping and Loglinear Models, Proceedings of the 1996 Annual Research Conference and Technology Interchange, Washington, DC: U.S. Bureau of the Census, 87–105.

Fries, Gerhard; Johnson, Barry W.; and Woodburn, R. Louise (1997a). Analyzing Disclosure Review Procedures for the Survey of Consumer Finances, paper for presentation at the 1997 Joint Statistical Meetings, Anaheim, CA.

Fries, Gerhard; Johnson, Barry W.; and Woodburn, R. Louise (1997b). Disclosure Review and Its Implications for the 1992 Survey of Consumer Finances, Proceedings of the Section on Survey Research Methods, 1996 Joint Statistical Meetings, Chicago, IL.

Geman, Stuart and Geman, Donald (1984). Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images, IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6, 6 (November), 721–741.

Kennickell, Arthur B. (1991). Imputation of the 1989 Survey of Consumer Finances, Proceedings of the Section on Survey Research Methods, 1990 Joint Statistical Meetings, Atlanta, GA.


Kennickell, Arthur B. and Woodburn, R. Louise (1997). Consistent Weight Design for the 1989, 1992 and 1995 SCFs, and the Distribution of Wealth, working paper, Board of Governors of the Federal Reserve System, Washington, DC.

Kennickell, Arthur B. (1997). Using Range Techniques with CAPI in the 1995 Survey of Consumer Finances, Proceedings of the Section on Survey Research Methods, 1996 Joint Statistical Meetings, Chicago, IL.

Little, Roderick J.A. (1983). The Nonignorable Case, Incomplete Data in Sample Surveys, New York: Academic Press.

Rubin, Donald B. (1993). Discussion of Statistical Disclosure Limitation, Journal of Official Statistics, 9, 2, 461–468.

Schafer, Joseph (1995). Analysis of Incomplete Multivariate Data, Chapman and Hall.

Tourangeau, Roger; Johnson, Robert A.; Qian, Jiahe; Shin, Hee-Choon; and Frankel, Martin R. (1993). Selection of NORC's 1990 National Sample, working paper, National Opinion Research Center at the University of Chicago, Chicago, IL.

The views presented in this paper are those of the author alone and do not necessarily reflect those of the Board of Governors or the Federal Reserve System. Any errors are the responsibility of the author alone.


Sharing Statistical Information for Statistical Purposes

Katherine K. Wallman and Jerry L. Coffey

Office of Management and Budget

Abstract

Congress has recognized that a confidential relationship between statistical agencies and their respondents is essential for effective conduct of statistical programs. However, the specific statutory formulas devised to implement this principle in different agencies have created difficult barriers to effective working relationships among these agencies. The development of mechanisms to establish a uniform confidentiality policy that substantially eliminates the risks associated with sharing confidential data will permit significant improvements in data used for both public and private decisions without compromising public confidence in the security of information respondents provide to the Federal government.

Initiatives of the Statistical Policy Office to enhance public confidence in the stewardship of sensitive data and to permit limited sharing of confidential data for exclusively statistical purposes received a substantial impetus in the 1995 reauthorization of the Paperwork Reduction Act. The Act strongly endorses the principles embodied in statistical confidentiality pledges and charges OMB to promote sharing of data for statistical purposes within a strong confidentiality framework.

This paper discusses the history, the promise, and the current status of initiatives to strengthen and improve data protection while promoting expanded data sharing for statistical purposes. The most recent efforts include the OMB Federal Statistical Confidentiality Order, the Statistical Confidentiality Act (SCA), and companion legislation to the SCA that would make complementary changes to the Internal Revenue Code.

Introduction

A promising initiative to improve the quality and efficiency of Federal statistical programs is a legislative proposal that would allow the sharing of confidential data among statistical agencies under strict safeguards. The development of this approach has been a painstaking, careful process that has been supported and nurtured by Administrations of both parties over many years.

The Administration's Statistical Confidentiality Act and two companion initiatives—the OMB Federal Statistical Confidentiality Order and an amendment to the Internal Revenue Code—address two issues that are vital to ensuring the integrity and efficiency of Federal statistical programs and, ultimately, the quality of Federal statistics. These are

  • the unevenness of current statutory protections for the confidential treatment of information provided to statistical agencies for exclusively statistical purposes; and

  • the barriers to effective working relationships among the statistical agencies that stem from slightly different statutory formulas devised to implement the principle of confidentiality for statistical data in different agencies.

The proposed legislation would establish policies and procedures to guarantee the consistent and uniform application of the confidentiality privilege and authorize the limited sharing of information among designated statistical agencies for exclusively statistical purposes.

Initiatives Span More Than Two Decades

Efforts to address confidentiality concerns with regard to Federal statistical data have a history that extends for more than 25 years. Such efforts have been endorsed on both sides of the aisle in the Congress. The roots of the policies in the Administration's current Statistical Confidentiality Act reflect the work of three Commissions that examined statistical and information issues during the Administrations of Presidents Nixon and Ford. In 1971, the President's Commission on Federal Statistics recommended that the term confidential should always mean that disclosure of data in a manner that would allow public identification of the respondent or would in any way be harmful to him should be prohibited; this commission also recommended that consideration should be given to providing for interagency transfers of data where confidentiality could be protected.

In July 1977, the Privacy Protection Study Commission stated that “no record or information…collected or maintained for a research or statistical purposes under Federal authority…may be used in individually identifiable form to make any decision or take any action directly affecting the individual to whom the record pertains…” Later, in October of that year, the President's Commission on Federal Paperwork endorsed the confidentiality and functional separation concepts, but applied them directly and simply to statistical programs, saying that:

  • Information collected or maintained for statistical purposes must never be used for administrative or regulatory purposes or disclosed in identifiable form, except to another statistical agency with assurances that it will be used solely for statistical purposes; and

  • Information collected for administrative and regulatory purposes must be made available for statistical use, with appropriate confidentiality and security safeguards, when assurances are given that the information will be used solely for statistical purposes.

The policy discussions generated by the three Commissions came together during the Carter Administration in a bipartisan outpouring of support for the Paperwork Reduction Act (PRA), which largely addressed the efficiency recommendations of the Paperwork Commission. The legislative history of that Act recognized the unfinished work of fitting the functional separation of statistical information into the overall scheme.


The first attempt to deal with the issues of confidentiality and sharing of statistical data was made by the Carter Administration's Statistical Reorganization Project (popularly known as the “Bonnen Commission”). This effort paralleled the legislative development work by OMB that became the Paperwork Reduction Act. The initiative identified a group of statistical agencies that could serve as protected environments—or enclaves—for confidential data and attempted to create a harmonized confidentiality policy by synthesizing the several prescriptions in existing laws. The initiative was left behind by the fast-track PRA for two reasons:

  • First, each new prescription to solve problems in one agency raised new questions in other agencies, so that objections to the language increased as the draft legislation became longer and more complex.

  • Second, the approach failed to appreciate that some large databases—e.g., Census and tax files—represented more significant risks and, thus, needed more elaborate confidentiality protection than other files.

During the first Reagan Administration, this prescriptive formula became more and more complex, as attempts were made to incorporate comments from both statistical and nonstatistical agencies. The draft proposal eventually was withdrawn when it became apparent that almost no one could understand how all of the myriad definitions and exceptions fit together.

While the proposed approach did not succeed, the effort did draw attention to many subtle weaknesses in existing law and led to new statutes and amendments during the second Reagan Administration. In particular, stronger statutory protections were enacted for the National Center for Health Statistics, the National Agricultural Statistics Service, and the National Center for Education Statistics. At the same time, the concept of a government-wide law for statistical confidentiality and data sharing received a complete overhaul.

A new strategy was presented to the statistical agencies during the Bush Administration. It had five important features that were missing from earlier efforts:

  • It was designed to work with the tools already available in the PRA—promoting data sharing, but providing for functional separation to ensure that the statistical data are only shared for statistical purposes.

  • It was designed to be robust with respect to reorganizations within the statistical system. Since every major statistical agency had been involved in one or more reorganizations since 1970, it became apparent that any successful strategy would have to work well in any reasonable organizational environment.

  • It was built around a procedural strategy that gives due deference to the precepts of existing law that are tailored to specific risks and builds on agency experience in implementing that body of law. The idea was to adopt a general confidentiality policy consistent with existing law and provide the tools—data sharing agreements, coordinated rules, and consistent Freedom of Information Act (FOIA) exemptions—to address those risks.

  • It provided a means for the major statistical agencies to work closely with other agencies in their areas of expertise. While only the Statistical Data Centers would have broad access to data, any agency that collects its own statistical data can act as a full partner in improving those data under the terms of a data sharing agreement.

  • It strengthened the Trade Secrets Act. This universal confidentiality statute consolidated provisions of tax law, customs law, and statistical law, but the statistical implications had been ignored. The new proposal set uniform policies for confidential statistical data, increasing penalties and addressing questions of agents.

This fresh start—based on a precedent-setting data sharing order involving the Internal Revenue Service, the Census Bureau, and the Bureau of Labor Statistics—had strong support within the Administration. But the effort failed to reach closure.

The basic strategy developed during the Bush Administration was later expanded and refined during the first term of the Clinton Administration. Criteria for the Statistical Data Centers (SDCs) were incorporated into the Statistical Confidentiality Act, and every statistical agency that could meet these tests was added to the list of SDCs—bringing the total from four agencies to eight. The relationship to the PRA was fine-tuned, as well, and this process identified some improvements to the PRA that were adopted in the 1995 amendments to that Act.

The final step in the recent initiative involved negotiating a complementary amendment to the Statistical Use section of the tax code [26 USC 6103(j)]. This change actually facilitates increased security for taxpayer information, by targeting and, thus, limiting the wholesale disclosures permitted under current law. It permits multi-party sharing agreements, so that specific statistical data sets that include tax data can be shared under IRS security procedures with other SDCs.

What Factors Argue for Success Now?

After more than two decades, why should we think that these efforts will be any more successful than those of the past? Perhaps it comes down to what can be called the “Three E's”:

  • Experience.—Over the past 25 years we have learned a considerable amount. The current proposal builds on the experience OMB and the agencies gained through earlier efforts.

  • Environment.—The Federal statistical system is faced with growing fiscal resource constraints. At the same time, the 1995 Paperwork Reduction Act extends requirements for reducing burdens imposed on respondents to Federal surveys. Yet another factor that has affected agency views is the increasing number of proposals for consolidating statistical agencies.

  • Enthusiasm.—Last but not least, the statistical agencies appear to be in a “can do” mood—enthusiastically supporting the development and passage of legislation that will even out statutory confidentiality protections and permit data sharing for statistical purposes.

Whatever the reasons, the agencies have come together on the Administration proposal now embodied in the Statistical Confidentiality Act and its companion pieces.

The Statistical Confidentiality Act

As the centerpiece of this effort, the Statistical Confidentiality Act has two principal functions:

  • To ensure consistent and uniform application of the confidentiality privilege; and

  • To permit limited sharing of data among designated agencies for exclusively statistical purposes.


A limited number of Federal statistical agencies would be designated as Statistical Data Centers. The eight agencies that currently meet the criteria to become SDCs are the Bureau of Economic Analysis (BEA), Bureau of the Census, Bureau of Labor Statistics (BLS), National Agricultural Statistics Service (NASS), National Center for Education Statistics (NCES), National Center for Health Statistics (NCHS), the Energy End-Use and Integrated Statistics Division of the Energy Information Administration (EIA), and the Science Resources Studies Division of the National Science Foundation (NSF).

A key component of the legislation is functional separation, whereby data or information acquired by an agency for purely statistical purposes can be used only for statistical purposes and cannot be shared in identifiable form for any other purpose without the informed consent of the respondent. If a designated SDC is authorized by statute to collect data or information for any nonstatistical purposes, such data or information must be distinguished by rule from those data collected for strictly statistical reasons.

The procedural strategy for implementing the legislation would be carried out via written data sharing agreements between or among statistical agencies. The Statistical Data Centers would provide information on actual disclosures and information security to OMB for inclusion in the annual report to Congress on statistical programs. OMB would also review and approve any implementing rules to ensure consistency with the purposes of the SCA and the PRA.

Companion Legislation

In addition to the Statistical Confidentiality Act, special amendments have been proposed to the Statistical Use subsection of the Internal Revenue Code—Section 6103(j). These amendments would authorize limited disclosure of tax data to agencies that have been designated as Statistical Data Centers. In addition, the Research and Statistics Division at the Federal Reserve Board has been added to the group of agencies covered under the IRS companion bill.

The amendment would provide access to tax return information to construct sampling frames and for related statistical purposes as authorized by law. Names, addresses, taxpayer identification numbers, and classifications of other return information in categorical form could be provided for statistical uses. These latter data are not to be used as direct substitutes for statistical program content, but rather can be applied using statistical methods such as imputation to improve the quality of the data. Class sizes or ranges for such data—e.g., for income—will vary by purpose.

The amendment is designed to protect taxpayer rights and maintain proper oversight and control over tax return disclosures, while allowing carefully targeted expansion of access to tax return information for statistical purposes only.

The Statistical Confidentiality Order

As an integral step to foster passage of these legislative proposals, OMB felt it was critical to move ahead with efforts to clarify and make consistent government policy protecting the privacy and confidentiality interests of individuals and organizations that provide data for Federal statistical programs. With that aim in mind, OMB developed and sought public comment on an Order that assures respondents who supply statistical information that their responses will be held in confidence and will not be used against them in any government action. The Order also gives additional weight and stature to policies that statistical agencies have pursued for decades and includes procedures to resolve a number of ambiguities in existing law. Following the public review process, the Federal Statistical Confidentiality Order went into effect on June 27, 1997.

What Opportunities Will Attend Passage of the Legislation?

For more than a decade, we have worked within the constraints of existing law to make limited comparisons between similar data sets in different agencies. We have set in motion a series of limited exchanges tailored to conform to current law, but they cannot address all of the problems. Moreover, such exchanges could be cut short by an unfavorable interpretation of any one of the dozens of statutes involved. In each of these cases, extraordinary efforts have been required to accomplish even limited data exchanges. Based on these experiences, we believe that even modest exchanges of information could, in the future, unearth and eliminate important errors in existing economic series, enable significant consolidations of overlapping programs (with comparable reductions in costs), and permit substantial reductions in reporting burden imposed on the public.

As the possibility of a law to permit data sharing in a safe environment has become more credible, statistical agencies have begun to identify potential improvements to current operations and programs that this law would permit. These include possibilities such as the following:

  • Integrated database concepts for information on particular segments of the economy and society, such as educational institutions (NCES, NSF, and Census), health care providers (NCHS, Census, and some program-specific agencies), and agricultural establishments (NASS, Census, and the Economic Research Service at the Department of Agriculture), would improve the consistency and quality of data while reducing current data collection costs.

  • Collaboration on sampling frames would improve accuracy and reduce maintenance costs. A more efficient division of labor would make it possible to maintain high quality frames at minimum cost, both for list frames (Census, BLS, NASS) and for area frames (NASS, Census, NCHS). This approach would avoid duplicate expenditures and improve quality. Coordination and shared use of relisting information (updates) in large multi-stage designs could also reduce frame maintenance costs.

  • Targeted frames—or sample selection services—from improved master frames could reduce duplicative expenditures in agencies that must currently pay the cost of independently developing these resources for specific surveys.

  • Access to specific data details that can resolve uncertainties in particular analyses—e.g., anomalies that arise in the Gross Domestic Product estimation process—would reduce errors in macroeconomic statistics without imposing additional burden.

  • Coordination of sample selection across agencies could reduce the total reporting burden that falls on any one household or company (and, thus, improve the level of respondent cooperation).

What Systemic Problems Will the Act Address?

  • The Statistical Confidentiality Act creates a credible government-wide confidentiality umbrella.—The public will know that the entire government stands behind the pledges of statistical confidentiality offered by the SDCs or any agency engaged in joint statistical projects with the SDCs.

  • The SCA creates the legal presumption that data collected for most purposes may be used in a safe environment for statistical purposes.—This is one of the critical insights of the Privacy and Paperwork Commissions.

  • The SCA provides consistent FOIA policies for all the SDCs.—This was controversial 15 years ago, but now six of the eight agencies designated as SDCs already have in place statutes that meet the requirements of Section (b)(3) of FOIA.

  • The SCA permits the data sharing authorities of the PRA to work without compromising confidentiality.—By establishing the functional separation principle in law, the SCA facilitates the use of PRA mechanisms to promote and manage data sharing for exclusively statistical uses.

  • The SCA provides a privacy-sensitive alternative to the creation of universal databases, which each Department has proposed at one time or another to support its own policy interests.—Statistical methods—particularly sampling—coupled with secure data sharing provide a natural hedge against the big database (i.e., dossier building) mentality that puts privacy at risk.

In short, the Statistical Confidentiality Act permits the SDCs and their statistical partners to share both expertise and data resources to improve the quality and reduce the burden of statistical programs, while preserving privacy. Moreover, no matter how the organizational boxes for the ideal Federal statistical system are drawn, this legislation will permit the components of the statistical system to manage their data as if they were a single, functionally-integrated organization.

Current Status of the SCA and Related Initiatives

Culminating efforts that have spanned decades, the Statistical Confidentiality Act initially was introduced on a bipartisan basis in the House of Representatives in 1996. Late in 1997, the Administration's proposed legislation was included in a broader bill, S. 1404, introduced on a bipartisan basis in the Senate. With growing bipartisan support in both houses, hopes are high that the SCA will soon become law. The complementary amendment to the Internal Revenue Code is also pending before Congress, with broad bipartisan support. OMB is working with the House and Senate to attain re-introduction and successful action on the legislation during 1998.

In addition to these legislative approaches to foster efficiency and quality in Federal statistical programs, the agencies are actively exploring other means of expanding collaboration to improve the effectiveness of the Federal statistical system. Recently the Interagency Council on Statistical Policy (ICSP), under the leadership of the Office of Management and Budget, has broadened efforts of the principal Federal statistical agencies to coordinate statistical work—particularly in areas where activities and issues overlap and/or cut across agencies. One by-product of these efforts was the establishment in 1997 of the Interagency Confidentiality and Disclosure Avoidance Group, under the auspices of OMB's Federal Committee on Statistical Methodology. This working group discusses common technical issues involving privacy, confidentiality, and disclosure limitation. The group is currently working on developing a set of generic guidelines for disclosure review, which could be adapted for use by other agencies.


It is our hope and expectation that both the statistical confidentiality legislation and the subsequent cooperative efforts will go a long way towards solving some of the challenges the Federal statistical agencies have encountered in a decentralized environment.
