Suggested Citation:"8 Not-for-Profit-Sector Data." National Research Council. 1999. Proceedings of the Workshop on Promoting Access to Scientific and Technical Data for the Public Interest: An Assessment of Policy Options. Washington, DC: The National Academies Press. doi: 10.17226/9693.

8 Not-for-Profit-Sector Data

MS. KELLY: I am Maureen Kelly, of BIOSIS, and I will moderate this session on not-for-profit-sector data. Our challenge this afternoon is to pull together a different view of the information we heard this morning during the data panel presentations. The presentations this morning focused on discipline-specific data, and we also heard talks by people who were thinking from the same context set. We have an opportunity this afternoon, by organizing on not-for-profit-sector data, to bring together some of the contrasts that may exist with the different kinds of data across disciplines.

What we will do is give each of this morning's not-for-profit-sector panelists an opportunity to make a few comments on each of the questions posed by the study committee (see Box 7.1). Following the comments by all, I will take clarifying questions from others here, including our rapporteur, since he has to make some sense of all of this. Given the tight schedule, I would prefer that we hold any discussions until we finish all of the questions.

Indeed all of the questions talk to the same theme. They address the issues of the effects of the status quo—what is good about it, and what the problems are that we see. These questions are very important because, as we contemplate some change to the status quo, we want to understand what it is that we would like to preserve, what we find that is very functional in the current environment, and the problems expressed now that may be amenable to some solution in the new regime.

Jerome Reichman, of Vanderbilt University, is serving as rapporteur. David Fulker, who runs the Unidata program at the University Corporation for Atmospheric Research, is going to be speaking about meteorological data. Jim Lohr, of Chemical Abstracts Service, and Chris Overton, with the Center for Bioinformatics at the University of Pennsylvania, will focus on chemical and genomic data, respectively. All three panelists presented their own views this morning. So we will begin with the first question, which concerns identifying the principal benefits and opportunities that are available to the different database producers in terms of the current regime. What is valuable about what is going on now? We are asked to rank these benefits as well as simply itemize them.

MR. FULKER: I would characterize the principal benefit of the current regime as the extent to which it recognizes the balance between providers and users. I find myself a little hesitant about wanting to see that balance tampered with because what we have learned from the European protection effort seems to be distinctly an area of problems for us. I recognize that the result of the European policies protecting databases may have very little to do with what we are able to do here in the United States; they may be related to pressures in the European form of governance. But I think that the balance that has been struck over the course of time has worked well in our regime.

I would add to that a point that I don't think I am able to articulate very well, which is the notion that copyright, as it stands now, does not really allow ownership of facts per se. There is something appealing about that to me. From the standpoint of trying to support scientists who are studying the state of the Earth, it is appealing that knowledge about the state of the Earth, in other words factual information about the Earth, is not owned by specific individuals.

DR. LOHR: I, too, think that at least in our situation the current regime has existed ever since the Chemical Abstracts Service (CAS) has been in business, and it has managed to survive through all that, and it is a two-edged sword in our case.

CAS gathers most of its information from the public literature and is able to do that, and permitted to do that, by the rules and regulations under which we live because of certain fair-use provisions and other social-benefit clauses that exist in the law. So, were it not for those things, it is not altogether a certainty that we, and operations like us, could even operate; we might not be able to.

Increasingly, however, the current environment is yielding to a much more transactional basis for dealing with things. Publishers and other sources of information are finding ways to get around some of these provisions; and we find ourselves increasingly getting into contractual arrangements with people and paying people to get information that historically we gathered up for free.

On the other side, however, we do enjoy the protection of the copyright laws and all that this protection entails, especially in terms of dealing abroad. Right now over 60 percent of all the revenue that CAS takes in comes from outside the United States. So the international laws and policies are important to us as well as the types of protections that other countries are willing to afford us by virtue of various treaties and policies that exist between the United States and other countries.

DR. OVERTON: I think it is important that there are no barriers to the free distribution and dissemination of and access to information vital to biology. One of the ways I look at the situation is that we should have a Hippocratic oath on databases: Do no harm when thinking up new laws or regulations that could be put in place regarding databases. This isn't a reflection on the current situation, which I find more or less satisfactory, but it is a concern about what we may see in the future. I will give you an example of what copyright laws have done to education.

When I was an undergrad and a grad student, our professors would hand out these big binders that had reprints and chapters copied from many different books and journals, which made it very cost effective for us to get a diversity of views from a variety of different sources; but we cannot do that anymore. When I teach a course now we are very restricted and very cautious about the sources that we can reproduce for the students. I think that these barriers have been to the detriment of education. Now, changes in the existing copyright law can put up barriers, even more serious barriers, to the future of research.

MS. KELLY: Are there any questions of clarity from the participants here?

MR. RINDFLEISCH: Tom Rindfleisch, Lane Medical Library, Stanford University. I would just like to underscore this business of the cost of course readers. Some of these materials at our institution now cost $70 or $80, which creates an imposition for students getting access to the information as part of their education.

MS. KELLY: So this is a drift in the implementation of current law as you see it?

PARTICIPANT: This is not a clarifying question, but, for James Lohr, I think someone in the audience said that the Chemical Abstracts registry, in his opinion, was not copyrightable. Did you understand why he said that? Do you think he was referring to the chemicals themselves or to all the associated data?

DR. LOHR: In fact, I was going to talk to that in the second question, but I will talk to it now. One of the big problems about the current regime from CAS's point of view is that it is currently fraught with all sorts of uncertainty regarding what intellectual property rights accrue to what works in what circumstances. The uncertainty today is a result of, I think simplistically, the Feist decision, number 1, and number 2, just this whole digital regime that we are now living in and all the technological possibilities that it affords.

One of the things that is uncertain on the legal side, and again, it goes back to Feist and other decisions, is that you depend upon a copyright for certain kinds of protection. Well, have you got copyright or haven't you? Copyright seems to be redefined rather frequently these days.

Someone says, “Well, you may think you have got a copyright, but I am here to tell you that you don't.” I don't know whether he knows what he is talking about, but the Register of Copyrights, fortunately, seems to think we have one. I believe what the audience member was talking about is that the CAS registry is successful and essential to the chemical industry and to the whole movement and control of chemicals throughout the world because it is comprehensive and every attempt is made to make it as comprehensive as it possibly can be. I think you heard Peter Weiss say that as you approach total comprehensiveness, you approach a point where the information is noncopyrightable. Then you have to fall back on other things. How is the database organized? What details are added—various artifacts about its construction and so forth and so on. This registry fortunately isn't just a phone book. It is an extraordinarily complex database that takes a lot of input in order to create it, but I believe that the comprehensive aspect is what he was talking about.

MR. REICHMAN: I think we need to clarify a little bit why it is that you have this comfort level with copyright laws. Then you can see what is not there or what differences exist in the other regime. There are a number of issues to discuss.

One is that, in copyright law, you have this idea-expression dichotomy: facts are assimilated to ideas for specific purposes, and these facts are not protected. The expression of facts is protected. So in any factual work presented right in front of you, all that is protected is the stylistic expression of that matter. Because of the independent creation rule, you can (a) take the findings and reuse them immediately, expressing them in your own words, and (b) have the convenience that they are already there in front of you so that you can build on them immediately.

Now, if you switch that over to the database regime, there is no idea-expression distinction, and that which was not protected in copyright law becomes the very object of protection. You cannot just take the facts in front of you and reexpress them. You have to reinvent or rediscover or recompile them. You have to refine them. Now, that is where they had a kind of disconnect in Europe. It is easy to reexpress any copyrightable work, including your own, because you have it in front of you. You can easily work with existing data because you have them in front of you. But it is not easy to recreate data, and that is the difference. I need to recreate the data underlying the work, which is what the database law requires you to do, unless you are willing to pay for the privilege of making additional uses of data that have already been disclosed to the public.

Another thing that you are comfortable with in copyright law or that you are uncertain about is the scope of protection, and Marybeth Peters pointed that out. I agree with her. The real problem does not arise from the creativity standard of Feist; most compilations will manifest sufficient creativity. The problem is that you don't get very much protection once an original database enters copyright law. Feist would, at the most, give you protection against wholesale duplication of your copyrightable database, but there are very real questions about what would happen if people took systematic extracts of disparate data from your database and built on it, if that would be protected.

DR. LOHR: That is why CAS sells users the right to take systematic extractions and use them and build on them for their own purposes. The real risk in that is that there is some wholesale expropriation. Someone finds a way into the database some night and pushes a button that gets all the information, and then this is made available by some well-meaning person saying, “Hey, guess what, this is all on the Internet. Come on in and grab it.”

MR. REICHMAN: There is reason to think that if a third party took the whole database, you would win a copyright infringement action if your database met the creativity standard. But I think Justin Hughes was saying, “Look out, you may have less protection than you think,” if somebody comes and takes a lot of the data, disparately, not the whole database, and then says afterward, “I took the unprotectable components.” We don't know what that means. We haven't had enough litigation to know what it means, but the Feist decision stands for thin protection of factual matter, which tends not to protect derivative productions.

DR. LOHR: Frankly CAS doesn't know what this means either. I don't think we ever would rely solely upon regulations to protect us from that problem. The best protection that we have against truly damaging mass copying is that there is so much information available in the database that someone would require a large amount of time with unrestricted access if that were the approach taken.

DR. OVERTON: But I make a living out of doing exactly what you just said. I go in and take subsets out of other databases, combine them with a subset from another database, and come up with something that satisfies my needs and the needs of others, members of my community. So I am exactly that someone who would take advantage of the current situation. Is that what you are saying you want to prevent?

MR. REICHMAN: No; what those of us who have been negotiating on behalf of science are saying is that this is the single most important thing we do, and we don't want that changed. What I am saying, for the purposes of this session, is that you are comfortable with copyright laws because you can do precisely that. You can take all these bits and pieces, the unprotected components, put them in another database, and there are no repercussions whereas what we heard in the negotiations was exactly the opposite.

Even if you pay for access to my database, if in constructing another scientific database you take a chunk from ours and combine it with chunks of other databases to make a new database, that would be violating our exclusive redistribution rights, according to the proponents of database protection during the Hatch negotiations.

MS. ADLER: Prue Adler, Association of Research Libraries. During negotiations, the database legislation proponents talked about the fact that taking one piece of information didn't trigger liability, but taking two pieces of information from a database could.

MR. REICHMAN: That was the threshold, with two.

MS. ADLER: Now, if the database is four pieces, one might understand that; but I don't think that is common to what this universe is talking about.

MR. RINDFLEISCH: What is the time distance between taking one piece and then another?

MS. ADLER: Actually I don't think we asked that question. We didn't want to know the answer.

MR. PERLMAN: Harvey Perlman, University of Nebraska. Dr. Lohr, I wonder whether, in the face of this growing uncertainty about the legal world, your company has reduced its investment in developing the database?

DR. LOHR: That is an interesting question. We have reduced our investment in the database, but not for the reasons you may think. During the end of the 1980s and into the early 1990s the costs for producing the database were growing so expensive on a progressive basis that we were essentially becoming noncompetitive by pricing ourselves out of the market. We have a large and expensive program, which was aimed at driving down the cost of database building per unit of database because you just have to build more and more every year. The chemists of the world won't stop working, and so the answer to your question is yes, we are investing less. At least we are paying less in operating costs. We are investing maybe more in the infrastructure but we are still building more and more database. But that is probably not the answer you were looking for.

MR. PERLMAN: Actually I was just looking for an answer, but I take it that what you are telling me is that the uncertainty of the legal rule is not affecting incentives for you to develop a database.

DR. LOHR: You are basically right, but again you have to understand that we are not an investor-owned company. CAS is part of the American Chemical Society, which provides a mission to accomplish certain objectives somewhat independently of their purely economic merit.

MR. PERLMAN: I sensed from your earlier presentation that revenues that at least match expenses are a very important part of what you are thinking about.

DR. LOHR: Oh, yes, revenues have got to more than match expenses.

DR. OVERTON: I have a question related to that issue. Maybe someone here can help me understand what “sweat of the brow” means. A lot of the research that we do is actually to automate the construction of databases so that we could just press a button and, like the database I talked about this morning, the whole thing would just be generated by extracting bits and pieces out of all of these other existing databases. That is a research effort on our part. There is a lot of effort going into the software development, but down the road we will be able to do this with anything. This morning I talked about a database focused on red blood cell development. We would be able to go in and do the same process, through the press of the button, for brain or liver or heart or anything else. So what does sweat of the brow mean?

MR. RINDFLEISCH: I think this is a crucial point in terms of the next five years because the computer science term for this is “interoperability.” It is a core piece of technology that is being developed, for example, for the digital library initiative. The whole idea is to make it relatively easy, machine easy, to assemble these things and to make different data sources interoperate so that they can be assembled. So, is computer cycle sweat of the brow or sweat of the silicon?

MR. REICHMAN: No, that would be a case for the other side. The other side would say that the more you perfect these machine-assembled databases, the more noncopyrightable databases you will have, not just protection problems, but eligibility problems as well.

DR. LOHR: And that is the last uncertainty I wanted to mention. If you are in this business, economic necessity forces you to drive your database production operation more and more toward algorithmic generation of those databases. It depends on what kind of a database you are building, how you do that, and which way you do it, but the more successful you are at doing that, the less likely you are to be able to argue persuasively that you continue to enjoy copyright protection.

MR. RINDFLEISCH: What if these algorithms are very smart?

DR. LOHR: They have to be, but that doesn't appear to make much difference under the law.

MR. REICHMAN: You might be able to get patent protection. There is more and more hope of getting patent protection, which could have horrible consequences.

MR. RINDFLEISCH: Perhaps that state of the art is what differentiates you in the market, that you have the fee for software that allows you to assemble these things into newly usable products and that other people don't have that advantage.

DR. LOHR: But it doesn't protect you against grand larceny.

DR. SAXON: Neither does copyright.

DR. OVERTON: But, again, your database in particular is a moving target, and suppose someone did break in one night, steal however many terabytes you have, and then the next day it is a different database.

MS. KELLY: It is a cumulative database, not necessarily a revised database.

DR. OVERTON: That is true, but if you look at the value of the scientific literature, so what? Five years from now you have twice as much as you have now. So you lost half of the database.

MR. RINDFLEISCH: I think that is an old-fashioned view of databases in the sense that some of the new databases are actually intricately interlinked, and those links do change quite frequently, and keeping those up to date is crucial for the value.

MS. KELLY: And costly. Let me interrupt and go back to the questions. We already have started to work on question 2, What are some of the problems and challenges of the status quo? Jim Lohr has made a few remarks. Let's take up what he did and get some additional points on this second question, and then we will resume with the other question.

MR. FULKER: I think the point of greatest importance is the discrepancy between the European and American views on database protection, because this seems to be reflected in a good deal of disagreement about the exchange of meteorological data and, in particular, results in uncertainty as to what Unidata recipients can do with the data.

Unidata has taken a conservative approach to dealing with European data. What this means is that the European data are not made available on the Web, but certainly some members of my university community would assert that this has reduced the educational and scientific value of the data in spite of the fact that everyone agrees on both sides of the Atlantic that educational use is permitted. It is a question about whether placing the data on the Web is in itself a publication to the larger community. So that is the first point.

The second point is one that I am not actually sure relates to the current policy regime because it may in fact be a violation of it. The National Weather Service tried to economize on data networking among the radars in this country, and so they granted to four companies exclusive access to the outputs of those radars with the assumption being, I believe, that these four companies would compete with one another, which would keep the costs low and would be a reasonably good way to make these data publicly available. I won't go into the details about this situation, but the net result has been that these data are actually much more expensive than any of the other data that we use.

MS. KELLY: You mentioned this problem earlier in your presentation. How do you see this relating to current policy?

MR. FULKER: I am not sure how this relates because I am not actually sure that the Weather Service policy was in accord with overall government policy when they took that step. So I am hesitant to actually bring it up as a problem with current policy.

MS. KELLY: Dr. Overton, would you make some remarks on problems that you have encountered?

DR. OVERTON: As I said, my interest is in lowering the barriers, so that there is less protection rather than more protection. However, I think that we have seen a disturbing trend in which various groups feel that they don't have enough database protection, so they come up with a license agreement for each of the resources that we need to access instead of relying on any uniform set of rules. So, as I said in my talk, in that one database we are building, we now have to go through three different licensing agreements to do anything with that database other than use it internally.

If we want to use the database internally we can basically do whatever we want. As soon as we put it on the Web or try to distribute the database in any way, then we run into these license agreements that we have to deal with one-on-one, and this is a real burden.

MR. REICHMAN: I wanted to clarify that because this is the major problem that you brought up, more transactional difficulties about stuff we can pay for. Even if you didn't have the drive for database protection, the proponents of a database protection bill also operate in the contract sector, and they would like to see a uniform state contracts law that would validate all of these licenses, which would impose standard-form terms to access (mostly through online access) sources of data. This was formerly known as the proposed Article 2B of the Uniform Commercial Code; it is now known as the proposed Uniform Computer Information Transactions Act, and it would validate “click-on” and “shrink-wrap” licenses without mutual assent in the classical sense of the term.

Harvey Perlman is an expert on this subject, but isn't actually involved in the negotiations. They are at a very advanced state. So, one of the things that we can formalize is that your concerns about giving way, in a lawless world, to a regime of contractually imposed conditions and terms that cause you problems are not an isolated perception. This is what is really happening. All that is restraining the pressure is that the validity of these contracts remains uncertain and differs from state to state, and their impact on federal law is uncertain from jurisdiction to jurisdiction because courts in one jurisdiction may say that some or many such contracts conflict with federal law, and others will say that they do not. It is clear that any coherent findings that you make have to address both sides of the problem (intellectual property rights and contractual rights) because even if there aren't any new intellectual property rights, what is going to happen in contracts will result in what I have elsewhere termed “privately legislated intellectual property rights.”¹ Another scenario is if there is a new intellectual property right, what is going to happen to the combination of contract and intellectual property rights? Yet another issue is David Fulker's point that the National Weather Service limited radars to the four companies; this is a question that surfaces both here and in Europe about data providers. It is the possibility that in the data industry there is an unusually high degree of concentration, which will drive up the price of acquiring data.

So then the question becomes, If you have a property right and/or a contract right, what do you do with this concentration of power that wouldn't be possible if you had competition? Another way to turn that question around is, Would everybody be better off if there were a regime that produced more competition in which case users might have to pay less to begin with and then there might be some special deals for certain privileged users? Or are you likely, either in a contract regime or contract plus intellectual property regime, to be faced with sole-source providers and monopolistic pricing? This is a serious concern of those who have studied the issues, and I don't know to what extent it underlies your point and other points.

¹ See J.H. Reichman and Jonathan Franklin (1999), “Privately Legislated Intellectual Property Rights: Reconciling Freedom of Contract with Public Good Uses of Information,” University of Pennsylvania Law Review, 147:875-970.

MS. KELLY: May I add that in the case of more competition, if, as you said, it is necessary to recreate the data, then more competition may not drive the costs down.

One of the things we have been asked to do, and I should have done it after each question, is to give some sense of priority regarding the most important issues, and that seems to be distilling from this round of comments on the second point. Would someone care to take a stab at what you consider to be most important to your operations under the current regime?

MR. FULKER: I tried to give them in order actually, and, for me, a sense of balance is most important.

DR. LOHR: I think it is using fair use as the code word for all that this implies about the way our society treats facts and access to certain kinds of information, followed by protection aspects of the copyright law.

DR. OVERTON: Again, I suppose that relatively unfettered access to data is the way things stand now, and maintaining that is my priority.

MS. KELLY: I expect we may be skewed to the extent that the representatives here are largely dependent on facts created by others for the databases they produce. We work with data that are created by others. We are not involved in actually creating the first round of the data. So we may have a bias for making sure we can get the data that we want.

DR. OVERTON: I think that is true. There is a significantly different set of rules if you have proprietary data. We are not talking about proprietary data at all here as I understand it.

MS. KELLY: In terms of the greatest risks, they seem to spring from the point of greatest benefits, in prioritizing things.

MR. FULKER: Actually, perhaps it is related, but I believe that the greatest risk for me has to do with the discrepancy between European and American database protection. I place that as the first priority.

MS. KELLY: That was well said because it certainly came from a lot of the things that were said earlier.

DR. LOHR: Yes, I think I would change the order too. At least from our point of view, we are able to deal through negotiation of contracts and everything else with the suppliers of information that we need. And we think we can be successful with that, but the wild card, especially when you do as much business outside of the United States as CAS does, is just what these legal regimes are going to be across the world. Are you going to wake up some morning and just have no protection whatsoever in Europe because they get relentless about their database directive, and then how do you truly protect yourself? Can you fall back on contractual arrangements? You can, of course, do everything you can by technical means and so forth and so on, but all of these things seem to be prone to failure if there are people who are dedicated to making them fail.

DR. OVERTON: My priority would be proliferation of burdensome licensing agreements. And by the way, as it now stands, all of the licensing agreements that we have dealt with so far are coming from Europe.

MS. KELLY: So licensing has an international flavor as well.

MR. PERLMAN: Licensing is burdensome in terms of what?

DR. OVERTON: Just having them.

MR. PERLMAN: That is my question actually. The fact that you have to contract with a number of people is never going to go away. The whole idea of contracting is at least trying to get the terms that I want from you when we negotiate. Is it just the fact that you have to deal with a lot of people or is it that they are imposing things that you think are offensive?

DR. OVERTON: Both.

MR. PERLMAN: It is disruptive?

DR. OVERTON: Yes. I don't mean to pick on SWISS-PROT, because I don't think their situation is different from anyone else's, but let me just use that database as an example. Everybody keeps going back to SWISS-PROT because it is an extraordinarily valuable database. We all agree about that, and researchers are concerned that our access to it is going to be restricted in one way or another.

One of the restrictions SWISS-PROT has is the form of the data. In their license agreement, which you have to sign, they limit the way you can manipulate the data. In other words, the data have to be in certain prescribed formats, and we don't use any of those formats. We take what is in a flat file database and we convert it to a relational database. We haven't gone to SWISS-PROT and negotiated with them and asked if it was okay because we have been using this for years, and the license agreement only came up this year. So, on the one side, this is what we have to deal with.
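
The kind of conversion described here, from a flat file to a relational database, can be sketched roughly as follows. This is a minimal illustration only: the line-coded layout, the field names, and the table schema are assumptions made for the example and are not the actual SWISS-PROT format or the speaker's own pipeline.

```python
import sqlite3

# Hypothetical flat-file text: two-letter line codes, records end with "//".
# This layout is an assumption for illustration, not a real provider's format.
FLAT_FILE = """\
ID   PROT_A
DE   Example protein A
OS   Organism one
//
ID   PROT_B
DE   Example protein B
OS   Organism two
//
"""


def parse_records(text):
    """Yield one dict per record, keyed by line code."""
    record = {}
    for line in text.splitlines():
        if line.strip() == "//":
            if record:
                yield record
            record = {}
        elif line.strip():
            code, _, value = line.partition(" ")
            record[code] = value.strip()


def load_into_sqlite(text):
    """Create an in-memory relational table from the flat-file records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entry (id TEXT PRIMARY KEY, description TEXT, organism TEXT)"
    )
    rows = [(r.get("ID"), r.get("DE"), r.get("OS")) for r in parse_records(text)]
    conn.executemany("INSERT INTO entry VALUES (?, ?, ?)", rows)
    return conn


conn = load_into_sqlite(FLAT_FILE)
# An ad hoc query that the original file layout does not directly support.
organisms = [
    row[0]
    for row in conn.execute(
        "SELECT organism FROM entry WHERE description LIKE ? ORDER BY id",
        ("%protein B%",),
    )
]
print(organisms)
```

Once the records sit in a table, any query can be asked of them, which is the freedom a format-restricting license term would take away.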

On the other side, if we have all of these license agreements and then we provide this information on the Web, how is that propagated? How do we propagate the license agreements through users who come to our resource on the Web? We actually have had one of the other database providers say, “Well, do industry sites, commercial sites, who haven't signed our license agreement, have access to your database?” That was a concern of theirs.

So if we jump to question 5 (What are we going to look at five years down the road?), I imagine that this situation is going to get worse as the value of the databases goes up, and the chance is that someone will say, “You will have to monitor usage by every individual who comes to your site.” There may be technical solutions for this, such as using digital signatures or something that goes with license agreements, but we are not there yet. I work at a university, and this is just another layer of things that I would have to deal with.

MR. REICHMAN: I want to ask a question about that, which has to do with question 5. But in response, I wonder how you would feel if the consortium method were extended so that everyone who wanted to deal with universities and academics had to pass through an institution that was, in fact, run by universities and academics in which there was a single set of rules, and the rules worked both ways? (We assume that Congress will exempt such a consortium from antitrust liability.) If you wanted data from this group, you would agree to a common set of licensing rules that it has, and then, instead of having to worry about all the different licensing agreements, anyone who is keyed into that system as a bona fide member of the protected group would be protected by the consortium. Would you think that could work?

DR. OVERTON: That issue was raised before. I tried to see how that would apply to what we are doing, and I think it would be a difficult model to apply. For meteorological data, you can see where many universities around the country might want access to that particular type of data. But in the biological arena, we have a lot of specialized databases, so does that mean we would have a consortium for each database with only a limited set of users, or are you suggesting a consortium that is an umbrella for all biological databases?

MR. REICHMAN: Ideally, yes, even all scientific databases, if you could get one set of rules that would work for all; and now maybe you cannot. That is one of the questions: Can you get one set of rules that would work for all, given that the National Science Foundation has one set of rules that work for everybody and the National Institutes of Health has its own set of rules? If you had the data-sharing rules built into an agreement such that Europeans could not get the data from us unless they agreed to give us their data for scientific purposes on similar terms, and I am not talking about commercial purposes, you might solve the problem that way, by a consortium approach.

MS. KELLY: But it wouldn't solve the problem for the increasingly commercialized components of the data that you would want to use.

DR. OVERTON: Or Web site access by commercial users. Our Web sites are free and open at the moment.

MS. KELLY: The boundaries between those two worlds are becoming fuzzy.

DR. OVERTON: That is right. It would remain a terrible problem.

PARTICIPANT: One set of rules would still be relatively easy to manage with a multiple pricing system. You don't have to have a set of rules, one of which is uniform access. One of the rules can be differential pricing for different kinds of users, and that would not be difficult to implement.

DR. OVERTON: Our pricing is all the same—zero.

PARTICIPANT: It is, and you would want that for the same community (the user/generators, the academic scientific community), but when it comes to a question of for-profit uses, one of the rules could be that there is a toll gate.

DR. OVERTON: That would be very difficult to put into effect at a university.

PARTICIPANT: If the access were handled as Mr. Reichman suggests, not through gates to each university but through a common gate, then if you had the right kind of card you would go in free, and if you didn't have the right kind of card, you would pay a toll to the gatekeeper. Then the rules decide how these databases are shared depending on what the consortium decides to do. This would not be difficult to do automatically.
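
The common-gate idea above can be sketched in a few lines. The user categories, the fee amount, and the `Credential` type are all hypothetical, introduced only for illustration; an actual consortium would define its own rules.

```python
from dataclasses import dataclass

# Hypothetical user categories and fee; a real consortium would set its own rules.
FREE_CATEGORIES = {"academic", "nonprofit_research"}
COMMERCIAL_TOLL = 25.00  # assumed flat per-access fee, for illustration only


@dataclass
class Credential:
    holder: str
    category: str  # e.g. "academic" or "commercial"


def toll_for(credential):
    """Return the fee charged at the common gate for this credential."""
    if credential.category in FREE_CATEGORIES:
        return 0.0
    return COMMERCIAL_TOLL


fees = [
    toll_for(Credential("univ_lab", "academic")),
    toll_for(Credential("pharma_co", "commercial")),
]
print(fees)
```

The sketch assumes the category on each credential is already known and trustworthy. Deciding who belongs in which category is the part that other participants in this session describe as difficult.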

MS. ADLER: Part of what is more troubling to me as I look at Dr. Overton's point is the notion that it puts more burdens on the institution instead of focusing on what you are there to do, which is work with the data and create new databases and integrate vast knowledge and have access to information. The work would be to set in place new licensing privileges or not privileges, depending on who the users are, and one problem is that you shouldn't have to monitor or move in that direction because it is not a part of what you should be doing. It should be the user's burden, which has traditionally been some of the vendor's responsibility. That part of what you are describing is escalating in terms of the communities that we are seeing, which is really unfortunate.

MR. ONSRUD: The paradigm for doing academic research has changed, and we did have a simple rule before. The way we would deal with the commercial sector in the past was essentially through the library. We would go to the library, where published books and journals were available, and we got access to commercial publications in that manner, did our own work, and still were publishing. Those books were in the library even if the commercial companies didn't want them there because of the first-sale doctrine. You could buy that book and put it in the library, and commercial publishers couldn't keep you from doing that. We all had access to that shared resource. No licensing was required. That was the paradigm. Now, in the electronic environment, all of a sudden we don't have the first-sale doctrine anymore because the publication might be there in the library, but to copy it over to your computer would be a violation of the copyright law.

Now you say that there are always going to be these contracts, but one should think in terms of how one would move the first-sale doctrine into a digital environment for the scientific community. For example, whether it is an electronic book, which is copyrightable, or a data set, if the library buys five copies, it can check out five copies. Mark Stefik said earlier today that the data technology is available where you can control it, that one person checks it out digitally online and it is canceled in the library until a two-week time period is up and then it is reactivated in the library.
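
A rough sketch of the lending rule described above (a fixed number of purchased copies, each tied up for a two-week loan and then released) might look like this. The class name, method names, and dates are assumptions for illustration, not a description of any existing rights-management system.

```python
from datetime import date, timedelta

LOAN_PERIOD = timedelta(days=14)  # the two-week period mentioned above


class DigitalCopyPool:
    """Tracks a fixed number of purchased copies; each loan ties one up."""

    def __init__(self, copies_owned):
        self.copies_owned = copies_owned
        self.loans = {}  # borrower -> due date

    def _expire(self, today):
        # Loans that have reached their due date return to the pool automatically.
        self.loans = {b: due for b, due in self.loans.items() if due > today}

    def check_out(self, borrower, today):
        """Return the due date, or None if no copy is available to this borrower."""
        self._expire(today)
        if borrower in self.loans or len(self.loans) >= self.copies_owned:
            return None
        due = today + LOAN_PERIOD
        self.loans[borrower] = due
        return due


pool = DigitalCopyPool(copies_owned=5)
start = date(1999, 1, 14)  # arbitrary example date
results = [pool.check_out(f"reader{i}", start) for i in range(6)]
later = pool.check_out("reader5", start + LOAN_PERIOD)
print(results[5], later)
```

The sixth request is refused while all five copies are out, and succeeds once the earlier loans have lapsed.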

DR. SCHOOLMAN: Hack Schoolman, National Library of Medicine. To whom can you transmit? It is an endless progression because not only do you have the issue of whether you go beyond the first sale and lend the publication to someone or start to transmit it to someone, but then the next question is to whom you can transmit. You say that you can do it to your faculty and your constituency, but what is your constituency? There is no uniform definition of constituency.

From the National Library of Medicine's point of view, our constituency is the world. We have obligations throughout the world, and every university and every institution has an undefinable constituency set by geographic device or by some other type of artificial device which has no relationship to its operations.

MR. ONSRUD: All I am saying is that this is an idea worth thinking about. In thinking about the constituency, right now there are the practical ramifications of data coming to my university library and that local community is the only one likely to use the publication. We do have a different paradigm here, but the interesting paradigm is that we didn't have to have all these licenses. Do we really need these Uniform Commercial Code kinds of provisions, or do we need the library community right now?

DR. SCHOOLMAN: You can get a license to do all the things you want to. Just pay enough money to accomplish it; they will give you licenses in all the things you want.

MR. ONSRUD: And you end up with myriads of licenses that all of our research laboratories and all of our libraries are dealing with. So, what I am saying is that we shouldn't just throw out the idea that we can come up with one contract that is going to fit a large proportion of one set of code provisions, that this is going to really benefit the scientific community because there are so many different conditions, and conditions change over time.

MR. PERLMAN: I don't think you can say, in an ad hoc way, that license terms putting restraints on use are off the wall. They may be the most beneficial way to distribute digital information, and that is what these licensing terms are designed to do. Licensing may well be the only way one can have effective large-scale sophisticated use; but at least there has to be some income coming to the database producers to create enough incentives for them to do it.

DR. SCHOOLMAN: Since Feist there are landfills of scientific databases that have been created.

MR. RINDFLEISCH: I think the issue of price discrimination implies that you can identify who it is you are discriminating against, and one of the problems in a university environment is that there are affiliations that come in all sorts of flavors. I am from Stanford University. We define members of the Stanford community in different regards if they are doing clinical work, if they are doing basic science work, or doing education, and to identify people, much less classify them, becomes an extraordinarily difficult task. As a director of the medical library there, I know that we simply cannot keep these things straight, and the administration of these criteria just becomes impossible.

MS. KELLY: On the third question, the objective is to find out what specific conduct of others (the database producers, the product disseminators, and the data users) most adversely impacts your organization's database activities.

MR. FULKER: First, I would say that there are a few things that I find problematic in this, although the behavior in any of these sectors (not-for-profit, commercial, or government) among database producers or disseminators is usually not seriously problematic for us, in any case. I could highlight two, but because they interplay it is hard for me to rank them. There are a few, shall we say, vocal folks who are keen on asserting unfair competition, but this is not uniform by any means. I think Unidata and the Web, and the Internet in general, are sort of evil manifestations of what is happening, which is that some database providers had an economic model where it was fairly expensive to get government data. And by making it inexpensive for users to gain access, they had a particular market niche; that is disappearing. This idea that weather prediction and getting weather information out to the public is a partnership between the government and the private sector has caused the government to, in some cases, tiptoe around what I would characterize as these commercial concerns about unfair competition. So occasionally I feel like I have a very constructive discussion with the National Weather Service about making some new data available, and suddenly the brakes will be put on because they are concerned about the private sector.

I am not saying that it is not a legitimate concern, but it does sometimes impede progress because we are fairly proactive in using new approaches to making information available, and that is occasionally, I think, a bit of a threat.

DR. LOHR: I had some trouble with this question. I don't think I can comment about specific behavior, but let me just talk a little bit about then and now in terms of the whole chemical database industry.

If you go back 10 years or so, what we had was a very nice system where a chemist would do an experiment, create data, and publish them by handing them out to publishers. The publishers would hand the data off to CAS, and we ultimately would hand them back to the scientist who had done the original experiment because he needed them. We had this circular value chain, which was kind of a closed feedback loop. All the transactional relationships through this loop were understood and accepted by everyone, and life went on. It was like living in Mayberry.

Today it is as if we all woke up in Dodge City, strapped on our guns, and are out in the street trying to see what we can make of this brave new world. It is just this whole degree of uncertainty that affects not just us but everyone related to this database enterprise. Everyone is trying to reassess the situation and reevaluate the position they are going to occupy into the future, trying to jockey around to take advantage of whatever can be found to take advantage of. It is just remarkable.

You ask, “How does this relate to everyone's behavior?” You would like everyone to behave the way they used to except with maybe a few more degrees of freedom. But this is the kind of world we live in now, and it is all brought about because of this uncertainty in the legal realm and in the technological realm.

MS. KELLY: Would you care to add a little more about what you think the future, the five-year future holds?

DR. LOHR: I really don't know. So much is changing so fast that predicting the future is hazardous.

DR. OVERTON: I don't want to sound like a broken record, but again the major problem is licensing by the database producers.

MS. KELLY: The diversity.

DR. OVERTON: It is the diversity of licensing. So let me use this opportunity to talk about something else. One of the things that strikes me about any database provider is not just the database but the tools you have to use to access the data. That is something that really discriminates between different providers of the same data.

Now, there is a flip side, which is one of the reasons I don't worry as much about the content side as about what someone can do with the content. If I go to a particular database provider, I will choose that provider based on how I can access the data, how I can manipulate the data, and what I can really do with the data. One of my frustrations, in fact, with some of the databases that are available is the narrow view of the data.

One of the things we do is data mining, which requires sweeping through big databases. Often the access to the data that we want is limited. It is a very restricted view of the data. In fact, sometimes we cannot data mine. For example, having access to all the data in MEDLINE, which is a literature database, would make it easier for us to do certain kinds of data analysis based on word occurrences or something similar. Let me just throw that out as a possibility.

So that is a problem; not only do the data have to be provided to us, but they have to be provided in a way that allows us to do the kind of things with them that we want to be able to do. That is something that the producer can restrict, which then cuts us off from being able to do what we want.
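
The word-occurrence analysis mentioned above needs a sweep over every record, which is why bulk access matters. A minimal sketch, using three invented stand-in records rather than real MEDLINE entries, could look like this:

```python
import re
from collections import Counter

# Hypothetical stand-ins for bibliographic records; not real MEDLINE entries.
ABSTRACTS = [
    "Gene expression in yeast under heat stress.",
    "Heat shock proteins and gene regulation.",
    "Protein folding in yeast.",
]


def word_counts(texts):
    """Count how many records each lowercase word appears in."""
    counts = Counter()
    for text in texts:
        # A set is used so that a word is counted once per record.
        counts.update(set(re.findall(r"[a-z]+", text.lower())))
    return counts


counts = word_counts(ABSTRACTS)
print(counts["gene"], counts["yeast"], counts["protein"])
```

An interface that returns only one record per query makes even this simple count impractical at the scale of a full literature database.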

MS. KELLY: And is it your experience that the licenses that you are running into do constrain your ability to do the more innovative things with the content?

DR. OVERTON: So far, no, although, again, there are big chunks of MEDLINE that are freely available now. We can even deal with the big chunks that are for electrobiology, but I think this is a concern I have down the road.

We do automated data analysis, bulk analysis of data, and that is not supported by the Web. So, down the road, simply because it is a more effective way for people to control what goes on, we are going to get more and more of our data available only through that avenue, and I have a concern about where that is going to take us.

MS. KELLY: So your sense of the future is that, with the popularity of the Web as the preferred distribution mechanism, it will constrain your ability to do what you need to do?

DR. OVERTON: Absolutely, and because of the other features that will come with the Web, like security features, people may say that the easiest solution to dealing with access to our databases is to only provide access through this peephole view of the data, through the Web.

MR. RINDFLEISCH: I would have answered the question you just asked (has this impeded innovation?) as definitely yes. I will give you a concrete example. There is a database from MICROMEDEX, which is a drug database that is useful for clinical medicine, and it is organized from the point of view of a pharmacist. So you can ask certain questions, and you get drug-oriented answers. From a clinician's point of view, you typically want to go in by disease and look across drugs, but their interface doesn't allow you to do that. So as we try to use these data in an innovative way, the interface has to facilitate different kinds of access to the data that allow you to make use of these things in ways that the inventors or the people who put the database together never imagined. I think we are already to the point where these kinds of restricted interfaces are constraining innovation.
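
The difference between the two views can be shown with a small sketch. Given the underlying data, turning a drug-oriented index into a disease-oriented one is a short transformation; the names below are placeholders only and do not describe the actual MICROMEDEX product or any real drugs or indications.

```python
from collections import defaultdict

# Placeholder names only; these are not real drugs or clinical indications.
DRUG_VIEW = {
    "drug_a": ["condition_x", "condition_y"],
    "drug_b": ["condition_y"],
    "drug_c": ["condition_x", "condition_z"],
}


def invert(drug_view):
    """Build a condition -> sorted list of drugs index from a drug -> conditions map."""
    by_condition = defaultdict(list)
    for drug, conditions in drug_view.items():
        for condition in conditions:
            by_condition[condition].append(drug)
    return {c: sorted(d) for c, d in by_condition.items()}


condition_view = invert(DRUG_VIEW)
print(condition_view["condition_x"])
```

The inversion is only possible when the whole mapping is available; an interface that answers one drug lookup at a time does not expose it.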

PARTICIPANT: Is this a correct interpretation or implication of what you and Chris Overton are saying, that the broadened uses of data and the changed manner of uses are creating the need for a new standard of metadata to accompany databases to make the full utilization possible when you are doing meta-analyses? It sounds almost like you are making a plea for some kind of new standard of metadata or some change of general practice to be able to allow you to use data the way you want to.

DR. OVERTON: It is more than a matter of permissions. It is much deeper than that. The problem is the data come bound with a certain set of interfaces to the data and they are not separable, and it is getting worse if you use the data through the Web. If I could assume that all the databases were all relational databases, and there was an access that gave me ad hoc queries to the database, fine. That is what I am asking for, but that is generally not the case.

PARTICIPANT: But metadata don't just include substantive information. They also include format.

DR. OVERTON: Right, but that implies that I have access to the whole data set, and I am not getting access to the whole data set. So I cannot take the metadata and the data and download them to my site, reformat them in whatever way I want, and then ask any kind of query I want—build another warehouse, in other words.

MR. REICHMAN: But his fear, which is amazingly farsighted, is exactly what the situation is moving toward. The fear is that a technical limitation is then linked to a contractual limitation because it is easy to control it on the database. So, for our interest in control and pricing, etc., you cannot do with the data what you need to do.

DR. SAXON: There is a point of technical and contractual interaction.

MR. RINDFLEISCH: And we are at such an early stage of understanding what these new kinds of data are and new ways of looking at data, how they are changing the way we do science, the way we deliver medicine, and the way we do education, that we don't want to constrain this at the point where we are actually developing new market opportunities by trying to preserve the dinosaurs, if you will, that exist now and that have been successful. So, it seems to me that this innovation is part of what will fuel the whole next generation of economic advantage of these data.

MS. KELLY: Mark Stefik had made an observation last night that with copyright, the legislation has followed the technology and innovation. If we are trying now to get ahead of that, we may be doing damage we cannot anticipate.

MR. RINDFLEISCH: That is right, and this is moving so quickly that I don't think anyone can imagine where we are going to be five years from now, and the World Wide Web basically did not exist five years ago, other than in a few research labs.

DR. LOFTUS: Philip Loftus, Glaxo Wellcome. There also is an underlying drive in this that moves it forward. Science itself has moved from an era that was generally data poor, and there wasn't a huge amount of data, to a technology that has created now enormous volumes of data, and that is what drives all this.

Just to put this in context for you, as a major pharmaceutical company, five years ago we could generate data on 100,000 compounds per year. Two years ago we could do that on 100,000 compounds in a month. A year ago we could do 1 million compounds in a year, and now we can do 1 million compounds in a week. This is nothing special for us. This is not a prowess of our company; it is a change in technology. So now the challenge for science is to be able to digest and manipulate all of these data, and that I think plays exactly to where Chris Overton is.
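
To put the first and last of these figures on a common footing, the short calculation below annualizes the current weekly rate and compares it with the rate of five years earlier. The 52-week year is an assumption made for the arithmetic.

```python
WEEKS_PER_YEAR = 52  # assumption: the weekly rate is sustained all year

rate_five_years_ago = 100_000          # compounds per year, as stated
rate_now = 1_000_000 * WEEKS_PER_YEAR  # 1 million per week, annualized

fold_increase = rate_now / rate_five_years_ago
print(rate_now, fold_increase)
```

On those assumptions the annualized rate is 52 million compounds, a 520-fold increase over five years.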

As a scientist, you now have got all kinds of databases, and the name of the game is to put it in different ways, look across databases, look at it in new and imaginative ways, and generate new values. Given the pace at which technology is moving, if you were moving very slowly then you could have, if you like, a small generation of primary databases. People like Chris Overton could over a year or two create a second-tier level of databases, and maybe in five years' time he could have a tertiary tier of databases, but in the real world this is changing by the month.

So your level of derivative databases would be enormous. If each of these has pass-through licenses to the person who owned all the preceding databases, then this process would die very quickly of complexities. It is very hard to understand how that model actually can cope with the reality of the speed with which modern R&D moves.
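
The way pass-through obligations accumulate can be sketched with a small derivation graph. The database names and the graph itself are hypothetical; the point is only that each new tier inherits every license upstream of it.

```python
# Hypothetical derivation graph: each database lists the databases it was built from.
DERIVED_FROM = {
    "primary_1": [],
    "primary_2": [],
    "primary_3": [],
    "secondary_a": ["primary_1", "primary_2"],
    "secondary_b": ["primary_2", "primary_3"],
    "tertiary": ["secondary_a", "secondary_b"],
}


def licenses_required(database, derived_from):
    """Return every upstream database whose license passes through to `database`."""
    seen = set()
    stack = list(derived_from[database])
    while stack:
        source = stack.pop()
        if source not in seen:
            seen.add(source)
            stack.extend(derived_from[source])
    return seen


upstream = licenses_required("tertiary", DERIVED_FROM)
print(sorted(upstream))
```

Even in this toy graph, a user of the third-tier database is bound by five upstream licenses, and the count grows with every additional tier and source.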

I am from the information systems side in this environment and I don't have an ax to grind in the marketplace. I just have to worry about how we implement this within our own company, but if you look at it from the information systems side of databases, there are three key components: first, the data, and, in the model you have been talking about, you have public-domain data and there is no ownership of the data; second, the structuring of the data, where you are looking for your creativity, what we call the schema, the way that those data are structured together; and third, the software you put around it, which lets you access and analyze the data.

If the data are free, and you take a copyrighted paper that has, if you like, a certain concept in it, then as scientists read that, they take that concept to the next level, and it becomes original to them. If you look at databases, other than the data, the only thing in general that is original about them is their schema. Presumably at some point, if you take a set of schema from a set of databases, you combine them in a novel and creative way to generate a new schema which is not intuitive from the databases but which will be the source of the added value. Then I would have thought at some point, potentially, maybe that intellectual property could become yours and would not carry the encumbrance of all the history of the structures that had been made in the past.

I would think from a pragmatic point of view, because of the speed at which science moves, that we will need a model better than just a pass-through set of rights model. Certainly, if you look at it from the database point of view, looking at the way information is structured in the database, the schema, which, as I understand it, is the part that still can be copyrightable, and draw a parallel from the way that traditional scientific intellectual copyright has evolved, that at some point would have to come into play.

MR. FULKER: I would just like to comment that I think that these are important points about these new development arenas. I didn't link these issues to copyright because Unidata doesn't have any of our suppliers asserting any control over the form of access. That has just not been a problem in our arena.

A large amount of the innovation that is occurring within my group and my community pertains exactly to these kinds of subjects, forms of access, and my own view is that schema and software are merging. I don't know that it is so easy in object-oriented software to actually distinguish those two, but I still agree with the basic point, and, also, the point about forms of access. I think that a potentially serious downside to technology as the approach to access control is that it might seriously impede these areas of progress.

MS. ADLER: I want to echo what you are saying. This went to the heart of the concerns of many of us regarding the legislative process last year because there was total downstream control of all uses. In essence, there was no transformative use permitted. So in formulating the argument this time around it would be critically important to be very articulate as to why that will so undermine innovation in each of the sectors that we are speaking to because that was not understood at all last year, and I don't think it was for lack of trying. It just was not understood, and I think that gets to the heart of so many of the things that we are going to have to be doing this time around.

MR. RINDFLEISCH: It reminds me of the argument we went through 15 years ago to try to explain to people why the Internet was a good idea. It is these little incremental things that happen all over the place that sum up to the huge whole. There isn't a single event that captures everything that might happen, and it is the “might happen” that is the one predictable.

MR. VAUGHN: John Vaughn, Association of American Universities. I would like to underscore something Prue Adler said, and I hope that the report tomorrow can reflect what I think Chris Overton demonstrated most clearly, which is the advancement of knowledge by drawing from a multiplicity of sources—proprietary, not-for-profit—in this country and other countries. What we are up against, I think, are licensing schemes. Harvey Perlman pointed out that we are going to have to deal with this.

These commercial database groups—and even noncommercial database groups—will have to recover their costs. We have to pay some money, and I think that the contractual schemes, the licensing arrangements, which are causing us so much trouble, are not designed to cause that trouble. That is an unintended consequence of the database producers trying to protect themselves against other competitors. It may be possible to develop some collective contractual arrangements through some guidelines, something like Jerome Reichman talked about, to include the terms under which the academic community for science and educational purposes should work and talk about how we cannot be impeded by these interfaces that are there for other purposes. This is not to say that we have to operate without contracts or licensing arrangements, but we need to stipulate the conditions that we need that are science and educationally based, and I mean a different treatment in the legislative arena that has to translate into a set of contractual arrangements. Collectively, we might be able to make some sort of guidelines that could bring that kind of uniformity.

I don't think Chris Overton's problem is licensing per se. It is having 37 different licenses with conflicting terms that impede his capacity to deal with his colleagues elsewhere. As he said, you can do anything you want internally, but once you try to go outside you cannot do it.

MS. KELLY: That perhaps suggests that there is a clear and definable boundary between the educational community and the for-profit community, which may not be the case even in research going on in academia. So that makes it difficult to live with.

DR. LOHR: There is a limit on how far we can go, but I still think we can make some progress in that area.

MR. FRANKEL: Mark Frankel, American Association for the Advancement of Science. While I support the notion that John Vaughn has just suggested, I think what has happened within the academic community regarding technology transfer agreements and sharing of various kinds of materials, particularly in the biomedical arena, is not terribly encouraging. We have had a very difficult time in getting universities to come to some general agreement about the nature of sharing within the context of technology transfer. It has been very difficult, and I think we should keep that process in mind as we try to move forward in the direction that John Vaughn is suggesting in terms of databases.

DR. SAXON: To take that even further, that same university, because it is engaging in all that contractual work in some of the work that it does, which is the output of the research and educational effort, in turn is the subject of contractual arrangements with for-profit concerns, and so it gets very hard to decide what is what.

MR. REICHMAN: This is a very sobering and wise comment that we need to deal with, but I do think there is a basis for a little more optimism.

What you are talking about is indeed the unhappy result of all the high-sounding principles of the Pajaro Dunes conference on university-generated biotechnology, and then it came down to every man or woman for himself or herself; but in the case of biotechnology, there are obvious downstream applications whose payoff anyone can predict. It may not be there, but you think it is there, and you want a slice. In contrast, with regard to databases, the data are so far upstream that you do not really know the potential applications. If you put these scientific blockers that far upstream, the real damage to the scientific establishment is that we cannot know the damage, we cannot even deal with the potential damage, and we will never know the lost opportunity costs. It seems to me that universities have a much greater interest in preserving their common access upstream than they do downstream where they resemble other entrepreneurs, and I think they are uniquely placed to understand that.

So, I kind of agree with John. I think that a Pajaro Dunes agreement on database rules would stand up, in fact, and I think the punishments could be terrible if they didn't. That is, in universities and granting agencies, the direct and indirect punishments could be effective if you violated the rules or tried to hold out.

MR. RINDFLEISCH: I think universities are scared stiff of making mistakes in licensing. My group generated the software that was the basis of Cisco Systems, but Stanford neglected to take a 5 percent cut in Cisco Systems. If we had, I would have to scramble much less for research support today than I do, and the patent has just run out. People are making decisions in universities about things they don't understand any better than anyone else, and they are afraid of making agreements that end up being disadvantageous.

MR. EISGRAU: I am Adam Eisgrau with the American Library Association. My position is as legislative counsel, and for that reason I want to associate myself with what was said in respect to the legislation. This coming legislative debate is going to be a ferocious one, with specific proposals to be analyzed from every possible angle, and the players around this table, and at every other table you can think of to which the message can be exported, have to get involved.

The key stumbling block to making more progress than we did last year was the ability of the proponents of various kinds of legislation to say, rightly or wrongly, “Our business is going to be harmed, to the tune of x million dollars, unless we get this kind of protection immediately to fill the gap.” Congress wants to plug loopholes, and the claim that a loophole leaves people “deprived of legal protection” is a very sympathetic argument. So my specific request is to look at the details that Mr. Coble or anyone else comes up with and get involved and get related communities involved in assessing specifically how that legislation could have an adverse impact on your operations, commercial or noncommercial. That specific analysis and response with concrete examples is going to be the only thing that produces any type of balance in the ultimate legislative process.