PROCEEDINGS OF THE WORKSHOP ON PROMOTING ACCESS TO SCIENTIFIC AND TECHNICAL DATA FOR THE PUBLIC INTEREST: AN ASSESSMENT OF POLICY OPTIONS

Appendix C

RAW KNOWLEDGE: PROTECTING TECHNICAL DATABASES FOR SCIENCE AND INDUSTRY

Stephen M. Maurer, Attorney-at-Law

NOTE: The author wishes to thank the National Research Council for commissioning this study and to acknowledge helpful conversations with Suzanne Scotchmer, Jeannette Balko, D. Ben Borson, Jack Brown, Richard Firestone, Richard Gilbert, Karl Kenna, Elizabeth Powers, Jerry Reichman, Kenneth Rosenblatt, Pamela Samuelson, John Stattler, Tom Slezak, Paul Uhlir, and Joel White. The author is solely responsible for all opinions, errors, and omissions contained herein. This background paper was prepared by Stephen M. Maurer for the National Research Council's Committee on Promoting Access to Scientific and Technical Data for the Public Interest and its January 14-15, 1999, workshop on the same subject. Please note that a number of exhibits were prepared as attachments to this paper; these exhibits are available for viewing in the National Research Council's Public Access Records Office.
SUMMARY

As usually defined, “databases” include numerical data, text, images, and any other “organized collection of information.” Because enormous numbers of products fit this description, it is sometimes hard to think about such apparently straightforward questions as, “Is existing legal protection adequate?” or, “Could it be improved?” This paper tries to make matters more concrete by examining existing databases and how they are produced. The results are then used as a benchmark to evaluate potential legislation. Special attention is paid to features and problems that set scientific/technology databases apart from other products.

The world of scientific and technology databases is already extremely rich and well-developed. Since the U.S. government has never enacted database legislation, this presents a paradox: If existing databases can be freely copied, why do firms continue to invest in them? The answer is that database providers have devised a bewildering number of unofficial (“self-help”) methods for protecting their investments. These include but are not limited to (1) bilateral agreements with users, (2) “shrink-wrap” or “click-wrap” language, (3) bundling with copyrighted materials, (4) continual updating and improvement that leaves would-be copiers “out of date,” (5) search-only Web sites where the underlying database cannot be downloaded, and (6) passwords and encryption.

The fact that rich and diverse databases exist in today's world shows that such protection can be extremely robust. At the same time, self-help strategies may cause undesirable distortions in the economy, particularly when they discourage database suppliers from sharing products with a wider audience. Even more insidious, lack of statutory protection may mean that some databases are never created in the first place.
Scientific and technology databases present unique needs and problems. These include the need to assure private firms that they can profitably invest in commercializing and extending government databases for use by a broader audience; the need to keep database prices within the reach of academic users, who have traditionally driven most advances in basic knowledge; the scientific community's need for value-added or edited databases that not only collect but also update, cross-check, comment on, and try to reconcile reported results; the fact that virtually all scientific databases have historically been created by combining and extending earlier data sets; and the scientific community's need for full and unrestricted access to data, which inevitably conflicts with self-help strategies based on secrecy or partial disclosure.

The modern history of database reform begins with the U.S. Supreme Court's 1991 decision in Feist Publications, Inc. v. Rural Telephone Service Co., which restricted “sweat-of-the-brow” protection under copyright in the United States. This was followed by the European Union's (E.U.) 1996 Directive on Databases, which required member countries to expand their statutory protection of databases. The E.U. Directive also contained a controversial threat that citizens of countries (including the United States) that did not adopt E.U.-style statutes would not be protected by the new laws when they took effect. Because of the E.U. Directive, the U.S. Congress introduced European-style legislation in 1996 and again in 1997-1998. Scholars have also suggested alternatives to the European model.
Existing reform proposals can be broadly summarized as (1) de minimis changes to existing law, (2) “unfair competition” schemes that would examine the need for protection on a case-by-case basis, and (3) so-called sui generis protection that would give database owners strong property rights modeled on the E.U. Directive. The principal difficulty has been to reconcile these proposals with the public-domain principle that “mere facts” cannot be protected. Although this is an old problem, courts were frequently able to avoid it in the past because copyright and patent law protected only a small fraction of all possible commercial knowledge. Comprehensive database protection would turn the situation on its head by making virtually all facts protectable as “organized collections of information.”

In the final analysis, the policy debate for and against database protection cannot be settled by purely legal considerations. Instead, the underlying question is largely empirical. If free ridership turns out to be a problem for all databases, then some sort of additional protection should be enacted. But if free ridership is only “sometimes” or “never” a problem, reform should be much more cautious. The fact that such questions have so far received relatively little attention makes the committee's work especially timely and represents a valuable opportunity to advance debate in this area.

PART I. TODAY'S DATABASES

The concept of a “database” is usually defined quite broadly. For example, one typical formulation describes a database as “any organized collection of information,”1 even though the same phrase could just as easily describe intellectual property in general. The problem is that such definitions are too broad to provide a concrete sense of which databases actually exist in today's economy or why they should be protected.
Part I of this paper tries to make the concept of a database more concrete through examples, anecdotes, and case studies. By way of background, Examples 1, 2, and 3 describe some nontechnical databases that are available on CD-ROM, over the Internet, and in print. Examples 4 through 8 continue the discussion by describing an assortment of databases drawn from the physical sciences, biotechnology, and engineering. The final section ends by collecting and commenting on various lessons learned from these examples. The lessons provide a benchmark for evaluating proposed reforms later in this paper.

Some Commercial Databases

Example 1: A Sampler of CD-ROMs

As of January 1995, the authoritative Gale Directory of Databases listed 9,385 electronic databases for sale by commercial vendors. The list was further subdivided by format, including online and CD-ROM. Table C.1 analyzes a sample of 100 databases randomly selected from the catalog's CD-ROM listings.2
TABLE C.1 A CD-ROM Sampler

| Vendor/Type | Numerical Data and Directory | Software | Bibliography | Text, Image, and Multimedia |
| --- | --- | --- | --- | --- |
| Government provider | 6 | 0 | 1 | 0 |
| Commercial provider of public domain data | 0 | 1 | 6 | 8 |
| Commercial provider of public domain data enhanced with proprietary software or other features | 7 | 0 | 0 | 0 |
| Commercial provider of original data | 9 | 3 | 21 | 38 |

The fact that such a rich and diverse selection of databases has evolved without statutory protection is striking. At the same time, Table C.1 illustrates the fact that database suppliers use a variety of nonstatutory strategies to protect their products:

Copyrighted Content. One of the most surprising aspects of Table C.1 is that most products continued to follow traditional print-based models. For example, nearly half of the sample (46 percent) consisted of text, image, and multimedia, predominantly electronic versions of books, journals, and newspapers. Virtually all of these materials are individually protected by copyright whether or not they are included in a database.

“Free” Counterparts. Another way to look at text, image, and multimedia is that the cost of producing them electronically tends to be small once print-based counterparts already exist. This makes electronic databases extremely tempting to would-be providers.

Updating. Seventy-three percent of the products listed in the sample were regularly updated on a quarterly or annual basis. This practice makes it extremely difficult for would-be copiers to sell a current product.

Enhancements. Many CD-ROM databases were packaged with advanced (and presumably copyrighted) search software. For example, many of the numerical data and directory products combined public-domain data with advanced software for making customer lists and address labels or performing searches.
Since the copyright laws protect software, the presence of such enhancements forces would-be copiers to choose between selling a visibly inferior product and making investments of their own.

Reprints. Finally, some large providers were able to sell “reprints” of government databases despite the fact that this information is freely copyable. The tactic appears to work because large providers have advantages of scale when it comes to finding widely scattered consumers in a “thin” market. By contrast, would-be copiers tend to be too small to locate these same consumers for themselves.3

It may be significant that the Gale Directory showed no obvious difference between CD-ROM products and those available online. As explained below, the Web offers several distinct technical advantages for self-help security. The fact that providers chose to forgo these advantages shows that security can be accomplished in various ways.
Example 2: An Internet Sampler

Table C.2 summarizes 100 Web sites obtained by searching for the word “databases” on the Infoseek search engine. Table C.2 significantly expands the list of self-help strategies found in Table C.1:

Passwords and Two-tier Access. Perhaps the most traditional way to protect databases is to use passwords. Many Web sites provide free samples of password-protected data.

Search-Only Web Sites. The most common form of self-help found in the sample was for users to submit requests to the vendor, who would then perform searches on their behalf. This provides stronger protection against piracy than passwords.4

Clearinghouses. Some databases earn income by selling listings instead of charging user fees. The classic example is a job agency, in which employers pay for ads that are then distributed to the public without charge.

Product Ties and Come-ons. Many Web databases are offered free as an inducement to purchase related products. In such cases, the producer provides data without charge in order to promote his core business.5

TABLE C.2 An Internet Sampler

| Provider Type | Bulletin Board in Which Individual Needs Are Posted in a Single Place (e.g., Job Listings) | Compilations of Two or More Public Domain Databases | Original Data | Directory and Network Data that Identify Community Members to Each Other and/or the Public |
| --- | --- | --- | --- | --- |
| Enthusiast | 0 | 6 | 0 | 0 |
| Government and education | 2 | 40 | 3 | 4 |
| Commercial Provider: Access provided without charge | 6 | 0 | 7 | 3 |
| Commercial Provider: Portions of data restricted to users who have purchased passwords | 0 | 0 | 8 | 0 |
| Commercial Provider: Search only | 0 | 0 | 18 | 3 |

Finally, Table C.2 is a stark reminder that not all database providers want protection. This is trivially true for the enthusiast, government, and education providers whose missions are heavily slanted toward dissemination.
A more subtle point is that the phenomenon also exists in the commercial field, where databases are frequently used as “market makers” to bring buyers and sellers together. It is an open question whether or not such players welcome copying (particularly when they receive attribution) as a way of reducing their own publication costs and/or reaching even larger audiences.6
Example 3: Other Types of Databases

Example 3(a): Dataquest. Even though all of the providers found in Table C.2 offered more or less standardized products, this is not the only business model available on the Web. For example, a consulting group known as Dataquest offers two types of proprietary information: (1) a library of 25,000 confidential reports that can be searched and downloaded over the Web at a cost of between $100 and $5,000 per item, and (2) custom research at a negotiated price. All of these products are subject to elaborate contractual safeguards governing each side's use and disclosure of the reports. Dataquest also sells “alert” services that notify users of developments in predefined areas of interest.7

Example 3(b): Info-Trac. One of the largest (and most useful) databases found in the course of preparing this report was a citation index called Info-Trac. Info-Trac is available both online and as a CD-ROM. Although Info-Trac is available to users (e.g., libraries) free or at nominal cost, it charges a substantial fee for copying hard-to-find articles.8 This is yet another example of using an essentially free database to market the seller's principal product.

Example 3(c): Paper-based Databases. The fact that Feist involved telephone books shows that paper-based databases are still important. Virtually all of the text and bibliographic products listed in Table C.1 have print-based counterparts.

Some Scientific Databases

Example 4: Some Electronic Database Samplers

Most of the examples listed below describe the creation, evolution, and/or capabilities of individual databases. The present section tries to set the stage by presenting broader, more impressionistic samplers of scientific and engineering databases offered over the Web or in libraries.
Because the samplers show considerable overlap, they are discussed together at the end of this section.

Example 4(a): Physics. Table C.3 extends the previous discussion to the sciences by summarizing online and CD-ROM databases offered by the University of California (UC) at Berkeley Physics Library and by the results of a request to the Yahoo physics search engine for the word “database.”9 Because of their greater volume, the UC Berkeley and Yahoo resources for engineering are listed separately.

Example 4(b): Engineering (Library Resources). The UC Berkeley Engineering Library resources are given in Table C.4.

Example 4(c): Engineering (Internet Databases). Table C.5 summarizes the 71 relevant hits generated by polling the Yahoo engineering search engine for the word “database.”
TABLE C.3 A Physics Sampler

| Resource | Number | Comment |
| --- | --- | --- |
| Online versions of print journals | 55 | Includes physics-related journals published by American Physical Society, Reed Elsevier, American Chemical Society, and American Astronomical Society. |
| Electronic preprint servers | 8 | Includes servers maintained by government laboratories and professional societies. The American Institute of Physics also offers its own e-journal. |
| Electronic abstracting and indexing databases | 2 | Consists of INSPEC database of 4,000 journals plus selected conferences, reports, dissertations, and books, and Web of Science database of 3,300 scientific and technical journals. |
| Other electronic resources | 12 | Includes large atomic, particle, and thermodynamic databases prepared by national labs and universities. |

TABLE C.4 An Engineering Sampler (UC Berkeley Resources)

| Resource | Number | Comment |
| --- | --- | --- |
| Online versions of print journals | 60 | Includes journals published by professional societies (ACM, ACS, IEEE, Society of Industrial and Applied Mathematics) and private publishers (Academic Press, Elsevier, Springer, Wiley). |
| Electronic abstracting and indexing databases | 16 | Includes private, DOE, EPA, and National Technical Information Service publications. |
| Technical report databases (includes both indexed and full text) | 10 | Includes government sites and the Yahoo physics search engine. |

TABLE C.5 A Second Engineering Sampler (Internet Resources)

| Vendor/Type | Full Text | Original Data | Directory and Network Data that Identify Community Members to Each Other and/or the Public |
| --- | --- | --- | --- |
| Enthusiast | 0 | 4 | 0 |
| Government and education | 0 | 18 | 3 |
| Commercial provider/search only | 0 | 3 | 0 |
| Commercial provider/portions of data restricted to users who have purchased passwords | 0 | 10 | 0 |
| Commercial “public service” provider | 2 | 3 | 12 |
| Commercial database limited to provider's own products | 0 | 8 | 0 |
| Commercial database offered at no charge to sell enhanced or CD-ROM versions of the same data and/or related products | 0 | 4 | 0 |
| Commercial database paid for by advertising and/or selling right to post items on a public bulletin board | 0 | 4 | 0 |
Tables C.3, C.4, and C.5 are strikingly similar to the broader electronic databases discussed in the section on commercial databases above. In particular, they show:

Richness. The sheer number and diversity of available databases is astonishing.

Diverse Suppliers. The products listed in Tables C.3, C.4, and C.5 are produced by government laboratories, private institutes, and commercial ventures. Researchers appear to use these sources interchangeably.

Online Versions of Print Media. Academic journals and societies have rushed to make online versions of their journals available. This is exemplified by the fact that the American Physical Society, American Mathematical Society, American Chemical Society, and American Astronomical Society currently place all of their journals online. Most of the index and bibliographic products listed above are also extensions of preexisting print-based counterparts.10

Self-help. Private publishers universally rely on passwords and/or contractual restrictions to limit access to, and republication of, their products.11

Electronic Options. Despite the greater technical difficulty of protecting CD-ROMs, they continue to be well represented in the sample.

Example 5: A Large Nuclear Science Database12

Since the late 1940s, the nuclear science community has struggled to reduce an exploding literature to a more manageable data set. Despite declining manpower and budgets, the Department of Energy (DOE) continues to spend approximately $4 million per year to maintain, update, edit, and disseminate nuclear science databases.13 Approximately $800,000 of this is spent to support a group at Lawrence Berkeley Laboratory (LBL) whose principal product is the Table of Isotopes. The product includes over 160,000 published references and approximately 1.5 gigabytes of data.
Historically, nuclear database creators have never started from scratch. For example, the Table of Isotopes can trace its lineage to roughly half a dozen nuclear databases, many of which still exist. The LBL group has made extensive efforts to improve and extend these sources by adding new data, checking reported calculations, comparing different experiments to arrive at best values, and deducing additional data not calculated by the original authors. The Table of Isotopes is currently 5 years behind the literature, on average. Approximately one-half of the group's budget goes to improving its database so that it can support more advanced, relational searches; the balance is spent on disseminating the product over the Web and/or rearranging the data into new tables aimed at medicine and other non-traditional users.14 DOE has not asserted any proprietary interest over the database. The LBL group is not worried about copying, provided that proper attribution is given. In addition to its public domain/Web-based version, the Table of Isotopes is also available as a commercial book and a CD-ROM. To protect against copying, the publisher has insisted on the following self-help provisions:
Updates. The group must supply new material annually, although the content of updates is discretionary. In practice, the group has concentrated on developing new tables aimed at nontraditional (and potentially lucrative) users in fields such as medicine.

Additional Graphics. The group must prepare copyrighted graphics that are, at least initially, superior to those found on DOE's Web site. This is an important selling point for commercial buyers who use the CD-ROM to prepare graphics for talks and presentations. The graphics material also adds copyrighted content to an otherwise public product.

Additional Software. The group must prepare additional software. This provides an additional selling point and copyrighted content not found at DOE's Web site.15

Although these enhancements are useful, the LBL group probably would have invested its resources differently if left to its own devices. In particular, it would have devoted more effort to updating and improving the underlying (but unprotected) database itself. This is a concrete example of how reliance on self-help solutions can distort investments by comparison with a hypothetical world in which all forms of intellectual property were identically protected by statute. At the same time, the Berkeley group does not seem to view self-help as a significant bottleneck to new commercial projects.

Example 6: Elsevier Science16

Elsevier Science publishes (1) nearly 1,200 English-language scientific journals, (2) a variety of highly specialized reference works, (3) various bibliographies, abstracts, and reviews, and (4) paper and electronic versions of the world's “most comprehensive interdisciplinary engineering database.”17 Virtually all of these materials are available both online and as CD-ROMs. Elsevier Science's search software permits users to search multiple journals at once.
Although old print journals never had enough space to include full data sets, the advent of online journals has effectively removed this constraint. As a result, Elsevier Science now requires authors to submit underlying data sets so that they can be linked to online journals. Elsevier Science routinely asks authors for the copyright to their work (including any underlying data) but will usually agree to accept a license instead. According to the company, there is currently no other way to manage reprint and reuse requests. The company does not ask for patent or exclusive database rights.18 Elsevier says that its nonscientific divisions have sometimes decided not to invest in new databases because of protection concerns. So far, however, this has not happened to any of Elsevier Science's science projects. At most, database protection has been one issue among many. To date, Elsevier Science has collected only a “tiny” number of databases and has little experience with database issues. In line with its current reprint policy, the company would probably not assert its copyright against authors who tried to make commercial products from their own previously submitted databases, but probably would demand reasonable reprint fees from third parties who wanted to republish the data for commercial gain. The company has given little, if any, thought to compiling its own commercial products from authors' data sets. In theory, Elsevier Science could assert its rights more aggressively in the future. Under this scenario, the company's large number of journals might then be leveraged into a corresponding
dominance over databases.19 So far, however, there is little indication that Elsevier Science's disparate databases will ever be combined into a useful, much less dominant, commercial product.

Example 7: Biotechnology20

Bioinformatics. Finding commercially interesting genes is essentially a race to find subtle patterns in an enormous body of experimental data. (The task is often compared to that of prospectors looking for hints of gold in an otherwise featureless landscape.) The principal raw data needed to conduct academic and commercial biotech research are currently maintained in over 200 public sector databases scattered throughout the world. Virtually all of these Web sites are narrowly focused on the owner's research agenda. As a result, the system is often fragmented and redundant. From a computing perspective, many of the sites tend to be amateurish, underfunded, and unstandardized.21 This creates recurring difficulties for corporate users.22

The intersection between computer science and biology is known as bioinformatics. Next-generation bioinformatics systems will be designed to (1) convert diverse databases to a format that users can read, (2) search simultaneously the Web's 200+ sites as if they were a single database, (3) enhance existing text-based databases with relational links to make them more amenable to sophisticated searches, and (4) create software search tools that are not only powerful but also flexible enough to let researchers study the data in unanticipated ways.

GenBank. The best-known and most important public database is a National Institutes of Health (NIH) Web site called GenBank. GenBank is one of three official locations where researchers can deposit information about the precise order of base pairs found in human DNA.
The current Release 110.0 of GenBank contains over 3 million sequence records and includes more than 2 billion base pairs. More than 100,000 sequences from individual laboratories and high-throughput sequencing centers are added each month. Since it was founded in 1982, GenBank's size has doubled every 14 months.

Because of funding constraints, GenBank's capabilities are limited. For example, search tools can perform full text searches only for written words. This is extremely unwieldy for most biology applications. In addition, editing and comments are limited to author annotations. No effort is made to comment on related journal articles or to identify or resolve conflicts between data submitted by different researchers.23 Finally, updating comments and sequences is virtually impossible.

These problems are not unique to GenBank. In recent years, several not-for-profit biotechnology databases have either closed or been threatened with closure. Commentators have complained that the community may have to get by with inadequate updating, editing, and annotations.24

Private Database Vendors. Beginning in the early 1990s, several firms began to offer private versions of a few databases to elite users willing to pay multimillion-dollar license fees.25 Initially, these biotechnology databases were attractive because they included large amounts of secret (i.e., proprietary) data, and they offered advanced bioinformatic search tools. Because public discovery was booming, the former advantage was short-lived. This has driven some firms to shift their emphasis to “the sale of new databases, software packages, and perhaps consulting.”26
One early leader in the field, Human Genome Sciences, started off by selling its proprietary database to a single research partner as part of a $125 million deal.27 More recently, Human Genome Sciences has broadened its relationships and now plans to offer market-friendly software packages ranging from “simple, low-end packages for impoverished [academics to] tailor-made luxury items for drug companies.” Raw data will be provided free of charge.28

Human Genome Sciences' principal rival, Incyte, originally charged licensees $15 million to $20 million for access to its proprietary databases over a three-year period.29 Approximately 50 companies currently subscribe. Like Human Genome Sciences, Incyte focuses on software and database enhancements. According to Incyte's chief financial officer:

“There's a huge information-based business growing from the pharmaceutical industry. . . . This is not a small market segment that's going to be serviced by half a dozen companies. This is going to be a fairly large segment of service for a lot of companies, for everything from software and hardware companies to more biologically oriented companies and consulting firms that do systems integration or go in and design something specifically for a big drug firm.”30

Incyte subscribers can currently buy the company's advanced LifeSeq relational databases with or without proprietary data. However, even nonproprietary databases have been cleaned and standardized to support Incyte's advanced search software.31 Incyte also develops custom databases for individual clients; these are typically resold to other companies after an initial period of exclusivity.

Another private company, Celera Genomics, recently joined the ranks of Human Genome Sciences and Incyte.
Celera's proposed human genome database will reportedly include extensive proprietary human genome data and a “value-added software and informatics system.” Celera has not asked to share in the profits from any discoveries. Instead, it will offer its databases to users on a straight fee-for-service basis. Very large users will be able to purchase dedicated systems.32

Human Genome Sciences, Incyte, and Celera have many smaller rivals. These firms rarely sell proprietary information at all. Instead they concentrate on helping clients to manage their existing data in new and better ways.

Examples 8(a)-8(h): Anecdotes and Profiles

The following examples are drawn from earlier descriptions of databases found in the literature.33

Example 8(a): POISINDEX. This CD-ROM product links approximately 750,000 poisons to 775 management and treatment protocols. Approximately 200 clinicians from 20 countries participate in editing and selection. POISINDEX also hires computer scientists to maintain its database and create search software. It is updated quarterly and sold by subscription.

Example 8(b): MDL Drug Data Report. This Reed Elsevier CD-ROM database contains molecular structure and biology information for approximately 85,000 potential drug candidates. The data (1) are updated on a monthly basis from published reports, patent applications, and scientific papers, (2) allow users to track clinical trials, (3) come with ISIS software that allows
Significant adverse impacts could include reduced real government funding levels and damage to the existing culture of science (Issue 6). Future legislation should move cautiously in this area.

Option 5: Sui Generis Protection with a Defense for Improved Databases

Since advocates of extended database protection usually base their arguments on free ridership, it might make sense to exempt copiers who are willing to incur substantial costs. This view fits naturally with the existing world, in which databases are typically created by combining, improving, and extending earlier products. The principal drawback of such a defense is that “substantial improvement” is hard to define and would almost certainly require judicial elaboration. The defense would presumably be available to any copier who invested in improvements, updates, and/or extensions at levels comparable to those of the original owner. Short of this, there is no obvious way to determine how substantial the copier's improvements would have to be. The concept would probably require judicial elaboration over time.

Option 6: Shrink-wrap Contract Reforms

Everyday experience suggests that the lawyers who write shrink-wrap and click-wrap contracts will continue to claim as many rights as possible—even when those rights happen to exceed the normal scope of copyright. The only real question, therefore, is what the courts will enforce. The draft UCC provisions discussed above provide little guidance. The unpredictability and uncertainty of asking the courts to evolve common law solutions to the database problem were discussed under Option 1 above. However, common law unfair competition is at least based on free ridership and other relevant concepts.
In contrast, the shrink-wrap doctrine tends to be more concerned with contract law concepts like “offer,” “consent,” and “unconscionability.” Since these concepts have little or nothing to do with free ridership, reliance on the shrink-wrap doctrine is likely to divert attention from the public policy issues most relevant to databases.

Option 7: Administrative Solutions

In their preferred (second) solution, Reichman and Samuelson argue that all databases should be protected by automatic licensing according to a predetermined fee schedule.141 Although they recognize that automatic licensing schemes have met with mixed reviews in the past, Reichman and Samuelson believe that these criticisms could be ameliorated by (1) using an industry-based “collection society” to set baseline license fees and (2) allowing would-be licensees to opt out of the baseline by negotiating fee schedules directly with the database's owner.142 Reichman and Samuelson are right to point out that the collection-society concept has a history of mixed reviews. Potential problems include:
Need for Market-based Solutions. Economists have traditionally justified intellectual property because it creates a mechanism for turning private knowledge of research and development opportunities into socially optimal levels of investment. Reichman and Samuelson's proposal would replace this market mechanism with a collection society's judgment of what fees should be. For a particular database, the regulated price would either be lower than that required to cover costs (thereby jeopardizing investment) or higher (thereby deterring use).

Transaction Costs. Allowing participants to contract around the collection society may reduce transaction costs but will not eliminate them. (This is true for the same reason that allowing litigants to settle lawsuits has not put the court system out of business.)

Antitrust Concerns. Reichman and Samuelson correctly note that their proposal could be enacted only after removing “any antitrust barrier that stands in the way . . . .”143 However, the dangers of collusion should not be minimized. Giving an industry-based collection society the power to set database prices would create a political lightning rod. If suppliers (consumers) eventually became dominant, the temptation to impose monopoly (monopsony) solutions could become irresistible.

Given these concerns, Reichman and Samuelson's proposal should be viewed with caution absent strong evidence that the existence of niche markets has created a natural monopoly requiring regulation. Even then, the issue of whether license fees should be set by an industry-based collection society remains an open question.

CONCLUSION

The principal argument for statutory protection is that firms do not create enough databases because doing so would require a large up-front cost that is not currently protected.
However, this paper has found little evidence that lack of statutory protection has prevented the creation of new products. The NRC study committee should ask witnesses for concrete examples where this has happened. It should also ask whether the assumption of large up-front costs is realistic. Most of the database industry's products may instead consist of updates and improvements whose cost can be recouped within a year or so.

This paper has found evidence that self-help can cause distortions. From the vendor's perspective, these include overinvestment in updates, graphics, software, and other enhancements at the expense of the databases themselves. From the consumer's perspective, self-help can unnecessarily restrict access to data. The NRC study committee will have to decide how serious such distortions are and whether they constitute an adequate case for reform.

From a legal standpoint, the committee should remember that virtually all commercially valuable data can be described as “compilations of information” and hence “a database.” So-called sui generis protection is therefore unlikely to stay confined to a particular type of information for very long. Sooner or later, most commercially valuable information will probably end up receiving database protection. This may or may not be a sensible result, but that is the choice.

Finally, the benefits of reform must be weighed against its likely costs. Potential problems include, but are not limited to, deterring the creation of new databases from earlier
products, creating monopoly power within niche markets, making databases unaffordable by the same university researchers whose work typically advances knowledge in the first place, and damaging the culture of science through inappropriate privatizing and hoarding of information. Throughout this century, most arguments for and against database protection have proceeded from relatively simple assumptions about why databases are created and how they are sold. This report shows that the reality is much more subtle. The January 14-15, 1999, NRC Workshop on Promoting Access to Scientific and Technical Data for the Public Interest represents a unique opportunity to deepen and extend this understanding.
EXHIBITS*

1. The “Gale 100” List
2. Sample pages from various Web sites, including those of the University of California at Berkeley and Yahoo Web pages, that were used to compile Table C.3 and Table C.4
3. Notes from the November 10, 1998 interview with Richard B. Firestone, Lawrence Berkeley National Laboratory
4. Reprint of Appendix C of the 1997 National Research Council report, Bits of Power: Issues in Global Access to Scientific Data. National Academy Press, Washington, D.C.
5. Notes from the November 25, 1998 interview with Karen Hunter, Elsevier Science; notes from November 10, 1998 interview with Richard B. Firestone, Lawrence Berkeley National Laboratory
6. Notes from the November 10, 1998 interview with Thomas R. Slezak, Lawrence Livermore National Laboratory
7. Reprint of Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, Official Journal of the European Community, No. L 77/20, 3/96
8. World Intellectual Property Organization Basic Proposal for the Substantive Provisions of the Treaty of Intellectual Property in Respect to Databases to be Considered by the Diplomatic Conference, CRNR/DC/6, August 30, 1996, available on U.S. Copyright Office Web site at <http://lcweb.loc.gov/copyright/wipo/wipo6.html>
9. U.S. Congress, H.R. 2652, Collections of Information Antipiracy Act
10. Uniform Commercial Code Article 2B-110 (August 1998 draft)

* Please note that these exhibits, which were prepared as attachments to this paper, are available for viewing in the National Research Council's Public Access Records Office.
NOTES
1 Laura D'Andrea Tyson and Edward Sherry, Statutory Protection for Databases: Economic and Public Policy Issues (1997) (report commissioned by the Information Industry Association). The breadth of this definition is intentional. Indeed, the European Union's Directive on Databases expressly extends to “literary, artistic, musical or other collections of works or collections of other material such as texts, sounds, images, numbers, facts, and data [as well as] collections of independent works, data or other materials which are systematically or methodically arranged and can be individually accessed.” E.U. Directive at ¶ 17. In fairness, the E.U. Directive does include an ad hoc exclusion for “audiovisual, cinematographic, literary, or musical work as such.” Id. 2 The Gale Directory of Databases describes itself as “easily . . . the most complete guide to the electronic database industry worldwide.” Kathleen Lopez Nolan (ed.), Gale Directory of Databases (New York and London, 1995) at p. vi. Entry-by-entry description of the sample can be found in Exhibit 1. A particularly useful feature of the Gale Directory is Professor Martha E. Williams' annual profiles of the industry. 3 The best example of this is a company called Silver Platter. Silver Platter's nuclear databases are discussed in my interview with Richard Firestone of Lawrence Berkeley National Laboratory (see Exhibit 3). 4 Tom Slezak, a computer scientist at Lawrence Livermore National Laboratory, confirmed that these methods conferred “reasonable and prudent security” when I interviewed him on November 20, 1998. A memorandum summarizing Slezak's comments can be found in Exhibit 6 of this paper.
5 Perhaps the best example of this in the sample was an online video store that allowed users to search a massive database of over 125,000 movies, many of which were not even available commercially. 6 See also J.H. Reichman and Pamela Samuelson, “Intellectual Property Rights in Data?” Vanderbilt Law Review, Vol. 50, p. 51 (January 1997) at p. 67 (“To the extent that government generated or university generated data remain noncommercialized, their vulnerability to technically refined means of [copying] may be of relatively little importance. Presumably, the originators want the broadest possible distribution of their data sets.”) 7 Joel S. White (personal communication). 8 Joel S. White (personal communication). Info-Trac copies the articles at Bay Area libraries. 9 The UC Berkeley and Yahoo Web pages used to compile Table C.3 and Table C.4 can be found in Exhibit 2. Interested readers may want to acquire a feel for existing databases by skimming through these listings. 10 By way of example, the Berkeley Physics Department Web site reports that Inspec, MathSciNet, and Chemical Abstracts all existed on paper before their current electronic incarnations. Inspec is more than 100 years old. 11 For example, the UC Berkeley Engineering Library's Web site lists 47 of its 60 Web sites as “UC only” or “UCB only.” Publisher Web sites were similarly restricted, although four offered their products on a trial access basis. 12 This section is taken from a five-hour interview between the author and Dr. Richard Firestone, the head of LBL's Table of Isotopes project. Curious readers will find full details in a memorandum attached as Exhibit 3. An earlier workshop studied Brookhaven's related but distinct ENSDF database. See, National Research Council. 1997. Bits of Power: Issues in Global Access to Scientific Data, National Academy Press, Washington, D.C., at Appendix C. A copy of Appendix C is reproduced as Exhibit 4 to this paper.
13 This figure does not include the actual work of reviewing articles, which is done on a volunteer basis throughout the world. 14 The Brookhaven National Laboratory followed a similar path with respect to its related Evaluated Nuclear Structure Data File (ENSDF) database. Like Berkeley, Brookhaven has devoted extensive effort to editing ENSDF's data and improving ENSDF so that it can support advanced relational search engines. Brookhaven has also created a new version of ENSDF for use by medical workers. Finally, it is working to improve dissemination by upgrading its Web site and making the same data available on floppy disk and CD-ROM. See National Research Council. 1997. Bits of Power: Issues in Global Access to Scientific Data, National Academy Press, Washington, D.C., pp. 205-206. 15 Surprisingly, the private CD-ROM/book package competes successfully with—and indeed seems to benefit from—its Web-based counterpart. In addition to the relatively minor enhancements required by the publisher, there seem to be intrinsic reasons for this. For example, books are often easier to use; searches conducted over the Web are not confidential; and CD-ROMs are permanent, whereas data on the Web can potentially change or disappear without warning.
16 This section is taken from a brief interview with Karen Hunter, who handles copyright issues and strategic planning for Elsevier's scientific journals and databases. Curious readers will find full details in a memorandum attached as Exhibit 5, which also contains additional information reproduced from Reed Elsevier's Web site. 17 Reed Elsevier also publishes many nonscientific databases, including The Official Airline Guide. 18 Elsevier Science points out that these policies are broadly similar to those of many other journal publishers, including Academic Press, the American Chemical Society, the American Institute of Physics, and the American Geophysical Union. 19 J.H. Reichman (personal communication). 20 This section is taken from a four-hour interview between the author and Thomas R. Slezak, head of bioinformatics for Lawrence Livermore National Laboratory's human genome sequencing group. Full details can be found in a memorandum attached as Exhibit 6. A supplementary discussion of genome databases can be found in Appendix C to the NRC's Bits of Power report and is reproduced here as Exhibit 4. 21 In 1997, Science described the bottleneck this way: Because the world's major biological databases are constructed differently, it is virtually impossible to devise search programs to tap into them all effectively. A user has to hop from one to the other using each database's search engine to retrieve information that comes in a variety of different formats. The article also described how a “group of leading pharmaceutical companies” was putting its “considerable weight behind the development of common standards.” Nigel Williams, “Drug Firms Back Move to Link Databases,” Science, Aug. 15, 1997.
22 Because private biotechnology companies believe that submitting searches over the Web compromises security, each maintains internal copies of the 200+ public databases needed to conduct research and uses in-house software engineers to update them nightly. Since many online databases tend to change computing conventions abruptly, systems often crash without warning. These crashes cause recurring panics within corporate management information systems departments. 23 GenBank started out as a traditional database that tried to comment on and add value to journal articles. GenBank converted to its current format because it could no longer keep up with the volume of data. 24 See, e.g., Nigel Williams, “Unique Protein Database Imperiled,” Science, May 17, 1996 (international reaction to threatened closure of Swiss-Prot database); Howard M. Cann, “After the Genome Database,” Science, March 13, 1998 (user comment on closure of GDB database). Dr. Cann's letter is particularly illuminating for its discussion of the current system and how it might be fixed: In the post-GDB-project world, the user may have to click more often to find mapping information [at other Web sites] and perform interpretation and editing personally. Problems that might be expected in the absence of GDB coordination include recognizing duplicates of new markers and conflicting map locations from different resources. Perhaps the community will get by with the available final copy of the GDB and with database “shopping” on the Internet. If not, the international community may have to pull together to arrive at a solution. For instance, database host institutions could form a consortium for the purpose of reviewing new data and maps in a coordinated fashion before release to the public.
External expert reviewers might volunteer efforts (similar to those of the “editor” group of scientists that now review and edit GDB data) within the framework of such a consortium, injecting further assurances of quality and coordination. This type of program or something with a similar intent could be provided at a minimal cost increase and would continue to support the efforts of many scientists involved in mapping and eventually identifying genes underlying complex disorders. 25 These products were concentrated in particularly lucrative areas such as expressed sequence tags or, more recently, gene sequences. 26 “Incyte Serves Up Information, Part I,” In Vivo, May 1996. 27 See, e.g., Jon Cohen, “The Genomics Gamble,” Science, Feb. 7, 1997. The database user exercised its right to terminate the partnership in late 1996. 28 “Genetic Warfare,” The Economist, May 16, 1998.
29 The fee amounted to roughly 1 percent of a typical large pharmaceutical company's research and development costs. 30 “Incyte Serves Up Information, Part II,” In Vivo, May 1996. 31 Incyte's bioinformatics capabilities are summarized on its Web site. The interested reader can find selected Web pages included as part of Exhibit 6 to this report. 32 “Perkin-Elmer's Pharmacogenomics Spin-Off Creating a New Customer for Instrumentation,” Bioventure View, June 1, 1998. 33 Except as noted, all information reported in Examples 8(a) through (f) is taken from Tyson and Sherry, Statutory Protection for Databases: Economic and Public Policy Issues, at pp. 3-6. Supplemental research was taken from the Web and is collected at Exhibit 2. Examples 8(g) through (h) are based on descriptions found in Appendix C to Bits of Power at pp. 209-210 (materials science), pp. 210-212 (chemistry), pp. 214-216 (geophysics), and pp. 217-218 (meteorology). A copy of Appendix C is attached as Exhibit 4 to this paper. 34 Interview with Richard Firestone (Exhibit 3). 35 Interview with Karen Hunter (Exhibit 5). 36 See Examples 1 (commercial CD-ROMs), 2 (Web sampler), 4(a) (full-text physics journals), 4(b) (full-text engineering journals), 5 (copyrighted nuclear science graphics), and 6 (Elsevier Science full-text journals). 37 See Examples 1 (commercial CD-ROMs), 2 (Web sampler), 5 (nuclear databases), 6 (Elsevier Science search engine), 7 (biotechnology databases), 8(a) (POISINDEX software), and 8(b) (MDL Drug Database software). 38 See Examples 3(a) (Dataquest semicustom reports) and 7 (semicustom databases in biotechnology). 39 See, e.g., ProCD, Inc. v. Zeidenberg, 86 F.3d 1447 (7th Cir. 1996) (“contracts about trade secrets may be enforced”).
40 See Examples 1 (CD-ROM sampler), 2 (Internet sampler), 4(a) (physics CD-ROMs), 4(b) (engineering CD-ROMs), and 4(c) (engineering Web sites). 41 See Examples 1 (Internet sampler) and 8(f) (materials science database). 42 See Examples 1 (Internet sampler), 4(a) (physics journals), and 4(b) (engineering resources). 43 Although technologically less secure, CD-ROM makers often use the parallel strategy of encryption to block access to their databases. This type of self-help recently received a legal boost when the U.S. Congress enacted P.L. 105-304 (the Digital Millennium Copyright Act). The statute establishes criminal fines and penalties for anyone who tries to defeat an electronic encryption system. 44 If anything, the statistic errs on the side of conservatism since it ignores products that advertise irregular updates. 45 See Examples 3(b) (Info-Trac), 4(a)-(c) (journals, indexes, and bibliographies), 5 (nuclear science), 6 (Elsevier Science), and 7 (biotechnology). 46 See Examples 5 (nuclear physics), 7 (biotechnology), 8(a) (POISINDEX), and 8(e) (animal husbandry). 47 See Example 5 (updating of nuclear physics databases to reflect improved data), 7 (biotechnology), and 8(f) (updating of materials science databases to reflect improved data). 48 This is also a popular strategy for Web-based businesses. See Example 1. 49 It might be argued that there is no reason to enact legislation that encourages cost-free databases because such spinoffs will exist whether or not they are protected. This argument ignores the role of price signals in achieving economic efficiency. If industry members are not allowed to recapture the value of spin-offs, the underlying product will be more expensive (and less used) than it should be. Suzanne Scotchmer, “Standing on the Shoulders of Giants: Cumulative Research and the Patent Law,” Journal of Economic Perspectives (Winter 1991), pp. 29-41. 50 International News Service v. Associated Press, 248 US 215 (1918). 51 Id. at p. 236. 52 Id. 
at p. 241. 53 Jack E. Brown, “Obscenity, Anonymity, and Database Protection: Emerging Internet Issues,” The Computer Lawyer, 1997 (citations omitted). In 1942, a federal judge argued that INS would have been decided differently if it had been heard in that year. Id. at fn. 78. 54 One confusing aspect of Feist is that many commentators who disagree with the Court's reasoning nevertheless support its final ruling. For example, Tyson and Sherry argue that telephone book data should not be protected because they are generated “with no additional effort” in the course of operating a publicly sanctioned monopoly. Laura D'Andrea Tyson and Edward Sherry, Statutory Protection for Databases: Economic and Public Policy Issues (1997) (report commissioned by the Information Industry Association). In narrowly legal terms, the same result could also be reached by arguing that firms that exercise “monopoly power” in one market should not use it to obtain an “unfair” cost advantage elsewhere.
55 Feist Publications, Inc. v. Rural Telephone Service Co., Inc., 499 US 340 (1991). 56 Id. at pp. 342-345. 57 Id. at p. 344. 58 Id. at p. 348. 59 Id. at pp. 352-353. 60 Id. at p. 354. 61 Warren Publishing, Inc. v. Microdos Data Corp., 52 F.3d 950 (11th Cir. 1995). 62 Key Publications, Inc. v. Chinatown Today Publishing Enterprises, Inc., 945 F.2d 509, 514 (2d Cir. 1991). 63 BellSouth Advertising & Publishing Corp. v. Donnelly Information Publishing, Inc., 999 F.2d 1436, 1441 (11th Cir. 1993). 64 Id. at p. 1441. 65 Mason v. Montgomery Data, Inc., 967 F.2d 135, 139 (5th Cir. 1992). 66 Nester's Map & Guide Corp. v. Hagstrom Map Co., 796 F. Supp. 729, 733-34 (E.D.N.Y. 1992). 67 CCC Information Services, Inc. v. MacLean Hunter Market Reports, Inc., 44 F.3d 61, 67 (2d Cir. 1994). According to CCC, an author's “loose judgment” that “vast regions” of the United States could be treated as a single market was also protectable. 68 Warren Publishing, supra, at pp. 951-52. 69 Skinder-Strauss Associates v. Massachusetts Continuing Legal Ed., Inc., 914 F.Supp. 665, 675 (D. Mass. 1995). 70 Cable News Network, Inc. v. Video Monitoring Services of America, Inc., 940 F.2d 1471 (11th Cir. 1991) at 1485. The opinion was subsequently vacated on other grounds and is cited here as an indication of what future courts might decide if faced with the same question. Cable News Network, Inc. v. Video Monitoring Services of America, Inc., 949 F.2d 378 (1991). 71 National Basketball Assn. v. Motorola, Inc., 105 F.3d 841 (2d Cir. 1997). 72 In theory, database owners could argue that database updates are time-sensitive in an economic sense and should therefore be protected. This would require a semantic stretch beyond anything in NBA itself. 73 The obvious counterargument is that many scientific databases follow conventions that leave little room for creativity.
For example, spectra almost always show frequency on one axis and amplitude on the other. A better argument might be that the experimenter's choice of which data to present still reflects creative choices. Even this argument might not be enough for human genome sequencing or other areas of routinized inquiry. 74 See Interview with Karen Hunter (Exhibit 5). 75 The E.U. Directive on Databases suggested that existing databases could even be “rearranged electronically . . . to produce a database of identical content which, however, does not infringe any copyright in the arrangement of [the] database.” Directive at ¶ 38. 76 Sinai v. California Bureau of Automotive Repair, 25 USPQ 2d 1809, 1811 (N.D. Cal. 1992). 77 ProCD, Inc., supra. One noteworthy aspect of the ProCD decision was the court's statement that it would “refrain from adopting a rule that anything with the label ‘contract' is necessarily outside the preemption clause.” Id. at p. 1455. 78 Vault Corp. v. Quaid Software Ltd., 847 F.2d 255 (5th Cir. 1988). 79 European Council Directive No. 96/9/EC, O.J. L 77/20 (1996). The E.U. Directive itself is not intended to be legislation. Instead, it sets forth requirements that member states must satisfy by enacting “at least materially equivalent” statutes. Id. at ¶ 32. A copy of the Council's Directive is attached as Exhibit 7. 80 Id. at Art. 1, ¶ 2. 81 Id. at Art. 5, subpart (a). 82 Id. at Art. 5, subpart (b). 83 Id. at Art. 5, subparts (c) through (e). 84 Id. at Art. 7, ¶ 1. 85 Id. at Art. 10, ¶ 2. 86 Id. at Art. 10, ¶ 3. 87 Id. at ¶ 56. 88 Id. at Art. 6, ¶ 2(b). 89 Id. at Art. 6, ¶ 2(d). 90 17 USC § 107.
91 “Basic Proposal for the Substantive Provisions of the Treaty on Intellectual Property in Respect of Databases to Be Considered by the Diplomatic Conference,” dated August 30, 1996 (hereinafter WIPO). Interested readers will find a copy of the WIPO draft at Exhibit 8. 92 Id. at Art. 2, ¶ (i). The definition would have specifically included “collections of literary, musical or audiovisual works or any other kind of works, or collections of other materials such as texts, sounds, images, numbers, facts, or data representing any other matter or substance. It is worth pointing out that in addition to many kinds of works and other information materials, databases may contain collections of expressions of folklore.” Id. at comment 2.02. 93 Id. at Art. 2, ¶ (ii) and Art. 3, ¶ (1). The definition of “substantial part” was further amplified in a note: The substantiality of any portion of the database is assessed against the value of the database. This assessment should evaluate the qualitative and quantitative aspects of the portion, although neither aspect is more important than the other . . . . The value of a database refers to its commercial value. This value consists on the one hand of direct investments made in the database and on the other hand of the expected market value of the database. This assessment may also take into account diminution of market value that may result from the use of the portion, including the added risk that the investment in the database will not be recoverable. It may even include an assessment of whether a new product using the portion could serve as a commercial substitute for the original, diminishing the market for the original. Id. at Note 2.09.
The concept of an “investment” included any and all “human, financial, technical or other resources” devoted to “the collection, assembly, verification, organization, or presentation of the contents of the database.” Id. at Note 2.10 (iv). 94 Id. at Art. 5, ¶ (1). The accompanying notes emphasized the point by explaining that such exceptions “may never conflict with normal exploitation of the database” and could not “unreasonably impair or prejudice the legitimate interests, including economic interests, of the rightholder.” Id. at Note 5.01. 95 WIPO at Art. 8. 96 Jocelyn Kaiser (ed.), “Treaty on Database Access Stalled,” Science, Dec. 20, 1996. 97 The question of whether the U.S. Constitution allows Congress to pass European-style database legislation is outside the scope of this report. For a list of possible problems, see U.S. Copyright Office, Report on Legal Protection of Databases, August 1997. 98 A copy of H.R. 2652 is attached as Exhibit 9 hereto. 99 The Digital Millennium Copyright Act of 1998 was subsequently enacted as Public Law 105-304. 100 H.R. 2652 at § 1201. 101 Id. at § 1202. 102 Id. at § 1203(a). 103 Id. at § 1202(b) and (c). 104 Id. at § 1203(d) (emphasis supplied). The reference to “potential markets” would have been more restrictive than the corresponding E.U. Directive, which permits copying “for the purposes of . . . scientific research, as long as the source is indicated and to the extent justified by the non-commercial purpose to be achieved.” E.U. Directive at Art. 9, subpart (a). The “potential markets” language was dropped shortly before the bill went to conference committee. Paul Uhlir (personal communication). 105 Id. at § 1203(e). 106 Id. at § 1204. 107 Id. at § 1206(d). Courts would also have been given discretion to reduce damages for any employee of a nonprofit educational, scientific, or research institution who “believed and had reasonable grounds for believing that his or her conduct was permissible under this chapter.” Id.
at § 1206(e). 108 Id. at § 1207. 109 UCC 2B-110 (August 1998 draft). A copy of the draft provision with accompanying notes can be found at Exhibit 10. Interested readers can view the entire file at <http://www.law.upenn.edu/library/ulc/ucc2b/2b898.htm>. 110 Id. at Note 2. 111 Id. at Note 3. Significantly, the Reporter adds that state court judges “may look to federal copyright and patent laws for guidance on what types of limitations . . . ordinarily seem appropriate.” Id. This suggests that federal law may provide persuasive reasons why state courts should refuse to enforce particular licenses even where it does not directly command them to do so.
112 Id. at Note 1. 113 Id. at Note 3. 114 Cf., Reporter's Note 3 to draft UCC provision 2-105 (copyright statute permits “contractual restrictions on use”). 115 Id. at Note 3 (emphasis supplied). 116 J.H. Reichman and Pamela Samuelson, “Intellectual Property Rights in Data?” Vanderbilt Law Review, Vol. 50 (January 1997), p. 51. 117 This policy-oriented approach to the problem necessarily ignores justice-based appeals that creators “should” be compensated. Suffice it to say here that strong normative arguments exist against rewarding inventors who knew in advance that certain types of activity would not be compensated. 118 Karen Hunter of Elsevier Science did report that her company had turned down nonscientific database products because it was afraid of copying. Furthermore, it is possible and even likely that counterexamples in science and engineering could be found if a more systematic survey were conducted. The apparent rarity of such counterexamples is nevertheless striking. 119 Walter Nicholson, Microeconomic Theory: Basic Principles and Extensions (6th ed. 1995) at pp. 625-628. 120 Michael Heller and Rebecca Eisenberg, “Can Patents Deter Innovation? The Anticommons in Biomedical Research,” Science (May 1, 1998) 280:698-701; see also Suzanne Scotchmer, “Standing on the Shoulders of Giants: Cumulative Research and the Patent Law,” Journal of Economic Perspectives (Winter, 1991), pp. 29-41. 121 Jerry Green and Suzanne Scotchmer, “On the Division of Profit Between Sequential Innovators,” Rand Journal of Economics (1996) 27:322-331. 122 See, for example, Andrew Lawler, “Database Access Fight Heats Up,” Science (November 15, 1996); see also Bits of Power, supra, p. 171 (recommending that fair-use-type provisions be included in any future database legislation).
123 See, e.g., Walter Nicholson, Microeconomic Theory: Basic Principles and Extensions (6th ed., 1995) at pp. 568-69.
124 An additional difficulty would be encountered during the transition period that followed any reform. This is because owners of pre-reform databases would receive full protection even though they had not paid for their own “head starts” under the old system.
125 Some proposed legislation suggests that copying should be permitted where the new database serves a different market than the first one. This is another kind of “honest copying” exemption.
126 Journal prices rose 115 percent between 1986 and 1994. A leading study commissioned by the Association of Research Libraries blamed the increases on an “imperfect, monopoly-like marketplace” controlled by a small group of publishers. See, e.g., Gary Taubes, “Electronic Preprints Point the Way to ‘Author Empowerment,’” Science (Feb. 9, 1996).
127 Cf. Bits of Power, supra, at p. 114 (criticizing economic argument in favor of having researchers pay for databases from their individual research budgets as politically unsustainable).
128 Interview with Thomas Slezak (bioinformatics expert), Exhibit 6; see also interview with Karen Hunter, Exhibit 5.
129 Interview with Karen Hunter, Exhibit 5; see also Eliot Marshall, “Please Pass the Data,” Science 276:1961 (June 27, 1997) (reporting “recent pressure from [the EU] to give industry first crack at any genome data”).
130 U.S. Patent and Trademark Office, Report on and Recommendations from April 1998 Conference on Database Protection and Access Rules (July 1998) at p. 16.
131 Id. at pp. 14-17; Terry M. Sanks, “Database Protection: National and International Attempts to Provide Legal Protection for Databases,” Florida State University Law Review (1998) 25:992.
132 Directive at ¶ 11. (“Whereas there is at present a very great imbalance in the level of investment in the database sector . . . between the Community and the world's largest database producing third countries.”) At first blush, the E.U.'s logic seems paradoxical since greater incentives would also encourage U.S. companies to compete even harder. However, the European Union may believe that U.S. companies have already decided to enter the database market. If so, additional protection might persuade risk-averse European firms to enter the market without eliciting still more investment by the Americans.
133 Andrew Lawler (ed.), “EU Database Directive Raises Hackles,” Science 279:165 (Jan. 9, 1998).
134 National Basketball Assn., supra, at pp. 852-853.
135 See U.S. Patent and Trademark Office Report, supra, at p. 6 (reporting suggestion by Professors Ginsburg and Reichman).
136 Element 4's limitation to use of information “in direct competition with a product or service offered by the plaintiff” is more suspect. From an economic perspective, society wants investment incentives to reflect the potential
value of a proposed database to all markets, not just the ones that the owner happens to be in at any given time. Suzanne Scotchmer, “Standing on the Shoulders of Giants: Cumulative Research and the Patent Law,” Journal of Economic Perspectives (Winter, 1991), pp. 29-41.
137 Reichman and Samuelson, supra, at pp. 142-143.
138 Id. at p. 143 and fn. 423.
139 Various commentators have suggested that initial start-up protection should be extended each time a database is updated. If the database has only been updated, it makes little sense to extend start-up protection a second time.
140 Bits of Power, supra, at p. 166.
141 Reichman and Samuelson also suggest using an initial blocking period in which no databases could be copied. Reichman and Samuelson, supra, at pp. 145-146. This is conceptually identical to sui generis protection (Option 3) and will not be discussed further.
142 Id. at pp. 146-150.
143 Id. at p. 148.