Suggested Citation:"27 Overview of Open-Access and Public-Commons Initiatives in the United States." National Research Council. 2004. Open Access and the Public Domain in Digital Data and Information for Science: Proceedings of an International Symposium. Washington, DC: The National Academies Press. doi: 10.17226/11030.

27
Overview of Open-Access and Public-Commons Initiatives in the United States

Harlan Onsrud

University of Maine, United States

This presentation provides some examples of open-access and public-commons initiatives drawn primarily from the United States. To place the presentation in context, it should be noted that most wage earners in the world can ill afford the laptop computers with which we have been making our presentations at this symposium. For well over half the nations of the world, the cost of such a laptop computer exceeds the entire annual salary of the typical worker.

In contrast, most citizens in the United States and in much of Europe can afford and have access to a computer. Many families own several computers, and practical access to high-speed cable networks is now almost universal in the United States. The average U.S. or European citizen is far better able to connect with others in contributing to digital collaborative projects than is the average citizen in much of the rest of the world. However, the incentives to work collaboratively toward open-access goals may be weaker when you can afford to buy what you need. Incentives to collaborate are also weaker when you do not have a clear vision of how you could productively contribute to a shared open-access depository or development effort.

Despite these impediments, large numbers of people across the globe are expending considerable effort in developing open-access and open-source resources. An open-access or “public-commons” approach to resource development can be defined as one whose end product is free for anyone to access, use, copy, and build derivative products upon, although some limited restrictions may be imposed to enhance the retention or distribution of the resource’s public availability.

Open-source code and open-access models are sometimes viewed as alternative economic models or a new mode of production in which individual contributors are organized neither in response to price signals nor by firm managers.¹ Under certain conditions this new form of production makes sense and works, while under other conditions it does not.

Some people view these new cooperative means of production as a supplement to, complement to, or replacement for government funding of public goods (i.e., goods that are nonrival/nondepletable and nonexcludable/nonappropriable). Such goods will not be produced by normal marketplace dynamics; therefore, if society desires them, they must be produced through some other means.

¹	Yochai Benkler. 2002. Coase’s Penguin, or, Linux and the Nature of the Firm, The Yale Law Journal 112, winter 2002-2003. Available at http://www.yale.edu/yalelj/112/BenklerWEB.pdf.

In some instances it makes far more sense to produce such information through direct government funding. In other instances, public-commons approaches may be more efficient than, or may complement, government provisioning efforts.

Other people view organized open-access and public-commons approaches as systematic means of organizing volunteer and philanthropic efforts to produce communal information products and services. Regardless of theory, one can obviously make information fully available to others without subscribing to any of these explanatory hypotheses.

This presentation reviews illustrative open-access and public-commons projects under five general headings: (1) open-source software; (2) open-access journal and article initiatives; (3) open-access disciplinary and institutional depositories; (4) search engine approaches; and (5) general public-commons initiatives.

OPEN-SOURCE SOFTWARE

The vast majority of speakers at this symposium have used the leading commercial presentation software to guide their talks. Yet the freely available open-source equivalents of Microsoft Office have improved dramatically in quality, even within the past few months. For example, OpenOffice is an open-source office suite that runs on the open-source Linux operating system as well as on other platforms.² This suite of programs seems to work well, the programs are free, and many argue that they have the potential to far surpass the Microsoft Office suite in general quality and usability because of the transparency of the code and the high level of private and corporate commitment to continue improving them.

Who wrote the code? Volunteer programmers continue to contribute, but the project is also strongly supported by Sun Microsystems, which sees at least some open-source software as making sense for its long-term corporate business model.

Writing computer code for sophisticated products such as Linux or OpenOffice requires a high level of expertise. The typical person, and even the typical programmer, cannot productively contribute to such a development effort. These high-end products often require much more than volunteer effort to become truly useful and practical. Regardless, the resulting software becomes freely available for everyone in the world to use.

JOURNAL AND ARTICLE OPEN-ACCESS INITIATIVES

There are now several major specialty collections of full-text, open-access scientific journal articles freely available on the Web. For example, the National Aeronautics and Space Administration’s Astrophysics Data System has 300,000 full-text articles online,³ and HighWire Press has about the same number in the biomedical and life science fields.⁴ Other initiatives include the high-energy physics arXiv⁵ and PubMed Central.⁶ Most of these online archives handle intellectual property issues through journal-by-journal negotiation or have scientists submit original work directly to the archive.

The U.S. National Library of Medicine (NLM) subscribes to many thousands of journals. Through PubMed Central, NLM facilitates open access to approximately 100 life science journals that have agreed to make the full text of their articles freely available as soon as they are published or after a specified period of time. Participating journals currently impose delays of up to two years, although most release their material within six months of publication. One could therefore argue that this open-access capability is funded by the philanthropy of the participating journals; however, several have reported that their subscriptions have increased since joining PubMed Central because of the added visibility.

²	See http://www.openoffice.org.
³	See http://adswww.harvard.edu.
⁴	See http://highwire.stanford.edu.
⁵	See http://www.arXiv.org.
⁶	See http://www.pubmedcentral.nih.gov.

Elsewhere in the world, BioMed Central supports 70 online, fully refereed medical and biology journals.⁷ These articles are immediately open to worldwide public access. BioMed Central’s economic model for providing open access is to charge author fees and institutional membership fees. In this way, the costs are spread among the numerous funding agencies to which authors pass on their charges.

The Public Library of Science is another initiative focused on the life sciences and medicine,⁸ fields in which substantial research funding is currently available. Its economic model is very similar to BioMed Central’s, and it is contemplating a $1,500 author charge per article to support the system. Many of the world’s open-access journals are accessible through the Directory of Open Access Journals.⁹

But if you are engaged in scholarly research and are writing in the humanities or social sciences, fields in which government funding for most research is not the norm, the BioMed Central model is unlikely to work. In these scholarly areas, you cannot pass publication fees on to funding agencies. Or perhaps you are a young researcher lacking the funds to cover your publication costs, or you are economically disadvantaged in some other manner. Many researchers in the humanities and social sciences will be unable to pay the author charges that would allow them to contribute to the open-access scientific literature stream.

Another successful model works through professional member organizations. Members pay dues to a professional organization that publishes a journal or database and makes it freely available in an open-access environment. Some free-rider problems still exist, but this is generally a successful model for many journals.

What about researchers and scholars from developing countries? Again, many will be unable to pay the author charges that would allow them to contribute to the scientific literature stream. Even when these researchers gain Internet access and are able to read online journals, they are unable to contribute back to that journal under an economic model that requires the contributor to pay.

There is no single model appropriate for all fields and researchers, and other open-access funding models are succeeding in providing common bodies of scientific literature open to all. For example, the Scholarly Publishing and Academic Resources Coalition (SPARC)¹⁰ is an initiative of university libraries and others that have pooled their economic resources to start up and support low-cost journals that compete directly with the most expensive academic journals. Participating libraries commit to subscribing to the new journals so that the journals have an assured minimum income stream. Unfortunately, SPARC has very limited resources and therefore has had to focus on the highest-payoff opportunities. As a result, SPARC has failed to spin off new journals at the same rate as the private sector. SPARC recognizes this and has developed a revised strategy in which it takes a leadership role in exploring academic alternatives for supporting open-access publications.

OPEN-ACCESS DISCIPLINARY AND INSTITUTIONAL DEPOSITORIES

Another approach is to develop open-access depositories for articles and data sets. Authors can avoid copyright problems with private publishers by openly archiving an electronic copy of their article before submitting it to a publisher for peer review. Most private publishers still agree to publish such openly archived work, knowing they cannot require its removal from the depository. This is a prevalent practice: authors post preprints on their own Web pages or in more centralized depositories. Accordingly, open self-archiving of preprints is strongly encouraged by many in the open-access community.

Institutions setting up these archives tend to establish depositories that comply with the Open Archives Initiative, using open-source software. Examples of such open-access depositories include the Eprints depository at the University of Southampton, the DSpace repository at the Massachusetts Institute of Technology, and CDSWare at CERN.¹¹
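
In practice, complying with the Open Archives Initiative usually means exposing metadata through the OAI Protocol for Metadata Harvesting (OAI-PMH): a harvester sends plain HTTP requests carrying a `verb` argument and receives XML records, typically in unqualified Dublin Core (`oai_dc`). The Python sketch below assumes OAI-PMH version 2.0; it builds a `ListRecords` request URL and pulls titles out of a response. The endpoint `https://repository.example.edu/oai` and the sample response are hypothetical, and no network call is made.

```python
# Minimal OAI-PMH 2.0 harvesting sketch; no network access is performed.
# Assumption: BASE_URL and SAMPLE_RESPONSE are hypothetical illustrations.
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "https://repository.example.edu/oai"  # hypothetical endpoint

# Namespaces used by OAI-PMH 2.0 responses that carry Dublin Core metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written, trimmed ListRecords response (real responses also carry a
# header per record, a responseDate, a request echo, and often a resumptionToken).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Preprint</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def build_request(base_url: str, verb: str, **params: str) -> str:
    """Return an OAI-PMH GET URL: the verb plus any protocol arguments."""
    query = urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"


def extract_titles(response_xml: str) -> list[str]:
    """Collect every Dublin Core title found inside oai:record elements."""
    root = ET.fromstring(response_xml)
    titles = []
    for record in root.iter(f"{{{NS['oai']}}}record"):
        for title in record.iter(f"{{{NS['dc']}}}title"):
            titles.append(title.text or "")
    return titles


if __name__ == "__main__":
    # https://repository.example.edu/oai?verb=ListRecords&metadataPrefix=oai_dc
    print(build_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
    print(extract_titles(SAMPLE_RESPONSE))  # ['An Example Preprint']
```

A real harvester would also follow `resumptionToken` values to page through large result sets; that loop is omitted to keep the sketch short.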

The only costs of setting up a preprint service at a typical university are technician time and server time. Authors submit their articles themselves, and submissions are handled automatically. Depositories can be set up to serve a specific discipline, as the arXiv server does, or to contain all publications arising from all disciplines at a specific institution, such as a university. The Eprints site lists 66 organizations that are running archives with its software.

⁷	See http://www.biomedcentral.com.
⁸	See http://www.plos.org.
⁹	See http://www.doaj.org.
¹⁰	See http://www.arl.org/sparc.
¹¹	Information on these projects can be found on their respective Web sites at http://eprints.org, http://www.dspace.org, and http://cdsware.cern.ch.

What are the drawbacks of open-access preprint depositories? They are only half of the solution. After an article is peer reviewed and published, the author has to go back, update the metadata in the system, and file a corrigendum showing the changes to the article, because the author cannot legally deposit the peer-reviewed article itself. While these systems work well for preprints, almost no one files the corrigendum and updates the metadata. Even if someone does, the article is still not in the form in which the author would like it to be read.

SEARCH ENGINE APPROACHES

Search engine approaches also exist to provide full-text access, but they have a few problems. CiteSeer, for example, is used to index and access the computer science literature. It crawls the entire Web.¹² The system uses an algorithmic approach to find citations germane to the computer science literature and provides direct links to any full-text article it finds. It works on a citation-to-citation basis. The CiteSeer index contains approximately 5 million distinct citations within the computer science literature, drawn from about 500,000 full-text online articles. This is purportedly the largest full-text collection of scientific literature on the Web.
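
The citation-to-citation idea can be illustrated with a toy inverted index that maps each normalized reference string to the set of articles that cite it, so that the same work cited with different punctuation or capitalization collapses into one entry. This Python sketch is a deliberately simplified stand-in, not CiteSeer’s actual matching algorithm, and the article identifiers and reference strings are invented.

```python
# Toy citation index: normalized reference string -> IDs of citing articles.
# Simplified stand-in for illustration; real citation matching must also cope
# with abbreviations, author-order variants, and typos that this ignores.
import re
from collections import defaultdict


def normalize(citation: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", citation.lower())
    return " ".join(text.split())


def build_index(articles: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each distinct normalized citation to the articles that cite it."""
    index: dict[str, set[str]] = defaultdict(set)
    for article_id, citations in articles.items():
        for citation in citations:
            index[normalize(citation)].add(article_id)
    return dict(index)


# Hypothetical articles and the reference strings extracted from their text.
ARTICLES = {
    "paper-a": ["Smith, J. 1999. Fast Indexing.", "Lee, K. 2001. Web Crawlers."],
    "paper-b": ["smith j 1999 fast indexing", "Chen, L. 2000. Ranking."],
}

if __name__ == "__main__":
    index = build_index(ARTICLES)
    print(len(index))  # 3 distinct citations
    print(sorted(index["smith j 1999 fast indexing"]))  # ['paper-a', 'paper-b']
```

Counting the keys of such an index is one way a figure like "distinct citations" could be derived; how robust that count is depends entirely on the normalization step.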

The legal problem with this approach lies in obtaining permission to copy the 500,000 articles. The system must automatically copy the journal articles in order to test each article against profile conditions, extract and index the citations, and host copies of the full-text PDF or PostScript files. The system developers have taken the position that they gain the substantial legal protections granted to search engines by the U.S. Digital Millennium Copyright Act (DMCA). On this basis, crawlers such as CiteSeer, Google, and AltaVista cache copies of articles so that their search services can operate more efficiently.

While CiteSeer began by crawling the Web, the vast majority of the URLs the system searches today are submitted by authors who have posted their full-text, and often fully refereed, articles on the Web. The DMCA indicates that the crawler host is not legally liable if someone else, such as an author, submits an article for which that author no longer holds the copyright. So far, publishers have not sued scientific authors for posting their own articles on their own Web sites, so these types of systems are working in practice.

GENERAL PUBLIC-COMMONS APPROACHES

For those uncomfortable with walking a legal tightrope, the Creative Commons project offers some hope.¹³ This is a thoughtful project, and once some further technical problems are solved, the approach could be embedded in the front end of every open-access data archiving and literature archiving project.

Creative Commons provides an online licensing approach that helps authors and artists affirmatively place their works into the public domain or into a public-commons legal environment. The approach can potentially be applied to all works, whether music, literature, databases, videos, or digital art.

How does it work? One goes to the Creative Commons site and chooses the restrictions, if any, to apply to the creation. The system automatically generates the specific open-access license to be applied to the work in three versions: one in plain language, another in language that only a lawyer could understand, and a third that is machine readable to facilitate searching across the Web. When authors post their work on a Web site for others to download, they can also include a bit of HTML code automatically supplied by the Creative Commons system. When visitors click on the Creative Commons icon on the author’s Web site, the restrictions chosen for that specific work are displayed. A link is also provided back to the Creative Commons Web site for the full license provisions.
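
The chooser step can be sketched as a small function that turns an author’s answers into a license code and a snippet of HTML. The sketch assumes the current Creative Commons license suite, in which every license includes attribution and the remaining choices are whether commercial use is allowed and whether derivatives are allowed, allowed only under share-alike terms, or not allowed. It also assumes the `https://creativecommons.org/licenses/<code>/<version>/` URL pattern with a default version of 4.0. The link markup uses the standard HTML `rel="license"` link type, but it is illustrative and is not the exact code the Creative Commons site generates.

```python
# Sketch of a license "chooser": author answers -> CC license code and HTML.
# Assumptions: current CC license suite (attribution always required), the
# creativecommons.org/licenses/<code>/<version>/ URL pattern, version 4.0.
from html import escape

# Suffix appended to the code for each derivative-works answer.
DERIVATIVE_OPTIONS = {"yes": "", "sharealike": "-sa", "no": "-nd"}


def license_code(allow_commercial: bool, derivatives: str) -> str:
    """Return a code such as 'by', 'by-sa', or 'by-nc-nd'."""
    if derivatives not in DERIVATIVE_OPTIONS:
        raise ValueError(f"derivatives must be one of {sorted(DERIVATIVE_OPTIONS)}")
    code = "by"  # attribution is part of every license in the current suite
    if not allow_commercial:
        code += "-nc"
    return code + DERIVATIVE_OPTIONS[derivatives]


def license_html(allow_commercial: bool, derivatives: str, version: str = "4.0") -> str:
    """Return an <a rel="license"> link that both people and crawlers can read."""
    code = license_code(allow_commercial, derivatives)
    url = f"https://creativecommons.org/licenses/{code}/{version}/"
    label = f"CC {code.upper()} {version}"
    return f'<a rel="license" href="{escape(url)}">{escape(label)}</a>'


if __name__ == "__main__":
    print(license_code(False, "sharealike"))  # by-nc-sa
    # <a rel="license" href="https://creativecommons.org/licenses/by-nd/4.0/">CC BY-ND 4.0</a>
    print(license_html(True, "no"))
```

Keeping the license identifier in a standard link attribute, rather than only in visible text, is what makes the third, machine-readable form searchable by crawlers.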

¹²	See http://citeseer.com.
¹³	See http://creativecommons.org.

What are the drawbacks of the system? First, the process currently applies only to the Web. Once a licensed file is downloaded, the user may have no idea which of the standard provisions apply, because the license terms do not accompany the file. A major current development thrust of the project is exploring technological solutions that keep the license identity attached to any file in any format, so that the file is retained in the public commons.

A second drawback is that finding files licensed under a Creative Commons license is currently very difficult. The team has talked with Google about eventually letting researchers run Google searches limited to public-domain and public-commons works. With this capability, users would have at least some confidence that what they find through this process, whether a music file or a journal article, can be used freely without infringing anyone’s copyright.
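
A search service that restricts results to public-commons works needs some way to recognize licensed pages. One plausible approach, sketched here in Python, is to scan a page for links marked with the HTML `rel="license"` link type that point to creativecommons.org. This is an assumption for illustration, not a description of how Google or Creative Commons actually implement such a filter, and the sample page is invented.

```python
# Detect Creative Commons license links in an HTML page (illustrative only).
# Assumption: licensed pages mark the license with rel="license" on <a> or <link>.
from html.parser import HTMLParser
from urllib.parse import urlparse


class LicenseLinkFinder(HTMLParser):
    """Collect href values of <a>/<link> elements whose rel includes 'license'."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens, e.g. "license nofollow".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "license" in rel_tokens and href:
            self.hrefs.append(href)


def commons_licenses(page_html: str) -> list[str]:
    """Return license URLs on the page that point at creativecommons.org."""
    finder = LicenseLinkFinder()
    finder.feed(page_html)
    finder.close()
    found = []
    for href in finder.hrefs:
        host = urlparse(href).hostname or ""  # relative links have no host
        if host == "creativecommons.org" or host.endswith(".creativecommons.org"):
            found.append(href)
    return found


SAMPLE_PAGE = """
<p>Data set released under
  <a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a>.</p>
<a rel="nofollow" href="https://example.org/">unrelated link</a>
<link rel="license" href="/local-terms.html">
"""

if __name__ == "__main__":
    print(commons_licenses(SAMPLE_PAGE))  # ['https://creativecommons.org/licenses/by/4.0/']
```

Because the check relies on markup in the hosting page, it shares the first drawback noted above: once the file is downloaded on its own, the license link no longer travels with it.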

A third drawback may arise if the Creative Commons project proves to be a great success and hundreds of scientists start attaching open-access licenses to their articles and data sets before submitting them for peer review. Some scientists already attach such licenses to their submitted journal articles, and most publishers summarily reject those articles without even subjecting them to peer review. Will increased submissions by other scientists help place pressure on the publishers? Possibly; however, my experience is that most scientists are likely to buckle. Under the current scholarly reward system, most scientists are more concerned with whether the journals in which they publish are indexed in the Science Citation Index or the Social Sciences Citation Index than with whether their work is broadly accessible. The reward system is not focused on the bigger picture of overall progress in science. We should change the reward system that drives the decisions of individual scientists.

THE IDEAL OPERATIONAL ENVIRONMENT FOR ACCESSING SCIENTIFIC LITERATURE AND RESEARCH DATA ACROSS THE GLOBE

Most researchers want the ability to cite across any and all scholarly domains and to link from any citation found on the Web to the full article or the full data set on the open Web. That is what open access is all about: we would like to use the Web as one large open library that we share with one another. Open-access electronic journals are not likely to completely replace the commercial scientific literature, but open-access literature has a major potential role to play. Most researchers recognize the benefits of free access to one another’s works.

The secret to open access, according to Peter Suber, is to keep control in the hands of those who most want open access: the authoring scholars. How do we keep control in their hands? How do we influence the decision making of individual scholars so that they retain power over their articles? Several practical actions can be taken to change the reward system. We should, for instance, consider changing the policies of funding agencies. These policies should encourage researchers to report in their grant applications only those articles and data sets that are in open-access archives. It does little good for a reviewer to assess another scholar’s work or research proposals unless the reviewer has access to all the relevant significant works created by that scholar. The current system, which limits access for scientists outside the wealthiest institutions, results in lost opportunities for advancing the progress of science.

We should change promotion and tenure policies. Peer-reviewed data sets and articles placed in open archives are much more valuable to society and therefore ought to be recognized as such. The work of university scientists should be available to the world and not just to a small population of economically privileged scientists.

We should also change university intellectual property policies. Formal university policies should encourage professors and researchers to use open-access licenses and should give them full authority to use such licenses for their intellectual property.

Finally, we should identify within each of our disciplinary domains those journals that are willing to accept open-access licenses and those that are not. We should also identify those journals that allow authors to post final journal articles on the Web and those that do not. The goal is for the reward system eventually to benefit economically those who follow open-access approaches.

If the reward system for scholars were restructured and online facilities were made easy to use, would individual scholars across the globe use open-access methods and archives to share their works with others? I believe that the history of science shows that the majority of scientists would do so.