National Academies Press: OpenBook

Advancing Commercialization of Digital Products from Federal Laboratories (2021)

Chapter: Appendix C: Definitions of Digital Products

Suggested Citation:"Appendix C: Definitions of Digital Products." National Academies of Sciences, Engineering, and Medicine. 2021. Advancing Commercialization of Digital Products from Federal Laboratories. Washington, DC: The National Academies Press. doi: 10.17226/26006.

Appendix C

Definitions of Digital Products

DATA

Data are digital objects broadly defined as observations or measurements (processed or unprocessed) collected in the course of scientific investigations. Data may take many forms, including numbers, text, images, and videos. Data produced by federal laboratories often have unique qualities, such as being generated by a unique piece of scientific equipment (e.g., colliders) or in massive quantities not available elsewhere (e.g., satellite data). Because of their unique and often irreplaceable qualities, these types of raw data are generally retained for archival purposes. When stored in a form amenable to external analysis, raw and/or preprocessed data may be distributed outside of the lab to ensure reproducibility of scientific results and allow external researchers to advance new use cases and improve data processing pipelines.

Datasets and Databases

While raw data may sometimes be valuable to researchers outside of the federal laboratories, it is often more beneficial for the labs to release datasets: structured collections of data pertaining to a particular endeavor or measurement. Like data, datasets may take many forms, including tables of numeric values, collections of images or videos, or other digital formats. Datasets can also include engineering studies, such as characterizations of materials, performance measurements of software systems, and failure rates for different devices.

Especially in the latter case, it may be useful to house datasets within a database. A database is a collection of datasets that includes a software framework that allows those datasets to be electronically accessed, analyzed, manipulated, and updated. Databases may enable not just data access but efficient data access, for example, by allowing access to manageable parts of datasets that are too large to retrieve using traditional techniques.
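As a minimal sketch of the idea that a database lets users retrieve a manageable part of a dataset, the example below loads a small, made-up table of device failure records into an in-memory SQLite database and queries one device at a time. The table name, column names, and values are assumptions chosen for illustration; they do not come from any federal laboratory.

```python
import sqlite3


def build_example_db():
    """Create an in-memory database holding a small, made-up measurements table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE measurements (device TEXT, hours REAL, failed INTEGER)"
    )
    rows = [
        ("pump-a", 120.0, 0),
        ("pump-a", 340.5, 1),
        ("valve-b", 90.0, 0),
        ("valve-b", 410.0, 0),
        ("valve-b", 505.2, 1),
    ]
    conn.executemany("INSERT INTO measurements VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def failure_rate(conn, device):
    """Return the fraction of recorded runs that failed for one device."""
    cur = conn.execute(
        "SELECT AVG(failed) FROM measurements WHERE device = ?", (device,)
    )
    return cur.fetchone()[0]


conn = build_example_db()
# Only the rows for the requested device are read and aggregated.
print(failure_rate(conn, "valve-b"))
```

The query touches only the requested subset, which is the kind of partial access that becomes important when the full dataset is too large to download in one piece.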

Metadata and Data Curation

Particularly when releasing data to people who have not been involved in the data’s creation, it is helpful to include metadata—information on the structure and provenance of a dataset. Metadata, increasingly considered a digital product on its own, is often essential to locating and retrieving datasets of interest, understanding the context in which the data were produced, and making appropriate use of the data. Accurate and informative metadata is critical to good data curation—the process of managing datasets and the corresponding metadata to ensure that no information about the structure, provenance, or quality of data is lost. Good data curation, in turn, is an essential component of open and shareable data.
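One way to picture metadata is as a small structured record that travels with a dataset. The sketch below assumes a hypothetical set of required fields (title, creator, creation date, units, and provenance) and checks whether a record supplies them; real metadata standards define their own fields, so the names and values here are illustrative only.

```python
import json

# Hypothetical required fields, chosen for illustration only.
REQUIRED_FIELDS = ("title", "creator", "created", "units", "provenance")


def missing_metadata(metadata):
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


# A made-up metadata record describing a made-up dataset.
record = {
    "title": "Example detector readings",
    "creator": "Example Laboratory",
    "created": "2021-01-15",
    "units": {"energy": "keV", "time": "s"},
    "provenance": "Collected on instrument X; calibrated with routine v1.2",
}

print(json.dumps(record, indent=2))
print(missing_metadata(record))
```

A curation workflow could run a check like this before release so that structure and provenance information is not lost along the way.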

SOFTWARE

Data can often be of limited use in their raw form. Algorithms are computational procedures that take data as inputs and calculate outputs that answer questions and solve problems. Software is a human- and machine-readable set of instructions that implements an algorithm or set of algorithms. Software can be packaged for wider use as an application or app, allowing users to make use of the functionality of a piece of software without having expertise in the underlying code powering the app.

Code Snippets

Code snippets are small subsets of software with lightweight documentation intended to be reused. They allow a programmer to use a standard implementation of a specific statistical technique, reducing error and increasing replicability. Snippets are broadly available online under open-source licenses and are often deeply woven into scientific software.

Scientific Software Artifacts

A particularly important type of code snippet is a scientific software artifact. Such artifacts, which may take the form of data readers or calibration routines, are often an integral part of data, such that they are needed to understand the context of the data. When data are shared, the inclusion of scientific software artifacts greatly increases the usability and portability of the shared data.
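To make the notion of a data reader and a calibration routine concrete, the following sketch parses a tiny comma-separated file of raw instrument counts and converts the counts to physical values with a linear calibration. The file layout, the column name `counts`, and the gain and offset values are all assumptions made for this example.

```python
import csv
import io


def read_counts(text):
    """Data reader: parse CSV text and return the raw counts as integers."""
    reader = csv.DictReader(io.StringIO(text))
    return [int(row["counts"]) for row in reader]


def calibrate(counts, gain, offset):
    """Calibration routine: map raw counts to physical values (gain * count + offset)."""
    return [gain * count + offset for count in counts]


# Made-up raw data in a hypothetical file format.
RAW = "sample,counts\n1,100\n2,150\n"

values = calibrate(read_counts(RAW), gain=0.5, offset=-10.0)
print(values)  # [40.0, 65.0]
```

Without the reader and the calibration constants, a recipient of the raw file would have only uninterpreted integers, which is why shipping such artifacts with the data improves usability.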

STATISTICAL MODELS

Statistical models are a type of digital product that brings together data and software to enable prediction. A statistical model is a mathematical framework for predicting the outcome of empirical phenomena based on underlying parameters. Software can be used to define these models and optimize the value of their underlying parameters based on input data. A model with optimized parameters is known as a “fitted” model and can be used for making predictions.
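As an illustration of fitting, the sketch below uses ordinary least squares to fit a straight line, one of the simplest statistical models, to a handful of made-up points and then uses the fitted parameters to predict a new value. The function names and data are assumptions for this example, not part of the report.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(params, x):
    """Use fitted parameters to predict the outcome for a new input."""
    slope, intercept = params
    return slope * x + intercept


# Made-up input data lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

params = fit_line(xs, ys)   # the "fitted" model: optimized slope and intercept
print(params)               # (2.0, 1.0)
print(predict(params, 4.0)) # 9.0
```

Here the slope and intercept play the role of the underlying parameters, and the pair returned by `fit_line` is the fitted model.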

Artificial Intelligence/Machine Learning (AI/ML)

AI/ML is an increasingly popular subfield of statistical modeling. Rather than prescribing a specific mathematical description of a phenomenon, as is done in traditional statistical modeling, AI/ML employs highly flexible models that can identify complex or unexpected patterns in data, thereby improving predictive performance. Deep learning, one popular subtype of AI/ML, uses neural networks to “understand” very complex data, including images, text, speech, and videos.

Training Datasets

The datasets used to “fit” or “train” AI/ML models are known as training datasets. These large datasets often consist of many pairs of inputs and outputs (e.g., input images together with a label indicating the presence or absence of a face), and in the training process, model parameters are optimized to provide the correct output in response to a given input. It is important to provide training datasets that are both large and unbiased (as representative as possible of the full range of inputs that the model may encounter).
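A training dataset of input and output pairs can be sketched as a list of tuples, and one crude first check on representativeness is to look at how the labels are distributed. The example below assumes hypothetical image file names and labels; label balance is only a rough proxy and does not by itself show that a dataset is unbiased.

```python
from collections import Counter


def label_shares(pairs):
    """Return the fraction of training examples carrying each label."""
    counts = Counter(label for _, label in pairs)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Made-up (input, output) pairs: an image file name and whether it shows a face.
training_pairs = [
    ("img_001.png", "face"),
    ("img_002.png", "face"),
    ("img_003.png", "face"),
    ("img_004.png", "no_face"),
]

shares = label_shares(training_pairs)
print(shares)  # {'face': 0.75, 'no_face': 0.25}
```

A heavily skewed result such as this one would suggest collecting more examples of the under-represented case before training.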

Software Notebooks

Software notebooks are web-based tools that support all aspects of data transformation, including workflows, code, data, equations, and visualizations. They are both human- and machine-readable, and can serve as a form of publication as well as a form of software since instructions can be executed within the document. Notebooks can live online as self-contained toolkits or as documentation of a specific data analysis and modeling attempt.1 They can also link to other kinds of digital products, including datasets, external software, workflows, publications, and patents.2

___________________

1 An empirical study of references to notebooks in astronomy over a 5-year period (2014–2018) found that notebooks appear better suited to supporting reuse of machine learning products than to providing direct access to software code and data. The study authors further recommend that any notebooks cited in publications be “stabilized” (frozen in time so the next user can start from the same place where the conclusions claimed were drawn). Indeed, these notebooks even serve as a discovery mechanism for reuse. Wofford, M., B. Boscoe, C. Borgman, I. Pasquetto, and M. Golshan. 2020. Jupyter notebooks as discovery mechanisms for open science: Citation practices in the astronomy community. Computing in Science & Engineering 22(1).

2 Influenced by and derived from Randles, Bernadette M., et al. 2017. Using the Jupyter notebook as a tool for open science: An empirical study. 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL). IEEE.

OTHER DIGITAL PRODUCTS

Electronic Media

Many documents and other materials that were previously disseminated through print or other analog media are now distributed by federal laboratories in digital form. These include electronic versions of scientific publications (one vehicle for communicating scientific discoveries from federal labs); electronic manufacturing designs of physical objects; and digital images and videos for directly communicating results to the scientific community, policy makers, the media, and the general public.

Digital Services

Many scientific analyses and digital products developed at federal laboratories require more computational power than is available in a single computer. Cloud computing allows individual users to remotely access the computational power of many computers at once, which can be essential to harness and analyze the vast datasets, complex software, and powerful statistical models routinely produced by federal labs and the private sector. Computing power may be provided by a cluster of computers at an individual lab or a centralized scientific computing facility (e.g., the National Energy Research Scientific Computing Center). These facilities, along with the substantial in-house computer processing and analytical expertise at federal labs, can be made available to external researchers.

In addition, federal labs support a wide range of other public-facing digital services, including time.gov (through which the National Institute of Standards and Technology [NIST] provides digital access to the official time) and weather.gov (through which the National Weather Service provides digital access to weather data). Federal labs also maintain an array of software repositories through which they can provide public access to software and code produced in the lab (e.g., osti.gov/doecode, code.mil, code.nasa.gov, software.nasa.gov, and code.nsa.gov).
