The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.

2

Omics-Based Clinical Discovery: Science, Technology, and Applications

Since the process of mapping and sequencing the human genome began, new technologies have made it possible to obtain a huge number of molecular measurements within a tissue or cell. These technologies can be applied to a biological system of interest to obtain a snapshot of the underlying biology at a resolution that has never before been possible. Broadly speaking, the scientific fields associated with measuring such biological molecules in a high-throughput way are called “omics.”

Many areas of research can be classified as omics. Examples include proteomics, transcriptomics, genomics, metabolomics, lipidomics, and epigenomics, which correspond to global analyses of proteins, RNA, genes, metabolites, lipids, and methylated DNA or modified histone proteins in chromosomes, respectively. There are many motivations for conducting omics research. One common reason is to obtain a comprehensive understanding of the biological system under study. For instance, one might perform a proteomics study on normal human kidney tissues to better understand protein activity, functional pathways, and protein interactions in the kidney. Another common goal of omics studies is to associate the omics-based molecular measurements with a clinical outcome of interest, such as prostate cancer survival time, risk of breast cancer recurrence, or response to therapy. The rationale is that omics-based measurements offer the potential to develop a predictive or prognostic model of a particular condition or disease (namely, an omics-based test; see definition in the Introduction) that is more accurate than what can be obtained using standard clinical approaches.

This report focuses on the stages of omics-based test development
that should occur prior to use to direct treatment choice in a clinical trial. In this chapter, the discovery phase (see Figures 2-1 and S-1) of the recommended omics-based test development process is discussed, beginning with examples of specific types of omics studies and the technologies involved, followed by the statistical, computational, and bioinformatics challenges that arise in the analysis of omics data. Some of these challenges are unique to omics data, whereas others relate to fundamental principles of good scientific research. The chapter begins with an overview of the types of omics data and a discussion of emerging directions for omics research as they relate to the discovery and future development of omics-based tests for clinical use.

TYPES OF OMICS DATA

Examples of the types of omics data that can be used to develop an omics-based test are discussed below. This list is by no means meant to be comprehensive, and indeed a comprehensive list would be impossible because new omics technologies are rapidly developing.

Genomics

The genome is the complete sequence of DNA in a cell or organism. This genetic material may be found in the cell nucleus or in other organelles, such as mitochondria. With the exception of mutations and chromosomal rearrangements, the genome of an organism remains essentially constant over time. Complete or partial DNA sequence can be assayed using various experimental platforms, including single nucleotide polymorphism (SNP) chips and DNA sequencing technology. SNP chips are arrays of thousands of oligonucleotide probes that hybridize (or bind) to specific DNA sequences in which nucleotide variants are known to occur. Only known sequence variants can be assayed using SNP chips, and in practice only common variants are assayed in this way.
Genomic analysis also can detect insertions, deletions, and copy number variation (that is, loss or amplification of the expected two copies of each gene, one from the mother and one from the father at each gene locus). Personal genome sequencing is a more recent and powerful technology, which allows for direct and complete sequencing of genomes and transcriptomes (see below). DNA also can be modified by methylation of cytosines (see Epigenomics, below). There is also an emerging interest in using genomics technologies to study the impact of an individual’s microbiome (the aggregate of microorganisms that reside within the human body) in health and disease (Honda and Littman, 2011; Kinros et al., 2011; Tilg and Kaser, 2011).
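
Copy number variation is commonly estimated from sequencing read depth. The minimal Python sketch below illustrates the idea under loudly stated assumptions: it takes hypothetical per-region read counts from a tumor sample and a matched normal sample, assumes most regions are present in two copies (so the median tumor-to-normal ratio can stand in for "diploid"), and rounds the scaled ratio to an integer copy number. The function name and counts are illustrative only; real copy number callers also correct for tumor purity and GC content and segment the genome, none of which is attempted here.

```python
from statistics import median

def estimate_copy_number(tumor_counts, normal_counts):
    """Call integer copy number per region from tumor/normal read depth.

    Simplifying assumptions (hypothetical, for illustration only):
    most regions are diploid, so the median tumor/normal ratio marks
    two copies; every region has nonzero coverage in the normal sample;
    tumor purity, GC bias, and segmentation are ignored.
    """
    ratios = {r: tumor_counts[r] / normal_counts[r] for r in tumor_counts}
    diploid_ratio = median(ratios.values())
    calls = {}
    for region, ratio in ratios.items():
        copies = round(2 * ratio / diploid_ratio)  # scale to the expected 2 copies
        status = "loss" if copies < 2 else "gain" if copies > 2 else "neutral"
        calls[region] = (copies, status)
    return calls

# Hypothetical read counts for five genomic regions.
tumor = {"A": 1000, "B": 980, "C": 510, "D": 2050, "E": 1020}
normal = {"A": 1000, "B": 1000, "C": 1000, "D": 1000, "E": 1000}
for region, (copies, status) in estimate_copy_number(tumor, normal).items():
    print(region, copies, status)
```

Here region C, with roughly half the expected depth, is called a one-copy loss, and region D, with roughly twice the depth, a gain.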

[Figure 2-1 diagram. Discovery and Test Validation Stage: Discovery Phase (candidate test developed on training set, followed by lock-down of all computational procedures; confirmation of candidate omics-based test using 1. an independent sample set if available (preferred), or 2. a subset of the training set not used during training (less preferred)); Test Validation Phase (analytical and clinical/biological validation; see Chapter 3). Evaluation for Clinical Utility and Use Stage (see Chapter 4). The diagram also carries a vertical “BRIGHT LINE” label.]

FIGURE 2-1 Omics-based test development process, highlighting the discovery phase. In the discovery phase, a candidate test is developed, precisely defined, and confirmed. The computational procedures developed in this phase should be fully specified and locked down through all subsequent development steps. Ideally, confirmation should take place on an independent sample set. Under exceptional circumstances it may be necessary to move into the test validation phase without first confirming the candidate test on an independent sample set if using an independent test set in the discovery phase is not possible, but this increases the risk of test failure in the validation phase. Statistics and bioinformatics validation occurs throughout the discovery and test validation stage as well as the stage for evaluation of clinical utility and use.
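
The lock-down and confirmation steps shown in Figure 2-1 can be illustrated with a small Python sketch. In this hypothetical example, every element of the computational procedure (the selected features, the centering and scaling constants, the weights, and the decision threshold) is stored in an immutable object; a SHA-256 hash of those parameters records exactly what was locked; and the locked procedure is then applied once to a confirmation subset that was set aside before any training. The class name, the linear scoring rule, the gene names, and the data are all illustrative assumptions, not the committee's method.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class LockedProcedure:
    """All parameters needed to turn raw measurements into a test result."""
    features: tuple   # names of the measurements used by the test
    means: tuple      # per-feature centering constants fixed during training
    sds: tuple        # per-feature scaling constants fixed during training
    weights: tuple    # linear weights applied to standardized values
    threshold: float  # score above which the test reports "high risk"

    def fingerprint(self) -> str:
        """SHA-256 of the serialized parameters, recorded at lock-down."""
        blob = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

    def predict(self, sample: dict) -> str:
        z = [(sample[f] - m) / s
             for f, m, s in zip(self.features, self.means, self.sds)]
        score = sum(w * v for w, v in zip(self.weights, z))
        return "high risk" if score > self.threshold else "low risk"

# Hypothetical parameters produced by model development on a training set.
procedure = LockedProcedure(
    features=("GENE_A", "GENE_B"),
    means=(5.0, 3.0), sds=(1.0, 0.5),
    weights=(0.8, -1.2), threshold=0.0,
)
locked_hash = procedure.fingerprint()

# Confirmation subset held out before training, with known outcomes (hypothetical).
confirmation = [
    ({"GENE_A": 6.5, "GENE_B": 2.8}, "high risk"),
    ({"GENE_A": 4.0, "GENE_B": 3.6}, "low risk"),
    ({"GENE_A": 5.5, "GENE_B": 3.1}, "high risk"),
]
correct = sum(procedure.predict(x) == y for x, y in confirmation)
assert procedure.fingerprint() == locked_hash  # unchanged since lock-down
print(f"{correct}/{len(confirmation)} confirmation samples correct")
```

Freezing the dataclass guards against accidental edits after lock-down, and comparing hashes offers a simple check that the procedure applied to confirmation or validation data is the one that was locked.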

Transcriptomics

The transcriptome is the complete set of RNA transcripts from DNA in a cell or tissue. The transcriptome includes ribosomal RNA (rRNA), messenger RNA (mRNA), transfer RNA (tRNA), microRNA (miRNA), and other non-coding RNA (ncRNA). In humans, only 1.5 to 2 percent of the genome is represented in the transcriptome as protein-coding genes. The two dominant classes of measurement technologies for the transcriptome are microarrays and RNA sequencing (RNAseq). Microarrays are based on oligonucleotide probes that hybridize to specific RNA transcripts. RNAseq is a much more recent approach, which allows for direct sequencing of RNAs without the need for probes. Oncotype DX, MammaPrint, Tissue of Origin, AlloMap, CorusCAD, and the Duke case studies described in Appendixes A and B all involve transcriptomics-based tests.

Proteomics

The proteome is the complete set of proteins expressed by a cell, tissue, or organism. The proteome is inherently quite complex because proteins can undergo posttranslational modifications (glycosylation, phosphorylation, acetylation, ubiquitylation, and many other modifications to the amino acids comprising proteins), have different spatial configurations and intracellular localizations, and interact with other proteins as well as other molecules. This complexity can lead to challenges in proteomics-based test development. The proteome can be assayed using mass spectrometry and protein microarrays (reviewed in Ahrens et al., 2010; Wolf-Yadlin et al., 2009). Unlike RNA transcripts, proteins do not have obvious complementary binding partners, so the identification and characterization of capture agents is critical to the success of protein arrays. The Ova1 and OvaCheck tests discussed in Appendix A are proteomics-based tests.
Epigenomics

The epigenome consists of reversible chemical modifications to the DNA, or to the histones that bind DNA, that produce changes in the expression of genes without altering their base sequence. Epigenomic modifications can occur in a tissue-specific manner, in response to environmental factors, or in the development of disease states, and can persist across generations. The epigenome can vary substantially among different cell types within the same organism. Biochemically, the epigenetic changes that are measured at high throughput belong to two categories: methylation of DNA cytosine residues (at CpG sites) and multiple kinds of modifications of specific histone proteins in the chromosomes (histone marks). RNA editing
is another mechanism for epigenetic changes in gene expression, measured primarily by transcriptomic methods (Maas, 2010).

Metabolomics

The metabolome is the complete set of small-molecule metabolites found within a biological sample (including metabolic intermediates in carbohydrate, lipid, amino acid, nucleic acid, and other biochemical pathways, along with hormones and other signaling molecules, as well as exogenous substances such as drugs and their metabolites). The metabolome is dynamic and can vary within a single organism and among organisms of the same species because of many factors such as changes in diet, stress, physical activity, pharmacological effects, and disease. The components of the metabolome can be measured with mass spectrometry (reviewed in Weckwerth, 2003) as well as by nuclear magnetic resonance spectroscopy (Zhang et al., 2011). These methods also can be used to study the lipidome (reviewed in Seppanen-Laakso and Oresic, 2009), which is the complete set of lipids in a biological sample.

EMERGING OMICS TECHNOLOGIES AND DATA ANALYSIS TECHNIQUES

Many emerging omics technologies are likely to influence the development of omics-based tests in the future, as both the types and numbers of molecular measurements continue to increase. Furthermore, advancing bioinformatics and computational approaches are enabling improved analyses of omics data, such as greater integration of different data types. Given the rapid pace of development in these fields, it is not possible to list all relevant emerging technologies or data analytic techniques. A few illustrative developments are briefly discussed.

Advances in RNA sequencing technology are making possible a higher resolution view of the transcriptome. These new approaches could facilitate the development of novel molecular diagnostics.
In the future it may be possible to develop omics-based tests on the basis of small non-coding RNAs, RNA editing events, or alternative splice variants that were not measured using previous hybridization-based technologies such as microarrays. For example, analysis of miRNA (derived from RNA sequencing) shows great promise for clinical diagnostics (Moussay et al., 2011; Sugatani et al., 2011; Tan et al., 2011; Yu et al., 2008).

Similarly, DNA sequencing is making it possible to identify rare or previously unmeasured mutations that may have important clinical implications. Next-generation sequencing technologies hold tremendous promise not only for identification of complete DNA and RNA sequences, but also for
high-throughput identification of epigenetic and posttranscriptional modifications to DNA or RNA, respectively. For instance, new sequencing technologies can monitor a wide variety of epigenetic changes at the genomic scale, in addition to sequencing information.

However, it is important to note that because next-generation RNA and DNA sequencing produces even more measurements per sample than do traditional approaches, these new technologies add to the challenge of extremely high data dimensionality and the risks of overfitting computational models to the available data (see the section on Computational Model Development and Cross-Validation for a discussion of overfitting). Large meta-analyses of sequencing datasets collected at multiple sites may prove useful for overcoming these risks and aid in developing clinically useful omics-based tests.

The field of proteomics has benefited from a number of recent advances. One example is the development of selected reaction monitoring (SRM) proteomics based on automated techniques (Picotti et al., 2010). During the past 2 years, multiple peptides distinctive for proteins from each of the 20,300 human protein-coding genes have been synthesized and their mass spectra determined. The resulting SRMAtlas is publicly available for the entire scientific community to use in choosing targets and purchasing peptides for quantitative analyses (Omenn et al., 2011). In addition, data from untargeted “shotgun” mass spectrometry-based proteomics have been collected and uniformly analyzed to generate peptide atlases for plasma, liver, and other organs and biofluids (Farrah et al., 2011).

Meanwhile, antibody-based protein identification and tissue expression studies have progressed considerably (Ayoglu et al., 2011; Fagerberg et al., 2011); the Human Protein Atlas has antibody findings for more than 12,000 of the 20,300 gene-coded proteins.
The Protein Atlas is a useful resource for planning experiments and will be enhanced by linkage with mass spectrometry findings through the emerging Human Proteome Project (Legrain et al., 2011). Recently developed protein capture-agent aptamer chips also can be used to make quantitative measurements of approximately 1,000 proteins from the blood or other sources (Gold et al., 2010). For example, Ostroff et al. (2010) recently reported generation of a 12-protein panel from analysis of 1,100 plasma proteins that was shown to have promising clinical test characteristics for diagnosis of non-small cell lung cancers.

A major bottleneck in the successful deployment of large-scale proteomic approaches is the lack of high-affinity capture agents with high sensitivity and specificity for particular proteins (including variants due to posttranslational modifications, alternative splicing, and single-nucleotide polymorphisms or gene fusions). This challenge is exacerbated in highly complex mixtures such as blood, where the concentrations of different proteins vary by more than 10 orders of magnitude. One technology that holds great promise in this regard is “click chemistry” (Service, 2008), which uses a highly specific chemical linkage (generally formed through the Huisgen reaction) to “click” together low-affinity capture agents to create a single capture agent with much higher affinity. It also is feasible to combine computational algorithms for modeling protein structures and conformation to infer functional differences among alternative splice isoforms of proteins, including those involved in key cancer pathways (Menon et al., 2011).

Improving technologies for measurements of small molecules (Drexler et al., 2011) also is enabling the use of metabolomics for the development of candidate omics-based tests with potential clinical utility (Lewis and Gerszten, 2010). Promising early examples include a metabolomic analysis that identified a role for sarcosine, an N-methyl derivative of the amino acid glycine, in prostate cancer progression and metastasis (Sreekumar et al., 2009), metabolomic characterization of ovarian epithelial carcinomas (Ben Sellem et al., 2011), and an integrated metabolomic and proteomic approach to diagnosis, prediction, and therapy selection for heart failure (Arrell et al., 2011). Included within metabolomics is the emerging ability to more fully measure the lipids in a sample, a rich source of additional potential biomarkers (Masoodi et al., 2010). As with other omics data types, a lengthy, complex development path is necessary to establish a clinically relevant omics-based test from reports identifying metabolite concentration differences associated with a phenotype of interest (Koulman et al., 2009).

New technologies are emerging that will make it possible to obtain omics measurements (such as transcriptomics, proteomics) on single cells (Tang et al., 2011; Teague et al., 2010).
Such detailed molecular measurements provide deep insight into the underlying biology of tissues, and potentially form a powerful basis for omics-based test development. However, as the resolution of these measurements increases, so too does the variability in the measurements due to the heterogeneity of cell states (Ma et al., 2011). Thus, while emerging omics technologies hold great potential for the development of omics-based tests, they also may exacerbate dangers of overfitting the computational model to the datasets.

Recent interest has focused on measuring multiple omics data types on a single set of samples, in order to integrate different types of molecular measurements into an omics-based test. Such multidimensional datasets have the potential to provide deep insight into biological mechanisms and networks, allowing for the development of more powerful clinical diagnostics. An encouraging example of simultaneous measurement of multiple types of omics data is the DNA-encoded antibody libraries approach (Bailey et al., 2007), which can measure DNA, RNA, and protein from the same sample. Another example is the analysis of histone modifications to identify
potential epigenetic biomarkers for prostate cancer prognosis (Bianco-Miotto et al., 2010).

Approaches that integrate multiple omics data types within the same clinical test are expected to grow in importance as the number of simultaneous measurements that can be made continues to increase. While it is relatively straightforward to increase the number of genomic and transcriptomic measurements (because DNA and RNA have complementary binding partners), increasing the number of protein measurements is more challenging because of the need for high-affinity capture agents, as discussed previously in this section.

Systems approaches that integrate multiple data types in functionally based models can be advantageous for the development of omics-based tests. For instance, the analysis of omics measurements in the context of biomolecular networks or pathways can help to reduce the number of variables in the data by constraining the possible relationships between variables, ultimately leading to more robust and clinically useful molecular tests. General approaches for using prior biological knowledge to enhance signal in omics data include removing measurements that are believed to be noise or for which there is no support in the published biological literature (filtering), using pathway databases or other sources to guide model construction, and aggregating individual measurements, often across data types, to integrate multiple sources of evidence to support conclusions (Ideker et al., 2011). For example, in a study of prion-mediated neurodegeneration, data from five mouse strains and three prion strains were used to identify the transcripts, pathways, and networks that were commonly perturbed across all genetic backgrounds (Hwang et al., 2009; Omenn, 2009).
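
The aggregation strategy mentioned above can be illustrated with one simple gene-set score, written here in Python with NumPy: each gene is standardized across samples, and the standardized values of the genes in each pathway are averaged, reducing many measurements to one score per pathway per sample. This is only one of many possible aggregation schemes, offered as a sketch; the function name, gene sets, and expression values are hypothetical, and a real analysis would draw gene sets from a curated pathway database.

```python
import numpy as np

def pathway_scores(expr, genes, gene_sets):
    """Average z-scored expression over the genes in each gene set.

    expr: array of shape (n_genes, n_samples); genes: row labels.
    gene_sets: dict mapping pathway name -> list of gene names.
    Assumes no gene is constant across samples (its SD would be zero).
    Returns a dict mapping pathway name -> array of n_samples scores.
    """
    mean = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, ddof=1, keepdims=True)
    z = (expr - mean) / sd                      # standardize each gene
    index = {g: i for i, g in enumerate(genes)}
    scores = {}
    for name, members in gene_sets.items():
        rows = [index[g] for g in members if g in index]  # skip unmeasured genes
        if rows:
            scores[name] = z[rows].mean(axis=0)
    return scores

# Hypothetical data: 4 genes measured in 3 samples.
genes = ["G1", "G2", "G3", "G4"]
expr = np.array([[1.0, 2.0, 3.0],
                 [2.0, 4.0, 6.0],
                 [5.0, 5.0, 8.0],
                 [9.0, 6.0, 3.0]])
sets = {"pathway_X": ["G1", "G2"], "pathway_Y": ["G3", "G4", "G9"]}
for name, score in pathway_scores(expr, genes, sets).items():
    print(name, np.round(score, 2))
```

Because every gene is placed on the same scale before averaging, no single highly expressed gene dominates a pathway score. Genes absent from the data (G9 here) are simply skipped.
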
Datasets from genome-wide association studies, in which cases and controls sampled from a large population are genotyped and each genotyped variant is tested for association with the phenotype of interest, also can be analyzed within the context of biological pathways in order to improve identification of disease-related variants (Segre et al., 2010). The incorporation of evolutionarily conserved gene sets can lead to the identification of often unexpected factors in disease (McGary et al., 2010). Large-scale mechanistic network models (for example, for metabolic, regulatory, or signaling networks) may be used to identify biomarkers grounded in disease mechanisms (Folger et al., 2011; Frezza et al., 2011; Gottlieb et al., 2011; Lewis et al., 2010; Shlomi et al., 2011). Genomics, transcriptomics, proteomics, and metabolomics data can be combined with structural protein analysis in order to predict drug targets or even drug off-target effects (Chang et al., 2010). While computational models of biomolecular networks for eventual clinical use are still in their infancy, their
potential for providing stronger mechanistic underpinnings to omics-based test development is encouraging.

During the past 10 years, much of the effort to identify genes linked to disease and other conditions of biological interest has focused on genome-wide association studies. However, more recent work has successfully identified disease-causal genes using whole genome or exome sequencing (Ng et al., 2010; Roach et al., 2010). Such studies may prove to be very beneficial for the development of omics-based tests, and indeed such strategies are being used clinically today for the identification of the causal gene mutation resulting in unidentified and uncommon inherited disease states.

STATISTICS AND BIOINFORMATICS DEVELOPMENT OF OMICS-BASED TESTS

In recent years, a large number of papers have reported new omics-based discoveries and the development of new candidate omics-based tests: that is, computational procedures applied to omics-based measurements to produce a clinically actionable result. However, few of these candidate omics-based tests have progressed to clinical use (Ransohoff, 2008, 2009). Some of this discrepancy may be due to the inevitable time lapse of moving from initial identification of a candidate omics-based test to a precisely defined and validated test that can be used clinically.

More important, however, are the many significant challenges in the formulation of appropriate research questions and in research design and conduct that confront the successful discovery of candidate omics-based tests, including the complexity of the data, the need for rigorous analyses, and the frequent lack of a plausible biological mechanism underpinning many of these discoveries. These challenges need to be addressed in order to realize the full clinical potential of omics research, taking into account issues specific to the field as well as broader principles of good scientific research.
Two primary scientific causes for failure of a candidate omics-based test to progress to clinical use are:

1. A candidate omics-based test may not be adequately designed for answering a specific, well-defined, and relevant clinical question. This crucial point is addressed in Chapters 3 and 4.
2. Omics-based discovery studies may not be conducted with adequate statistical or bioinformatics rigor, making it unlikely or even impossible that the candidate omics-based test will prove to be clinically valid or useful. This critical problem is addressed in the remainder of this chapter.

Figure 2-1 highlights the discovery and confirmation of a candidate omics-based test, the first component of the committee’s recommended test development and evaluation process. When candidate omics-based tests from the discovery phase are intended for further clinical development, several criteria should be satisfied and fully disclosed (for example, through publication or patent application) to enable independent verification of the findings (Recommendation 1), as discussed below.

For the purpose of this discussion, the committee assumed that a clearly defined and clinically relevant scientific or clinical question or questions have been identified, and that an omics dataset from analyses of a set of patient samples, along with an associated clinical outcome for each patient, is available.

For example, an investigator may ask whether gene expression measurements could be used to predict recurrence in node-negative breast cancer samples in a way that is substantially more accurate than standard clinical prognostic factors, such as tumor size and grade. The investigator might have data consisting of gene expression measurements for breast cancer tissue samples obtained from patients with node-negative breast cancer, along with disease-free survival time for each patient following surgery. The goal would be to develop a defined assay method for data generation and a fully specified computational procedure[1] that can be used to reliably predict, on the basis of gene expression measurements on a new patient sample, whether a patient’s cancer will recur.

Before embarking on omics-based discovery, it is worth considering whether or not the test that will eventually be developed has a reasonable chance of demonstrating clinical validity and utility.
For example, the sensitivity and specificity needed, particularly in light of the prevalence of the condition in the population to be tested, should be considered (see also Appendix A, page 209, for a discussion of sensitivity and specificity needs for an ovarian cancer screening test).

Several steps need to be followed to achieve this goal: (1) data quality control; (2) computational model development and cross-validation; (3) confirmation of the computational model on an independent dataset; and (4) release of data, code, and the fully specified computational procedures to the scientific community. Each of these is discussed below.

[1] All component steps of the computational procedure (namely, all data processing steps, normalization techniques, weights, parameters, and other aspects of the model, as well as the mathematical formula or formulas used to convert the data into a prediction of the phenotype of interest) are completely formulated in writing.
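
Why prevalence matters for the required sensitivity and specificity follows from Bayes' rule: the positive predictive value (PPV), the probability that a person who tests positive truly has the condition, is PPV = sens × prev / (sens × prev + (1 - spec) × (1 - prev)). The short Python sketch below evaluates this formula for a hypothetical test with 99 percent sensitivity and 99 percent specificity at three prevalences; the numbers are illustrative only and are not taken from Appendix A.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive result is a true positive (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical test with 99% sensitivity and 99% specificity.
for prevalence in (0.10, 0.01, 0.0004):
    ppv = positive_predictive_value(0.99, 0.99, prevalence)
    print(f"prevalence {prevalence:.2%}: PPV = {ppv:.1%}")
```

Under these assumptions the PPV falls from roughly 92 percent at 10 percent prevalence to 50 percent at 1 percent prevalence and below 5 percent at 0.04 percent prevalence, so most positive results for a rare condition would be false positives unless specificity is far higher.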

Step 1: Data Quality Control

As in most areas of science, data quality control is a crucial first step. Because omics datasets are typically composed of many thousands, if not millions, of measurements, data quality control is often performed computationally. For instance, an investigator might remove genes whose expression on a microarray is near or below background levels across all conditions. The reproducibility of the measurements from run to run (the technical variance) also can be assessed. Furthermore, it may be useful to closely examine aspects of experimental design, including sample run date and other possible confounding factors such as the source of the tissue analyzed (including normal control tissue) and potential heterogeneities within the tissues, to determine if these have had an effect on the data. This is particularly important because factors such as run date or machine operator can often have a much larger effect on omics measurements than the factors of biological interest (Leek et al., 2010), such as time to disease recurrence or cancer subtype.

It is essential that such quality assessments of the data be done in a blinded fashion, without knowledge of the clinical status or treatment outcomes of the patients whose specimens were tested.

Step 2: Computational Model Development and Cross-Validation

Once investigators have determined in Step 1 that the data are of adequate quality, a candidate omics-based test associated with a phenotype of interest, such as a biologic subgroup, preclinical responsiveness to a novel therapy, or a clinical outcome, can be developed on the basis of the omics measurements. An almost unlimited number of statistical tools can be used to perform this task; therefore, they are not enumerated here. However, some key characteristics and challenges are shared by nearly all of these methods and are discussed below.
In general, omics datasets consist of thousands to millions of molecular measurements. Typically, investigators first perform feature selection, which entails selecting a subset of the measurements that appear to be associated with the characteristic or outcome or that are thought to be biologically relevant based on prior knowledge. Using just this subset of measurements, a fully defined computational model can be developed to predict the clinical outcome on the basis of the omics measurements. Reducing the number of required measurements in this way also lowers the risk that the eventual test will require so many measurements that it is not clinically viable for financial or technical reasons. Note that if cross-validation will be performed in order to select tuning parameters or evaluate the computational model performance, then feature selection […]
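
A well-known pitfall when combining feature selection with cross-validation is to select features once, using all samples, and only then cross-validate: the held-out samples have already influenced which features were chosen, so the accuracy estimate is optimistically biased. The minimal Python sketch below (NumPy only) illustrates this on synthetic data in which the measurements are pure noise, so an honest estimate should hover near 50 percent. The sample sizes, the mean-difference selection rule, and the nearest-centroid classifier are illustrative assumptions chosen for brevity, not a recommended modeling pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, k = 40, 2000, 20
X = rng.normal(size=(n_samples, n_features))  # pure-noise "omics" data
y = np.repeat([0, 1], n_samples // 2)         # outcome unrelated to X

def select_features(X, y, k):
    """Indices of the k features with the largest absolute mean difference."""
    diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    return np.argsort(-np.abs(diff))[:k]

def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test sample to the class with the closer training centroid."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = ((X_test - c0) ** 2).sum(axis=1)
    d1 = ((X_test - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

def cv_accuracy(X, y, k, n_folds=5, select_inside=True):
    """Cross-validated accuracy, with feature selection inside or outside folds."""
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    # Leaky variant: features chosen once using every sample, test folds included.
    fixed = None if select_inside else select_features(X, y, k)
    correct = 0
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        if select_inside:
            feats = select_features(X[train_idx], y[train_idx], k)
        else:
            feats = fixed
        pred = nearest_centroid_predict(
            X[train_idx][:, feats], y[train_idx], X[test_idx][:, feats])
        correct += (pred == y[test_idx]).sum()
    return correct / len(y)

print("selection inside each fold:", cv_accuracy(X, y, k, select_inside=True))
print("selection before CV (leaky):", cv_accuracy(X, y, k, select_inside=False))
```

With this setup the first estimate typically lands near chance, while the leaky estimate is far higher even though the data contain no real signal. The exact values depend on the random draw.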

BOX 2-1 Continued

each measured phenotype variable and for genotype results. Preauthorization is required to gain access to the phenotype and genotype results for each individual, and this individual-level data is coded to protect the identity of study participants (Mailman et al., 2007; NLM, 2006).

Privacy of Health Information

The laws protecting the privacy of individuals’ health information are a potential obstacle to making omics data sustainably available to other investigators. Much of the data in omics research is from human subjects and potentially could be linked to a specific individual, especially in the case of genetic data. In addition, most omics data used in the development of a clinical test need to be connected to individuals’ clinical data to be useful in that development process.

The Health Insurance Portability and Accountability Act Privacy Rule protects the privacy of personally identifiable health information (called “protected health information [PHI]”) created or received by health care professionals, health plans, or health care clearinghouses (“covered entities”). In general, the rule requires test developers to get authorization from research subjects in order to use and disclose their PHI in health research.k The rule does not require researchers to get authorization to use and disclose PHI that has been de-identified (as defined in the regulation). Until recently, there was considerable confusion about whether the Privacy Rule protected genetic information (IOM, 2009). However, the Genetic Information Nondiscrimination Act directed the U.S. Secretary of Health and Human Services to modify the Privacy Rule to explicitly recognize genetic information as PHI.l

The Common Rule provides human subjects protections in omics research that is federally funded. It protects the safety, autonomy, privacy, and fair treatment of patient-participants in federally funded research conducted on humans, and of the cultural groups from which they are recruited. The Common Rule requires researchers to get informed consent from a person to use his/her private identifiable information in research. Research that involves “anonymized data” (that is, information that is recorded in such a manner that subjects cannot be identified) is exempt from this requirement. However, an advance notice of proposed rulemaking includes the proposal to revise this aspect of the Common Rule to match the Privacy Rule’s more rigorous de-identification standards.m If this change becomes codified in the regulations, researchers may be required in many circumstances to obtain authorization and informed consent prior to sharing their research data, in order to comply with these laws, particularly as DNA sequence-based data can now be considered identifiable.

a The Copyright Act of 1976, 17 U.S.C. §§ 101-810 (2008).

b Feist Publications v. Rural Telephone Service Co., 499 U.S. 340 (1991).

c Patent Act, 35 U.S.C. § 154 (2008).

d Id. at § 103(a).

e Id. at §§ 101-103.

f The Association for Molecular Pathology, et al. v. United States Patent and Trademark Office, et al., 653 F.3d 1329 (Fed. Cir. 2011).

g Bilski v. Kappos, 130 S. Ct. 3218 (2010).

h Mayo Collaborative Services v. Prometheus Laboratories, Inc., 628 F.3d 1347 (Fed. Cir. 2010), cert. granted (U.S. Dec. 7, 2011) (No. 10-1150).

i The Association for Molecular Pathology, et al. v. Myriad Genetics, Inc. et al., petition for cert. filed (December 7, 2011).

j Leahy-Smith America Invents Act, Public Law No. 112-29 § 27 (2011).

k The Secretary of Health and Human Services issued a notice of proposed rulemaking that includes potential modifications to the HIPAA Privacy Rule’s authorization requirements in response to the statutory amendments under the Health Information Technology for Economic and Clinical Health Act (the “HITECH Act”). See, Modifications to the HIPAA Privacy, Security, and Enforcement Rules Under the Health Information Technology for Economic and Clinical Health Act, 75 Fed. Reg. 40,868 (July 14, 2010).

l Genetic Information Nondiscrimination Act, Public Law No. 110-233 (2008).

m Human Subjects Research Protections: Enhancing Protections for Research Subjects and Reducing Burden, Delay, and Ambiguity for Investigators, 76 Fed. Reg. 44,512 (July 26, 2011).

can often be obtained by beginning the omics-based test development process using a subset of the omics measurements for which a plausible biological mechanism is available. For instance, there was a plausible biological mechanism behind the HER2 tests and Oncotype DX to motivate their initial clinical trials, but less so for the Duke, MammaPrint, and Ova1 tests (discussed in Appendixes A and B). Bioinformatics methods to link transcript or protein expression changes to relevant signaling pathways or biological networks need to be deployed appropriately.

3. Data variability unrelated to clinical outcome of interest: Often, a computational model developed on one dataset (Step 2) performs poorly on another independent dataset (Step 3). This can occur for a number of reasons, such as variability in patient population, sample preparation, time of sample collection, or operator. Hence, evidence of a computational model’s performance based only on the dataset used to train the model, even if cross-validation is properly performed, provides little evidence of the model’s suitability for future samples. A relevant example here is the OvaCheck case study, discussed in Appendix A, in which signals obtained on one dataset did not hold up when the analysis was applied to other independent sample sets (Baggerly et al., 2004).

4. Need for multiple datasets: For the reasons just described, computational models that are fit on multiple datasets in Step 2 will tend to perform better on independent data later. In other words, investigators are urged to develop a computational model on omics datasets derived from specimens and associated clinical outcomes collected at multiple laboratories at multiple institutions, rather than fitting a model on just a single dataset. For instance, the 21-Gene Recurrence Score (Oncotype DX) case study (Appendix A) was developed using multiple independent datasets (Paik et al., 2004). In that case, the data were analyzed by the same investigators, but the different datasets were derived from different clinical trials at multiple institutions.

5. Study design and batch effects: As in all areas of biomedical research, good study design is crucial. If the dataset used in Step 2 to develop the computational model resulted from poor experimental design (e.g., if the samples from patients whose cancers recurred were processed at a different time, by a different technician, or in a different laboratory), then batch effects (Leek et al., 2010) can occur. Batch effects introduce spurious signal, potentially resulting in a computational model that performs extremely well on the data on which it was developed (Step 2) but poorly on future patient samples (Step 3). A relevant example is the OvaCheck case study, discussed in Appendix A, in which peaks in the noise regions of the proteomic spectra could distinguish cancer samples from control samples, indicating batch effects (Baggerly et al., 2004).

6. Computational procedure lock-down: It is crucial that at the end of Step 2, the fully specified computational procedures be locked down before progressing to confirmation on an independent test set in Step 3.
For instance, simply reporting the set of genes included in the computational model underlying a transcriptomics-based test is insufficient, because this does not constitute a fully specified computational procedure. In the original Oncotype DX study, the researchers locked down the computational model after Step 2 and reported the fully specified computational procedures in the paper (Paik et al., 2004). In the Corus CAD case study, lock-down and the fully specified computational procedures were reported in the clinical validation paper (Rosenberg et al., 2010). The fully specified computational procedures for the AlloMap test were reported in Deng et al. (2006). In contrast, in the Duke studies, the genes used in the development of the computational model were reported, but the fully specified computational procedures were not; furthermore, it is likely that the computational procedures were never fully locked down before proceeding into Step 3 or further stages of omics-based test development, including clinical trials (see Appendix B for details).

7. Role of biostatistics and bioinformatics experts: In a relatively new and evolving field such as omics, it is not possible to predict all the possible pitfalls that investigators may face in the discovery phase. The involvement of properly trained biostatistical or bioinformatics collaborators who are fully integrated in all aspects of the discovery and evaluation process can serve as an additional safeguard. The type of biostatistical expertise that is required may vary depending on the stage or phase of test development. For example, experts in developing computational models for omics-based tests may not have sufficient expertise in clinical trial design, and vice versa. This is relevant to the Duke case study (as discussed in Appendix B), in which there was a lack of continuity in biostatistics personnel and numerous errors were identified in the statistical methodology and analyses.

COMPLETION OF THE DISCOVERY PHASE OF OMICS-BASED TEST DEVELOPMENT

A candidate omics-based test should be defined precisely, including the molecular measurements, the computational procedures, and the intended clinical use of the test, in anticipation of the test validation phase (Recommendation 1d). There are enormous opportunities in the rapidly improving suite of omics technologies to identify measurements with potential clinical utility. However, there are significant challenges in moving from the initial identification of potentially relevant differences in omics measurements to validated and robust clinical tests. Among these challenges are risks of overfitting the data in the development of the computational model and the enormous heterogeneity among different studies of ostensibly the same disease states (for both technical and biological reasons).
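The lock-down described in item 6, and the precise definition of a candidate test called for here, can be made concrete in software. The sketch below shows one hypothetical way to do it, not the procedure used in any of the case studies: every element of a simple linear score (the measured features, preprocessing constants, weights, and decision threshold), together with the intended use, is stored in one immutable record, and a SHA-256 fingerprint of that record is computed at the end of Step 2. Recomputing the fingerprint before the model is applied to the independent samples in Step 3 exposes any post hoc change. The class and function names, the linear-score form, and the gene names and numbers are illustrative assumptions.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class LockedProcedure:
    """Hypothetical fully specified procedure for a linear omics score."""

    intended_use: str
    features: Tuple[str, ...]   # measured features, in a fixed order
    means: Tuple[float, ...]    # preprocessing: centering constant per feature
    sds: Tuple[float, ...]      # preprocessing: scaling constant per feature
    weights: Tuple[float, ...]  # model coefficient per feature
    threshold: float            # scores at or above this are called "high risk"

    def __post_init__(self) -> None:
        # Every per-feature constant must line up with exactly one feature.
        n = len(self.features)
        if not (len(self.means) == len(self.sds) == len(self.weights) == n):
            raise ValueError("features, means, sds, and weights must align")

    def fingerprint(self) -> str:
        # Canonical JSON (sorted keys, fixed separators) so an identical
        # procedure always produces the identical SHA-256 digest.
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def score(self, sample: Dict[str, float]) -> float:
        # A sample lacking a locked feature raises KeyError instead of
        # silently being scored on a different feature set.
        total = 0.0
        for name, mu, sd, w in zip(self.features, self.means, self.sds, self.weights):
            total += w * (sample[name] - mu) / sd
        return total

    def classify(self, sample: Dict[str, float]) -> str:
        return "high risk" if self.score(sample) >= self.threshold else "low risk"


def apply_locked(proc: LockedProcedure, locked_digest: str,
                 sample: Dict[str, float]) -> str:
    """Apply the procedure only if it still matches the digest recorded at lock-down."""
    if proc.fingerprint() != locked_digest:
        raise ValueError("computational procedure changed after lock-down")
    return proc.classify(sample)


if __name__ == "__main__":
    proc = LockedProcedure(
        intended_use="hypothetical: recurrence-risk stratification",
        features=("GENE_A", "GENE_B", "GENE_C"),
        means=(5.0, 2.0, 8.0),
        sds=(1.0, 0.5, 2.0),
        weights=(0.8, -0.4, 0.3),
        threshold=0.0,
    )
    digest = proc.fingerprint()  # record this value at the end of Step 2
    sample = {"GENE_A": 6.0, "GENE_B": 2.0, "GENE_C": 8.0}
    print(apply_locked(proc, digest, sample))  # prints: high risk

    tweaked = replace(proc, threshold=1.0)  # a post hoc change to the threshold
    try:
        apply_locked(tweaked, digest, sample)
    except ValueError as err:
        print(err)  # prints: computational procedure changed after lock-down
```

A fingerprint only detects changes made after it is recorded; it complements, rather than replaces, making the code and fully specified computational procedures available for independent verification.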
Going forward, transparency in the reporting of all aspects of the development of an omics-based test, including the measurements made, preprocessing techniques used, and the fully specified computational procedure, is critical. The release of sufficient metadata with publication is also key to the identification of candidate omics-based tests that work across multiple sites, which is necessary to generate increasingly robust omics-based tests to enhance patient care.

In the next phase of test development (analytical and clinical/biological validation, described in Chapter 3), the methods used to obtain the omics measurements from patient samples may be changed in order to establish a clinically feasible, inexpensive, and robust assay for implementation in clinical practice. However, the fully specified computational procedures defined in the discovery stage must remain locked down and unchanged in all subsequent test development steps. At the end of the validation phase in Chapter 3, the complete test method, including the methods for obtaining the omics measurements as well as the fully specified computational procedures, must be locked down before crossing the bright line to evaluate the test for clinical utility and use.

SUMMARY AND RECOMMENDATION

This chapter has outlined best practices for the discovery phase of omics-based test development. Because omics-based tests rely on interpretation of high-dimensional datasets, it is important to guard against overfitting the data throughout the test development process. Overfitting due to lack of proper statistical methods can lead to a model that fits the training samples well, even though the model might perform poorly on independent samples not used in test development. The steps delineated in this chapter aim to prevent an overfit model from progressing to subsequent stages of test development. Cross-validation or a training set/test set approach can help reduce the risk of overfitting, but confirmation of all fully specified computational procedures and candidate omics-based tests on a blinded independent sample set is the “gold standard” for assessing the validity of any test. The importance of independent confirmation is also emphasized in the committee’s recommendations for funders (see Chapter 5), which urge funders to support this type of work. In addition, complex analyses of these large datasets highlight the need for availability of the data and code used for the discovery phase of omics-based test development, to enable independent verification of the findings. The result of the discovery process is a candidate omics-based test with locked-down computational procedures that is then moved into the test validation phase to assess analytical and clinical/biological validation, as described in Chapter 3.
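The overfitting risk described above, and the reason feature selection must be repeated inside each cross-validation fold, can be illustrated with a small simulation. The sketch below is a hypothetical illustration rather than a method from this report: it assumes pure-noise data (no feature is truly related to the outcome), ranks features by the absolute difference in class means, and uses a nearest-centroid classifier; the function names and the sample and feature counts are arbitrary choices. Selecting features once on all samples and then cross-validating only the classifier typically yields accuracy well above chance, whereas repeating the selection within each training fold gives an estimate near 50 percent, which is the true accuracy for noise.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def make_noise_data(n_per_class: int, n_features: int,
                    rng: random.Random) -> Tuple[Matrix, List[int]]:
    """Simulated 'omics' data in which no feature is related to the label."""
    # Alternating labels plus an odd number of folds keep every fold balanced.
    labels = [0, 1] * n_per_class
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in labels]
    return X, labels


def select_features(X: Matrix, y: List[int], rows: Sequence[int], k: int) -> List[int]:
    """Return the k features with the largest |class-mean difference| over `rows`."""
    ones = [i for i in rows if y[i] == 1]
    zeros = [i for i in rows if y[i] == 0]
    scored = []
    for j in range(len(X[0])):
        m1 = sum(X[i][j] for i in ones) / len(ones)
        m0 = sum(X[i][j] for i in zeros) / len(zeros)
        scored.append((abs(m1 - m0), j))
    scored.sort(reverse=True)
    return [j for _, j in scored[:k]]


def centroid_accuracy(X: Matrix, y: List[int], train: List[int],
                      test: List[int], feats: List[int]) -> float:
    """Fit a nearest-centroid classifier on `train`; return accuracy on `test`."""
    centroids = {}
    for c in (0, 1):
        members = [i for i in train if y[i] == c]
        centroids[c] = [sum(X[i][j] for i in members) / len(members) for j in feats]
    correct = 0
    for i in test:
        x = [X[i][j] for j in feats]
        dist = {c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])) for c in (0, 1)}
        correct += int(min(dist, key=dist.get) == y[i])
    return correct / len(test)


def cv_estimates(X: Matrix, y: List[int], k: int = 10,
                 n_folds: int = 5) -> Tuple[float, float]:
    """Return (selection outside CV, selection inside CV) accuracy estimates."""
    n = len(y)
    folds = [[i for i in range(n) if i % n_folds == f] for f in range(n_folds)]
    # Incorrect: features chosen once from ALL samples, including every test fold.
    leaked = select_features(X, y, range(n), k)
    outside, inside = [], []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        outside.append(centroid_accuracy(X, y, train, test, leaked))
        # Correct: selection repeated using only this fold's training samples.
        feats = select_features(X, y, train, k)
        inside.append(centroid_accuracy(X, y, train, test, feats))
    return mean(outside), mean(inside)


def simulate(n_reps: int = 5, n_per_class: int = 20,
             n_features: int = 1000, k: int = 10) -> Tuple[float, float]:
    """Average both estimates over several independently simulated datasets."""
    results = []
    for seed in range(n_reps):
        X, y = make_noise_data(n_per_class, n_features, random.Random(seed))
        results.append(cv_estimates(X, y, k=k))
    return mean(r[0] for r in results), mean(r[1] for r in results)


if __name__ == "__main__":
    outside, inside = simulate()
    print(f"feature selection outside CV: {outside:.2f}")
    print(f"feature selection inside CV:  {inside:.2f}  (truth for noise: 0.50)")
```

Even the second estimate only guards against overfitting within the discovery dataset; as noted above, confirmation on an independent, blinded sample set remains the stronger test.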
RECOMMENDATION 1: Discovery Phase

When candidate omics-based tests from the discovery phase are intended for further clinical development, the following criteria should be satisfied and fully disclosed (for example, through publication or patent application) to enable independent verification of the findings:

a. Candidate omics-based tests should be confirmed using an independent set of samples, not used in the generation of the computational model and, when feasible, blinded to any outcome or other phenotypic data until after the computational procedures have been locked down and the candidate omics-based test has been applied to the samples;

b. Data and metadata used for development of the candidate omics-based test should be made available in an independently managed database (such as dbGaP) in standard format;

c. Computer code and fully specified computational procedures used for development of the candidate omics-based test should be made sustainably available; and

d. A candidate omics-based test should be defined precisely, including the molecular measurements, the computational procedures, and the intended clinical use of the test, in anticipation of the test validation phase.

REFERENCES

Ahrens, C. H., E. Brunner, E. Qeli, K. Basler, and R. Aebersold. 2010. Generating and navigating proteome maps using mass spectrometry. Nature Reviews Molecular Cell Biology 11(11):789-801.

Arrell, D. K., J. Zlatkovic Lindor, S. Yamada, and A. Terzic. 2011. K(ATP) channel-dependent metaboproteome decoded: Systems approaches to heart failure prediction, diagnosis, and therapy. Cardiovascular Research 90(2):258-266.

Ayoglu, B., A. Haggmark, M. Neiman, U. Igel, M. Uhlen, J. M. Schwenk, and P. Nilsson. 2011. Systematic antibody and antigen-based proteomic profiling with microarrays. Expert Review of Molecular Diagnostics 11(2):219-234.

Baggerly, K. A., J. S. Morris, and K. R. Coombes. 2004. Reproducibility of SELDI-TOF protein patterns in serum: Comparing datasets from different experiments. Bioinformatics 20(5):777-785.

Bailey, R. C., G. A. Kwong, C. G. Radu, O. N. Witte, and J. R. Heath. 2007. DNA-encoded antibody libraries: A unified platform for multiplexed cell sorting and detection of genes and proteins. Journal of the American Chemical Society 129(7):1959-1967.

Ben Sellem, D., K. Elbayed, A. Neuville, F. M. Moussallieh, G. Lang-Averous, M. Piotto, J. P. Bellocq, and I. J. Namer. 2011. Metabolomic characterization of ovarian epithelial carcinomas by HRMAS-NMR spectroscopy. Journal of Oncology 2011:174019.

Bianco-Miotto, T., K. Chiam, G. Buchanan, S. Jindal, T. K. Day, M. Thomas, M. A. Pickering, M. A. O’Loughlin, N. K. Ryan, W. A. Raymond, L. G. Horvath, J. G. Kench, P. D. Stricker, V. R. Marshall, R. L. Sutherland, S. M. Henshall, W. L. Gerald, H. I. Scher, G. P. Risbridger, J. A. Clements, L. M. Butler, W. D. Tilley, D. J. Horsfall, and C. Ricciardelli. 2010. Global levels of specific histone modifications and an epigenetic gene signature predict prostate cancer progression and development. Cancer Epidemiology, Biomarkers, & Prevention 19(10):2611-2622.

Chang, R. L., L. Xie, P. E. Bourne, and B. O. Palsson. 2010. Drug off-target effects predicted using structural analysis in the context of a metabolic network model. PLoS Computational Biology 6(9):e1000938.

Compendia Bioscience, Inc. 2012. Compendia Bioscience: Cure Cancer with Genomic Data. http://www.compendiabio.com/ (accessed February 23, 2012).

Deng, M. C., H. J. Eisen, M. R. Mehra, M. Billingham, C. C. Marboe, G. Berry, J. Kobashigawa, F. L. Johnson, R. C. Starling, S. Murali, D. F. Pauly, H. Baron, J. G. Wohlgemuth, R. N. Woodward, T. M. Klingler, D. Walther, P. G. Lal, S. Rosenberg, S. Hunt, and for the CARGO Investigators. 2006. Noninvasive discrimination of rejection in cardiac allograft recipients using gene expression profiling. American Journal of Transplantation 6(1):150-160.

Drexler, D. M., M. D. Reily, and P. A. Shipkova. 2011. Advances in mass spectrometry applied to pharmaceutical metabolomics. Analytical and Bioanalytical Chemistry 399(8):2645-2653.

EBI (European Bioinformatics Institute). 2012. Data Resources and Tools. http://www.ebi.ac.uk/ (accessed February 23, 2012).

Fagerberg, L., S. Stromberg, A. El-Obeid, M. Gry, K. Nilsson, M. Uhlen, F. Ponten, and A. Asplund. 2011. Large-scale protein profiling in human cell lines using antibody-based proteomics. Journal of Proteome Research 10(9):4066-4075.

Farrah, T., E. W. Deutsch, G. S. Omenn, D. S. Campbell, Z. Sun, J. A. Bletz, P. Mallick, J. E. Katz, J. Malmström, R. Ossola, J. D. Watts, B. Lin, H. Zhang, R. L. Moritz, and R. Aebersold. 2011. A high-confidence human plasma proteome reference set with estimated concentrations in PeptideAtlas. Molecular and Cellular Proteomics 10(9):M110.006353.

Folger, O., L. Jerby, C. Frezza, E. Gottlieb, E. Ruppin, and T. Shlomi. 2011. Predicting selective drug targets in cancer through metabolic networks. Molecular Systems Biology 7:501-527.

Frezza, C., L. Zheng, O. Folger, K. N. Rajagopalan, E. D. Mackenzie, L. Jerby, M. Micaroni, B. Chaneton, J. Adam, A. Hedley, G. Kalna, I. P. Tomlinson, P. J. Pollard, D. G. Watson, R. J. Deberardinis, T. Shlomi, E. Ruppin, and E. Gottlieb. 2011. Haem oxygenase is synthetically lethal with the tumour suppressor fumarate hydratase. Nature 477(7363):225-228.

Gold, L., D. Ayers, J. Bertino, C. Bock, A. Bock, E. N. Brody, J. Carter, A. B. Dalby, B. E. Eaton, T. Fitzwater, D. Flather, A. Forbes, T. Foreman, C. Fowler, B. Gawande, M. Goss, M. Gunn, S. Gupta, D. Halladay, J. Heil, J. Heilig, B. Hicke, G. Husar, N. Janjic, T. Jarvis, S. Jennings, E. Katilius, T. R. Keeney, N. Kim, T. H. Koch, S. Kraemer, L. Kroiss, N. Le, D. Levine, W. Lindsey, B. Lollo, W. Mayfield, M. Mehan, R. Mehler, S. K. Nelson, M. Nelson, D. Nieuwlandt, M. Nikrad, U. Ochsner, R. M. Ostroff, M. Otis, T. Parker, S. Pietrasiewicz, D. I. Resnicow, J. Rohloff, G. Sanders, S. Sattin, D. Schneider, B. Singer, M. Stanton, A. Sterkel, A. Stewart, S. Stratford, J. D. Vaught, M. Vrkljan, J. J. Walker, M. Watrobka, S. Waugh, A. Weiss, S. K. Wilcox, A. Wolfson, S. K. Wolk, C. Zhang, and D. Zichi. 2010. Aptamer-based multiplexed proteomic technology for biomarker discovery. PLoS One 5(12):e15004.

Gottlieb, A., G. Y. Stein, E. Ruppin, and R. Sharan. 2011. PREDICT: A method for inferring novel drug indications with application to personalized medicine. Molecular Systems Biology 7:496.

Honda, K., and D. R. Littman. 2011. The microbiome in infectious disease and inflammation. Annual Review of Immunology. 2011 Mar 24. [Epub ahead of print].

Hwang, D., I. Y. Lee, H. Yoo, N. Gehlenborg, J. H. Cho, B. Petritis, D. Baxter, R. Pitstick, R. Young, D. Spicer, N. D. Price, J. G. Hohmann, S. J. Dearmond, G. A. Carlson, and L. E. Hood. 2009. A systems approach to prion disease. Molecular Systems Biology 5:252.

Ideker, T., J. Dutkowski, and L. Hood. 2011. Boosting signal-to-noise in complex biology: Prior knowledge is power. Cell 144(6):860-863.

Ince, D. C., L. Hatton, and J. Graham-Cumming. 2012. The case for open computer programs. Nature 482:485-488.

Ioannidis, J. P. A., and M. J. Khoury. 2011. Improving validation practices in “omics” research. Science 334(6060):1230-1232.

IOM (Institute of Medicine). 2009. Beyond the HIPAA Privacy Rule: Enhancing Privacy, Improving Health through Research. Washington, DC: The National Academies Press.

Kinross, J. M., A. W. Darzi, and J. K. Nicholson. 2011. Gut microbiome-host interactions in health and disease. Genome Medicine 3(3):14.

Koulman, A., G. A. Lane, S. J. Harrison, and D. A. Volmer. 2009. From differentiating metabolites to biomarkers. Analytical and Bioanalytical Chemistry 394(3):663-670.

Leek, J. T., R. B. Scharpf, H. C. Bravo, D. Simcha, B. Langmead, W. E. Johnson, D. Geman, K. Baggerly, and R. A. Irizarry. 2010. Tackling the widespread and critical impact of batch effects in high-throughput data. Nature Reviews Genetics 11(10):733-739.

Legrain, P., R. Aebersold, A. Archakov, A. Bairoch, K. Bala, L. Beretta, J. Bergeron, C. H. Borchers, G. L. Corthals, C. E. Costello, E. W. Deutsch, B. Domon, W. Hancock, F. He, D. Hochstrasser, G. Marko-Varga, G. H. Salekdeh, S. Sechi, M. Snyder, S. Srivastava, M. Uhlen, C. H. Wu, T. Yamamoto, Y. K. Paik, and G. S. Omenn. 2011. The human proteome project: Current state and future direction. Molecular & Cellular Proteomics 10(7):M111.009993.

Lewis, G. D., and R. E. Gerszten. 2010. Toward metabolomic signatures of cardiovascular disease. Circulation: Cardiovascular Genetics 3(2):119-121.

Lewis, N. E., G. Schramm, A. Bordbar, J. Schellenberger, M. P. Andersen, J. K. Cheng, N. Patel, A. Yee, R. A. Lewis, R. Eils, R. Konig, and B. O. Palsson. 2010. Large-scale in silico modeling of metabolic interactions between cell types in the human brain. Nature Biotechnology 28(12):1279-1285.

Ma, C., R. Fan, H. Ahmad, Q. Shi, B. Comin-Anduix, T. Chodon, R. C. Koya, C. C. Liu, G. A. Kwong, C. G. Radu, A. Ribas, and J. R. Heath. 2011. A clinical microchip for evaluation of single immune cells reveals high functional heterogeneity in phenotypically similar T cells. Nature Medicine 17(6):738-743.

Maas, S. 2010. Gene regulation through RNA editing. Discovery Medicine 10(54):379-386.

Mailman, M. D., M. Feolo, Y. Jin, M. Kimura, K. Tryka, R. Bagoutdinov, L. Hao, A. Kiang, J. Paschall, L. Phan, N. Popova, S. Pretel, L. Ziyabari, M. Lee, Y. Shao, Z. Y. Wang, K. Sirotkin, M. Ward, M. Kholodov, K. Zbicz, J. Beck, M. Kimelman, S. Shevelev, D. Preuss, E. Yaschenko, A. Graeff, J. Ostell, and S. T. Sherry. 2007. The NCBI dbGaP database of genotypes and phenotypes. Nature Genetics 39:1181-1186.

Masoodi, M., M. Eiden, A. Koulman, D. Spaner, and D. A. Volmer. 2010. Comprehensive lipidomics analysis of bioactive lipids in complex regulatory networks. Analytical Chemistry 82(19):8176-8185.

McGary, K. L., T. J. Park, J. O. Woods, H. J. Cha, J. B. Wallingford, and E. M. Marcotte. 2010. Systematic discovery of nonobvious human disease models through orthologous phenotypes. Proceedings of the National Academy of Sciences 107(14):6544-6549.

McShane, L. M. 2010. NCI Address to Institute of Medicine Committee Convened to Review Omics-Based Tests for Predicting Patient Outcomes in Clinical Trials. Presentation at Meeting 1, Washington, DC, December 20.

Menon, R., A. Roy, S. Mukerjee, S. Belkin, Y. Zhang, and G. S. Omenn. 2011. Functional implications of structural predictions for alternative splice proteins expressed in HER2/neu-induced breast cancers. Journal of Proteome Research. [Epub ahead of print].

Morin, A., J. Urban, P. D. Adams, I. Foster, A. Sali, D. Baker, and P. Sliz. 2012. Research priorities. Shining light into black boxes. Science 336(6078):159-160.

Moussay, E., K. Wang, J. H. Cho, K. van Moer, S. Pierson, J. Paggetti, P. V. Nazarov, V. Palissot, L. E. Hood, G. Berchem, and D. J. Galas. 2011. MicroRNA as biomarkers and regulators in B-cell chronic lymphocytic leukemia. Proceedings of the National Academy of Sciences 108(16):6573-6578.

NCBI (National Center for Biotechnology Information). 2012. dbGaP. http://www.ncbi.nlm.nih.gov/gap (accessed February 23, 2012).

Ng, S. B., A. W. Bigham, K. J. Buckingham, M. C. Hannibal, M. J. McMillin, H. I. Gildersleeve, A. E. Beck, H. K. Tabor, G. M. Cooper, H. C. Mefford, C. Lee, E. H. Turner, J. D. Smith, M. J. Rieder, K. Yoshiura, N. Matsumoto, T. Ohta, N. Niikawa, D. A. Nickerson, M. J. Bamshad, and J. Shendure. 2010. Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome. Nature Genetics 42(9):790-793.

NLM (National Library of Medicine). 2006. NIH launches dbGAP, a database of Genome Wide Association Studies. http://www.nlm.nih.gov/news/press_releases/dbgap_launchPR06.html (accessed December 12, 2006).

NLM. 2012. GEO: Gene expression omnibus. http://www.ncbi.nlm.nih.gov/geo/ (accessed February 23, 2012).

NRC (National Research Council). 1999. A Question of Balance: Private Rights and the Public Interest in Scientific and Technical Databases. Washington, DC: National Academy Press.

NRC. 2003. Sharing Publication-Related Data and Materials: Responsibilities of Authorship in the Life Sciences. Washington, DC: The National Academies Press.

NRC. 2005. Catalyzing Inquiry at the Interface of Computing and Biology. Washington, DC: The National Academies Press.

NRC. 2006. Reaping the Benefits of Genomic and Proteomic Research: Intellectual Property Rights, Innovation, and Public Health. Washington, DC: The National Academies Press.

Omenn, G. S. 2009. A landmark systems analysis of prion disease of the brain. Molecular Systems Biology 5:254.

Omenn, G. S., M. S. Baker, and R. Aebersold. 2011. Recent workshops of the HUPO Human Plasma Proteome Project (HPPP): A bridge with the HUPO CardioVascular Initiative and the emergence of SRM targeted proteomics. Proteomics 11(17):3439-3443.

Ostroff, R. M., W. L. Bigbee, W. Franklin, L. Gold, M. Mehan, Y. E. Miller, H. I. Pass, W. N. Rom, J. M. Siegfried, A. Stewart, J. J. Walker, J. L. Weissfeld, S. Williams, D. Zichi, and E. N. Brody. 2010. Unlocking biomarker discovery: Large scale application of aptamer proteomic technology for early detection of lung cancer. PLoS One 5(12):e15003.

Paik, S., S. Shak, G. Tang, C. Kim, J. Baker, M. Cronin, F. L. Baehner, M. G. Walker, D. Watson, T. Park, W. Hiller, E. R. Fisher, D. L. Wickerham, J. Bryant, and N. Wolmark. 2004. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. New England Journal of Medicine 351(27):2817-2826.

Petricoin, E. F., A. M. Ardekani, B. A. Hitt, P. J. Levine, V. A. Fusaro, S. M. Steinberg, G. B. Mills, C. Simone, D. A. Fishman, E. C. Kohn, and L. A. Liotta. 2002. Use of proteomic patterns in serum to identify ovarian cancer. Lancet 359(9306):572-577.

Picotti, P., O. Rinner, R. Stallmach, F. Dautel, T. Farrah, B. Domon, H. Wenschuh, and R. Aebersold. 2010. High-throughput generation of selected reaction-monitoring assays for proteins and proteomes. Nature Methods 7(1):43-46.

ProteomeXchange. 2012. Mission. http://www.proteomexchange.org/ (accessed February 23, 2012).

Ransohoff, D. F. 2008. The process to discover and develop biomarkers for cancer: A work in progress. Journal of the National Cancer Institute 100(20):1419-1420.

Ransohoff, D. F. 2009. Promises and limitations of biomarkers. Recent Results in Cancer Research 181:55-59.

Roach, J. C., G. Glusman, A. F. Smit, C. D. Huff, R. Hubley, P. T. Shannon, L. Rowen, K. P. Pant, N. Goodman, M. Bamshad, J. Shendure, R. Drmanac, L. B. Jorde, L. Hood, and D. J. Galas. 2010. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science 328(5978):636-639.

Rosenberg, S., M. R. Elashoff, P. Beineke, S. E. Daniels, J. A. Wingrove, W. G. Tingley, P. T. Sager, A. J. Sehnert, M. Yau, W. E. Kraus, K. Newby, R. S. Schwartz, S. Voros, S. G. Ellis, N. Tahirkhelli, R. Waksman, J. McPherson, A. Lansky, M. E. Winn, N. J. Schork, E. J. Topol, and for the PREDICT (Personalized Risk Evaluation and Diagnosis In the Coronary Tree) Investigators. 2010. Multicenter validation of the diagnostic accuracy of a blood-based gene expression test for assessing obstructive coronary artery disease in nondiabetic patients. Annals of Internal Medicine 153(7):425-434.

Segre, A. V., L. Groop, V. K. Mootha, M. J. Daly, and D. Altshuler. 2010. Common inherited variation in mitochondrial genes is not enriched for associations with type 2 diabetes or related glycemic traits. PLoS Genetics 6(8):e1001058.

Seppanen-Laakso, T., and M. Oresic. 2009. How to study lipidomes. Journal of Molecular Endocrinology 42(3):185-190.

Service, R. F. 2008. Chemistry. Click chemistry clicks along. Science 320(5878):868-869.

Shlomi, T., T. Benyamini, E. Gottlieb, R. Sharan, and E. Ruppin. 2011. Genome-scale metabolic modeling elucidates the role of proliferative adaptation in causing the Warburg effect. PLoS Computational Biology 7(3):e1002018.

Simon, R., M. D. Radmacher, K. Dobbin, and L. M. McShane. 2003. Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. Journal of the National Cancer Institute 95(1):14-18.

Siva, N. 2009. Myriad wins BRCA1 row. Nature Biotechnology 27:8.

Sreekumar, A., L. M. Poisson, T. M. Rajendiran, A. P. Khan, Q. Cao, J. Yu, B. Laxman, R. Mehra, R. J. Lonigro, Y. Li, M. K. Nyati, A. Ahsan, S. Kalyana-Sundaram, B. Han, X. Cao, J. Byun, G. S. Omenn, D. Ghosh, S. Pennathur, D. C. Alexander, A. Berger, J. R. Shuster, J. T. Wei, S. Varambally, C. Beecher, and A. M. Chinnaiyan. 2009. Metabolomic profiles delineate potential role for sarcosine in prostate cancer progression. Nature 457(7231):910-914.

Sugatani, T., J. Vacher, and K. A. Hruska. 2011. A microRNA expression signature of osteoclastogenesis. Blood 117(13):3648-3657.

Tan, X., W. Qin, L. Zhang, J. Hang, B. Li, C. Zhang, J. Wan, F. Zhou, K. Shao, Y. Sun, J. Wu, X. Zhang, B. Qiu, N. Li, S. Shi, X. Feng, S. Zhao, Z. Wang, X. Zhao, Z. Chen, K. Mitchelson, J. Cheng, Y. Guo, and J. He. 2011. A five-microRNA signature for squamous cell lung carcinoma (SCC) diagnosis and Hsa-miR-31 for SCC prognosis. Clinical Cancer Research 17(21):6802-6811.

Tang, F., K. Lao, and M. A. Surani. 2011. Development and applications of single-cell transcriptome analysis. Nature Methods 8(4 Suppl):S6-S11.

Teague, B., M. S. Waterman, S. Goldstein, K. Potamousis, S. Zhou, S. Reslewic, D. Sarkar, A. Valouev, C. Churas, J. M. Kidd, S. Kohn, R. Runnheim, C. Lamers, D. Forrest, M. A. Newton, E. E. Eichler, M. Kent-First, U. Surti, M. Livny, and D. C. Schwartz. 2010. High-resolution human genome structure by single-molecule analysis. Proceedings of the National Academy of Sciences 107(24):10848-10853.

Tilg, H., and A. Kaser. 2011. Gut microbiome, obesity, and metabolic dysfunction. Journal of Clinical Investigation 121(6):2126-2132.

UCSC (University of California, Santa Cruz). 2012. UCSC genome bioinformatics. http://genome.ucsc.edu/ (accessed February 23, 2012).

Weckwerth, W. 2003. Metabolomics in systems biology. Annual Review of Plant Biology 54:669-689.

Wolf-Yadlin, A., M. Sevecka, and G. MacBeath. 2009. Dissecting protein function and signaling using protein microarrays. Current Opinion in Chemical Biology 13(4):398-405.

Yu, S. L., H. Y. Chen, G. C. Chang, C. Y. Chen, H. W. Chen, S. Singh, C. L. Cheng, C. J. Yu, Y. C. Lee, H. S. Chen, T. J. Su, C. C. Chiang, H. N. Li, Q. S. Hong, H. Y. Su, C. C. Chen, W. J. Chen, C. C. Liu, W. K. Chan, W. J. Chen, K. C. Li, J. J. Chen, and P. C. Yang. 2008. MicroRNA signature predicts survival and relapse in lung cancer. Cancer Cell 13(1):48-57.

Zhang, G. F., S. Sadhukhan, G. P. Tochtrop, and H. Brunengraber. 2011. Metabolomics, pathway regulation, and pathway discovery. Journal of Biological Chemistry 286(27):23631-23635.
