3
Approaches to Evidence Generation

Key Points Raised by Speakers

  • The translation of diagnostic tests into practice needs to be facilitated with evidence of effectiveness in a clinical setting.

  • Studies should be designed with a specific clinical context in mind, and benefits, harms, intended use, and desired test fea­tures should be considered.

  • More efficient methodologies for generating evidence need to be developed to expedite decision-making.

  • Researchers need to collect and store biospecimens from pro­spective clinical trials with future analysis and use in mind.

PHARMACOGENOMICS CLINICAL TRIALS

A basic question in clinical research is whether an intervention works across populations. In pharmacogenomics research, said Caryn Lerman of the University of Pennsylvania, the question can be reframed as whether the intervention benefits or harms particular patients. Ultimately, the question is whether a genomics-based therapy is worth doing from the perspectives of patients, payers, and other stakeholders. Data to answer these questions can be gathered through observational studies of the association of a genomic marker with an outcome (e.g., a cohort study) or through experimental studies of the efficacy of a pharmacogenomic intervention based on accumulated data (e.g., a randomized controlled trial).

Randomized Controlled Trials

An advantage of randomized controlled trials (RCTs) over cohort studies is that they provide controlled exposure to treatment. In addition, randomization helps avoid the type of confounding that can occur in an observational study, in which treatment may be selected based on patient characteristics.

FIGURE 3-1 Pharmacogenomic trial designs, including retrospective, prospective stratified, and prospective screened.
SOURCE: Lerman, IOM workshop presentation on November 17, 2010.

A retrospective trial of a pharmacogenomic marker is carried out after an RCT of a drug has been completed, with researchers testing patient samples to identify which patients were positive or negative for a particular marker and then comparing that information with the patients’ responses to the drugs being tested (Figure 3-1). Retrospective trials can provide useful
data when a marker is unknown at trial initiation. They are also ideal for hypothesis generation and can be used for independent validation. There are, however, several important limitations to relying solely on retrospective clinical trials, including unbalanced groups, reduced power based on those unbalanced groups, and missing data (e.g., not all patients may have consented to tissue collection or use of their tissue for further study).

In contrast, prospective stratified trials first test and identify participants as marker positive or marker negative and then randomize each group of participants to therapy arms (i.e., all the marker-positive participants are randomized to either a drug group or a control group, and the same is done for the marker-negative participants) (Figure 3-1). The advantage of this design is that the trial is based on a hypothesis that takes into account prior genomic knowledge about the members of the test population. One important feature of prospective stratification is that it allows for enrichment of more rare genotype groups and balancing of treatment assignment.

A third variant is the prospective screened trial, which, Lerman said, comes closer to a clinical utility model than to a clinical validity model. Some argue, she said, that this is the true test of whether personalized medicine works. In a prospective screened trial, patients are randomized to either a genotype-guided group or a non-guided group. In the genotype-guided group, participants are tested for the presence or absence of the marker under study and then assigned to a treatment group based on the hypothesized association of the marker with the outcome of a particular therapy. The therapy for those who are marker-negative can be an alternate therapy. Results for the genotype-guided groups are compared with those for the non-guided group, which is either randomized to the same two therapies as the guided group or receives the current standard of care (Figure 3-1). The prospective screened design has high ecological (i.e., real life) validity, providing evidence of whether a genome-guided therapy will provide significantly better outcomes than non-guided therapy.

Genome-therapeutic response associations, however, are not necessarily translated into clinical practice. To foster the adoption of genomics-based interventions, it will be important to increase the generalizability of clinical trial designs and results to include clinical practice settings; to demonstrate improvement in health outcomes as well as the cost effectiveness of testing versus not testing; and to establish evidence-based guidelines. Lerman offered several reasons for the reduced generalizability and lack of translation of classic randomized clinical efficacy trials into clinical practice. RCTs have strict eligibility criteria and are conducted in a highly controlled setting, the treatment is protocol-driven, and treatment compliance is very closely monitored. In contrast, in everyday clinical practice the population is very diverse, the practice settings are heterogeneous, treatment is flexible and depends on clinical judgment, and compliance is variable (and likely lower than in the clinical trial setting).
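
The prospective stratified design described above (test for the marker first, then randomize within each marker group) can be illustrated with a short sketch. This is not taken from the workshop presentation: the function name, the arm labels, the marker labels, and the use of permuted blocks to keep arms balanced inside each stratum are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_randomize(participants, arms=("drug", "control"), seed=0):
    """Randomize participants to arms separately within each marker stratum.

    participants: list of (participant_id, marker_status) pairs, where
    marker_status is a hypothetical label such as "positive" or "negative".
    Returns a dict mapping participant_id -> (marker_status, arm).
    """
    rng = random.Random(seed)

    # Step 1: group participants by marker status (the strata).
    strata = defaultdict(list)
    for pid, marker in participants:
        strata[marker].append(pid)

    # Step 2: within each stratum, assign arms in permuted blocks so that
    # every block of len(arms) participants contains each arm exactly once.
    assignment = {}
    for marker, ids in strata.items():
        for start in range(0, len(ids), len(arms)):
            block = list(arms)
            rng.shuffle(block)
            for pid, arm in zip(ids[start:start + len(arms)], block):
                assignment[pid] = (marker, arm)
    return assignment


# Hypothetical cohort: 8 marker-positive and 8 marker-negative participants.
cohort = [(f"P{i:02d}", "positive" if i < 8 else "negative") for i in range(16)]
allocation = stratified_randomize(cohort)
```

Because randomization happens inside each stratum, the marker-positive and marker-negative groups each end up evenly split between arms, which is the balancing property the text attributes to prospective stratification.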

TABLE 3-1 Classic Randomized Controlled Trials (RCTs) Versus Practical Clinical Trials (PCTs)

                     Classic RCT/Efficacy                     PCT/Effectiveness
Research Question    Does it work in ideal circumstances?     Does it work under best practice conditions?
Population           Selective, homogeneous                   Diverse, heterogeneous
Setting              Specialized, controlled                  Clinical practice
Intervention         Fixed, protocol-driven                   Flexible, clinician judgment
Comparator           Placebo or active                        Usual care, least $
Compliance           Closely monitored, high                  Highly variable
Assessments          Elaborate, complex                       Simple outcomes
Goal                 FDA approval                             Adoption in practice

SOURCE: Lerman, IOM workshop presentation on November 17, 2010.

One approach to addressing these issues, Lerman suggested, is a practical clinical trial model (also called a pragmatic clinical trial [Brass, 2010]). Classic RCTs focus on establishing the efficacy of the intervention, while practical clinical trials study the effectiveness of the intervention, looking at simple outcomes, such as health outcomes, patient satisfaction, and costs (Table 3-1). The advantages of the practical approach to clinical trials are that they are more reflective of patients and practice, more efficient and less burdensome, and the results of the trial are more likely to be generalizable. The disadvantages are that practical clinical trials are less experimentally rigorous by design, usual care is not a stable comparator, and increased heterogeneity results in a much lower signal-to-noise ratio, making greater sample sizes necessary.

Case Example

To illustrate these issues, Lerman offered a case example involving the pharmacogenetics of nicotine addiction treatment. The six-month quit rate across a variety of interventions (lozenge, gum, patch, inhaler, nasal spray, bupropion, and varenicline) is very low. Even using best-in-class pharmacotherapy with varenicline (CHANTIX®), only about one-third of smokers will have successfully quit smoking at 6 months (Gonzales et al., 2006).

A marker that could predict which intervention would be optimal for a given patient could have a substantial medical and public health impact, and Lerman and colleagues have validated a novel metabolic biomarker across several clinical trials. The ratio of the nicotine metabolites 3-hydroxycotinine and cotinine is a stable measure of an individual’s nicotine metabolism rate derived from smoking (Ray et al., 2009). This marker reflects a heritable trait and is independent of the time since the last cigarette, and the metabolites can be measured in saliva, plasma, and urine. This metabolic marker is highly correlated with the CYP2A6 genotype (i.e., it is a phenotypic measure of a genomic trait) (Benowitz et al., 2003; Malaiyandi et al., 2006), but the test for the metabolites is less costly and easier to perform than the genomic test. As a phenotypic test, it also reflects environmental influences on nicotine clearance, as well as genetic influences beyond CYP2A6. Lerman and her colleagues obtained evidence of association by a retrospective analysis of four clinical trials, and they then established the clinical validity of the nicotine metabolism ratio test in a prospective stratified RCT.

Next Lerman discussed a hypothetical practical clinical trial of genotype-guided versus non-guided nicotine therapy, comparing a nicotine patch (low cost, low toxicity) to varenicline (higher cost, greater toxicity) (Figure 3-2). Participants in the genotype-guided arm would be tested for their nicotine metabolism ratio, and slow metabolizers would be treated with a nicotine patch, while fast metabolizers would receive varenicline. Participants in the non-guided arm would be randomized to patch or varenicline.

In this hypothetical scenario, about 20 percent of smokers in the population are slow metabolizers. In the genotype-guided arm this would mean that 20 percent of participants would get the patch and 80 percent would receive varenicline. In the non-guided group, however, randomizing between the two medications means that 50 percent of these participants are treated with the patch and 50 percent with varenicline; based on this randomization, half of the slow metabolizers in the non-guided group will receive the same treatment as the slow metabolizers in the guided group, and similarly for the normal metabolizers in the two groups. To have sufficient statistical power to examine the marginal quit rates in the genotype-guided versus non-guided groups, the study would need to enroll thousands of people.

It is much more efficient to assess genomics-guided versus non-guided therapy in a prospective stratified trial, which allows for oversampling of slow metabolizers in order to achieve comparable numbers of slow and fast metabolizers in the various treatment arms (Figure 3-3). Examining efficacy is then a matter of simply comparing patch to varenicline for the slow metabolizers, and patch to varenicline for the fast metabolizers.
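
The overlap and power argument in the hypothetical scenario can be checked with a few lines of arithmetic. The sketch below is not from the presentation. It assumes the illustrative quit-rate estimates that appear in Figures 3-2 and 3-3 (slow metabolizers: 44% on patch, 35% on varenicline; normal metabolizers: 46% on varenicline), a 20% share of slow metabolizers, and a standard normal-approximation sample-size formula for comparing two proportions at two-sided alpha = 0.05 and 80% power. All function and variable names are hypothetical.

```python
import math
from statistics import NormalDist

P_SLOW = 0.20  # share of slow metabolizers (SM) in the hypothetical scenario

# Illustrative quit rates as read from Figures 3-2 and 3-3 (an assumption here).
QUIT = {
    ("SM", "patch"): 0.44,
    ("SM", "varenicline"): 0.35,
    ("NM", "varenicline"): 0.46,
}


def same_treatment_fraction(p_slow):
    """Share of a 1:1 randomized non-guided arm whose drug matches guided care."""
    # By chance, half of SMs get the patch and half of NMs get varenicline.
    return 0.5 * p_slow + 0.5 * (1 - p_slow)


def marginal_quit_rate(p_slow, sm_rate, nm_rate):
    """Population-weighted quit rate for an arm."""
    return p_slow * sm_rate + (1 - p_slow) * nm_rate


def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm to detect p1 versus p2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


overlap = same_treatment_fraction(P_SLOW)
# Guided arm: SMs on patch, NMs on varenicline.
guided = marginal_quit_rate(P_SLOW, QUIT[("SM", "patch")], QUIT[("NM", "varenicline")])
# Comparison shown in Figure 3-2: everyone on varenicline.
all_varenicline = marginal_quit_rate(
    P_SLOW, QUIT[("SM", "varenicline")], QUIT[("NM", "varenicline")]
)
needed = n_per_arm(guided, all_varenicline)

print(f"same treatment in both arms: {overlap:.0%}")
print(f"guided quit rate: {guided:.1%}, varenicline-only quit rate: {all_varenicline:.1%}")
print(f"participants needed per arm: {needed}")
```

Under these assumptions the marginal quit rates differ by less than two percentage points (45.6% versus 43.8%, the values shown in Figure 3-2), and the formula calls for on the order of ten thousand participants per arm, which is consistent with the statement that such a study would need to enroll thousands of people.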

FIGURE 3-2 Hypothetical genotype-guided versus usual care scenario to measure smoking cessation rates using varenicline or patch.
[Figure content, for illustration with n = 100 per group, 20% slow metabolizers (SM) and 80% normal metabolizers (NM). Genotype-guided arm: 20 SMs on patch and 80 NMs on varenicline; 45.6% quit. Non-guided arm: 50 on patch (10 SM, 40 NM) and 50 on varenicline (10 SM, 40 NM). The 10 SMs on patch and the 40 NMs on varenicline get the same treatment in both arms, so 50% of participants in each arm get the same treatment. Non-guided varenicline (20% SM, 80% NM): 43.8% quit.]
Abbreviations: NM (normal metabolizer); SM (slow metabolizer); var (varenicline).
SOURCE: Lerman, IOM workshop presentation on November 17, 2010.

FIGURE 3-3 Prospective stratified RCT scheme.
[Figure content, quit rate estimates. Slow metabolizers (SMs, oversampled, N=675): (A) placebo 30%, (B) patch 44%, (C) varenicline 35%. Normal metabolizers (NMs, N=675): (D) placebo 19%, (E) patch 28%, (F) varenicline 46%. An inset plots smoothed 8-week abstinence against percentile of 3HC ratio, slowest to fastest, for NRT patch and varenicline.]
SOURCE: Lerman, IOM workshop presentation on November 17, 2010.

Principles

In summary, Lerman said that one trial design does not fit all. Retrospective and prospective RCTs are both valid, but prospective trials overcome the limitations of retrospective trials, and population enrichment designs can be used.

Practical clinical trials are an important tool to address the translational gap. However, genotype-guided versus non-guided trial designs are inefficient under some scenarios and are not likely to supplant classic RCTs. Once clinical validity has been established in a prospective trial, a practical clinical trial could be part of the validation pathway to help facilitate the transition into practice.

USE OF ARCHIVED SAMPLES TO EVALUATE GENOMIC TESTS

Richard Simon of the National Cancer Institute (NCI) described his work with colleagues on the use of archived specimens to generate new evidence about the clinical utility of prognostic and predictive biomarkers (Simon et al., 2009).

Biomarkers

The term prognostic biomarker is not well defined, Simon noted, and it is used differently in different fields. For the purposes of pharmacogenomics, Simon suggested that prognostic biomarkers are biomarkers that are measured before treatment and are used to predict the long-term outcome for patients receiving standard treatment. The marker may provide information about both the aggressiveness of the disease and the effect of the treatment. A primary intended use of the prognostic marker is to define a subset of patients who have a very good prognosis on the standard treatment and therefore do not require a more aggressive treatment.

An example of the application of a prognostic biomarker is the Oncotype DX gene expression assay initially developed for node-negative, ER-positive women who are receiving tamoxifen. The goal of testing is to identify those patients who are unlikely to benefit from adjuvant chemotherapy in addition to surgery/radiotherapy and hormonal therapy. The Oncotype DX test was initially validated through a retrospective analysis of a previously performed prospective clinical trial. The key to the successful development of the test was that it was done with an indication in mind, Simon said. An important therapeutic decision context was identified, the development and clinical validation separated in a staged manner, and analytical validation emphasized. According to Simon, most prognostic factor studies are not performed with a specific clinical context defined prior to
starting and are consequently very difficult to interpret. A prospective trial, TAILORx, for the validation of Oncotype DX is ongoing (Zujewski and Kamin, 2008; clinical trial #NCT00310180).

Predictive biomarkers are measured before treatment to identify who will or will not benefit from a particular treatment. Many cancer treatments benefit only a minority of the patients to whom they are administered, Simon said, and there is probably no case in which a treatment does not harm at least some of the patients. Being able to predict which patients are likely to benefit (or which are unlikely to benefit) could save patients from unnecessary toxicity, enhance the chances of success, and help control medical costs. Predictive biomarkers are also a critical part of the drug development process for almost all new cancer drugs. Predictive biomarkers are usually single gene/single protein markers, such as with HER2 testing to determine the appropriateness of anti-HER2 breast cancer treatments (e.g., Herceptin) (Baselga et al., 1999; Wolff et al., 2007) and KRAS analysis to determine appropriate usage of anti-epidermal growth factor receptor (anti-EGFR) antibodies in treating colorectal cancer (Lee and Chu, 2007).

Validation

Validation is essentially a showing of fitness for intended use. Validation is often broken down into analytical validation, clinical validation, and clinical utility. There is some ambiguity concerning what people mean when they talk about these different terms, especially clinical utility, Simon said. Clinical utility can take into account costs or advantages and disadvantages, but he said that the key factor in utility is whether the result of the test is actionable and informs treatment selection to the benefit of the patient.

The optimal designs for evaluating the clinical utility of a prognostic marker include prospective clinical trials and retrospective analysis of archived specimens from a prospective trial. In evaluating a predictive biomarker, the optimal design is to measure the marker in all patients to identify them as predicted responsive or predicted non-responsive and then to randomize the patients in each group to treatment and control arms. Simon also discussed a “marker strategy design,” which was referred to as a “prospective screened trial” by Lerman, and agreed that it is often a very inefficient design and that it requires a very large sample size to have sufficient statistical power.

Prospective–Retrospective Study

Some retrospective analyses of archived samples for biomarker studies can result in highly biased conclusions. To address this, Simon and colleagues have proposed a “prospective–retrospective” trial design which uses archived specimens from a single prospective trial to test a specific intended use of an assay and which meets the following criteria:

  (1) adequate amounts of archived tissue must be available from enough patients from an appropriately designed prospective trial (which for predictive factors should generally be a randomized design) for analyses to have adequate statistical power and for the patients included in the evaluation to be clearly representative of the patients in the trial;
  (2) the test should be analytically and pre-analytically validated for use with archived tissue and the testing should be blinded to the clinical data;
  (3) the plan for biomarker evaluation should be completely specified in writing before the performance of biomarker assays on archived tissue and should be focused on evaluation of a single completely defined classifier; and
  (4) the results from archived specimens should be validated using specimens from one or more similar, but separate, studies. (Simon et al., 2009)

Simon also discussed potential revisions to the American Society of Clinical Oncology (ASCO) level of evidence (LOE) scale, which currently classifies retrospective studies as LOE II or lower. He suggested that LOE I evidence could come from either a fully prospective clinical trial or else from two or more prospective–retrospective studies (meeting the proposed criteria above) in which the results were consistent.
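
The suggested revision to the level-of-evidence scale reads like a small decision rule, which the following sketch encodes. It is only an illustration under stated assumptions: the `Study` fields, the design labels, and the function names are hypothetical, a single boolean stands in for "the results were consistent", and criterion (4) is represented by requiring at least two qualifying studies. It is not an implementation of the ASCO scale itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    # Hypothetical labels: "prospective" or "prospective-retrospective".
    design: str
    adequate_archived_tissue: bool = False     # criterion (1)
    assay_validated_and_blinded: bool = False  # criterion (2)
    analysis_plan_prespecified: bool = False   # criterion (3)
    supports_marker: bool = False              # direction of the finding

    def meets_criteria(self) -> bool:
        """True if a prospective-retrospective study satisfies criteria (1)-(3)."""
        return (
            self.design == "prospective-retrospective"
            and self.adequate_archived_tissue
            and self.assay_validated_and_blinded
            and self.analysis_plan_prespecified
        )


def suggested_level(studies) -> str:
    """Apply the suggested rule: return "I" or "II or lower"."""
    # Route 1: a fully prospective clinical trial.
    if any(s.design == "prospective" for s in studies):
        return "I"
    # Route 2: two or more qualifying studies with consistent results,
    # which also covers criterion (4), validation in a separate study.
    qualifying = [s for s in studies if s.meets_criteria()]
    consistent = len({s.supports_marker for s in qualifying}) == 1
    if len(qualifying) >= 2 and consistent:
        return "I"
    return "II or lower"


good = Study("prospective-retrospective", True, True, True, supports_marker=True)
conflicting = Study("prospective-retrospective", True, True, True, supports_marker=False)

print(suggested_level([good]))               # one qualifying study only
print(suggested_level([good, good]))         # two consistent qualifying studies
print(suggested_level([good, conflicting]))  # two qualifying but inconsistent
```

A real grading exercise would weigh far more than these flags (specimen condition, statistical power, assay performance), so the sketch is best read as a summary of the rule's structure rather than a grading tool.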
In conclusion, analysis of archived tissues for prognostic and predictive biomarkers can provide either a higher or a lower level of evidence in support of clinical utility depending upon several key factors: the analytical validation of the assay; the nature of the study from which the specimens were archived; the number and condition of the specimens; and whether a focused, written plan for analysis of the specified biomarker was developed before assaying any tissue. Studies using archived tissues from prospective clinical trials, when conducted under ideal conditions and independently confirmed, can provide the highest level of evidence (LOE I).

COVERAGE WITH EVIDENCE DEVELOPMENT

The Ontario Model

In 2003 the Ontario Ministry of Health and Long-Term Care implemented a new structure with the goal of implementing an evidence-based approach to policy decision-making regarding medical products and procedures (Figure 3-4) (Goeree and Levin, 2006; Levin et al., 2007). The key component of the new structure is the Ontario Health Technology Advisory
Committee (OHTAC), which receives requests for evidence-based analyses from the Ontario Health System and the Ministry of Health. As Leslie Levin of the Medical Advisory Secretariat (MAS) explained, these requests are passed on to the MAS which coordinates systematic reviews and economic analyses with academic partners. Expert panels are engaged to evaluate the evidence; feedback and input from stakeholders, professionals, the public, and industry is sought; and all evidentiary information is then passed to OHTAC, which develops appraisals based on the evidence and provides recommendations to the Ontario Health System and to the Ministry of Health (Figure 3-4). Adoption of the recommendations can be tracked through a geographic information system.

FIGURE 3-4 MAS- and OHTAC-associated structures and linkages.
Abbreviations: GIS (geographic information system); ICES (Institute for Clinical Evaluative Sciences); LHIN (local health integration networks); MOHLTC (Ministry of Health and Long-Term Care); PATH (Programs for Assessment of Technology in Health); PET (positron emission tomography); THETA (Toronto Health Economics and Technology Assessment Collaborative).
SOURCE: As modified from Levin et al., 2007 by Levin in IOM workshop presentation on November 17, 2010.

As one part of the larger structure, a field evaluation program was developed to collect primary data in order to address uncertainties identified in the systematic reviews and to perform post-market assessment of real
world performance of products. The generation and collection of additional data regarding the utilization and impact of a medical intervention as a requirement of a preliminary coverage decision, or coverage with evidence development, comes under the purview of the Ontario Field Evaluation program. Issues that could trigger a field evaluation include, for example, low quality of evidence, incremental net benefit, generalizability questions, and safety issues. Levin noted that 38 field evaluations have been initiated since 2003, with 19 completed thus far. Of the completed studies, 88 percent affected decision making. Ten of these were coverage with evidence development studies. Most of the studies are published in peer-reviewed journals. Levin said that while saving money was not the original intent of the studies, the resulting coverage decisions have contributed to more than $500 million in cost avoidance.

Examples of Coverage with Evidence Development Field Studies and Recommendations

Levin highlighted several of the ten coverage with evidence development recommendations that have been made thus far (Table 3-2).

TABLE 3-2 Summary of Ontario Field Evaluations

Technology (n) | Field Evaluation Overseen by | Type of Study | Reason for Field Evaluation | Result | Policy Decision
Drug eluting stents (DES) (21,000) | PATH, with ICES | Prospective pragmatic registry | Generalizability of RCT evidence and cost effective analysis | Only effective in patients at high risk for restenosis | Funded; 30% conversion from bare-metal to DES (90% in U.S.A.)
Endovascular abdominal aortic aneurysm repair (160) | PATH and single AHSC | Prospective observation | Safety assessment of endoleak | No endoleak; CE only for high surgical risk | Funded for high but not low surgical risk
Multifaceted primary care diabetes program | PATH, with Oxford University | Before-after study using micro simulation economic model | Prioritize investments according to downstream effects and CE | Most CE were bariatric surgery, MDT; Least, insulin infusion pumps for type II diabetes | Bariatric program funded and additional funding for MDT; Insulin infusion pumps for type 2 on hold following systematic review of diabetes strategy
64-slice CT angiography (CTA) v coronary angiography (CA) (175) | PATH, with cardiologists, radiologists, selected AHSCs | Patients for CA also underwent CTA | Uncertainty regarding indications for use, CE and QA parameters | Sensitivity lower than reported, reducing CE | OHTAC recommended slow diffusion until sensitivity issue resolved
PET to stage locally advanced NSCLC (310) | OCOG | RCT | Clinical utility in decisions regarding combined modality therapeutics | Terminated by efficacy & safety committee | PET insured for this indication
PET to stage early NSCLC (322) | OCOG | RCT | Resolve inconsistencies to inform decision regarding access | PET reduces futile thoracotomy rates | PET insured for this indication
PET to stage breast cancer (320) | OCOG | Prospective cohort | Compare PET to sentinel lymph node biopsy | No utility in staging | Not insured
PET for colorectal cancer metastatic to liver (400) | OCOG | RCT | Clinical utility in decision for metastatectomy | Accrual completed February 2010 | Awaiting results
PET for head and neck cancer (400) | OCOG | Prospective cohort | Clinical utility pre-surgery following radiation therapy | No clinical utility | Not insured
Extracorporeal photopheresis (120) | PATH with AHSC | Prospective observational | Basis for decision regarding funding for GvH and Sezary | Effective in GvH; Inconclusive for Sezary | Insured for GvH; Inconclusive for Sezary - small vol. after backlog dealt with

Abbreviations: AHSC (academic health science center); CE (cost effectiveness); DES (drug eluting stent); GvH (graft vs host); ICES (Institute for Clinical Evaluative Sciences); MDT (multi-disciplinary teams); NSCLC (non-small cell lung cancer); OCOG (Ontario Clinical Oncology Group); OHTAC (Ontario Health Technology Advisory Committee); PATH (Programs for Assessment of Technology in Health); PET (positron emission tomography); QA (quality assurance); RCT (randomized controlled trial).
SOURCE: As updated from Levin et al., 2007, by Levin in IOM workshop presentation on November 17, 2010.

In response to a published report identifying an increase in restenosis rates for low-risk patients who were treated with bare-metal stents versus those treated with drug-eluting stents, a field evaluation was performed to test the generalizability of this finding. The field study determined that while drug-eluting stents are advantageous for patients at high risk for restenosis, in Ontario there was no significant difference noted for low-risk patients (Tu et al., 2007). As a result, drug-eluting stents are used in only about 30 percent of patients in Ontario, as compared with 90 to 95 percent in the United States. In another case, a systematic review of endovascular abdominal aortic aneurysm repair raised a concern about endoleak (persistent blood flow into the aneurysm sac after the graft procedure). A subsequent prospective field study of 160 patients found that these were primarily Type II endoleaks, which are less serious (Tarride et al., 2008), and an economic analysis showed the procedure is cost effective only for high-risk patients (MAS, 2010). As a result, the decision was made to fund endovascular abdominal aortic aneurysm repair only for high-risk patients. MAS has also looked at the clinical utility of positron emission tomography (PET) scanning (Evans et al., 2009). It has insured PET scanning for staging lung cancer, but field studies have not shown clinical utility for head and neck cancer or in staging breast cancer, and these applications of PET are not insured.

As an example of an ongoing coverage with evidence development study, Levin said that MAS is looking at gene expression profiling with Oncotype DX for guiding adjuvant chemotherapy in early breast cancer. As
opposed to the TEC and AHRQ reviews, the MAS evidentiary review found low-quality evidence for its prognostic value and very low-quality evidence for its predictive value in terms of the benefits of a classic chemotherapy regimen. As such, a coverage with evidence proposal has been put forth that would consider three key questions: (1) How does Oncotype DX change treatment? (A prospective cohort study has been proposed.) (2) How does Oncotype DX compare to traditional factors? (Electronically collected data on age, tumor size, grade, ER, PR, and HER-2/neu will make it possible to measure correlations between the Oncotype DX recurrence score and traditional risk classification.) (3) What is the impact of Oncotype DX on breast cancer distant recurrence? (Longitudinal data will be collected.) These studies will be informed by ongoing clinical studies, such as the TAILORx trial being conducted by NCI (Zujewski and Kamin, 2008).

Another example is EGFR mutation testing in non-small cell lung cancer. MAS looked at the predictive value of mutated EGFR based on a retrospective subgroup analysis of archived specimens from an RCT of first-line treatment with gefitinib versus chemotherapy. The results of the analysis suggested a statistically significant improvement in progression-free survival for gefitinib versus chemotherapy in EGFR-mutation-positive patients, but not in EGFR-mutation-negative patients (Zhu et al., 2008). However, results of a similar analysis of second- and third-line chemotherapy (erlotinib versus placebo) were not significant (Shao, 2010), and the studies reviewed were not designed to examine the predictive effects of the mutation. Levin noted that the current pattern of practice is to use erlotinib regardless of EGFR status for second- or third-line treatment. As a result, it was recommended that there should be payment for EGFR testing for gefitinib as a first-line treatment and for EGFR testing for erlotinib for second- or third-line treatment, that treatment should be allowed for EGFR-negative patients, but that the response to erlotinib should be monitored by EGFR mutation status and the payment for EGFR testing in this group of patients should be modified based on the findings.

Lessons Learned

In summary, Levin said, coverage with evidence development works, but more efficient methodologies are needed to expedite conclusions. To this end, Levin suggested that evidence-based analysis should be implemented further upstream in the lifecycle of drugs and technologies. Industry, academia, and health systems should be engaged in the premarket phase so that the important evidentiary questions are addressed ahead of time. In this way, it may be possible to influence the development pipeline toward technologies that are more relevant to health systems and to patient outcomes.

CONSTRUCTING CHAINS OF EVIDENCE

The rules of evidence that apply to genomic information are no different than the rules of evidence for other forms of information about prognosis, prediction, or diagnosis, said David Ransohoff of the University of North Carolina at Chapel Hill. A chain of evidence is a series of questions or evidence that together describe the impact of some activity, in this case a genomic test. A primary issue is what questions should be in the chain.

Analytic Framework

Ransohoff said that established analytic frameworks should be used to develop chains of evidence for genomic tests, and he referred to a presentation that Steven Woolf had given to the roundtable at its March 2010 workshop. Woolf had discussed standard analytic principles that are applied to the evaluation of screening tests regardless of the type of test and had noted that groups such as the U.S. Preventive Services Task Force and the World Health Organization generally consider five issues when assessing preventative interventions: (1) the burden of suffering from the target condition; (2) the accuracy and reliability of the test; (3) the effectiveness of early detection of the condition; (4) potential harms; and (5) the balance of benefits and harms (IOM, 2010).

These questions are simple to ask but difficult to answer, Ransohoff said. As discussed by Piper, Calonge, and others, an RCT addressing questions 2 through 5 would be the ideal source of evidence. If there is no clinical trial that can answer all of these questions at once, then evidence must be pieced together.

Evidence about genomic tests is often limited to the accuracy and reliability of the test. However, it is not sufficient for a test to discriminate. The bottom line is the outcome: the benefits and harms that occur because of an intervention choice based on the discriminatory capability of the test.
For efficient discovery and development, researchers must work backward from a specific clinical decision and consider benefits, harms, and the intended use and desired features of the test. “Working backwards from a specific clinical scenario is absolutely critical [but] commonly not done,” Ransohoff said.

Clinical Trials

If no RCT has been performed, the evidence is necessarily limited. It is possible to assess the ability of the test to discriminate between populations but not to determine whether this discrimination or subsequent action affects outcomes. Much of the current genomic evidence is limited to questions of discrimination. Ransohoff also noted that, in reviewing the available evidence, he found that many of the studies published in clinical journals do not disclose critical details of the study design and participants, and that sometimes the discrimination observed is actually due to bias or to error, not to biology.

As an example, Ransohoff cited a proteomics study of differential exoprotease activities that sought to determine whether peptide patterns are sensitive and specific for prostate cancer (Villanueva et al., 2006). In the study the test arm was 100 percent male prostate cancer patients averaging 67 years of age. However, the control group was 58 percent women with a mean age of 35 years, leading to a potential source of bias in the findings. The publication reported this important detail, though only in supplemental data, but many published “-omics” studies are opaque, making it difficult or impossible to assess the strength of the evidence.

Barriers to Implementing an Analytic Framework

An analytic framework model makes clinical sense as an evidentiary pathway, and there is extensive experience with analytical pathways in other fields. The challenge is gathering the evidence to fill out the framework, Ransohoff said. Investigators may not think of data as a product of a study. If the study design is weak, then the link in the evidence chain is also weak. Studies need to be carefully and prospectively designed. Specimens should also be considered a product of a study, and the source of the specimens should be described in detail in the methods section of the publication.
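
The case-control imbalance in the proteomics example above can be made concrete with a small simulation. The following Python sketch is illustrative only. It assumes a hypothetical marker that rises with age and has no relationship to disease. Only the mean ages (67 and 35 years) are borrowed from the example; the standard deviations, the marker model, and the function names `simulate_marker`, `auc`, and `run_comparison` are invented for illustration. The sketch does not model the sex imbalance and does not reproduce any analysis from the cited study.

```python
import random


def simulate_marker(age, rng):
    """Hypothetical marker: rises with age, carries no disease signal (assumption)."""
    return 0.05 * age + rng.gauss(0.0, 0.5)


def auc(case_values, control_values):
    """Probability that a randomly chosen case scores higher than a randomly chosen control."""
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


def run_comparison(n=500, seed=0):
    """Compare the same cases against younger controls and against age-matched controls."""
    rng = random.Random(seed)
    # Mean ages of 67 and 35 follow the example; standard deviations are assumed.
    cases = [simulate_marker(rng.gauss(67, 8), rng) for _ in range(n)]
    young_controls = [simulate_marker(rng.gauss(35, 10), rng) for _ in range(n)]
    matched_controls = [simulate_marker(rng.gauss(67, 8), rng) for _ in range(n)]
    return auc(cases, young_controls), auc(cases, matched_controls)


unmatched_auc, matched_auc = run_comparison()
print(f"AUC vs. younger controls:     {unmatched_auc:.2f}")
print(f"AUC vs. age-matched controls: {matched_auc:.2f}")
```

Under these assumptions, the marker appears to separate cases from the younger controls well, while against age-matched controls its AUC falls close to 0.5, the value expected for a test with no discriminatory ability. The apparent signal in the first comparison comes entirely from the age difference between the groups.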
In many cases the rate-limiting step is funding, infrastructure, or informatics, but in generating evidence for genomic test development, the rate-limiting step is formulating the key clinical questions and designing a study that provides strong evidence or a link in the chain. The question that needs to be addressed, Ransohoff said, is whether existing data can be used in a strong design.

Ransohoff also noted that there are opportunities to add well-designed studies onto current practices. As examples, he cited two studies, one a study of prognosis and the other of diagnosis. The prognostic study assessed the five-year risk of developing colon cancer after a negative colonoscopy (Imperiale et al., 2008), while the diagnostic study assessed the ability of colorectal screening to detect advanced proximal neoplasms in asymptomatic adults (Imperiale et al., 2000). Both studies were superimposed on a program that a pharmaceutical company had already implemented as a clinical benefit for its employees, Ransohoff said, and the prognostic study was done at no additional cost.

Moving Forward

An analytic framework for assessing the impact of a test on outcomes offers an established method for guiding clinical and policy decisions. Conceptualized this way, genetic and genomic information is not exceptional. An RCT to assess the impact of a test on outcome is ideal, but when it is not possible or available, there are other sources of data and evidence that can be used. Banked specimens from clinical trials can be used in prospectively designed studies to address questions about prognosis and prediction, for example, and there are various ways to use other data sources, such as cohort data from a health maintenance organization (HMO). Ransohoff advised participants not to be overly focused on infrastructure, informatics, and data sharing. Rather, the focus should be on answering specific clinical questions and opportunistically designing strong research studies in different settings.

DISCUSSION

Archived Specimens

A participant noted that “you can’t test specimens if you don’t have the specimens to test.” Some fields, such as breast cancer research, collect tissues prospectively when conducting clinical trials, Dan Hayes said. It was noted, however, that the number of patients consenting to this tissue collection has been declining for unknown reasons. Furthermore, collecting other types of samples, such as germ-line DNA, can be very costly. Hayes noted that clinical trials are increasingly run by pharmaceutical companies, which do not necessarily collect and store specimens, and he suggested that the FDA require sponsors of new drug applications to have created specimen banks from their trials, although he acknowledged that intellectual property and other issues would need to be addressed.
Simon suggested that, going forward, it will be important to do prospective clinical trials and to store specimens with a prospective–retrospective analysis in mind. One issue with retrospective sample analysis is that it is not possible to optimize the way in which specimens are acquired for the various purposes that may arise in the wide range of possible future marker investigations. Analyte degradation during storage is another concern. Ransohoff agreed that decay is an issue and cautioned that it is important not to compare newer specimens with older specimens. Researchers need to be aware of the problems that can be caused by such decay, so that bias is not introduced in the results. “We can be mindful with whatever specimens we have collected in the past and hope to store future ones better,” Ransohoff said.

Hayes noted that NCI is developing a prospective systematic funding mechanism to answer some of the more basic questions regarding handling of samples, such as what are the implications if a sample sits for 3 days instead of 3 hours before processing or what happens if the sample is fixed for too long. One approach, Hayes said, is to develop an assay that works in the kind of tissue that is collected and archived currently. Another approach is to develop an assay that is so fundamentally powerful that it will change the way tissue is collected and archived going forward.

Trial Design

Participants discussed what is “clinically relevant.” There are multiple study designs that are valid, and which of them is clinically relevant depends on the particular research question. In the end, the goal is to cross a threshold of evidence based on a combination of observational, retrospective, prospective, and larger, more clinically oriented approaches. As the panelists noted, many of the studies that are currently being done are not designed to contribute to the evidentiary base in the way that is needed for genomics.

A question was raised about how to better train the next generation of clinical investigators to think about biomarker studies. Should the NIH develop some very specific training programs as we move into the genomics era? Ransohoff noted that the system rewards clinicians for getting grants and publishing papers, as opposed to producing products or expanding general knowledge.

Simon said that, in his experience, industry is extremely interested in new clinical trial designs that use predictive biomarkers or candidate predictive biomarkers in new drug development.
Industry managers are concerned, however, about what the FDA will require (e.g., prohibitively large clinical trial sizes) and about the potential for more roadblocks in developing new drugs with companion diagnostics. He added that NIH funding is driving much of the basic research on identifying the key targets that could be candidate predictive biomarkers and drug targets.

It was noted that patients are increasingly demanding access to interventions that they regard as essential to their well-being and health. Institutions are adopting technologies prematurely, and there is political pressure to approve or cover the latest technologies. This is a knowledge translation problem that needs to be addressed. It is important to consider evidence generation during the premarket phase, as trials are being designed and conducted, before product diffusion into the marketplace.