6
A Framework for Planning and Improving Evaluations of Telemedicine

In some respects, telemedicine is still a frontier. Rigorous evaluative discipline can be difficult to apply amidst the effort and enthusiasm that comes with developing projects, coping with immature technologies, gaining financial or political support, or building new markets. Systematic evaluations require time to plan, fund, and implement, and the evaluation projects inspired by the recent resurgence of interest in telemedicine generally have yet to be completed and reported. As a result, the models and information available to the committee were limited, although the committee learned much from the work that has been done.

Continued improvement in the field will depend on agreement by those interested in telemedicine that it is important to invest in systematic evaluation of telemedicine's effects on the quality, accessibility, cost, and acceptability of health care. The evaluation framework presented in this chapter attempts to relate broadly accepted strategies of health services research and evaluation research in general to some of the challenges and problems in evaluating telemedicine that have been described in preceding chapters.

Starting with the general principles set forth in Chapter 1, the committee devised several principles more specific to the task of developing the evaluation framework for clinical applications of telemedicine. First, evaluation should be viewed as an integral part of program design, implementation, and redesign. Second, evaluation should be understood as a cumulative and forward-looking process for building useful knowledge and as guidance for program or policy improvement rather than as an isolated exercise in project assessment. Third, the benefits and costs of specific telemedicine applications should be compared with those of current practice or reasonable alternatives to current practice. Careful comparison is the core of evaluation. Fourth, the potential benefits and costs of telemedicine should be broadly construed to promote the identification and measurement of unexpected and possibly unwanted effects and to encourage an assessment of overall effects on all significant parties. Fifth, in considering evaluation options and strategies, the accent should be on identifying the least costly and most practical ways of achieving desired results rather than investigating the most exciting or advanced telemedicine options. Sixth, by focusing on the clinical, financial, institutional, and social objectives and needs of those who may benefit or suffer from telemedicine, evaluations can avoid excessive preoccupation with the characteristics and demands of individual technologies.

The committee recognizes that actual evaluations face a variety of methodological, financial, political, and organizational constraints. Nonetheless, based on its review of current applications and evaluations, the committee believes that considerable improvement can be achieved in the quality and rigor of telemedicine evaluations and, thereby, in the utility of the information and guidance they provide to decisionmakers.

Planning for Evaluation

Before presenting the evaluation framework, the committee thought it was important to underscore the significance of systematic planning for evaluation. Evaluation is too often an afterthought, considered after the seemingly more important issues of putting a program together are settled. This approach jeopardizes the potential for the evaluation plan, the program plan, and program implementation to operate together to answer questions about the program's benefits and costs. For example, an effort to assess whether a telemedicine application is likely to be sustainable after a demonstration period will be more useful if the conditions for sustained operation are considered in planning the personnel, procedures, organizational linkages, outcomes and financial data, and other aspects of the test application.

Although evaluation strategies must necessarily be tailored to fit the policy or management concerns and the characteristics of different fields (e.g., education, public safety, health care), certain questions, concepts, and steps are common to the planning of successful evaluations. They include

- establishing evaluation objectives;
- setting priorities for the selection of specific applications to be evaluated;
- assessing the probable feasibility of an evaluation, including the availability of adequate funding and the likelihood of adequate cooperation from relevant parties;
- identifying the particular intervention to be evaluated, the alternatives to which it will be compared, the outcomes of interest, and the level and timing of evaluation;
- specifying the expected relationships between interventions and outcomes and the other factors that might affect these relationships; and
- developing an evaluation strategy that includes a credible and feasible research design and analysis plan.

This list reflects several decades' worth of work in many disciplines to create scientifically respectable evaluation strategies that are also useful to decisionmakers and feasible to implement (see, e.g., Suchman, 1967; Weiss, 1972; NAS, 1978; Cook and Campbell, 1979; Sechrest, 1979; OTA, 1980a; Rutman, 1980; Wortman, 1981; Tufte, 1983, 1990; Mohr, 1988; Rossi and Freeman, 1989; Flagle, 1990; Wholey et al., 1994). Although this report was not intended to be a how-to-do-it manual or to duplicate existing texts, the discussion below briefly reviews the above steps. Readers should, however, consult the references cited above, as well as those cited below and in the preceding chapter, for more detailed guidance.

Establishing Evaluation Objectives

Ideally, evaluation needs will be considered in the early stages of planning for pilot programs. This implies the identification of clear objectives for the program, the stipulation of results that would indicate whether the program has met its objectives, and the specification of steps to collect relevant data about the program's operations and effects.

Several important questions will ordinarily be considered in establishing the objectives for a particular evaluation. They include the following:

- What kinds of decisions may be affected by the results?
- Who will be the primary users of evaluation results?
- Who is sponsoring the evaluation and why?
- Who else has a major stake in the evaluation results?

Determining the objectives for a particular evaluation, and thus the important questions to be answered or concerns to be addressed, may not be completely straightforward. In some cases, programs and activities evolve incrementally without much attention to well-argued rationales. Moreover, stated rationales may not always capture program goals, perhaps because the goals have not been carefully thought through or perhaps because underlying motivations are somewhat different from those that are declared. The latter situation may require that study designs be sensitive to political currents. In any case, investigators should seek to determine what their target program was originally intended to accomplish, what objectives it may serve in the current environment (regardless of the past), or both. Even if program objectives are relatively clear, other considerations such as evaluation feasibility and the anticipated concerns of possible future funders may influence the choice of specific evaluation questions. Government agencies, private foundations, and vendors will usually have interests related to public policies or market strategies that go beyond those of specific demonstration sites. Although project objectives can sometimes be stated in some order of priority, how they will be balanced and what trade-offs will have to be considered may be difficult to specify precisely in advance.

The varying interests of project and evaluation sponsors are reflected in the expectations for the telemedicine projects supported by different federal agencies. For example, the Office of Rural Health Policy focuses on the quality, accessibility, and cost of health care in rural areas.

Although the Health Care Financing Administration is also interested in quality and access, its sponsored projects are intended primarily to provide information that will help the agency formulate payment policies for Medicare. These differences in interest notwithstanding, federal agencies have been working together (as described in the preceding chapter) to formulate an umbrella framework for project evaluation that is intended to make it easier to aggregate conclusions from individual evaluations.

Setting Priorities

As is true for any activity, resources for evaluating telemedicine applications are limited, and funding for an evaluation may compete with funding for the services to be evaluated. Making the case for research to distinguish what works from what does not is easier in theory than in practice, for example, when decisions have to be made between funding patient care at higher levels and funding program evaluation. Those sponsoring or conducting evaluations generally have to consider priorities for the use of limited resources in making two kinds of decisions: selection of topics and selection of evaluation strategies or methods.

Topic selection is often handled quite informally, but a more formal or explicit process of setting priorities may help decisionmakers focus limited resources more rationally. Several core questions are generally relevant to any priority-setting exercise (IOM, 1992b, 1995b). These questions, which are framed below in terms of possible clinical applications of telemedicine, include the following:

- How common is the telemedicine application now? How common is it likely to be?
- How significant is the problem addressed by the application?
  - prevalence of the problem
  - burden of illness (e.g., mortality, quality of life)
  - cost of managing the problem
  - variability across regions or population subgroups
- What is the likelihood that evaluation results will affect decisions about adoption of the application, its integration into routine operations, and other missions of the venture?
- Will the study wastefully duplicate or constructively supplement conclusions from other evaluations?

Most of these considerations assume a societal or policy-level perspective. They are most likely to be raised by organizations such as the Department of Defense, the National Library of Medicine, and the Office of Rural Health Policy that fund a variety of telemedicine projects and have a formal commitment to program evaluation.

Nonetheless, health plans, health care delivery organizations, and vendors of communications and information technologies may also consider similar questions in determining where they will direct resources for systematic analysis and evaluation.

The resource issues in selecting a research design revolve around three basic questions. First, what are the costs associated with different research strategies? Second, what are the costs of a strategy relative to its potential to provide answers to the evaluation questions? Third, is the cost of the evaluation strategy reasonable in relation to the potential costs and benefits of the application or program to be evaluated?

In practice, evaluations often follow targets of opportunity. That is, they are designed to take advantage of the programs or capacities of an established institution or the political appeal of certain topics. For example, if a medical center has an energetic and determined specialist willing and able to design an application and secure funds, that person's project may take priority over applications with (theoretically) more organizational relevance. Likewise, if demonstration funds are confined to projects involving rural areas, urban applications with more potential benefit may be neglected.

Determining the Feasibility of Evaluation

In addition to costs, a number of other factors may affect the feasibility of an evaluation. Some factors have behavioral or political aspects. These include whether those responsible for the application or program in question will cooperate and whether the intended beneficiaries of a program will agree to provide the information needed from them. A different but possibly relevant question is whether the intended audience for the evaluation will be receptive to results that may run counter to their preferences or self-interest. Other considerations are quite practical. Will the needed information be available on a timely basis? If not, what steps would need to be taken to provide it, and how long would it take to implement those steps? Are the time demands for information collection excessive for program staff or beneficiaries?

Another practical issue involves the timing of an evaluation. Because most evaluations look for effects within a relatively short period of a few months or perhaps two to three years, timing can be a problem if the key results emerge over a longer term and if short-term outcomes are not good proxies for these long-term results. Moreover, evaluating a program before start-up problems are resolved may produce misleading results. Evaluating a program too late may also lead to problems if, for example, users are so comfortable with an intervention that they will not agree to be part of a control group not subject to the intervention.

Feasibility assessments are also relevant to decisions about the alternatives to which the telemedicine application will be compared. If the preferred comparison sites will not or cannot participate, the comparison group may have to be the experimental group before the telemedicine application is initiated or after it has been concluded. This kind of before-and-after single-group design is a relatively weak evaluation strategy, although measures taken at multiple points before, during, or after the telemedicine test will strengthen the design (see, e.g., Cook and Campbell, 1979); a simple numerical illustration appears below.

The choice of appropriate comparisons will depend, in part, on whether the application is in the earlier or later stages of development. For example, when image quality is yet to be established, an evaluation may compare diagnoses based on digital images with diagnoses based on conventional film-based images or direct patient examination. The next stage would extend the evaluative focus to consider other issues of quality, access, cost, patient and clinician acceptance, and feasibility in real practice settings. For example, in a project described in Chapter 5, physicians in one set of rural practices would be able to consult on dermatology problems via telemedicine while physicians in another set of practices would continue their traditional referral patterns. In some cases, the alternative might be doing nothing, but only if that is what would be expected in the absence of a program.

Although general methodological and statistical principles exist to guide a multiplicity of evaluation tasks, no "one size fits all" evaluation plan exists. For example, if an evaluation is an early "test of concept" to determine the basic technical and procedural feasibility of a telemedicine application (e.g., home health monitoring), the research design and measures will likely differ from a later project intended to help decisionmakers decide whether the application should be adopted as a regular service of a health care organization.
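
To make the multiple-measurement point concrete, the sketch below contrasts a single before-and-after comparison with the strengthened version of the same single-group design. It is a minimal illustration in Python; the monthly referral counts and the effect they imply are invented for this example and do not come from any project reviewed by the committee.

```python
# Hypothetical monthly specialist-referral counts around a telemedicine start
# date; all numbers are invented for illustration.
from statistics import mean

pre = [52, 50, 47, 46, 44, 43]   # six monthly measures before the application
post = [40, 39, 38, 37, 36, 35]  # six monthly measures after it begins

# A single before-and-after pair credits the application with the entire drop.
naive_effect = post[0] - pre[-1]

# Multiple pre-intervention points expose a pre-existing downward trend (a
# history or maturation threat, in the terms of the addendum to this chapter).
monthly_trend = (pre[-1] - pre[0]) / (len(pre) - 1)
projected_post = [pre[-1] + monthly_trend * (i + 1) for i in range(len(post))]

# The more defensible estimate is the departure from the projected trend.
adjusted_effect = mean(post) - mean(projected_post)

print(f"naive single-pair estimate: {naive_effect:+.1f} referrals per month")
print(f"trend-adjusted estimate:    {adjusted_effect:+.1f} referrals per month")
```

In this contrived example, the naive comparison attributes to telemedicine a decline that the pre-intervention trend largely explains; the extra measurement points are what make the distinction visible.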

Elements of an Evaluation

The committee identified several basic elements that should be considered in planning and reporting an evaluation, whether that evaluation is very tightly focused or broader in scope. These elements include

- Project description and research question(s)
- Strategic objectives
- Clinical objectives
- Business plan or project management plan
- Level and perspective of evaluation
- Research design and analysis plan
  - characteristics of experimental and comparison groups
  - technical, clinical, and administrative processes
  - measurable outcomes
  - sensitivity analysis
- Documentation of methods and results

Although these elements are necessarily described individually and sequentially below, the development of an evaluation plan involves the continuing interplay and rethinking of elements as their conceptual and practical implications are assessed and reassessed. Moreover, during implementation, evaluators often find they need to revise the evaluation plan. In sum, the process of planning and implementing an evaluation flows logically but not always in a strictly linear fashion.
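
Read as a checklist, these elements map naturally onto a simple planning record. The sketch below is our own illustrative encoding in Python; the field names are shorthand for the elements listed above, not terminology from the committee's framework.

```python
# An illustrative checklist structure for the evaluation elements listed above.
# Field names are this sketch's shorthand, not the committee's terms.
from dataclasses import dataclass, field

@dataclass
class EvaluationPlan:
    project_description: str               # application and comparison alternative(s)
    research_questions: list[str]          # hypothesized intervention-outcome links
    strategic_objectives: list[str]        # effects on organizational or sponsor goals
    clinical_objectives: list[str]         # intended effects on quality, access, or cost
    business_plan: str                     # project management and sustainability plan
    level: str                             # clinical, institutional, or societal
    comparison_groups: list[str]           # experimental and comparison group traits
    processes: list[str]                   # technical, clinical, administrative processes
    outcomes: list[str]                    # measurable outcomes, each with a time frame
    sensitivity_analyses: list[str] = field(default_factory=list)

    def missing_elements(self) -> list[str]:
        """List elements still empty; plans are revisited as implications emerge."""
        return [name for name, value in vars(self).items() if not value]
```

A structure like this does not substitute for the interplay the committee describes; it simply makes gaps visible when the plan is revised.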

Project Description and Research Questions

The project description identifies the application that is being evaluated and the alternative(s) to which it is being compared. For example, the application might be described concisely as a dermatology consultation program using a one-way video and two-way audio link between a consulting center and two rural primary care sites. Two other rural sites would maintain their existing consulting practices. A thorough program description would more precisely and completely identify the characteristics of the telemedicine and comparison services, including relevant hardware and software employed, restrictions on the clinical problems or patients to be studied, the length of the project, and the project personnel.

Specifying the basic research question or questions, that is, the hypothesized link between the program intervention and desired outcomes, is a critical evaluation step. It encourages systematic thinking about how program interventions are expected to affect the outcomes of interest; what other factors may influence that link; and which different research designs and measurement strategies best fit the problem. By identifying the expected intermediate changes that an intervention must set in motion if the desired outcome is to occur, evaluators will be in a better position to give decisionmakers useful information on what contributed to a program's success or failure. For example, research on programs designed to change personal health habits or physician practice patterns has made it clear that not only must a service or decision guide be available, it must also be accepted and adopted (Avorn and Soumerai, 1983; Eisenberg, 1986; Soumerai and Avorn, 1990; Green, 1991; IOM, 1992a; Kaluzny et al., 1995). This research implies that potential clinician users of telemedicine, for instance, must (a) know an option is available; (b) understand the minimum details necessary to use it; (c) accept it, that is, conclude that its potential advantages (e.g., better clinical information or better patient access to care) outweigh its apparent disadvantages (e.g., inconvenient scheduling); and (d) act on the basis of their knowledge and conclusions. If one or more of these intermediate events fails to occur for all or most of the clinicians involved, then an application is likely to fail.

Strategic and Clinical Objectives

The strategic objectives in an evaluation plan state how the telemedicine project is intended to affect the organization's or sponsor's goals and how the evaluation strategy relates to those objectives. These goals might include improving health services in rural areas, keeping deployed soldiers in the field, reducing expenses for government-funded medical care, or strengthening an organization's competitive position. Competitive position is broadly construed to extend beyond the marketplace to encompass the need of public organizations to demonstrate their value to the policymakers who determine which programs will survive in an era of government retrenchment and health care cost containment.

For instance, the early strategic objectives for a telemedicine program at an academic medical center might be to add to the telemedicine knowledge base (and thereby serve the institution's research mission) and to establish or strengthen the center's research reputation in the field (and thereby lay the base for future funding). Depending on the results, later strategic objectives might relate more to the patient care mission or to reinforcing the institution's position in local, regional, and broader health care markets.

The clinical objectives state how the telemedicine project is intended to affect individual or population health by changing the quality, accessibility, or cost of care. For example, a project might be intended to allow more frequent, economical, and convenient monitoring of homebound patients than is provided by existing home and office visit arrangements, or it might be designed to improve access to appropriate specialty services for a rural population.

To the extent possible, evaluators should identify in advance what constitutes favorable or unfavorable outcomes in a particular context. For example, does a clinical application of telemedicine need to show performance better than, equivalent to, or almost as good as the alternative(s) to which it is being compared? Depending on the outcome at issue, the goals of the project sponsor, and other factors such as severe cost constraints, the answer may vary. Thus, if an application was expected to (and did) substantially reduce costs, and if costs were thought to be the dominant issue for the organization's customers, then an organization might consider a slight decrease in patient satisfaction to be tolerable. Although the judgment of the outcome or the way different outcomes are balanced may vary depending on the perspective, the definition, measurement, or calculation of the outcome should not differ.

Level and Perspective of Evaluation

Once the research questions and objectives have been established, the appropriate level and perspective of an evaluation will usually become apparent. Although they may overlap to some degree, at least three broad levels can be distinguished: clinical, institutional, and societal.

Somewhat different evaluation strategies may be appropriate for various levels of decisionmaking.

At the clinical level, the evaluative focus is on the benefits, risks, and costs of alternative approaches to a health problem. For example, does digital teleradiology provide clinically acceptable images for breast cancer screening? What are the benefits and harms of telepsychiatry compared to the alternatives? Clinical evaluations provide critical guidance for decisions about individual patient care. An institutional decision to adopt a technology will, however, ordinarily require additional evidence of its feasibility and value.

At the institutional level, the focus includes not only the application but also its organizational context, including administrative structures and practices, clients or customers, clinical and other personnel, and clinical protocols. An institution-level evaluation might ask the following kinds of questions:

- Has a teleradiology link between a rural hospital and an urban radiology center affected referrals or revenues for each institution?
- Does a telemedicine link for troops in remote locations reduce medical evacuations?
- Are clinicians and patients at each site satisfied with a teledermatology link between a university medical center and a capitated medical group?
- How do the costs compare to the alternatives (e.g., physically referring patients, adding another dermatologist to the group)?
- What factors (e.g., equipment location or ease of use) appear to underlie the results (positive or negative)?

Positive results at this stage of evaluation may encourage diffusion of a technology on an institution-by-institution basis.

At the system or societal level, the focus expands to incorporate broader health care delivery and financing issues, particularly those involving the allocation of public resources. For example, does telemedicine have a role to play in state policies to support rural medical services? Or, more specifically, how do particular telemedicine applications compare to other policy options, such as area health education centers, direct subsidies for rural hospitals, and educational loan programs linked to practice in underserved areas? If the evaluation results look positive at this level, decisionmakers may support broad adoption and diffusion of the technology.

In developing an evaluative framework and related criteria, this committee has attempted to keep in mind evaluation issues at each of these levels. The distinctions are particularly relevant in the areas of

through the research design or statistical methods. As briefly described in the addendum to this chapter, random assignment of patients to experimental and control groups is a classic method (actually, a variety of methods) to control for differences in patient characteristics. Often, however, researchers must rely on statistical or other techniques for controlling for differences. For example, to control for (rather than to determine) the effect of different provider payment methods, an evaluation might be restricted to either capitated or fee-for-service sites; alternatively, payment method might be used as a control variable in a multivariate statistical analysis.

Technical, Clinical, and Administrative Processes

In defining the application and comparison services to be evaluated and identifying the objectives of the evaluation, many elements of the project's clinical, technical, and administrative processes will become evident. The technical infrastructure includes not only the immediate hardware and software requirements of the application but also the larger information and communications systems available to support them (as described in Chapter 3). For example, if a project links an urban medical center and a rural clinic, what personnel are available to assist each site with technical problems? If the system depends on a satellite link, what scheduling and other restrictions apply? Will information about patients be available from a computer-based patient record, or will the information have to be specially entered and collected for the project?

Clinical processes are the way medical services are to be provided as part of the telemedicine project. Often, they are precisely set forth in a clinical protocol that identifies specific activities, their order and timing, responsible personnel, circumstances that trigger different protocols, and appropriate clinical documentation. Like technical processes, these processes are supported by a larger clinical care system that includes, for example, procedures for maintaining medical equipment, distributing medications, scheduling work flow, and monitoring clinical performance.

Administrative processes likewise include an array of financial, legal, personnel, security, and facilities management functions. The most immediately relevant of these (e.g., procedures for establishing new staff positions, hiring personnel, purchasing equipment and services, receiving funds, paying bills, and referring patients) will ordinarily be identified as part of program and evaluation planning.

In addition to describing technical, clinical, and administrative processes as they are expected to operate and establishing steps to implement these processes, evaluators need to track processes as they actually occur to identify shortfalls and unanticipated problems or complications. If, for example, a homebound patient is to demonstrate range of motion in front of a camera, an evaluation should document whether patients follow the instructions well enough for the distant clinician to make an assessment. To cite another case, if military clinicians try to use telemedicine services but find that the clinical protocols are irritating, the equipment does not work, or the consultants are not scheduled appropriately, an evaluation needs to document this and, if possible, suggest how the problem could be resolved. Event or problem logs kept by project personnel may be used to record (for later analysis) departures from planned processes as well as unexpected events and problems. Without efforts to implement interventions as planned and to monitor the extent to which this happens, evaluators will find it difficult to distinguish between a failure of the telemedicine application and a failure to implement the application as intended. Such distinctions are critically important to those making decisions about whether to adopt, substantially redesign, or discontinue telemedicine programs.
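
A minimal sketch of the kind of event or problem log described above, assuming a simple comma-separated file and a handful of illustrative categories; the fields, file name, and example entry are invented for this illustration.

```python
# A minimal departure-from-plan log for later analysis; fields are illustrative.
import csv
from datetime import datetime

LOG_FIELDS = ["timestamp", "site", "category", "planned_process", "description"]

def log_event(path, site, category, planned_process, description):
    """Append one unexpected event or departure from a planned process."""
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=LOG_FIELDS)
        if f.tell() == 0:  # new file: write the header row first
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now().isoformat(timespec="seconds"),
            "site": site,
            "category": category,  # e.g., equipment, scheduling, protocol
            "planned_process": planned_process,
            "description": description,
        })

# Example: a consult that deviated from the clinical protocol.
log_event("problem_log.csv", "rural_clinic_2", "equipment",
          "teledermatology consult", "camera failed; consult completed by phone")
```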

Measurable Outcomes

Measurable outcomes identify the variables and the data to be collected to determine whether the project is meeting its clinical and strategic objectives. This committee was asked to focus on issues in evaluating quality, access, and costs for clinical applications of telemedicine. It also concluded that the acceptability of telemedicine to patients and clinicians warranted separate attention, although patient satisfaction frequently figures in assessments of quality of care, access, and cost-effectiveness. Depending on its objectives, an evaluation may consider a range of other outcomes related to an organization's competitive position, its relationships with other institutions, the demand for different kinds of health care personnel, the economic health of a community, or other effects.

In addition to outcomes desired from the project, decisionmakers will also benefit from evaluations that attempt to identify and measure possible unwanted and unexpected outcomes. A case in point is the "training effect" that appears to operate in some telemedicine programs, such that the distant clinicians who participate in telemedicine consultations learn enough about diagnosis and patient management that they no longer need telemedicine consultations when they encounter certain patient problems. The benefit of such clinician education, however, may create a dilemma if demand for telemedicine consultations drops too low to justify continuation of a program. How such results might factor into decisions about the future of an application is not clear, but they would undoubtedly affect the interpretation of utilization statistics.

The specification of outcomes to be measured should describe the time frame for the measurements, for example, rehospitalization within six months of discharge or patient satisfaction with telemedicine at the time of service. One of the most frequent limitations of clinical and program evaluations is their focus on relatively short-term outcomes. This focus is born of time and budget constraints and data collection difficulties, which are especially acute for longer-term health and cost outcomes. Depending on the objectives, circumstances, and resources, an evaluation may involve a range of immediate, intermediate, and long-term outcome measures, as discussed further in Chapter 7.

Sensitivity Analysis

Because the committee believed that the fast pace of change and other uncertainties surrounding telemedicine applications were particular challenges, it highlighted one element of an analysis plan, sensitivity analyses, as a distinct item in the evaluation framework. Sensitivity analyses explore the extent to which conclusions may change if values of key variables or assumptions change. For example, financial projections may show the impact of different assumptions about costs for purchasing and maintaining telecommunications and other equipment. As noted above, a particular problem for telemedicine evaluations is the instability of the technology and its environment. With data capture, transmission, and display technologies improving in quality and declining in cost, evaluators may need to consider (a) how sensitive their conclusions may be to technological change and (b) how analyses might be constructed to estimate the impact of certain kinds of changes. For example, an analysis of cost-effectiveness could include a sensitivity analysis that incorporates different assumptions about the timing and cost of key hardware or software upgrades or replacement (Briggs et al., 1994; Hamby, 1995).
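
As an illustration of such an analysis, the sketch below varies equipment cost and useful life around a base case and recomputes an average cost per consultation. Every figure is an assumption invented for this example; none is drawn from the report or from actual projects.

```python
# A sketch of a one-way sensitivity analysis for a telemedicine cost estimate.
# All figures are invented for illustration.

ANNUAL_CONSULTS = 400          # assumed consultation volume
STAFF_COST_PER_CONSULT = 55.0  # assumed clinician and coordinator time ($)

def cost_per_consult(equipment_cost, useful_life_years, annual_maintenance):
    """Average cost per teleconsultation under one set of assumptions."""
    annual_equipment = equipment_cost / useful_life_years  # straight-line amortization
    fixed = annual_equipment + annual_maintenance
    return STAFF_COST_PER_CONSULT + fixed / ANNUAL_CONSULTS

# Vary one assumption at a time around a base case to see how conclusions shift
# if prices fall or equipment must be replaced sooner than planned.
base = dict(equipment_cost=60_000, useful_life_years=5, annual_maintenance=6_000)
print(f"base case: ${cost_per_consult(**base):.2f} per consult")

for life in (2, 3, 5, 7):              # earlier- or later-than-planned replacement
    scenario = dict(base, useful_life_years=life)
    print(f"useful life {life} yr: ${cost_per_consult(**scenario):.2f} per consult")

for price in (30_000, 60_000, 90_000):  # falling or rising hardware prices
    scenario = dict(base, equipment_cost=price)
    print(f"equipment ${price:,}: ${cost_per_consult(**scenario):.2f} per consult")
```

The same pattern extends to two-way or probabilistic analyses; the essential point is that the conclusion (here, cost per consultation) is reported across the plausible range of each uncertain assumption rather than for a single set of values.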

Documentation of Methods and Results

In reviewing evaluations of telemedicine applications, the committee was often frustrated by the incomplete or casual documentation of the methods employed and the specific findings. One result was to diminish the utility and credibility of the reports. Efforts to identify weaknesses and improve documentation in research reports have been undertaken by a number of medical and health services research journals, including the Journal of the American Medical Association, Annals of Internal Medicine, Health Services Research, and Medical Care. They have developed guidelines and procedures to improve the clarity and specificity of abstracts, the processes of peer review, and the reporting of methods (including randomization procedures, sample sizes, and statistical power), data analysis, and sponsorship (see, e.g., DerSimonian et al., 1982; Pocock et al., 1987; Haynes et al., 1990; Altman and Goodman, 1994; Moher et al., 1994; Schulz et al., 1994; Sweitzer and Cullen, 1994; Taddio et al., 1994; Rennie, 1995; Schulz, 1995). At least one telemedicine publication, Telemedicine Journal, is attempting to follow this guidance. Although these suggestions have been aimed at journal editors, they have the important additional benefit of reinforcing basic principles of sound research and statistical analysis.

Evaluation and Continuous Improvement

As noted at the beginning of this chapter, one objective of evaluation and applied research generally is to provide decisionmakers with information that will help them redesign and improve programs. This is particularly true for evaluations conducted in the context of a continuous quality improvement process. The tenets of continuous quality improvement, which were derived in considerable measure from industrial applications, are described in detail elsewhere (see, e.g., Deming, 1986; Batalden and Buchanan, 1989; Berwick, 1989; Berwick et al., 1990; IOM, 1990c, 1992a; Roberts, 1991; Williamson, 1991; Horn and Hopkins, 1994).

Consistent with the evaluation framework set forth here are the principles calling for (a) planning, control, assessment, and improvement activities grounded in statistical and scientific precepts and techniques and (b) standardization of processes to reduce the opportunity for error and to link specific care processes to health outcomes. Another key principle emphasizes close relationships between customers and suppliers, for example, patients and providers or providers and suppliers of equipment or services. The application of this principle to the design and evaluation of telemedicine applications would address one of the human factor problems identified in Chapter 3: inadequate assessment of and attention to user needs.

The very process of implementing a program and its evaluation components may make evaluators aware of program deficiencies or environmental obstacles to program success. For example, potential participants may balk at using equipment that is inconveniently located or difficult to apply. In addition, the evaluation frameworks and plans reviewed by the committee suggested a number of other means for securing information for program improvement. These included logs kept by clinical or technical personnel and individual or group "debriefing" interviews with participants. These strategies may identify poorly designed or located equipment, "user-unfriendly" software, inadequate training of personnel, bureaucratic burdens, or deficient patient record systems.

Unfortunately, depending on the problems identified, the path to program redesign or improvement may or may not lie within the feasible reach of program administrators or sponsors. For example, some equipment deficiencies may be corrected by switching hardware, but others may be resolved only if manufacturers are willing or technically able to fix them. In general, evaluations based on continuous improvement principles will expect that mistakes or poor outcomes are more often the result of system defects (e.g., poor scheduling systems) than of individual deficiencies. In an environment governed by this outlook, program evaluations may provoke less apprehension and win more cooperation from those whose activities are being studied.

Conclusion

Based on its review of current applications and evaluations, the committee concluded that significant improvements are possible in the quality and rigor of telemedicine evaluations. This chapter has emphasized the importance of considering evaluation objectives and strategies during the early stages of program planning. Likewise, it has stressed the value of developing a business plan that explicitly states how the evaluation will provide information to help decisionmakers determine whether a telemedicine application is useful, consistent with their strategic plan, and sustainable beyond the initial evaluation stage.

The fast pace of change and other uncertainties surrounding telemedicine applications argue strongly for sensitivity analyses that explore how conclusions may change if values of key variables or assumptions change. They also argue for thinking broadly about potential benefits and costs, carefully documenting how the technical infrastructure and the clinical processes of care were intended to operate, and tracking what actually does occur. This latter step is crucial if evaluators who find negative results are to determine, for example, whether the hypothesis linking independent and dependent variables is untenable or whether the hypothesis was not actually tested because the application was not implemented as intended. By tracking what actually happened, evaluators also may achieve a fuller understanding of critical success factors or the factors that, if changed, might improve results.

The evaluation framework presented in this chapter is, in the lexicon of information technologies, a basic evaluation platform that incorporates general evaluation principles, principles adapted to the health care field, and elements of strategies proposed by those encouraging and conducting evaluations of clinical telemedicine. The framework is intended to promote improvements in individual evaluations, but the committee also encourages the coordination of evaluation strategies across projects and organizations, when possible.

Addendum: Experimental, Quasi-Experimental, and Nonexperimental Designs

As noted in the text of Chapter 6, a large literature on evaluation research designs exists to guide those planning evaluations of telemedicine and other activities (see, e.g., Campbell and Stanley, 1963; Suchman, 1967; Weiss, 1972; Cook and Campbell, 1979; Sechrest, 1979; Rossi et al., 1983; Fink, 1993; Wholey et al., 1994).

One value of this work is that much of it is not just theoretical but highly practical in its attempts to develop and encourage creative but respectable ways of handling difficult evaluation problems. These efforts revolve around concerns with internal and external validity. In a 1963 discussion that has become a classic source for evaluation research, Campbell and Stanley set forth an analysis of validity and threats to validity and provided a systematic assessment of the strengths and limitations of various common research designs. Internal validity focuses on the fundamental question: "Did in fact the experimental treatments make a difference in this specific experimental instance?" (Campbell and Stanley, 1963, p. 5). External validity focuses on the extent to which the procedures and results of a particular experiment can be generalized to other populations, settings, and circumstances.

Box 6.1 lists the common threats to internal validity as identified by Campbell and Stanley. It also provides hypothetical illustrations of how they may appear in evaluations of telemedicine applications. Threats to external validity involve a variety of differences between the groups studied and the groups to which the results might be generalized. For example, generalizing to urban settings from projects in rural areas may be risky. A project that used physicians knowledgeable and enthusiastic about computer-assisted medicine might not produce results applicable to physicians without such knowledge and enthusiasm. A project undertaken in a fee-for-service environment might be less relevant in managed care markets.

In general, research designs can be categorized as experimental, quasi-experimental, or nonexperimental. A true experimental design has two special characteristics. The first is that the design includes at least one group that is subjected to a carefully specified intervention or treatment and another that is subjected to a different intervention. The second characteristic is the random assignment of the subjects (e.g., patients) to the experimental and control groups. Ideally, experimental designs are also "double blinded" in that neither the investigators nor the patients know which group is receiving which treatment. The most highly structured randomized clinical trials (RCTs) have generally aimed to establish efficacy (effects under tightly controlled conditions) rather than effectiveness (results under actual conditions of practice).
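
The random assignment that defines a true experiment can be sketched very simply. The fragment below shows the core idea in Python, assuming a fixed list of enrolled patients; real trials typically use blocked or stratified schemes and documented allocation concealment, which this illustration omits.

```python
# A minimal sketch of simple random assignment to two study arms.
import random

def randomize(patient_ids, seed=None):
    """Split enrolled patients at random into telemedicine and control arms."""
    rng = random.Random(seed)  # a recorded seed makes the allocation auditable
    ids = list(patient_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"telemedicine": ids[:half], "control": ids[half:]}

arms = randomize([f"pt{n:03d}" for n in range(1, 21)], seed=137)
print(arms["telemedicine"])
print(arms["control"])
```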

Box 6.1 Threats to the Internal Validity of Evaluations

"History, the specific events occurring between the first and second measurement in addition to the experimental measurement." Example: During the course of a telepsychiatry project in a poor rural area, a public clinic adds a psychiatric social worker to its staff and thereby makes access to on-site mental health services easier.

"Maturation, processes within the respondents [those being studied] operating as a function of the passage of time per se (not specific to the particular events), including growing older, hungrier, more tired, and the like." Example: In a long-term monitoring program for seriously ill, homebound elderly patients, an unrecognized decrease in functional abilities may limit patients' capacity to carry out instructions successfully, potentially compromising evaluators' ability to assess the program and suggest ways it might be redesigned.

"Testing, the effects of taking a test upon the scores of a second testing." Example: As primary care physicians participate in a series of teleconsultations for a particular clinical problem, they gain sufficient expertise in diagnosis and management that they no longer seek consultations for the problem.

"Instrumentation, in which changes in the calibration of a measuring instrument or changes in the observers or scorers used may produce changes in the obtained measurements." Example: In the midst of a test of digital radiography, a new radiologist, who replaces a more experienced radiologist, takes over the comparison of digitally transmitted images against original films.

"Statistical regression [regression to the mean], operating where groups have been selected on the basis of their extreme scores." Example: Of diabetic patients who have been treated for hypoglycemia, those who test lowest on their understanding of appropriate dietary practices are called weekly by nurses or nutritionists.

"Biases resulting in differential selection of respondents for the comparison groups." Example: In a telepsychiatry evaluation that involved telemedicine and control sites, the control sites include patients with greater experience with psychiatric intervention.

"Experimental mortality, or differential loss of respondents for the comparison groups." Example: In a home care evaluation, sicker patients drop out of the comparison group that was not receiving special services.

SOURCE: Quoted material excerpted from Campbell and Stanley, 1963, pp. 5-6.

The strength of RCTs is based on the protection of internal validity through randomization, restrictive patient selection criteria, masking from researchers and patients which patients are receiving which treatments, and strict control of the treatment protocols. A well-designed RCT may still have problems with external validity or generalizability to less controlled practice settings. For example, a recent retrospective analysis of data from two large HMOs on patients who discontinued antihyperlipidemic drugs (drugs to treat high cholesterol) because of adverse effects and therapeutic ineffectiveness suggested that "rates reported in randomized clinical trials may not give an accurate reflection of the tolerability or effectiveness of therapy in the general population" under ordinary conditions (Andrade et al., 1995).

From a practical perspective, traditional, tightly controlled RCTs suffer several handicaps: they tend to be expensive, time-consuming, complex to plan and administer, and ethically or practically unsuitable for some research questions.* Thus, researchers have sought to develop adaptations and alternatives. One adaptation of the RCT is the "large simple trial" (Zelen, 1993). Large simple trials are simple primarily in that they ask fewer questions than many traditional RCTs. They would still require random assignment but would also rely more on statistical than physical controls of the research setting, and data collection is streamlined. Patients and clinicians anywhere in the United States or elsewhere could participate in a clinical trial if they met defined eligibility criteria and agreed to follow (and document that they followed) specific treatment protocols. Depending on the complexity of the research and treatment protocols, this openness may demand sophisticated and generally expensive programs of training, monitoring, operating assistance, and auditing. In one of its last reports, the Office of Technology Assessment urged those involved with effectiveness research to explore innovative ways to conduct randomized clinical trials and incorporate them into ordinary practice (OTA, 1994).

* Among other technologies, drugs are frequent subjects for randomized clinical trials, in large part because the introduction of new drugs requires approval from the Food and Drug Administration based on evidence of safety and efficacy. Some surgical procedures have been the subject of RCTs, but many are introduced without any rigorous evaluation.

Another option, the clinical practice study or effectiveness trial, generally involves a relatively rigorous form of quasi-experimental research (Horn and Hopkins, 1994; McDonald and Overhage, 1994; Stiell et al., 1994). Quasi-experimental designs cover a variety of strategies that may or may not include a control group or random assignment. Although they are weaker on internal validity, clinical practice studies or effectiveness trials better represent actual conditions of practice and may be somewhat less expensive and time consuming. They do not insist on homogeneous patient populations that exclude those with comorbidities or complications that may confound analysis of the link between the experimental intervention and patient outcomes. Instead, they measure relevant patient characteristics using severity assessment tools and statistically adjust for differences in experimental and comparison groups. Further, they accommodate departures from rigid treatment protocols by carefully monitoring and measuring actual treatments and then incorporating these data in the statistical analysis. Because this approach does not disqualify large numbers of patients, it is easier to generate the numbers of cases needed for comparisons. Using regression or other statistical techniques, researchers test which process steps are associated with desirable quality, access, or cost outcomes for different kinds of patients; a sketch of this adjustment logic appears below. Although clinical practice studies tend to focus on shorter- rather than longer-term outcomes, the outcomes include effects that are noticeable and important to patients rather than only those that are physiologically measurable through laboratory or other tests. Such studies are often designed to be replicated easily so that they can be undertaken at multiple sites.

Sophisticated computer-based patient information systems make it more acceptable to rely, as a "second best" strategy and with appropriate caution, on statistical control techniques rather than randomization and physical control of "confounding" variables. The objective of such alternatives is not to devalue or replace the RCT but to develop additional sources of systematic information on outcomes that will improve on the anecdotal and informal knowledge base that characterizes much of clinical practice (IOM, 1992a; Horn and Hopkins, 1994; OTA, 1994). Some of the telemedicine research projects discussed in Chapter 5 attempt experimental and quasi-experimental research strategies.
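
The severity-adjustment logic described above can be illustrated with synthetic data: when sicker patients are likelier to receive the telemedicine service, a raw comparison of outcomes is confounded, while a regression that includes the measured severity score recovers the built-in effect. The data, variable names, and effect size below are all invented for the illustration.

```python
# Synthetic illustration of statistical adjustment for severity differences.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
severity = rng.uniform(0, 10, n)  # score from an assumed severity assessment tool

# Confounding by indication: sicker patients are likelier to get telemedicine.
telemedicine = (severity + rng.normal(0, 2, n) > 5).astype(int)

# Hypothetical outcome: days to resolution rise with severity; telemedicine is
# constructed here to reduce them by two days, plus noise.
days = 5 + 1.5 * severity - 2.0 * telemedicine + rng.normal(0, 2, n)

df = pd.DataFrame({"days": days, "telemedicine": telemedicine,
                   "severity": severity})

# The raw comparison is biased because the telemedicine group is sicker.
naive = df.groupby("telemedicine")["days"].mean().diff().iloc[-1]

# Including severity as a covariate adjusts the comparison statistically.
adjusted = smf.ols("days ~ telemedicine + severity", data=df).fit()

print(f"naive difference:  {naive:+.2f} days")
print(f"adjusted estimate: {adjusted.params['telemedicine']:+.2f} days")
```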

Even with less demanding designs, tension will exist between the principles of design and the pressures of real-world evaluation.

Another stream of work on alternatives or supplements to the RCT has emphasized nonexperimental research based on the retrospective analysis of large databases that have often been compiled for other purposes (Roos et al., 1982; Moses, 1990; Hannan et al., 1992; NAHDO, 1993). Until telemedicine applications become much more common and routine and are assigned codes to identify them, large databases are unlikely to be useful sources of data on telemedicine applications. Nonetheless, those looking ahead to more widespread use of telemedicine should consider how routine collection of data about telemedicine may be useful and what would be required to incorporate such data in large data systems.

The appeal of these data sources lies in their relative convenience, large numbers of cases, and ease of statistical analysis. Questions or criticisms related to the use of large databases for health services research, performance monitoring, and other purposes involve their completeness, accuracy, relevance, and security from unauthorized access (IOM, 1994b; Maklan et al., 1994; Kuller, 1995). A variety of initiatives have focused on means to reduce the amount of missing data, validate and improve coding of clinical and other information, add information (e.g., death records), and develop methods to adjust comparisons for differences in severity of patient conditions (IOM, 1994b; Roos et al., 1995). Even with improvements, data collected for one purpose (e.g., claims administration) may remain questionable for other purposes (e.g., outcomes research) if they lack reliable information about patient medical status, processes of care, and other variables. The OTA, for example, warned that "focusing on this research method as a relatively simple, inexpensive first-line tool for answering comparative questions [about the effectiveness of treatment alternatives] is unwarranted" (OTA, 1994, p. 74).