3
Improvements to CTR Metrics
INTRODUCTION AND GENERAL DISCUSSION OF METRICS
In this chapter, the committee begins by discussing metrics in general, including
terminology and a kind of taxonomy of metrics, what metrics cannot do, and some thoughts on
measuring trust and confidence. In the recommendations section, the committee offers
recommendations about objectives and partnership, working with and learning from other
agencies, and prioritization. Following the first set of recommendations, the committee gives
specific advice concerning border security (the Weapons of Mass Destruction Proliferation
Prevention Program, WMD-PPP) and the Cooperative Biological Engagement Program (CBEP)
because these programs illustrate the points well.23 Those sections are followed by
recommendations on independent evaluation and factoring time and change into metrics. Chapter
3 closes with a set of important issues for Cooperative Threat Reduction (CTR) that are
sometimes seen as metrics issues, but in fact are not.
On Metrics
Metrics are used in many private and public domains: strategic planning and assessment;
business planning and assessment; manufacturing and service planning and assessment; policy
analysis; campaign planning and assessment; product/service design and assessment; systems
analysis; systems engineering; and engineering management. The term “metrics” is very broad,
and additional terms are useful to differentiate the full range of metrics that are needed for a
large, important program like the CTR Program. A list of the most common terms used for
metrics can be found in Box 3-1.
The terms in Box 3-1 provide a much richer understanding of the purposes and the types
of metrics. Unfortunately, there is significant overlap in the terminology. Systems thinking can
help to organize similar terms into categories; the resulting reduced set of terms allowed the
committee to assess the current CTR metrics and identify opportunities for
improvement. Figure 3-1 shows six categories of metrics: input, process, environmental, output,
value, and benefit cost metrics. The term on the top of each category is the term the committee
uses in the report to assess the CTR metrics. The terms in parentheses are similar terms binned in
the categories. Leading and lagging indicators are shown because they are an alternative way of
grouping the six metric categories based on the timeliness of the metric for determining
corrective action.
In addition to the purpose of the metric and the category of metrics, another important distinction
is what the committee will call the “type” of metric or measure. The type of measure has two
dimensions: the linkage to the objective, and the type of the scale. A “direct metric” is a measure
directly linked to the objective, such as the number of nuclear missiles dismantled. Analysts and
managers prefer to use a direct metric because it directly reflects progress toward the objective
23
The committee offers general advice for all of the CTR programs, but CBEP and WMD-PPP are open-ended,
expanding capacity-building programs and the committee has additional specific advice for them.
30 IMPROVING METRICS FOR THE DOD CTR PROGRAM
that a program is trying to achieve. “Proxy measures” are indirect measures used when direct
metrics are not possible (cannot be measured) or are unavailable, such as when the direct metric
is not timely. For example, cholesterol level is a proxy measure for the risk of heart disease. A
proxy measure typically does not encompass all of the important aspects of status or
performance, so managers sometimes use multiple proxy measures. Multiple proxies can
increase the data collection requirements and only provide marginal capability to assess the
achievement of the objective, but they may be all that are available.
The type of measure can be natural or constructed. A natural measure is a metric that has
a commonly accepted definition arising from an objective measurement. Temperature and weight
are examples of natural scales, as are silos destroyed and chemical destruction facility
availability. Natural scales are preferred because people understand them intuitively and they are
easier to measure. Constructed scales are designed to measure the achievement of an objective.
The five star scale used to assess the quality of a product, security classification levels
(unclassified, confidential, secret), heat index, and body mass index are all constructed scales.
Constructed scales require clear definitions to be operational to assessors and managers.
BOX 3-1 Common Terms Used for Metrics
• Cost effectiveness: Assesses attainment of an objective or task relative to the costs.
• Benefit cost measure: A metric reflecting benefits attained relative to the costs, which may
be monetary or other costs.
• Value versus costs: Compares the value aligned with our objectives to the costs.
• Value measure: Aligns with our objectives, which we value.
• Measure of effectiveness: Assesses achievement of an assigned objective or task.
• Measure of merit: Another term that tells us how we achieve something we care about.
• Outcome measure: Focuses on the ultimate intended or unintended results.
• Performance measure: Describes how well a subsystem, system, or process meets its
required performance.
• Output measure: Outputs usually refer to direct results of a process.
• Efficiency measure: A measure of how resources are used in a project or program.
• Process measure: Captures some attribute of a process.
• Input measure: Identifies the resources or activities provided prior to a process.
• Resource measure: Identifies the resources (e.g., dollars, people’s time, or materials) used
by a system or process.
• Leading indicators: Measures that provide early indicators of impact and effectiveness and
allow managers to take corrective action on the process if required.
• Lagging indicator: A measure that is available after the current activity, allowing corrective
action only for future processes and operations.
• Environmental measure: A factor that could have a direct or indirect impact on the system
or process but is not under the control of the program or project managers.
• Adversary measures: Assess the potential or actual actions of adversaries that can have a
direct impact on the process or system.
• Threat measures: Assess capabilities and intent of a potential opponent.
• Criteria: Factors that differentiate alternatives.
• Attribute: Characteristic of a system or process.
• Metric: A standard of measurement.
[Figure 3-1 diagram not reproduced. Labels recovered from the diagram: Environment; System;
Input (Resource) measures; Process (Efficiency) measures; Output (Effectiveness, Performance)
measures; Value (Outcome, Benefit, Merit, Criteria) measures; Benefit Cost (Value vs Cost,
Cost Effectiveness); Environmental (Adversary, Threat) measures; Leading Indicators and
Lagging Indicators along a Time axis.]
FIGURE 3-1 Metrics organized into six categories using systems thinking. The term on the top of each category is
the term the committee uses in the report to assess the CTR metrics. Depending on its timeliness, a metric could
also be assessed as a leading or lagging indicator.
For each objective, the Department of Defense’s (DoD) best option is to identify a
natural, direct metric. If that is not possible, the objective may be decomposed into tasks that are
more measurable. If DoD cannot identify a natural, direct metric, then DoD should develop a
constructed scale metric. Because proxy measures only capture part of the objective, proxy
measures should only be used if decomposition and constructed scales are not practical (see
Table 3-1). There are other important qualities that make a metric more or less attractive:
validity, reliability, sensitivity to change from interventions, cost of collection, and overlap with
other measures of objectives.
TABLE 3-1 Classification of metrics by alignment with objective and type of scale ranked according to preference
(1 to 4). The first preference is a direct, natural measure because it focuses directly on the objective and is well
understood. The second preference is a direct, constructed measure. It is more important to be directly focused on
the objective even if managers have to design a constructed scale. The third preference is a natural proxy measure,
and the last preference is a constructed, proxy measure.
Type of scale | Alignment with objective: Direct | Alignment with objective: Proxy
Natural | 1 | 3
Constructed | 2 | 4
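The preference ordering in Table 3-1 can be illustrated with a small lookup. The following is a minimal sketch, not part of the committee's method: the class and function names are hypothetical, and the classification of the second and third candidate metrics is an assumption made only for illustration (the first is the report's own example of a direct, natural metric).

```python
from enum import Enum


class Alignment(Enum):
    DIRECT = "direct"   # measure is linked directly to the objective
    PROXY = "proxy"     # indirect measure used when a direct one is unavailable


class Scale(Enum):
    NATURAL = "natural"           # commonly accepted, objectively measured
    CONSTRUCTED = "constructed"   # designed scale that needs clear definitions


# Preference ranks copied from Table 3-1 (1 = most preferred, 4 = least).
PREFERENCE = {
    (Alignment.DIRECT, Scale.NATURAL): 1,
    (Alignment.DIRECT, Scale.CONSTRUCTED): 2,
    (Alignment.PROXY, Scale.NATURAL): 3,
    (Alignment.PROXY, Scale.CONSTRUCTED): 4,
}


def preference_rank(alignment, scale):
    """Return the Table 3-1 preference rank for a metric type."""
    return PREFERENCE[(alignment, scale)]


def most_preferred(candidates):
    """Pick the candidate (name, alignment, scale) with the best rank."""
    return min(candidates, key=lambda c: preference_rank(c[1], c[2]))


# Hypothetical candidate metrics for a single objective.
candidates = [
    ("nuclear missiles dismantled", Alignment.DIRECT, Scale.NATURAL),
    ("five-point readiness rating", Alignment.DIRECT, Scale.CONSTRUCTED),
    ("training sessions delivered", Alignment.PROXY, Scale.NATURAL),
]

for name, alignment, scale in candidates:
    print(name, preference_rank(alignment, scale))
```

A ranking of this kind only orders metric types. The other qualities noted above (validity, reliability, sensitivity to change, cost of collection, and overlap with other measures) would still need to be weighed separately.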
What Metrics Cannot Do
The committee already described the important role of metrics in strategic (program)
planning and tactical (project) implementation. However, it is also important to emphasize that
metrics are a management tool and are not an end in themselves. There is some information that
metrics cannot tell us, and there are several metrics implementation challenges.
First, metrics are only as good as the strategic thinking devoted to their development. For
complex problems there is always the potential that an important objective was omitted when
metrics were developed. Because metrics should follow from and be aligned with objectives,
weaknesses or oversights in strategic thinking may lead to deficiencies in metrics. Stated another
way, no set of metrics, no matter how elaborate or sophisticated, can compensate for a flawed
strategy; hence the necessity of a sound strategic underpinning and well-formulated objectives.
The best guarantee that one has a full set of metrics is meaningful discussion with all key
stakeholders including the strategic and tactical leaders in the host country.
One aspect of strategy development that often requires special attention or emphasis is
the identification of alternatives. There are usually multiple options for pursuing a given set of
objectives, each with its own advantages and disadvantages. Arguments about metrics sometimes
reflect more fundamental disagreements about how best to solve a problem, not how to measure
it. Although a set of metrics can measure the impact and effectiveness of a project or program
against objectives, and can even be used to compare competing options in some cases, metrics
alone cannot ensure that the best option or options have been identified and are being
implemented.
Second, metrics do not necessarily or automatically identify the cause of an adverse trend
on a metric. This is especially true for output metrics. Managers and overseers may need a
separate investigation, perhaps using specialized measures of the contributing causes to events or
outputs to identify and understand the root cause of a metric trend. Following these trends can
also lead to the observation of “unintended consequences,” which may be positive or negative
but worthy of note.
Third, metrics do not identify the best corrective action when a metric trend is not
according to plan. Focusing on the root cause of an adverse trend on a metric may require a new
objective and one or more new metrics. Unfortunately, if the metric is a lagging metric it may be
too late for effective corrective action. It may, however, still be useful in future planning.
Fourth, it is important to remember that metrics are dynamic. The initial metrics may be
well suited to the formulation and initiation stages of a program but may not adequately account
for issues and concerns that arise later during program implementation. Therefore, it is important
that the objectives and the metrics be dynamically updated throughout the project lifecycle, while
avoiding the problem of “moving the goalpost.”
Fifth, metrics cannot measure something that does not happen. Deterrence and some
kinds of prevention are in some sense unknowable and unmeasurable. Proxy measures may be
possible in these cases. The U.S. Government has a model of what leads to or exhibits risk. That
model may be explicit (a stated set of risk factors that can lead to national security risks for the
United States) or implicit (unstated assumptions about risk factors), but the same model forms the
basis for program strategy and planning. Proxy measures can be derived from the model, but the
most desired outcomes (no WMD attacks) cannot be shown to follow directly from any
particular actions.
This is why in assessing the overall effectiveness of the metrics for a program like CBEP,
DoD has to step back from the numbers and ask itself whether the metrics reveal if the program
is succeeding in establishing the human and institutional relationships, the interactions, and the
transparency that result in trust and confidence between and among the partners.
Measuring Trust, Confidence, and Goodwill
How does one measure intangible impacts such as goodwill, mutual respect, collegiality,
partnership, trust, and confidence? These are some of the objectives of scientist engagement
programs, and they seem to be goals for which anecdotal rather than hard quantitative measures
might be appropriate.
In some fields of endeavor, surveys are crafted to learn respondents’ opinions, and
conclusions are drawn from sufficiently large samples. For example, an extensive study of
Russian WMD scientists was conducted to determine if respondents would be tempted by offers
of WMD employment in rogue states (Ball et al., 2004). In the absence of International Science
and Technology Center assistance, the risk of scientists “going rogue” was assessed as being
significantly greater than with the assistance firmly in place. In the words of the study’s authors:
Our data from an unprecedented survey of 602 Russian physicists, biologists, and chemists
suggest that the threat of WMD brain drain from Russia should still be at the forefront of our
attention. Roughly 20 percent of Russian physicists, biologists, and chemists say they would
consider working in rogue nations such as North Korea, Iran, Syria, or Iraq (still considered a
rogue state at the time of the survey). At the same time, the data reveal that U.S. and Western
nonproliferation assistance programs work. They significantly reduce the likelihood that Russian
scientists would consider working in these countries. (Ball et al., 2004)
Surveys may be helpful in CTR programs. At the same time, DoD should be careful in
relying on surveys, as there can be volunteer or self-selection bias (who participates) and
response bias (deliberate or inadvertent skewing of responses to conform to an expectation rather
than reflecting true beliefs), and a variety of other potential problems.
Another study (Revill, 2009) used surveys and other non-traditional measures to assess
the effectiveness of dual-use educational efforts. In that study, concepts from the social sciences
are proposed to augment the toolkit of traditional metrics. The author concludes that a multi-
pronged approach to assessment seems the most likely to yield useful actionable results, at least
in their field of endeavor. They note:
With these points in mind, it is possible to identify a number of traditional and less orthodox
methods of evaluation. However, as standalone measures none is likely to be successful in
generating a holistic evaluation of any programme or process. Rather it is likely that a mixed
method will be required to evaluate dual-use education that blends a number of different methods
including inter alia, Questionnaire/Surveys; Likert model questionnaires; Social Network Analysis
Content Analysis and Impact Evaluations. Using these combinations of measures it is argued that
there is great scope for assessing dual-use educational objectives during and after a project and
thus providing a much clearer understanding of what the process has achieved and how it could be
improved in the future.
Ultimately, one might simply take responsible, reliable, and collegial behavior to be an
indicator of outcomes (i.e., did the scientist contact international colleagues when
dangerous infections or anomalies arose? Are the laboratory operations transparent and
comfortable with international collaboration?).
RECOMMENDATIONS TO IMPROVE METRICS
The committee has several recommendations for how DoD can improve its metrics for
the CTR programs.
Objectives and Partnership
For each program in the DoD Metrics Report, DoD should include a concise
statement of its objectives and of how the program is intended to reduce
threat or risk.
Objectives for projects and the overall CTR Program in a partner country
are developed jointly between the United States and the partner country. An
agreed set of metrics should also be built into projects from the outset. They
may change, but the parties responsible for the projects should know at any
given time the metrics that will be used to measure impact and effectiveness.
The CTR Program was established by Congress with clear authorities and each activity
must begin with a clear statement of the United States’ authorized overall objectives. DoD then
needs to work with partner countries to define mutual, high-level objectives for their joint efforts.
To measure impacts and effectiveness, metrics must include output metrics (e.g., changes in
interdiction rates at borders) not just input metrics (e.g., training materials provided). DoD
should identify the capabilities it needs to achieve its high-level objectives, the objectives for
each capability, and where possible develop its metrics from outputs linked to the capacities that
the programs are trying to build. DoD should also build the program and capability objectives
into the DoD-partner agreement, and include provisions for metrics and the means to carry out
those measurements.
As DoD takes CTR to new countries, it has opportunities to utilize lessons from 20 years
of experience with cooperative threat reduction. The committee summarizes a logical order to
developing metrics as follows.
1. State clearly the high-level objectives of the overall U.S. CTR Program, including
linkage to threat or risk.
2. Work with the partner country to define high-level objectives for joint activities in their
country. U.S. goals and partner-country goals need not match exactly, but they must be
compatible.
3. Identify the capabilities needed to achieve the high-level objectives.
4. Define objectives for each capability being developed.
5. Define metrics with the partner country at the outset based on capability objectives.
Agree on baseline, milestones, and measures of success. Different metrics may be
appropriate for different stages of the program.
6. Build metrics—including exercises, if appropriate—into the implementation of the
program.
7. Evaluate results independently (U.S. only) and together with partner country.
8. Feed evaluation back into decision-making process. If circumstances or other factors lead
to a change in objectives, the metrics may need to be revised.
Working with and Learning from Other Agencies
DoD plans to leverage other U.S. Government agencies’ experience,
capabilities, and assets as CTR expands to new countries and as it continues
existing programs. DoD also needs to communicate, coordinate, and
cooperate with relevant agencies.
As noted in Chapter 2, DoD plans to leverage other U.S. Government agencies’
experience, capabilities, and assets as CTR expands to new countries and as it continues existing
programs. DoD also needs to communicate, coordinate, and cooperate with relevant agencies.
One might argue that cooperation should be measured implicitly by looking at outcomes.
Unlike safety and security (discussed in regard to the Chemical Weapons Elimination Program in
Chapter 2), cooperation itself is an input and an outcome. So, why create metrics for cooperation
if cooperation is an input measure (means) rather than an outcome (an end in itself)? It can be
difficult to link interagency cooperation to a particular outcome in CTR, but there is experience
indicating that problems arise from lack of coordination and lack of cooperation. It stands to
reason that an agency newly entering a partner country will be more effective in that country if it
learns from agencies that already work there. Furthermore, cooperation can serve as a leading
indicator (as long as it is not simply a checkbox metric)—by the time an outcome measure
indicates a problem, it may be too late to correct the lack of cooperation. The committee has
struggled with this problem: A whole-of-government approach is critical to the success of some
programs, and especially some planned programs, so a metric for interagency consultation,
coordination, and cooperation would be a useful indicator for the implementation of a program.
But the committee has not identified or developed a metric that is more than a checklist.
Therefore, the committee encourages DoD, CTR experts outside of DoD, and experts in program
assessment and evaluation to consider this challenge, but the committee makes no
recommendation on a metric for taking a whole-of-government approach, important as it may be.
DoD is not the only agency engaged in capacity-building programs. U.S. Customs and
Border Protection conducts a mission similar to the PPP along many thousands of kilometers of
border and has developed metrics for its mission and operations. The United States Department
of Agriculture’s Animal and Plant Health Inspection Service is a leader in an international
surveillance network that has many parallels to the global network DoD leadership envisions for
biothreats. The United States Agency for International Development operates programs with partners
across the world to foster democratic institutions, compared to which DoD’s capacity-building
programs seem relatively tangible. The DoD Defense Security Cooperation program also shares
important similarities with the DoD CTR Program and may in some cases serve as a model. Not
only can DoD learn from other agencies, but DoD will be working with other agencies in a
“whole of government” effort, and they might already have mechanisms in place for measuring
impact and effectiveness that would be useful to DoD CTR.
From the variety of the national security programs that have comparable metrics
challenges to the CTR Program, the committee selected three of these programs for comparison
to gain insights into the CTR metrics approach. The Defense Security Cooperation Program,
Capacity Building in Iraq and Afghanistan, and the DoD Capabilities Based Planning Program
each illustrate important aspects of CTR. Table 3-2 displays several features of the four
programs: the program purpose, scope, number of programs, objectives, metrics, and use of
metrics.
The Defense Security Cooperation Program uses metrics to assess security cooperation
gaps and the contributions of projects in each of the geographic regions assigned to each
Combatant Command (COCOM) under the Unified Command Plan. In addition, many of the
supporting commands and services (e.g., U.S. Army Pacific Command and Marines) use metrics
to assess the contributions of their security cooperation programs to command missions and
security objectives.
The Defense Security Cooperation Agency serves as the DoD focal point … for the development
and implementation of security assistance plans and programs, monitoring major weapon sales and
technology transfer issues, budgetary and financial arrangements, legislative initiatives and
activities, and policy and other security assistance matters. (DoD Directive, 5105.65)
TABLE 3-2 Comparison of the CTR Program with other DoD programs that face challenges developing metrics
Program Feature | Cooperative Threat Reduction | Defense Security Cooperation | Capacity Building in Iraq and Afghanistan | DoD Capabilities Based Planning
Purpose | Reduce nuclear, biological, and chemical WMD risk to U.S. national interests | Increase security cooperation to support U.S. national interests | Build the infrastructure to provide for security and economic development | Develop future U.S. military capabilities
Scope | CTR Program objectives | Defense and COCOMs regional objectives | Mission assigned by President and Commander’s Campaign plan | Capability gaps are identified in defense mission areas
Sub-Programs | Several projects for each major program | Many projects in each COCOM | Many projects to develop country capacity | Capability gaps determine need for weapons systems
Objectives | CTR and project objectives | COCOM regional security objectives | Commander’s objectives | Acquisition program objectives
Metrics | Project only | Metrics for each region and country | Metrics for each objective | Key Performance Parameters and Key System Attributes
Use of Metrics | Evaluate progress of the 4 annual projects | Identify gaps and assess annual projects | Assess campaign progress and project progress | Assess program progress
Capacity Building in Iraq and Afghanistan is a complex process that uses metrics to
assess the progress of the approved Campaign Plan and to assess the progress of individual
projects that build specific infrastructure capabilities in specific locations. The typical approach
to metrics is to develop a hierarchy of missions, objectives, tasks, and metrics. While quantitative
data are obtained on each of the metrics, typical reporting to senior leaders may use red, yellow,
and green colors to simplify the quick identification of gaps and problem areas (where red means
the objectives have not been met, yellow means they have partially been met, and green means
they have been or will soon be fully met). Due to the long time horizons to build national
capacity, the operational analysts that perform and present the data to senior leaders report a
tension between the focus of the project leaders who are interested in reporting on the successful
completion of their short-term projects and the campaign planners who must assess the long-term
impact and effectiveness against campaign objectives.
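The red, yellow, and green reporting convention described above can be sketched in a few lines. This is a hedged illustration only: the report does not say how quantitative data are mapped to colors, so the cut-off values (`yellow_at`, `green_at`), the function names, and the "worst color wins" roll-up rule are all assumptions introduced here.

```python
def stoplight(achieved, target, yellow_at=0.5, green_at=0.9):
    """Map a quantitative metric to a reporting color.

    The cut-offs are hypothetical: progress at or above green_at is treated
    as "met or soon to be met", progress at or above yellow_at as
    "partially met", and anything lower as "not met".
    """
    if target <= 0:
        raise ValueError("target must be positive")
    progress = achieved / target
    if progress >= green_at:
        return "green"
    if progress >= yellow_at:
        return "yellow"
    return "red"


def roll_up(statuses):
    """Summarize child metrics by reporting the worst color among them."""
    severity = {"red": 0, "yellow": 1, "green": 2}
    return min(statuses, key=severity.__getitem__)


# Hypothetical task-level metrics under a single objective: (achieved, target).
task_metrics = [(95, 100), (60, 100), (98, 100)]
task_colors = [stoplight(a, t) for a, t in task_metrics]
print(task_colors, roll_up(task_colors))
```

A roll-up of this kind makes the tension noted above visible: individual short-term tasks can each report green while a different aggregation rule, or a longer-term campaign metric, still shows yellow or red.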
The DoD capabilities based planning process replaced the threat based planning process
used in the Cold War. Paul Davis defined capabilities based planning as “planning under
uncertainty to provide capabilities suitable for a wide range of modern-day challenges and
circumstances while working within an economic framework that necessitates choice” (Davis,
2002). The process has grown in complexity, but the basic ideas are straightforward. The process
begins by assessing each mission area to define the joint capabilities that will be needed in the
future. This typically is done using a hierarchy of capabilities, objectives, tasks, and metrics. The
next step is a capability gap analysis to identify the capability gaps for each mission. After the
capability gaps are identified, a functional solution analysis identifies and evaluates joint
concepts to provide the needed capability. Finally, where necessary, acquisition programs are
initiated to obtain the capability to fill the gap. A capability development document defines the
program success metrics, which are called Key Performance Parameters, with minimum
performance levels (thresholds) and aspirational goals (objective levels), and Key System Attributes.
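The gap analysis and the threshold and objective levels described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the DoD process itself: the class name, function name, capability names, and numeric levels are hypothetical, and the sketch assumes that a higher measured value is always better.

```python
from dataclasses import dataclass


@dataclass
class KeyPerformanceParameter:
    name: str
    threshold: float   # minimum acceptable performance
    objective: float   # aspirational goal

    def assess(self, measured):
        """Compare a measured value with the threshold and objective levels."""
        if measured < self.threshold:
            return "below threshold"
        if measured < self.objective:
            return "meets threshold"
        return "meets objective"


def capability_gaps(required, current):
    """Return each capability whose current level is below the required level,
    with the size of the shortfall. Missing capabilities count as zero."""
    gaps = {}
    for name, needed in required.items():
        have = current.get(name, 0.0)
        if have < needed:
            gaps[name] = needed - have
    return gaps


# Hypothetical capability levels for one mission area.
required = {"detection": 5.0, "interdiction": 3.0}
current = {"detection": 5.0, "interdiction": 1.0}
print(capability_gaps(required, current))

kpp = KeyPerformanceParameter("detection range (km)", threshold=100, objective=150)
print(kpp.assess(120))
```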
Comparing the CTR Program with similar programs provides insights into the different
uses of metrics. One of the major differences between the CTR Program and the other three
programs is that the CTR metrics do not address the full scope of the threat in each of the WMD
areas. Instead, the metrics address only the scope of the funded projects. While the project
metrics are useful in assessing the annual project status, they do not help Congress and senior
leaders in DoD and other parts of the government to understand the full scope of the potential for
cooperative threat reduction, which could help identify the need for and scope of future projects.
Observations from Non-Security Programs
Across areas as diverse as public health, poverty reduction, democracy promotion, public
education and others, experts have been attempting for years to employ systematic efforts to
understand the effectiveness of their programs. Cultural sensitivities associated with many of
these initiatives, constraints of limited resources as well as a sincere interest in positive change
have motivated multiple and sustained efforts to develop effective metrics for these programs.
Their task is made more complicated by the fact that their objectives – improving public health,
reducing poverty, developing democracy, and increasing educational achievement – are at least
as difficult to measure as those of capacity building and improving cooperation through CTR
projects. Despite the challenge of trying to measure objectives difficult to quantify, much
progress has been made in developing methodologies and metrics to do just that. A few examples
may provide useful lessons for the DoD CTR programs.
The field of global public health assistance (see e.g., Unite for Sight, 2010) has spent
considerable resources on healthcare efforts—as much as $14 billion in 2004 (Kates et al.,
2004)—and therefore donors as well as communities affected have increasingly demanded better
understanding of results of those investments. This has led to the explicit differentiation between
outputs (goods or services generated by the programs) and outcomes (impacts of programs). For
management and contracting purposes, it is important to track outputs, but far more important is
measuring outcomes, which in the public health area as well as other areas, include “changes in
behavior, attitude, skills, knowledge or condition” (Unite for Sight, 2010; see McAllister, 1999).
How does one measure attitudinal and behavioral change that would ensure that the
desired outcomes are achieved? Stating the objectives as part of the initial program design by
focusing on outcomes over outputs allows for the development of metrics that, while perhaps
more difficult to measure, get closer to understanding the extent to which the program actually
reaches the stated goals.24 The global health assistance community has been working to further
develop these methodologies and metrics. Specific efforts include those of the Health Metrics
Network and the Institute for Health Metrics and Evaluation. The World Health Organization
and International Monetary Fund also continue to work on the challenge of metrics development
necessary for understanding outcomes/impact.
Another equally challenging area to measure is that of democracy assistance, beginning
with the definition of “democracy” itself. Attempts to better understand the considerable challenge
of measuring such a difficult object as “improving democracy” led the U.S. Agency for
International Development to ask the National Academy of Sciences to bring social science
methodologies to bear on the challenge. The report resulting from this request concluded that due
to the difficulty of the overall challenge, a host of methodologies is needed to more effectively
conduct impact evaluations, defined as evaluations that assess whether and how the specific
program made an impact in the expected (or even unexpected) ways. In other words, the
methodologies for evaluating the impact of projects are designed to understand what would have
happened if the USAID programs had not occurred. The methodologies proposed in the 2008 report
include case studies, systematic interviews, and randomized studies (NRC, 2008).
Perhaps more important than the adoption of a particular methodology is the selection of
a rigorous methodology designed from the outset to measure progress toward the objectives of
the program along with its impacts and effectiveness.25 In the USAID context, there are three
types of objectives: project monitoring (routine oversight or project management), project
evaluation (is the project having its intended effect), and country assessment (is the country as a
whole improving). Each of these objectives requires different metrics that specifically and
systematically match these tasks with corresponding measures.
Equally important is the collection of appropriate data for each of these measures both
before the project begins (baseline data) and after the project is completed (outcome data). The
same data should be collected wherever possible for comparable individuals, groups, or
communities that, whether by assignment or for other reasons, did not participate in the program
(NRC, 2008). The 2008 report concludes that determining effective metrics for each of the
24 Moreover, public health programs require a considerable degree of cooperation and integration with local
government and community leaders to establish joint goals, methods of implementation, and measurement of
outcomes/impact. Learning more about these interactions may provide helpful insights for DoD since there is
important overlap between the ability of a public health system to benefit the public overall as well as to accomplish
security objectives.
25 What is the difference between progress and impacts? Consider the Chemical Weapons Elimination program.
Years were spent planning and constructing the chemical weapons destruction facility. Although the program may
be making good progress, that part of the program has little impact until it begins operations actually destroying
chemical agent.
objectives and the collection of baseline and outcome data will lay the groundwork for
appropriate impact evaluations of USAID’s programs.
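The role of baseline and outcome data for both participants and comparable nonparticipants can be illustrated with a difference-in-differences calculation, one common way of approximating what would have happened without the program. The sketch below is an illustration under stated assumptions, not a method attributed to the 2008 report: the function name, the group labels, and all of the scores are hypothetical.

```python
from statistics import mean


def difference_in_differences(part_before, part_after, comp_before, comp_after):
    """Change among participants minus change in the comparison group.

    Each argument is a list of scores on the same (hypothetical) measure.
    """
    participant_change = mean(part_after) - mean(part_before)
    comparison_change = mean(comp_after) - mean(comp_before)
    return participant_change - comparison_change


# Hypothetical survey scores on a 0-100 scale (assumed data, not from the report).
participants_baseline = [40, 45, 50]
participants_outcome = [60, 65, 70]
comparison_baseline = [42, 44, 49]
comparison_outcome = [47, 49, 54]

impact = difference_in_differences(
    participants_baseline,
    participants_outcome,
    comparison_baseline,
    comparison_outcome,
)
print(f"Estimated impact: {impact:.1f} points")
```

Without the baseline data the participant change could not be computed, and without the comparison group the general trend could not be separated from the program's contribution, which is why both kinds of data collection matter.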
Finally, however, they found that “even if USAID were to complete a series of rigorous
evaluations with ideal data and obtained valuable conclusions regarding the effectiveness of its
projects, these results would be of negligible value if they were not disseminated throughout the
organization in a way that led to substantial learning and were not used as inputs to planning and
implementation of future [democracy and governance] projects” (NRC, 2008). This same
learning process would be useful to DoD as it evaluates the impact of its CTR programs.
Prioritization
The committee judges that using a consistent framework to prioritize and
refine metrics within a program would help DoD and other CTR decision
makers. Using such a framework, DoD can identify the highest priority
metrics, ensure that the metrics are useable and useful, and allow decision
makers to feed results back into the overall CTR objectives and budgetary
process. Any of several decision-making or prioritizing frameworks would
work, including the decision analysis technique of swing-weight analysis and
the DoD capabilities based planning process.
As noted in Chapter 2, DoD developed each set of metrics ab initio, using a different
approach for each program, i.e., chemical, biological, borders, and nuclear. The committee found it
difficult to identify what were the most important metrics and what might be missing from DoD’s
metrics because DoD used a different approach for each program. Although each CTR program is
distinct and possibly unique, the committee’s suggestion that DoD consider using a consistent
framework is based on best practices, the fact that using a consistent framework is easier, and the
committee’s hopes that DoD will maximize what it can accomplish with metrics. Using a well-
developed framework helps ensure completeness (Did we include everything that matters?),
internal consistency (Did we double count? Do these pieces fit together logically?), and focus (Did
we remove what doesn’t matter?). It also helps with management and oversight because it offers
the possibility that managers might be able to compare across programs, and does not require the
managers and overseers to learn a new framework for each program. The burden then is on those
who prefer using different frameworks to argue why they should be different.
An important component of measuring what matters is prioritizing among capabilities
and within sets of metrics. Not all goals are equally important and the act of prioritization will
help focus limited resources. DoD should use a consistent framework to prioritize and refine
metrics within each program, for both the decision and the prioritization frameworks. In this report,
the committee highlights the decision analysis technique called swing-weight analysis (described
below) and the framework used widely in DoD called capabilities based planning.
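As a rough illustration of how swing weighting can prioritize metrics, the sketch below uses the common decision-analysis form of the technique, which may differ in detail from the committee's own description. Each metric receives a raw score for how much it matters to swing that metric from its worst to its best level (the most important swing is set to 100), and the raw scores are normalized into weights that sum to 1. The metric names, raw scores, and performance values are all hypothetical.

```python
def swing_weights(raw_swings):
    """Normalize raw swing scores into weights that sum to 1."""
    total = sum(raw_swings.values())
    if total <= 0:
        raise ValueError("raw swing scores must sum to a positive number")
    return {name: score / total for name, score in raw_swings.items()}


def weighted_value(weights, values):
    """Additive value: sum of weight times single-metric value (each value in 0..1)."""
    return sum(weights[name] * values[name] for name in weights)


# Hypothetical judgments: the most important worst-to-best swing is scored 100.
raw_swings = {"outcome_metric": 100, "output_metric": 60, "input_metric": 40}
weights = swing_weights(raw_swings)

# Hypothetical performance of one program on each metric, scaled to 0..1.
program_values = {"outcome_metric": 0.4, "output_metric": 0.9, "input_metric": 1.0}
score = weighted_value(weights, program_values)

# Metrics ordered from highest to lowest priority.
ranked = sorted(weights, key=weights.get, reverse=True)
print(ranked, round(score, 2))
```

Because the weights depend on the range of each swing and not only on the label of the metric, the same framework can be reused across programs while the metrics and weights themselves differ.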
Using a consistent framework does not mean that the metrics for each CTR program or
program activity should be the same, but that a common framework should be used for defining
objectives and metrics should flow from those objectives. This will make the metrics useable and
useful, and allow decision makers to feed results back into the objectives and budgetary process.
Prioritization is a critical component of the effective and efficient use of metrics in program
depict the state of border security” (Fisher, 2011). This approach is built on a perception that as
resources increase, the cost of getting caught also increases and, therefore, illegal entries
decrease.
U.S. Customs and Border Protection Commissioner Alan Bersin in a different hearing
said: “The success of our efforts … must be measured in terms of the overall security and quality
of life of the border region; the promotion and facilitation of trade and travel; and the success of
our partnerships in enhancing security and efficiency” (Bersin, 2011).
This debate, which will undoubtedly continue in Congress and elsewhere in the years to
come, is a reminder of the fundamental point the committee emphasizes elsewhere in this report.
The debate is a reflection of the operational reality that an effective border security system
cannot rely upon a single actor or point of failure. Specifically, to be useful, metrics must be
linked to, and ideally derived from, a clearly defined and agreed set of fundamental goals and
objectives. Where fundamental differences exist about objectives, metrics can reveal or illustrate
those differences, but cannot resolve them. However, through an iterative process focused
both on objectives and metrics, clarity about metrics can potentially inform and shed light on
debates about goals and objectives.
The DoD Metrics Report mentions that “one way to measure the effectiveness of
enhancements objectively is through testing; that is, standardized exercises that can be conducted
before and after enhancements are made to measure the impact of those enhancements.” In
addition, the Report presents a set of more detailed project-level metrics that are intended to
capture a wide range of specific impacts or effects in relation to a detailed hierarchy of goals and
objectives.
The committee recognizes the difficulty of establishing a compelling overall
effectiveness metric. As the section on border security metrics suggests, the question remains
difficult and controversial even for the U.S. Border Patrol. In the absence of an overall
effectiveness metric, it may be necessary to settle for a set of activity-based and output-based
metrics. However, the committee has several recommendations for improvement.
As discussed elsewhere in the report, the committee believes that tests and exercises,
including the use of red teams if and where appropriate, have an essential role in measuring
effectiveness of CTR programs. The committee agrees with the potential value of exercises for
measuring the impact of border security cooperation and is aware that such exercises have been
conducted (e.g., the annual Operation Sea Breeze joint training exercise on the Black Sea).
Although care should be taken in interpreting the results of exercises, to guard against overly
optimistic appraisals and the tendency to perform better during an exercise than during
unplanned events, the committee recommends that the WMD-PPP add exercise-based metrics
explicitly to its collection of metrics.
The metrics defined in terms of “miles of border enhanced” and “number of ports of
entry enhanced” are reasonable input metrics that measure the delivery of tools to accomplish
border security goals. There is also precedent for them in other CTR programs implemented by
other U.S. agencies such as the Department of Energy Materials Protection, Control and
Accounting Program (numbers of facilities/buildings/sites with rapid/comprehensive security
upgrades) and the Second Line of Defense Program (number of border crossings equipped with
radiation detection capabilities). However, these are numerator statistics. To put them in
perspective, denominators should also be provided, for example by comparing these numbers to
the total number of miles of border and the total number of ports of entry that were determined to
be within the scope of the program. The committee recognizes that selecting the denominators
raises additional questions, such as “how was the scope of the program determined?” For
example, most programs will not seek to upgrade 100% of all the possible borders or ports of
entry, but will instead focus on providing a “model” for some subset identified through a
prioritization or a risk assessment process, leaving the partner country with the task of extending
the model as appropriate to the remainder of the border. The committee recommends that this
step be made explicit and visible in the metrics because it ties closely to the question of overall
goals and objectives of the program. Although disagreements and controversies may arise,
evaluation is best served by making those explicit and explaining the planning assumptions on
which the program is founded, even though full consensus may be lacking.
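The numerator and denominator point can be made concrete with a short calculation. The sketch below reports each "enhanced" count as a fraction of the in-scope total, and also reports how much of the whole border was placed in scope, so that the scoping step stays visible. The function name and every number are hypothetical and serve only to show the arithmetic.

```python
def coverage(numerator, denominator):
    """Return numerator / denominator after basic sanity checks."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    if not 0 <= numerator <= denominator:
        raise ValueError("numerator must lie between 0 and the denominator")
    return numerator / denominator


# Hypothetical program figures (assumed for illustration only).
miles_enhanced = 120      # numerator reported by the program
miles_in_scope = 400      # miles selected through prioritization or risk assessment
miles_total = 1500        # full length of the partner country's border
ports_enhanced = 6
ports_in_scope = 8

miles_coverage = coverage(miles_enhanced, miles_in_scope)
scope_share = coverage(miles_in_scope, miles_total)
ports_coverage = coverage(ports_enhanced, ports_in_scope)

print(f"Border miles enhanced: {miles_coverage:.0%} of in-scope miles")
print(f"In-scope miles: {scope_share:.0%} of the total border")
print(f"Ports of entry enhanced: {ports_coverage:.0%} of in-scope ports")
```

Reporting the scope share alongside the coverage figure makes explicit how much of the border is left for the partner country to address by extending the model.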
Biological Engagement to Enhance Security
The CBEP metrics in the DoD Metrics Report represent many important aspects of the
program and yet still have room for improvement. In meetings with the committee, DoD
specifically asked the committee to provide advice on how to simplify the set of metrics for
CBEP. Table 3-4 provides the committee’s tally of the CBEP metrics (the DoD Metrics Report
TABLE 3-4 Assessment of the Cooperative Biological Engagement Program Metrics
                                   Partner        Domestic     Partner Capability
                                   Stability      Partner      Process      Output
Objective                          (Environment)  Input        (Existence)  (Conditional)a  Outcome                                   Total
1. Secure & Consolidate            -              -            1            -               1b (Consolidate EDP)                      2
2. Improve bio security &          -              -            8            2               1b (Consolidate EDP)                      11c
   safety standards
3. Detect, diagnose, & report      -              -            8            17              -                                         25
4. Sustain capabilities            -              2 (Budget)   1            1               -                                         4
5. Engage scientific &             -              -            2            3               1 (Copies of EDP strains sent to U.S.)    6
   technical people
6. Eliminate BW technologies       -              -            -            -               1 (Eliminate BW)                          1
Total                              0              2            20           23              4c                                        49c
a Output is conditional on the event occurring.
b Same measure.
c Includes one repeated metric.
EDP = especially dangerous pathogens
calls the CBEP metrics indicators; see Appendix B) using the measurement concepts presented
in this report. Three features should be noted. First, three important direct outcome metrics (one
is duplicated) were identified. Second, one important input metric (the partner country budget for
the program) was identified. These metrics are useful leading indicators of sustainability. Third,
the specification of the minimum performance (threshold, in capabilities based planning
parlance) and aspirational goal (objective) for each measure appears to match those used in the
DoD acquisition process (see Box 3-3).
However, there are several areas for simplification and improvement in the development
of the CBEP metrics.
• The presented structure in the DoD Metrics Report that suggests moving from
program objective to capability is too cumbersome because the capabilities being
developed through a CBEP program may support multiple objectives. This causes
metrics to be
Box 3-3 Acquisition and Metrics
DoD has conducted the CTR Program as an acquisition program: DoD contracts with companies to
implement CTR plans. Some of these plans are for elimination of nuclear warhead delivery vehicles; some are
for acquiring capabilities (facilities, technology, and training). In some ways, this is not so different from other
acquisition programs in DoD. For example, it is easy to see the parallel between DoD contracting with a
company to tear down a building on a military base and DoD contracting with a company to destroy missiles or
silos. But even the capacity building programs have parallels in other acquisitions. When DoD acquires a
weapons system, the department is not just buying a weapon. A weapon on its own has little value. What DoD
really acquires is a capability, comprising the equipment and the ability to operate and maintain the equipment.
DoD has a well developed process for acquisition, which includes evaluations of whether the
department has not only taken possession of the equipment, but also acquired the capability to operate and
maintain the equipment. In DoD acquisition parlance, there are two major milestones beyond the baseline for
such an acquisition: initial operational capability (IOC) and full operational capability (FOC). The acquisition
process is described in the Defense Acquisition Guidebook (DoD, 2011) and the following concise definitions of
IOC and FOC can be found in the Glossary of Defense Acquisition Acronyms & Terms (DoD, 2005).
Initial Operational Capability (IOC)
In general, attained when some units and/or organizations in the force structure scheduled to
receive a system 1) have received it, and 2) have the ability to employ and maintain it. The
specifics for any particular system IOC are defined in that system’s Capability Development
Document (CDD) and Capability Production Document (CPD).
Full Operational Capability (FOC)
In general, attained when all units and/or organizations in the force structure scheduled to
receive a system 1) have received it, and 2) have the ability to employ and maintain it. The
specifics for any particular system FOC are defined in that system’s Capability Development
Document and Capability Production Document.
The IOC and FOC are different for each context and attaining the IOC and FOC for a complex
acquisition typically requires that minimum performance be met on a variety of metrics. In some cases, that
performance is established by exercises.
repeated.30 A better structure would be to use two steps. In the first step, DoD should
identify the capabilities that are required to achieve the program objectives, i.e., which
capabilities are needed for capacity building. In the second step, goals are set for each capability
and metrics toward achievement of the goals are identified.
• The CBEP has no benefit-cost metrics to assess the benefit or effectiveness the
United States achieves for the resources invested in the program.
• There are no metrics of the environment. For example, perhaps it would be useful to
assess the political, economic and social environments of a country in which CTR
programs are being implemented. Such assessments may be beneficial in terms of
understanding the potential for sustainability challenges. Another environmental
metric could indicate how cooperative the partner country is.
• The metrics use 20 proxy metrics of the processes required for biological safety and
security. This is a large number of proxy metrics compared to the three direct,
outcome metrics. It would appear that two constructed, process metrics (safety and
security) would be sufficient to encompass the information in the 20 proxy metrics.
• All of the 23 output metrics are conditional on a disease outbreak involving
dangerous pathogens occurring and on the partner country executing
the correct procedure according to its plans. Again, these metrics provide the
numerator but we do not know the denominator.
The committee’s overall assessment is that the CBEP metrics could be reasonably
assessed with about 11 metrics instead of the 49 metrics listed in the DoD Metrics Report. The
11 metrics would be: 2 input (budgets), 3 process (safety, security, and technical engagement),
2 output (safety and security), and the 4 outcome measures.
Why are these numbers of metrics important? The total number is important because if
there are too many, it is difficult to understand the net result. Prioritization helps with this
problem, making it possible to recognize the bottom line. Indeed, as noted above, DoD officials
asked the committee to help managers identify the “bottom line.” The lower number after
prioritization is manageable and better reflects what matters.
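The arithmetic behind the reduction from 49 metrics to about 11 can be checked with a short tally. The counts below are taken from the totals in Table 3-4 and from the committee's proposed set; the dictionary keys are hypothetical labels chosen for this sketch.

```python
# Metric counts by type from the Total row of Table 3-4.
current_tally = {"environment": 0, "input": 2, "process": 20, "output": 23, "outcome": 4}

# The committee's proposed set: 2 input, 3 process, 2 output, 4 outcome.
proposed_tally = {"input": 2, "process": 3, "output": 2, "outcome": 4}

current_total = sum(current_tally.values())
proposed_total = sum(proposed_tally.values())

# Change in each category (negative numbers are reductions).
change_by_type = {
    kind: proposed_tally.get(kind, 0) - count for kind, count in current_tally.items()
}

print(f"Current total: {current_total}, proposed total: {proposed_total}")
for kind, change in change_by_type.items():
    print(f"{kind}: {change:+d}")
```

The tally shows that the whole reduction comes from the process and output categories, while the input and outcome metrics are retained.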
Independent Evaluation
Capacity building programs need independent evaluation of how the
capabilities being built perform in action. This can be accomplished by
several means, ranging from independent expert observations of routine
operations to comprehensive exercises that test the full scope of capabilities.
The level of effort can be tailored to the scope of the program, its resources,
and its relative importance. DoD and its partners should build such
independent evaluation into each project. The Defense Security Cooperation
Program might be a good model for how to proceed.
Measuring progress in building capacity and effectiveness of programs to prevent low-
frequency, high-consequence events is difficult. Assessment against standards and guidelines
(e.g., is the partner’s action plan for interdicted nuclear material consistent with the International
30 See CBEP Objective 2: measures 1.1 and 2.1 are repeats of CBEP Objective 1: measures 1.1 and 2.1.
Atomic Energy Agency model action plan?) is one important component, but DoD and Congress
care more about likely performance when the event occurs (i.e., how effectively can the partner
implement the action plan?). Current thinking on evaluation suggests that the involvement of
independent evaluators from the beginning and for all aspects of the evaluation, including data
collection, is ideal. Independent evaluation establishes a degree of credibility that is hard to
achieve by other means. Especially for capacity-building programs, some kind of independent
evaluation is essential. Exercises are a good way to measure effectiveness and sustainment.
Exercises can help measure both capability and performance in such programs. While exercises
might be structured differently in different countries and for different projects, they should be
built into the implementation and evaluation components of capacity-building projects and
programs from the beginning, starting with a baseline evaluation and proceeding with midcourse
and final or sustainment evaluations.
The kind of evaluation employed needs to be tailored to the scope of the program, its
resources, and its relative importance. The evaluation might take the form of periodic expert
observations of the project operations or the partner country’s capacities. It might be an
impromptu test of a randomly selected part of the system (e.g., a border protection system), or it
might be an exercise of the system, such as was performed for CBEP in Georgia. Ideally, the
exercises would be designed by an entity independent from the groups being tested. Fully
independent evaluation is not feasible for some of the programs, but recognition of why
independent evaluation is desired (to avoid unintentional and intentional bias and to establish
credibility) can help guide implementation of evaluations where truly independent evaluation is
not done.
Care must be taken with any metrics, but especially exercise-based metrics, to avoid
perverse incentives and gaming of the system. These programs are at least as much about human
resources and relationships as about facilities and equipment. Because communication is a
critical element of success, evaluation requires more than just the exercise or observation; it
requires discussion afterward between credible parties (in expertise, authority, and attitude) on
both sides.
Box 3-4 An Example of Exercises as Part of the Metrics for CTR
DoD used exercise-based means to test the capability of technologies, systems, and support procured
by the DoD for the IOC assessment of the disease surveillance system in countries of the former Soviet Union
(FSU), such as Georgia, Kazakhstan and Uzbekistan supported by what was then the Biological Threat Reduction
Program under CTR. The IOC assessment included exercises in the recognition of clinical disease in humans and
animals in the field, the collection of samples, the transport of samples to the appropriate laboratories, diagnosis of the
disease, reporting of results nationally and internationally as appropriate and destruction or storage of the agents in
a secure repository. Prior to the IOC assessment, table top and field exercises were used as nongraded training
exercises to focus on gap analysis for individuals and the overall system. Although the DoD acquisition process
establishes many of the requirements for the implementation of CTR programs, the DoD Metrics Report does not
make clear the relationship between the current metrics and the DoD acquisition process. For example, it is not
clear whether DoD will use the IOC and the Full Operational Capability (FOC) as the sets of metrics to determine
CBEP’s success. The DoD metrics used for the IOC assessments in the FSU countries appear to have been adopted
for CBEP in the DoD Metrics Report. However, the initial application of the current metrics for CBEP yielded a
different assessment from the IOC for one of the countries. If essentially the same metrics are used for the IOC and
the evaluations based on the DoD Metrics Report, then one would expect that the results would be similar, absent
any significant deviation in the performance of the host country.
If DoD continues to use the acquisition process for CTR programs, attainment of the IOC and FOC will
serve as the main metrics of success or failure, so it would make sense to integrate the metrics used to evaluate
attainment of the IOC and FOC with the formal metrics of the program. This would avoid competing assessments.
At the same time, there is some question whether acquisition is the right framework for some kinds of capacity
building, such as bioengagement. Biological threat reduction began in the FSU where there had been an active
biological weapons program, where the infrastructure, regulations, and training were similar, and the improvements
by DoD in infrastructure and training were essentially the same. But the new mission and locations (Africa, Asia
and South America) are different. While the overall objective may be the same, how DoD engages and how DoD
supports these countries to meet the overall objective will likely be different. If relationships, trust, and a culture of
responsibility are some of the outcomes CBEP seeks, is the acquisition model the best way to cultivate and measure
those capacities? This question is worth DoD’s consideration as it implements the program and its metrics.
Building in Exercises From the Beginning
For exercises to be successful, DoD needs an agreement with the host country at the beginning of the
program to determine how exercises will be used. Exercises can be used as a training tool or a formal measure of
the success of a program. In most cases, exercises will probably be used for both training purposes as well as to
measure the success of the program from both the host country and the U.S. funding perspective. It is important that
the parties agree upon the informal and formal exercises, how and by whom the exercises will be conducted, and how the
outcomes will be used.
Exercises are about Learning
Informal exercises are most effective when they are used for training purposes without the stigma of a
grade attached, and when they are about learning, gap recognition, and means of improving the system. Exercises
provide an opportunity to discover performance gaps between individual components or elements of the system
through system integration. These exercises offer a new means of determining gaps in individual or program
capabilities for the host country. When exercise observations are made by an outside party, even if it is the funding
party, the exercise should be conducted with consideration for the local culture.
Formal exercises will likely be associated with milestones for programmatic funding and host country
cooperation to meet the overall goals of the program. As the cooperative programs move toward meeting
international regulations or guidelines, third party (i.e., independent) participation in the formal exercises becomes
more important.
Preparation for Exercises
On the whole, the IOC assessment effort can be both costly and valuable. DoD implemented a formal plan
to measure attainment of the IOC and the FOC within each country for the CBEP. While the overall requirements
among the FSU countries were the same, the exact specifications and testing for each country differed based on the
government-to-government agreements in the areas of disease surveillance, cooperative research, threat
infrastructure elimination and engagement of selected host-country scientists. The FSU countries engaged are good
examples of cooperation in meeting the overall goals of the biological CTR program, and they illustrate how each
country poses unique challenges in meeting the formal metrics of the IOC.
As noted above, a number of informal table-top and field exercises were conducted in preparation for the
formal IOC exercise. These table-top and field exercises were used as training events. The training exercises were
conducted in a nonthreatening manner, so that the host country participants could gain experience and confidence
in the use of the new infrastructure (e.g., biological safety level-2 laboratories and secure repositories for especially
dangerous pathogens), equipment, diagnostic assays and standard operating procedures that were instituted through
DTRA funding. The trainers consisted of DoD sponsored in-country prime contractor personnel and subject-matter
experts from a number of government agencies that came to the FSU periodically. Preparation for the IOC took
approximately one year, once the in-country disease surveillance system was sufficiently complete to be tested.
Because DoD had funded all aspects of the infrastructure, equipment, diagnostic assays, and standard procedures,
the IOC was truly a test of the complete disease surveillance system.a
Each organization that provided infrastructure, training, reagents, supplies, and procedures was involved in
training of the local nationals so that they could meet the objectives of the IOC. Each organization was also
involved in preparation of table-top exercises and field exercises, and provided additional training after each event to
rectify gaps. Metrics were determined for each portion of the disease surveillance program, based on standard
operating procedures. The procedures were based on DTRA’s requirements for biosafety, security, disease
recognition and diagnosis, and international and host government regulations, guidelines, and policies. To train and
conduct informal and formal exercises with U.S. personnel, all written materials related to biosafety, biosecurity,
training materials, equipment manuals, and procedures of the disease surveillance system were translated into
English and the host country’s language. Because DoD had a strong influence on the decisions regarding
infrastructure, supplies, reagents, training, and procedures, it was easy to implement a systems approach for the
disease surveillance program. DTRA’s approach was to work closely with FSU partners essentially to clone U.S.
procedures (i.e., to attain IOC). Those procedures are, of course, most familiar to DoD and if a partner country asks
to be trained on the U.S. procedures then using them makes sense; but other procedures might meet the disease
identification and reporting goals. It seems that developing metrics that assess outcome and that do not specify a
particular process (i.e., a more flexible approach) would work better in countries with different preexisting
infrastructure and operations.
The resources that DoD, the prime integrating contractor, and the supporting USG agencies needed to meet the IOC were
significant. The IOC assessment was not only a test of the disease surveillance system but of
DTRA, the prime integrating contractor and the U.S. government agencies involved in the training of local
nationals. All groups worked to prepare the FSU countries to pass.
Exercises and Nunn-Lugar Global Cooperationb
The IOC and FOC represent major and critical tests of major procurements by DoD and may not be
appropriate for countries with small engagement projects, but the concept of testing (exercises) should be similar, if
the question is how well a country is safely and securely responding to especially dangerous pathogens.
Exercises under an expanded CTR Program could be similar in concept to what was previously tested in
the FSU, but it is likely that DTRA will only be funding discrete parts of the disease surveillance system or
supplementing existing programs. This will require exercises that are harder to develop because the procedures, diagnostic
kits, safety and security infrastructure, and controls may be different from those funded by DoD. DoD’s approach
has been to work closely with FSU partners essentially to clone U.S. procedures (i.e., to attain the IOC as noted
above). If DoD is only supplementing the partner country’s disease surveillance system, the exercises will need to
be more flexible in training/testing general concepts and procedures, rather than prescribed diagnostic assays,
procedures, and infrastructure.
DoD had significant influence in countries like Azerbaijan, Georgia, Kazakhstan, and Uzbekistan due to
the funding supplied for the host country’s infrastructure and systems for acquiring, testing, securing, and reporting
results of suspected especially dangerous microorganisms. As the economic situation has improved in some
countries, DoD’s influence has decreased. Where the economic situation has improved, some countries have
decided to move forward with their own initiatives without DoD’s funding and are replacing equipment that DoD was
using to standardize diagnostic platforms across countries. As a result, while the overall objectives of the
CTR-implemented disease surveillance system may remain the same, DoD will need to be more flexible in
determining how to measure the outcomes. In other countries, the political situation is such that DoD may have less
access or less timely access to facilities where DTRA has funded infrastructure, training, supplies, and procedures.
Compare, for example, the free access that the U.S. government enjoyed when establishing the program in Georgia
versus the efforts in Uzbekistan. Where it is more difficult to determine the overall success of the program due to a
lack of access to facilities that DoD may have built, supplied, and trained, the international regulations or guidelines
may still be achieved. DoD is also reexamining the infrastructure, equipment and supplies based on what can be
sustained, but it is not clear how metrics will be applied to those countries for which that goal is not currently
attainable as a result of economic unsustainability.
a Host country salaries were not paid by DTRA.
b The NRC Committee on Global Security Engagement has called this global cooperation CTR 2.0 (NRC, 2010).
Time and Change
DoD’s metrics and planning process should factor in more explicitly both
planned and unplanned change over time. During the phases of active DoD
involvement in a CTR project and afterward during sustainment, which is its
own stage requiring resources (budgets, equipment, and trained people),
clearer planning for how changes and metrics results will feed into decision
making will make the metrics more credible and useful for both DoD and the
partner country.
As mentioned in Chapter 2, the DoD Metrics Report deliberately does not consider future
missions or changes in objectives,31 and although it notes that programs may change and expand
to new geographic areas, some sections of the DoD Metrics Report make CTR appear to be a
static program. Given that the circumstances in Africa, Southeast Asia, India, and elsewhere are
very different from the contexts in which DoD has worked in the past, the metrics used will have
to adjust accordingly, particularly those regarding capacity building and other more-difficult-to-measure
objectives. Also, should the objectives (and therefore the metrics) remain unchanged if
the budget available doubles? If it is cut in half? If the goal is to modernize a country’s
capabilities, how does DoD think about obsolescence of the equipment and techniques (the
capability) it provides? DoD says that it will adjust and adapt its metrics by revising the Report
annually, but without a vision for how change fits into the Program, it will be difficult for DoD
to adapt well.
CTR projects change through planned evolution as they progress through different phases,
and their timelines, scope, and even objectives change in unplanned ways as the environment and
resources change. Indeed, changes will likely be made in response to results as reported through
the metrics. Because of this iterative linkage, having more detail on the objectives development
process and on how the metrics will be used in that process would be helpful in defining useful
metrics going forward.
For example, consider the Central Reference Laboratory (CRL) planned for Ukraine.
DoD committed to paying for the design and construction of a new CRL facility for work on
especially dangerous human pathogens. In the last few years, DoD became concerned that the
facility would cost too much, both in capital costs and in maintenance and operation costs, which
would undermine the independent sustainability of the project. To follow a more sustainable
path, DoD proposed to the Ukrainian partners to use the interim CRL (an upgraded existing
facility) rather than create a new green-field site. Ukraine reluctantly agreed. This adaptation
reflected a better understanding of budget realities in both countries and experience building a
new CRL elsewhere. This change is the kind of adaptation and course correction DoD wants to
see in its programs. However, if DoD does not revise its metrics, the metrics will not reflect this
positive outcome: A metric for progress on construction of the new CRL building would show
no progress, but the capability might be in place more quickly and sustainably using the
alternative approach.
31 “The metrics described in this report are designed to measure appropriately the impact of each CTR program area,
as the CTR program is currently constituted for Fiscal Year 2010, and does not attempt to speculate on what metrics
might be appropriate for future areas of programmatic or geographic expansion. Neither do the metrics attempt to
determine whether the activities of the CTR program are the “right” activities. The metrics described in this report
are intended to best measure the effectiveness of the CTR Program in conducting these established programs. These
metrics are not intended to revise the method for establishing these objectives.” (DoD 2010).
There is a school of thought that says that metrics should not change over the course of a
project, even if other factors do. The committee notes that as social scientists have assisted
USAID in trying to develop rigorous metrics for its programs, an approach has been
recommended that is, in essence, a set of experiments that attempt to establish causal links
between aid and outcomes. Communities are identified either as recipients of a particular kind of
aid project or as a control group. The projects must proceed with the same metrics and the same
aid established at the outset to ensure the integrity of the experiment. However, controlled
experiments are not the only way to measure outcomes and do not appear to be well suited to
CTR programs. The needs and objectives of each project may be unique to the partner country,
and the environmental factors and objectives for the program change. It may be trivial to
establish causal links to some desired outcomes and simply impossible for others.
If sustainability is an important objective for the CTR Program, change needs a more
prominent role in DoD’s thinking, because it will be difficult to address sustainability without
considering time and change explicitly. In the DoD Metrics Report, sustainability is mentioned
as an objective for some programs, but the only factor considered in evaluating the sustainability
of a project (the only metric) is cost (i.e., can the partner country pay for the program for the few
years after the end of the CTR project). There is more to sustainability than cost, and while
measuring sustainability may be difficult, it is not impossible.
Evolution of Metrics Over Program Lifecycle
There is a natural evolution of cooperative threat reduction programs that should in
principle be reflected in the metrics that measure impact and effectiveness. The metrics that are
most useful and informative in the early stages of a program, where defining the ultimate
program scope and priorities is a central focus, will generally differ from those that warrant
closest attention toward the end of a program’s life cycle, where issues of sustainment and close-out
come to the forefront. Some aspects of this evolution are already evident at least implicitly in
the metrics described in the DoD Metrics Report, for example, the increased emphasis on
sustainability and on partner country follow-through. The committee believes this concept of
evolution of metrics should be more explicitly considered in future iterations of CTR metrics so
that the progression of CTR activities through the successive phases would be more visible and
prominent.
In many fields of application, such a progression is described in terms of “maturity
models.” Examples include:
• Quality management. In his book Quality Is Free, Crosby (1992) introduced a five-stage
quality maturity model, which described the phases of a quality improvement program
from inception to maturity, with the nature of the challenges evolving along the way.
Although some metrics apply across all stages, e.g., measures of quality, others come and
go as a program progresses through the five stages.
• Information technology, enterprise architecture, and software. Maturity models have
been developed for many facets of information technology and a vast literature exists that
could, with appropriate adaptations, be instructive for CTR programs. For example, for
building enterprise architectures (EA, somewhat analogous to building threat reduction
architectures) five stages have been defined: Stage 1: Creating EA Awareness; Stage 2:
Building the EA Management Foundation; Stage 3: Developing the EA; Stage 4:
Completing the EA; Stage 5: Leveraging the EA to Manage Change. Clearly, some
adaptation to the threat reduction context is necessary, but the basic stages (awareness,
planning and design, building, and so forth) have parallels. For an entry point to the
literature on enterprise architecture maturity models, see, for example, GAO (2003) and the
references therein. DoD and other agencies have their own agency-specific products in
this area as well, e.g., the Department of Defense Architecture Framework.
Many maturity models have their roots, directly or indirectly, in concepts of process
control or quality control pioneered by Shewhart, Deming, and others in the first half of the
twentieth century, which defined process and quality improvement in terms of an iterative
improvement cycle, often described as plan, do, check, act, or PDCA. Depending on the field of
application, the basic PDCA stages may also be described as planning, implementation, and
evaluation, or as specification, production, and inspection, always followed by iterative improvement
cycles repeating the same basic steps.
The committee is not recommending a particular maturity model or vocabulary because
the concepts are most useful if adapted to the problem at hand. The key point is that for CTR
programs there is often a natural progression of stages from conception to completion, and
natural metrics for each stage. Although details may differ from program to program, a series of
questions often arises at each stage:
• What is the scope of the threat reduction challenge? Metrics can include the numbers and
types of objects (sites, facilities, weapons, materials, infrastructure, people) that
collectively make up the threat to be reduced.
• What tools are needed to manage or reduce the threat? Metrics include the delivery of
specific tools (equipment, training, procedures, personnel, etc.).
• Are the tools being used, and used properly? Tests, exercises and routine evaluations can
furnish metrics on the extent to which the tools are being used.
• Are the threat reduction capabilities, as implemented, effective? Having and using the
tools is not by itself sufficient to ensure effectiveness. Metrics that measure results rather
than merely outputs are needed at this stage.
• Is there a system in place to maintain and ensure effectiveness on an ongoing basis, and
initiate corrective actions if necessary? Many of the sustainability metrics that are now
being introduced are aimed at this stage.
At any given time, a program is likely to be focused mainly on one or two stages.
Nevertheless, the committee believes it is helpful to consider and present the larger context as
well. The overall scope of the threat reduction challenge and the portion that a given program is
attempting to tackle provide important contextual information that helps keep the overall objectives in
focus across the program lifecycle, even as more specific output- and activity-based metrics
are the central focus during a given phase of a program. Even if different metrics are applied at
various stages during the conduct of a project, it would be useful to have developed the various
metrics from the start, recognizing that they might have to change.
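As a purely illustrative sketch of the stage-by-stage framing above (not a method proposed by the committee or used by DoD), the five questions could be recorded as a simple data structure in which each stage carries its guiding question and example metric types. The short stage names, the class name, and the function names are hypothetical labels introduced only for this example; the questions and metric types paraphrase the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One lifecycle stage: its guiding question and example metric types."""
    name: str              # hypothetical short label, not taken from the report
    question: str          # paraphrase of the committee's question for the stage
    example_metrics: tuple


# Ordered from conception to completion; labels are illustrative assumptions.
LIFECYCLE = (
    Stage("scope", "What is the scope of the threat reduction challenge?",
          ("numbers and types of objects that make up the threat",)),
    Stage("tools", "What tools are needed to manage or reduce the threat?",
          ("delivery of equipment, training, procedures, personnel",)),
    Stage("use", "Are the tools being used, and used properly?",
          ("results of tests, exercises, and routine evaluations",)),
    Stage("effectiveness", "Are the capabilities, as implemented, effective?",
          ("measures of results rather than merely outputs",)),
    Stage("sustainment", "Is there a system to maintain effectiveness?",
          ("sustainability measures", "corrective actions initiated")),
)


def metrics_in_focus(active):
    """Return {stage name: metrics} for the stages a program is focused on now."""
    known = {stage.name for stage in LIFECYCLE}
    unknown = set(active) - known
    if unknown:
        raise ValueError(f"unknown stage(s): {sorted(unknown)}")
    return {s.name: s.example_metrics for s in LIFECYCLE if s.name in active}


def full_context():
    """Return every stage name in lifecycle order, so the larger context stays visible."""
    return [stage.name for stage in LIFECYCLE]
```

Under these assumptions, a program currently focused on delivering tools and checking their use would call `metrics_in_focus({"tools", "use"})`, while `full_context()` keeps all five stages in view; defining every stage at the outset mirrors the suggestion that the various metrics be developed from the start, recognizing that they might have to change.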
OTHER MAJOR ISSUES FOR CTR IN THE FUTURE
Scope of the CTR Mission
It is beyond the scope of this study to say what is in or outside of the CTR mission. Some
authorizers and appropriators have questioned whether CBEP is in fact a defense mission.
Fundamentally, whether DoD should be the agency to carry out the CBEP mission is only
secondarily a metrics question. The primary question is whether the U.S. Government wishes to
prioritize the work done under CBEP and what mix of government agencies is best equipped to
carry it out. The increases in budget and scope to date, as well as the National Strategy for
Countering Biological Threats, indicate that CBEP is a growing priority for the Administration,
and DoD’s involvement reflects a conscious choice to use DoD because of its experience
establishing on-going cooperative medical and bioresearch ventures (e.g., NAMRU-3 in Cairo
and NAMRU-6 in Lima), conducting biodefense research, and working on threat reduction.
Critics may dispute the decision to support the mission, to give DoD responsibility for the
mission, or to give the Defense Threat Reduction Agency responsibility for implementation, and
they can legitimately point out that difficulties in developing reliable direct metrics for the
program’s impact and effectiveness raise its programmatic risks. But mixing
these issues with questions about the metrics themselves confuses matters and makes it more
difficult to make progress in the program and in the debate about the program.
Recognizing Success/Completion
Finally, defining and measuring completion—how do we know when we are done?—and
sustainability—will the changes take hold and will the partner nation support and sustain the
programs when U.S. funding stops?—are critically important for the CTR programs, particularly
the capacity building programs. What these mean and how they should be implemented and
measured for a given program should be part of the formulation of objectives. There is a
mismatch in the vision of sustainability and measuring completion of the program among
different CTR decision makers in the Administration and on Capitol Hill. One vision might be
called a project view, in which DoD partners with a nation, engages in a set of concrete activities
with a well-defined beginning and end, and then exits and monitors sustainment after
project completion. The other main vision might be called a relationship view, in which DoD
partners with a nation, works with the partner to build a joint or multilateral network that is
exercised regularly to maintain an on-going relationship with no defined end date. These visions
appear mutually exclusive, but there are different phases to capacity building programs: the
initial phase may involve intensive efforts and capital expenditures. There should be schedules
and milestones for completion of this phase. The long-term relationship that follows may be
open ended, but it also should require far less funding, which should allay concerns about
programs with no exit strategy.