Article

Implications of the Changing Conversation About Causality for Evaluators

Emily Gates¹ and Lisa Dyson²

¹ University of Illinois, Urbana–Champaign, IL, USA
² The University of Auckland, Auckland, New Zealand

Corresponding Author: Emily Gates, University of Illinois at Urbana-Champaign, 1310 South Sixth Street, Champaign, IL 61820, USA. Email: emilygat@gmail.com

American Journal of Evaluation, 2017, Vol. 38(1) 29-46
© The Author(s) 2016
Reprints and permission: sagepub.com/journalsPermissions.nav
DOI: 10.1177/1098214016644068
journals.sagepub.com/home/aje

Abstract
Making causal claims is central to evaluation practice because we want to know the effects of a
program, project, or policy. In the past decade, the conversation about establishing causal claims has
become prominent (and problematic). In response to this changing conversation about causality, we
argue that evaluators need to take up some new ways of thinking about and examining causal claims
in their practices, including (1) being responsive to the situation and intervention, (2) building rel-
evant and defensible causal arguments, (3) being literate in multiple ways of thinking about causality,
(4) being familiar with a range of causal designs and methods, (5) layering theories to explain causality
at multiple levels, and (6) justifying the causal approach taken to multiple audiences. Drawing on
recent literature, we discuss why and how evaluators can take up each of these ideas in practice. We
conclude with considerations for evaluator training and future research.

Keywords
causality, outcomes, impact evaluation, causal pluralism

Introduction
Evaluators routinely make causal claims about social and educational policies and programs. While
evaluators may not use the language of cause and effect, they use a wide range of terms that refer to the
relationship between an intervention and the changes brought about by the intervention, such as
outcomes, impacts, consequences, results, and differences. Regardless of the term(s) used, most
evaluators would agree that making causal claims is central to practicing evaluation.
Across a wide variety of settings from grassroots nonprofit organizations to university grant-
funded programs to national policy initiatives, evaluators are increasingly facing pressure to make
causal claims and to defend these claims. Recent international trends toward evidence-based policy
making and results-oriented management have led to a surge in commissioning evaluations that
investigate the causal relationships between interventions and their consequences. There are often
significant stakes in the use of these claims. Evaluation commissioners may use the findings to
inform decisions about which interventions are funded, modified, or discontinued. Evaluation
stakeholders—both those in favor of and those in opposition to an intervention—may use these
findings to bolster their arguments for or against the intervention. Those involved in and affected by
the intervention may look to these findings to provide evidence of whether and how they are (or are
not) making a difference in the lives of intended beneficiaries. And the wider public may take
interest in and draw conclusions about the extent to which these interventions and the government
agencies, foundations, and development organizations that fund them are serving the interests of
intended beneficiaries and achieving the social changes they claim to advance.
Despite the importance of causal claims, the ways in which evaluators should establish evidence
in support of these causal claims are heavily contested in evaluation. While answering causal
questions is a core mandate of the evaluation field and has been since its origins (Mayne, 2011;
Picciotto, 2012), how evaluators warrant causal claims has recently come under considerable scru-
tiny and debate (Cook, Scriven, Coryn, & Evergreen, 2010; Donaldson, Christie, & Mark, 2009;
Picciotto, 2013; Scriven, 2008; Stern, Andersen, & Hansen, 2013). The issue of causality in evalua-
tion is complicated by the lack of agreement in philosophy of science about the nature of causality
and broader disagreements in the social sciences about how causal claims ought to be warranted. For
example, the philosopher of science Nancy Cartwright (2007) points out, ‘‘nowadays causality is
back, and with a vengeance . . . methodologists and philosophers are suddenly in intense dispute
about what these kinds of claims can mean and how to test them’’ (p. 1). Warranting causal claims in
evaluation poses an ongoing challenge for the evaluation field and for evaluators who are being
routinely called on to substantiate such claims in the evaluations they conduct.
Our aim in this article is not to try to settle the definitional matter of the nature and meaning of
causality or to evaluate the merits of methodological and epistemological arguments for establishing
the strongest evidence on behalf of causal hypotheses. Rather, we take up the issue of warranting
causal claims as a practical problem that requires evaluators to utilize their professional judgment to
make decisions in particular circumstances. We consider warranting causal claims to be a practical
problem because the appropriate and feasible causal questions, designs, and methods may be unclear
and multiple options may be possible; yet, evaluators need to make decisions about how to warrant
causal claims in response to particular intervention(s), circumstances, and stakeholders (Schwandt,
2014). Schwandt (2014) argues that theoretical knowledge can serve as ‘‘an aid in thinking through
options in a situation that a practitioner faces’’ (p. 234). Since the body of theoretical knowledge
having to do with how evaluators, and social scientists more broadly, ought to warrant causal claims
is fast changing and under dispute, engaging in reasoned reflection to decide what to do can be
overwhelming, confusing, and quite challenging to evaluators unfamiliar with these debates. There-
fore, we provide an introductory overview of this changing conversation and identify six guidelines
for evaluators when warranting causal claims in particular circumstances. These guidelines offer
evaluators a heuristic for practice—an aid for thinking through possible options, making decisions,
and taking action in particular circumstances.

Changing Conversation About Causality


For much of the history of evaluation, the issue of causality has centered on the relative merits of
different methods to establish convincing evidence for causal hypotheses and a pervasive belief in a
hierarchy of evidence. For example, many proponents of randomized controlled trials (RCTs) have
long argued that this is the superior method for assessing causality—an argument that has received
considerable political support in educational evaluation in the United States (Biesta, 2007). How-
ever, hierarchies of causal evidence, including the privileged status given to experimental designs,
have been challenged for reasons that include technical issues, the high cost, practical problems and
ethical dilemmas associated with random assignment procedures, and their methodological appro-
priateness (American Evaluation Association, 2003; Cook, 2007; Scriven, 2008; U.S. Government
Accountability Office, 2009). The recent conversation about causality has moved beyond the matter
of a hierarchy of methods to other concerns, including relevant causal questions, the variety of causal
relationships and ways of thinking about causality, and the complexity of intervention
characteristics.
Several trends characterize recent conversations about causality. First, there has been an expansion
from attribution-oriented questions that aim to attribute outcomes to an intervention to contribution-
oriented questions that investigate the contribution an intervention is making to outcomes and wider
impacts (Stern et al., 2012, p. 38). Second, there has been growing attention on theorizing and
validating how causal processes and mechanisms work. For example, Pawson and Tilley (1997,
2004) have developed realist evaluation that focuses on building and verifying a theory about how
processes and mechanisms work in particular contexts to generate effects and changes. Third, in
evaluation and social science more broadly, there is growing acknowledgment that there are multiple
ways to think about causal relationships. Cartwright (2007) makes this point in Hunting Causes and
Using Them: ‘‘Causation is not one thing, as commonly assumed, but many. There is a huge variety of
causal relations, each with different characterizing features, different methods for discovery, and
different uses to which it can be put’’ (p. 2). Fourth, concepts and approaches from the complexity
sciences and systems fields are beginning to be applied in evaluation contexts, thus introducing ways
of examining and modeling nonlinear, multidirectional, nested, and layered causal relationships.
Finally, there has been a surge of interest in methods and designs for assessing causal relationships
other than true experiments. For example, in international development evaluation considerable atten-
tion is focused on rigorous nonexperimental designs and methods for impact evaluation (Leeuw &
Vaessen, 2009; Network of Networks on Impact Evaluation [NONIE], 2008; Picciotto, 2013; Rogers,
2009; Stern et al., 2012; Tsui, Hearn, & Young, 2014; White & Phillips, 2012). Likewise, structural
equation modeling and econometric methods such as difference-in-difference approaches have cap-
tured significant attention of many evaluators. Political scientists have drawn attention to qualitative
approaches for assessing causality in single cases (e.g., process tracing) and multiple cases (e.g.,
qualitative comparison analysis). Stakeholder-based and narrative approaches, such as most significant
change and success case method (SCM), are also attracting more attention in the evaluation field.
Additionally, approaches for modeling complex systems (e.g., causal loop diagrams, system
dynamics) have gained the interest of some evaluators. As Stern (2013) notes, ‘‘the repertoire that
evaluators can draw on to address cause and effect questions is fast expanding . . . Designs that were
until quite recently considered marginal and exploratory are fast becoming mainstream’’ (p. 3).
This changing conversation raises several important considerations, which we frame in the
following section as guidelines for evaluators when dealing with issues of causality.

Six Guidelines for Evaluators


In response to this changing conversation about causality, we argue that evaluators need to take up
some new ways of thinking about and examining causal claims in their practices. These include
(1) being responsive to the situation and intervention, (2) building relevant and defensible causal
arguments, (3) being literate in multiple ways of thinking about causality, (4) being familiar with a
range of causal designs and methods, (5) layering theories to explain causality at multiple levels, and
(6) justifying the causal approach taken to multiple audiences. Together, these guidelines offer a
heuristic that can guide evaluators working across a variety of settings in reflecting on and making
decisions about causality in particular evaluation circumstances (see Figure 1 for this heuristic).
While presented here linearly as steps that precede and follow one another, these implications are
better suited to an iterative process in which each implication informs and is informed by the others.
Figure 1. Six guidelines for making causal claims in evaluation practice.

Be Responsive to the Intervention and Situation


There is increasing recognition in outcome and impact evaluation of the importance of being
situationally responsive (Patton, 2008) to the circumstances of the evaluation and the nature of the
evaluand (Julnes & Rog, 2007; NONIE, 2008; Rogers, 2009; Stern et al., 2012). Situational respon-
siveness involves tailoring the particular evaluation design and methods to the needs, constraints,
context, and issues of a particular evaluation situation. In outcome and impact evaluations that
require making causal claims, decisions about the nature of the causal questions and the design and
methods best suited for addressing those questions should therefore be based on an assessment of
what is most appropriate for the situation. This assessment involves considering five key issues:
(1) intervention attributes; (2) evaluation purpose, audience, and questions; (3) evidence needed;
(4) cultural and ethical considerations; and (5) logistics and resources.
Evaluators may be familiar with some of these issues and less familiar with others, but all five are
important considerations in any evaluation design. Considering attributes of an intervention such as
size, scale, complexity, stage of implementation, stability, and variability can help inform appro-
priate tailoring of an evaluation. In evaluations requiring causal claims, it is also important to
consider the way in which change in the outcomes of interest is theorized to occur. For example,
an intervention that is anticipated to have fairly simple, uniform, short-term outcomes will have
different evaluation needs than one that has more complex, long-term, varied, or latent outcomes. The purpose and
audience of the evaluation are also important considerations as design decisions should be influ-
enced by why an evaluation is being conducted and what impacts the evaluation is anticipated to
have. This includes addressing questions regarding the intended users and intended uses and what
differences the evaluation findings are intended to make. Different causal questions are relevant
depending on the intended evaluation use(s) such as program improvement, informing potential
replication or scale-up, incremental program change, or determining program continuation.
An issue that has received less attention in the evaluation literature has to do with the evidence
needed. This involves considering the level of certainty that stakeholders need and an estimate of
what evidence would be required to meet that expectation (Davidson, 2000; Mayne, 2012b). ‘‘The
weight and quality of evidence required to infer causality varies dramatically depending on the
context in which the evaluation is being conducted'' (Davidson, 2000, p. 24).

Table 1. Issues and Questions to Consider Regarding the Evaluation Situation and Intervention.

(1) Intervention attributes
- What characteristics describe the intervention (e.g., its size, scale, how multifaceted it is, and how dynamic it is)?
- How is the intervention theorized to work (e.g., through multiple mechanisms and in conjunction with other interventions)?
- What degree of complexity characterizes the relation between the intervention and its context?

(2) Evaluation purpose, audience, and questions
- What is the purpose of the evaluation (e.g., accountability, program improvement, scaling up, and empowerment)?
- Who are the intended audiences for the evaluation?
- Which causal questions are central to these audiences?
- What kinds of decisions will be made based on the results?

(3) Evidence needed
- What existing evidence about the outcomes and impact of the intervention is already available?
- What evidence is credible and trustworthy to the intended audiences?
- What level of certainty and confidence in this evidence is needed?

(4) Cultural and ethical considerations
- How do intended audiences view the nature of change?
- Are there any cultural differences in views on change?
- Are the views of the most disadvantaged addressed equally?

(5) Resources and constraints
- What are the evaluator(s)' methodological capacities?
- Which views on causality do the evaluator(s) assume?
- What financial and material resources are available?
- What is the time frame for the evaluation and what, if any, constraints does this pose?

The evidence needs to
fit the purpose and the context. Donaldson, Christie, and Mark (2009) discuss the necessity to assess
what information ‘‘stakeholders [will] perceive as trustworthy and relevant for answering their
questions’’ (p. 244). Different stakeholder groups might have different evidence needs for an
evaluation, requiring balancing and prioritizing their needs.
It is also important to identify cultural and ethical considerations. In any evaluation, these are
important, but issues that are specific to outcome evaluation include the ways in which different
stakeholders may view the nature of change or the intended changes from a program. Julnes and Rog
(2007) point out that these considerations are particularly sensitive in evaluations that involve
communities of indigenous people. Social justice concerns can also arise when considering whether
the views of the most disadvantaged are addressed equally and whether the evaluation will focus on
measuring average effects or effects on the most disadvantaged. ‘‘Social justice requires that those
least advantaged not be further disenfranchised by focusing only on the information needs of those
with the greatest resources’’ (Julnes & Rog, 2007, p. 137).
Of course, these issues must be weighed against logistical constraints and available resources.
Often practical and political constraints, such as timelines, lack of available evidence, and cost
constraints, must be balanced against the desired ideal or preferred outcome evaluation methods.
The process of considering these five key issues may involve helping stakeholders understand
that there are a variety of causal questions, a variety of ways to think about causality, and a variety of
ways to gather evidence and warrant causal claims. By considering these issues, evaluators can
begin to understand, describe, and discuss with stakeholders characteristics of the intervention and
situation, which will help to inform other considerations for making causal claims. Table 1 provides
questions that can help evaluators consider each issue. These questions should be considered before
or during the process of developing an evaluation design.

Build Relevant and Defensible Causal Arguments


Evaluators ought to consider what kind of causal argument is most appropriate for and relevant to the
intervention, situation, and evaluation audience(s). While the nature and quality of causal claims do
rest on the methodology used and evidence generated, these are not the only inputs into a causal
argument. Thinking about causality in terms of a causal argument means the rigor comes from
critical analysis of evidence and a logical, defensible, and credible story of how an intervention
produces particular effects that is convincing to particular audience(s) in particular circumstances.
This builds on House's (1977) notion that evaluation is an argument, which has since been
advanced by other scholars, including Schwandt (2008):

Evaluators [need to] learn—and become capable of explaining to the public—that an evaluation is an
argument. My concern here is that in the press to master methods of generating data we ignore the idea of
developing a warranted argument—a clear chain of reasoning that connects the grounds, reasons, or
evidence to an evaluative conclusion. (p. 147)

Patton has similarly underscored the importance of shifting from methods to reasoning: ‘‘evaluation
as a field has become methodologically manic-obsessive. Too many of us, and those who commis-
sion us, think it’s all about methods. It’s not. It’s all about reasoning’’ (Patton, 2012, p. 105).
Four interrelated considerations are involved in building relevant and defensible causal argu-
ments: (1) the nature and character of the causal argument one wants to make; (2) the types, sources,
and probative force of evidence required; (3) the audiences for the argument; and (4) standards,
norms, or criteria for what constitutes a good causal argument. First, there are a variety of causal
questions and each sets the ground for a different kind of argument. For example, a causal question
about the outcomes and impacts of a particular program calls for describing the nature, extent, and
perceived value of intended and unintended outcomes. A causal question regarding the processes
and mechanisms that contribute to a program working in particular circumstances calls for an
argument about explaining a plausible theory of change, evidence that supports this theory occur-
ring, and ruling out rival explanations. Just as ‘‘care is needed to determine the relevant cause–effect
question in any specific context, and whether or not the question is reasonable’’ (Mayne, 2012b,
p. 1), evaluators ought to connect the questions with the nature and character of argument.
Second, evaluators ought to think about the types, sources, and probative force of evidence
needed to build an argument. A way to frame this is using empirical evidence to distinguish between
more and less plausible claims (Campbell, 1999) through a preponderance of evidence approach
(Scriven, 1976) using discretionary judgment (Patton, 2002). The preponderance of evidence approach
dates back to Scriven's (1976) modus operandi method, which draws on the notion of establishing
causality used in the professions and in everyday life, such as diagnosing what is wrong
with a car, medical diagnosis, detective work, and cause-of-death determination.
Third, in order to synthesize evidence into a relevant and convincing argument, evaluators need
to consider their primary audiences. Being situationally responsive involves constructing an argu-
ment that will have a certain rhetorical appeal to particular audience(s).
Fourth, evaluators ought to consider criteria, standards, and norms for constructing a ‘‘good’’
causal argument. Cartwright and Hardie (2012) define a ‘‘good argument’’ as ‘‘one in which the
premises themselves are all well warranted—trustworthy—and together imply the conclusion, or at
least make it highly likely’’ (p. 53) and then provide practical examples of good arguments of policy
effectiveness. In social science research, Gerring (2005) has identified 14 criteria as applicable to all
causal arguments (e.g., specification, completeness, intelligibility, and relevance). In reference to
building arguments in evaluation more generally, Davidson (2005) highlights the importance of
‘‘intelligibility,’’ arguing for the need for clear and coherent causal reasoning since evaluators need
to communicate for understanding with the lay public. In relation to evaluative evidence, Julnes and
Rog (2007) contend that evidence must not only be credible but also be ‘‘actionable,’’ that is,
adequate and appropriate for guiding actions in real-world contexts. The extent to which an evalua-
tion can provide answers to ‘‘why’’ and ‘‘why not’’ questions is one criterion for determining
actionable evidence. We believe ''actionable'' offers another criterion relevant to the quality of causal
arguments.
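As a practical illustration (ours, not the authors'), these four considerations can be kept together in a simple working record while an argument is being built. The sketch below is a minimal Python example; the class name, fields, and the mentoring-program content are hypothetical, and the criteria shown merely echo the kinds of checks (e.g., Gerring's criteria, actionability) discussed above.

```python
from dataclasses import dataclass, field

@dataclass
class CausalArgument:
    """Illustrative working record of a causal argument an evaluator is building."""
    claim: str                                      # the causal conclusion being advanced
    evidence: list = field(default_factory=list)    # (source, what it shows) pairs
    audiences: list = field(default_factory=list)   # who must find the argument credible
    criteria: dict = field(default_factory=dict)    # quality checks -> met (True/False)

    def unmet_criteria(self):
        """Return the quality criteria not yet satisfied."""
        return [name for name, met in self.criteria.items() if not met]

# Hypothetical example: a contribution-style claim for a mentoring program.
argument = CausalArgument(
    claim="The mentoring program contributed to improved school attendance",
    evidence=[("attendance records", "attendance rose after enrollment"),
              ("participant interviews", "students attribute the change to mentors"),
              ("rival explanations", "no concurrent attendance policy change found")],
    audiences=["program funder", "school district", "participating families"],
    criteria={"specification": True, "intelligibility": True,
              "rival explanations addressed": True, "actionable": False},
)

print(argument.unmet_criteria())  # -> ['actionable']
```

An evaluator might extend such a record with the audience-specific evidence needs identified during the situational assessment described earlier.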

Be Literate in Multiple Ways of Thinking About Causality


Evaluators ought to be familiar with and consider the relevance of different ways of thinking about
causality. Methodologies for assessing outcomes and impacts are based on underlying views of
establishing causality (Karlan, 2009). There are at least five ways of thinking about causality: (1) a
successionist framework that underlies regularity and counterfactual logics; (2) narrative stake-
holder accounts; (3) generative accounts of processes and mechanisms; (4) causal packages and
contributory accounts; and (5) nonlinear, multidirectional, and dynamical accounts of relations as
found in complex systems.

Successionist. For much of the history of evaluation (as well as the social sciences more broadly), a
successionist framework has been the dominant way of thinking about and assessing causality. A
successionist framework underlies two closely related logics of causality: regularity and counterfactual.
A regularity view of causal relations is based on the simultaneous observation of two
separate events, X and Y, in which X occurs temporally prior to Y; there is a statistical relationship
(covariation) between X and Y; X is both necessary (X always present when Y is) and sufficient
(Y always present when X is); other plausible causes can be ruled out; and the relationship can be
found in a large number of cases. This way of thinking is the basis for statistical techniques including
survey research methodology and statistical analyses of data sets. Another and related way of
thinking about causality follows a counterfactual logic in which causal claims require making a
comparison between two highly similar situations to illuminate the ‘‘counterfactual,’’ an estimate of
what would have happened in the absence of the intervention (Mark & Henry, 2006). It is based on
the assumption that causality itself is not observable. Therefore, ‘‘we observe what did happen when
people received a treatment . . . [and use a control group to estimate] what would have happened to
those same people if they simultaneously had not received treatment’’ (Shadish, Cook, & Campbell,
2002, p. 5). Counterfactual logic is the basis for RCTs, quasi-experimental designs as well as some
statistical techniques including difference in differences.
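To make the counterfactual logic concrete, here is a minimal sketch (our illustration, not drawn from the article) of a naive difference-in-differences calculation; the group labels and outcome means are invented, and a real analysis would model individual-level data and uncertainty.

```python
# Hypothetical before/after outcome means for a treated group and a comparison group.
means = {
    "treatment": {"before": 52.0, "after": 61.0},
    "comparison": {"before": 50.0, "after": 54.0},
}

def difference_in_differences(m):
    """Change in the treated group minus change in the comparison group.

    The comparison group's change stands in for the counterfactual:
    an estimate of what would have happened to the treated group
    in the absence of the intervention.
    """
    treated_change = m["treatment"]["after"] - m["treatment"]["before"]
    comparison_change = m["comparison"]["after"] - m["comparison"]["before"]
    return treated_change - comparison_change

print(difference_in_differences(means))  # 9.0 - 4.0 = 5.0
```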

Narrative. Another way of thinking about causality, and one that nearly all of us use routinely in
our day-to-day lives, relies on a narrative account of how we think change happens (Abell, 2004).
Narrative explanation foregrounds the importance of human agency in causality by attending to
human perception, motivation, and behavior. This way of thinking about causality does not view
participants as passive recipients but rather as active ‘agents’ (Stern et al., 2012). Under this
assumption, participants have agency and can help cause successful outcomes by their own actions
and decisions. People who advocate a narrative view of causality reject treating causal agents as
variables and treating context as a confounding variable that should be controlled for. Instead they
treat context as an important factor in determining whether a program will work in a certain setting.
This view doesn’t aggregate outcomes across different people but recognizes that people have
different values and that program outcomes will be different for different clients. The narrative
view focuses on documenting individualized outcomes of individual clients rather than measures of
standardized outcomes. Narrative accounts underlie many participatory, story-centered approaches
including most significant change and SCM.

Generative. A generative way of thinking about causal relationships builds and verifies a theory-based
explanation of how causal processes happen by showing how mechanisms work within particular
contexts to generate outcome patterns. This way of thinking is oriented toward understanding how,
why, for whom, and under what conditions interventions work to produce specific results. This way of
thinking assumes there are multiple possible causal pathways linking an intervention to an outcome.
These alternative causal paths will be true for certain people under certain conditions. Drawing on an
analogy with gunpowder that will only fire in favorable conditions, Pawson and Tilley (1997) have
suggested that program causal mechanisms only fire within favorable contexts. Mechanisms are not
regarded as general laws that are always true; instead, their particular context is a part of the causal
process (Pawson & Tilley, 1997). A generative way of thinking underlies some theory-based
approaches to causality including realist evaluation and process tracing.

Causal package. Another way of thinking about causality is the idea of ‘‘causal packages’’—the
copresence of multiple causes each of which may or may not be necessary and/or sufficient to produce
an effect. This way of thinking supports examining the contributory role that components of interventions
and combinations of multiple interventions play in producing outcomes and impacts. The idea here is
that many interventions do not act alone, and the desired outcomes are often the result of a combination
of causal factors, including other related interventions, events, and conditions external to the inter-
vention (Mayne, 2012a). This view highlights that it is not mono-causal conditions but combinations
of conditions that need to be examined (Sager & Andereggen, 2012). This way of thinking draws on
the logic of ‘‘necessary’’ and ‘‘sufficient’’ conditions to focus on causes that are neither necessary nor
sufficient on their own (Mayne, 2012a). The logic of this way of thinking involves identifying a
package of multiple causes that work together to produce an effect; describing each cause as necessary
but not sufficient within a causal package that is sufficient; and distinguishing ground preparing,
triggering, and sustaining contributory causes (Stern et al., 2012). After all, many programs are ‘‘less
often ‘magic bullets’ that trigger change in and of themselves, but mostly prepare the ground for long-
term change . . . . Knowing whether some conditions are required for a programme to work is impor-
tant in order to make it work’’ (Befani, 2013, p. 277). A causal package way of thinking is associated
with qualitative comparison analysis and contribution analysis.
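The necessary/sufficient logic of causal packages can be illustrated with a small sketch. The conditions, cases, and outcomes below are invented for illustration only; real contribution or configurational analyses involve far more careful case selection and calibration.

```python
# Invented cases: which conditions were present and whether the outcome occurred.
cases = [
    {"conditions": {"training", "funding", "local_champion"}, "outcome": True},
    {"conditions": {"training", "funding"},                   "outcome": False},
    {"conditions": {"funding", "local_champion"},             "outcome": False},
    {"conditions": {"training", "local_champion"},            "outcome": True},
    {"conditions": {"funding"},                               "outcome": False},
]

def necessary(condition, cases):
    """Necessary: the condition is present in every case where the outcome occurred."""
    return all(condition in c["conditions"] for c in cases if c["outcome"])

def sufficient(condition, cases):
    """Sufficient: the outcome occurred in every case where the condition is present."""
    return all(c["outcome"] for c in cases if condition in c["conditions"])

for cond in ("training", "funding", "local_champion"):
    print(cond, "necessary:", necessary(cond, cases), "sufficient:", sufficient(cond, cases))
```

In this toy data, training and a local champion are each necessary but not sufficient on their own; only the package containing both is sufficient, mirroring the contributory-cause logic described above.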

Complex systems. Conceiving of the world as comprising complex systems offers a way of thinking
about nonlinear, multidirectional, hierarchical, and dynamical causal relationships in a system or
situation of interest. In this way of thinking, the focus is on examining the multiple, interdependent
causal variables (also called factors) and nonlinear, cyclical feedback processes that affect the
structure and dynamical behavior of a system over time. Feedback loops—the means by which
systems reorganize—are ‘‘closed chains of causal connections’’ that balance or reinforce system
behavior through the dynamic of stocks and flows (Meadows, 2008, p. 188). Challenging the notion
that causal factors are stable and can be studied in isolation, this way of thinking requires studying
the interrelationships between factors as they are hypothesized, empirically found, and/or computer
simulated to affect change in a particular situation or system of interest. In this way, causal relation-
ships are context dependent; however, general patterns of systemic behavior may occur in and apply
to different contexts. This systemic way of thinking about causality often requires investigating
different levels of causality (e.g., individual motivation, an organizational policy, and economic
shifts) and how these levels interact to affect change (Forss, Marra, & Schwartz, 2011). This way of
thinking puts great emphasis on modeling causal relationships, because their nonlinearity (i.e., effects are
not proportional to the size, quantity, or strength of the inputs) and emergent properties (i.e.,
characteristics and behaviors of a system cannot be reduced to or predicted based on its component
parts) often lead to unpredictable, surprising, and counterintuitive behaviors. The aim is not for a
single, bottom-level explanation, which is not possible in this way of thinking, but rather for an
ongoing investigation of how causal relationships and feedback loops interact to influence change
over time. This complex systems way of thinking is found in causal loop diagramming and system
dynamics.
Each of these five ways of thinking about causality—successionist, narrative, generative, causal
package, and complex systems—illustrates a different way of thinking about and investigating
causal relationships that evaluators may draw on in their practices. Table 2 summarizes each
approach and offers examples of its use in particular methodologies and its relevance to particular
evaluation circumstances.

Table 2. Ways of Thinking About Causality.

Successionist/regularity: frequency of observation of simultaneous occurrence of independent, single cause and effect
- Logic of causal argument: Simultaneously observe two separate events and (1) show the cause temporally prior to the effect, (2) a statistical relationship (covariation) between cause and effect, (3) the cause is both necessary (cause always present when the effect is) and sufficient (effect always present when the cause is), (4) rule out other plausible causes, and (5) demonstrate the association in a high number of cases
- Question(s): What effects are statistically significantly associated with this intervention?

Successionist/counterfactual: compare two almost identical cases differing only in the cause (the intervention)
- Logic of causal argument: Show that the effect follows from the intervention through comparison to a highly similar control group
- Question(s): Does the intervention work to produce intended effects? Can we attribute effects to the intervention?

Narrative: stakeholders' views on how an intervention has influenced/affected/made a difference in their lives
- Logic of causal argument: Ask participants directly how an intervention influenced their lives, collect evidence verifying observance of these outcomes, and rule out alternative explanations
- Question(s): According to stakeholders, what influence, effects, and/or difference did the intervention make for their lives?

Generative: theory-based explanation of how a causal process happens by showing how mechanisms work to generate outcome patterns given contextual factors
- Logic of causal argument: Claim causation by identifying mechanisms that connect two events, empirically verifying theorized causal relations, and rejecting alternative explanations
- Question(s): What works, how, for whom, and under what circumstances? How and why does the intervention work?

Causal package: copresence of multiple causes which may or may not be necessary and/or sufficient for an effect
- Logic of causal argument: Identify a package of multiple causes that work together to produce an effect; describe each cause as necessary but not sufficient within a causal package that is sufficient; distinguish ground-preparing, triggering, and sustaining contributory causes
- Question(s): Is it likely that the intervention has made a difference? How does the intervention work in combination with other interventions or factors to make a difference?

Complex systems: multiple, interdependent causal factors and nonlinear feedback processes affect the structure and behavior of a system or situation over time
- Logic of causal argument: Build a conceptual model, called a causal loop diagram, of the causal relationships at work in a situation, intervention, or system; verify this model with empirical evidence for each variable, mathematical formulas, and computer simulation
- Question(s): How do multiple causal factors and feedback processes affect change in this intervention or situation? What's working now and how?

While these ways of thinking about causality are presented here as distinct, elements and assump-
tions of each are often mixed in methodological approaches and particular circumstances. For
example, one could take a narrative approach to understanding how participants understand the
impact of a program while also employing a type of ‘‘subjective counterfactual’’ (Abell, 2004) by
asking a participant to consider what would have occurred had they not participated in a given
program or taken a particular action. Or, as in the work of Byrne (2009), a generative way of thinking
about the processes and mechanisms that produce effects in particular circumstances is combined
with a complex systems account to examine causality at multiple levels and with a causal package
way of thinking to identify the necessary and sufficient causes in a particular case. These examples
suggest that the point of reflecting on different ways of thinking about causality is not to classify
one’s view but rather to carefully consider the assumptions used to investigate causal relationships in
a particular evaluation. This can then influence as well as be informed by the kind of causal
argument one wants to make and the questions and subsequent design and methods used in an
evaluation.

Be Familiar With a Range of Causal Designs and Methods


Evaluators can draw on a broad range of designs and methods to gather evidence to warrant causal
claims, and new methods are regularly emerging and gaining popularity. Due to limited space and
extensive discussion of these designs and methods elsewhere (see Befani, 2012; NONIE, 2008;
Stern et al., 2012; White & Phillips, 2012), we modestly provide evaluators with an initial, yet
limited, overview of the range of designs and methods available. Table 3 provides a summary of five
design approaches, select methodologies, the basis for warranting causal claims in each design
approach, and some considerations for when and why to use each design. Choosing a method is not
like selecting an item from a menu; there are many situational and contextual factors that will influence
which approach will be most appropriate for a particular evaluation. While we have simplified the
information in the table, it should be noted that some methods may draw on more than one way of
thinking about causality such as contribution analysis, which adopts assumptions of causal package
and generative causality.
We will elaborate on four nonexperimental design approaches that may be less familiar to
evaluators: theory-based, participatory, case-based, and systems-based approaches.

Theory based. The ‘‘theory’’ in these approaches is a set of assumptions about how an intervention
achieves its goals and under what conditions. In these approaches a theory of change (or causal
chain) follows ‘‘the pathway of a program from its initiation through various causal links in a chain
of implementation, until intended outcomes are reached’’ (Stern et al., 2012, p. 25). Theories of
change and related descriptions of causal links can focus on a sequence of decisions or actions, or
may also consider ‘‘causal mechanisms.’’ The concept of causal mechanisms assumes that it is
necessary to identify the ‘‘mechanism’’ that makes things happen in order to make plausible causal
claims. Theory-based evaluation also tries to understand the contextual circumstances under which
particular mechanisms operate. Merely having similar mechanisms in place will not assure similar
outcomes if the context is different or if various ‘‘helping’’ or ‘‘support’’ factors are absent. Theory-
based methodologies can range from ‘‘telling the causal story’’ about how and to what extent the
intervention has produced results to using the theory as an explicit benchmark to ‘‘formally test
causal assumptions’’ (Leeuw & Vaessen, 2009). The primary basis for causal inference in these
approaches is in-depth theoretical analysis to identify and/or confirm causal processes or ‘chains’
and the supporting factors (and possibly mechanisms) at work in context. Some authors caution that
these approaches are not effective at estimating the quantity or extent of the causal contribution of an
intervention (Stern et al., 2012) and that a causal contribution may just be assumed if there is
evidence of the expected causal chain (NONIE, 2008).

Table 3. Range of Causal Designs and Methodologies.

Experimental
- Examples of methodologies: Randomized control trial; natural experiments
- Basis for making causal claims: Comparison to a counterfactual
- When/why to use it: To generate precise information about whether a particular intervention worked in a particular setting; when there is a discrete intervention; when a control group and large samples are available and feasible

Quasi-experimental
- Examples of methodologies: Propensity score matching; judgmental matching; regression discontinuity; interrupted time series
- Basis for making causal claims: Comparison to a counterfactual
- When/why to use it: When precise information about the intervention is needed as with experimental approaches, but there is no random control group; when we want to know the effects of particular variables in a large sample; when a large sample is available

Theory-based approaches
- Examples of methodologies: Realist evaluation; process tracing; contribution analysis; impact pathways analysis
- Basis for making causal claims: Analysis of causal processes or mechanisms in context
- When/why to use it: When there is a strong theory of change; when it's important to understand how context affects an intervention; when it's important to understand how and for whom an intervention works

Participatory approaches
- Examples of methodologies: Success case method; most significant change; outcome mapping
- Basis for making causal claims: Validation by participants that their actions and experienced effects are ''caused'' by the intervention
- When/why to use it: To capture multiple, experiential understandings of change and possibly identify unintended consequences; for internal needs of an organization (e.g., program improvement); feasible, timely, and affordable; when the sample size is small to medium

Case-based approaches
- Examples of methodologies: Within case: analytic induction, network analysis, and process tracing; across case: qualitative comparison case analysis
- Basis for making causal claims: Analysis of causal processes within a case; presence of causal factors across multiple cases
- When/why to use it: To identify causal factors within or across multiple cases when known effect(s) have been identified

Systems-based approaches
- Examples of methodologies: Causal loop diagramming; system dynamics
- Basis for making causal claims: Build a conceptual model, called a causal loop diagram, of the causal relationships at work in a situation, intervention, or system; verify this model with empirical evidence for each variable, mathematical formulas, and computer simulation
- When/why to use it: To examine multiple, interdependent causal factors and nonlinear feedback processes that affect the structure and behavior of a situation or system over time; to understand a system's dynamical behavior over time; to identify unintended, nonlinear, and emergent effects

Realist evaluation is an example of a theory-based approach. Developed by Pawson and Tilley
(1997), realist evaluation develops an explanation of how an intervention brings about effects (i.e.,
mechanisms), the features and conditions that influence the activation of these mechanisms (i.e.,
context), and the intended and unintended consequences resulting from activation of different
mechanisms in different contexts (i.e., outcome patterns). By identifying and building empirical
support for a context–mechanism–outcome configuration, realist evaluators provide information
about how, why, under what circumstances, and for whom interventions work. Realist evaluation
is often used in public health (e.g., Evans & Killoran, 2010; Marchal, van Belle, van Olmen, Hoeree, &
Kegels, 2012).
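As an illustration of how realist context-mechanism-outcome (CMO) thinking might be organized (our sketch, not a feature of realist evaluation itself), the snippet below tallies invented observations by context and mechanism to show where a mechanism appears to ''fire.''

```python
from collections import defaultdict

# Invented context-mechanism-outcome (CMO) observations from a hypothetical realist evaluation.
observations = [
    {"context": "high-trust clinic", "mechanism": "peer support",   "outcome": "sustained attendance"},
    {"context": "high-trust clinic", "mechanism": "peer support",   "outcome": "sustained attendance"},
    {"context": "low-trust clinic",  "mechanism": "peer support",   "outcome": "drop-out"},
    {"context": "high-trust clinic", "mechanism": "reminder texts", "outcome": "mixed"},
]

# Group outcome patterns by (context, mechanism) to see where a mechanism appears to fire.
patterns = defaultdict(list)
for obs in observations:
    patterns[(obs["context"], obs["mechanism"])].append(obs["outcome"])

for (context, mechanism), outcomes in patterns.items():
    print(f"In {context}, '{mechanism}' produced: {outcomes}")
```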

Participatory. It is important to make a distinction between participatory approaches as a design for


impact evaluation and the more common use of the term ‘‘participatory evaluation,’’ which refers to
any approach actively involving program staff or participants in evaluation activities. Here, the term
specifically refers to participatory approaches that are used to warrant causal claims. The primary
basis of causal inference in these approaches is validation by program participants that their actions
and reported experiences are ‘‘caused’’ by a program, with an emphasis on meaningful change in the
lives of participants. These approaches systematically investigate program impacts for different
stakeholders and often involve stakeholders in providing or gathering evidence, often in the form
of participants’ stories. They usually focus on program participant behavior change as outcomes.
These approaches can uncover impacts valued by different stakeholders and unintended conse-
quences of the program. Some authors caution against merely asking participants if they believe
an intervention has produced certain impacts, arguing that stakeholders might try to manipulate
information to serve specific interests regarding the continuation of the intervention (NONIE, 2008;
Rogers, 2009). Other authors acknowledge that these approaches are accepted by particular stakeholders,
while other stakeholders do not believe they are robust enough to warrant causal claims or to
offer a sufficient degree of certainty; such approaches may support reasonable judgments by decision
makers, but perhaps not the level of certainty expected of standard scientific precision (Coryn, Schröter, &
Hanssen, 2009; Davidson, 2005; Duignan, 2009). These authors, along with others (Leeuw &
Vaessen, 2009; NONIE, 2008; Scriven, 2005), argue that the careful elimination of alternative
possible causes would be necessary to make causal claims, as would independent confirmation or
triangulation of these claims.
An example of a methodology that uses this approach is SCM (Brinkerhoff, 2003), which was
developed as a rapid and relatively simple process that combines analysis of successful outliers with
storytelling, survey methods, and qualitative case study methods (Brinkerhoff, 2003). In SCM,
evaluators identify ‘‘successful’’ participants and interview them about the ways they have applied
new knowledge and skills acquired from a program and the changes that have occurred for them as a
result (Brinkerhoff, 2003). Brinkerhoff (2005) notes that once evaluators identify ‘‘success cases,’’
they often look for corroborating evidence and documentation to be sure that the success story is
defensible—that it would stand up in court.
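The first SCM step, identifying extreme cases for follow-up, can be sketched simply. The survey scores and participant identifiers below are hypothetical; in practice, Brinkerhoff's method pairs this selection with interviews and corroborating documentation as described above.

```python
# Hypothetical survey scores (0-10) for how much participants report applying the training.
survey_scores = {
    "participant_01": 9, "participant_02": 2, "participant_03": 7,
    "participant_04": 10, "participant_05": 1, "participant_06": 5,
}

# Rank participants from highest to lowest reported application.
ranked = sorted(survey_scores.items(), key=lambda item: item[1], reverse=True)

# Select extreme cases at both ends for follow-up interviews and corroborating evidence.
success_cases = [name for name, _ in ranked[:2]]
non_success_cases = [name for name, _ in ranked[-2:]]

print("Interview as potential success cases:", success_cases)
print("Interview as potential non-success cases:", non_success_cases)
```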

Case based. Case-based approaches to causal analysis are those that focus on examining causal
relationships within a particular case or across multiple cases. There are different philosophical
traditions and methodologies within case-based approaches, which are described by Stern et al.
(2012) as interpretive (e.g., naturalistic, grounded theory, ethnography) and structured (e.g., con-
figurations, qualitative comparison analysis, within-case analysis, and simulations and network
analysis; p. 24). Case-based approaches can also be distinguished according to approaches that
focus on single cases (i.e., within case) and those that compare multiple cases (i.e., across case or
comparative case).
Process tracing, an example of a within-case approach, involves developing a theory of how
causal processes and mechanisms lead to effects, outcomes, or impacts in a particular intervention or
case; collecting evidence that these causal processes and mechanisms, in fact, took place; identifying
alternative explanations for what led to these effects; and collecting evidence that these alternative
explanations did not take place and/or are not responsible for producing the effects (Bennett, 2010).
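The bookkeeping behind process tracing can be illustrated with a small sketch (ours, not Bennett's): record each theorized link, whether evidence for it was found, and whether rival explanations were ruled out. The causal chain and rivals below are invented for illustration.

```python
# Invented process-tracing bookkeeping for a hypothetical causal chain.
causal_chain = [
    {"link": "training delivered to teachers",                   "evidence_found": True},
    {"link": "teachers changed classroom practice",              "evidence_found": True},
    {"link": "changed practice improved student engagement",     "evidence_found": False},
]

rival_explanations = [
    {"rival": "new curriculum introduced at the same time", "ruled_out": True},
    {"rival": "change in student cohort",                   "ruled_out": False},
]

chain_supported = all(step["evidence_found"] for step in causal_chain)
rivals_eliminated = all(r["ruled_out"] for r in rival_explanations)

print("Every link evidenced:", chain_supported)
print("All rivals ruled out:", rivals_eliminated)
print("Causal claim warranted (on this simple check):", chain_supported and rivals_eliminated)
```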
Qualitative comparative analysis (QCA; Ragin, 2000) is an analytical tool comparing different
combinations of conditions and outcomes in multiple cases. QCA foregrounds context and empha-
sizes the importance of ‘‘constellations of causes’’ (Sager & Andereggen, 2012, p. 63) instead of
mono-causal explanations. It draws conclusions about which conditions are necessary parts of a
‘‘causal recipe’’ to bring about a given outcome (Ragin, 2000). QCA attempts to compare the
different combinations of conditions and outcomes of each case, with the goal of discovering what
configurations of conditions lead to what outcomes, and which of those conditions are key in
producing certain outcomes (White & Phillips, 2012). Analysis begins by identifying a number of
relevant cases, typically between 15 and 25, for which a specified outcome has or has not occurred.
The researcher assembles a table of the various combinations of conditions and outcomes in the
cases being examined and uses Boolean algebra to compare the combinations of conditions (Ragin
& Amoroso, 2011). The last step involves analyzing and interpreting the causal recipes with con-
sideration of existing theory. In evaluation, a QCA analysis can be useful in determining where to
invest future resources by determining what conditions are most likely to lead to certain outcomes
and has most often been used in evaluations of policy issues.
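A toy version of the truth-table step in QCA is sketched below; the conditions, cases, and outcomes are invented, the case count is far below the 15 to 25 typically used, and real QCA applies Boolean minimization and consistency measures rather than this simple all-or-nothing check.

```python
from collections import defaultdict

# Invented crisp-set data: condition membership (1/0) and outcome for each case.
cases = [
    {"strong_leadership": 1, "adequate_funding": 1, "community_buyin": 0, "outcome": 1},
    {"strong_leadership": 1, "adequate_funding": 1, "community_buyin": 1, "outcome": 1},
    {"strong_leadership": 0, "adequate_funding": 1, "community_buyin": 1, "outcome": 0},
    {"strong_leadership": 1, "adequate_funding": 0, "community_buyin": 1, "outcome": 1},
    {"strong_leadership": 0, "adequate_funding": 0, "community_buyin": 1, "outcome": 0},
]

conditions = ["strong_leadership", "adequate_funding", "community_buyin"]

# Truth table: group cases by their configuration of conditions.
truth_table = defaultdict(list)
for case in cases:
    configuration = tuple(case[c] for c in conditions)
    truth_table[configuration].append(case["outcome"])

# Treat a configuration as consistently linked to the outcome
# only if every case showing it also shows the outcome.
for configuration, outcomes in truth_table.items():
    label = ", ".join(f"{name}={value}" for name, value in zip(conditions, configuration))
    print(label, "-> outcome in all cases:", all(outcomes))
```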

Systems based. Systems-based approaches ‘‘use a wide range of methods and methodologies devel-
oped over the past 50 years within the systems field’’ (NONIE, 2008, p. 28). Systems approaches
focus on understanding and often modeling interrelationships between aspects or influences on a
situation; examining different perspectives, values, and worldviews in relation to a situation or
intervention; and reflecting on and critiquing different boundaries drawn around a situation, inter-
vention, or evaluation. According to NONIE (2008), systems-based approaches to examining causal
relationships are useful for supporting the development of causal models that address nonlinear and
nonsimple causality and, in some approaches, for paying attention to who is included in the evaluation
and how decisions are made (p. 28).
One example of a systems-based approach to examining causal relationships is system dynamics
modeling. Originally developed by Jay Forrester in the 1950s, ‘‘system dynamics is an approach for
thinking about and simulating situations and organizations of all kinds and sizes by visualizing how
the elements fit together, interact, and change over time’’ (Morecroft, 2010, p. 25). These models,
especially when computer simulated, offer real-time feedback and provide the capacity to make
decisions and take actions that are not feasible or ethical in the everyday world, such as modeling
interventions across multiple scenarios, manipulating time and conditions, stopping action to allow
for reflection, and pushing a system to extreme conditions to see what happens (Sterman, 2006).
Despite the potential of system dynamics modeling for understanding the effects of policies and
programs, there are few examples using this methodology in evaluation (see Fredericks, Deegan, &
Carman, 2008; Homer & Hirsch, 2006 for a discussion in public health).
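To give a flavor of stock-and-flow simulation (a deliberately simplified sketch of our own, not an example from the system dynamics literature), the snippet below simulates a single stock, program participants, governed by a balancing feedback loop; the capacity and rate parameters are invented.

```python
# Toy stock-and-flow model: program participants as a stock with a balancing feedback loop.
# Enrollment slows as the program approaches capacity; dropout drains the stock.
capacity = 200.0          # hypothetical program capacity
enrollment_rate = 0.30    # fraction of remaining capacity enrolled per month
dropout_rate = 0.10       # fraction of current participants leaving per month

participants = 20.0       # initial stock
for month in range(1, 13):
    inflow = enrollment_rate * (capacity - participants)   # balancing loop: shrinks near capacity
    outflow = dropout_rate * participants
    participants += inflow - outflow                        # update the stock
    print(f"month {month:2d}: participants = {participants:6.1f}")
```

Even this toy model shows the characteristic behavior of a balancing loop: growth slows as the stock approaches an equilibrium rather than changing in proportion to a fixed input.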

Layer Theories to Explain Causality at Multiple Levels


For much of the history of evaluation practice, evaluators have used theories to explain causal
relationships at the level of analysis of the intervention. For example, the most widespread approach
to theory building is the logic model or logical framework approach that traces the relationships
between inputs, activities, outputs, outcomes, and longer term impacts. This theory is usually
developed through a combination of reviewing the literature, examining the design of the interven-
tion, and talking with key stakeholders about how they conceptualize the workings of the interven-
tion. However, amid the changing conversation about causality, some scholars contend that these
linear, intervention-centered theoretical explanations do not adequately capture the multiple levels
of change influencing and influenced by programs and policies (Barnes, Matka, & Sullivan, 2003;
Callaghan, 2008). In particular, they contend that programmatic theories do not adequately describe
or explain microlevel processes of power, negotiation, and contested interpretations of how change
happens, or more macrolevel organizational, institutional, and sociological processes that constrain
and shape the workings of interventions. They argue for evaluations that draw on theories from
across disciplines to explain change and/or causal relationships at multiple levels of analysis. This
requires evaluators to be familiar with different theories that explain causality at different levels of
analysis (e.g., psychological, social psychological, organizational, institutional, and sociological), to
know how to layer these theories in particular evaluations, and how these layered theories can
inform evaluation design, data collection, and data analysis.
Recent evaluation work that draws on theory from the complexity sciences offers examples of
which theories evaluators might draw on and how evaluators layer these theories in a particular
evaluation. Barnes, Matka, and Sullivan (2003), in their evaluation of a health action zone (HAZ),
draw on complexity theory and new institutionalist perspectives to adequately understand and
explain change in what they describe as a multifaceted intervention:

The HAZ seeks to promote change of individuals (e.g. changing lifestyles to improve health status);
populations (e.g. reduced incidence of heart disease amongst people living in a particular area); com-
munities (e.g. increased social cohesion); services (e.g. health services that are more responsive to the
needs and circumstances of service users); and systems (the processes through which different agencies
work together, determining shared aims, delivering services and establishing appropriate systems of
governance). (p. 266)

Similarly, Westhorp (2012) contends that substantive theories that draw on complexity theories and
concepts (e.g., contingent causation, emergence of system properties) can help to understand change
processes in complex adaptive systems, particularly which aspects of context to attend to and which
interactions matter for generating outcomes, which can help shape questions and guide evaluation
design (p. 411). Sanderson (2000) advances Bhaskar’s social naturalism as a guiding framework for
conceptualizing change. These scholars’ work not only points to the need for evaluations that layer
theories to explain causality at multiple levels but also suggests that there are some substantive
theories (e.g., complex adaptive systems, social naturalism) that may be relevant for understanding
change processes across different evaluation settings and circumstances.

Justify the Causal Approach Taken to Multiple Audiences


In the face of the predominance of experimental designs and counterfactual logics and given the
multitude of ways of thinking about and methodological approaches to causality, evaluators ought to
be prepared to justify the causal approach taken to multiple audiences. Evaluation commissioners,
key stakeholders, and the wider public are likely unfamiliar with the range of ways of thinking about
causality and relevant designs and methods for warranting causal claims. Further, most evaluations
are conducted within climates in which regularity and counterfactual logics predominate and experi-
mental designs and statistical techniques are the primary ways of assessing and making causal
claims. Asking evaluation stakeholders what they think is needed (or if the approach taken is
appropriate) in these climates often is not adequate (Davidson, 2000). Instead, it is within the role
and responsibility of evaluators to educate and justify the causal approach taken to multiple
audiences including evaluation commissioners, stakeholders, and the wider public. For example,
while commissioning impact evaluations using nonexperimental designs is becoming more accepted
by some funding agencies, there remains a demand that evaluators justify the necessity and feasi-
bility of alternative designs. Justifying an approach requires communicating its necessity and value
for the circumstances at hand, as well as its limitations. Multiple audiences include the evaluation
commissioners and stakeholders but also the wider public and media. In the context of evidence-
based and results-oriented management, nonexperimental designs that cannot neatly attribute out-
comes to programs may be less clear-cut and conclusive, making communicating results to the media
more difficult.

Conclusion
The issue of how evaluators warrant causal claims will continue to be a central issue in the field and
practice of evaluation. As philosophical definitions of causality shift and new methodological
approaches emerge, evaluators will and ought to continue to reassess what they mean by causality
in their practices and what methodological approaches they use to investigate causal relationships
and warrant causal claims. However, given the routine practice of making causal claims in evalua-
tion and the growing demand for evaluators to defend how they make these claims, this article
reviewed the recent conversation about causality to identify six guidelines for evaluators when
warranting causal claims in particular circumstances. In addition to guiding evaluation practice,
this heuristic has implications for teaching and training evaluators. The issue of making causal
claims is often covered within the confines of methodological courses with little consideration and
examination of the multitude of ways of thinking about causality and the other practical issues
relevant to making causal claims in evaluation circumstances. This heuristic of six guidelines can be
used to guide reflection, discussion, and debate about how evaluators engage and ought to engage in
making causal claims in different practical circumstances. Future research is needed to empirically
explore how these guidelines can be used by evaluators in particular circumstances and what, if any,
guidance they offer. Additionally, further discussion within the evaluation community is needed
regarding how evaluators construct relevant and defensible causal arguments and how evaluators
can justify the causal approach taken to multiple audiences.

Acknowledgments
We thank Thomas Schwandt for his guidance and thoughtful editorial comments and three anonymous
reviewers for their feedback on an earlier version of this article.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or pub-
lication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.

References
Abell, P. (2004). Narrative explanation: An alternative to variable-centered explanation? Annual Review of Sociology, 30, 287–310.
American Evaluation Association. (2003). American evaluation association response to U.S. Department of
Education notice of proposed priority. Federal Register, RIN 1890-ZA00, November 4, 2003. Retrieved
from http://www.eval.org/p/cm/ld/fid=95
Barnes, M., Matka, E., & Sullivan, H. (2003). Evidence, understanding and complexity: Evaluation in non-linear systems. Evaluation, 9, 265–284.
Befani, B. (2012). Models of causality and causal inference (Report of a study commissioned by the
Department for International Development, Working paper 38). Retrieved from http://www.dfid.gov.uk/
Documents/publications1/design-method-impact-eval.pdf
Befani, B. (2013). Between complexity and generalization: Addressing evaluation challenges with QCA. Evaluation, 19, 269–283.
Bennett, A. (2010). Process tracing and causal inference. In H. E. Brady & D. Collier (Eds.), Rethinking social inquiry: Diverse tools, shared standards (2nd ed., pp. 207–220). Lanham, MD: Rowman & Littlefield.
Biesta, G. (2007). Why "what works" won't work: Evidence-based practice and the democratic deficit in educational research. Educational Theory, 57, 1–22.
Brinkerhoff, R. O. (2003). The success case method: Find out quickly what’s working and what’s not. San
Francisco, CA: Berret-Koehler.
Brinkerhoff, R. O. (2005). Success case method. In S. Mathison (Ed.), Encyclopedia of evaluation (pp. 402–403). Thousand Oaks, CA: Sage.
Byrne, D. (2009). Complex realists and configurational approaches to cases: A radical synthesis. In D. Byrne & C. C. Ragin (Eds.), The SAGE handbook of case-based methods (pp. 101–111). Thousand Oaks, CA: Sage.
Callaghan, G. (2008). Evaluation and negotiated order: Developing the application of complexity theory. Evaluation, 14, 399–411.
Campbell, D. T. (1999). On the rhetorical use of experiments. In D. T. Campbell & M. J. Russo (Eds.), Social experimentation (pp. 149–158). Thousand Oaks, CA: Sage.
Cartwright, N. (2007). Hunting causes and using them. Cambridge, England: Cambridge University Press.
Cartwright, N., & Hardie, J. (2012). Evidence-based policy: Doing it better. A practical guide to predicting if a
policy will work for you. Oxford, England: Oxford University Press.
Cook, T. D. (2007). Describing what is special about the role of experiments in contemporary educational research: Putting the "gold standard" rhetoric into perspective. Journal of Multidisciplinary Evaluation, 3, 1–7.
Cook, T. D., Scriven, M., Coryn, C. L., & Evergreen, S. D. (2010). Contemporary thinking about causation in evaluation: A dialogue with Tom Cook and Michael Scriven. American Journal of Evaluation, 31, 105–117.
Coryn, C. L. S., Schröter, D. C., & Hanssen, C. E. (2009). Adding a time-series design element to the success case method to improve methodological rigor: An application for nonprofit program evaluation. American Journal of Evaluation, 30, 80–92.
Davidson, E. J. (2000). Ascertaining causality in theory-based evaluation. New Directions for Evaluation, 2000, 17–26.
Davidson, E. J. (2005). Evaluation methodology basics: The nuts and bolts of sound evaluation. Thousand
Oaks, CA: Sage.
Donaldson, S. I., Christie, C. A., & Mark, M. M. (Eds.). (2009). What counts as credible evidence in applied
research and evaluation practice? Thousand Oaks, CA: Sage.
Duignan, P. (2009). Seven possible impact/outcome evaluation design types. Outcomes Theory Knowledge
Base, article No 209. Retrieved from http://knol.google.com/k/paul-duignan-phd/seven-possible-outco-
meimpact-evaluation/2m7zd68aaz774/10
Evans, D., & Killoran, A. (2010). Tackling health inequalities through partnership working: Learning from a realistic evaluation. Critical Public Health, 10, 125–140.
Forss, K., Marra, M., & Schwartz, R. (Eds.). (2011). Evaluating the complex: Attribution, contribution, and
beyond. New Brunswick, NJ: Transaction.
Fredericks, K. A., Deegan, M., & Carman, J. G. (2008). Using system dynamics as an evaluation tool: Experience from a demonstration program. American Journal of Evaluation, 29, 251–267.
Gerring, J. (2005). Causation: A unified framework for the social sciences. Journal of Theoretical Politics, 17, 163–198.
Homer, J. B., & Hirsch, G. B. (2006). System dynamics modeling for public health: Background and opportunities. American Journal of Public Health, 96, 452–458.
House, E. R. (1977). The logic of evaluative argument. In E. Baker (Ed.), CSE Monograph Series in Evaluation (Vol. 7). Los Angeles, CA: UCLA Center for the Study of Evaluation.
Julnes, G., & Rog, D. J. (2007). Pragmatic support for policies on methodology. New Directions for Evaluation, 113, 129–147.
Karlan, D. (2009). Thoughts on randomized trials for evaluation of development: Presentation to the Cairo evaluation clinic. Journal of Development Effectiveness, 1, 237–242.
Leeuw, F., & Vaessen, J. (2009). Impact evaluations and development: NONIE guidance on impact evaluation.
Washington, DC: World Bank.
Marchal, B., van Belle, S., van Olmen, J., Hoeree, T., & Kegels, G. (2012). Is realist evaluation keeping its promise? A review of published empirical studies in the field of health systems research. Evaluation, 18, 192–212.
Mark, M., & Henry, G. T. (2006). Methods for policy-making and knowledge development evaluation. In I. Shaw, J. Greene, & M. Mark (Eds.), The Sage handbook of evaluation (pp. 317–339). London, England: Sage.
Mayne, J. (2011). Contribution analysis: Addressing cause and effect. In K. Forss, M. Marra, & R. Schwartz (Eds.), Evaluating the complex: Attribution, contribution, and beyond (pp. 53–95). New Brunswick, NJ: Transaction.
Mayne, J. (2012a). Contribution analysis: Coming of age? Evaluation, 18, 270–280.
Mayne, J. (2012b). Making causal claims (ILAC Brief No. 26). Rome, Italy: Institutional Learning and Change
(ILAC) Initiative. Retrieved from http://www.cgiar-ilac.org/files/publications/mayne_making_causal_
claims_ilac_brief_26.pdf
Meadows, D. H. (2008). Thinking in systems: A primer. White River Junction, VT: Chelsea Green.
Morecroft, J. (2010). System dynamics. In M. Reynolds & S. Holwell (Eds.), Systems approaches to managing change: A practical guide (pp. 25–85). London, England: Springer.
Network of Networks on Impact Evaluation Subgroup 2. (2008). NONIE impact evaluation guidance. Retrieved
from http://www.worldbank.org/ieg/nonie/docs/NONIE_SG2.pdf
Patton, M. Q. (2002). Qualitative research and evaluation methods. Thousand Oaks, CA: Sage.
Patton, M. Q. (2008). Utilization-focused evaluation (4th ed.). Thousand Oaks, CA: Sage.
Patton, M. Q. (2012). Contextual pragmatics of valuing. New Directions for Evaluation, 133, 97–108. doi:10.1002/ev.20011
Pawson, R., & Tilley, N. (1997). Realistic evaluation. London, England: Sage.
Pawson, R., & Tilley, N. (2004). Realist evaluation. British Cabinet Office, 1–36. Retrieved from http://www.communitymatters.com.au/RE_chapter.pdf
Picciotto, R. (2012). Experimentalism and development evaluation: Will the bubble burst? Evaluation, 18, 213–229.
Picciotto, R. (2013). The logic of development effectiveness: Is it time for the broader evaluation community to take notice? Evaluation, 19, 155–170.
Ragin, C. C. (2000). Fuzzy-set social science. Chicago, IL: University of Chicago Press.
Ragin, C. C., & Amoroso, L. M. (2011). Constructing social research (2nd ed.). Thousand Oaks, CA: Pine
Forge Press.
Rogers, P. (2009). Matching impact evaluation design to the nature of the intervention and the purpose of the evaluation. Journal of Development Effectiveness, 1, 217–226.
Sager, F., & Andereggen, C. (2012). Dealing with complex causality in realist synthesis: The promise of qualitative comparative analysis. American Journal of Evaluation, 33, 60–78.
Sanderson, I. (2000). Evaluation in complex policy systems. Evaluation, 6, 433–454.
Schwandt, T. A. (2008). Educating for intelligent belief in evaluation. American Journal of Evaluation, 29, 139–150.
Schwandt, T. A. (2014). On the mutually informing relationship between practice and theory in evaluation. American Journal of Evaluation, 35, 231–236.
Scriven, M. (1976). Maximizing the power of causal investigation: The modus operandi method. In G. V. Glass (Ed.), Evaluation studies annual review 1 (pp. 120–139). Beverly Hills, CA: Sage.
Scriven, M. (2005). Causation. In S. Mathison (Ed.), Encyclopedia of evaluation (pp. 44–48). Thousand Oaks, CA: Sage.
Scriven, M. (2008). A summative evaluation of RCT methodology: & An alternative approach to causal research. Journal of Multidisciplinary Evaluation, 5, 11–24.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for
generalized causal inference. Boston, MA: Houghton Mifflin.
Stern, E. (2013). Editorial. Evaluation, 19, 3–4.
Stern, E., Andersen, O. W., & Hansen, H. (2013). Editorial: Special issue: What can case studies do? Evaluation, 19, 213–216.
Stern, E., Stame, N., Mayne, J., Forss, K., Davies, R., & Befani, B. (2012). Broadening the range of designs and
methods for impact evaluations (Report of a study commissioned by the Department for International
Development, Working paper 38). Retrieved from http://www.dfid.gov.uk/Documents/publications1/
design-method-impact-eval.pdf
Sterman, J. D. (2006). Learning from evidence in a complex world. American Journal of Public Health, 96, 505–514.
Tsui, J., Hearn, S., & Young, J. (2014). Monitoring and evaluation of policy influence and advocacy (Working paper for the Overseas Development Institute [ODI]). Retrieved from http://www.odi.org.uk/sites/odi.org.uk/files/odi-assets/publications-opinionfiles/8928.pdf
U.S. Government Accountability Office. (2009). Program evaluation: A variety of rigorous methods can help
identify effective interventions (GAO-10-30). Retrieved from http://www.gao.gov
Westhorp, G. (2012). Using complexity-consistent theory for evaluating complex systems. Evaluation, 18, 405–420.
White, H., & Phillips, D. (2012). Addressing attribution of cause and effect in small n impact evaluations:
Towards an integrated framework. New Delhi, India. Retrieved from www.3ieimpact.org
