Editor

Stavroula Kousta
Executive Editor, Neuroscience
Katja Brose
Journal Manager
Rolf van der Sanden
Journal Administrator
Myarca Bonsink
Advisory Editorial Board
R. Adolphs, Caltech, CA, USA
R. Baillargeon, U. Illinois, IL, USA
N. Chater, University College, London, UK
P. Dayan, University College London, UK
S. Dehaene, INSERM, France
D. Dennett, Tufts U., MA, USA
J. Driver, University College, London, UK
Y. Dudai, Weizmann Institute, Israel
A.K. Engel, Hamburg University, Germany
M. Farah, U. Pennsylvania, PA, USA
S. Fiske, Princeton U., NJ, USA
A.D. Friederici, MPI, Leipzig, Germany
O. Hikosaka, NIH, MD, USA
R. Jackendoff, Tufts U., MA, USA
P. Johnson-Laird, Princeton U., NJ, USA
N. Kanwisher, MIT, MA, USA
C. Koch, Caltech, CA, USA
M. Kutas, UCSD, CA, USA
N.K. Logothetis, MPI, Tübingen, Germany
J.L. McClelland, Stanford U., CA, USA
E.K. Miller, MIT, MA, USA
E. Phelps, New York U., NY, USA
R. Poldrack, U. Texas Austin, TX, USA
M.E. Raichle, Washington U., MO, USA
T.W. Robbins, U. Cambridge, UK
A. Wagner, Stanford U., CA, USA
V. Walsh, University College, London, UK
Editorial Enquiries
Trends in Cognitive Sciences
Cell Press
600 Technology Square
Cambridge, MA 02139, USA
Tel: +1 617 397 2817
Fax: +1 617 397 2810
E-mail: tics@elsevier.com
March 2011 Volume 15, Number 3 pp. 95–140
Forthcoming articles
Implicit social cognition: from measures to mechanisms
Brian A. Nosek, Carlee Beth Hawkins and Rebecca S. Frazier
Thalamic pathways for active vision
Robert H. Wurtz, Kerry McAlonan, James Cavanaugh and Rebecca A. Berman
Posterior cingulate cortex: adapting behavior to a changing world
John M. Pearson, Sarah R. Heilbronner, David L. Barack, Benjamin Y. Hayden and Michael L. Platt
Visual Crowding: a fundamental limit on conscious perception and object recognition
David Whitney and Dennis M. Levi
Frontal Pole Cortex: encoding ends at the end of the endbrain
Satoshi Tsujimoto, Aldo Genovesio and Steven P. Wise
Update
Book Review
95 How does the brain make economic decisions? Review of: Foundations of Neuroeconomic Analysis (by Paul W. Glimcher)
Antonio Rangel
Opinion
97 What drives the organization of object knowledge in the brain?
Bradford Z. Mahon and Alfonso Caramazza
104 Specifying the self for cognitive neuroscience
Kalina Christoff, Diego Cosmelli, Dorothée Legrand and Evan Thompson
Review
113 Songs to syntax: the linguistics of birdsong
Robert C. Berwick, Kazuo Okanoya, Gabriel J.L. Beckers and Johan J. Bolhuis
122 Representing multiple objects as an ensemble enhances visual cognition
George A. Alvarez
132 Cognitive neuroscience of self-regulation failure
Todd F. Heatherton and Dylan D. Wagner
Cover: Failing to control one's own behavior underlies several social and mental health problems. On pages 132–139
Todd F. Heatherton and Dylan D. Wagner review a large body of recent psychological and neuroscientific research on
self-regulation failures, including addictive or hedonistic behavior, lack of emotional control, as well as stereotyping and
prejudicial behavior. The authors propose a model of self-regulation that accounts for self-regulation failures in terms of a
loss of balance between prefrontal cortical regions that implement cognitive control and subcortical structures that drive
appetitive behaviors. Although facetious, the cover image (Brett Lamb/iStock Vectors/Getty Images) powerfully demonstrates
the detrimental effects of loss of control.
Book Review
How does the brain make economic decisions?
Foundations of Neuroeconomic Analysis by Paul W. Glimcher. Oxford University Press, 2010. $69.95/40.00 (488 pages)
ISBN 978-0-19-974425-1.
Antonio Rangel
Division of Humanities and Social Sciences & Computational and Neural Systems, Caltech, 1200 E. California Blvd, Pasadena,
CA, USA
For millennia the quest to understand human nature and, in particular, why we behave the way we do, was mostly the domain of religion and philosophy. Over the last two centuries, this quest has become the domain of three scientific disciplines: behavioral neuroscience, psychology and economics. Although these disciplines share a common goal, their methodology and sensibilities are significantly different, which often leads to inconsistent and even contradictory explanations of the same behavioral phenomena. Consider, for example, the basic question of why some individuals become addicted whereas others do not. The most popular economic theory, called the rational addiction model [1], assumes that individuals become addicted as a result of maximizing a strong taste for consuming drugs in the short-term that also increases the desire to consume them in the future. By contrast, current neurobiological theories of addiction are based on the idea that consumption of drugs leads to a systematic malfunction of the brain's reward learning systems, which induces addicted individuals to consume them even when it is not optimal to do so [2,3].
Neuroeconomics is a relatively new field that seeks to reconcile these conflicting theories of human behavior [4]. The goal of the field is to combine methods and theories from behavioral neuroscience, psychology, economics and computer science to answer the following basic questions: (i) What are the computations made by the brain to make different types of decisions? (ii) How does the underlying neurobiology implement and constrain those computations? (iii) What are the implications of this knowledge for understanding behavior in economic, clinical, policy and legal contexts? The ultimate goal of the field is to produce a computational and neurobiological account of decision-making that can serve as a common foundation for understanding human behavior across the natural and social sciences. In this sense, neuroeconomics can be thought of as the realization of the dream outlined by E.O. Wilson in Consilience: The Unity of Knowledge [5].
In Foundations of Neuroeconomic Analysis, Paul Glimcher, one of the founders of the field, outlines his vision for this ambitious research agenda. The book accomplishes several aims with remarkable effectiveness.
First, it makes the case for bringing all of the parent fields together in a unified and interdisciplinary effort to understand human behavior simultaneously at multiple levels of analysis. Importantly, Glimcher argues that the benefits of this unholy marriage flow in all directions: economists and psychologists will benefit from grounding their theories on the reality of how the brain actually makes decisions, and neuroscientists will benefit by being forced to understand the brain at the computational level. Glimcher forcefully argues that this effort will result in a synthetic theory of human behavior that will generate new critical insights for all of the parent disciplines.
Second, the book provides a brilliant introduction to critical ideas in economics and psychology for neuroscientists, and to critical ideas in behavioral and perceptual neuroscience for economists and psychologists. For this reason alone, anyone considering doing research in the computational or neurobiological foundations of decision-making, and anyone interested in why we act the way we do (from lawyers to philosophers), should read this book.
Third, the book reviews some critical findings in the field and argues that they already provide a glimpse of how a unified model of decision-making might look. For example, Glimcher argues that we have begun to understand how the brain computes values, makes choices by comparing those values, and learns those values through a process known as reinforcement learning. He also argues that the existing findings, together with some basic neuroscience ideas such as divisive normalization [6] (a principle explaining how the cortex integrates competing inputs to maximize encoded information while keeping neurons within bounded firing ranges), provide a computational and neurobiological implementation of economic concepts such as prospect theory or random utility. Although the ideas in this section of the book are controversial, they are also extremely thought-provoking.
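The normalization principle mentioned above can be made concrete with a small sketch. The form used here, in which each input raised to a power is divided by a constant plus the summed activity of all inputs, is one commonly used textbook formulation and is an assumption on our part; the review itself gives no equation. The function name and the parameters r_max, sigma and n are illustrative only.

```python
from typing import List


def divisive_normalization(drives: List[float], r_max: float = 100.0,
                           sigma: float = 1.0, n: float = 2.0) -> List[float]:
    """Divide each input drive by the pooled activity of all inputs.

    Assumed form (not taken from the book or the review):
        R_i = r_max * d_i**n / (sigma**n + sum_j d_j**n)
    """
    if any(d < 0 for d in drives):
        raise ValueError("drives must be non-negative")
    powered = [d ** n for d in drives]
    # The pool grows with overall input strength, so responses saturate.
    pool = sigma ** n + sum(powered)
    return [r_max * p / pool for p in powered]


if __name__ == "__main__":
    # Hypothetical input drives: same 1:2 ratio at two overall intensities.
    print(divisive_normalization([1.0, 2.0]))
    print(divisive_normalization([10.0, 20.0]))
```

Under these assumptions every response stays below r_max however strong the inputs are, the ordering of the inputs is preserved, and adding a competing input lowers the response to an unchanged input.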
By necessity, this ambitious book also reflects some of the current shortcomings of this young field. For example, traditional economists are skeptical as to whether the field will provide transformative insights for their discipline [7], and at this early stage it is hard to provide concrete examples against this view. In addition, although I admire Glimcher's attempt to begin sketching a synthetic model of choice, it can be argued that it might be too early to do so. For example, at this stage in our understanding of the brain's decision-making circuitry, it is unclear how to reconcile the standard neuroeconomic model proposed in the book with evidence showing that behavior can be influenced by at least three different behavioral controllers (called the Pavlovian, habitual and goal-directed controllers) that are often at odds with each other [4], or that there might be multiple and competing value learning systems.
Corresponding author: Rangel, A. (rangel@hss.caltech.edu).
These caveats notwithstanding, I was truly inspired by this book. It is an impressive piece of scholarly work by one of the world's most prominent neuroeconomists. Although I have been working in the field for years, it has changed the way I think about many of the open questions we study. The book will probably stir up debate among the parent disciplines about the feasibility and virtues of the neuroeconomics approach. It is beautifully written, with a voice that is scholarly yet accessible at the same time. It will be of interest not only to those working in the field, but also to a wide audience of readers. Finally, I suspect that the thoughtfulness of its arguments and the passion of its rhetoric will inspire a new generation of researchers to stake their careers on the vision outlined by its author. In fact, in many ways this book might do for neuroeconomics what David Marr's Vision did for vision science [8].
References
1 Becker, G. and Murphy, K. (1988) A theory of rational addiction. J. Polit. Econ. 96, 675
2 Redish, A.D. (2004) Addiction as a computational process gone awry. Science 306, 1944–1947
3 Redish, A.D. et al. (2008) Addiction as vulnerabilities in the decision process. Behav. Brain Sci. 31, 461–487
4 Rangel, A. et al. (2008) A framework for studying the neurobiology of value-based decision making. Nat. Rev. Neurosci. 9, 545–556
5 Wilson, E.O. (1999) Consilience: The Unity of Knowledge, Vintage
6 Reynolds, J.H. and Heeger, D.J. (2009) The normalization model of attention. Neuron 61, 168–185
7 Bernheim, B.D. (2009) On the potential of neuroeconomics: a critical (but hopeful) appraisal. Am. Econ. J. Microecon. 1, 1–41
8 Marr, D. (1982) Vision, W.H. Freeman and Co.
1364-6613/$ see front matter
doi:10.1016/j.tics.2010.12.006 Trends in Cognitive Sciences, March 2011, Vol. 15, No. 3
What drives the organization of object
knowledge in the brain?
Bradford Z. Mahon 1,2 and Alfonso Caramazza 3,4
1 Department of Brain and Cognitive Sciences, Meliora Hall, University of Rochester, Rochester, NY 14627, USA
2 Department of Neurosurgery, 601 Elmwood Ave, University of Rochester Medical Center, Rochester, NY 14642, USA
3 Department of Psychology, William James Hall, 33 Kirkland Street, Harvard University, Cambridge, MA 02138, USA
4 Center for Mind/Brain Sciences, University of Trento, Palazzo Fedrigotti, Corso Bettini 31, I-38068 Rovereto (TN), Italy
Various forms of category-specificity have been described at both the cognitive and neural levels, inviting the inference that different semantic domains are processed by distinct, dedicated mechanisms. In this paper, we argue for an extension of a domain-specific interpretation to these phenomena that is based on network-level analyses of functional coupling among brain regions. On this view, domain-specificity in one region of the brain emerges because of innate connectivity with a network of regions that also process information about that domain. Recent findings are reviewed that converge with this framework, and a new direction is outlined for understanding the neural principles that shape the organization of conceptual knowledge.
Category-specificity as a means to study constraints on brain organization
Brain-damaged patients with category-specific semantic impairments have conceptual level impairments that are specific to a category of items, such as animals, fruit/vegetables, nonliving things or conspecifics. Detailed analysis of those patients (Box 1) suggests that conceptual knowledge is organized according to domain-specific constraints [1,2]. According to the domain-specific hypothesis [2], there are innately dedicated neural circuits for the efficient processing of a limited number of evolutionarily motivated domains of knowledge. This interpretation of the neuropsychological phenomenon of category-specific semantic deficits has been extended to interpret results from functional magnetic resonance imaging (fMRI) in healthy subjects [3,4]. Much of the research using fMRI to study category-specificity has focused on the pattern of responses in the ventral visual pathway, which projects from early visual areas to lateral and ventral occipital–temporal regions, and processes object shape, texture and color in ways that are relatively invariant to viewpoint, size and orientation [5–7]. Different regions within the ventral pathway preferentially respond to images of faces, animals, tools, places, written words and body parts [4,6,8–13], see also [13–15].
The existence of consistent topographic biases by semantic category in the ventral stream raises fundamental questions about the principles that determine brain organization [4,10–12,16,17]. To date, the emphasis of research on the organization of the ventral stream has been on the stimulus properties that drive responses in a particular brain region, studied in relative isolation from other regions. This approach was inherited from well-established traditions in neurophysiology and psychophysics where it has been enormously productive for mapping psychophysical continua in primary sensory systems. It does not follow that the same approach will yield equally useful insights for understanding the principles of the neural organization of conceptual knowledge. The reason is that unlike the peripheral sensory systems, the pattern of neural responses in higher order areas is only partially driven by the physical input; it is also driven by how the stimulus is interpreted, and that interpretation does not occur in a single, isolated region. The ventral object processing stream is the central pathway for the extraction of object identity from visual information in the primate brain, but what the brain does with that information about object identity depends on how the ventral stream is connected to the rest of the brain.
Here, we focus on visual object recognition, as this has been the aspect of object knowledge and processing that has been studied in greatest depth; however, similar principles would be expected to apply to other modalities as appropriate. We argue that there are innately determined patterns of connectivity that mediate the integration of information from the ventral stream with information computed by other brain regions. Those channels are at the grain of a limited number of evolutionarily relevant domains of knowledge. We further suggest that what is given innately is the connectivity, and that specialization by semantic category in the ventral stream is driven by that connectivity. The implication of this proposal is that the organization of the ventral stream by category is relatively invariant to visually based, bottom-up, constraints. This approach corrects an imbalance in explanations of the causes of the consistent topography by semantic category in the ventral object-processing stream by giving greater prominence to endogenously determined constraints on brain organization.
Corresponding authors: Mahon, B.Z. (mahon@rcbi.rochester.edu); Caramazza, A. (caram@wjh.harvard.edu).
1364-6613/$ see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.tics.2011.01.004
The distributed domain-specific hypothesis
A domain-specific neural system is a network of brain regions [11] in which each region processes a different type of information about the same domain or category of objects [2,18]. The types of information processed by different parts of a network can be sensory, motor, affective or conceptual. The range of potential domains or classes of items that can have dedicated neural circuits is restricted to those with an evolutionarily relevant history that could have biased the system toward a coherent organization. A second important characteristic of domain-specific systems is that the computations that must be performed over items from the domain are sufficiently eccentric [19] so as to merit a specialized process. In other words, the coupling across different brain regions that is necessary for successful processing of a given domain is different in kind from the types of coupling that are needed for other domains of knowledge.
For instance, the need to integrate motor-relevant information with visual information is present for tools and other graspable objects and less so for animals or faces. By contrast, the need to integrate affective information, biological motion processing and visual form information is strong for conspecifics and animals, and less so for tools or places. Thus, our proposal is that domain-specific constraints are expressed as patterns of connectivity among regions of the ventral stream and other areas of the brain that process nonvisual information about the same classes of items. For instance, specialization for faces in the lateral fusiform gyrus (fusiform face area [20–22]) arises because that region of the brain has connectivity with the amygdala and the superior temporal sulcus (among other regions) which are important for the extraction of socially relevant information and biological motion. Specificity for tools and manipulable objects in the medial fusiform gyrus is driven, in part, by connectivity between that region and regions of parietal cortex that subserve object manipulation [23–26]. Connectivity-based constraints can also be responsible for other effects of category-specificity in the ventral visual stream, such as connectivity between somatomotor areas and regions of the ventral stream that differentially respond to body parts [27–29] (extrastriate body area), connectivity between left lateralized frontal language processing regions and ventral stream areas specialized for printed words (visual word form area [30,31]), and connectivity between regions involved in spatial analysis and ventral stream regions showing differential responses to highly contextualized stimuli, such as houses, scenes and large non-manipulable objects (parahippocampal place area [32]).
Box 1. Cognitive neuropsychological evidence for domain-specific constraints
Patients with category-specific semantic deficits can be differentially or even selectively impaired for knowledge of animals, plants, conspecifics or artifacts (for review see [11]). The knowledge impairment cannot be explained in terms of a differential impairment to a sensory or motor-based modality of information. Although discussion and debate continues as to whether non-categorical dimensions of organization can lead to category-specific brain organization, there is consensus that the phenomenon itself is categorical (see Figure I for representative patients' performance in picture naming and answering semantic probe questions).
There are important parallels between the neuropsychological literature on category-specific semantic deficits and the findings from functional neuroimaging and neurophysiology. First, the categories that emerge from the neuropsychological literature map onto the categories that emerge in functional imaging and neurophysiology. This indicates that the different methods and populations are tracking the same underlying property of brain organization. Second, the resistance of category-specific deficits to be explained by dimensions of organization that do not include semantic category [2] parallels the same pattern that has emerged in imaging and neurophysiology [60]. It is clearly the case that the brain is organized by sensory and motor modalities, and it is also the case that different sensory and motor modalities participate to varying extents in the representation of items from different categories. However, the existence of category-specificity in imaging [4], neurophysiology [67] and neuropsychology [11] cannot be explained exclusively by appeal to modality-based principles of organization. This suggests that the dimensions of brain organization that express themselves as phenomena of category-specificity (across methods and populations) are in fact domain-specific constraints on brain organization. Finally, there is emerging neuropsychological evidence for endogenous constraints on brain organization, including the existence of category-specific semantic deficits tested at age 16 years after stroke at 1 day of age [patient Adam, see below; ref 68].
There are also parallels between the patterns of category-specific semantic deficits and psychophysical studies of putatively specialized routes for processing specific classes of visual stimuli. For instance, New and colleagues [69], using a change detection paradigm, demonstrated a significant advantage for living animate stimuli. Thorpe and colleagues [70] have demonstrated extremely rapid and accurate detection of face and animal stimuli. Almeida and colleagues [65] have demonstrated that conceptual information about manipulable objects can be extracted from stimuli that are putatively not processed by the ventral visual pathway. These and other findings could indicate experimental ways of isolating domain-specific networks.
[Figure I: Category-specific semantic deficits. Two panels, each plotting percent correct (0 to 100) by patient. Panel "Picture naming performance by category", key: Living animate + nonliving, Fruit/vegetable + nonliving, Fruit/vegetable, Living animate, Nonliving, Conspecifics. Panel "Semantic probe questions by category and modality", key: Living: visual/perceptual, Living: nonvisual, Nonliving: visual/perceptual, Nonliving: nonvisual. Patient labels shown: RC, EW, RS, MD, KS, APA, CW, PL; EW, GR, FM, DB, RC, ADAM.]
Figure I. Representative patients with category-specific semantic deficits. Patients with category-specific semantic deficits may have selective impairments for naming items from one category of items compared to other categories (top panel). Those patients may also have categorical impairments for answering questions about all types of object properties (i.e., visual/perceptual and functional/associative; bottom panel). For further discussion and references to the patients shown here, see [11].
The role of visual experience
According to the distributed domain-specific hypothesis, the organization by category in the ventral stream is not only a reflection of the visual structure of the world, it also reflects the structure of how ventral visual cortex is connected to other regions of the brain [11,23,33]. However, visual experience and dimensions of visual similarity are also crucial in shaping the organization of the ventral stream [34,35]; after all, the principal afferents to the ventral stream come from earlier stages in the visual hierarchy [36].
Although some authors have recently discussed nonvisual dimensions that could be relevant in shaping the organization of the ventral stream [4,6,7], many accounts differentially weight the contribution of visual experience in their explanation of the causes of category-specific organization within the ventral stream. Several hypotheses have been developed, and we merely touch on them here to illustrate a common assumption: that the organization of the ventral stream reflects the visual structure of the world, as interpreted by domain-general processing constraints. Thus, the general thrust of those accounts is that the visual structure of the world is correlated with semantic category distinctions in a way that is captured by how visual information is organized in the brain. One of the most explicit proposals is that there are weak eccentricity preferences in higher order visual areas that are inherited from earlier stages in the processing stream. Those eccentricity biases interact with our experience of foveating some classes of items (e.g. faces) and viewing others in the relative periphery (e.g. houses) [37]. Another class of proposals is based on the suppositions that items from the same category tend to look more similar than items from different categories, and similarity in visual shape is mapped onto the ventral occipital–temporal cortex [17]. It has also been proposed that a given category could require differential processing relative to other categories, for instance in terms of expertise [38], visual crowding [39] or the relevance of visual information for categorization [40]. Other accounts appeal to feature similarity and distributed feature maps [41]. Finally, it has been suggested that multiple, visually based, dimensions of organization combine super-additively to generate the boundaries among category-preferring regions [12]. Common to all of these accounts is the assumption that visual experience provides the necessary structure, and that a visual dimension of organization happens to be highly correlated with semantic category.
Although visual information is important in shaping how the ventral stream is organized, recent findings indicate that visual experience is not necessary in order for the same, or similar, patterns of category-specificity to be present in the ventral stream. In an early positron emission tomography study, Büchel and colleagues [42] showed that congenitally blind subjects show activation for words (presented in Braille) in the same region of the ventral stream as sighted individuals (presented visually). Pietrini and colleagues [43] used multi-voxel pattern analysis to show that the pattern of activation over voxels in the ventral stream was more consistent across different exemplars within a category than exemplars across categories. More recently, we [44] have shown that the same medial-to-lateral bias in category preferences on the ventral surface of the occipital–temporal cortex that is present in sighted individuals is present in congenitally blind subjects. Specifically, nonliving things, compared to animals, elicit stronger activation in medial regions of the ventral stream (Figure 1).
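The logic of the pattern analysis described above (activation patterns being more consistent within a category than across categories) can be illustrated with a minimal sketch. It is not the analysis pipeline of [43], whose details are not given here. It assumes each exemplar is summarized as a vector of voxel responses and uses Pearson correlation as the consistency measure. The function names and the toy numbers are hypothetical and invented purely for illustration.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long voxel patterns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("pattern has zero variance")
    return cov / (sx * sy)


def within_between(patterns: Dict[str, List[Sequence[float]]]) -> Tuple[float, float]:
    """Mean pattern correlation for same-category and different-category pairs."""
    labelled = [(cat, p) for cat, ps in patterns.items() for p in ps]
    within: List[float] = []
    between: List[float] = []
    for (c1, p1), (c2, p2) in combinations(labelled, 2):
        (within if c1 == c2 else between).append(pearson(p1, p2))
    return sum(within) / len(within), sum(between) / len(between)


# Invented toy data: two exemplars per category, four "voxels" each.
TOY_PATTERNS = {
    "faces": [[1.0, 0.9, 0.1, 0.2], [0.9, 1.0, 0.2, 0.1]],
    "tools": [[0.1, 0.2, 1.0, 0.8], [0.2, 0.1, 0.9, 1.0]],
}

if __name__ == "__main__":
    print(within_between(TOY_PATTERNS))
```

A within-category mean that exceeds the between-category mean is the signature of category information in the distributed pattern. A real analysis would also need cross-validation across independent scanning runs, which this sketch omits.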
Although these studies on category-specificity in blind individuals represent only a first-pass analysis of the role of visual experience in driving category-specificity in the ventral stream, they indicate that visual experience is not necessary in order for category-specificity to emerge in the ventral stream. This fact raises an important question: if visual experience is not needed for the same topographical biases in category-specificity to be present in the ventral stream, then what drives such organization? One possibility, as we have suggested, is innate connectivity between regions of the ventral stream and other regions of the brain that process affective, motor and conceptual information.
Connectivity as an innate domain-specific constraint
A crucial component of the distributed domain-specific hypothesis is the notion of connectivity. The most obvious candidate to mediate such networks is white matter connectivity. However, it is important to underline that functional networks need not be restricted by the grain of white matter connectivity and, perhaps more importantly, task- and state-dependent changes could bias processing toward different components of a broader anatomical brain network. For instance, connectivity between lateral and orbital prefrontal regions and the ventral occipital–temporal cortex [45,46] is crucial for categorization of visual input. It remains an open question whether multiple functional networks are subserved by this circuit, each determined by the type of visual stimulus being categorized. For instance, when categorizing manipulable objects, connectivity between parietofrontal somatomotor areas and prefrontal cortex could dominate, whereas when categorizing faces other regions could express stronger functional coupling to those same prefrontal regions. Such a suggestion would generate the expectation that whereas damaging prefrontal-to-ventral stream connections could result in difficulties categorizing all types of visual stimuli, disruption of the afferents to the prefrontal cortex from a specific category-preferring area could lead to categorization problems selective to that domain. The neural basis of the connectivity that supports domain-specific neural systems is, admittedly, in need of further development and articulation. Below, we will return to expectations that can be drawn from this explanation.
Evidence for innate constraints
The signature of innate structure is similarity across
individuals, both within a species and potentially across

Sighted:
auditory task
Congenitally blind:
auditory task
Sighted:
picture viewing
Sighted: picture viewing
Sighted: auditory task
Congenitally blind: auditory task
0
-0.5
-1
-1.5
-2
-2.5
Right ventral ROI
Tal. Coord. X Dim
1
0.5
0
-0.5
-1
-1.5
Right ventral ROI
Tal. Coord. X Dim
2
1
0
-1
-2
-3
-4
24 26 28 30 32 34 36 38 40
24 26 28 30 32 34 36 38 40
24 26 28 30 32 34 36 38 40
Right ventral ROI
Tal. Coord. X Dim
0
-0.5
-1
-1.5
-2
-2.5
Left ventral ROI
Tal. Coord. X Dim
t

V
a
l
u
e
s

(
L
i
v
i
n
g

-

N
o
n
l
i
v
i
n
g
)
1
0.5
0
-0.5
-1
-1.5
Left ventral ROI
Tal. Coord. X Dim
t

V
a
l
u
e
s

(
L
i
v
i
n
g

-

N
o
n
l
i
v
i
n
g
)
-40 -38 -36 -34 -32 -30 -28 -26 -24
-40 -38 -36 -34 -32 -30 -28 -26 -24
-40 -38 -36 -34 -32 -30 -28 -26 -24
2
1
0
-1
-2
-3
-4
Left ventral ROI
Tal. Coord. X Dim
t

V
a
l
u
e
s

(
L
i
v
i
n
g

-

N
o
n
l
i
v
i
n
g
)
Category-specic organization does not require visual experience
TRENDS in Cognitive Sciences
Figure 1. Congenitally blind and sighted participants were presented with auditorily spoken words of living things (animals) and nonliving things (tools, non-manipulable
objects) and were asked to make size judgments about the referents of the words. The sighted participants were also shown pictures corresponding to the same stimuli in a
separate scan. For sighted participants viewing pictures, the known finding was replicated that nonliving things such as tools and large non-manipulable objects lead to
differential neural responses in medial aspects of the ventral occipitaltemporal cortex. This pattern of differential BOLD responses for nonliving things in medial aspects of
the ventral occipitaltemporal cortex was also observed in congenitally blind participants and sighted participants performing the size judgment task over auditory stimuli.
These data indicate that the medial-to-lateral bias in the distribution of category-specific responses does not depend on visual experience. For details of the study, see [44].
Opinion
Trends in Cognitive Sciences March 2011, Vol. 15, No. 3
100
species. Innate does not imply present-from-birth, al-
though present-from-birth strongly suggests an innate
contribution. Maturation in the context of the right types
of experience could be necessary for the expression of
innate structure, and interactions between innate and
experiential factors can jointly constrain outcome [47].
This is particularly the case for mental processes, as there
would be nothing to process without the content provided
by experience. Several lines of evidence show that genetic
variables capture similarity in functional brain organization
as it relates to the presence of domain-specific neural
circuits.
Twin studies
Two recent reports highlight greater neural or functional
similarity between monozygotic twin pairs than between
dizygotic twin pairs (for discussion see [48,49]). The
strength of these studies is that experiential contributions
are held constant across the two types of twin pairs. In an
fMRI study, Polk and colleagues [50] studied the similarity
between twin pairs in the distribution of responses to faces,
houses, pseudowords and chairs in the ventral stream. The
authors found that face and place-related responses within
face and place selective regions, respectively, were significantly
more similar for monozygotic than for dizygotic
twins. In another study, Wilmer and colleagues [51] stud-
ied the face recognition and memory abilities [52] in mono-
zygotic and dizygotic twin pairs. The authors found that
the correlation in performance on the face recognition task
for monozygotic twins was more than double that for
dizygotic twins. This difference was not present for control
tasks of verbal and visual memory, indicating selectivity in
the genetic contribution to behavioral abilities (see also
[53]).
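The twin comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the scores are made-up numbers, not data from [50] or [51], the function names are hypothetical, and Falconer's formula (h^2 = 2 * (r_MZ - r_DZ)) is a standard rough heritability estimate that is brought in here for illustration and is not taken from the studies cited.

```python
import numpy as np

def pair_correlation(twin_a, twin_b):
    """Pearson correlation between the scores of the two members of each twin pair."""
    return float(np.corrcoef(twin_a, twin_b)[0, 1])

def falconer_h2(r_mz, r_dz):
    """Falconer's rough heritability estimate (an assumption, not from the source)."""
    return 2.0 * (r_mz - r_dz)

# Hypothetical face-recognition scores, one entry per twin pair.
mz_a = np.array([60, 72, 55, 80, 66, 49])
mz_b = np.array([62, 70, 57, 78, 65, 52])
dz_a = np.array([60, 72, 55, 80, 66, 49])
dz_b = np.array([70, 61, 58, 69, 75, 50])

r_mz = pair_correlation(mz_a, mz_b)  # similarity within monozygotic pairs
r_dz = pair_correlation(dz_a, dz_b)  # similarity within dizygotic pairs
h2 = falconer_h2(r_mz, r_dz)
```

A larger correlation for monozygotic than for dizygotic pairs on the face task, with no such gap on control tasks, is the pattern the text describes as a selective genetic contribution.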
Congenital prosopagnosia
Further evidence for a genetic contribution to face recog-
nition abilities comes from congenital prosopagnosia, a
developmental disorder in which individuals can have
selective impairments for recognizing faces [54]. A recent
study by Thomas and colleagues [55] found that congenital
prosopagnosia was associated with reduced structural in-
tegrity of the inferior longitudinal fasciculus, which pro-
jects from the fusiform gyrus to anterior regions of the
temporal lobe. Reduced structural integrity was also observed
for the inferior fronto-occipital fasciculus, which
projects from the ventral occipital-temporal cortex to frontal
regions. Such observations of reduced integrity of major
white matter tracts linking the posterior occipital-temporal
cortex with other brain regions underline the strength
of a network-level analysis in understanding the con-
straints that shape the organization of knowledge in the
ventral stream.
Non-human primates
An expectation on the view that innate constraints shape
category-specificity in the ventral stream is that such
specificity, at least for some categories, can also be found
in non-human primates. It is well known, using neuro-
physiological recordings, that preferences for natural ob-
ject stimuli exist in the inferior temporal (IT) cortex of
monkeys [35,56], comparable to observations with similar
methods in awake human subjects [15]. More recently,
functional imaging with macaques [57] and chimpanzees
[58] suggests that at least for the category of faces, compa-
rable clusters of face preferring voxels can be found in the
temporal cortex in monkeys, as are observed in humans.
Such common patterns of neural organization for some
classes of items in monkeys and humans could, of course,
be entirely driven by dimensions of visual similarity, which
are known to modulate responses in the IT cortex [59].
However, even when serious attempts have been made to
explain such responses in terms of dimensions of visual
similarity, taxonomic structure emerges over and above
the contribution of known visual dimensions. For instance,
Kriegeskorte and colleagues [60] used multi-voxel pattern
analysis to compare the similarity structure of a large
array of different body, face, animal, plant and artifact
stimuli in the monkey IT cortex and human occipital-temporal
cortex. The similarity among the stimuli was
measured in terms of the similarity of the patterns of brain
responses they elicited, separately on the basis of the
neurophysiological data (monkeys) [56] and fMRI data
(humans). The similarity structure that emerged revealed
a tight taxonomic structure common to monkeys and
humans, and which could not be reduced to known dimen-
sions of visual similarity.
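The logic of comparing similarity structure across species can be sketched as follows. This is a minimal illustration, not the analysis pipeline of [60]: the data are simulated, the dissimilarity measure (1 minus Pearson correlation) and the rank-based comparison are assumptions chosen for simplicity, and all function and variable names are hypothetical.

```python
import numpy as np

def rdm(responses):
    """Dissimilarity matrix: 1 - Pearson r between stimulus response patterns.

    responses has shape (n_stimuli, n_channels); channels may be neurons or voxels.
    """
    return 1.0 - np.corrcoef(responses)

def upper_triangle(matrix):
    """Off-diagonal entries above the diagonal, one value per stimulus pair."""
    i, j = np.triu_indices(matrix.shape[0], k=1)
    return matrix[i, j]

def rank(values):
    """Ranks 0..n-1 (no tie handling; adequate for continuous simulated data)."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks

def rdm_similarity(rdm_a, rdm_b):
    """Rank correlation between the pairwise dissimilarities of two systems."""
    return float(np.corrcoef(rank(upper_triangle(rdm_a)),
                             rank(upper_triangle(rdm_b)))[0, 1])

# Simulated data: 6 stimuli from 2 categories, measured with different
# numbers of channels in each species (40 "neurons", 100 "voxels").
rng = np.random.default_rng(0)
category = np.array([0, 0, 0, 1, 1, 1])
monkey = rng.normal(size=(2, 40))[category] + 0.3 * rng.normal(size=(6, 40))
human = rng.normal(size=(2, 100))[category] + 0.3 * rng.normal(size=(6, 100))

similarity = rdm_similarity(rdm(monkey), rdm(human))
```

Because the comparison is made between dissimilarity matrices rather than raw responses, the two data sets do not need to share channels or measurement units, which is what allows neurophysiological and fMRI data to be related.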
Next steps
Specialization of function in the brain is clearest at the
level of primary sensory and motor areas that have a
physical organization in the brain that projects topograph-
ically onto a psychophysical dimension such as retinotopy,
tonotopy or somatotopy. At the other end of the continuum,
there are aspects of human cognition that have eluded neat
parcellation in the brain, such as the neural instantiation
of the abstract and recursive systems that make human
thought and metacognition possible. Somewhere in the
middle are conceptual representations: they interface
with and draw on the sensory and motor systems and at
the same time require the flexibility characteristic of symbolic
representations [61]. We have outlined a framework
for understanding the causes of category-specific organization
in the brain that is based on the hypothesis that there
are innate patterns of connectivity that constrain the
distribution of category-specific neural regions. This proposal
fully embraces a hierarchical view of the organization
of conceptual knowledge [3]: the organization of the ventral
stream reflects the final product of a complex tradeoff of
pressures, some of which are expressed locally within the
ventral stream and some of which are expressed as connectivity
to the rest of the brain. Our suggestion is that
connectivity to the rest of the brain is the first, or broadest,
principle according to which the ventral stream comes to be
organized by semantic category.
Although there is striking overlap in the semantic cate-
gories that can dissociate under conditions of brain damage
and which show consistent topographic organization in the
ventral stream (Box 1), there is some divergence between
the lesion locations in patients with category-specific deficits
and the patterns of neural activation observed with
fMRI. In particular, focal lesions to category-preferring
regions within the ventral stream do not invariably lead to
category-specific semantic deficits. This suggests that what
is damaged in patients with category-specific semantic
deficits are the broader neural circuits that are specialized
for the impaired domain of knowledge. Damage to multiple
regions within that domain-specific neural circuit could
lead to a category-specific deficit by disrupting or disorganizing
the broader network. Furthermore, damage to
regions that serve to integrate processing across the whole
domain, such as the anterior temporal lobes [62,63] for the
domains of animals and conspecifics, could particularly
disrupt functioning throughout the broader network.
A second direction for research that is encouraged by the
distributed domain-specific hypothesis is to characterize
the patterns of both anatomical and functional connectivity
within domain-specific neural circuits. The expectation is
that there will be a tight coupling between patterns of
connectivity and the locations of category-preferring
regions. In this regard, it is important to note that regions
expressing connectivity with category-specific regions
within the ventral stream are not necessarily downstream
from visual object recognition, and do not necessarily
represent more developed or more processed information
than what is computed in the ventral stream. Stimuli are
processed through multiple routes in parallel, such as
subcortical processing of emotional face stimuli [20,21]
and dorsal stream processing of manipulable objects
[64,65]. Thus, one exciting possibility is that fast but coarse
analysis of the visual input that bypasses the geniculate-striate
pathway could cue or bias processing within the
ventral stream according to the content of the stimulus to
be processed [45], analogous to attentional modulation of
early visual responses.
A third way in which the distributed domain-specific
hypothesis can be tested is to explore the connectivity of all
the categories that show selective responses in the ventral
stream. For instance, an expectation that could be gener-
ated is that stimuli from different domains, such as hands
and tools, can live next to each other in the ventral stream
because both would be predicted to have connectivity to the
somatomotor cortex. In other words, the way in which
representations are organized in the ventral stream should
follow patterns of connectivity, such that they are orga-
nized according to similarity metrics represented in other
parts of the brain, rather than (only) by dimensions of
visual similarity.
Perhaps the most pressing issue that must be addressed
by the distributed domain-specific hypothesis is whether
connectivity drives specialization by category, as we have
proposed, or whether specialization of function is present
independently of connectivity, and the connectivity emerges
later. One way to empirically address this is to test individuals
who have been blind since birth. Sensory deprivation will
remove the influence of local constraints, presumably
expressed over short-range bottom-up connections from
earlier visual regions, but would not be expected to funda-
mentally alter the longer range connections. Combining
detailed analysis of connectivity in such individuals with
analysis of the location of category-preferring regions in the
ventral stream could ground inferences about whether connectivity
in fact drives the location of category preferences in
the ventral stream. In particular, the regions specialized for
printed words could offer a means to test this issue, as there
is no motivation for presuming specialization of function to
be innately present for printed words in the human brain.
Because there are regions that are consistently specialized
for printed words, the expectation would be that this spe-
cialization is driven by connectivity between the ventral
stream and regions of the brain involved in linguistic pro-
cessing. The prediction can be made that subject-by-subject
variation in the location of the visual word form area (tested
with Braille) in congenitally blind individuals will match up
with subject-by-subject variation in connectivity between
that region of the ventral stream and other language processing
regions of the brain.
The core of our proposal, that specialization in a region
of the brain is driven, in part, by constraints on how that
information will ultimately be used in the service of be-
havior, is not new. It is well established that visual proces-
sing bifurcates into a dorsal stream for object-directed
action and spatial processing and a ventral stream for
the extraction of object identity [66]. The two-visual-system
model places important restrictions on plasticity of function
within the visual system. Analogously, the distributed
domain-specific hypothesis places new limits on plasticity
of function within the ventral object processing stream,
and suggests that the key to describing those limits lies in
the patterns of connectivity between the ventral stream
and other category-specific brain regions.
References
1 Capitani, E. et al. (2003) What are the facts of category-specific deficits? A critical review of the clinical evidence. Cogn. Neuropsychol. 20, 213–261
2 Caramazza, A. and Shelton, J.R. (1998) Domain specific knowledge systems in the brain: the animate-inanimate distinction. J. Cogn. Neurosci. 10, 1–34
3 Caramazza, A. and Mahon, B.Z. (2003) The organization of conceptual knowledge: the evidence from category-specific semantic deficits. Trends Cogn. Sci. 7, 354–361
4 Martin, A. (2007) The representation of object concepts in the brain. Annu. Rev. Psychol. 58, 25–45
5 Miceli, G. et al. (2001) The dissociation of color from form and function knowledge. Nat. Neurosci. 4, 662–667
6 Grill-Spector, K. and Malach, R. (2004) The human visual cortex. Annu. Rev. Neurosci. 27, 649–677
7 Cant, J.S. et al. (2009) fMR-adaptation reveals separate processing regions for the perception of form and texture in the human ventral stream. Exp. Brain Res. 192, 391–405
8 Allison, T. et al. (1994) Human extrastriate visual cortex and the perception of faces, words, numbers, and colors. Cereb. Cortex 4, 544–554
9 Chao, L.L. et al. (1999) Attribute-based neural substrates in posterior temporal cortex for perceiving and knowing about objects. Nat. Neurosci. 2, 913–919
10 Kanwisher, N. (2000) Domain specificity in face perception. Nat. Neurosci. 3, 759–763
11 Mahon, B.Z. and Caramazza, A. (2009) Concepts and categories: a cognitive neuropsychological perspective. Annu. Rev. Psychol. 60, 1–15
12 Op de Beeck, H.P. et al. (2008) Interpreting fMRI data: maps, modules and dimensions. Nat. Rev. Neurosci. 9, 123–135
13 Pitcher, D. et al. (2009) Triple dissociation of faces, bodies, and objects in extrastriate cortex. Curr. Biol. 19, 319–324
14 Bentin, S. et al. (1996) Electrophysiological studies of face perception in humans. J. Cogn. Neurosci. 8, 551–565
15 Kreiman, G. et al. (2000) Category-specific visual responses of single neurons in the human medial temporal lobe. Nat. Neurosci. 3, 946–953
16 Cantlon, J.F. et al. (2011) Cortical representations of symbols, objects, and faces are pruned back during early childhood. Cereb. Cortex 21, 191–199
17 Haxby, J.V. et al. (2001) Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science 293, 2425–2430
18 Carey, S. and Spelke, E. (1994) Domain specific knowledge and conceptual change. In Mapping the Mind: Domain Specificity in Cognition and Culture (Hirschfeld, L. and Gelman, S.A., eds), pp. 169–200, Cambridge University Press
19 Fodor, J. (1983) Modularity of Mind, MIT Press
20 Pasley, B.N. et al. (2004) Subcortical discrimination of unperceived objects during binocular rivalry. Neuron 42, 163–172
21 Vuilleumier, P. et al. (2004) Distant influences of amygdala lesion on visual cortical activation during emotional face processing. Nat. Neurosci. 7, 1271–1278
22 Martin, A. and Weisberg, J. (2003) Neural foundations for understanding social and mechanical concepts. Cogn. Neuropsychol. 20, 575–587
23 Mahon, B.Z. et al. (2007) Action-related properties shape object representations in the ventral stream. Neuron 55, 507–520
24 Valyear, K.F. and Culham, J.C. (2010) Observing learned object-specific functional grasps preferentially activates the ventral stream. J. Cogn. Neurosci. 22, 970–984
25 Noppeney, U. et al. (2006) Two distinct neural mechanisms for category-selective responses. Cereb. Cortex 16, 437–445
26 Rushworth, M.F.S. et al. (2006) Connection patterns distinguish 3 regions of human parietal cortex. Cereb. Cortex 16, 1418–1430
27 Astafiev, S.V. et al. (2004) Extrastriate body area in human occipital cortex responds to the performance of motor actions. Nat. Neurosci. 7, 542–548
28 Orlov, T. et al. (2010) Topographic representation of the human body in the occipitotemporal cortex. Neuron 68, 586–600
29 Peelen, M.V. and Caramazza, A. (2010) What body parts reveal about the organization of the brain. Neuron 68, 331–333
30 Dehaene, S. et al. (2005) The neural code for written words: a proposal. Trends Cogn. Sci. 9, 335–341
31 Martin, A. (2006) Shades of Dejerine – forging a causal link between the visual word form area and reading. Neuron 50, 173–190
32 Bar, M. and Aminoff, E. (2003) Cortical analysis of visual context. Neuron 38, 347–358
33 Riesenhuber, M. (2007) Appearance isn't everything: news on object representation in cortex. Neuron 55, 341–344
34 Op de Beeck, H.P. et al. (2006) Discrimination training alters object representations in human extrastriate cortex. J. Neurosci. 26, 13025–13036
35 Tanaka, K. et al. (1991) Coding visual images of objects in the inferotemporal cortex of the macaque monkey. J. Neurophysiol. 66, 170–189
36 Felleman, D.J. and Van Essen, D.C. (1991) Distributed hierarchical processing in primate visual cortex. Cereb. Cortex 1, 1–47
37 Levy, I. et al. (2001) Center-periphery organization of human object areas. Nat. Neurosci. 4, 533–539
38 Gauthier, I. et al. (1999) Activation of the middle fusiform face area increases with expertise in recognizing novel objects. Nat. Neurosci. 2, 568–573
39 Rogers, T.T. et al. (2005) Fusiform activation to animals is driven by the process, not the stimulus. J. Cogn. Neurosci. 17, 434–445
40 Mechelli, A. et al. (2006) Semantic relevance explains category effects in medial fusiform gyri. Neuroimage 3, 992–1002
41 Tyler, L.K. et al. (2003) Do semantic categories activate distinct cortical regions? Evidence for a distributed neural semantic system. Cogn. Neuropsychol. 20, 541–559
42 Büchel, C. et al. (1998) A multimodal language region in the ventral visual pathway. Nature 394, 274–277
43 Pietrini, P. et al. (2004) Beyond sensory images: object-based representation in the human ventral pathway. Proc. Natl. Acad. Sci. U.S.A. 101, 5658–5663
44 Mahon, B.Z. et al. (2009) Category-specific organization in the human brain does not require visual experience. Neuron 63, 397–405
45 Kveraga, K. et al. (2007) Magnocellular projections as the trigger of top-down facilitation in recognition. J. Neurosci. 27, 13232–13240
46 Miller, E.K. et al. (2003) Neural correlates of categories and concepts. Curr. Opin. Neurobiol. 13, 198–203
47 Lewontin, R. (2000) The Triple Helix: Genes, Organisms, and Environment, Harvard University Press
48 Park, J. et al. (2009) Face processing: the interplay of nature and nurture. Neuroscientist 15, 445–449
49 Zhu, Q. et al. (2010) Heritability of the specific cognitive ability of face perception. Curr. Biol. 20, 137–142
50 Polk, T.A. et al. (2007) Nature versus nurture in ventral visual cortex: a functional magnetic resonance imaging study of twins. J. Neurosci. 27, 13921–13925
51 Wilmer, J. et al. (2010) Human face recognition ability is specific and highly heritable. Proc. Natl. Acad. Sci. U.S.A. 107, 5238–5241
52 Duchaine, B. and Nakayama, K. (2006) The Cambridge Face Memory Test: results for neurologically intact individuals and an investigation of its validity using inverted face stimuli and prosopagnosic subjects. Neuropsychologia 44, 576–585
53 Zhu, Q. et al. (2010) Heritability of the specific cognitive ability of face perception. Curr. Biol. 20, 1–6
54 Duchaine, B.C. et al. (2006) Prosopagnosia as an impairment to face-specific mechanisms: elimination of the alternative hypotheses in a developmental case. Cogn. Neuropsychol. 23, 714–747
55 Thomas, C. et al. (2009) Reduced structural connectivity in ventral visual cortex in congenital prosopagnosia. Nat. Neurosci. 12, 29–31
56 Kiani, R. et al. (2007) Object category structure in response patterns of neuronal population in monkey inferior temporal cortex. J. Neurophysiol. 97, 4296–4309
57 Tsao, D.Y. et al. (2006) A cortical region consisting entirely of face-selective cells. Science 311, 670–674
58 Parr, L.A. et al. (2009) Face processing in the chimpanzee brain. Curr. Biol. 19, 50–53
59 Op de Beeck, H. et al. (2001) Inferotemporal neurons represent low-dimensional configurations of parameterized shapes. Nat. Neurosci. 4, 1244–1252
60 Kriegeskorte, N. et al. (2008) Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron 60, 1126–1141
61 Mahon, B.Z. and Caramazza, A. (2008) A critical look at the embodied cognition hypothesis and a new proposal for grounding conceptual content. J. Physiol. Paris 102, 59–70
62 Damasio, H. et al. (2004) Neural systems behind word and concept retrieval. Cognition 92, 179–229
63 Patterson, K. et al. (2007) Where do you know what you know? The representation of semantic knowledge in the human brain? Nat. Rev. Neurosci. 8, 976–987
64 Fang, F. and He, S. (2005) Cortical responses to invisible objects in the human dorsal and ventral pathways. Nat. Neurosci. 8, 1380–1385
65 Almeida, J. et al. (2008) Unconscious processing dissociates along categorical lines. Proc. Natl. Acad. Sci. U.S.A. 105, 15214–15218
66 Goodale, M.A. and Milner, A.D. (1992) Separate visual pathways for perception and action. Trends Neurosci. 15, 20–25
67 Kriegeskorte, N. et al. (2008) Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron 60, 1126–1141
68 Farah, M.J. and Rabinowitz, C. (2003) Genetic and environmental influences on the organization of semantic memory in the brain: Is 'living things' an innate category? Cogn. Neuropsychol. 20, 401–408
69 New, J. et al. (2007) Category-specific attention for animals reflects ancestral priorities, not expertise. Proc. Natl. Acad. Sci. U.S.A. 104, 16598–16603
70 Thorpe, S. et al. (1996) Speed of processing in the human visual system. Nature 381, 520–522
Specifying the self for cognitive neuroscience
Kalina Christoff¹, Diego Cosmelli², Dorothée Legrand³ and Evan Thompson⁴
¹ Department of Psychology, University of British Columbia, 2136 West Mall, Vancouver, BC, V6T 1Z4 Canada
² Escuela de Psicología, Pontificia Universidad Católica de Chile, Av. Vicuña Mackenna 4860, Macul, Santiago, Chile
³ Centre de Recherche en Épistémologie Appliquée (CREA), ENSTA-32, boulevard Victor, 75015 Paris, cedex 15, France
⁴ Department of Philosophy, University of Toronto, 170 St George Street, Toronto, ON, M5R 2M8 Canada
Cognitive neuroscience investigations of self-experience
have mainly focused on the mental attribution of fea-
tures to the self (self-related processing). In this paper,
we highlight another fundamental, yet neglected, aspect
of self-experience, that of being an agent. We propose
that this aspect of self-experience depends on self-spec-
ifying processes, ones that implicitly specify the self by
implementing a functional self/non-self distinction in
perception, action, cognition and emotion. We describe
two paradigmatic cases, sensorimotor integration and
homeostatic regulation, and use the principles from
these cases to show how cognitive control, including
emotion regulation, is also self-specifying. We argue that
externally directed, attention-demanding tasks, rather
than suppressing self-experience, give rise to the self-experience
of being a cognitive-affective agent. We conclude
with directions for experimental work based on
our framework.
Investigating self-experience in cognitive neuroscience
How does the embodied brain give rise to self-experience?
This question, long addressed by neurology [1] and neuro-
physiology [2], now attracts strong interest from cognitive
neuroscience and the neuroimaging community [3–6].
Recent neuroimaging studies have investigated self-
experience mainly by employing paradigms that contrast
self-related with non-self-related stimuli and tasks. Such
paradigms aim to reveal the cerebral correlates of self-
related processing (see Glossary). Recent reviews identify
several brain regions that appear most consistently activated
in self-related paradigms such as assessing one's
personality, physical appearance or feelings; recognizing
one's face; or detecting one's first name (see [4,6] for
extensive reviews). The medial prefrontal cortex (mPFC)
and the precuneus/posterior cingulate cortex (Precuneus/
PCC) are the most frequently discussed [4–10], but two
additional regions, the temporoparietal junction (TPJ) and
temporal pole, are also consistently activated [6].
Although these studies have contributed valuable infor-
mation about the neural correlates of self-related proces-
sing, two issues have recently arisen [3,6]. First,
the identified regions, especially the midline regions
(mPFC, Precuneus/PCC) often associated with self-related
processing [4,7–10], might not be self-specific, because they
are also recruited for a wide range of other cognitive
processes: recall of information from memory, inferential
reasoning, and representing others' mental states [3,5,6].
In addition, the PCC appears to be engaged in attentional
processes and might be a hub for attention and motivation
[11,12], whereas the TPJ is important for attentional
reorienting [13]. Hence, describing these regions (singly
or collectively) as self-specific could be unwarranted [3,5,6].
Second, studies employing self-related processing ap-
proach self-experience through the self-attribution of men-
tal and physical features, and thereby focus on the self as
an object of attribution and not the self as the knowing
subject and agent. To invoke James' [14] classic distinction,
this paradigm targets the 'Me' (the self as known through
its physical and mental attributes) and not the 'I' (the self
as subjective knower and agent). Thus, relying exclusively
on this paradigm would limit the cognitive neuroscience of
self-experience to self-related processing (the 'Me'), to the
neglect of the self-experience of being a knower and agent
(the 'I') [6,15].
In this paper, we focus on the 'I' (experiencing oneself as
the agent of perception, action, cognition and emotion) and
Glossary
Cognitive control: the process by which one focuses and sustains attention on
task-relevant information and selects task-relevant behavior.
Emotion regulation: the process by which one influences one's experience and
expression of emotion.
Homeostatic regulation: the process of keeping vital organismic parameters
within a given dynamical range despite external or internal perturbations.
'I' versus 'Me': experiencing oneself as subjective knower and agent versus
experiencing oneself as an object of perception or self-attribution.
Self-related processing: processing requiring one to evaluate or judge some
feature in relation to one's perceptual image or mental concept of oneself.
Self-specific: a component or feature that is exclusive (characterizes oneself
and no one else) and noncontingent (changing or losing it entails changing or
losing the distinction between self and non-self).
Self-specifying: any process that specifies the self as subject and agent by
implementing a functional self/non-self distinction.
Sensorimotor integration: the mechanisms by which sensory information is
processed to guide motor acts, and by which motor acts are guided to facilitate
sensory processing.
Task-negative/default-network brain regions: regions exhibiting sustained
functional activity during rest but showing consistent deactivations during
externally directed, attention-demanding tasks. Such regions include the
precuneus/posterior cingulate cortex, medial prefrontal cortex and bilateral
temporoparietal junction.
Task-positive brain regions: regions consistently activated during externally
directed, attention-demanding tasks. Such regions include the intraparietal
sulcus, frontal eye field, middle temporal area, lateral prefrontal cortex and
dorsal anterior cingulate.
Corresponding author: Thompson, E. (evan.thompson@utoronto.ca).
1364-6613/$ – see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.tics.2011.01.001 Trends in Cognitive Sciences, March 2011, Vol. 15, No. 3
we propose a theoretical framework that links this type of
self-experience to a wide range of neuroscientific findings
at different levels of neural functioning.
According to our proposal, experiencing oneself as an
agent depends on the existence of specific types of dynamic
interactive processes between the organism and its environment.
We call these processes self-specifying because
they implement a functional self/non-self distinction that
implicitly specifies the self as subject and agent [6,16]. To
illustrate the basic principles of self-specifying processes,
we describe two paradigmatic examples, sensorimotor
integration and homeostatic regulation, that underlie
the self-experience of being a bodily agent. We then argue
that although externally directed attention-demanding
tasks can compromise self-related processing [7–10,17–19],
such tasks can be expected to enhance another fundamental
type of self-experience, namely that of being a
cognitive-affective agent [6,15,16]. In support of this point,
and to show how cognitive neuroscience can begin to model
this type of self-experience, we apply the concept of self-
specifying processes to cognitive control, including emotion
regulation. We conclude with suggestions for future exper-
imental work based on our framework.
Self-experience as arising from self-specifying
processes
Many neuroimaging studies have focused on the type of
self-experience that occurs when a person directs his or her
attention away from the external world (e.g. when task
demands are low, when performing a self-reflective task or
during rest) [7–10,17] (Figure 1a). At the same time, other
lines of investigation concerned with embodied experience
have examined self-experience during world-directed per-
ception and action [1,20,21] (Figure 1b). These investiga-
tions have focused on bodily awareness in sensorimotor
integration [20,21] and homeostatic regulation [1,22,23].
Central to this approach is the notion that the organism
constantly integrates efferent and afferent signals in a way
that distinguishes fundamentally between reafference
(afferent signals arising as a result of the organism's own
efferent processes; self) and exafference (afferent signals
arising as a result of environmental events; non-self). By
implementing this functional self/non-self distinction,
efferent-afferent integration implicitly specifies the self as a
bodily agent [6,16,21].
Sensorimotor integration
The notion of self-specifying processes is easiest to illus-
trate through the systematic linkage of sensory and motor
processes in the perception-action cycle (Box 1). An organism
needs to be able to distinguish between sensory
changes arising from its own motor actions (self) and
sensory changes arising from the environment (non-self).
The central nervous system (CNS) distinguishes the two by
systematically relating the efferent signals (motor commands)
for the production of an action (e.g. eye, head or
hand movements) to the afferent (sensory) signals arising
from the execution of that action (e.g. the flow of visual or
haptic sensory feedback). According to various models
going back to Von Holst [24], the basic mechanism of this
integration is a comparator that compares a copy of the
motor command (information about the action executed)
with the sensory reafference (information about the senso-
ry modications owing to the action) [25]. Through such a
mechanism, the organism can register that it has executed
a given movement, and it can use this information to

Figure 1. Two types of self-experience. (a) The 'Me' or self-related processing (here depicted as self-recognition and reflective thinking about oneself). Its neural substrates
are thought to be restricted to a subset of midline cortical regions (mPFC and Precuneus/PCC). It is also thought to compete for cognitive resources when some aspect of the
world demands attention. (b) The 'I' as embodied agent. This type of self-experience arises from the integration of efferent and reafferent processes, notably sensorimotor
integration (green loop) and homeostatic regulation (red loop), as well as possible higher level efferent-reafferent regulatory loops such as the one instantiated by cognitive
control processes (blue loop). Such regulatory loops implement a functional self/non-self distinction that implicitly specifies the self as agent. This type of self-experience
implicitly occurs during attention-demanding interactions with the environment (black arrows).
process the resulting sensory reafference. The crucial point
for our purposes is that reafference is self-specific, because
it is intrinsically related to the agent's own action (there is
no such thing as a non-self-specific reafference). Thus, by
relating efferent signals to their afferent consequences, the
CNS marks the difference between self-specific (reafferent)
and non-self-specific (exafferent) information in the perception-action
cycle. In this way, the CNS implements a
functional self/non-self distinction that implicitly specifies
the self as the perceiving subject and agent.
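The comparator idea can be made concrete with a toy sketch. It is an illustration only: the linear forward model, the additive mixing of self-generated and external signals, and all names are assumptions introduced here, not a model taken from [24] or [25].

```python
import numpy as np

def forward_model(motor_command, gain=1.0):
    """Predict the sensory consequence (reafference) of a motor command.

    A fixed linear gain is a placeholder assumption.
    """
    return gain * np.asarray(motor_command, dtype=float)

def comparator(motor_command, afferent_signal, gain=1.0):
    """Compare a copy of the motor command with the incoming sensory signal.

    Returns the predicted reafference (self) and the residual that the
    prediction does not account for, treated here as exafference (non-self).
    """
    predicted_reafference = forward_model(motor_command, gain)
    exafference = np.asarray(afferent_signal, dtype=float) - predicted_reafference
    return predicted_reafference, exafference

# Hypothetical time series: the agent moves, and the world also changes once.
motor_command = np.array([0.0, 1.0, 2.0, 1.0])
external_change = np.array([0.0, 0.0, 0.5, 0.0])
afferent_signal = forward_model(motor_command) + external_change

reafference, exafference = comparator(motor_command, afferent_signal)
```

In this sketch the residual recovers the external change exactly only because the forward model matches the simulated sensory consequences; with an imperfect prediction, part of the self-generated signal would be misattributed to the environment.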
Homeostatic regulation
Self-specifying reafferent-efferent processes are key
components of homeostatic regulation, which implements the
self/non-self distinction at the basic level of life
preservation [1,16,22,23]. To ensure the organism's survival
through changing internal and external conditions, afferent
signals conveying information about the organism's
internal state are continually coupled with corresponding
efferent regulatory processes that keep afferent parameters
within a tight domain of possible values [1,22,23].
Reafferent-efferent loops from spinal nuclei to brainstem
nuclei and midbrain structures are involved in
somato-autonomic adjustments; these loops are modulated by the
hypothalamus as well as mid/posterior insula (sensory)
and anterior cingulate (motor) cortices [23]. This vertically
integrated, interoceptive homeostatic system specifies the
self as a bodily agent by maintaining the body's integrity
(self) in relation to the environment (non-self) [22], and by
supporting the implicit feeling of the body's internal
condition in perception and action [23].
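The coupling of afferent signals with efferent corrections can be pictured as a simple negative-feedback loop. The sketch below is a toy example under stated assumptions: a single scalar internal parameter, a fixed setpoint and a proportional correction. The names regulate, setpoint and gain are hypothetical, and nothing here is meant as a physiological model.

```python
def regulate(setpoint, initial_state, disturbances, gain=0.5):
    """Keep one internal parameter near a setpoint by negative feedback.

    Assumption (illustrative only): proportional correction of a single
    scalar variable.
    """
    state = initial_state
    trajectory = []
    for disturbance in disturbances:
        # Perturbation of the internal state by changing conditions.
        state += disturbance
        # Afferent signal: how far the internal state is from the setpoint.
        error = setpoint - state
        # Efferent regulatory response proportional to that deviation.
        state += gain * error
        trajectory.append(state)
    return trajectory


# Starting 2 units above the setpoint with no further disturbance,
# the deviation halves on every step.
trajectory = regulate(setpoint=37.0, initial_state=39.0, disturbances=[0.0] * 20)
print(trajectory[0], round(trajectory[-1], 3))
```

With gain set to 0.0 the afferent signal is never coupled to a correction, and the parameter simply stays wherever the disturbances leave it.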
Specifying the self as knowing subject and agent
The reafferent-efferent processes just described specify the
self not as an object of perception or attribution (the 'Me')
but as the experiential subject and agent of perception,
Box 1. Self-experience and sensorimotor integration
The self-experience of being an embodied agent depends on the
sensorimotor mechanisms that integrate efference with reafference
(Figure I). A basic level mechanism allows efferences to be system-
atically related to their reafferent consequences. This anchoring of
efference to reafference implements a functional self/non-self distinc-
tion that implicitly specifies the self as a bodily agent [6,21].
For example, consider the motor act of biting a lemon and the
resulting taste. This experience is characterized by (i) a specific
content (lemon, not chocolate); (ii) a specific mode of presentation
(tasting, not seeing); and (iii) a specific perspective (my experience of
tasting). The process of relating an efference (the biting) to a
reafference (the resulting taste of acidity) is what allows the
perception to be characterized not only by a given content (the
acidity) but also by a self-specific perspective (I am the one
experiencing the acidity of the lemon juice) [6,21].
The agent's perspective is thus a central concept within this
framework. Although the basic sensorimotor integration processes
do not involve any representation of the self per se, they are
nonetheless self-specifying [6] because they implement a unique
egocentric perspective in perception and action, and thus implicitly
specify the self as subject and agent of that perspective. According to
this view, self-experience is present whenever a self-specific perspec-
tive exists, regardless of the properties of the represented content
[6,15,16,21].
The original mechanism of sensorimotor integration (Figure I) can
be elaborated to include higher level comparators between intended,
predicted and actual reafference (Figure II). For example, Wolpert and
colleagues [25] described a two-process model of action monitoring.
The first process (Figure II, left) uses the motor command and the
current state estimate to compute a next state estimate, using the
forward model (a predictor) to simulate the arm's dynamics. The
second process (Figure II, right) uses the difference between expected
and actual sensory feedback to correct the forward model's next state
estimate. Through such sophisticated comparators, the model can
handle higher level phenomena, such as intentions, predictions,
mental simulation and goals [20].
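The two processes just described can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the implementation of Ref. [25]: it assumes a one-dimensional state, a forward model in which the motor command acts as a velocity over a time step dt, an identity mapping from state to sensory feedback, and a fixed correction_gain. All names are hypothetical.

```python
def forward_model(state_estimate, motor_command, dt=0.1):
    """Process 1: predict the next state from the current estimate and command.

    Assumption (illustrative only): the command acts as a velocity.
    """
    return state_estimate + motor_command * dt


def monitor_step(state_estimate, motor_command, actual_reafference,
                 correction_gain=0.5):
    """One cycle of prediction followed by sensory correction."""
    predicted_state = forward_model(state_estimate, motor_command)
    # Assumed identity sensory mapping: predicted feedback equals predicted state.
    predicted_reafference = predicted_state
    # Process 2: compare expected with actual sensory feedback ...
    discrepancy = actual_reafference - predicted_reafference
    # ... and use the discrepancy to correct the forward model's estimate.
    next_state_estimate = predicted_state + correction_gain * discrepancy
    return next_state_estimate, discrepancy


# Feedback matches the prediction: zero discrepancy, estimate unchanged.
print(monitor_step(state_estimate=0.0, motor_command=1.0, actual_reafference=0.1))
# Feedback exceeds the prediction: the estimate is pulled toward the feedback.
print(monitor_step(state_estimate=0.0, motor_command=1.0, actual_reafference=0.3))
```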

[Figure I diagram. Labeled elements: motor command, efference copy, effector, reafference, comparator, self, external world.]
Figure I. Sensorimotor integration. Comparator mechanism for relating efferent signals to reafferent sensory feedback.

[Figure II diagram. Labeled elements: motor command, current state estimate, forward model, predicted next state, predicted reafference, actual reafference, two comparators, sensory discrepancy/state correction, next state estimate.]
Figure II. Two-process model of action monitoring (Ref. [25]).
action and feeling (the 'I'). Sensorimotor integration
specifies a unique perceptual perspective on the world,
whereas homeostatic regulation specifies a unique affective
perspective based on the inner feeling of one's body.
The resulting perspective is self-specific in the strict sense
of being both exclusive (it characterizes oneself and no one
else) and noncontingent (changing or losing it entails
changing or losing the distinction between self and
non-self) [6]. In the general case, I perceive and act from my
self-specific perspective while implicitly experiencing myself
as perceiver and agent. In some particular cases, what
I perceive is 'Me', such as when I visually recognize myself.
Although many non-human animals can implicitly experience
themselves as embodied agents through the types of
self-specifying sensorimotor and homeostatic processes
described above [26], only humans and a few other species
seem capable of self-recognition [27], and thus of experientially
relating the 'I' and the 'Me'. What we emphasize here
is that whereas the 'Me' consists in the features one
perceives as belonging to oneself, the 'I' consists in the
self-specific, agentive perspective from which such perceptions
occur; hence, to explain the 'I' we need to explain how
such a perspective is implemented. Our proposal is that the
reafferent-efferent processes of sensorimotor integration
and homeostatic regulation implement a self-specific,
agentive perspective at the bodily level of perception and
feeling.
This model predicts that if a brain process involves only
afference without a matching efference/reafference, it will
not specify the organism as subject or agent, and thus will
not constitute a self-specifying process. For example, the
feedforward sweep in visual processing from early visual
areas to extrastriate areas, which Lamme [28] argues is
not accompanied by conscious awareness, would not quali-
fy as self-specifying, whereas recurrent processing in
multiple visual areas, which Lamme argues is associated
with phenomenal awareness (short-lived awareness that
is not necessarily reportable), would qualify as self-speci-
fying only if linked to matching efference/reafference. Our
model thus allows that non-self-specifying processes occur
in parallel with self-specifying ones, and it leaves open the
question whether there exist conscious processes that do
not include even minimal self-specification (as Lamme's
proposal suggests) or whether every conscious process is
also minimally self-specifying (as others have argued
[15]).
Given this model, we next consider the view, prevalent
in the recent neuroimaging literature [7-10,17-19], that
self-experience is suppressed during externally directed,
attention-demanding tasks. We argue that this view needs
qualification to take into account the self-experience of
being a cognitive-affective agent.
Is self-experience suppressed during world-directed
attention?
One outcome of functional magnetic resonance imaging
(fMRI) studies using self-related processing as the main
paradigm for understanding self-experience is the view
that self-experience occurs mostly when individuals are
not preoccupied with externally oriented tasks and that it
is suppressed when such tasks do occur [7-10]. This view is
based partly on findings from a growing number of studies
examining spontaneous fluctuations in the fMRI signal
during task-free, resting-state conditions [29]. These findings
have distinguished between (i) task-positive regions
(e.g. dorsolateral PFC, inferior parietal cortex and supple-
mentary motor area), whose activity increases during ex-
ternally oriented attention and (ii) task-negative/default-
network regions (e.g. mPFC, Precuneus/PCC and TPJ),
whose activity decreases across a wide variety of tasks.
These task-positive and task-negative networks also ap-
pear to be anticorrelated in their spontaneous activity
during the resting state [30], so that increased activity
in one network has been noted to correlate with decreased
activity in the other [17-19].
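To make the notion of anticorrelated spontaneous activity concrete, the sketch below computes a Pearson correlation between two synthetic time series standing in for the mean signals of a task-positive and a task-negative network. The data are invented for illustration (a slow sinusoid and a sign-inverted, slightly perturbed copy); the repetition time, frequencies and variable names are assumptions and do not come from the studies cited.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Synthetic "resting-state" signals sampled every 2 s (assumed repetition time).
times = [2.0 * i for i in range(100)]
task_positive = [math.sin(2 * math.pi * 0.05 * t) for t in times]
# The second signal is an inverted copy plus an unrelated faster component.
task_negative = [-0.8 * s + 0.1 * math.cos(2 * math.pi * 0.11 * t)
                 for s, t in zip(task_positive, times)]

r = pearson_r(task_positive, task_negative)
print(round(r, 2))  # strongly negative for these synthetic signals
```

A strongly negative coefficient of this kind describes a statistical relation between two signals; by itself it does not establish that recruitment of one network suppresses the other.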
A prominent interpretation of these findings is that the
brain alternates dynamically between a task-oriented,
externally directed state and a task-independent,
self-directed state, with self-experience in the form of self-related
processing mainly occurring during the task-independent,
self-directed state [8-10,18,19]. A wide variety of studies
have been taken to support this interpretation; these
studies indicate that externally oriented, attention-de-
manding tasks, which are considered to suppress intro-
spective thoughts, tend to suspend default-network
activity, whereas resting conditions, as well as practiced
tasks that do not suppress introspective thoughts, corre-
late with an active default network (see [31] for a compre-
hensive review). Additional support is thought to come
from the finding that tasks requiring individuals to make
explicit reference to some aspect of themselves implicate
medial prefrontal regions also active as part of the default
network [4,5,26,31]. Hence, it has been proposed, on the
one hand, that self-experience is largely absent during
world-directed attention (because self-related processing
is strongly suppressed) [17], and, on the other hand, that
during rest conditions, subjects mainly engage in self-
referential processing [7-10].
This conclusion, however, rests on the following
assumptions: (i) the main way to experience the self is
as an object of one's attention (i.e. through self-related
processing); (ii) self-reflective, introspective processes are
linked to task-negative/default-network regions; and (iii)
the brain is organized into a dynamic system of task-
positive regions subserving world-directed attention and
task-negative/default regions subserving self-directed at-
tention, with these two networks acting in opposition so
that recruitment of one suppresses the other.
Each of these assumptions, however, needs qualification
in light of the recent theoretical literature and empirical
findings.
First, treating self-related processing as the main form
of self-experience limits self-experience to the 'Me' (self as
object of one's attention) while neglecting the 'I' (self as
knowing subject and agent). For example, if the agentic 'I'
is considered at the bodily level of sensorimotor integra-
tion, then task-positive regions such as the supplementary
motor cortex and inferior parietal cortex could be viewed as
crucial to self-experience, for these regions serve to imple-
ment sensorimotor integration tasks [25,32,33]. More gen-
erally, although world-directed attention can suppress self-
related processing, one cannot conclude that it suppresses
every form of self-experience, especially the self-experience
of being a cognitive agent (which it can instead enhance).
Second, self-referential and introspective processes
have also been linked to recruitment of regions outside
the default network. For example, self-related processing
activates the temporopolar cortex as consistently as the
three main default network regions (mPFC, Precuneus/
PCC and TPJ) [34], and is also frequently associated with
activations in the insula and lateral PFC [6]. Furthermore,
introspective mental processes have been linked to a
recruitment of the anterior portion of the lateral PFC, namely
the rostrolateral PFC [35-37], which is considered to be
part of a cognitive control network separable from the
default network [38]. These findings indicate that self-
referential processing is not uniquely associated with
task-negative/default-network regions. Therefore, reduced
or inhibited activity in default network regions does not
necessarily indicate that self-directed introspective pro-
cesses are suppressed, because they can be implemented
through regions outside the default network.
Finally, recent studies have begun to qualify the picture
of task-positive and task-negative/default networks as
invariably acting in opposition to each other. A parallel
recruitment of task-positive and task-negative/default-net-
work regions has been observed during several tasks, such
as passive sensory stimulation [39], continuous movie
viewing [40], narrative speech comprehension [41], auto-
biographical planning [42] and mind wandering during a
sustained attention task [36]. These diverse findings
suggest that characterizing brain activity as either
task-positive/world-directed or task-negative/self-directed is
incomplete. Rather, such neural recruitments and cogni-
tive processes can occur in parallel.
In contrast to the view that attention-demanding tasks
suppress self-experience, we propose that such tasks can
be expected to enhance the self-experience of being a
cognitive-affective agent. An outstanding task for cognitive
neuroscience is to integrate this type of self-experience
and self-related processing into an overarching explanato-
ry framework that can guide empirical research. In the
next section, we propose what we believe is a crucial
element of such a framework. By describing how the
concept of self-specifying processes can be applied to cog-
nitive control, including emotion regulation, we argue that
cognitive-affective processes instantiate the self-experience
of being a cognitive-affective agent. In this way, we
show how cognitive neuroscience can investigate this type
of self-experience by including paradigms involving atten-
tion to the external world.
Self-specifying processes during attention-demanding
tasks
Can cognitive control processes in affectively neutral con-
texts and affectively arousing contexts implicitly specify
the self as a cognitive-affective agent?
Cognitive control processes in affectively neutral
contexts
Cognitive control processes serve both to focus attention on
task-relevant information versus other competing sources
of information and to select task-relevant behavior over
habitual or otherwise prepotent responses. For example, in
a Stroop task, the goal is to name the ink color of a printed
color name while ignoring the word's meaning. Individuals
are slower to respond when the information is incongruent
(e.g. the word RED is printed in blue ink) than when it is
congruent (e.g. the word RED is printed in red ink), and the
slower response time is taken to reflect the need for higher
attentional control when a conflict in perceptual information
is present.
According to the influential conflict-monitoring model
[43], cognitive control is implemented through a regulatory
conflict-control loop consisting of two components. An
evaluative or conflict-monitoring component detects conflicts
in the information available for task performance,
whereas a regulative component exerts a top-down biasing
influence on the cognitive and motor processes required for
task performance. At the neural level, the dorsal anterior
cingulate cortex (dACC) has been proposed to support the
evaluative process of conflict monitoring [43,44], whereas
lateral PFC regions have been proposed to underlie the
regulative process of cognitive control [43,45]. This model
predicts that strong ACC activity should be followed by
behavior reflecting relatively focused attention, and weak
ACC activity by behavior reflecting less focused attention.
In keeping with this prediction, Kerns and colleagues [46]
found that high dACC activation for incongruent trials in
the Stroop task was followed by low interference on the
subsequent trial, as well as by strong activation in
dorsolateral PFC. These findings suggest that the dACC could
signal the need for control adjustments to lateral PFC and
thereby strengthen cognitive control [45].
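The trial-to-trial dynamics predicted by such a loop can be sketched with a toy simulation. The code below is loosely inspired by the conflict-monitoring account but is not the model of Ref. [43]: the update rule, the normalized conflict signal and the parameter values (base_interference, learning_rate) are assumptions chosen only to show how detected conflict on one trial can raise control on the next.

```python
def simulate_conflict_adaptation(trials, base_interference=100.0,
                                 learning_rate=0.5):
    """Return the interference on each trial of a Stroop-like sequence.

    trials: sequence of booleans, True for congruent, False for incongruent.
    Assumption (illustrative only): control lies between 0 and 1 and scales
    interference down linearly.
    """
    control = 0.0
    interference_per_trial = []
    for congruent in trials:
        # Regulative component: current control attenuates interference.
        if congruent:
            interference = 0.0
        else:
            interference = base_interference * (1.0 - control)
        interference_per_trial.append(interference)
        # Evaluative component: conflict detected on this trial (0 to 1).
        conflict = interference / base_interference
        # Detected conflict adjusts the control applied on the next trial.
        control = (1.0 - learning_rate) * control + learning_rate * conflict
    return interference_per_trial


# Congruent trial, then two incongruent trials in a row: the second
# incongruent trial shows less interference than the first.
print(simulate_conflict_adaptation([True, False, False]))
```

In this sketch the reduced interference on the second incongruent trial follows entirely from the loop's own earlier adjustment, which is the sense in which the change in conflict is attributable to the system's control effort rather than to the stimulus.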
Our aim in describing the conflict-monitoring model
is not to endorse it against other important models of
cognitive control [47-49] or ACC functioning [50,51]. In
particular, we do not suppose that dACC is involved in
cognitive but not emotional functions, whereas ventral
ACC does the reverse [52], because recent experimental
findings and theoretical considerations argue against both
this particular cognitive-affective division [53] and
emotion-cognition separations more generally in the brain
and behavior [53,54]. Instead, we use the model to illustrate
how cognitive-control processes can be self-specifying.
For the purposes of the present argument, the key
feature of the conflict-monitoring model is the functional
distinction between a regulatory function and an evaluative
function. The control loop comprising these two functions
(Figure 2) strongly resembles the integration of
efferent and reafferent information during sensorimotor
processing, with the regulative component corresponding
to efferent influence and the evaluative component corresponding
to a reafferent process. We propose that such a
regulative-evaluative loop can implement a functional self/
non-self distinction between, on the one hand, reafferent
signals about modifications in level of conflict resulting
from one's own cognitive-control efforts (self), and, on the
other hand, exafferent signals about the level of conflict
resulting from environmental sources such as stimulus
properties (non-self). By implementing this self-specific,
agentive perspective in cognitive control, the regulatory
conflict-control loop would implicitly specify the self as a
cognitive agent. Note that this cognitive form of self-expe-
rience would subsume the self-experience of being an
embodied agent resulting from sensorimotor integration,
because cognitive control operates on sensorimotor pro-
cesses themselves, and thus occurs at higher levels of
integration in the perception-action cycle [55].
As originally conceived, the cognitive control of atten-
tion was closely linked to self-regulation [56,57], includ-
ing the self-experience of being a cognitive agent [57].
Concern with this link, however, seems to have largely
disappeared from the recent cognitive neuroscience liter-
ature, possibly because of the assumption that self-expe-
rience is suppressed during attention-demanding tasks
[7-10,17-19], as well as the observation that brain regions
associated with cognitive control, such as the lateral PFC
and dACC, largely overlap with the task-positive regions
outlined earlier. Indeed, meta-analyses show that the
lateral PFC and dACC are among the most consistently
recruited brain regions across a broad range of attention-
demanding tasks, including perception, response
selection, executive control, working memory, episodic
memory and problem solving [58,59]. Nevertheless, as
discussed above, recruitment of these task-positive
regions is not mutually exclusive with recruitment of
the task-negative/default-network regions. Although in-
tense engagement in sensorimotor tasks can suppress the
task-negative/default-network regions that also subserve
self-related processing [17-19], one can envision situations
(e.g. introspection, envisioning the perspective of
others, mind wandering) in which the required mental
processes call upon resources from both sets of regions
and hence lead to more balanced activations between
them, as indicated by recent results [36,39-42]. Furthermore,
even in situations where the dACC and lateral PFC
are recruited in opposition to task-negative/default-net-
work regions (i.e. with a concomitant deactivation of these
regions), self-experience might still be crucially present in
the form of the 'I' or self-as-cognitive-agent, as a result of
cognitive control processes being self-specifying in the
way just outlined above.
Emotion regulation
The cognitive and behavioral control of emotion in affec-
tively arousing or challenging situations [60,61] provides
another case where we can expect to find the self-experience
of being a cognitive-affective agent. Although emotion
regulation and self-related processing have often been
linked by pointing to their common reliance on midline
cortical structures [61,62], we propose that another funda-
mental but less explored link between self-experience and
emotion regulation can be found in how emotion regulation
processes are also self-specifying.
Recent discussions have proposed a distinction between
two main forms of emotion regulation: a deliberate or
voluntary form, and an implicit or incidental form
[60,61,63-65]. Deliberate emotion regulation relies on
the same cognitive control mechanisms required for
attention-demanding tasks [61]. Thus, tasks requiring
reappraisal (reinterpreting the meaning of a stimulus to
change one's emotional response to it [60,61]) recruit
dACC and lateral PFC regions [61]. Here these regions
are thought to subserve explicit reasoning about how the
association between a situation and one's emotional response
to it can be changed. For example, if one is viewing a
picture of a burn victim in a hospital bed, it might be
possible to modify the original emotional response of distress
or sadness by focusing on possible positive aspects,
such as the victim's successful progress toward a healthier
state or that the victim survived. Maintaining such
descriptions is thought to bias perceptual and associa-
tive-memory systems; these systems in turn send signals
to subcortical appraisal systems, such as the amygdala and
ventral striatum [61], and thus indirectly modify the origi-
nal emotional response.
We propose that such a regulatory-evaluative loop can
implement a functional self/non-self distinction between
the effortful reappraisal process (self) and the target of that
process, namely the emotional scene (non-self). In this way,
emotion regulation can implicitly specify the self as the
cognitive-affective agent engaged in trying to reinterpret
and thereby control an emotional response.
Deliberate forms of emotion regulation are associated
not only with dACC and lateral PFC regions crucially
involved in cognitive control but also with recruitment of
dorsomedial PFC (dmPFC) [61,64,65], a brain region considered
to support reflective awareness of one's feelings,
and thus to enable higher level metarepresentations of
one's own experience [63]. By allowing the maintenance of
such emotion-specific metarepresentations, and through
its dense interconnections with the ventromedial PFC
(vmPFC) [66], the dmPFC can exert a biasing influence
on emotion processes during deliberate attempts at emotion
regulation. Thus, by both influencing and re-representing
the emotion processes in more ventral systems, the
dmPFC and its interconnected ventral structures can form
another regulatory-evaluative loop that implicitly specifies
the self as cognitive-affective agent in effortful emotion
regulation.
In contrast to deliberate emotion regulation, implicit or
incidental emotion regulation has been linked to medial
regions such as the rostral ACC (rACC), subgenual ACC and
vmPFC [61]. For example, the rACC is associated with
regulation of attention to emotional (but not non-emotional)
distracters during an emotional version of the Stroop task
[67,68]. During this task, subjects are not instructed to
regulate their emotions, thus the recruitment of the rACC
and its accompanying regulation of emotional attention can

[Figure 2 diagram. Labeled elements: dorsal ACC, lateral PFC, posterior brain regions, conflict detection, biasing influence, modified level of conflict, regulation/efference, evaluation/re-afference.]
Figure 2. Cognitive control as a self-specifying process. The conflict-monitoring
model of cognitive control [43] depicted as implementing a possible efferent/
reafferent regulatory loop. This loop can define the functional self/non-self
distinction between reafferent signals resulting from one's own cognitive control
efforts (self) and exafferent signals about the level of conflict resulting from
environmental sources such as stimulus properties (non-self).
be considered incidental to the main task [65]. Activation in
rACC appears to be accompanied by a simultaneous and
correlated reduction of amygdala activity; this relation suggests
that resolving emotional conflict depends on an
rACC-amygdala regulatory loop [67] that also appears to use the
general cognitive monitoring mechanism of the dACC to
detect the presence of conflict [68]. Thus, a self-specifying
evaluative-regulatory loop can be formed between rACC
and dACC, analogous to that between lateral PFC and
dACC, but dedicated to the resolution of emotional conflict
through an rACC biasing influence on amygdala activity.
Furthermore, regions playing a role in deliberate emo-
tion regulation, such as the dACC and dmPFC [63,64], and
possibly the right ventrolateral PFC [65], also appear to
participate in implicit emotion regulation. For example,
the dACC and dmPFC have autonomic regulatory func-
tions mediated by direct neural connections with subcorti-
cal visceromotor centers such as the lateral hypothalamus
[66]. In addition, neuroimaging studies noting an inverse
correlation between medial PFC activity and heart rate
variability suggest that medial PFC activity can have a
tonic inhibitory effect mediated through the vagus nerve
[63]. Based on these findings, researchers have described
an evaluative-regulatory feedback mechanism, including
an equilibration process between bottom-up and top-down
interactions, through which the body state is altered as
arousal processes become modulated and differentiated
[63]. This mechanism provides another candidate for
a self-specifying process at implicit levels of emotion
regulation.
Given that these candidate self-specifying processes
belong to implicit emotion regulation, the functional self/
non-self distinction they implement would be closely relat-
ed to the one established through homeostatic regulation
between the feeling body and the environment. Indeed,
implicit emotion regulation processes overlap conceptually
and neurally with the higher levels of the homeostatic
regulation system described earlier [1,22,23,26]. Thus,
the self-experience of being an emotional agent that these
processes elicit would occur at the level of affect and action
tendencies [26], whereas this bodily level would be subsumed
by the self-experience of being a cognitive-affective
agent in deliberate emotion regulation, analogous to the
way the self-experience of being a cognitive agent also
subsumes the self-experience of being an embodied agent
in attention-demanding cognitive tasks.
Concluding remarks and future directions
Using the concept of self-specifying processes, we have
outlined a model of how cognitive control processes, including
emotion regulation, implicitly specify the self as a
cognitive-affective agent. Our model suggests several
questions for future investigations (Box 2). We highlight
two issues here.
One issue concerns the types of neural mechanisms that
integrate the efferent-reafferent and regulatory-evaluative
signals in self-specifying processes. On the one hand,
the comparison between efferent and reafferent signals can
be remapped at higher levels by specific neural structures.
For example, the anterior insula can serve to remap the
second-order comparison between efferent and reafferent
signals in more posteriorly located motor and sensory
regions during homeostatic regulation [23]. Similarly, dur-
ing cognitive control, anteriorly located lateral PFC
regions, such as the rostrolateral PFC, can remap the
second-order comparison between the regulative and eval-
uative outcomes of processes supported by the more pos-
teriorly located dorsolateral PFC and dACC [35]. Such
hierarchically organized systems can be present at multi-
ple neural levels and in multiple functional domains. On
the other hand, another type of mechanism not requiring
explicit remapping by dedicated neural structures, but
relying instead on dynamical coupling across multiple
areas [69] (e.g. through phase synchronization of neuronal
signals [70]), could be responsible for signal integration.
Such dynamical mechanisms can also be implemented at
multiple neural levels and in various functional domains
[69,70]. Whether self-specifying processes depend on either
or both of these mechanisms is an important issue for
future research.
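One common way to quantify phase synchronization between two signals is a phase-locking value, the length of the average unit vector defined by their phase differences. The sketch below assumes that instantaneous phases have already been extracted (for example with a Hilbert transform, which is not performed here) and uses synthetic phase series; the function name and the data are illustrative assumptions, not a method taken from Refs [69,70].

```python
import cmath
import math


def phase_locking_value(phases_a, phases_b):
    """Phase-locking value of two equal-length phase series (radians).

    Returns a number between 0 (no consistent phase relation) and
    1 (constant phase difference).
    """
    unit_vectors = [cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b)]
    return abs(sum(unit_vectors) / len(unit_vectors))


# Synthetic example: a reference phase series ...
reference = [0.1 * i for i in range(200)]
# ... a second series with a constant lag (phase-locked) ...
locked = [p - 0.7 for p in reference]
# ... and a third whose lag drifts through one full cycle (not locked).
drifting = [p - 2 * math.pi * i / 200 for i, p in enumerate(reference)]

print(round(phase_locking_value(reference, locked), 3))
print(round(phase_locking_value(reference, drifting), 3))
```

A measure of this kind captures coupling between areas without requiring a dedicated structure that remaps their signals, which is the contrast drawn in the paragraph above.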
A second issue concerns the subjective nature of self-
experience. Although objective measures from experimen-
tally controlled tasks and uncontrolled rest conditions are
certainly useful, we believe a richer understanding of self-
experience requires the incorporation of subjective mea-
sures such as self-reports into neuroimaging protocols
[36,71]. Certain questions seem tractable only with such
an approach. For example, is self-experience all-or-nothing
or graded in character? When multiple self-specifying
processes are activated at various levels of neural func-
tioning, does a stronger sense of self occur than when only a
few are recruited? Can mental training of attention and
emotion regulation [72,73] alter self-experience and its
neural substrates?
As argued here, how cognitive neuroscience specifies the
self profoundly shapes our view of self-experience and its
neural substrates. By broadening our investigations to
include the self-experience of being a cognitive agent, we
can deepen our understanding of how the brain and body
work together to create our sense of self.
Box 2. Questions for further research
Is the 'I' or self-as-subject all-or-nothing, or graded?
When multiple self-specifying processes are activated, does a stronger sense of 'I' occur?
Can self-specifying processes be altered through attentional and emotion regulation training?
Do self-specifying processes require higher level remapping of efferent-reafferent integration, or can such integration occur through dynamical mechanisms such as phase synchronization?
Can self-specifying processes be identified in neuroimaging data through functional connectivity measures, and can statistical measures such as Granger causality be used to identify directional influences in such processes?
Can self-specifying processes be identified as part of the brain's intrinsic functional architecture through intrinsic connectivity measures in resting state neuroimaging data?
Can transcranial magnetic stimulation interfere selectively in self-specifying loops and thereby alter cognitive-affective self-experience?
Are self-specifying processes altered in psychiatric disorders, such as schizophrenia or anorexia nervosa, which involve altered self-experience and self-other evaluation?
Acknowledgments
For helpful comments we thank Norm Farb, Alisa Mandrigin, Luiz
Pessoa, Rebecca Todd and four anonymous reviewers. K.C. was supported
by grants from the Canadian Institutes of Health Research (CIHR MOP
81188), the Natural Sciences and Engineering Research Council of
Canada (NSERC) and the Michael Smith Foundation for Health Research
(MSFHR); D.C. by Fondo Nacional de Desarrollo Científico y Tecnológico
Grant 1090612; and E.T. by the Social Sciences and Humanities Research
Council of Canada.
References
1 Damasio, A.R. (1999) The Feeling of What Happens, Harcourt
2 Llinas, R. (2001) The I of the Vortex, MIT Press
3 Gillihan, S. and Farah, M. (2005) Is self special? A critical review of evidence from experimental psychology and cognitive neuroscience. Psychol. Bull. 131, 76–97
4 Northoff, G. et al. (2006) Self-referential processing in our brain – a meta-analysis of imaging studies on the self. Neuroimage 31, 440–457
5 Uddin, L.Q. et al. (2007) The self and social cognition: the role of cortical midline structures and mirror neurons. Trends Cogn. Sci. 11, 153–157
6 Legrand, D. and Ruby, P. (2009) What is self-specific? A theoretical investigation and critical review of neuroimaging results. Psychol. Rev. 116, 252–282
7 Gusnard, D.A. et al. (2001) Medial prefrontal cortex and self-referential mental activity: relation to a default mode of brain function. Proc. Natl. Acad. Sci. U.S.A. 98, 4259–4264
8 Gusnard, D.A. (2005) Being a self: considerations from functional imaging. Conscious. Cogn. 14, 679–697
9 Wicker, B. et al. (2003) A relation between rest and self in the brain? Brain Res. Rev. 43, 224–230
10 Schneider, F. et al. (2008) The resting brain and our self: self-relatedness modulates resting state neural activity in cortical midline structures. Neuroscience 157, 120–131
11 Mohanty, A. et al. (2008) The spatial attention network interacts with limbic and monoaminergic systems to modulate motivation-induced attention shifts. Cereb. Cortex 18, 2604–2613
12 Engelmann, J.B. et al. (2009) Combined effects of attention and motivation on visual task performance: transient and sustained motivational effects. Front. Hum. Neurosci. 3, 1–17
13 Corbetta, M. et al. (2000) Voluntary orienting is dissociated from target detection in human posterior parietal cortex. Nat. Neurosci. 3, 292–297
14 James, W. (1890/1981) The Principles of Psychology, Harvard University Press
15 Legrand, D. (2007) Pre-reflective self-as-subject from experiential and empirical perspectives. Conscious. Cogn. 16, 583–599
16 Thompson, E. (2007) Mind in Life, Harvard University Press
17 Goldberg, I.I. et al. (2006) When the brain loses its self: prefrontal inactivation during sensorimotor processing. Neuron 50, 329–339
18 Fransson, P. (2005) Spontaneous low-frequency BOLD signal fluctuations: an fMRI investigation of the resting-state default mode of brain function hypothesis. Hum. Brain Mapp. 26, 15–29
19 Fransson, P. (2006) How default is the default mode of brain function? Further evidence from intrinsic BOLD signal fluctuations. Neuropsychologia 44, 2836–2845
20 Blakemore, S-J. and Frith, C. (2003) Self-awareness and action. Curr. Opin. Neurobiol. 13, 219–224
21 Legrand, D. (2006) The bodily self: the sensori-motor roots of pre-reflexive self-consciousness. Phenom. Cogn. Sci. 5, 89–118
22 Parvizi, J. and Damasio, A.R. (2001) Consciousness and the brainstem. Cognition 79, 135–160
23 Craig, A.D. (2009) How do you feel now? The anterior insula and human awareness. Nat. Rev. Neurosci. 10, 59–70
24 Von Holst, E. (1954) Relations between the central nervous system and the peripheral organs. Br. J. Anim. Behav. 2, 89–94
25 Wolpert, D.M. et al. (1995) An internal model for sensorimotor integration. Science 269, 1880–1882
26 Northoff, G. and Panksepp, J. (2008) The trans-species concept of self and the subcortical-cortical midline system. Trends Cogn. Sci. 12, 259–264
27 de Waal, F.B.M. (2008) The thief in the mirror. PLoS Biol. 6, e201
28 Lamme, V.A.F. (2003) Why visual awareness and attention are different. Trends Cogn. Sci. 7, 12–18
29 Fox, M.D. and Raichle, M.E. (2007) Spontaneous fluctuations in brain activity observed with functional magnetic resonance imaging. Nat. Rev. Neurosci. 8, 700–711
30 Fox, M.D. et al. (2005) The human brain is intrinsically organized into dynamic, anticorrelated functional networks. Proc. Natl. Acad. Sci. U.S.A. 102, 9673–9678
31 Buckner, R.L. et al. (2008) The brain's default network: anatomy, function, and relevance to disease. Ann. N. Y. Acad. Sci. 1124, 1–38
32 Andersen, R.A. and Buneo, C.A. (2003) Sensorimotor integration in posterior parietal cortex. Adv. Neurol. 93, 159–177
33 Haggard, P. and Whitford, B. (2004) Supplementary motor area provides an efferent signal for sensory suppression. Cogn. Brain Res. 19, 52–58
34 Christoff, K. et al. (2004) Neural basis of spontaneous thought processes. Cortex 40, 623–630
35 Christoff, K. and Gabrieli, J.D.E. (2000) The frontopolar cortex and human cognition: evidence for a rostrocaudal hierarchical organization within the human prefrontal cortex. Psychobiology 28, 168–186
36 Christoff, K. et al. (2009) Experience sampling during fMRI reveals default network and executive system contributions to mind wandering. Proc. Natl. Acad. Sci. U.S.A. 106, 8719–8724
37 McCaig, R.G. et al. (2010) Improved modulation of rostrolateral prefrontal cortex using real-time fMRI and meta-cognitive awareness. Neuroimage [Epub ahead of print]
38 Vincent, J.L. et al. (2008) Evidence for a frontoparietal control system revealed by intrinsic functional connectivity. J. Neurophysiol. 100, 3328–3342
39 Greicius, M.D. and Menon, V. (2004) Default-mode activity during a passive sensory task: uncoupled from deactivation but impacting activation. J. Cogn. Neurosci. 16, 1484–1492
40 Golland, Y. et al. (2007) Extrinsic and intrinsic systems in the posterior cortex of the human brain revealed during natural sensory stimulation. Cereb. Cortex 17, 766–777
41 Wilson, S.M. et al. (2008) Beyond superior temporal cortex: intersubject correlations in narrative speech comprehension. Cereb. Cortex 18, 230–242
42 Spreng, R.N. et al. (2010) Default network activity, coupled with the frontoparietal control network, supports goal-directed cognition. Neuroimage 53, 303–317
43 Botvinick, M.M. et al. (2001) Conflict monitoring and cognitive control. Psychol. Rev. 108, 624–652
44 Botvinick, M.M. et al. (2004) Conflict monitoring and anterior cingulate cortex: an update. Trends Cogn. Sci. 8, 539–546
45 Miller, E.K. and Cohen, J.D. (2001) An integrative theory of prefrontal cortex function. Annu. Rev. Neurosci. 24, 167–202
46 Kerns, J.G. et al. (2004) Anterior cingulate conflict monitoring and adjustments in control. Science 303, 1023–1026
47 Egner, T. (2008) Multiple conflict-driven control mechanisms in the human brain. Trends Cogn. Sci. 12, 374–380
48 Verguts, T. and Notebaert, W. (2009) Adaptation by binding: a learning account of cognitive control. Trends Cogn. Sci. 13, 252–257
49 Mayr, U. and Awh, E. (2009) The elusive link between conflict and conflict adaptation. Psychol. Res. 73, 794–802
50 Rushworth, M.F.S. et al. (2007) Contrasting roles for anterior cingulate and orbitofrontal cortex in decisions and social behaviour. Trends Cogn. Sci. 11, 169–176
51 Etkin, A. et al. (2010) Emotional processing in anterior cingulate and medial prefrontal cortex. Trends Cogn. Sci. DOI: 10.1016/j.tics.2010.11.004
52 Bush, G. et al. (2000) Cognitive and emotional influences in anterior cingulate cortex. Trends Cogn. Sci. 4, 215–222
53 Pessoa, L. (2008) On the relationship between emotion and cognition. Nat. Rev. Neurosci. 9, 148–158
54 Pessoa, L. (2010) Emotion and cognition and the amygdala: from 'what is it?' to 'what's to be done?'. Neuropsychologia 48, 3416–3429
55 Botvinick, M.M. (2007) Multilevel structure in behaviour and in the brain: a model of Fuster's hierarchy. Philos. Trans. R. Soc. Lond. B Biol. Sci. 362, 1615–1626
56 Norman, D.A. and Shallice, T. (1986) Attention to action: willed and automatic control of behavior. In Consciousness and Self-regulation. Advances in Research and Theory (Vol. 4) (Davidson, R.J. et al., eds), pp. 1–18, Plenum Press
57 Posner, M.I. and Rothbart, M.K. (1998) Attention, self-regulation and consciousness. Philos. Trans. R. Soc. Lond. B Biol. Sci. 353, 1915–1927
58 Duncan, J. and Owen, A.M. (2000) Common regions of the human frontal lobe recruited by diverse cognitive demands. Trends Neurosci. 23, 475–483
59 Corbetta, M. and Shulman, G. (2002) Control of goal-directed and stimulus-driven attention in the brain. Nat. Rev. Neurosci. 3, 201–215
60 Gross, J.J. and Thompson, R.A. (2007) Emotion regulation: conceptual foundations. In Handbook of Emotion Regulation (Gross, J.J., ed.), pp. 3–25, Guilford
61 Ochsner, K.N. and Gross, J.J. (2005) The cognitive control of emotion. Trends Cogn. Sci. 9, 242–249
62 Northoff, G. (2005) Is emotion regulation self-regulation? Trends Cogn. Sci. 9, 408–409
63 Lane, R.D. (2008) Neural substrates of implicit and explicit emotional processes: a unifying framework for psychosomatic medicine. Psychosom. Med. 70, 214–231
64 Phillips, M.L. et al. (2008) A neural model of voluntary and automatic emotion regulation: implications for understanding the pathophysiology and neurodevelopment of bipolar disorder. Mol. Psychiatr. 13, 833–857
65 Berkman, E.T. and Lieberman, M.D. (2009) Using neuroscience to broaden emotion regulation: theoretical and methodological considerations. Soc. Pers. Psychol. Comp. 3/4, 475–493
66 Price, J.L. et al. (1996) Networks related to the orbital and medial prefrontal cortex; a substrate for emotional behavior? Prog. Brain Res. 107, 523–536
67 Etkin, A. et al. (2006) Resolving emotional conflict: a role for the rostral anterior cingulate cortex in modulating activity in the amygdala. Neuron 51, 871–882
68 Egner, T. et al. (2008) Dissociable neural systems resolve conflict from emotional versus nonemotional distracters. Cereb. Cortex 18, 1475–1484
69 Bressler, S.L. and Menon, V. (2010) Large-scale brain networks in cognition: emerging methods and principles. Trends Cogn. Sci. 14, 277–290
70 Varela, F.J. et al. (2001) The brainweb: phase synchronization and large-scale integration. Nat. Rev. Neurosci. 2, 229–239
71 Jack, A. and Roepstorff, A. (2002) Introspection and cognitive brain mapping: from stimulus-response to script-report. Trends Cogn. Sci. 6, 333–339
72 Lutz, A. et al. (2008) Attention regulation and monitoring in meditation. Trends Cogn. Sci. 12, 163–169
73 Farb, N.A.S. et al. (2007) Attending to the present: mindfulness meditation reveals distinct neural modes of self-reference. Soc. Cogn. Affect. Neurosci. 2, 313–322
Songs to syntax: the linguistics of birdsong
Robert C. Berwick¹, Kazuo Okanoya²,³, Gabriel J.L. Beckers⁴ and Johan J. Bolhuis⁵
¹ Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
² Department of Cognitive and Behavioral Sciences, University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153-8902, Japan
³ RIKEN Brain Science Institute, 2-1 Hirosawa, Wako-City, Saitama 351-0198, Japan
⁴ Department of Behavioural Neurobiology, Max Planck Institute for Ornithology, D-82319 Seewiesen, Germany
⁵ Behavioural Biology and Helmholtz Institute, University of Utrecht, Padualaan 8, 3584 CH Utrecht, The Netherlands
Unlike our primate cousins, many species of bird share with humans a
capacity for vocal learning, a crucial factor in speech acquisition.
There are striking behavioural, neural and genetic similarities between
auditory-vocal learning in birds and human infants. Recently, the
linguistic parallels between birdsong and spoken language have begun to
be investigated. Although both birdsong and human language are
hierarchically organized according to particular syntactic constraints,
birdsong structure is best characterized as 'phonological syntax',
resembling aspects of human sound structure. Crucially, birdsong lacks
semantics and words. Formal language and linguistic analysis remain
essential for the proper characterization of birdsong as a model system
for human speech and language, and for the study of the evolution of
brain and cognition.
Human language and birdsong: the biological
perspective
Darwin [1] noted strong similarities between the ways that human infants
learn to speak and birds learn to sing. This perspective from organismal
biology [2] initially led to a focus on apes as model systems for human
speech and language (see Glossary), although with limited success [3,4].
Since the end of the 20th century, biologists and linguists have shown a
renewed interest in songbirds, revealing fascinating similarities
between birdsong and human speech at the behavioural, neural, genomic
and cognitive levels [5–9]. Yip has reviewed the relationship between
human phonology and birdsong [7]. Here, we address another potential
parallel between birdsong and human language: syntax.
Comparing syntactic ability across birds and humans is important,
because at least since the beginning of the modern era in cognitive
science and linguistics, a combinatorial syntax has been viewed as lying
at the heart of the distinctive creative and open-ended nature of human
language [10]. Here, we discuss current understanding of the
relationship between birdsong and human syntax in light of recent
experimental and linguistic advances, focusing on the formal parallels
and their implications for underlying cognitive and computational
abilities. Finally, we sketch the prospects for future experimental
work, as part of the
Review
Glossary
Bigram: a subsequence of two elements (notes, words or phrases) in a string.
Context-free language (CFL): the sets of strings that can be recognized or
generated by a pushdown-stack automaton or context-free grammar. A CFL
might have grammatical dependencies nested inside to any depth, but
dependencies cannot overlap.
Finite-state automaton (FSA, FA): a computational model of a machine with
finite memory, consisting of a finite set of states, a start state, an input
alphabet, and a transition function that maps input symbols and current states
to some set of next states.
Finite-state grammar (FSG): a grammar that formally replicates the structure of
a FSA, also generating the regular languages.
K-reversible finite-state automaton: an FSA that is deterministic when one
reverses all the transitions so that the automaton runs backwards. One can
look behind k previous words to resolve any possible ambiguity about which
next state to move to.
Language: any possible set of strings over some (usually finite) alphabet of
words.
Locally testable language: a strict subset of the regular languages formed by
the union, intersection, or complement of strictly locally testable languages.
(First-order) Markov model or process: a random process where the next state
of a system depends only on the current state and not its previous states.
Applied to word or acoustic sequences, the next word or acoustic unit in the
sequence depends only on the current word or acoustic unit, rather than
previous words or units.
Mildly context-sensitive language (MCSL): a language family that lies just
beyond the CFLs in terms of power, and thought to encompass all the known
human languages. A MCSL is distinguished from a CFL in that it contains
clauses that can be nested inside clauses arbitrarily deeply, with a limited
number of overlapping grammatical dependencies.
Morphology: the possible word shapes in a language; that is, the syntax of
words and word parts.
Phoneme: the smallest possible meaningful unit of sound.
Phonetics: the study of the actual speech sounds of all languages, including
their physical properties, the way they are perceived and the way in which
vocal organs produce sounds.
Phonology: the study of the abstract sound patterns of a particular language,
usually according to some system of rules.
Push-down stack automaton (PDA): a FSA augmented with a potentially
unbounded memory store, a push-down stack, that can be accessed in terms of
a last-in, first-out basis, similar to a stack of dinner plates, with the last element
placed on the stack being the top of the stack, and first accessible memory
element. PDAs recognize the class of CFLs.
Recursion: a property of a (set of) grammar rules such that a phrase A can
eventually be rewritten as itself with non-empty strings of words or phrase
names on either side in the form aAb and where A derives one or more words
in the language.
Regular language: a language recognized or generated by a FSA or a FSG.
Semantics: the analysis of the meaning of a language, at the word, phrase,
sentence level, or beyond.
Strictly locally testable language (or stringset): a strict subset of the regular
languages defined in terms of a finite list of strings of length less than or equal
to some upper length k (the window length).
Sub-regular language: any subset of the regular languages, in particular
generally a strict subset with some property of interest, such as local testability.
Syllable: in linguistics, a vowel plus one or more preceding or following
consonants.
Syntax: the rules for arranging items (sounds, words, word parts or phrases)
into their possible permissible combinations in a language.
Corresponding author: Bolhuis, J.J. (j.j.bolhuis@uu.nl).
1364-6613/$ – see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.tics.2011.01.002 Trends in Cognitive Sciences, March 2011, Vol. 15, No. 3, p. 113
ongoing debate as to what is species specific about human language
[3,11]. We show that, although it has a simple syntactic structure,
birdsong cannot be directly compared with the syntactic complexity of
human language, principally because it has neither semantics nor a
lexicon.
Comparing human language and birdsong
Human speech and birdsong both consist of complex, patterned
vocalizations (Figure 1). Such sequential structures can be analysed and
compared via formal syntactic methods. Aristotle described language as
sound paired with meaning [12]. Although this is partly accurate, a
proper interspecies comparison calls for a more articulated system
diagram of the key components of human language, and their non-human
counterparts. We depict these as a tripartite division (Figure 2): (i)
an external interface, a sensorimotor-driven, input–output system
providing proper articulatory output and perceptual analysis; (ii) a
rule system generating correctly structured sentence forms,
incorporating words; and (iii) an internal interface to a
conceptual–intentional system of meaning and reasoning; that is,
semantics. Component (i) corresponds to systems for producing,
perceiving and learning acoustic sequences, and might itself involve
abstract representations that are not strictly sensorimotor, such as
stress placement. In current linguistic frameworks, (i) aligns with
acoustic phonetics and phonology, for both production and perception.
Component (ii) feeds into both the sensorimotor interface (i) and a
conceptual–intentional system (iii), and is usually described via some
model of recursive syntax.
Although linguists debate the details of these components, there seems
to be more general agreement as to the nature of (i), less agreement as
to the nature of (ii) and widespread controversy as to (iii). For
instance, whereas the connection between a fully recursive syntax and a
conceptual–intentional system is sometimes considered to lie at the
heart of the species-specific properties of human language, there is
considerable debate over the details, which plays out as the distinct
variants of current linguistic theories [13–16]. Some of these accounts
reduce or even eliminate the role of (ii), assuming a more direct
relation between (i) and (iii) (e.g. [17,18]). The system diagram in
Figure 2 therefore cannot represent any detailed neuroanatomical or
abstract wiring diagram, but

[Figure 1: sound spectrogram. Horizontal axis: time (scale bar, 0.5 s); vertical axis: frequency (kHz), 0 to 10. Labelled levels: note, syllable, motif; introductory notes are marked i.]
Figure 1. Sound spectrogram of a typical zebra finch song depicting a hierarchical structure. Songs often start with 'introductory notes' (denoted by i) that are followed by
one or more 'motifs', which are repeated sequences of syllables. A 'syllable' is an uninterrupted sound, which consists of one or more coherent time-frequency traces,
which are called 'notes'. A continuous rendition of several motifs is referred to as a 'song bout'.

[Figure 2: block diagram. External interface: phonological forms/sequencing and acoustic-phonetics, linking production and perception of sounds and gestures (external to organism). Centre: words (lexical items) + syntactic rules. Internal interface: concepts, intentions, reasoning (internal to organism).]
Figure 2. A tripartite diagram of abstract components encompassing both human language and birdsong. On the left-hand side, an external interface (i), comprised of
sensorimotor systems, links the perception and production of acoustic signals to an internal system of syntactic rules, (ii). On the right-hand side, an internal interface links
syntactic forms to some system of concepts and intentions, (iii). With respect to this decomposition, birdsong seems distinct from human language in the sense of lacking
both words and a fully developed conceptual–intentional system.
rather a way to factor apart the distinct knowledge types in the sense
of Marr [19]. Notably, our tripartite arrangement does not preclude the
possibility that only humans have syntactic rules, or that such rules
always fix information content in a language-like manner. For example,
in songbirds, sequential syntactic rules might exist only to construct
variable song element sequences rather than variable meanings per se
[9].
Birdsong and human syntax: similarities and differences
Both birdsong and human language are hierarchically organized according
to syntactic constraints. We compare them by first considering the
complexity of their sound structure, and then turning, in the next
section, to aspects beyond this dimension. Overall, we find that
birdsong sound structure, at least for the Bengalese finch, seems
characterizable by an easily learnable, highly restricted subclass of
the regular languages (languages that can be recognized or generated by
finite-state machines; see Box 3). Whereas human language sound
structure also appears to be describable via finite-state machines,
comparable results are lacking in the case of human language, although
certain parts of human language sound structure, such as stress
patterns, have also recently been shown to be easily learnable [20].
In birdsong, individual notes can be combined as particular sequences
into syllables, syllables into motifs, and motifs into complete song
bouts (Figure 1). Birdsong thus consists of chains of discrete acoustic
elements arranged in a particular temporal order [21–23]. Songs might
consist of fixed sequences with only sporadic variation (e.g. zebra
finches), or more variable sequences (e.g. nightingales, starlings, or
Bengalese finches), where a song element might be followed by several
alternatives, with overall song structure describable by probabilistic
rules between a finite number of states [23,24] (Figure I, Box 1). For
example, a song of a nightingale is built out of a fixed 4-second note
sequence. An individual nightingale has 100–200 song types, clustered
into 2–12 'packages'. Package singing order remains probabilistic [25].
A starling song bout might last up to 1 minute, composed of many
distinct motifs containing song elements in a fixed order lasting
0.5–1.5 seconds. Gentner and Hulse [26] found that a first-order Markov
model (i.e. bigrams) suffices to describe most motif sequence
information in starling songs (Box 2). Thus, for the most part, the next
motif is predictable by the immediately preceding motif. Starlings also
use this information to recognize specific song bouts. Similarly, in
American thrush species, relatively low-order Markov chains suffice for
modelling song sequence variability [27].
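To make the notion of a first-order Markov (bigram) description concrete, the following is a minimal sketch of how transition probabilities between motifs could be estimated from a set of song bouts. The motif labels and bouts are hypothetical placeholders, not data from the studies cited above, and the function name is introduced here only for illustration.

```python
from collections import Counter, defaultdict


def fit_bigram_model(bouts):
    """Estimate P(next motif | current motif) from observed motif sequences."""
    counts = defaultdict(Counter)
    for bout in bouts:
        # Every adjacent pair (bigram) contributes one observed transition.
        for current, following in zip(bout, bout[1:]):
            counts[current][following] += 1
    model = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        model[current] = {m: n / total for m, n in followers.items()}
    return model


# Hypothetical motif labels; these are not real starling recordings.
bouts = [
    ["A", "B", "C", "A", "B"],
    ["A", "B", "C", "C"],
    ["B", "C", "A", "B"],
]
model = fit_bigram_model(bouts)
for motif in sorted(model):
    print(motif, model[motif])
```

In such a model the prediction of the next motif uses only the current motif, which is exactly the restriction that the bigram description imposes.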
Can songbird 'phonological syntax' [28] ever be more complex than this?
Bengalese finch song typically contains approximately eight song note
types organized into 2–5 note 'chunks' that also follow local transition
probabilities [29] (Figure I, Box 1). Unlike single-note Markov
processes, chunks such as the three-note sequence cde can be reused in
other places in a song [24,30]. However, chunks are not reused inside
other chunks, so the hierarchical depth is strictly limited.
If Bengalese finch song could be characterized solely in terms of
bigrams, it would belong to the class of so-called 'strictly locally
2-testable' languages, a highly restricted subset of the class of the
regular languages. That is, a bird could verify, either for purposes of
production or for recognition, whether a song is properly formed by
simply sliding a set of two-note sequences or 'window' constraints
across the entire note sequence, checking to see that all the two-note
sequences found 'pass' (Box 3). For example, if the valid note sequences
were ab, abab, ababab, and so on, then every a must be followed by a b,
except at the song start; and every b must be followed by an a, except
at the song end. Thus, aside from the beginning and end of a song, a
bird could check whether a song is well formed by using two bigram
templates: [a-b] and [b-a]. This turns out to be the simplest kind of
pattern recognizable by a finite-state automaton (FSA), because the
internal states of the automaton need not be used for any detailed
computation aside from bigram note template matching (Box 3).
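The sliding-window check described above can be sketched in a few lines. This is an illustration under stated assumptions, not a model of any bird's actual processing: the edge markers "<" and ">" are hypothetical symbols standing for the song start and song end, so that the two internal templates [a-b] and [b-a] are joined by two edge templates.

```python
def is_strictly_2_local(song, allowed_bigrams, start="<", end=">"):
    """Slide a two-note window across the song; every window must be allowed."""
    padded = [start] + list(song) + [end]
    return all(pair in allowed_bigrams for pair in zip(padded, padded[1:]))


# Templates for the pattern ab, abab, ababab, ...
# "<" and ">" are assumed markers for the song start and end.
AB_PLUS = {("<", "a"), ("a", "b"), ("b", "a"), ("b", ">")}

for song in ["ab", "abab", "aab", "ba", "aba"]:
    print(song, is_strictly_2_local(song, AB_PLUS))
```

Note that the checker keeps no memory beyond the current window, which is why this counts as the simplest kind of finite-state recognition.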
The Bengalese finch song automaton in Figure I (Box 1),
which encompasses the full song sequence repertoire
extracted from a single, actual bird [31], indicates that
birdsong structure can be more complicated than a simple
Box 1. Birdsong, human language syntax and the Chomsky hierarchy
All sets of strings, or 'languages', can be rank ordered via strict
set-inclusion according to their computational power. The resulting
rings are called the 'Chomsky hierarchy' [61] (Figure I; ring numbers
are used below). For birdsong and human syntax comparisons, the most
important point is the small overlap between the possible languages
generated by human syntax (the irregular shaded grey set), as opposed to
birdsong syntax (the stippled grey set).
1. The finite languages: all finite sets of strings of finite length.
2. The FSA generating the regular languages. An FSA is represented
as a directed graph of states with labelled edges, a finite-state
transition network. The corresponding grammar of an FSA has
rules of the form $X \to aY$ or $X \to a$ (right-linear rules), where
X and Y range over possible automaton states (nonterminals), and a
ranges over symbols corresponding to the labelled transitions between
states. The FSA recognizing the $(ab)^+$ language needs only to test
for four specific adjacent string symbol pairs (bigrams): the pairs
(left-edge, a); (a, b); (b, a); and (b, right-edge) [62].
3. The PDA, generating the CFLs. PDAs are finite-state machines
augmented with a potentially unbounded auxiliary memory that
can be accessed from the top working down. PDAs can be thought
of as augmenting FSA with the ability to use subroutines, yielding
the recursive transition networks. Grammars for these languages
are consequently more general and can include rules such as
$X \to Ya$, $X \to aYa$ or $X \to aXa$ (context-free rules).
4. The PDA whose stacks might themselves be augmented with
embedded stacks, generating the MCSLs. Examples of such
patterns in human languages are rare, but do exist [63,64]. These
patterns are exemplified by stringsets such as $a^n b^m c^n d^m$,
where the a's and c's must match up in number and order and,
separately, the b's and the d's, so-called 'cross-serial'
dependencies (see [65,66]). A broad range of linguistic theories
accommodate this much complexity [13–16,59,66]. No known human
languages require more power than this. The two irregular sets drawn
cutting across the hierarchy depict the probable location of the human
languages (shaded) and birdsong (stippled). Neither completely
subsumes any of the previously mentioned stringsets. Birdsong and
human languages intersect at the very bottom owing to the
possible overlap of finite lists of human words and the vocal
repertoire of certain birds.
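The step from ring 2 to ring 3 can be illustrated with the pattern $a^n b^n$: no fixed number of states suffices to match an unbounded number of a's against b's, whereas a last-in, first-out store does. The sketch below is one simple way to write such a recognizer, assuming n ≥ 1; the function name is hypothetical and the code is not drawn from the cited literature.

```python
def accepts_anbn(string):
    """Recognize a^n b^n (n >= 1) using an explicit push-down stack."""
    stack = []
    seen_b = False
    for symbol in string:
        if symbol == "a":
            if seen_b:          # an a after a b breaks the a...ab...b shape
                return False
            stack.append(symbol)
        elif symbol == "b":
            seen_b = True
            if not stack:       # more b's than a's
                return False
            stack.pop()
        else:
            return False        # symbol outside the alphabet {a, b}
    # Accept only if at least one b was read and every a was matched.
    return seen_b and not stack


for candidate in ["ab", "aabb", "aaabbb", "aab", "abab", "ba"]:
    print(candidate, accepts_anbn(candidate))
```

The stack depth grows with n, which is the unbounded auxiliary memory that distinguishes a PDA from an FSA.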

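[Figure I: nested rings of the Chomsky hierarchy, from innermost to outermost: 1, finite languages; 2, regular languages (containing the Bengalese finch song transition network, with states 0 to 3 and edges labelled ab, cde and fg); 3, context-free languages (example $a^n b^n$); 4, mildly context-sensitive languages (example $a^n b^n c^n d^n$); 5, context-sensitive languages (example $a^n b^n c^n d^n e^n$); 6, recursively enumerable languages. Two irregular regions mark human languages and birdsong (the latter with a question mark). Example sentences with indexed dependencies: 'the starling the cats want was tired' and 'Jon Mary Peter Jane lets help teach swim'.]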
Figure I. The Chomsky hierarchy of languages along with the hypothesized locations of both human language and birdsong. The nested rings in the figure correspond
to the increasingly larger sets, or languages, generated or recognized by more powerful automata or grammars. An example of the state transition diagram
corresponding to a typical Bengalese finch song [31] is shown in the next ring after this, corresponding to some subset of the regular languages.
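The transition network in Figure I can be read as a small stochastic automaton. The sketch below is a hypothetical reconstruction: the state numbering and edge labels are inferred from the verbal description of the figure in the main text, and the transition probabilities are invented for illustration rather than taken from the measured song data [31].

```python
import random

# Assumed reconstruction: state -> [(note chunk, next state, probability)].
# The probabilities are invented placeholders, not measured values.
TRANSITIONS = {
    0: [("ab", 1, 1.0)],
    1: [("cde", 2, 1.0)],
    2: [("ab", 1, 0.6), ("fg", 3, 0.4)],
    3: [("ab", 1, 0.3)],  # with the remaining probability 0.7 the song ends
}
FINAL_STATES = {3}  # the double-circled end state


def sing(rng):
    """Generate one song (a list of note chunks) by a random walk."""
    state, song = 0, []
    while True:
        draw, cumulative = rng.random(), 0.0
        for chunk, next_state, probability in TRANSITIONS[state]:
            cumulative += probability
            if draw < cumulative:
                song.append(chunk)
                state = next_state
                break
        else:
            # No outgoing edge was drawn; this can only happen in state 3.
            return song


def accepts(song):
    """Check that a chunk sequence traces a path from state 0 to an end state."""
    state = 0
    for chunk in song:
        for label, next_state, _ in TRANSITIONS[state]:
            if label == chunk:
                state = next_state
                break
        else:
            return False
    return state in FINAL_STATES


print(" ".join(sing(random.Random(0))))
```

Under these assumptions every generated song consists of ab followed by some number of cde ab repetitions and a closing cde fg, optionally continuing with further rounds of the same shape.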
bigram description. Although there are several paths through this
network from the beginning state on the left to the double-circled end
state on the right, the loop back from state 2 to state 1 along with the
loop from state 3 to 1 can generate songs with an arbitrary number of
cde ab notes, followed by the notes cde fg. From there, a song can
continue with the notes ab back to state 1, and so lead to another
arbitrary number of cde ab notes, all finally ending in cde fg. In fact,
the transitions between states are stochastic; that is, the finch can
vary its song by choosing to go from state 2 back to state 1 with some
likelihood that is measurably different from the likelihood of
continuing on to state 3. In any case, formally this means that the
notes cde fg can appear in the middle of a song, arbitrarily far from
either end, bracketed on both sides by an arbitrarily large number of
cde ab repetitions. Such a note pattern is no longer strictly locally
testable because now there can be no fixed-length window that can check
whether a note sequence 'passes'. Instead of checking the note sequences
directly, one must use the memory of the FSA indirectly to 'wait' after
encountering the first of a possibly arbitrarily long sequence of cde ab
repetitions. The automaton must then stay in this state until the
required cde fg sequence appears. Such a language pattern remains
recognizable by a restricted FSA, but one more powerful than a simple
bigram-checking machine. Such complexity seems typical. Figure 3
displays more fully a second, more complex Bengalese finch song drawn
according to the same transition network methodology, this time
explicitly showing the probability that one state follows another via
the numbers on the links between
Box 2. Is recursion for the birds?
Recursive constructs occur in many familiar human language examples,
such as 'the starling the cats want was tired', where one finds a full
sentence phrase, 'the starling was tired', that contains within it a
second, nested or 'self-embedded' sentence, S, 'the cats want'. In this
case, the rule that constructs Sentences can apply to its own output,
recursively generating a pattern of nested or 'serial' dependencies.
We can write a simple CFG with three rules that illustrates this
concept as follows: $S \to aB$; $B \to Sb$; $S \to \varepsilon$, where
$\varepsilon$ corresponds to the empty symbol. We can use this grammar
to show that one can first apply the rule that expands S as aB and then
can apply the second rule to expand B as Sb, thus obtaining aSb; S now
appears with non-null elements on both sides, so we say that S has been
'self-embedded'. If we now use the third rule to replace S with the
empty symbol, we obtain the output ab. Alternatively, we could apply the
first and second rules over again to obtain the string aabb, or, more
generally, $a^n b^n$ for any integer n.
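The derivation just described can be traced mechanically. The sketch below applies the first two rules n times and then the third rule once, recording each intermediate form; it assumes single-character symbols and represents the empty symbol by the empty string. The function name is introduced here for illustration only.

```python
def derive(n):
    """Derive a^n b^n from S using S -> aB, B -> Sb and S -> (empty)."""
    form = "S"
    steps = [form]
    for _ in range(n):
        form = form.replace("S", "aB", 1)  # rule 1: S -> aB
        steps.append(form)
        form = form.replace("B", "Sb", 1)  # rule 2: B -> Sb (S is self-embedded)
        steps.append(form)
    form = form.replace("S", "", 1)        # rule 3: S -> empty symbol
    steps.append(form)
    return steps


print(" => ".join(step or "(empty)" for step in derive(2)))
```

Each pass through rules 1 and 2 wraps the remaining S in one more matched a...b pair, which is the self-embedding step of the grammar.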
In our example, the a's and the b's in fact form nested dependencies
because they are correspondingly paired in the same way that 'the
starling' must be paired with the singular form 'was', rather than the
plural 'were'; similarly, 'the cats' must be paired with 'want' rather
than the singular form 'wants'. So, for example, to indicate a nested
dependency pattern properly, the form $a^3 b^3$ should be more
accurately written as $a_1 a_2 a_3 b_3 b_2 b_1$, where the indices
indicate which a's and b's must be paired up. Thus, any method to
detect whether an animal can either recognize or produce a strictly
context-free pattern requires that one demonstrates that the correct
a's and b's are paired up; merely recognizing that the number of a's
matches the number of b's does not suffice. This is one key difficulty
with the Gentner et al. protocol and result [56], which probed only for
the ability of starlings to recognize an equal number of a's and b's in
terms of warble and rattle sound classes (i.e.
$\text{warble}^3\,\text{rattle}^3$ patterns) but did not test for
whether these warble-rattles were properly paired off in a nested
dependency fashion. As a result, considerable controversy remains as to
whether any non-human species can truly recognize strictly context-free
patterns [11,67].
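The difference between counting and pairing can be made concrete with a
small Python sketch. It is illustrative only: the representation of a
string as (symbol, index) pairs and the function names counts_match and
properly_nested are assumptions introduced here, not part of any
experimental protocol. The first check looks only at the numbers of a's
and b's; the second uses a stack to verify that each b closes the most
recently opened a.

```python
def counts_match(tokens):
    """True if the symbols form a^n b^n, ignoring which a pairs with which b."""
    symbols = [symbol for symbol, _ in tokens]
    n = symbols.count("a")
    return symbols == ["a"] * n + ["b"] * n


def properly_nested(tokens):
    """True if every b is paired with the most recent unmatched a."""
    stack = []
    for symbol, index in tokens:
        if symbol == "a":
            stack.append(index)
        elif symbol != "b" or not stack or stack.pop() != index:
            return False
    return not stack


# a1 a2 a3 b3 b2 b1: nested pairing
nested = [("a", 1), ("a", 2), ("a", 3), ("b", 3), ("b", 2), ("b", 1)]
# a1 a2 a3 b1 b2 b3: same counts, but the pairing is crossed
crossed = [("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 2), ("b", 3)]

for name, tokens in [("nested", nested), ("crossed", crossed)]:
    print(name, counts_match(tokens), properly_nested(tokens))
```

Both strings pass the counting check, but only the first passes the
pairing check, so a test that probes counting alone cannot distinguish
them.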
Box 3. Descriptive complexity, birdsong and human syntax
The substructure of the regular languages, sub-regular language
hierarchies, could be relevant to gain insight into the computational
capacities of animals and humans in the domain of acoustic and
artificial language learning [62,68,69]. Similar to the Chomsky
hierarchy, the family of regular languages can itself be ordered in
terms of strictly inclusive sets of increasing complexity [69]. The
ordering uses the notion of descriptive complexity, corresponding
informally to how much local context and internal state information
must be used by a finite-state machine to recognize a particular string
pattern correctly. For example, to recognize the regular pattern used
in the starling experiment [56], $(ab)^+$, a finite-state machine needs
only to check four adjacency relations or bigrams as they appear
directly in a candidate string: the beginning of the string followed by
an a; an a followed by a b; a b followed by an a; or else a b followed
by the end of the string. We can say such a pattern is strictly locally
2-testable or $SL_2$ [69]. As we increase the length of these factors,
we obtain a strictly increasing set hierarchy of regular languages, the
strictly locally testable languages, denoted $SL_k$, where k is the
window length [56,62,68]. It might be of some value to understand the range
of sub-regular patterns that birds can perceive or produce. To
tentatively answer this question, we applied a program for computing
local testability [38,44,70]. For example, the FSA in Figure I (Box 1)
recognizes a language that is locally testable. This answer agrees with
the independent findings of Okanoya [31] and Gentner [26,57].
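The four-bigram check described above for the (ab)+ pattern can be
written out directly. This is a minimal Python sketch rather than the
program for computing local testability mentioned in the text; the
boundary markers "<" and ">" and the name is_sl2 are assumptions made
for illustration.

```python
# Permitted bigrams: start-a, a-b, b-a, b-end.
# "<" and ">" are assumed markers for the beginning and end of the string.
ALLOWED_BIGRAMS = {("<", "a"), ("a", "b"), ("b", "a"), ("b", ">")}


def is_sl2(string, allowed=ALLOWED_BIGRAMS):
    """Accept a string if and only if every adjacent pair is permitted."""
    padded = ["<"] + list(string) + [">"]
    return all(pair in allowed for pair in zip(padded, padded[1:]))


for candidate in ["ab", "abab", "aabb", "aba", ""]:
    print(repr(candidate), is_sl2(candidate))
```

The recognizer never counts and never remembers more than one symbol of
context, which is what places this pattern at the bottom of the
strictly local hierarchy.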
Other sub-regular pattern families have been recently explored in
connection with human language sound systems [20,71]. Some of these
might ultimately prove relevant to birdsong because they deal with
acoustic patterns. In particular, possible sound combinations might fall
into the same classes as those of human languages. Finally, all these
sub-regular families could be extended straightforwardly to include
phrases explicitly, but still without the ability to count, as seems true
of human language ([66,72–74]; R. Berwick, PhD Thesis, MIT, 1982). It is
clear that we have only just begun to scratch the surface of the detailed
structure of sub-regular patterns and their cognitive relevance.

[Figure 3: state-transition diagram. Edge labels: aaa, b, bcadb, hh,
ilga, adb, eekfff, bhh, f, jaa, lga; edge probabilities: 0.08, 0.12,
0.37, 0.33, 0.88, 0.22, 0.44, 0.55.]
Figure 3. Probabilistic finite-state transition diagram of the song repertoire of a
Bengalese finch. Directed transition links between states are labelled with note
sequences along with the probability of moving along that particular link and
producing the associated note sequence. The possibility of loops on either side of
fixed note sequences such as hh or lga means that this song is not strictly locally
testable (see Box 3 and main text). However, it is still k-reversible, and so easily
learned from example songs [35]. Adapted, with permission, from [75].
Review Trends in Cognitive Sciences March 2011, Vol. 15, No. 3
117
states [32]. It too contains loops, including one from the final,
double-circled state back to the start, so that a certain song portion
can be found located arbitrarily far in the middle. For example, among
several other possibilities, the note sequence lga, which occurs on the
transition to the double-circled final state, can be preceded by any
number of b hh repetitions, as well as followed by jaa b bcadb and then
an arbitrary number of eekfff adb notes, again via a loop.
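How a looped, probabilistic transition diagram of this kind generates
songs can be sketched in a few lines of Python. The automaton below is a
hypothetical toy and does not reproduce Figure 3: the state names, the
transition table and the probabilities are invented for illustration,
and, unlike the finch automaton, it simply stops at its final state
rather than looping back to the start.

```python
import random

# Hypothetical toy automaton (NOT the automaton of Figure 3):
# state -> list of (note chunk, next state, probability).
TRANSITIONS = {
    "start": [("aaa", "middle", 1.0)],
    "middle": [("bhh", "middle", 0.5), ("lga", "final", 0.5)],
}


def sample_song(rng, start="start", final="final"):
    """Walk the automaton from start to final, collecting note chunks."""
    state, chunks = start, []
    while state != final:
        options = TRANSITIONS[state]
        weights = [probability for _, _, probability in options]
        chunk, state, _ = rng.choices(options, weights=weights)[0]
        chunks.append(chunk)
    return chunks


song = sample_song(random.Random(0))
print(" ".join(song))
```

Because of the loop on the middle state, the chunk lga can be preceded
by any number of bhh repetitions, mirroring the kind of unbounded
repetition described in the text.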
Nightingales, another species with complex songs, can sing motifs with
notes that are similarly embedded within looped note chunks [33].
Considering that there are hundreds of such motifs in a song repertoire
of a nightingale, their songs must be at least as complex as those of
Bengalese finches, at least from this formal standpoint.
More precisely and importantly, the languages involved here, at least
in the case of the Bengalese finch, and perhaps other avian species,
are closely related to constraints on regular languages that enable
them to be easily learned [31,34,35]. Kakishita et al. [29] constructed
a particular kind of restricted FSA generating the observed sequences
(a k-reversible FSA). Intuitively, in this restricted case, a learner
can determine whether two states are equivalent by examining only the
local note sequences that can follow from any two states, determining
whether the two states should be considered equivalent [36,37]
(Figure I, Box 1). It is this local property that enables a learner to
learn correctly and efficiently the proper automaton corresponding to
external song sequences simply by listening to them, something that is
impossible in general for FSA [38,39].
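The idea of learning an automaton from example songs by merging states
can be illustrated with a short Python sketch. It covers only the
simplest case, k = 0 (zero-reversible languages), in the spirit of the
reversible-language inference cited as [38]; it is not the procedure
used by Kakishita et al. The function names, the union-find bookkeeping
and the toy samples are assumptions introduced here, and the sample set
is assumed to be non-empty.

```python
def infer_zero_reversible(samples):
    """Merge states of a prefix tree until it is deterministic both
    forwards and backwards, with a single accepting state."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx == ry:
            return False
        parent[ry] = rx
        return True

    # Prefix tree acceptor: one state per prefix of a sample.
    edges, finals = set(), []
    for sample in samples:
        s = tuple(sample)
        for i in range(len(s) + 1):
            parent.setdefault(s[:i], s[:i])
        for i in range(len(s)):
            edges.add((s[:i], s[i], s[:i + 1]))
        finals.append(s)

    for f in finals[1:]:            # a single accepting state
        union(finals[0], f)

    changed = True
    while changed:                  # repeat until no merge is needed
        changed = False
        forward, backward = {}, {}
        for src, sym, dst in edges:
            key = (find(src), sym)  # same state, same symbol: one target
            if key in forward:
                changed |= union(forward[key], dst)
            else:
                forward[key] = dst
            key = (find(dst), sym)  # same target, same symbol: one source
            if key in backward:
                changed |= union(backward[key], src)
            else:
                backward[key] = src

    delta = {(find(src), sym): find(dst) for src, sym, dst in edges}
    return find(()), find(finals[0]), delta


def accepts(automaton, string):
    start, accept, delta = automaton
    state = start
    for sym in string:
        state = delta.get((state, sym))
        if state is None:
            return False
    return state == accept


# Toy samples: each character stands for one note.
learned = infer_zero_reversible(["c", "abc", "ababc"])
print(accepts(learned, "abababc"), accepts(learned, "ac"))
```

From three positive examples the learner generalizes to any number of
ab repetitions followed by c, using only local comparisons of what can
follow or precede a state.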
What about human language sound structure or its phonology? This is
also now known to be describable purely in terms of FSA [40], a result
that was not anticipated by earlier work in the field [41] which
assumed more general computational devices well beyond the power of FSA
(Box 1). For example, there are familiar 'phonotactic' constraints in
every language, such that English speakers know that a form such as
'ptak' could not be a possible English word, but 'plast' might be [42].
To be sure, such constraints are often not all or none but might depend
on the statistical frequency of word subparts. Such gradation might
also be present in birdsong, as reflected by the probabilistic
transitions between states, as shown in Figure I (Box 1) and Figure 3
[31,43]. Once stochastic gradation is modelled, phonotactic constraints
more closely mirror those found in birdsong finite-state descriptions.
Such formal findings have been buttressed by recent experiments with
both human infants and Bengalese finches, confirming that adjacent
acoustic dependencies of this sort are readily learnable from an early
age using statistical and prosodic cues [32,44–46].
However, other human sound structure rules apparently go beyond this
simplest kind of strictly local description, although remaining finite
state. These include the rules that account for 'vowel harmony' in
languages such as Turkish, where, for example, the properties of the
vowel u in the word 'pul', 'stamp', are propagated through to all its
endings [7], and stress patterns (J. Heinz, PhD thesis, University of
California at Los Angeles, 2007). Whereas the limited-depth hierarchies
that arise in songbird syntax seem reminiscent of the bounded rhythmic
structures or beat patterns found in human speech or music, it remains
an open question whether birdsong metrical structure is amenable to the
formal analysis of musical meter, or even how stress is perceived in
birds as opposed to humans [47–49] (Box 4).
Tweets to phrases: the role of words
Turning to syntactic description that lies beyond sound structure, we
find that birdsong and human language syntax sharply diverge. In human
syntax, but not birdsong, hierarchical combinations of effectively
arbitrary depth can be assembled by combining words and word parts,
such as the addition of 's' to the end of 'apple' to yield 'apples', a
word-construction process called morphology. Human syntax then goes
even further, organizing words into higher-order phrases and entire
sentences. None of these additional levels appears to be found in
birdsong. This reinforces Marler's long-standing view [28] that
birdsong might best be regarded as 'phonological syntax', a formal
language; that is, a set of units (here acoustic elements) that are
arranged in particular ways but not others according to a definable
rule set.
What accounts for this difference between birdsong and language? First,
birdsong lacks semantics and words in the human sense, because song
elements are not combined to yield novel 'meanings'. Instead, birdsong
can convey only a limited set of intentions, as a graded, holistic
communication system to attract mates or deter rivals and defend
territory. In terms of the tripartite diagram of Figure 2, the
conceptual–intentional component is greatly reduced. Birds might still
have some internalized conceptual–intentional system, but for whatever
reason it is not connected to a syntactic and externalization
component. By contrast, human syntax is intimately wedded to our
conceptual system, involving words in both their syntactic and semantic
aspects, so that, for example, combining 'red' with 'apples' yields a
meaning quite distinct from, for example, 'green apples'. It seems
plausible that this single distinction drives fundamental differences
between birdsong and human syntax. In particular, birds such as
Bengalese finches and nightingales can and do vary their songs in the
acoustic domain, rearranging existing chunks to produce hundreds of
distinct song types that might serve to identify individual birds and
their degree of sexual arousal, as well as local dialect-based congener
groups [50–52], although a recent systematic study of song
recombination suggests that birds rarely introduce improvised song
notes or sequences [32]. For example, skylarks mark individual identity
by particular song notes [51], as starlings do with song sequences
[52]; and canaries use special 'sexy syllables' to strengthen the
effect of mate attraction [50]. However, more importantly, this bounded
acoustic creativity pales in comparison with the seemingly limitless
open-ended variation observed in even a single human speaker, where
variation might be found not only at the acoustic level in how a word
is spoken, but also in how words are combined into larger structures
with distinct meanings, what could be called 'compositional
creativity'. It is this latter aspect that appears absent in birdsong.
Song variants do not result in distinct 'meanings' with completely new
semantics, but serve only to modify the entirety of the
original behavioural function of the song within the context of mating,
never producing a new behavioural context, and so remaining part of a
graded communication system. For example, the 'sexy' syllable conveys
the strength of the motivation of a canary, but does not change the
meaning of its song [50]. In this sense, birdsong creativity lies along
a finite, acoustic combinational dimension, never at the level of human
compositional creativity.
Second, unlike birdsong, human language sentences are potentially
unbounded in length and structure, limited only by extraneous factors,
such as short-term memory or lung capacity [53]. Here too words are
important. The combination of the Verb 'ate' and the Noun 'apples'
yields the combination 'ate apples', which has the properties of a Verb
rather than a Noun. This effectively 'names' the combination as a new
piece of hierarchical structure, a phrase, with the label 'ate', dubbed
the head of the phrase [54]. This new Verb-like combination can then
act as a single object and enter into further syntactic combinations.
For example, 'Allison' and 'ate apples' can combine to form 'Allison
ate apples', again taking 'ate' as the head. Phrases can recombine ad
infinitum to form ever-longer sentences, so exhibiting the open-ended
novelty that von Humboldt famously called the 'infinite use of finite
means' [55], which is immediately recognized as the hallmark of human
language: 'Pat recalled that Charlie said that Moira thought that
Allison ate apples'. Thus in general, sentences can be embedded within
other sentences, recursively, as in 'the starling the cats want was
tired', in a nested dependency pattern, where we find one top-level
sentence, 'the starling was tired', consisting of a Subject, 'the
starling', and a Predicate phrase, 'was tired', that in turn itself
contains a Sentence, 'the cats want', formed out of another Subject,
'the cats', and a Predicate, 'want'. Informally, we call such
embeddings 'recursive', and the resulting languages context-free
languages (CFLs; Box 1). This pattern reveals a characteristic
possibility for human language, a 'nested dependency'. The singular
number feature associated with the Subject, 'the starling', must match
up with the singular number feature associated with the top-level Verb
form 'was', whereas the embedded sentence, 'the cats want', has a
plural Subject, 'the cats', that must agree with the plural Verb form
'want'. Such 'serial nested dependencies' in the abstract form
$a_1 a_2 b_2 b_1$ are both produced and recognized quite generally in
human language [53].
The evidence for a corresponding ability in birds remains weak, despite
recent experiments on training starlings to recognize such patterns
(which must be carefully distinguished from the ability to produce such
sequences in a naturalistic setting, as described in the previous
section) [56,57]. In starlings, only the ability to recognize nesting
was tested, and not the crucial dependency aspect that pairs up
particular a's with particular b's [11] (Box 2). In fact, human syntax
goes beyond this kind of recursion to encompass certain strictly mildly
context-sensitive constructions that have even more complex,
overlapping dependency patterns (Box 1). Importantly, even though they
differ on much else, since approximately 1970 a broad range of
syntactic theories, comprising most of the major strands of modern
linguistic thought, have incorporated Bloomfield's [54] central insight
that human language syntax is combinatorially word-centric in the
manner described above [13–16,58,59], as well as having the power to
describe both nested and overlapped dependencies. To our knowledge,
such mild context sensitivity has never been demonstrated, or even
tested, in any non-human species.
In short, word-driven structure building seems totally
absent in songbird syntax, and this limits its potential
hierarchical complexity. Birdsong motifs lack word-centric
heads and so cannot be individuated via some internal
labelling mechanism to participate in the construction of
arbitrary-depth structures. Whereas a starling song might
consist of a sequence of warbles and rattle motif classes
[57], there seems to be no corresponding way in which the
acoustical features of the warble class are then used to
name distinctively the warble-rattle sequence as a whole,
so that this combination can then be manipulated as single
unit phrases into ever-more complex syntactic structures.
Birdsong phrase structure?
Nonetheless, recent findings suggest that birds have a limited ability
to construct phrases, at least in the acoustic domain, as noted above,
accounting for individual variation within species [32,33]. In
particular, there might be acoustic segmentation 'chunking' in the
self-produced song of the Bengalese finch [29,31]. Suge and Okanoya
used the 'click' protocol pioneered by Fodor et al. [60] to probe the
psychological reality of syntactic phrases in humans [34].
Box 4. Questions for future research
We do not know for certain the descriptive complexity of birdsong.
Does it belong to any particular member of the sub-regular
language hierarchies, or does it lie outside these, possibly in the
family of strictly CFLs? If birdsong is contained in some sub-regular
hierarchy, how is this result to be reconciled with the findings in the
Gentner et al. starling study [56]? If birdsong is context free, then
we can again ask to what family of CFLs it belongs: is it a
deterministic CFL (as opposed to a general CFL)? Is it learnable from
positive examples?
Current tests of finite-state versus CFL abilities in birdsong have
chosen only the weakest (computationally and descriptively
simplest) finite-state language to compare against the simplest
CFL. Can starlings be trained to recognize descriptively more
complex finite-state patterns; for example, a locally testable but
not strictly locally testable finite-state pattern, such as
$a^+(ba^+)^+$, where a bird would have to recognize a note(s) such as b
arbitrarily far from both ends of a song [68]? What about sub-regular
patterns that are more complicated than this?
The Gentner et al. experiment [56] did not test for the nested
dependency structure characteristic of embedded sentences in
human language. Can birds be trained to recognize truly nested
dependencies, even if just of finite depth?
Using the methods developed in, for example, [71], what is the
descriptive complexity of prosody or rhythmic stress patterns in
birdsong?
What are the neural mechanisms underlying variable song
sequences in songbirds? Both human speech and birdsong involve
sequentially arranged vocalizations. Are there similar neural
mechanisms for the production and perception of such sequences
in songbirds and humans? Bolhuis et al. [9] have summarized
current knowledge of these mechanisms in humans and birds.
Applied to human language, subjects given click stimuli in the middle
of phrases such as 'ate the apples' tend to migrate their perception of
where the click occurs to the beginning or end of the phrase. Suge and
Okanoya established that 3–4 note sequences, such as the cde in
Figure I (Box 1), are perceived as unitary chunks so that the finches
tended to respond as if the click was at the c or e end of a cde chunk
[34]. Importantly, recall that Bengalese finches are also able to
produce such sequence chunks, as described earlier and in Figure I
(Box 1) and Figure 3. This is strikingly similar to the human syntactic
capacity to remember an entire sequence encapsulated as a single phrase
or a state of an automaton, and to reuse that encapsulation elsewhere,
just as human syntax reuses Noun Phrases and Verb Phrases. However,
Bengalese finches do not seem to be able to manipulate chunks with the
full flexibility of dependent nesting found in human syntax. One might
speculate that, with the addition of words, humans acquired the ability
to label and hold in memory in separate locations distinct phrases such
as 'Allison ate apples' and 'Moira thought otherwise', parallel to the
ability to label and separately store in memory the words 'ate' and
'thought'. Once words infiltrated the basic pre-existing syntactic
machinery, the combinatory possibilities became open ended.
Conclusions and perspectives
Despite considerable linguistic interest in birdsong, few
studies have applied formal syntactic methods to its struc-
ture. Those that do exist suggest that birdsong syntax lies
well beyond the power of bigram descriptions, but is at
most only as powerful as k-reversible regular languages,
lacking the nested dependencies that are characteristic of
human syntax [11,29,56,57]. This is probably because of
the lack of semantics in birdsong, because song sequence
changes typically alter message strength but not message
type. This would imply that birdsong might best serve as
an animal model to study learning and neural control of
human speech [9], rather than internal syntax or seman-
tics per se. Furthermore, comparing the structure of hu-
man speech and birdsong can be a useful tool for the study
of the evolution of brain and behaviour (Box 4). Bolhuis
et al. [9] have argued that, in the evolution of vocal
learning, both common descent (homologous brain
regions) and evolutionary convergence (distant taxa exhi-
biting functionally similar auditory-vocal learning) have
a role.
References
1 Darwin, C. (1882) The Descent of Man and Selection in Relation to Sex,
Murray
2 Margoliash, D. and Nusbaum, H.C. (2009) Language: the perspective
from organismal biology. Trends Cogn. Sci. 13, 505–510
3 Hauser, M.D. et al. (2002) The faculty of language: what is it, who has it,
and how did it evolve? Science 298, 1569–1579
4 Bolhuis, J.J. and Wynne, C.D.L. (2009) Can evolution explain how
minds work? Nature 458, 832–833
5 Doupe, A.J. and Kuhl, P.K. (1999) Birdsong and human speech:
common themes and mechanisms. Annu. Rev. Neurosci. 22, 567–631
6 Bolhuis, J.J. and Gahr, M. (2006) Neural mechanisms of birdsong
memory. Nature Rev. Neurosci. 7, 347–357
7 Yip, M. (2006) The search for phonology in other species. Trends Cogn.
Sci. 10, 442–446
8 Okanoya, K. (2007) Language evolution and an emergent property.
Curr. Op. Neurobiol. 17, 271–276
9 Bolhuis, J.J. et al. (2010) Twitter evolution: converging mechanisms in
birdsong and human speech. Nature Rev. Neurosci. 11, 747–759
10 Chomsky, N. (1966) Cartesian Linguistics, Harper & Row
11 Corballis, M.C. (2007) Recursion, language, and starlings. Cogn. Sci.
31, 697–704
12 Aristotle (1970) Historia Animalium. v.II, Harvard University Press
13 Steedman, M. (2001) The Syntactic Process, MIT Press
14 Kaplan, R. and Bresnan, J. (1982) Lexical-functional grammar: a
formal system for grammatical relations. In The Mental
Representation of Grammatical Relations (Bresnan, J., ed.), pp. 173–
281, Cambridge, MA, MIT Press
15 Gazdar, G. et al. (1985) Generalized Phrase-structure Grammar,
Harvard University Press
16 Pollard, C. and Sag, I. (1994) Head-driven Phrase Structure Grammar,
University of Chicago Press
17 Culicover, P. and Jackendoff, R. (2005) Simpler Syntax, Oxford
University Press
18 Goldberg, A. (2006) Constructions at Work: The Nature of
Generalization in Language, Oxford University Press
19 Marr, D. (1982) Vision, W.H. Freeman & Co
20 Rogers, J. et al. (2010) On languages piecewise testable in the strict
sense. In Proceedings of the 11th Meeting of the Mathematics of
Language Association (eds), pp. 255–265, Springer-Verlag
21 Okanoya, K. (2004) The Bengalese finch: a window on the behavioral
neurobiology of birdsong syntax. Ann. NY Acad. Sci. 1016, 724–735
22 Sasahara, K. and Ikegami, T. (2007) Evolution of birdsong syntax by
interjection communication. Artif. Life 13, 259–277
23 Catchpole, C.K. and Slater, P.J.B. (2008) Bird Song: Biological Themes
and Variations, (2nd edn), Cambridge University Press
24 Wohlgemuth, M.J. et al. (2010) Linked control of syllable sequence and
phonology in birdsong. J. Neurosci. 29, 12936–12949
25 Todt, D. and Hultsch, H. (1996) Acquisition and performance of
repertoires: ways of coping with diversity and versatility. In Ecology
and Evolution of Communication (Kroodsma, D.E. and Miller, E.H.,
eds), pp. 79–96, Cornell University Press
26 Gentner, T. and Hulse, S. (1998) Perceptual mechanisms for individual
vocal recognition in European starlings, Sturnus vulgaris. Anim.
Behav. 56, 579–594
27 Dobson, C.W. and Lemon, R.E. (1979) Markov sequences in songs of
American thrushes. Behaviour 68, 86–105
28 Marler, P. (1977) The structure of animal communication sounds. In
Recognition of Complex Acoustic Signals: Report of the Dahlem
Workshop on Recognition of Complex Acoustic Signals, Berlin
(Bullock, T.H., ed.), pp. 17–35, Abakon-Verlagsgesellschaft
29 Kakishita, Y. et al. (2009) Ethological data mining: an automata-based
approach to extract behavioural units and rules. Data Min. Knowl.
Disc. 18, 446–471
30 Hilliard, A.T. and White, S.A. (2009) Possible precursors of syntactic
components in other species. In Biological Foundations and Origin of
Syntax (Bickerton, D. and Szathmáry, E., eds), pp. 161–184, MIT Press
31 Okanoya, K. (2004) Song syntax in Bengalese finches: proximate and
ultimate analyses. Adv. Stud. Behav. 34, 297–345
32 Takahasi, M. et al. (2010) Statistical and prosodic cues for song
segmentation learning by Bengalese finches (Lonchura striata var.
domestica). Ethology 116, 481–489
33 Todt, D. and Hultsch, H. (1998) How songbirds deal with large amount
of serial information: retrieval rules suggest a hierarchical song
memory. Biol. Cybern. 79, 487–500
34 Suge, R. and Okanoya, K. (2010) Perceptual chunking in the self-
produced songs of Bengalese finches (Lonchura striata var.
domestica). Anim. Cog. 13, 515–523
35 Kakishita, Y. et al. (2007) Pattern extraction improves automata-based
syntax analysis in songbirds. ACAL 2007. Lect. Notes in Artif. Intell.
828, 321–333
36 Kobayashi, S. and Yokomori, T. (1994) Learning concatenations of
locally testable languages from positive data. Algorithmic Learning
Theory, Lect. Notes in Comput. Sci. 872, 407–422
37 Kobayashi, S. and Yokomori, T. (1997) Learning approximately regular
languages with reversible languages. Theor. Comput. Sci. 174, 251–257
38 Angluin, D. (1982) Inference of reversible languages. J. ACM 29, 741–
765
39 Berwick, R. and Pilato, S. (1987) Learning syntax by automata
induction. J. Mach. Learning 3, 9–38
40 Johnson, C.D. (1972) Formal Aspects of Phonological Description,
Mouton
41 Chomsky, N. and Halle, M. (1968) The Sound Pattern of English,
Harper & Row
42 Halle, M. (1978) Knowledge unlearned and untaught: what
speakers know about the sounds of their language. In Linguistic
Theory and Psychological Reality (Halle, M. et al., eds), pp. 294–
303, MIT Press
43 Pierrehumbert, J. and Nair, R. (1995) Word games and syllable
structure. Lang. Speech 38, 78–116
44 Kuhl, P. (2008) Early language acquisition: cracking the speech code.
Nat. Rev. Neurosci. 5, 831–843
45 Newport, E. and Aslin, R. (2004) Learning at a distance. I. Statistical
learning of non-adjacent regularities. Cog. Sci. 48, 127–162
46 Gervain, J. and Mehler, J. (2010) Speech perception and
language acquisition in the first year of life. Ann. Rev. Psychol. 61,
191–218
47 Halle, M. and Vergnaud, J-R. (1990) An Essay on Stress, MIT Press
48 Lerdahl, F. and Jackendoff, R. (1983) A Generative Theory of Tonal
Music, MIT Press
49 Fabb, N. and Halle, M. (2008) A New Theory of Meter in Poetry,
Cambridge University Press
50 Kreutzer, M. et al. (1999) Social stimulation modulates the use of the 'A'
phrase in male canary songs. Behaviour 136, 1325–1334
51 Briefer, E. et al. (2009) Response to displaced neighbours in a
territorial songbird with a large repertoire. Naturwissenschaften
96, 1067–1077
52 Knudsen, D.P. and Gentner, T.Q. (2010) Mechanisms of song
perception in oscine birds. Brain Lang. 115, 59–68
53 Chomsky, N. and Miller, G. (1963) Finitary models of language users.
In Handbook of Mathematical Psychology (Luce, R. et al., eds), pp. 419–
491, Wiley
54 Bloomfield, L. (1933) Language, Henry Holt
55 von Humboldt, W. (1836) Über die Verschiedenheit des menschlichen
Sprachbaues und ihren Einfluss auf die geistige Entwickelung des
Menschengeschlechts, Ferdinand Dümmler
56 Gentner, T.Q. et al. (2006) Recursive syntactic pattern learning by
songbirds. Nature 440, 1204–1207
57 Gentner, T. (2007) Mechanisms of auditory pattern recognition in
songbirds. Lang. Learn. Devel. 3, 157–178
58 Chomsky, N. (1970) Remarks on nominalization. In Readings in
English Transformational Grammar (Jacobs, R.A.P. and
Rosenbaum, P., eds), pp. 184–221, Ginn
59 Joshi, A. et al. (1991) The convergence of mildly context-sensitive
grammar formalisms. In Foundational Issues in Natural Language
Processing (Sells, P. et al., eds), pp. 31–82, MIT Press
60 Fodor, J. et al. (1965) The psychological reality of linguistic segments.
J. Verb. Learn. Verb. Behav. 4, 414–420
61 Chomsky, N. (1956) Three models for the description of language. IRE
Trans. Info. Theory 2, 113–124
62 Rogers, J. and Hauser, M. (2010) The use of formal language theory in
studies of artificial language learning: a proposal for distinguishing the
differences between human and nonhuman animal learners. In
Recursion and Human Language (van der Hulst, H., ed.), pp. 213–
232, De Gruyter Mouton
63 Huybregts, M.A.C. (1984) The weak adequacy of context-free phrase
structure grammar. In Van Periferie Naar Kern (de Haan, G.J. et al.,
eds), pp. 81–99, Foris
64 Shieber, S. (1985) Evidence against the context-freeness of natural
language. Ling. Philos. 8, 333–343
65 Kudlek, M. et al. (2003) Contexts and the concept of mild context-
sensitivity. Ling. Phil. 26, 703–725
66 Berwick, R. and Weinberg, A. (1984) The Grammatical Basis of
Linguistic Performance, MIT Press
67 van Heijningen, C.A.A. et al. (2009) Simple rules can explain
discrimination of putative recursive syntactic structures by a
songbird species. Proc. Natl. Acad. Sci. U.S.A. 106, 20538–20543
68 Rogers, J. and Pullum, G. Aural pattern recognition experiments and
the subregular hierarchy. J. Logic, Lang. & Info (in press)
69 McNaughton, R. and Papert, S. (1971) Counter-free Automata, MIT Press
70 Trahtman, A. (2004) Reducing the time complexity of testing for local
threshold testability. Theor. Comp. Sci. 328, 151–160
71 Heinz, J. (2009) On the role of locality in learning stress patterns.
Phonology 26, 305–351
72 Crespi-Reghizzi, S. (1978) Non-counting context-free languages. J.
ACM 4, 571–580
73 Crespi-Reghizzi, S. (1971) Reduction of enumeration in grammar
acquisition. In Proceedings of the 2nd International Joint Conference
on Artificial Intelligence (Cooper, D.C., ed.), pp. 546–552, William
Kaufman
74 Crespi-Reghizzi, S. and Braitenberg, V. (2003) Towards a brain
compatible theory of language based on local testability. In
Grammars and Automata for String Processing: from Mathematics
and Computer Science (Martin-Vide, C. and Mitrana, V., eds), pp. 17–
32, Gordon & Breach
75 Hosino, T. and Okanoya, K. (2000) Lesion of a higher-order song
nucleus disrupts phrase level complexity in Bengalese finches.
Neuroreport 11, 2091–2095
Representing multiple objects as an
ensemble enhances visual cognition
George A. Alvarez
Vision Sciences Laboratory, Department of Psychology, Harvard University, 33 Kirkland Street, William James Hall, Room 760,
Cambridge, MA 02138, USA
The visual system can only accurately represent a hand-
ful of objects at once. How do we cope with this severe
capacity limitation? One possibility is to use selective
attention to process only the most relevant incoming
information. A complementary strategy is to represent
sets of objects as a group or ensemble (e.g. represent the
average size of items). Recent studies have established
that the visual system computes accurate ensemble
representations across a variety of feature domains
and current research aims to determine how these representations
are computed, why they are computed and
where they are coded in the brain. Ensemble representa-
tions enhance visual cognition in many ways, making
ensemble coding a crucial mechanism for coping with
the limitations on visual processing.
Benefits of ensemble representation
Unlike artificial displays used in laboratory experiments, where there
is no reliable pattern across individual items, the real world is
highly structured and predictable [1,2]. For instance, at the object
level, the visual field often consists of collections of similar
objects: faces in a crowd, berries on a bush. At a more primitive
feature level, natural images are highly regular in terms of their
contrast and intensity distributions [3,4], color distributions [5–8],
reflectance spectra [9,10] and spatial structure [2,11–14]. Where there
is structure, there is redundancy, and where there is redundancy, there
is an opportunity to form a compressed and efficient representation of
information [15–17]. One way to capitalize on this structure and
redundancy is to represent collections of objects or features at a
higher level of description, describing distributions or sets of
objects as an ensemble rather than as individuals.
An ensemble representation is any representation that
is computed from multiple individual measurements, ei-
ther by collapsing across themor by combining themacross
space and/or time. For instance, any summary statistic
(e.g. the mean) is an ensemble representation because it
collapses across individual measurements to provide a
single description of the set. People are remarkably accu-
rate at computing averages, including the mean size
[18,19], brightness [20], orientation [18,21,22] and location
of a collection of objects [23]; the average emotion [24],
gender [24] and identity [25] of faces in a crowd; and the
average number for a set of symbolically presented num-
bers [26,27]. These are all measures of central tendency for
a collection of objects. Other statistics that describe a set,
such as variance [28], skew and kurtosis, are also ensemble
representations, although the ability to compute and
represent these statistics has been the focus of less attention
in recent research (but see [29,30] for reviews on earlier
research). Finally, the concept of ensemble representations
can be extended beyond first-order summary statistics, to
include higher-order summary statistics [31–33].
Ensemble representations have been explored under
various names in the literature, including 'global features'
[32,34,35], '(w)holistic' or 'configural features' [36–38],
'sets' [18,39] and 'statistical properties' or 'statistical
summaries' [19,40]. Each of these terms shares the notion that
multiple measurements are combined to give rise to a
higher level description. The term 'ensemble representation'
is used here as an umbrella term encompassing these
different ideas. Although there is, as yet, no unifying model
of ensemble representation across these domains, recent
research on ensemble representation is unified by a
common principle: representing multiple objects as an
ensemble enhances visual cognition.
The power of averaging
How can computing ensemble representations help over-
come the severe capacity limitations of our visual system?
The answer lies in the power of averaging: simply put, the
average of multiple noisy measurements can be much more
precise than the individual measurements themselves. For
instance, one can measure reaction time with millisecond
precision even when rounding reaction times to the nearest
100 ms (Box 1). The same principle is at play in the 'wisdom
of crowds' effect, in which people guess the weight of an ox
and the average response is closer to the correct answer
than are the individual guesses on average [41]. These
benefits arise because, when measurements are averaged,
random error in one individual measurement will tend to
cancel out uncorrelated random error in another
measurement. Thus, the benefits of averaging depend on the extent
to which the noise in individual measurements is
correlated (less correlated, more benefit) and the number of
individual measurements averaged (more measurements,
more benefit). The benefit of averaging can be formalized
mathematically, given certain assumptions regarding the
noise in the individual measurements (Figure 1).
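The two dependencies described above (number of measurements and noise correlation) can be illustrated with a minimal simulation sketch. It assumes Gaussian noise with a common standard deviation and a single pairwise correlation `rho` between the noise on any two measurements; the function names and parameter values are hypothetical and are not taken from the article.

```python
import numpy as np

def sd_of_mean(n_items, sigma=1.0, rho=0.0, n_trials=20000, seed=0):
    """Empirical SD of the average of n_items noisy measurements.

    Assumption: Gaussian noise with standard deviation sigma, and a
    pairwise correlation rho (0 <= rho <= 1) between any two measurements.
    """
    rng = np.random.default_rng(seed)
    shared = rng.standard_normal((n_trials, 1))          # noise common to all items
    private = rng.standard_normal((n_trials, n_items))   # noise unique to each item
    noise = sigma * (np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * private)
    return noise.mean(axis=1).std()

def predicted_sd(n_items, sigma=1.0, rho=0.0):
    """Theoretical SD of the mean for equally correlated Gaussian noise."""
    return sigma * np.sqrt((1.0 + (n_items - 1) * rho) / n_items)

if __name__ == "__main__":
    for n in (1, 4, 16):
        for rho in (0.0, 0.5):
            print(n, rho,
                  round(sd_of_mean(n, rho=rho), 3),
                  round(predicted_sd(n, rho=rho), 3))
```

Under these assumptions, uncorrelated noise shrinks with the square root of the number of measurements, whereas correlated noise leaves a floor that additional measurements cannot remove.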
If the human visual system is capable of averaging, then
observers should be able to judge the average size of a set
more accurately than they can judge the individuals in the
set. This is exactly what was demonstrated by Dan Ariely's
Review
Corresponding author: Alvarez, G.A. (alvarez@wjh.harvard.edu).
122 1364-6613/$ see front matter 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.tics.2011.01.003 Trends in Cognitive Sciences, March 2011, Vol. 15, No. 3
influential research on the ability of people to perceive the
mean size of a set [18], which showed that observers can
estimate with high accuracy the average size of a set of
objects, even when they appear unable to report the size of
the individual objects in the set.
This type of averaging provides a potential mechanism
for coping with the severe limitations on attentional
processing. Attention appears to be a fluid and flexible
resource: we can give full attention to a single item and
represent that item with high precision, or we can divide
our attention among many items but consequently
represent each item with lower precision [42–44]. In general,
objects outside the focus of attention are perceived with
less clarity [45], lower contrast [46] and a weaker
high-frequency response [47,48]. Presumably all objects in the
visual field are represented with varying degrees of
precision, depending on the amount of attention they receive. In
some cases, objects outside the focus of attention are so
poorly represented that it seems like we have no useful
information about them at all. However, it turns out to be
possible to combine that imprecise information to recover
an accurate measure of the group [23].
Figure 2 illustrates how attention might affect the
fidelity of ensemble representations. Inside the focus of
attention (red beams), individual items will be represented
with relatively high precision. The average of these items
will be represented with even higher precision, as expected
from the benefits of averaging. For items outside the focus
of attention, we assume that they must be attended to some
extent to be perceived at all. For instance, the results of
inattentional blindness studies have shown that without
attention, there is little or no consciously accessible
representation of visual information [49–51]. These studies
typically aim for participants to completely withdraw at-
tention from the tested items, and in some cases observers
even actively inhibit information outside of the attentional
set [51]. However, when observers know they will be asked
about information outside the focus of attention, it is
probable that they diffusely attend to those items.
Figure 2 implies a parallel system with multiple foci of
Box 1. The power of averaging
Imagine you are running an experiment with an expected effect size
of 20 ms, which is not uncommon in behavioral research (e.g.
negative priming or simple detection tasks). Do you need to worry
about the sampling rate of your keyboard? First let us consider what
would happen if we simply rounded reaction times to the nearest
100 ms. By averaging multiple samples, individual errors owing to
rounding will tend to cancel each other out, and it is possible to
obtain millisecond precision in the estimate of the mean despite
rounding. Figure Ia shows the results of a simulation with ten virtual
subjects and only 30 trials per subject. The true average of the
population is 600 ms, and subjects are normally distributed around
this mean (i.e. each subject has their own true mean, but the average
across subjects will be 600 ms). For each simulated trial, reaction
time was simulated as the subject's true mean plus 15% random
noise around their true mean. This is fairly typical of reaction time
data, but the simulation results do not depend crucially on this value.
The simulated reaction times were then rounded to the nearest
100 ms. When the true reaction times (from the simulation) are
compared to the rounded reaction times, the mean and variance of
the two data sets are nearly indistinguishable.
Now suppose your keyboard checks for a key once every 100 ms.
This would be equivalent to rounding each reaction time up to the
nearest 100 ms, which on the face of it sounds like it would add error
to the estimate of the mean and variance of each condition. Indeed, it
would lead to overestimates of the reaction time in each condition.
However, the relative difference between conditions could be
preserved. The simulation above was repeated with two conditions
in which the true mean between conditions was simulated so that
condition two was 20 ms slower than condition one on average.
Figure Ib shows the results of the simulation, in which condition two
was reliably slower than condition one for each individual subject,
and the 20 ms difference is significant at p < 0.05 using a standard
within-subject t-test. In general, whether the effect can be detected
will depend on the degree of rounding, the expected size of the
effect and the variability of the data.
For the present purpose, the important point is that, by averaging a
relatively modest number of trials, it is possible to overcome a great
deal of noise in individual estimates to obtain a precise representation
of the mean (Figure Ia) and to detect a subtle difference between two
conditions (Figure Ib).
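The logic of this box can be sketched in a few lines of Python. This is a minimal illustration, not the simulation used for Figure I: the between-subject standard deviation (50 ms) is a hypothetical value, and '15% random noise' is interpreted here as trial noise with a standard deviation equal to 15% of the subject's mean.

```python
import numpy as np

def simulate_rounding(n_subjects=10, n_trials=30, grand_mean=600.0,
                      between_sd=50.0, noise_frac=0.15, effect=20.0,
                      step=100.0, seed=1):
    """Compare true, rounded and rounded-up reaction times (all in ms)."""
    rng = np.random.default_rng(seed)
    # Each subject has their own true mean, scattered around the grand mean.
    # between_sd is an assumed value; the box does not report it.
    subj_mean = rng.normal(grand_mean, between_sd, size=(n_subjects, 1))

    def trials(mean):
        # Trial noise with SD equal to noise_frac of the subject's mean.
        return mean * (1.0 + noise_frac * rng.standard_normal((n_subjects, n_trials)))

    cond1 = trials(subj_mean)            # baseline condition
    cond2 = trials(subj_mean + effect)   # `effect` ms slower on average

    nearest = np.round(cond1 / step) * step   # rounding to the nearest step
    up1 = np.ceil(cond1 / step) * step        # keyboard polled once per step
    up2 = np.ceil(cond2 / step) * step

    return {
        "true_mean": cond1.mean(),
        "rounded_mean": nearest.mean(),
        "rounded_up_mean": up1.mean(),
        "true_diff": cond2.mean() - cond1.mean(),
        "rounded_up_diff": up2.mean() - up1.mean(),
    }

if __name__ == "__main__":
    for name, value in simulate_rounding().items():
        print(f"{name}: {value:.1f} ms")
```

In this sketch, rounding to the nearest 100 ms leaves the estimated mean close to the true mean, whereas rounding up shifts each condition mean upward by roughly half a step while leaving the difference between conditions largely intact.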
[Figure I graphic omitted. Panel (a), 'Effect of rounding': reaction time (ms) for true values and rounded values. Panel (b), 'Effect of rounding-up': reaction time (ms) for Condition 1 and Condition 2.]
Figure I. (a) The effect of rounding on estimating the mean and variance in a single condition. Error bars depict the standard deviation across subjects. (b) The effect of
rounding-up on the comparison of two conditions in which the true mean differs by 20 ms. Error bars depict the within-subject standard error of the mean.
attention, plus diffuse attention spread over items outside
the foci of attention. However, a similar result could be
modeled with a single spotlight of attention that spends
more time in some locations than others. Either way this
diffuse attention results in extremely imprecise represen-
tations of the individual items, and yet averaging even just
three imprecise measurements results in a fairly precise
representation of the ensemble. If a large enough sample of
items is averaged together, then the ensemble representa-
tion for items outside the focus of attention can be nearly as
accurate as the ensemble representation for items inside
the focus of attention.
The mechanisms of averaging
Although there is general agreement that human obser-
vers can accurately represent ensemble features, many
questions remain regarding how these ensemble repre-
sentations are computed, including: (i) Are individual
representations computed and then combined to form an
ensemble representation, or are ensemble representations
somehow computed without computing individuals? (ii) If
individual representations are computed, are they dis-
carded once the ensemble has been computed? (iii) How
many individual items are sampled and included in the
calculation of the mean? Is it just a few or could it be all of
them? (iv) Do all items contribute to the mean equally?
Are ensembles built up from representations of
individuals?
Ariely [18] proposed that the visual system performs a type
of compression, by creating an ensemble representation
and then discarding individual representations. Some
have interpreted this proposal to mean that the ensemble
representation is computed without first directly
computing individual measurements. For instance, it is possible
that there is a 'total activation map' and a 'number map'
and that mean size is computed by taking the total
activation and dividing it by the number of items [52]. However,
Ariely's use of the term 'discard' suggests that his intended
meaning was that the individual properties are computed,
combined and then discarded. This type of averaging model
has been supported by research on the computation of
mean orientation [21]. Addressing this question empirical-
ly is a challenge because it is possible to compute accurate
ensemble representations even from very imprecise indi-
vidual measurements. Consequently, a poor representa-
tion of individual items cannot be used as evidence for
mean computation without computing individuals unless
the mean can be shown to be represented more accurately
than expected based on the number and fidelity of
individual items represented.
Are individual representations discarded?
How do we explain such poor performance when observers
are required to report the properties of individual members
of a set? One possibility is that these properties are com-
puted and then discarded. An important alternative pos-
sibility is that the individual representations are not
discarded, but are simply so noisy and inaccurate that
observers cannot consistently identify individuals from
the set owing to this high level of noise. Alvarez and Oliva
found support for this possibility by modeling their results
[23], consistently finding that the accuracy of ensemble
judgments is perfectly predicted from the accuracy of
individual judgments even when individuals appear to
be judged with near chance accuracy. This alternative
possibility fits with a framework in which the
representation of an image is hierarchical, retaining information at
multiple levels of abstraction [35,53].

[Figure 2 graphic omitted. Labels: ensemble representation; individual representation; focal attention (two foci); distributed attention.]
Figure 2. Effect of attention on the fidelity of ensemble representations. Two sets
of items are depicted: one set inside the focus of attention (red beams) and one set
diffusely attended outside the focus of attention (pink region). For illustrative
purposes, both sets are composed of identical individuals, and thus both sets have
the same individual and mean representations. For items inside the focus of
attention, individual representations will be relatively precise (red curves). The
ensemble representation of the items inside the focus of attention will be even
more precise, owing to the benefits of averaging. For items outside the focus of
attention which are diffusely attended, the individual representations will be very
imprecise (gray curves). However, the benefits of averaging are so great that the
ensemble representation will be fairly precise, even when a relatively small
number of individual representations are averaged (just three in this example).

[Figure 1 graphic omitted. Levels shown: image of world; individual representation; ensemble representation.]
Figure 1. Gaining precision at a higher level of abstraction. By taking individual
measurements and averaging them, it is possible to extract a higher-level
ensemble representation. If error is independent between the individual
representations, then the ensemble average will be more precisely represented
than the individuals in the set. This benefit can be quantified after making certain
assumptions. For instance, if each individual were represented with the same
degree of independent, Gaussian noise (standard deviation = $\sigma$), then the average
of these individual estimates would have less noise, with a standard deviation
equal to $\sigma/\sqrt{n}$, where n is the number of individual measurements. The process is
depicted for the representation of object size, but the logic holds for any feature
dimension.
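The standard deviation of the average stated in the Figure 1 caption follows in two steps from the caption's assumptions. The derivation below is a sketch in which $x_i$, $\mu$ and $\varepsilon_i$ are notation introduced here for illustration (individual measurement, true value and noise term, respectively); they do not appear in the original.

```latex
\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad x_i = \mu + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)\ \text{independent}
\]
\[
\operatorname{Var}(\bar{x})
= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(\varepsilon_i)
= \frac{\sigma^2}{n}
\quad\Longrightarrow\quad
\operatorname{SD}(\bar{x}) = \frac{\sigma}{\sqrt{n}}
\]
```

The middle equality relies on independence: with correlated noise, covariance terms would be added and the reduction would be smaller.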
How many items are sampled?
A great deal of enthusiasm surrounding studies on ensem-
ble representations stems from the possibility that there
are specialized ensemble processing mechanisms which
are separate from the mechanisms employed to represent
individual objects. However, this idea has spurred some
controversy in the area of research on mean size
perception, where a modeling study has shown that it is possible to
accurately estimate the mean by sampling a small subset
of items [54]. In some cases, the average of the set could be
accurately estimated by strategically sampling as few as
one or two items, and estimating the average of those items
alone [54]. Consistent with this subset sampling hypothe-
sis, the accuracy of the mean estimate is typically constant
as the number of items in the set increases beyond four
items [18,55,56], whereas the benefits of averaging should
accrue as more items are averaged together. This would be
expected if observers were sampling just a subset of the
items.
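To give a sense of how much error subset sampling introduces, the sketch below estimates the mean of a set from only k of its items. It makes simplifying assumptions that differ from the modeling study cited above: the sampled items are chosen at random rather than strategically, each sampled item is measured without noise, and item sizes are drawn from a Gaussian distribution. All names and values are hypothetical.

```python
import numpy as np

def subset_error_sd(set_size, k, item_sd=1.0, n_trials=20000, seed=3):
    """SD of the error made when the set mean is estimated from k items.

    Assumptions: item sizes are Gaussian, the k sampled items are a
    random subset, and sampled items are measured without noise.
    """
    rng = np.random.default_rng(seed)
    sizes = rng.normal(0.0, item_sd, size=(n_trials, set_size))
    # Items are exchangeable, so the first k columns act as a random subset.
    estimate = sizes[:, :k].mean(axis=1)
    return (estimate - sizes.mean(axis=1)).std()

if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        print(k, round(subset_error_sd(8, k), 3))
```

Under these assumptions the error falls as item_sd * sqrt(1/k - 1/set_size), so it depends on how variable the items are and not only on how many are sampled.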
However, there are several reasons to believe that
observers are not strategically subsampling when they
compute the mean. In the case of crowded items, observers
simply cannot sample individual items, thus it is unlikely
that judgments for crowded displays [21] reflect a
sampling strategy. When items are not crowded, it has been
shown that intermixing conditions that would require
different sampling strategies does not impair performance
on mean size estimation [57], suggesting that subjects
either are not using a strategic sampling strategy or can
instantly deploy a new strategy based on some property of
the display. This latter possibility is unlikely, given that
the displays in [57] were only presented for 200 ms. One
study on perceiving the average facial expression has
shown that observers discount outliers when computing
the average, but a sampling strategy would show a large
effect of outliers [58]. Moreover, the accuracy of centroid
estimates suggests that all of the items must be averaged
to compute the centroid with the level of precision
observed, requiring the representation of a minimum of eight
individual items [23].
If observers are not strategically subsampling, the fact
that the precision of mean size estimation is constant with
the number of items beyond four presents a bit of a
mystery. One possibility is that the benefits of averaging
accrue quickly, and that one would predict a steep improve-
ment in the precision of mean estimation from one to four
items, with a leveling off beyond four items [58]. Another
possibility is that the precision with which each individual
item is represented decreases as the number of items
increases, because each item receives less attention
[42,44] and/or because items are more crowded and appear
further in the periphery on average. If this were the case,
then the benefits from averaging additional items would be
offset by the decrease in precision with which the individ-
ual items are represented, as illustrated in Figure 3. This
account predicts that the slope of the function relating the
precision of mean judgments to the number of items would
depend on the degree to which the noise in individual items
increases with the number of items. In practice, this slope
is often fairly shallow or even flat [18,55,56]. This raises the
intriguing possibility that averaging perfectly offsets the
increase in noise that occurs as the number of items
increases.
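This offsetting account can be made concrete under one hypothetical assumption that is not stated in the source: that the noise on each individual item grows as a power of the set size, $\sigma_{\text{ind}}(n) = \sigma_1 n^{\alpha}$, where $\sigma_1$ is the noise for a single item and $\alpha$ is an illustrative exponent. Combined with independent averaging, this gives:

```latex
\[
\sigma_{\text{mean}}(n)
= \frac{\sigma_{\text{ind}}(n)}{\sqrt{n}}
= \sigma_1 \, n^{\alpha - 1/2}
\]
```

With $\alpha = 1/2$ the precision of the mean is constant across set size; $\alpha < 1/2$ predicts that the mean becomes more precise as items are added, and $\alpha > 1/2$ predicts that it becomes less precise.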
Do all items contribute to the mean equally?
There is already some evidence that not all items contrib-
ute equally to the mean [58]. Intuitively, if some measures
are very unreliable, and other measures are very reliable,
we should give the more reliable measures more weight
when combining these measurements. In general, comput-
ing a weighted average in which more reliable estimates
are given greater weight will minimize the error in esti-
mates of the mean. To illustrate this point, Figure 4 shows
the results of a simulation in which the mean size of eight
items was estimated. Half of the individual item sizes were
estimated with high precision (low variance), whereas the
other half were estimated with low precision (high vari-
ance). The individual measurements were then averaged
using the standard equal-weight average or using a preci-
sion-weighted average in which each individual measure-
ment was weighted proportional to its precision. A total of
1000 trials were simulated, and for each trial error was
measured as the difference between the actual mean size
and the estimated mean size. The error distributions show
that error was lower for the precision-weighted average
than for the standard, equal-weighted average.
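A comparison of this kind can be sketched as follows. This is not the simulation behind Figure 4: for simplicity it assumes that all eight measurements are noisy readings of one common underlying size (the single-distribution formulation) and uses inverse-variance weights, whereas the article reports using a different weighting strategy for its figure. The noise levels and function name are hypothetical.

```python
import numpy as np

def simulate_errors(n_trials=1000, true_mean=10.0,
                    sd_precise=0.5, sd_noisy=4.0, seed=2):
    """Return per-trial errors for equal- and precision-weighted averages.

    Assumption: eight measurements of one underlying size, four with
    low noise (sd_precise) and four with high noise (sd_noisy).
    """
    rng = np.random.default_rng(seed)
    sds = np.array([sd_precise] * 4 + [sd_noisy] * 4)
    measured = true_mean + rng.standard_normal((n_trials, sds.size)) * sds

    equal = measured.mean(axis=1)                  # standard average
    weights = 1.0 / sds**2                         # precision = 1 / variance
    weighted = measured @ weights / weights.sum()  # precision-weighted average

    return equal - true_mean, weighted - true_mean

if __name__ == "__main__":
    equal_err, weighted_err = simulate_errors()
    print("equal-weighted error SD:    ", round(equal_err.std(), 3))
    print("precision-weighted error SD:", round(weighted_err.std(), 3))
```

Under these assumptions the precision-weighted errors cluster more tightly around zero, because the four noisy measurements contribute little to the estimate.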

[Figure 3 graphic omitted. Levels shown: image of world; individual representation; ensemble representation.]
Figure 3. Effect of set size on the fidelity of individual and ensemble
representations. The ensemble average should become more precise as the
number of individual items increases, because the benefits of averaging accrue
with each additional item averaged (with diminishing returns, of course). However,
if the precision with which individual items can be represented decreases with set
size, as depicted here, it is possible for this decrease to perfectly offset the benefits
of averaging so that the precision of the average remains constant with set size.

[Figure 4 graphic omitted. Two histograms (horizontal axis: error; vertical axis: frequency), one for the equal-weighted average and one for the precision-weighted average.]
Figure 4. Benefits of precision-weighted averaging. A standard equal-weighted
average will be less precise on average than a precision-weighted average in
which more reliable individual measurements are given more weight in the
average. Thus, if the precision of individual measurements is known, the optimal
strategy for computing the average is to combine individual measurements with
more weight given to more reliable individual measurements.
Exactly how to implement precision-weighted averaging
depends on how the problem is formulated. When faced
with a group of samples to average, we could either assume
that each individual item is a sample drawn from a single
distribution or that each individual item is a sample drawn
from a separate distribution. If we assume that individual
measurements are separate samples from a single
distribution, and the goal is to estimate the central tendency of
the underlying distribution, then each measurement i
should be weighted by $1/\sigma_i^2$ (where $\sigma_i^2$ is the
variance for item i). For instance, if one of the items has
infinite variance, it will be completely ignored. This type of
weighted average has been used extensively in the cue
integration literature to define the optimal strategy for
combining cues that have different degrees of reliability [59].
Alternatively, if the items are considered samples from
separate distributions, and the goal is to estimate the mean
of the sample, then items should never be given zero weight
in the average. One strategy would be to compute the mean
and variance of the samples, and to adjust the mean towards
more reliable measures in proportion to their reliability
(the inverse of their variance). In this case, an item with
infinite variance would be included in the initial estimate
of the mean, but there would be no additional updating of
the mean towards this item. This strategy was employed in
the simulations shown in Figure 4.
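For the single-distribution case, the weighting rule can be written out explicitly. The expressions below are the standard inverse-variance weighted mean, given here as a sketch; $x_i$ and $\hat{\mu}$ are notation introduced for illustration (individual measurement and estimate of the central tendency), and independence of the measurement noise is assumed.

```latex
\[
\hat{\mu} = \frac{\sum_{i=1}^{n} w_i \, x_i}{\sum_{i=1}^{n} w_i},
\qquad w_i = \frac{1}{\sigma_i^2}
\]
\[
\operatorname{Var}(\hat{\mu}) = \frac{1}{\sum_{i=1}^{n} 1/\sigma_i^2}
\]
```

As $\sigma_i^2 \to \infty$ the weight $w_i \to 0$, which is why an item with infinite variance is ignored entirely under this formulation. When all $\sigma_i$ are equal, the expression reduces to the ordinary mean.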
For ensemble averaging mechanisms to employ this
type of precision-weighted averaging, the visual system
would either have to know the degree of reliability with
which items are represented or have a heuristic to calcu-
late it. Both of these routes are plausible. Some models of
visual perception model representations of individual
items as probabilistic [59–61], in which knowledge is stored
as a probability distribution that explicitly contains a
representation of the reliability/variance of the represen-
tation. Alternatively, certain heuristics could be employed
for estimating reliability, such as giving peripheral items
less weight because visual resolution is known to drop off
with eccentricity. Similarly, items inside the focus of at-
tention might be weighted more than items outside the
focus of attention because the precision with which items
are represented is proportional to the amount of attention
we give them. These heuristics would not be explicit repre-
sentations of reliability, but they are cues that are tightly
correlated with reliability, and thus they could be used to
weigh individual items as a proxy for reliability.
It has been suggested that attended items are given
more weight in the averaging of crowded orientation sig-
nals [62]. One study has shown that when attention is
drawn to a particular item in the set, the mean judgment is
biased towards that item [63]. One possible interpretation
of this finding is that attention enhances the resolution
with which the attended item is represented [42–44,48],
and that items are weighed by their precision or reliability
when computing the mean [40]. This possibility is specu-
lative and has not been directly tested in uncrowded dis-
plays.
Beyond spatial averaging
Recent research on ensemble representation has gone
beyond assessing the ability of observers to average visual
features across space, including: (i) the ability to average
features across time; (ii) the ability to represent other
ensemble properties, such as the number of items in a
set; (iii) the ability to represent spatial patterns; (iv) the
relationship between ensemble representation and crowd-
ing; and (v) the neural correlates of ensemble representa-
tion.
Computing ensemble representations across time
In addition to spatial structure, there is a great deal of
temporal structure and redundancy in the input to the
visual system, and thus it would be advantageous to be
able to also compute ensemble representations across time.
Recent research has shown that observers can judge the
mean size of a dynamically changing item or groups of
items [40], or the mean expression of a dynamically
changing face [56]. These findings demonstrate that perceptual
averaging can operate over continuous and dynamic input,
and that averaging across time can be as precise as aver-
aging across space. Whether temporal averaging mechan-
isms constantly accumulate information or sample from
high information points, such as salient transitions or
discontinuities in the input stream, remains an open ques-
tion. However, there is some evidence that certain infor-
mation in a temporal sequence will be given more weight in
the average than other information, possibly related to the
amount of attention allocated to different points in the
temporal sequence [40].
Number as an ensemble representation
Perhaps the most basic summary description for a collec-
tion of items is the number of items in the set. Without
verbally counting, observers are able to estimate the
approximate number of items in a set [64–66]. Similar to the
perception of mean properties, the ability to enumerate
items in a set occurs rapidly. It is also possible to extract
the number of items across multiple sets in parallel [39].
Surprisingly, there is even evidence that number is directly
perceived in the same way as other primary visual attri-
butes [67]. Burr and Ross [67] demonstrated that it is
possible to adapt to number in the same way that it is
possible to adapt to visual properties such as color, orien-
tation or motion. Number literally seems to be a perceived
property of sets. The relationship between the mechan-
isms underlying number representation and perceptual
averaging is an important topic for future research.
Representing spatial patterns
Statistical summary representations, such as the mean or
number of items in a set, are extremely compact represen-
tations, collapsing the description of a set down to a single
number. However, images often consist of spatially distrib-
uted patterns of information, also referred to as spatial
regularities or spatial layout statistics. For example, nat-
ural images consist of regular distributions of orientation
and spatial frequency information [34,68]. In one study,
Oliva and Torralba [34] measured orientation energy at
different spatial scales over thousands of images and con-
ducted a principal components analysis on these measure-
ments. This analysis revealed that there are regularities in
the structure of natural images, with certain patterns of
spatial frequency and orientation more likely to occur than
other patterns. A schematic of a common pattern is shown
in Figure 5, in which orientation signals tend to be more
similar to each other within the top and bottom halves of
the image than they are across the top and bottom halves.
It would be efficient for the visual system to capitalize on
the redundancy in natural images by using visual mechan-
isms that are tuned to the statistics of the natural world
[11,69]. Indeed, a great deal of research has suggested that
low-level sensory mechanisms are tuned to real-world
statistical regularities [17,70–72].
The representation of such spatial ensemble statistics is
robust to the withdrawal of attention, as would be expected
if these ensemble representations are computed by pooling
together local measurements [31]. For example, while
attending to a set of moving objects in the foreground,
changes to the background were only noticed when they
altered the ensemble structure of the display, not when the
ensemble structure remained the same, even though these
changes were perfectly matched in terms of the magnitude
of local change [31]. This suggests that the visual system
maintains an accurate representation of the spatial en-
semble statistics of a scene, even when attention is focused
on a subset of items in the visual field.
Ensemble representation and crowding
Items in the visual field are often spaced too closely for each
individual item to be resolved. For instance, it is unlikely
that one can perceive the individual letters three sentences
above or below this one. Yet, one can tell that there are
letters present, that these letters are grouped into several
words and so on. What is the nature of our perceptual
representation when looking at a crowded collection of
objects? There is a growing body of evidence suggesting
that one perceives the higher-order summary statistics of
information within the crowded region [21,73]. For a
crowded set of oriented items, one perceives the average
orientation [21]. For more complex patterns, such as a set
of letters, the perceived pattern appears to result from a
more complex statistical representation [73]. Balas and
colleagues generated stimuli using a model which uses the
joint statistics of cells which code for position, phase,
orientation and scale [73]. Any pattern, such as sets of
letters, can be passed through this model, resulting in a
synthetic image that is somewhat distorted, yet is statisti-
cally similar to the original. When directly viewed, the
original and the synthetic image look very different.
However, identification performance with these synthetic
images correlates with identification performance for
crowded letters in the periphery, suggesting that percep-
tion in the periphery could consist of a similar statistical
representation. The relationship between ensemble repre-
sentation and crowding raises important questions regard-
ing whether ensemble coding occurs automatically and
whether it is perceptual in nature (Box 2).
Other studies suggest that there could be important
differences between ensemble representation and crowd-
Box 2. Automaticity and directly perceived ensemble
representations
A central question is whether the visual system automatically
computes ensemble representations without conscious intention or
effort, or whether they are computed voluntarily based on task
demands. If ensemble representations were automatically com-
puted, then we would conclude that there are dedicated mechan-
isms for computing and representing them. We might then focus on
identifying the core ensemble feature dimensions and assessing
their tuning properties. To understand such mechanisms, we can
bring to bear methods that have been employed to understand
perception, such as single-cell physiology, and perceptual adapta-
tion. If ensemble representations are not computed automatically,
but instead reflect a voluntary high-level judgment, then the
methods we would use, and questions we would ask, might be
somewhat different. For instance, physiology and adaptation are
unlikely to reveal much about these mechanisms and ensemble
representations would probably depend on task incentives and
observers' goals. To understand such representations, we might
explore regularities in how observers make ensemble judgments
and turn our attention towards identifying consistent heuristics and
biases in ensemble judgments.
In addition to the distinction between automatic and voluntary,
there is an important distinction between directly perceived and
read-out ensemble representations. In some cases the observer
directly perceives the ensemble representation. For example, when
a collection of items is presented in the periphery, their orientations
appear to be automatically averaged [28]. With such crowded items,
the perceptual experience is of directly seeing the average
orientation (all items appear to have an orientation equal to the
mean of the group), with an accompanying loss of perceptual access
to the individual orientation signals. By contrast, when the same
display appears at the fovea, the oriented items are not crowded and
the orientation signals do not appear to be obligatorily averaged: it
is clear that the items have different orientations and none of them
appears to have an orientation that matches the average. However,
even for uncrowded displays, it is possible that ensemble repre-
sentations are automatically computed. For example, ensemble
representations appear to be automatically computed when the
primary task does not require it [77] and even when they impair task
performance [94].

[Figure 5 graphic. Row labels: image of world; individual representation; spatial ensemble representation.]
Figure 5. Spatial ensemble representations. Individual orientation measurements
can be combined to represent patterns of orientation information. For each
pattern, local orientation measurements are made (depicted as Gaussian curves
centered around the true orientation), but each individual measure has a high
degree of noise or uncertainty. Similar orientation signals are then pooled together
to characterize regions with similar orientation signals using the average
orientation. In the first column, the top half of the image has a mean orientation
of vertical, whereas the bottom half has a mean orientation of horizontal. The
same is true for the image in the middle column. However, the pattern is flipped for
the third column: here the top half has a mean of horizontal and the bottom half
has a mean of vertical. Crucially, at the level of individual representations, the left
and middle columns are just as different from each other as the left and right
columns. However, at the ensemble level, the left and middle columns are more
similar to each other than the left and right columns.
Review Trends in Cognitive Sciences March 2011, Vol. 15, No. 3
127
ing. For instance, crowding is greater in the upper visual
field than the lower visual field, whereas under the same
conditions the accuracy of ensemble judgments was the
same in the upper and lower visual field [74]. Thus, although
ensemble coding and crowding are closely related,
there could be important dissociations between them.
Neural correlates of ensemble representation
Relatively little research has explored the neural mechan-
isms of ensemble representation. Perhaps the most basic
question we can ask is whether there are brain regions
with neurons dedicated to computing ensemble represen-
tations (above and beyond the computation of individual
object representations). Extensive research suggests that
the parietal cortex plays an important role in the repre-
sentation of number [75]. However, much less research has
been done to explore the representation of perceptual
averages, such as mean size, mean facial expression or
mean orientation. Future research in this area would
provide important insight into the nature of ensemble
coding, as well as the functional organization of the visual
cortex.
Additional benefits of computing ensemble
representations
The present article has focused on one primary benefit of
ensemble representation: the ability to combine imprecise
individual measurements to construct an accurate representation
of the group, or ensemble. However, computing
ensemble representations could yield many related benefits
[18,76], which are discussed here.
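The primary benefit described above, that pooling imprecise individual measurements yields a more precise ensemble estimate, can be illustrated with a small simulation. This is only a sketch: it assumes independent Gaussian noise on each measurement, and the function name `simulate` and all parameter values (16 items, a noise SD of 10) are hypothetical rather than taken from the studies cited.

```python
import random
import statistics


def simulate(n_items, n_trials=2000, true_value=45.0, noise_sd=10.0, seed=0):
    """Return (SD of single measurements, SD of ensemble means) across trials."""
    rng = random.Random(seed)
    singles, means = [], []
    for _ in range(n_trials):
        # Assumption: each item is measured independently with Gaussian noise.
        measurements = [rng.gauss(true_value, noise_sd) for _ in range(n_items)]
        singles.append(measurements[0])                # one individual measurement
        means.append(statistics.fmean(measurements))   # the ensemble (mean) estimate
    return statistics.pstdev(singles), statistics.pstdev(means)


single_sd, mean_sd = simulate(n_items=16)
print(f"single-item error SD:   {single_sd:.2f}")
print(f"ensemble-mean error SD: {mean_sd:.2f}")
```

Under these assumptions the spread of the mean shrinks roughly with the square root of the number of items pooled, so with 16 items the ensemble estimate is about four times more precise than any single measurement.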
Information compression
Compression is the process of recoding data so that it takes
fewer bits of information to represent that data. To the
extent that the encoding scheme distorts or loses information,
the compression is said to be lossy. For instance, TIFF
image encoding uses a form of lossless compression, whereas
JPEG image encoding is a lossy form of image compression,
although the information lost occurs at such a high
spatial frequency that human observers typically cannot
detect this loss. Ariely [18] proposed that reducing the
representation of a set to the mean, and discarding indi-
vidual representations, would be a sensible form of lossy
compression for the human visual system: it leaves avail-
able an informative global percept which could potentially
be used to navigate and choose regions of interest for
further analysis. However, this form of compression would
only be economical if ensemble representations and indi-
vidual representations were competing in some sense.
Otherwise, in terms of compression, there is no advantage
to discarding the individual representations, and one
might as well extract the ensemble and retain the individ-
ual representations. There is some evidence that ensemble
representations take the same memory space as individual
representations [39], although other studies suggest that
ensemble representations and noisy individual represen-
tations are maintained concurrently and that these levels
are mutually informative [77,78]. These findings suggest
that ensemble representations and individual representa-
tions probably do not compete for storage, at least not in a
mutually exclusive manner. However, none of these previ-
ous studies directly pitted ensemble memory versus indi-
vidual memory and assessed possible trade-offs between
them. Future research will be necessary to explore the
extent to which ensemble representations and individual
representations compete in memory. In terms of perceptu-
al representations, it seems clear that individual and
ensemble representations can be maintained simulta-
neously [23].
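The idea of reducing a set to its mean as a form of lossy compression can be made concrete with a toy example. The sketch below is purely illustrative: the item sizes are made up, and the helper names `compress_to_mean` and `reconstruct` are hypothetical. It shows that the ensemble statistic survives the recoding while the individual values cannot be recovered.

```python
import statistics


def compress_to_mean(values):
    """Lossy code: keep only the set size and the mean, discard the items."""
    return {"n": len(values), "mean": statistics.fmean(values)}


def reconstruct(code):
    """Best available guess for every item once individual values are gone."""
    return [code["mean"]] * code["n"]


sizes = [2.0, 3.0, 4.0, 5.0, 6.0]   # hypothetical item sizes
code = compress_to_mean(sizes)
restored = reconstruct(code)

# Per-item reconstruction error: the information lost by the code.
errors = [abs(a - b) for a, b in zip(sizes, restored)]
print("stored numbers:", len(code), "instead of", len(sizes))
print("mean preserved:", statistics.fmean(restored))
print("per-item errors:", errors)
```

As the surrounding text notes, this saving only matters if individual and ensemble representations actually compete for storage; if both can be kept, nothing is gained by discarding the items.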
Whether ensemble coding is lossy or lossless depends
on the fate of lower-level, individual representations.
However, at the level of the ensemble representation, it
is clear the data have been transformed into a more
compressed form. It is possible that this format is more
conducive to memory storage and learning. Ensemble
representations are more precise than the lower-level
representations composing them. Thus, there can be
higher specificity of response at the ensemble level than
at lower levels of representation. Such sparse coding has
several advantages [79,80], including minimizing overlap
between representations stored in memory [81] and learn-
ing associations in neural networks [82]. The extent to
which observers can learn over ensemble representations
of the type described in the present article is an important
topic for future research, because it could bridge the gap
between research on ensemble coding in visual cognition
and the vast field of research on sparse coding and
memory.
Ensemble representations as a basis for statistical
inference and outlier detection
Another potential benefit to building an ensemble representation
is to enable statistical inferences [83], including
estimating the parameters of the distribution (mean, variance,
range, shape), setting confidence intervals on those
parameter estimates and classifying items into groups. A
special case of classification is outlier detection, and an
ensemble representation is ideal for this purpose [18,76].
For instance, if a set is well described by a distribution
along an arbitrary dimension, say with a mean of 20 and
standard deviation of 3, then an item with a value of 30
along this dimension is unlikely to be a member of the set.
The ensemble representation would enable labeling this
item as an outlier or even as a member of a different group.
Outlier detection has been extensively studied using the
visual search paradigm, in which the question has been
whether an oddball item will instantly pop out from a
larger set of homogeneous items [84]. Items that are very
different from the set, say a red item among green items,
are said to be salient, and are easy to find in a visual search
task [85,86]. Interestingly, computational models of saliency
focus on local differences between each item and its
neighbors [87]. However, one could imagine displays in
which the local context of a search target remained un-
changed, but more distant items varied to either increase
or decrease the degree to which the target appeared to be a
member of the overall set. Finding that outlier status
guides visual search above and beyond its effects on local
saliency would provide strong support for the idea that
ensemble representations play an important role in outlier
detection.
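The numerical example given earlier in this section (a set with a mean of 20 and a standard deviation of 3, and an item with a value of 30) can be worked through as follows. This is a sketch under stated assumptions: the set is treated as Gaussian, and the three-standard-deviation criterion in `is_outlier` is an arbitrary illustrative threshold, not one proposed in the literature cited.

```python
from statistics import NormalDist


def z_score(value, mean, sd):
    """Distance of a value from the set mean, in standard deviations."""
    return (value - mean) / sd


def is_outlier(value, mean, sd, criterion=3.0):
    """Flag items more than `criterion` SDs from the mean (assumed threshold)."""
    return abs(z_score(value, mean, sd)) > criterion


set_mean, set_sd = 20.0, 3.0
item = 30.0

z = z_score(item, set_mean, set_sd)
# Probability of a value at least this far above the mean, if the set were Gaussian.
upper_tail = 1.0 - NormalDist(set_mean, set_sd).cdf(item)

print(f"z = {z:.2f}, upper-tail probability = {upper_tail:.5f}")
print("outlier:", is_outlier(item, set_mean, set_sd))
```

The item lies more than three standard deviations above the mean, so under the Gaussian assumption it is very unlikely to belong to the set, which is the sense in which the ensemble representation supports labeling it as an outlier.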
Although it would be interesting if ensemble representations
could enable rapid outlier detection, this finding is
not necessary to support the idea that ensemble represen-
tations play an important role in classifying and grouping
items. For instance, a face with a unique facial expression
does not pop-out in a visual search task [88]. However,
recent research shows that an outlier face is given reduced
weight in the ensemble representation of a group of faces
[58], even though observers often fail to perceive the outlier.
This finding is consistent with the possibility that the
ensemble representation enables labeling of items, but
could also indicate that the ensemble computation gives
outliers lower weight without attaching a classification
label. The role of ensemble representations in determining
set membership has not yet been extensively studied, and
research in this area can potentially bridge the gap be-
tween study on ensemble representation, statistical infer-
ence and perceptual grouping.
Building a gist representation that can guide the focus
of attention
As detailed in previous sections, the power of averaging
makes it possible to combine imprecise local measure-
ments to yield a relatively precise representation of the
ensemble (Figure 1). Moreover, it is possible to combine
individual measurements to describe spatial patterns of
information (Figure 5). A primary benefit of computing
either type of ensemble representation is to provide a
precise and accurate representation of the gist of infor-
mation outside the focus of attention. Without focused
attention, our representations of visual information are
highly imprecise [23]. If we were to simply discard or ignore
these noisy representations, our conscious visual experi-
ence would be limited to only those items currently within
the focus of attention. Indeed, some have argued that this
is the nature of conscious visual experience [89,90]. In such
a system, attention would be flying blind, without access
to any information about what location or region to focus on
next.
Although locally imprecise, ensemble representations
provide an accurate representation of higher-level patterns
and regularities outside the focus of attention [23,31].
These patterns and regularities are highly diagnostic of
the type of scene one is viewing [14], and therefore they are
useful for determining which environment one is currently
located within. Over experience, observers appear to learn
associations between these ensemble representations and
the location of objects in the visual field. For instance,
observers appear to use global contextual information to
guide the deployment of attention to locations likely to
contain the target of a visual search task [33,91–93]. Thus,
rather than flying blind, the visual system can compute
ensemble representations, providing a sense of the gist of
information outside the focus of attention, and guiding the
deployment of attention to important regions of a scene.
In terms of forming a complete representation of a
scene, gist representation and outlier detection probably
work in tandem. For instance, when holding a scene in
working memory, observers appear to encode the gist of the
scene plus individual items that cannot be incorporated
into the summary for the rest of the scene (i.e. outliers) [78].
Benefits of building a hierarchical representation of a
scene
There are distinct computational advantages to building a
hierarchical representation of a scene. In particular, by
integrating information across levels of representation, it
is possible to increase the accuracy of lower-level repre-
sentations. It appears that observers automatically con-
struct this type of representation when asked to hold a
scene in working memory [77,78]. For instance, when
recalling the size of an individual item from a display,
the remembered size was biased towards the mean size of
the set of items in the same color, and towards the overall
mean size of all items in the display [77]. These results
were well captured by a Bayesian model in which obser-
vers integrate information at multiple levels of abstrac-
tion to inform their judgment about the size of the tested
item.
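One way to see why integrating information across levels biases recall towards the mean is a precision-weighted combination of Gaussian estimates. The sketch below is not the Bayesian model reported in [77]; it is a minimal cue-combination illustration, and the function name `combine` and every value and standard deviation in it are hypothetical.

```python
def combine(estimates):
    """Precision-weighted average of (value, sd) pairs, assuming Gaussian noise."""
    weights = [1.0 / sd ** 2 for _, sd in estimates]   # precision = 1 / variance
    total = sum(weights)
    return sum(w * value for w, (value, _) in zip(weights, estimates)) / total


# Hypothetical numbers, each given as (value, sd):
item_memory = (10.0, 2.0)    # noisy memory of the tested item's size
color_mean = (14.0, 2.0)     # mean size of the items sharing its color
overall_mean = (16.0, 4.0)   # mean size of all items in the display

recalled = combine([item_memory, color_mean, overall_mean])
print(f"recalled size: {recalled:.2f}")
```

With these values the recalled size lands between the item's own noisy trace and the two group means, pulled more strongly towards whichever source is more reliable. That is the qualitative pattern described above, without any claim about the actual parameters of the published model.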
Concluding remarks
Traditional research on visual cognition has typically
assessed the limits of visual perception and memory for
individual objects, often using random and unstructured
displays. However, there is a great deal of structure and
redundancy in real-world images, presenting an opportu-
nity to represent groups of objects as an ensemble. Because
ensemble representations summarize the properties of a
group, they are necessarily spatially and temporally imprecise.
Nevertheless, such ensemble representations confer
several important benefits. Much of the previous
research on ensemble representation has focused on the
fact that the human visual system is capable of computing
accurate ensemble representations. However, the field is
moving towards a focus on investigating the mechanisms
that enable ensemble coding, the nature of the ensemble
representation, the utility of ensemble representations and
the neural mechanisms underlying ensemble coding. This
future research promises to uncover important new prop-
erties of the representations underlying visual cognition
and to further demonstrate how representing ensembles
enhances visual cognition.
Acknowledgments
For helpful conversation and/or comments on earlier drafts, I thank Talia
Konkle, Jason Haberman and Jordan Suchow. G.A.A. was supported by
the National Science Foundation (Career Award BCS-0953730).
References
1 Kersten, D. (1987) Predictability and redundancy of natural images. J.
Opt. Soc. Am. A 4, 2395–2400
2 Field, D.J. (1987) Relations between the statistics of natural images
and the response properties of cortical cells. J. Opt. Soc. Am. A 4, 2379–2394
3 Brady, N. and Field, D.J. (2000) Local contrast in natural images:
normalisation and coding efficiency. Perception 29, 1041–1055
4 Frazor, R.A. and Geisler, W.S. (2006) Local luminance and contrast in
natural images. Vis. Res. 46, 1585–1598
5 Webster, M.A. and Mollon, J.D. (1997) Adaptation and the color
statistics of natural images. Vis. Res. 37, 3283–3298
6 Hyvärinen, A. and Hoyer, P.O. (2000) Emergence of phase and shift
invariant features by decomposition of natural images into
independent feature subspaces. Neural Comput. 12, 1705–1720
7 Judd, D.B. et al. (1964) Spectral distribution of typical daylight as a
function of correlated color temperature. J. Opt. Soc. Am. A 54, 1031–1040
8 Long, F. et al. (2006) Spectral statistics in natural scenes predict hue,
saturation, and brightness. Proc. Natl. Acad. Sci. U.S.A. 103, 6013–6018
9 Maloney, L.T. (1986) Evaluation of linear models of surface spectral
reflectance with small numbers of parameters. J. Opt. Soc. Am. A 3,
1673–1683
10 Maloney, L.T. and Wandell, B.A. (1986) Color constancy: a method for
recovering surface spectral reflectance. J. Opt. Soc. Am. A 3, 29–33
11 Field, D.J. (1989) What the statistics of natural images tell us about
visual coding. SPIE: Hum. Vis. Vis. Process. Digit. Display 1077,
269–276
12 Burton, G.J. and Moorehead, I.R. (1987) Color and spatial structure in
natural scenes. Appl. Opt. 26, 157–170
13 Geisler, W.S. (2008) Visual perception and the statistical properties of
natural scenes. Annu. Rev. Psychol. 59, 167–192
14 Torralba, A. and Oliva, A. (2003) Statistics of natural image categories.
Network 14, 391–412
15 Huffman, D.A. (1952) A method for construction of minimum
redundancy codes. Proc. IRE 40, 1098–1101
16 Shannon, C.E. and Weaver, W. (1949) The Mathematical Theory of
Communication, The University of Illinois Press
17 Atick, J.J. (1992) Could information theory provide an ecological theory
of sensory processing? Network: Comput. Neural Syst. 3, 213–251
18 Ariely, D. (2001) Seeing sets: representation by statistical properties.
Psychol. Sci. 12, 157–162
19 Chong, S.C. and Treisman, A. (2003) Representation of statistical
properties. Vis. Res. 43, 393–404
20 Bauer, B. (2009) Does Stevens' power law for brightness extend to
perceptual brightness averaging? Psychol. Rec. 59, 171–186
21 Parkes, L. et al. (2001) Compulsory averaging of crowded orientation
signals in human vision. Nat. Neurosci. 4, 739–744
22 Dakin, S.C. and Watt, R.J. (1997) The computation of orientation
statistics from visual texture. Vis. Res. 37, 3181–3192
23 Alvarez, G.A. and Oliva, A. (2008) The representation of simple
ensemble visual features outside the focus of attention. Psychol. Sci.
19, 392–398
24 Haberman, J. and Whitney, D. (2007) Rapid extraction of mean
emotion and gender from sets of faces. Curr. Biol. 17, R751–R753
25 de Fockert, J. and Wolfenstein, C. (2009) Rapid extraction of mean
identity from sets of faces. Q. J. Exp. Psychol. (Colchester) 62, 1716–1722
26 Spencer, J. (1961) Estimating averages. Ergonomics 4, 317–328
27 Smith, A.R. and Price, P.C. (2010) Sample size bias in the estimation of
means. Psychon. Bull. Rev. 17, 499–503
28 Morgan, M. et al. (2008) A dipper function for texture discrimination
based on orientation variance. J. Vis. 8, 1–8
29 Peterson, C.R. and Beach, L.R. (1967) Man as an intuitive statistician.
Psychol. Bull. 68, 29–46
30 Pollard, P. (1984) Intuitive judgments of proportions, means, and
variances: a review. Curr. Psychol. 3, 5–18
31 Alvarez, G.A. and Oliva, A. (2009) Spatial ensemble statistics are
efficient codes that can be represented with reduced attention. Proc.
Natl. Acad. Sci. U.S.A. 106, 7345–7350
32 Oliva, A. and Torralba, A. (2006) Building the gist of a scene: the role of
global image features in recognition. Prog. Brain Res. 155, 23–36
33 Oliva, A. and Torralba, A. (2007) The role of context in object
recognition. Trends Cogn. Sci. 11, 520–527
34 Oliva, A. and Torralba, A. (2001) Modeling the shape of the scene: a
holistic representation of the spatial envelope. Int. J. Comput. Vis. 42,
145–175
35 Navon, D. (1977) Forest before trees: the precedence of global features
in visual perception. Cognit. Psychol. 9, 353–383
36 Kimchi, R. (1992) Primacy of wholistic processing and global/local
paradigm: a critical review. Psychol. Bull. 112, 24–38
37 Thompson, P. (1980) Margaret Thatcher: a new illusion. Perception 9,
483–484
38 Young, A.W. et al. (1987) Configurational information in face
perception. Perception 16, 747–759
39 Halberda, J. et al. (2006) Multiple spatially overlapping sets can be
enumerated in parallel. Psychol. Sci. 17, 572–576
40 Albrecht, A.R. and Scholl, B.J. (2010) Perceptually averaging in a
continuous visual world: extracting statistical summary
representations over time. Psychol. Sci. 21, 560–567
41 Galton, F. (1907) Vox populi. Nature 75, 450–451
42 Palmer, J. (1990) Attentional limits on the perception and memory
of visual information. J. Exp. Psychol. Hum. Percept. Perform. 16, 332–350
43 Alvarez, G.A. and Franconeri, S.L. (2007) How many objects can you
track? Evidence for a resource-limited attentive tracking mechanism.
J. Vis. 7, 1–10
44 Franconeri, S.L. et al. (2007) How many locations can be selected at
once? J. Exp. Psychol. Hum. Percept. Perform. 33, 1003–1012
45 Titchener, E.B. (1908) Lectures on the Elementary Psychology of Feeling
and Attention, Macmillan
46 Carrasco, M. et al. (2004) Attention alters appearance. Nat. Neurosci. 7,
308–313
47 Carrasco, M. et al. (2002) Covert attention increases spatial resolution
with or without masks: support for signal enhancement. J. Vis. 2,
467–479
48 Yeshurun, Y. and Carrasco, M. (1998) Attention improves or impairs
visual performance by enhancing spatial resolution. Nature 396, 72–75
49 Mack, A. and Rock, I. (1998) Inattentional Blindness, The MIT Press
50 Neisser, U. and Becklen, R. (1975) Selective looking: attending to
visually specified events. Cognit. Psychol. 7, 480–494
51 Most, S.B. et al. (2005) What you see is what you set: sustained
inattentional blindness and the capture of awareness. Psychol. Rev.
112, 217–242
52 Setic, M. et al. (2007) Modelling the statistical processing of visual
information. Neurocomputing 70, 1808–1812
53 Kinchla, R.A. and Wolfe, J.M. (1979) The order of visual processing:
Top-down, bottom-up, or middle-out. Percept. Psychophys. 25,
225–231
54 Myczek, K. and Simons, D.J. (2008) Better than average: alternatives
to statistical summary representations for rapid judgments of average
size. Percept. Psychophys. 70, 772–788
55 Chong, S.C. and Treisman, A. (2005) Attentional spread in the
statistical processing of visual displays. Percept. Psychophys. 67, 1–13
56 Haberman, J. et al. (2009) Averaging facial expression over time. J. Vis.
9, 1–13
57 Chong, S.C. et al. (2008) Statistical processing: not so implausible after
all. Percept. Psychophys. 70, 1327–1334
58 Haberman, J. and Whitney, D. (2010) The visual system discounts
emotional deviants when extracting average expression. Atten. Percept.
Psychophys. 72, 1825–1838
59 Kersten, D. and Yuille, A. (2003) Bayesian models of object perception.
Curr. Opin. Neurobiol. 13, 150–158
60 Vul, E. and Pashler, H. (2008) Measuring the crowd within:
probabilistic representations within individuals. Psychol. Sci. 19,
645–647
61 Vul, E. and Rich, A.N. (2010) Independent sampling of features enables
conscious perception of bound objects. Psychol. Sci. 21, 1168–1175
62 Mareschal, I. et al. (2010) Attentional modulation of crowding. Vis. Res.
50, 805–809
63 de Fockert, J.W. and Marchant, A.P. (2008) Attention modulates set
representation by statistical properties. Percept. Psychophys. 70,
789–794
64 Dehaene, S. et al. (1998) Abstract representations of numbers in the
animal and human brain. Trends Neurosci. 21, 355–361
65 Feigenson, L. et al. (2004) Core systems of number. Trends Cogn. Sci. 8,
307–314
66 Whalen, J. et al. (1999) Nonverbal counting in humans: the
psychophysics of number representation. Psychol. Sci. 10, 130–137
67 Burr, D. and Ross, J. (2008) A visual sense of number. Curr. Biol. 18,
425–428
68 Geisler, W.S. et al. (2001) Edge co-occurrence in natural images
predicts contour grouping performance. Vis. Res. 41, 711–724
69 Chandler, D.M. and Field, D.J. (2007) Estimates of the information
content and dimensionality of natural scenes from proximity
distributions. J. Opt. Soc. Am. A 24, 922–941
70 Barlow, H.B. and Foldiak, P. (1989) Adaptation and decorrelation in
the cortex. In The Computing Neuron (Durbin, R. et al., eds), pp. 54–72,
Addison-Wesley
71 Lewicki, M.S. (2002) Efficient coding of natural sounds. Nat. Neurosci.
5, 356–363
72 Olshausen, B.A. and Field, D.J. (1996) Natural image statistics and
efficient coding. Network 7, 333–339
73 Balas, B. et al. (2009) A summary-statistic representation in peripheral
vision explains visual crowding. J. Vis. 9, 1318
74 Bulakowski, P.F. et al. Reexamining the possible benefits of visual
crowding: dissociating crowding from ensemble percepts. Atten.
Percept. Psychophys. (in press)
75 Piazza, M. and Izard, V. (2009) How humans count: numerosity and the
parietal cortex. Neuroscientist 15, 261–273
76 Cavanagh, P. (2001) Seeing the forest but not the trees. Nat. Neurosci.
4, 673–674
77 Brady, T.F. and Alvarez, G.A. Hierarchical encoding in visual working
memory: ensemble statistics bias memory for individual items.
Psychol. Sci. (in press)
78 Brady, T.F. and Tenenbaum, J.B. (2010) Encoding higher-order
structure in visual working-memory: a probabilistic model. In
Proceedings of the 32nd Annual Conference of the Cognitive Science
Society (Ohlsson, S. and Catrambone, R., eds), pp. 411–416, Cognitive
Science
79 Olshausen, B.A. and Field, D.J. (2004) Sparse coding of sensory inputs.
Curr. Opin. Neurobiol. 14, 481–487
80 Olshausen, B.A. and Field, D.J. (1997) Sparse coding with an
overcomplete basis set: a strategy employed by V1? Vis. Res. 37,
3311–3325
81 Willshaw, D.J. et al. (1969) Non-holographic associative memory.
Nature (Lond.) 222, 960–962
82 Zetzsche, C. (1990) Sparse coding: the link between low level vision
and associative memory. In Parallel Processing in Neural Systems
and Computers (Eckmiller, R. et al., eds), pp. 273–276, Elsevier
Science
83 Rosenholtz, R. (2000) Significantly different textures: a computational
model of pre-attentive texture segmentation. In Proceedings of the 6th
European Conference on Computer Vision (Vernon, D., ed.), pp. 197–211,
Springer-Verlag
84 Rosenholtz, R. (1999) A simple saliency model predicts a number of
motion popout phenomena. Vis. Res. 39, 3157–3163
85 Itti, L. and Koch, C. (2001) Computational modelling of visual
attention. Nat. Rev. Neurosci. 2, 194–203
86 Wolfe, J.M. (1994) Guided search 2.0: a revised model of visual search.
Psychon. Bull. Rev. 1, 202–238
87 Itti, L. and Koch, C. (2000) A saliency-based search mechanism for
overt and covert shifts of visual attention. Vis. Res. 40, 1489–1506
88 Nothdurft, H.C. (1993) Faces and facial expressions do not pop out.
Perception 22, 1287–1298
89 Noë, A. and O'Regan, J.K. (2000) Perception, attention and the grand
illusion. Psyche 6 (http://psyche.cs.monash.edu.au/v6/psche-6-15-noe.html)
90 O'Regan, J.K. (1992) Solving the real mysteries of visual perception:
the world as an outside memory. Can. J. Psychol. 46, 461–488
91 Torralba, A. et al. (2006) Contextual guidance of eye movements and
attention in real-world scenes: the role of global features in object
search. Psychol. Rev. 113, 766–786
92 Ehinger, K.A. et al. (2009) Modeling search for people in 900 scenes: a
combined source model of eye guidance. Vis. Cogn. 17, 945–978
93 Chun, M.M. (2000) Contextual cueing of visual attention. Trends Cogn.
Sci. 4, 170–178
94 Haberman, J. and Whitney, D. (2009) Seeing the mean: ensemble
coding for sets of faces. J. Exp. Psychol. Hum. Percept. Perform. 35, 718–734
Cognitive neuroscience of
self-regulation failure
Todd F. Heatherton and Dylan D. Wagner
Department of Psychological and Brain Sciences, 6207 Moore Hall, Dartmouth College, Hanover, NH 03755, USA
Self-regulatory failure is a core feature of many social
and mental health problems. Self-regulation can be
undermined by failures to transcend overwhelming
temptations, negative moods and resource depletion,
and when minor lapses in self-control snowball into self-
regulatory collapse. Cognitive neuroscience research
suggests that successful self-regulation is dependent
on top-down control from the prefrontal cortex over
subcortical regions involved in reward and emotion.
We highlight recent neuroimaging research on self-regulatory
failure, the findings of which support a balance
model of self-regulation whereby self-regulatory failure
occurs whenever the balance is tipped in favor of subcortical
areas, either due to particularly strong impulses
or when prefrontal function itself is impaired. Such a
model is consistent with recent findings in the cognitive
neuroscience of addictive behavior, emotion regulation
and decision-making.
The advantages of self-control
The ability to control behavior enables humans to live
cooperatively, achieve important goals and maintain
health throughout their life span. Self-regulation enables
people to make plans, choose from alternatives, control
impulses, inhibit unwanted thoughts and regulate social
behavior [1–4]. Although humans have an impressive capacity
for self-regulation, failures are common and people
lose control of their behavior in a wide variety of circum-
stances [1,5]. Such failures are an important cause of
several contemporary societal problems: obesity, addiction,
poor financial decisions, sexual infidelity and so on.
Indeed, it has been estimated that 40% of deaths are
attributable to poor self-regulation [6]. Conversely, those
who are better able to self-regulate demonstrate improved
relationships, increased job success and better mental
health [7,8] and are less at risk of developing alcohol abuse
problems or engaging in risky sexual behavior [9]. An
understanding of the circumstances under which people
fail at self-regulation as well as the brain mechanisms
associated with those failures can provide valuable
insights into how people regulate and control their
thoughts, behaviors and emotions.
Self-regulation failure
The modern world holds many temptations. Every day,
people need to resist fattening foods, avoid browsing the
internet when they should be working, keep from snapping
at annoying coworkers and curb bad habits, such as smok-
ing or drinking too much. Psychologists have made consid-
erable progress in identifying the individual and
situational factors that encourage or impair self-control
[4,5,10]. The most common circumstances under which
self-regulation fails are when people are in bad moods,
when minor indulgences snowball into full-blown binges,
when people are overwhelmed by immediate temptations
or impulses, and when control itself is impaired (e.g. after
alcohol consumption or effort depletion). Researchers have
examined each of these and we briefly discuss the major
findings, beginning with the behavioral literature and then
discussing recent neuroscience findings.
Negative moods
Among the most important triggers of self-regulation fail-
ure are negative emotions [11,12]. When people become
upset they sometimes act aggressively [13], spend too
much money [14], engage in risky behavior [15], including
unprotected sex [16], comfort the self with alcohol, drugs or
food [4,17], and fail to pursue important life goals. Indeed,
negative emotional states are related to relapse for a
number of addictive behaviors, such as alcoholism, gam-
bling and drug addiction [18,19]. Laboratory studies have
demonstrated that inducing negative affect leads to height-
ened cravings among alcoholics [12], increased eating by
chronic dieters [20,21] and greater smoking intensity by
smokers [22].
A theory by Heatherton and Baumeister provides an
explanation for the roles of negative affect in disinhibited
eating [23], which is also applicable to other self-regulatory
failures. This theory proposes that dieters hold a negative
view of self that is generally unpleasant (especially con-
cerning physical appearance) and that dieters are motivat-
ed to escape from these unpleasant feelings by constricting
their cognitive attention to the immediate situation while
ignoring the long-term implications and higher-level significance
of their current actions. This escape from aversive
self-awareness not only helps dieters to forget their unpleasant
views of self, but also disengages long-term planning
and meaningful thinking and weakens the inhibitions
that normally restrain a dieter's food intake. This might
explain, in part, the lack of insight that occurs in drug
addiction [24]. Other behavioral accounts of the impact of
negative mood on behavior include the idea that negative
affect occupies attention, thereby leading to fewer
resources to inhibit behavior [25], or that engaging in
appetitive behaviors reduces anxiety and comforts the self
and is therefore a form of coping [26].
Review
Corresponding author: Heatherton, T.F. (heatherton@dartmouth.edu).
1364-6613/$ – see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.tics.2010.12.005 Trends in Cognitive Sciences, March 2011, Vol. 15, No. 3
Lapse-activated consumption
A common pattern of self-regulation failure occurs for
addicts and chronic dieters when they fall off the wagon
by consuming the addictive substance or violating their
diets [5]. Marlatt coined the term abstinence violation
effect to refer to situations in which addicts respond to
an initial indulgence by consuming even more of the forbidden
substance [11]. In one of the first studies to examine
this effect, Herman and Mack experimentally violated the
diets of dieters by requiring them to drink a milkshake, a
high-calorie food, as part of a supposed taste perception
study [27]. Although non-dieters ate less after consuming
the milkshakes, presumably because they were full, dieters
paradoxically ate more after having the milkshake
(Figure 1a). This disinhibition of dietary restraint has been
replicated numerous times [20,28] and demonstrates that
dieters often eat a great deal after they perceive their diets
to be broken. It is currently not clear, however, how a small
indulgence, which itself might not be problematic, escalates
into a full-blown binge [29].
Cue exposure
At the core of self-regulation is impulse control, but how do
impulses arise? Both human and animal studies have
demonstrated that exposure to drug cues increases the
likelihood that the cued substance will be consumed [30–33],
and additionally increases cravings, attention and
physiological responses such as changes in heart rate
[33–35]. Yet people might be unaware that their environments
are influencing them because stimuli can activate
goals, cravings and so forth implicitly [36,37]. Even if
people are somewhat aware of cues around them, they
are unaware of the process by which exposure to those cues
implicitly activates cognitive processes that determine
behavior [38]. A recent meta-analysis of 75 articles found
that implicit cognition is a strong and reliable predictor of
substance use [39]. From this perspective, cognition that is
spontaneously activated by stimuli from the environment
alters how people act in a given situation.
[Figure 1. Panel (a): ice cream consumed (g) by dieters and non-dieters in the no-preload and milkshake conditions. Panel (b): BOLD signal change in left NAcc (-15, 3, -8) and right NAcc (12, 9, -3) for dieters and non-dieters in the no-preload and milkshake conditions.]
Figure 1. (a) When restrained eaters' diets were broken by consumption of a high-calorie milkshake preload, they subsequently showed disinhibited eating (e.g. increased
grams of ice-cream consumed) compared to control subjects and restrained eaters who did not drink the milkshake (figure based on data from [30]). (b) Restrained eaters
whose diets were broken by a milkshake preload showed increased activity in the nucleus accumbens (NAcc) compared to restrained eaters who did not consume the
preload and satiated non-dieters [64].
The ability to transcend immediate temptations in the
service of long-term goals is a key aspect of self-regulation
[5,40]. In an important series of studies, Mischel and
colleagues studied how preschoolers responded in the face
of temptation in situations in which delaying gratification
led to larger rewards [40,41]. Successful self-control was
associated with either redirection of attention away from
temptation or cognitive reframing of 'hot' appetitive features
into 'cool' representations [40]. A related pattern is
found in behavioral economic studies in which people
discount future rewards in decision-making by choosing
objectively less valuable rewards that are immediately
available [42]. A common feature of these studies is that
people respond to appetizing cues by succumbing to immediate
gratification rather than resisting temptation to
achieve long-term goals.
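The discounting of future rewards described above is often formalized in behavioral economics with a hyperbolic function, in which the subjective value of a reward falls as its delay grows. The review does not give a formula, so the sketch below is an illustration under that assumption: the function names, the discount-rate parameter `k` and the example amounts are all hypothetical and are not taken from any cited study.

```python
def discounted_value(amount: float, delay: float, k: float) -> float:
    """Subjective present value of a reward under hyperbolic discounting.

    amount: objective size of the reward
    delay:  time until the reward is received (any consistent unit)
    k:      discount rate (hypothetical free parameter; larger = steeper)
    """
    if amount < 0 or delay < 0 or k < 0:
        raise ValueError("amount, delay and k must be non-negative")
    # Assumed hyperbolic form: value = amount / (1 + k * delay)
    return amount / (1.0 + k * delay)


def chooses_immediate(small_now: float, large_later: float,
                      delay: float, k: float) -> bool:
    """True if the smaller immediate reward has the higher subjective value."""
    return small_now > discounted_value(large_later, delay, k)


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from any cited study.
    for k in (0.01, 0.1):
        print(k, chooses_immediate(60, 100, delay=10, k=k))
```

With these made-up numbers, a steeper discount rate is enough to flip the choice toward the smaller immediate reward, which is the pattern the paragraph above describes as succumbing to immediate gratification.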
Self-regulatory resource depletion
Self-regulation, like many other cognitive faculties, is subject
to fatigue. One of the more influential theories to
emerge from this research is that self-regulation draws
on a common domain-general resource, so that, for example,
regulating one's emotions over an extended period of
time impairs subsequent attempts at resisting the temptation
to eat appetizing foods and results in disinhibited
eating [43]. Baumeister and Heatherton proposed a
strength model of self-regulation in which it was hypothesized
that the ability to effectively regulate behavior
depends on a limited resource that is consumed by effortful
attempts at self-regulation [5]. This model also
posited that self-regulatory capacity can be built up
through practice and training (Box 1).
Since its formulation there has been a tremendous surge
in research supporting the notion that self-regulation
relies on a limited resource. Studies of self-regulatory
resource depletion have demonstrated that self-regulatory
resources can be depleted by a wide range of activities,
from suppressing thoughts [44] and inhibiting emotions
[43] to managing the impressions we make [45] and en-
gaging in interracial interactions [46]. A recent meta-anal-
ysis of 83 studies of self-regulatory depletion concluded
that the limited resource account of self-regulation
remains the best explanation for this effect [10]. More
recently, it has been suggested that self-regulation relies
on adequate levels of circulating blood glucose that are
temporarily reduced by tasks that require effortful self-
regulation (Box 2).
Functional neuroimaging studies of self-regulation
Functional neuroimaging studies of self-regulation and its
failures suggest that self-regulation involves a balance
between brain regions representing the reward, salience
and emotional value of a stimulus and prefrontal regions
Box 1. Can self-regulatory capacity be increased?
In addition to postulating that self-regulation relies on a limited
domain-general resource, the limited resource account of self-
regulatory failure [5] also predicted that self-regulatory capacity
could be increased through practice or training. In the first study to
examine the effect of self-regulatory training, participants engaged
in a variety of daily tasks that required exertion of small amounts of
self-control (e.g. remembering to maintain good posture). Com-
pared to control participants, those who engaged in modest
amounts of daily self-control were more resistant to the effects of
self-regulatory depletion [100]. In addition, it has been shown that
simple self-control regimens, such as using the non-dominant hand
for daily activities, can reduce the depleting effects of suppressing
stereotypes [101]. More recently, these results have been extended
to health behaviors such as smoking cessation. Engaging in simple
daily self-control exercises (e.g. avoiding unhealthy foods) before
stopping smoking led to increased abstinence rates at follow-up for
those who practiced self-control compared to a control group that
did not [102]. These findings support the notion that self-regulatory
strength can be increased through practice and that once increased,
this newfound capacity to self-regulate can be used not only for
comparatively banal tasks such as maintaining posture or using
one's non-dominant hand, but also for behaviors with important
health consequences such as resisting the temptation to smoke.
If self-regulatory capacity can be increased through simple self-
control exercises over relatively short periods of time, what about
people whose profession requires constant self-regulation (e.g.
professional musicians, air traffic controllers)? The study of self-
regulatory capacity in such populations has remained largely
unexplored; however, related research has shown that a relation-
ship exists between musical training and grey matter in the
dorsolateral prefrontal cortex [103], a brain region that has been
implicated in both working memory and self-control [3].
Box 2. Self-regulatory resource depletion and blood glucose
One issue with the limited resource model of self-regulation has
been the lack of biological specificity in identifying the actual
resource that is depleted by acts of self-control. It has recently been
suggested that self-regulation relies on circulating blood glucose
[104]. In a series of experiments, Gailliot and colleagues demon-
strated that engaging in effortful self-control reduces blood glucose
levels [105]. Moreover, they also found that artificially raising blood
glucose levels eliminates the effects of self-regulatory depletion
[105,106].
Although the notion that glucose metabolism affects self-regula-
tion is recent, the impact of glucose on cognitive performance has
been known for some time. For example, studies conducted in the
1990s showed that administering glucose improves performance
on memory tasks and on tasks requiring response inhibition [107]. In
many respects this should come as no surprise, because glucose
metabolism is the primary contrast in functional neuroimaging with
positron emission tomography (PET), which, among numerous
other findings, has demonstrated that glucose metabolism in-
creases with task difficulty [108]. In light of this research, it seems
plausible that self-regulatory failure following resource depletion is
at least partly due to a temporary reduction in brain glucose stores.
Finally, self-regulation relies primarily on cognitive functions that
are ascribed to the prefrontal cortex, so depletion effects should
presumably be greatest when both the depleting task and the
subsequent self-regulation task recruit the same region of the brain.
Although this has yet to be tested, PET neuroimaging, with its ability
to directly measure glucose metabolism, is an ideal method for
investigating the link between focal glucose depletion in the brain
and subsequent impairments in self-regulation.
associated with self-control. When this balance tips in
favor of bottom-up impulses, either because of a failure
to engage prefrontal control areas or because of an espe-
cially strong impulse (e.g. the sight and smell of cigarettes
for an abstinent smoker), then the likelihood of self-regu-
latory failure increases (Figure 2).
Regulation of appetitive behaviors
A universal feature of rewards, including drugs of abuse, is
that they activate dopamine receptors in the mesolimbic
dopamine system, especially the nucleus accumbens
(NAcc) in the ventral striatum [47–49]. Functional neuroimaging
studies have shown that the ingestion of drugs
similarly increases activity in NAcc [50]. Earlier we noted
that cue exposure is associated with self-regulation failure.
Neuroimaging studies reveal a plausible mechanism for
such effects. When addicted individuals are exposed to
visual cues that have become associated with drugs (e.g.
images of drugs and drug paraphernalia), they also show
cue-related activity in the mesolimbic reward system [51–53]
and the insula [54]. Likewise, in neuroeconomic studies
of decision-making, activity in mesolimbic reward struc-
tures is associated with choosing immediate monetary
rewards [55,56]. Indeed, dopamine agonists increase im-
pulsive behavior in intertemporal choice tasks [57]. Hence,
exposure to cues activates reward regions, probably be-
cause of learned expectancies that the observed stimulus
will be consumed and provide genuine reward. That is,
over the course of human evolution, food-relevant stimuli,
for example, were usually real and edible rather than mere
visual representations. Thus, cue exposure motivates people
to seek out relevant rewards. Interestingly, cue
reactivity might influence motivation outside
of conscious awareness [24,37,38,54]. Indeed, Childress
and colleagues found that unseen cocaine stimuli
(presented for 33 ms and then backward masked) produced
striatal activity in cocaine addicts [58]. This supports the
proposition that implicit cognition might be important in
part because people are unaware that such unconscious
processes are shaping their behavior and are therefore
unable to resist their influence [59].
Of particular interest is what happens when partici-
pants attempt to regulate their responses to reward cues
such as those representing money, food or drugs. When
cocaine users [60] or smokers [61,62] are instructed to
inhibit craving, they show increased activity in regions
of the prefrontal cortex (PFC) associated with self-control
and reduced cue-reactivity in regions associated with reward
processing. Specifically, Volkow and colleagues
showed that when cocaine users inhibit their craving in
response to cocaine cues, they show reduced activity in the
orbitofrontal cortex and ventral striatum [60]. Moreover,
the magnitude of this reduction is correlated with an
increase in activity in lateral PFC [60]. Similarly, in smo-
kers, activity in the dorsolateral PFC during regulation of
smoking craving correlated with reduced activity in the
ventral striatum to smoking cues and this relationship
mediated reductions in self-reported craving [61]. This
effect is also observed in healthy participants who are
instructed to regulate their response to cues representing
monetary rewards; regulation of their response to reward
cues results in decreased cue-related activity in the ventral
striatum [63]. Finally, a recent study extended the above
findings by demonstrating that individual differences in
activity in the lateral PFC during a simple inhibition task
were associated with real-world reductions in cigarette
craving and consumption among smokers over a 3-week
period [64].
The above studies indicate that regulation of craving
requires top-down control of brain reward systems by PFC
control regions [60,61,63]. But what happens when self-
control breaks down? As mentioned previously, one com-
mon reason why self-regulation fails is lapse-activated
consumption, such as when dieters break their diet and
temporarily engage in disinhibited eating [20,27,65,66].
One possible mechanism for this paradoxical pattern is
that the initial intake of the food serves as a hedonic prime,
and thereby brain regions involved in reward (i.e. NAcc)
are freed from the regulatory influence of PFC, subsequently
demonstrating a heightened response to appetizing
food. A recent study tested this proposition by
examining the effect of breaking a diet on neural
cue-reactivity to appetizing foods in dieters [67]. Compared
to both non-dieters and dieters whose diet remained intact,
those who had their diet broken showed increased cue-
reactivity to appetizing foods in the NAcc (Figure 1b),
which echoes the behavioral findings of Herman and
Mack [27]. Interestingly, non-dieters showed the opposite
result; the NAcc showed the greatest response in the water
condition, when subjects might have been hungry, but not
in the milkshake condition, when participants were satiated.
Thus, exposure to relevant cues or ingestion of forbidden
substances heightens subcortical activity in reward
regions, thereby tipping the balance so that frontal
mechanisms seem to have less power over behavior.
[Figure 2. Schematic. Threats to self-regulation (cue exposure, lapse-activated consumption, negative mood, resource depletion, alcohol consumption, prefrontal brain damage) act on the lateral PFC and on the NAcc and amygdala: impulses overwhelm prefrontal control, PFC function is impaired, or the prefrontal–subcortical circuit is broken, leading to self-regulatory failure.]
Figure 2. Schematic of a balance model of self-regulation and its failure, highlighting the four threats to self-regulation identified in the text and their putative impact on
brain areas involved in self-regulation. This model suggests that self-regulatory failure occurs whenever the balance is tipped in favor of subcortical regions involved in
reward and emotion, either due to the strength of an impulse or due to a failure to appropriately engage top-down control mechanisms.
Self-regulation failure also occurs when frontal execu-
tive functions are compromised, such as following alcohol
consumption [68] or injury [3]. For instance, patients with
frontal lobe damage show a preference for immediate
rewards in intertemporal choice tasks [69]. Likewise, tran-
scranial magnetic stimulation to lateral PFC increases
choices of immediate over delayed rewards [70]. It is
plausible that negative mood and resource depletion inter-
fere with self-regulation because they disrupt frontal con-
trol, thereby tipping the balance. We noted above that
negative emotional states are associated with self-regula-
tion failure, possibly because they interfere with higher-
order representations, such as those involved in self-
awareness and insight. Sinha and colleagues found that
recall of personally distressing episodes led to decreased
activity in PFC and increased activity in ventral striatal
regions [71], which supports the idea that stress tips the
balance to favor subcortical structures.
Regulation of emotions
Paralleling studies of appetitive regulation, research on
emotion regulation has converged on a top-down model
whereby neural responses to emotional material in the
amygdala and associated limbic regions are downregulated
by the lateral PFC [72–74]. Analogous to the cue-reactivity
research outlined above, a frequent finding in studies of
emotion regulation is of an inverse relationship between
activity in the lateral PFC and the amygdala, a limbic
structure sensitive to emotionally arousing stimuli [74–78].
For instance, Wager and colleagues found that two
independent pathways mediate frontal regulation of emotion:
a frontal–striatal pathway is associated with successful
regulation whereas a frontal–amygdala pathway is
associated with less successful regulation [79]. Likewise,
Schardt et al. found that increased functional coupling
between lateral PFC and amygdala was associated with
successful emotion regulation for those with genotypes
associated with hyper-responsivity to negative stimuli
[80].
Research on patients with mood disorders has demon-
strated that the reciprocal relationship between PFC and
amygdala during emotion regulation breaks down in
patients suffering from major depressive disorder and
borderline personality disorder (BPD) [75,81,82]. Recent
studies suggest that this prefrontal–amygdala circuit
might be related to differences in brain structure and
connectivity. For instance, in contrast to controls, partici-
pants with BPD showed no coupling of metabolism be-
tween the medial PFC and the amygdala [83]. Similarly,
reductions in white matter connectivity between the me-
dial PFC and the amygdala, as measured with diffusion
tensor imaging, were found for individuals with high anxi-
ety [84]. In the non-clinical population, it has been shown
that prolonged sleep deprivation leads to increased amyg-
dala response to aversive images [85].
Regulation of attitudes and prejudice
Social psychological models of person categorization sug-
gest that stereotypes are automatically activated on en-
countering outgroup members and that active inhibition is
required to suppress stereotypes and thereby avoid preju-
dicial behavior [86,87]. Functional neuroimaging research
on race perception has largely corroborated these models
by showing evidence of top-down regulation of the amyg-
dala by the lateral PFC when viewing members of a racial
outgroup [88,89]. Echoing the findings on the regulation of
craving and emotions outlined above, activity in the lateral
PFC was found to be inversely correlated with amygdala
activity to racial outgroup members (i.e. African Ameri-
cans) when viewing faces [88] and when assigning a verbal
label to faces [89].
Further evidence that the recruitment of lateral PFC
observed in these studies reflects self-regulatory processes
comes from a study by Richeson and colleagues that combined
functional neuroimaging with a behavioral measure
of self-regulatory resource depletion [90]. Activity in the
PFC (specifically lateral PFC and anterior cingulate cortex)
when viewing black versus white faces was correlated
with the degree to which participants experienced self-
regulatory resource depletion in a separate behavioral
experiment in which they were required to discuss a racially
charged topic with a black confederate [90]. Put differently,
the degree to which participants found the inter-racial
interaction cognitively depleting was associated with increased
activity in lateral prefrontal regions when viewing
black versus white faces during fMRI. Taken together,
these findings suggest that, as with emotions and drug
cues, regulation of attitudes towards outgroup members
requires downregulation of the amygdala by the PFC.
Prefrontal–subcortical balance model of self-regulation
A longstanding idea in psychology is that resisting temptations
reflects competition between impulses and self-control
[2,5,40]. More recently, such dual-system models have
received support from imaging research, with substantial
evidence of frontal–subcortical connectivity and reciprocal
activity [15,49,60,91–94]. Neuroscientific models of emotion
regulation and self-control in drug addiction share
conceptual similarities. For instance, models of drug addiction
posit that brain reward systems are hypersensitized
to drug cues and become uncoupled from PFC regions
involved in top-down regulation [95,96]. Likewise, neuroeconomic
studies of decision-making find that PFC activity
is associated with long-term outcomes, whereas subcortical
activity is associated with more immediate outcomes [97].
Similarly, models of emotion regulation and stereotype
suppression suggest that prefrontal regions are involved
in actively regulating emotion or prejudicial attitudes
based on the observation of an inverse relationship be-
tween PFC and activity in the amygdala [77,88,89]. Stud-
ies of patients with anxiety and mood disorders offer
similar evidence in the form of reduced functional [75]
and structural [84] connectivity between the PFC and
the amygdala. Similarly, alcohol consumption, which is
known to disrupt self-regulation, shifts activity from the
PFC to subcortical limbic structures [98], whereas exces-
sive alcohol use leads to degeneration in cortical areas
important for controlling impulsivity [68], which might
serve to further undermine attempts to control impulses
among alcoholics. During development, when frontal exec-
utive functions are still maturing, subcortical structures
might more easily tip the balance and overwhelm self-
regulatory resources, thereby explaining why adolescents
might be prone to heightened emotionality and risk-taking
[15].
What these different models have in common is the
notion that during successful self-regulation, there is a
balance between prefrontal regions involved in self-control
and subcortical regions involved in representing reward
incentives, emotions or attitudes. We propose that the
precise subcortical target of top-down control is dependent
on the regulatory context that individuals find themselves
in: when a person regulates their food intake, this involves a
prefrontal–striatal circuit, and when this same person later
regulates their emotions, they instead invoke a prefrontal–amygdala
circuit. From this perspective, the nature of self-
regulation is constant across different types of regulation,
despite variability in the neural regions that are being
regulated [49]. Indeed, a recent review of self-control across
six different domains found that lateral PFC is involved in
exerting control regardless of the specific domain [99]. This
supports our conjecture that the mechanism for self-regulation
is domain-general, whereas the subcortical region
involved varies depending on the nature of the stimulus,
which might explain why the effects of resource depletion
are not tied to any one self-regulatory domain.
Why do people fail at self-regulation?
Giving in to temptations can occur for a variety of reasons;
for instance, dieters attempting to control their food intake
might find it easy to ignore most foods, but when
confronted with their favorite dessert their craving can
overpower their resolve. Similarly, bad moods or compet-
ing regulatory demands can all conspire to break the hold
people have over their impulses and desires. From the
perspective of the prefrontal–subcortical balance model
outlined above, anything that tips the balance in favor of
subcortical regions can lead to self-regulatory collapse.
This can occur in a bottom-up manner when people are
confronted with especially potent cues, such as a favorite
food, a free drink or a strong emotion, and in a top-down
manner, such as when prefrontal functioning is impaired
either when self-regulatory resources are depleted or due
to drugs, alcohol or brain damage [3]. Therefore, for suc-
cessful self-regulation, current self-regulatory ability
must withstand the strength of an impulse. On this point,
researchers have generally neglected to consider the situational
factors that influence the balance between activity
in subcortical regions and the PFC in self-regulation
failure (Box 3). Our review suggests that some classic
self-regulatory failures occur because of their influence
on reward (i.e. cue reactivity and lapse-activated consumption)
whereas others occur because of their influence
on PFC (i.e. negative moods, self-regulatory depletion,
physiological disruption or damage of PFC).
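The logic of the balance model in this section can be written down as a toy comparison: regulation holds while available top-down control exceeds the strength of the impulse, and a threat can act on either side. The sketch below is only an illustration of that reasoning, not a model proposed in the review; the class, the function names, the arbitrary units and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegulatoryState:
    """Toy state; both values are in arbitrary, hypothetical units."""
    impulse_strength: float   # bottom-up drive (e.g. reward or emotion response)
    control_capacity: float   # top-down control currently available


def apply_threat(state: RegulatoryState, impulse_boost: float = 0.0,
                 control_loss: float = 0.0) -> RegulatoryState:
    """Return a new state after a threat acts on one or both sides."""
    return RegulatoryState(
        impulse_strength=state.impulse_strength + impulse_boost,
        # Control capacity cannot drop below zero in this toy model.
        control_capacity=max(0.0, state.control_capacity - control_loss),
    )


def regulation_fails(state: RegulatoryState) -> bool:
    """Failure is modeled as the impulse exceeding available control."""
    return state.impulse_strength > state.control_capacity


if __name__ == "__main__":
    baseline = RegulatoryState(impulse_strength=1.0, control_capacity=2.0)
    potent_cue = apply_threat(baseline, impulse_boost=1.5)   # bottom-up route
    depleted = apply_threat(baseline, control_loss=1.2)      # top-down route
    for label, s in [("baseline", baseline), ("potent cue", potent_cue),
                     ("depleted", depleted)]:
        print(label, regulation_fails(s))
```

In this sketch a potent cue and a depleted control capacity both tip the same comparison, which mirrors the claim that failures can arise through either the reward side or the PFC side of the balance.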
We also note that self-regulatory failure depends on the
individual. That is, the particular domain a person tries to
control is the one that is most prone to self-regulation
failure. For example, self-regulatory resource depletion
might lead an abstinent smoker to turn to cigarettes, a
dieter to high-calorie foods or a prejudiced individual to
make bigoted remarks; although the outcome is different in
each case and the underlying subcortical regions involved
can even differ (i.e. striatum or amygdala), the overall
process is probably the same.
Concluding remarks
In this review we highlighted a number of threats to self-
regulation, from negative mood and potent appetitive cues
to lapse-activated consumption and self-regulatory re-
source depletion. Neuroimaging research on self-regulato-
ry failure is still in its infancy. Recently, a small number of
studies of drug addicts, patients and healthy individuals
have shed light on the neural mechanisms underlying self-
regulatory failure. This research corroborates theoretical
models of self-control in which the PFC is involved in
actively regulating subcortical responses to emotions
and appetitive cues. This prefrontal–subcortical balance
model emphasizes that self-regulatory collapse can occur
because of both insufficient top-down control and overwhelming
bottom-up impulses.
Acknowledgments
We thank Bill Kelley and Paul Whalen for helpful discussions in
developing this model. This work was supported by NIH grant
R01DA022582.
References
1 Baumeister, R.F. et al. (1994) Losing Control: How and Why People
Fail at Self-Regulation, Academic Press
2 Hofmann, W. et al. (2009) Impulse and self-control from a dual-
systems perspective. Perspect. Psychol. Sci. 4, 162–176
3 Wagner, D.D. and Heatherton, T.F. (2010) Giving in to temptation:
the emerging cognitive neuroscience of self-regulatory failure, In
Handbook of Self-Regulation: Research, Theory, and Applications
(2nd edn) (Vohs, K.D. and Baumeister, R.F., eds), pp. 41–63,
Guilford Press
Box 3. Outstanding questions
Are individual differences in susceptibility to self-regulatory
failure related to prefrontal–subcortical connectivity or the
integrity of frontal circuitry?
Can direct measurements of brain glucose levels with FDG PET be
used to test the glucose model of resource depletion?
Does self-regulatory training alter brain connectivity and morphometry
and do these changes predict greater self-regulatory
success?
Are patients with prefrontal damage, or adults with age-related
cognitive decline, more susceptible to external cues such as
appetizing foods or the sight and smell of cigarettes?
Does the frontal–subcortical reciprocal relation change during
childhood development or during aging or as a function of
substance use?
4 Heatherton, T.F. (2011) Self and identity: neuroscience of self and self-
regulation. Annu. Rev. Psychol. 62, 363–390
5 Baumeister, R.F. and Heatherton, T.F. (1996) Self-regulation failure:
an overview. Psychol. Inq. 7, 1–15
6 Schroeder, S.A. (2007) We can do better – improving the health of the
American people. New Engl. J. Med. 357, 1221–1228
7 Tangney, J.P. et al. (2004) High self-control predicts good adjustment,
less pathology, better grades, and interpersonal success. J. Pers. 72,
271–324
8 Duckworth, A.L. and Seligman, M.E. (2005) Self-discipline outdoes IQ
in predicting academic performance of adolescents. Psychol. Sci. 16,
939–944
9 Quinn, P.D. and Fromme, K. (2010) Self-regulation as a protective
factor against risky drinking and sexual behavior. Psychol. Addict.
Behav. 24, 376–385
10 Hagger, M.S. et al. (2010) Ego depletion and the strength model of self-
control: a meta-analysis. Psychol. Bull. 136, 495–525
11 Marlatt, G.A. and Gordon, J.R. (1985) Relapse Prevention:
Maintenance Strategies in the Treatment of Addictive Behaviors,
Guilford Press
12 Sinha, R. (2009) Modeling stress and drug craving in the laboratory:
implications for addiction treatment development. Addict. Biol. 14,
84–98
13 Anderson, C.A. and Bushman, B.J. (2002) Human aggression. Annu.
Rev. Psychol. 53, 27–51
14 Bruyneel, S.D. et al. (2009) I felt low and my purse feels light:
depleting mood regulation attempts affect risk decision making. J.
Behav. Decis. Making 22, 153–170
15 Somerville, L.H. et al. (2010) A time of change: behavioral and neural
correlates of adolescent sensitivity to appetitive and aversive
environmental cues. Brain Cogn. 72, 124–133
16 Bousman, C.A. et al. (2009) Negative mood and sexual behavior
among non-monogamous men who have sex with men in the
context of methamphetamine and HIV. J. Affect. Disord. 119, 84–91
17 Magid, V. et al. (2009) Negative affect, stress, and smoking in college
students: unique associations independent of alcohol and marijuana
use. Addict. Behav. 34, 973–975
18 Sinha, R. (2007) The role of stress in addiction relapse. Curr.
Psychiatry Rep. 9, 388–395
19 Witkiewitz, K. and Villarroel, N.A. (2009) Dynamic association
between negative affect and alcohol lapses following alcohol
treatment. J. Consult. Clin. Psychol. 77, 633–644
20 Heatherton, T.F. et al. (1991) Effects of physical threat and ego threat
on eating behavior. J. Pers. Soc. Psychol. 60, 138–143
21 Macht, M. (2008) How emotions affect eating: a ve-way model.
Appetite 50, 111
22 McKee, S. et al. (2010) Stress decreases the ability to resist smoking
and potentiates smoking intensity and reward. J. Psychopharmacol.
DOI: 10.1177/0269881110376694
23 Heatherton, T.F. and Baumeister, R.F. (1991) Binge eating as escape
from self-awareness. Psychol. Bull. 110, 86108
24 Goldstein, R.Z. et al. (2009) The neurocircuitry of impaired insight in
drug addiction. Trends Cogn. Sci. 13, 372380
25 Ward, A. and Mann, T. (2000) Dont mind if I do: disinhibited eating
under cognitive load. J. Pers. Soc. Psychol. 78, 753763
26 Sinha, R. (2008) Chronic stress, drug use, and vulnerability to
addiction. Ann. N.Y. Acad. Sci. 1141, 105130
27 Herman, C.P. and Mack, D. (1975) Restrained and unrestrained
eating. J. Pers. 43, 647660
28 Herman, C.P. and Polivy, J. (2010) The self-regulation of eating:
theoretical and practical problems, In Handbook of Self-
Regulation: Research, Theory, and Applications (2nd edn) (Vohs,
K.D. and Baumeister, R.F., eds), pp. 492508, Guilford Press
29 Marlatt, G.A. et al. (2009) Relapse prevention: evidence base and
future directions. In Evidence-Based Addiction Treatment (1st edn)
(Miller, P.M., ed.), pp. 215–232, Elsevier/Academic Press
30 Drummond, D.C. et al. (1990) Conditioned learning in alcohol
dependence: implications for cue exposure treatment. Br. J. Addict.
85, 725–743
31 Glautier, S. and Drummond, D.C. (1994) Alcohol dependence and cue
reactivity. J. Stud. Alcohol. 55, 224–229
32 Jansen, A. (1998) A learning model of binge eating: cue reactivity and
cue exposure. Behav. Res. Ther. 36, 257–272
33 Stewart, J. et al. (1984) Role of unconditioned and conditioned drug
effects in the self-administration of opiates and stimulants. Psychol.
Rev. 91, 251–268
34 Drobes, D.J. and Tiffany, S.T. (1997) Induction of smoking urge
through imaginal and in vivo procedures: physiological and self-
report manifestations. J. Abnorm. Psychol. 106, 15–25
35 Payne, T.J. et al. (2006) Pretreatment cue reactivity predicts end-of-
treatment smoking. Addict. Behav. 31, 702–710
36 Ferguson, M.J. and Bargh, J.A. (2004) How social perception can
automatically influence behavior. Trends Cogn. Sci. 8, 33–39
37 Stacy, A.W. and Wiers, R.W. (2010) Implicit cognition and addiction: a
tool for explaining paradoxical behavior. Annu. Rev. Clin. Psychol. 6,
551–575
38 Bargh, J.A. and Morsella, E. (2008) The unconscious mind. Perspect.
Psychol. Sci. 3, 73–79
39 Rooke, S.E. et al. (2008) Implicit cognition and substance use: a meta-
analysis. Addict. Behav. 33, 1314–1328
40 Metcalfe, J. and Mischel, W. (1999) A hot/cool-system analysis of delay
of gratification: dynamics of willpower. Psychol. Rev. 106, 3–19
41 Mischel, W. et al. (2010) Willpower over the life span: mechanisms,
consequences, and implications. Soc. Cogn. Affect. Neurosci. DOI:
10.1093/scan/nsq081
42 Bickel, W.K. and Marsch, L.A. (2001) Toward a behavioral economic
understanding of drug dependence: delay discounting processes.
Addiction 96, 73–86
43 Vohs, K.D. and Heatherton, T.F. (2000) Self-regulatory failure: a
resource-depletion approach. Psychol. Sci. 11, 249–254
44 Muraven, M. et al. (2002) Self-control and alcohol restraint: an initial
application of the self-control strength model. Psychol. Addict. Behav.
16, 113–120
45 Vohs, K.D. et al. (2005) Self-regulation and self-presentation:
regulatory resource depletion impairs impression management and
effortful self-presentation depletes regulatory resources. J. Pers. Soc.
Psychol. 88, 632–657
46 Richeson, J.A. and Shelton, J.N. (2003) When prejudice does not pay:
effects of interracial contact on executive function. Psychol. Sci. 14,
287–290
47 Baler, R.D. and Volkow, N.D. (2006) Drug addiction: the neurobiology
of disrupted self-control. Trends Mol. Med. 12, 559–566
48 Robinson, T.E. and Berridge, K.C. (2003) Addiction. Annu. Rev.
Psychol. 54, 25–53
49 Volkow, N.D. et al. (2008) Overlapping neuronal circuits in addiction
and obesity: evidence of systems pathology. Philos. Trans. R. Soc.
Lond. B: Biol. Sci. 363, 3191–3200
50 O'Doherty, J.P. et al. (2003) Temporal difference models and reward-
related learning in the human brain. Neuron 38, 329–337
51 Garavan, H. et al. (2000) Cue-induced cocaine craving:
neuroanatomical specificity for drug users and drug stimuli. Am. J.
Psychiatry 157, 1789–1798
52 Grant, S. et al. (1996) Activation of memory circuits during cue-
elicited cocaine craving. Proc. Natl. Acad. Sci. U.S.A. 93, 12040–12045
53 Myrick, H. et al. (2008) Effect of naltrexone and ondansetron on
alcohol cue-induced activation of the ventral striatum in alcohol-
dependent people. Arch. Gen. Psychiatry 65, 466–475
54 Naqvi, N.H. and Bechara, A. (2009) The hidden island of addiction: the
insula. Trends Neurosci. 32, 56–67
55 Diekhof, E.K. and Gruber, O. (2010) When desires collide with reason:
functional interactions between anteroventral prefrontal cortex and
nucleus accumbens underlie the human ability to resist impulsive
desires. J. Neurosci. 30, 1488–1493
56 McClure, S.M. et al. (2004) Separate neural systems value immediate
and delayed monetary rewards. Science 306, 503–507
57 Pine, A. et al. (2010) Dopamine, time, and impulsivity in humans. J.
Neurosci. 30, 8888–8896
58 Childress, A.R. et al. (2008) Prelude to passion: limbic activation by
unseen drug and sexual cues. PLoS ONE 3, e1506
59 Wagner, D.D. et al. (2011) Spontaneous action representation in
smokers watching movie smoking. J. Neurosci. 31, 894–898
60 Volkow, N.D. et al. (2010) Cognitive control of drug craving inhibits
brain reward regions in cocaine abusers. Neuroimage 49, 2536–2543
61 Kober, H. et al. (2010) Prefrontal-striatal pathway underlies cognitive
regulation of craving. Proc. Natl. Acad. Sci. U.S.A. 107, 14811–14816
62 Brody, A.L. et al. (2007) Neural substrates of resisting craving during
cigarette cue exposure. Biol. Psychiatry 62, 642–651
63 Delgado, M.R. et al. (2008) Regulating the expectation of reward via
cognitive strategies. Nat. Neurosci. 11, 880–881
64 Berkman, E.T. et al. In the trenches of real-world self-control: neural
correlates of breaking the link between craving and smoking. Psychol.
Sci. (in press)
65 Heatherton, T.F. et al. (1992) Effects of distress on eating: the
importance of ego-involvement. J. Pers. Soc. Psychol. 62, 801–803
66 Heatherton, T.F. et al. (1993) Self-awareness, task failure, and
disinhibition: how attentional focus affects eating. J. Pers. 61, 49–61
67 Demos, K.E. et al. (2011) Dietary restraint violations influence reward
responses in nucleus accumbens and amygdala. J. Cogn. Neurosci.
DOI: 10.1162/jocn.2010.21568
68 Crews, F.T. and Boettiger, C.A. (2009) Impulsivity, frontal lobes and
risk for addiction. Pharmacol. Biochem. Behav. 93, 237–247
69 Sellitto, M., Ciaramelli, E. and di Pellegrino, G. (2010) Myopic
discounting of future rewards after medial orbitofrontal damage in
humans. J. Neurosci. 30, 16429–16436
70 Figner, B. et al. (2010) Lateral prefrontal cortex and self-control in
intertemporal choice. Nat. Neurosci. 13, 538–539
71 Sinha, R. et al. (2005) Neural activity associated with stress-induced
cocaine craving: a functional magnetic resonance imaging study.
Psychopharmacology 183, 171–180
72 Davidson, R.J. et al. (2000) Dysfunction in the neural circuitry of
emotion regulation: a possible prelude to violence. Science 289, 591–594
73 Ochsner, K.N. and Gross, J.J. (2005) The cognitive control of emotion.
Trends Cogn. Sci. 9, 242–249
74 Hariri, A.R. et al. (2003) Neocortical modulation of the amygdala
response to fearful stimuli. Biol. Psychiatry 53, 494–501
75 Johnstone, T. et al. (2007) Failure to regulate: counterproductive
recruitment of top-down prefrontal–subcortical circuitry in major
depression. J. Neurosci. 27, 8877–8884
76 Ochsner, K.N. et al. (2002) Rethinking feelings: an FMRI study of the
cognitive regulation of emotion. J. Cogn. Neurosci. 14, 1215–1229
77 Ochsner, K.N. et al. (2004) For better or for worse: neural systems
supporting the cognitive down- and up-regulation of negative
emotion. Neuroimage 23, 483–499
78 Urry, H.L. et al. (2006) Amygdala and ventromedial prefrontal cortex
are inversely coupled during regulation of negative affect and predict
the diurnal pattern of cortisol secretion among older adults. J.
Neurosci. 26, 4415–4425
79 Wager, T.D. et al. (2008) Prefrontal–subcortical pathways mediating
successful emotion regulation. Neuron 59, 1037–1050
80 Schardt, D.M. et al. (2010) Volition diminishes genetically mediated
amygdala hyperreactivity. Neuroimage 53, 943–951
81 Donegan, N.H. et al. (2003) Amygdala hyperreactivity in borderline
personality disorder: implications for emotional dysregulation. Biol.
Psychiatry 54, 1284–1293
82 Silbersweig, D. et al. (2007) Failure of frontolimbic inhibitory function
in the context of negative emotion in borderline personality disorder.
Am. J. Psychiatry 164, 1832–1841
83 New, A.S. et al. (2007) Amygdala–prefrontal disconnection in
borderline personality disorder. Neuropsychopharmacology 32,
1629–1640
84 Kim, M.J. and Whalen, P.J. (2009) The structural integrity of an
amygdala–prefrontal pathway predicts trait anxiety. J. Neurosci. 29,
11614–11618
85 Yoo, S.S. et al. (2007) The human emotional brain without sleep: a
prefrontal amygdala disconnect. Curr. Biol. 17, R877–R878
86 Devine, P.G. (1989) Stereotypes and prejudice: their automatic and
controlled components. J. Pers. Soc. Psychol. 56, 5–18
87 Fiske, S.T. (1998) Stereotyping, prejudice, and discrimination. In The
Handbook of Social Psychology (Vol. 2) (Gilbert, D. et al., eds), pp.
357–411, McGraw-Hill
88 Cunningham, W.A. et al. (2004) Separable neural components in the
processing of black and white faces. Psychol. Sci. 15, 806–813
89 Lieberman, M.D. et al. (2005) An fMRI investigation of race-related
amygdala activity in African-American and Caucasian-American
individuals. Nat. Neurosci. 8, 720–722
90 Richeson, J.A. et al. (2003) An fMRI investigation of the impact of
interracial contact on executive function. Nat. Neurosci. 6, 1323–1328
91 Banks, S.J. et al. (2007) Amygdala–frontal connectivity during
emotion regulation. Soc. Cogn. Affect. Neurosci. 2, 303–312
92 Batterink, L. et al. (2010) Body mass correlates inversely with
inhibitory control in response to food among adolescent girls: an
fMRI study. Neuroimage 52, 1696–1703
93 Li, C.S. and Sinha, R. (2008) Inhibitory control and emotional stress
regulation: neuroimaging evidence for frontal–limbic dysfunction in
psycho-stimulant addiction. Neurosci. Biobehav. Rev. 32, 581–597
94 MacDonald, K.B. (2008) Effortful control, explicit processing, and the
regulation of human evolved predispositions. Psychol. Rev. 115, 1012–1031
95 Bechara, A. (2005) Decision making, impulse control and loss of
willpower to resist drugs: a neurocognitive perspective. Nat.
Neurosci. 8, 1458–1463
96 Koob, G.F. and Le Moal, M. (2008) Addiction and the brain antireward
system. Annu. Rev. Psychol. 59, 29–53
97 Huettel, S.A. (2010) Ten challenges for decision neuroscience. Front.
Neurosci. 4, 1–7
98 Volkow, N.D. et al. (2008) Moderate doses of alcohol disrupt the
functional organization of the human brain. Psychiatry Res. 162,
205–213
99 Cohen, J.R. and Lieberman, M.D. (2010) The common neural basis of
exerting self-control in multiple domains. In Self Control in Society,
Mind, and Brain (Hassin, R. et al., eds), pp. 141–162, Oxford
University Press
100 Muraven, M. et al. (1999) Longitudinal improvement of self-
regulation through practice: building self-control strength through
repeated exercise. J. Soc. Psychol. 139, 446–457
101 Gailliot, M.T. et al. (2007) Increasing self-regulatory strength can
reduce the depleting effect of suppressing stereotypes. Pers. Soc.
Psychol. Bull. 33, 281–294
102 Muraven, M. (2010) Practicing self-control lowers the risk of smoking
lapse. Psychol. Addict. Behav. 24, 446–452
103 Bermudez, P. et al. (2009) Neuroanatomical correlates of
musicianship as revealed by cortical thickness and voxel-based
morphometry. Cereb. Cortex 19, 1583–1596
104 Gailliot, M.T. and Baumeister, R.F. (2007) The physiology of
willpower: linking blood glucose to self-control. Pers. Soc. Psychol.
Rev. 11, 303–327
105 Gailliot, M.T. et al. (2007) Self-control relies on glucose as a limited
energy source: willpower is more than a metaphor. J. Pers. Soc.
Psychol. 92, 325–336
106 Gailliot, M.T. et al. (2009) Stereotypes and prejudice in the blood:
sucrose drinks reduce prejudice and stereotyping. J. Exp. Soc.
Psychol. 45, 288–290
107 Benton, D. et al. (1994) Blood glucose influences memory and
attention in young adults. Neuropsychologia 32, 595–607
108 Jonides, J. et al. (1997) Verbal working memory load affects regional
brain activation as measured by PET. J. Cogn. Neurosci. 9, 462–475