words, phrases and sentences. Notably, the neural tracking of hierarchical linguistic structures was dissociated from the encoding
of acoustic cues and from the predictability of incoming words. Our results indicate that a hierarchy of neural processing
timescales underlies grammar-based internal construction of hierarchical linguistic structure.
1Department of Psychology, New York University, New York, New York, USA. 2College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China. 3Department of Neurology, New York University Langone Medical Center, New York, New York, USA. 4Department of Neurophysiology, Max-Planck Institute for Brain Research, Frankfurt, Germany. 5Department of Psychiatry, Columbia University, New York, New York, USA. 6Department of Psychology and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China. 7PKU-IDG/McGovern Institute for Brain Research, Peking University, Beijing, China. 8Peking-Tsinghua Center for Life Sciences, Beijing, China. 9New York University Shanghai, Shanghai, China. 10NYU-ECNU Institute of Brain and Cognitive Science at NYU Shanghai, Shanghai, China. 11Neuroscience Department, Max-Planck Institute for Empirical Aesthetics, Frankfurt, Germany. Correspondence should be addressed to N.D. (ding_nai@zju.edu.cn) or D.P. (david.poeppel@nyu.edu).

corresponding to larger linguistic structures such as phrases and sentences, and that the neural representation of each linguistic level corresponds to timescales matching the timescales of the respective linguistic level.

Although linguistic structure building can clearly benefit from prosodic20,21 or statistical cues22, it can also be achieved purely on the basis of the listeners' grammatical knowledge. To experimentally isolate the neural representation of the internally constructed hierarchical linguistic structure, we developed new speech materials in which the linguistic constituent structure was dissociated from prosodic or statistical cues. By manipulating the levels of linguistic abstraction, we found separable neural encoding of each different linguistic level.

Cortical activity was recorded from native listeners of Mandarin Chinese using magnetoencephalography (MEG). Given that different linguistic levels, that is, the monosyllabic morphemes, phrases and sentences, were presented at unique and constant rates, the hypothesized neural tracking of hierarchical linguistic structure was tagged at distinct frequencies.

The MEG response was analyzed in the frequency domain, and we extracted response power in every frequency bin using an optimal spatial filter (Online Methods). Consistent with our hypothesis, the response spectrum showed three peaks, at the syllabic rate (P = 1.4 × 10⁻⁵, paired one-sided t test, false discovery rate (FDR) corrected), the phrasal rate (P = 1.6 × 10⁻⁴, paired one-sided t test, FDR corrected) and the sentential rate (P = 9.6 × 10⁻⁷, paired one-sided t test, FDR
corrected), and the response was highly consistent across listeners (Fig. 1c). Given that the phrasal- and sentential-rate rhythms were not conveyed by acoustic fluctuations at the corresponding frequencies (Fig. 1b), cortical responses at the phrasal and sentential rates must be a consequence of internal online structure building processes.

Cortical activity at all three peak frequencies was seen bilaterally (Fig. 1c). The response power averaged over sensors in each hemisphere was significantly stronger in the left hemisphere at the sentential rate (P = 0.014, paired two-sided t test), but not at the phrasal (P = 0.20, paired two-sided t test) or syllabic rates (P = 0.40, paired two-sided t test).

Figure 1 (partial caption): the neural response spectrum shows peaks at f_sentence, f_phrase and f_syllable (asterisks: paired one-sided t test, FDR corrected); topographical maps of response power across sensors are shown for the peak frequencies.

2016 Nature America, Inc. All rights reserved.
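The frequency-tagging logic above, in which syllables, phrases and sentences presented at fixed rates produce response-spectrum peaks at exactly those rates, can be illustrated with a small simulation. This is a toy sketch rather than the authors' analysis pipeline; the sampling rate, component amplitudes, noise level and the 4/2/1 Hz rate assignment are illustrative assumptions:

```python
import numpy as np

fs = 100.0                       # sampling rate in Hz (arbitrary assumption)
t = np.arange(0, 10, 1 / fs)     # 10 s of simulated response, i.e., ten 1-s "sentences"

# Simulated neural response: components at the tagged sentential (1 Hz),
# phrasal (2 Hz) and syllabic (4 Hz) rates, plus broadband noise.
rng = np.random.default_rng(0)
response = (0.5 * np.sin(2 * np.pi * 1 * t) +
            0.5 * np.sin(2 * np.pi * 2 * t) +
            1.0 * np.sin(2 * np.pi * 4 * t) +
            0.2 * rng.standard_normal(t.size))

# Power spectrum; with a 10-s window the frequency resolution is 0.1 Hz,
# so the tagged rates fall exactly on DFT bins.
power = np.abs(np.fft.rfft(response)) ** 2
freqs = np.fft.rfftfreq(t.size, 1 / fs)

# A peak is tagged if the power in its bin exceeds the mean power of the
# neighboring bins (0.5 Hz on each side, excluding the bin itself).
tagged = {}
for f in (1.0, 2.0, 4.0):
    k = int(round(f / 0.1))
    neighbors = np.r_[power[k - 5:k], power[k + 1:k + 6]]
    tagged[f] = bool(power[k] > neighbors.mean())
print(tagged)  # → {1.0: True, 2.0: True, 4.0: True}
```

With an integer number of sentences in the record, each tagged rate lands exactly on a DFT bin; the paper's comparable test is a paired one-sided t test of each tagged bin against neighboring frequency bins, FDR corrected.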
Dependence on syntactic structures
Are the responses at the phrasal and sentential rates indeed separate neural indices of processing at distinct linguistic levels, or are they merely sub-harmonics of the syllabic rate response, generated by intrinsic cortical dynamical properties? We addressed this question by manipulating different levels of linguistic structure in the input. When the stimulus is a sequence of random syllables that preserves the acoustic properties of Chinese sentences (Fig. 1 and Supplementary Fig. 2) but eliminates the phrasal/sentential structure, only syllabic (acoustic) level tracking occurs (P = 1.1 × 10⁻⁴ at 4 Hz, paired one-sided t test, FDR corrected; Fig. 2a). Furthermore, this manipulation preserves the position of each syllable in a sentence (Online Methods) and therefore further demonstrates that the phrasal- and sentential-rate responses are not a result of possible acoustic differences between the syllables in a sentence. When two adjacent syllables and morphemes combine into verb phrases, but there is no four-element sentential structure, phrasal-level tracking emerges at half of the syllabic rate (P = 8.6 × 10⁻⁴ at 2 Hz and P = 2.7 × 10⁻⁴ at 4 Hz, paired one-sided t test, FDR corrected; Fig. 2b). Similar responses are observed for noun phrases (Supplementary Fig. 3).

To test whether the phrase-level responses segregate from the sentence level, we constructed longer verb phrases that were unevenly divided into a monosyllabic verb followed by a three-syllable noun phrase (Fig. 2c). We expected the neural responses to the long verb phrase to be tagged at 1 Hz, whereas the neural responses to the monosyllabic verb and the three-syllable noun phrase would present as harmonics of 1 Hz. Consistent with our hypothesis, cortical dynamics emerged at one-fourth of the syllabic rate, whereas the response at half of the syllabic rate was no longer detectable (P = 1.9 × 10⁻⁴, 1.7 × 10⁻⁴ and 9.3 × 10⁻⁴ at 1, 3 and 4 Hz, respectively, paired one-sided t test, FDR corrected).

Dependence on language comprehension
When listening to Chinese sentences (Fig. 1a), listeners who did not understand Chinese only showed responses to the syllabic (acoustic) rhythm (P = 3.0 × 10⁻⁵ at 4 Hz, paired one-sided t test, FDR corrected; Fig. 2d), further supporting the argument that cortical responses to larger, abstract linguistic structures are a direct consequence of language comprehension.

If aligning cortical dynamics to the time course of linguistic constituent structure is a general mechanism required for comprehension, it must apply across languages. Indeed, when native English speakers were tested with English materials (Fig. 1a), their cortical activity also followed the time course of larger linguistic structures, that is, phrases and sentences (P = 4.1 × 10⁻⁵, syllabic rate; Fig. 2e; P = 3.9 × 10⁻³, 4.3 × 10⁻³ and 6.8 × 10⁻⁶ at the sentential, phrasal and syllabic rates, respectively; Fig. 2f; paired one-sided t test, FDR corrected).

Figure 2 (partial caption): Tracking of different linguistic structures. Each panel shows the syntactic structure repeating in the stimulus (left) and the cortical response spectrum (right; shaded area indicates ±2 s.e.m. over listeners, N = 8). Panels include (a) Chinese listeners, Chinese materials; (d) Chinese materials, English listeners; (e) English materials, English listeners.

Figure 3 Dissociating sentential structures and transitional probability. (a,b) Grammar of an artificial Markovian stimulus set with constant (a) or variable (b) transitional probability. Each sentence consists of three acoustic chunks, each containing 1–2 English words. The listeners memorized the grammar before the experiments. (c) Schematic time course and spectrum of the transitional probability. (d) Neural response spectrum (shaded area covers ±2 s.e.m. over listeners, N = 8). Significant neural responses to sentences were seen for both languages. Spectral peaks are shown by an asterisk (P < 0.001, paired one-sided t test, FDR corrected, same color code as the spectrum). Responses were not significantly different between the two languages in any frequency bin (paired two-sided t test, P > 0.09, uncorrected).
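The dissociation sketched in Figure 3c can be reproduced numerically: a chunk sequence whose transitional probability is low at sentence boundaries and high inside sentences has a spectral peak at the sentential rate, whereas a constant-probability sequence has a flat spectrum. This is a toy illustration, not the published stimulus code; the sequence length is an arbitrary assumption, while the 1/25, 1 and 1/5 probabilities are taken from the figure:

```python
import numpy as np

chunks_per_sentence = 3           # three acoustic chunks per Markovian sentence
n_sentences = 40                  # toy sequence length (an arbitrary assumption)
n = chunks_per_sentence * n_sentences

# Transitional probability of each chunk given the previous one, as a time
# series over chunk positions: low (1/25) across sentence boundaries and
# high (1) within a sentence for the varying grammar; 1/5 everywhere for
# the constant grammar.
varying = np.tile([1 / 25, 1.0, 1.0], n_sentences)
constant = np.full(n, 1 / 5)

def amplitude_spectrum(x):
    # Remove the mean so only rhythmic structure contributes to the spectrum.
    return np.abs(np.fft.rfft(x - x.mean()))

# The sentential rhythm completes n_sentences cycles over the record, so it
# falls in DFT bin n_sentences (one cycle per sentence).
print(int(np.argmax(amplitude_spectrum(varying))) == n_sentences,
      amplitude_spectrum(constant).max() < 1e-9)  # → True True
```

The comparison mirrors the point of Figure 3: sentence structure can persist while chunk-to-chunk transitional probability stays flat, so a sentential-rate spectral peak in the neural response cannot be attributed to tracking transitional probability alone.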
become more predictable. Thus, cortical networks solely tracking transitional probabilities across smaller units could show temporal dynamics matching the timescale of larger structures. To test this alternative hypothesis, we crafted a constant transitional probability Markovian Sentence Set (MSS), in which the transitional probability of lower level units was dissociated from the higher level structures (Fig. 3a and Supplementary Fig. 1e,f). The constant transitional probability MSS is contrasted with a varying transitional probability MSS, in which the transitional probability is low across sentential boundaries and high within a sentence (Fig. 3b,c). If cortical activity only encodes the transitional probability between lower level units (for example, acoustic chunks in the MSS), independent of the underlying syntactic structure, it can show tracking of the sentential structure for the varying probability MSS, but not for the constant probability MSS. In contrast with this prediction, indistinguishable neural responses to sentences were observed for both MSS (Fig. 3d), demonstrating that neural tracking of sentences is not confounded by transitional probability. Specifically, for the constant transitional probability MSS, the response was statistically significant at the sentential rate, twice the sentential rate and the syllable rate (P = 1.8 × 10⁻⁴, 2.3 × 10⁻⁴ and …).

… prior knowledge of the transitional probabilities between acoustic chunks. To control for the effect of such prior knowledge, we created a set of Artificial Markovian Sentences (AMS). In the AMS, the transitional probability between syllables was the same within and across sentences (Supplementary Fig. 4a). The AMS was composed of Chinese syllables, but no meaningful Chinese expressions were embedded in the AMS sequences. As the AMS was not based on the grammar of Chinese, the listeners had to learn the AMS grammar to segment sentences. By comparing the neural responses to the AMS sequences before and after the grammar was learned, we were able to separate the effect of prior knowledge of transitional probability from the effect of grammar learning. Here, the grammar of the AMS refers to the set of rules that governs the sequencing of the AMS, that is, which syllables can follow which syllables.

The neural responses to the AMS before and after grammar learning were analyzed separately (Supplementary Fig. 4). Before learning, when the listeners were instructed that the stimulus was just a sequence of random syllables, the response showed a statistically significant peak at the syllabic rate (P = 0.0003, bootstrap), but not at the sentential rate. After the AMS grammar was learned, however, a significant response peak emerged at the sentential rate (P = 0.0001, bootstrap).
A response peak was also observed at twice the sentential rate, possibly
reflecting the second harmonic of the sentential response. This result further confirms that neural tracking of sentences is not confounded by prior knowledge of transitional probability.

Figure 4 (partial caption): (a) neural tracking of sentences of variable durations (4–8 syllables); (c) single-trial decoding of sentence duration (decoded versus actual duration, in number of syllables).

Neural tracking of sentences varying in duration and structure
These results are based on sequences of sentences that have uniform duration and syntactic structure. We next addressed whether cortical …
Figure 5 Localizing cortical sources of the sentential and phrasal rate responses (high-gamma power and low-frequency waveform). Electrodes in the right hemisphere were projected to the left hemisphere, and right hemisphere (left hemisphere) electrodes are shown by hollow (filled) circles. The figure only displays electrodes that showed statistically significant neural responses to sentences in Figure 2e and no significant response to the acoustic control shown in Figure 2f. Significance was determined by bootstrap (FDR corrected) and the significance level is 0.05. The response strength, that is, the response at the target frequency relative to the mean response averaged over a 1-Hz wide neighboring region, is color coded. Electrodes with response strength less than 10 dB are shown by smaller symbols. The sentential and phrasal rate responses were seen in bilateral pSTG, TPJ and left IFG.
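The response-strength measure in the caption (the response at the target frequency relative to the mean response over a 1-Hz-wide neighboring region, in dB) can be written compactly. A sketch with hypothetical variable names, not the published code:

```python
import numpy as np

def response_strength_db(power, freqs, f_target, width=1.0):
    """Power at f_target relative to the mean power over a width-Hz
    neighboring region (excluding the target bin itself), in dB."""
    k = int(np.argmin(np.abs(freqs - f_target)))
    mask = np.abs(freqs - freqs[k]) <= width / 2
    mask[k] = False                       # exclude the target bin itself
    return 10 * np.log10(power[k] / power[mask].mean())

# Toy spectrum: flat background of 1 with a 100x peak at 1 Hz.
freqs = np.arange(0.0, 5.0, 0.1)
power = np.ones_like(freqs)
power[10] = 100.0                         # the 1-Hz bin
print(round(response_strength_db(power, freqs, 1.0), 6))  # → 20.0
```

Under this measure, electrodes whose strength falls below 10 dB are the ones drawn with smaller symbols in the figure.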
frequency, the MEG responses were analyzed in the time domain by averaging sentences of the same duration. To focus on sentential level processing, we low-pass filtered the response at 3.5 Hz. The MEG response (root mean square, r.m.s., over all sensors) rapidly increased after a sentence boundary and continuously changed throughout the duration of a sentence (Fig. 4a). To illustrate the detailed temporal … course of a sentence rather than being a transient response only occurring at the sentence boundary.

A single-trial decoding analysis was performed to independently confirm that cortical activity tracks the duration of sentences (Fig. 4c). The decoder applied template matching to the response time course (leave-one-out cross-validation, Online Methods) and correctly determined the duration of 34.9 ± 0.6% of sentences (mean ± s.e.m. over subjects, significantly above chance, P = 1.3 × 10⁻⁷, one-sided t test).

After demonstrating cortical tracking of sentences, we further tested whether cortical activity also tracks the phrasal structure inside a sentence. We constructed sentences that consist of a noun phrase followed by a verb phrase and manipulated the duration of the noun phrase (three or four syllables). The cortical responses closely followed the duration of the noun phrase: the r.m.s. response gradually decreased during the noun phrase, then showed a transient increase after the onset of the verb phrase (Fig. 4d).
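The single-trial duration decoder described above pairs template matching with leave-one-out cross-validation. The sketch below reimplements that scheme on synthetic trials; correlation is assumed as the matching metric and all signal parameters are invented for illustration (the actual procedure is specified in the paper's Online Methods):

```python
import numpy as np

rng = np.random.default_rng(1)
durations = [4, 5, 6, 7, 8]          # sentence durations in syllables (as in Fig. 4)
n_trials, n_samples = 20, 100        # invented trial counts and trial lengths

# Synthetic single-trial responses: each duration class has its own
# characteristic time course, plus trial-to-trial noise.
clean = {d: np.sin(2 * np.pi * np.arange(n_samples) / (10 * d)) for d in durations}
trials = [(d, clean[d] + 0.5 * rng.standard_normal(n_samples))
          for d in durations for _ in range(n_trials)]

def decode(i):
    """Leave trial i out, average the remaining trials of each duration into
    templates, and return the duration whose template correlates best."""
    _, x = trials[i]
    best, best_r = None, -np.inf
    for d in durations:
        template = np.mean([r for j, (dd, r) in enumerate(trials)
                            if dd == d and j != i], axis=0)
        r = np.corrcoef(x, template)[0, 1]
        if r > best_r:
            best, best_r = d, r
    return best

accuracy = np.mean([decode(i) == trials[i][0] for i in range(len(trials))])
print(accuracy > 1 / len(durations))  # decoding well above the 20% chance level
```

Chance level here is one in five durations (20%); the paper reports 34.9 ± 0.6% correct single-trial decoding of real MEG responses, significantly above that chance level.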
Figure 7 Syllabic-rate ECoG responses to English sentences and the acoustic control (N = 5). Top, electrodes showing statistically significant syllabic-rate ECoG responses to the acoustic control, that is, shuffled sequences, which had the same acoustic and syllabic rhythm as the English sentences but contained no hierarchical linguistic structures (Fig. 2f). Significance was determined by bootstrap (FDR corrected) and the significance level is 0.05. The responses were most strongly seen in bilateral STG for both high-gamma and low-frequency activity, and in bilateral pre-motor areas for low-frequency activity. Bottom, syllabic-rate ECoG responses to English sentences. The electrodes displayed are those that showed statistically significant neural responses to sentences and no significant response to the acoustic control. The syllabic-rate responses specific to sentences were strong along bilateral STG for high-gamma activity and were widely distributed in the frontal and temporal lobes for low-frequency activity.
epilepsy patients for clinical evaluation (see Supplementary Fig. 5 for the electrode coverage), and they possess better spatial resolution than MEG. We first analyzed the power of the ECoG signal in the high gamma band (70–200 Hz), as it highly correlates with multiunit firing23. The electrodes exhibiting significant sentential, phrasal and syllabic rate fluctuations in high gamma power are shown separately (Fig. 5). The sentential rate response clustered over the posterior and middle superior temporal gyrus (pSTG), bilaterally, with a second cluster over the left inferior frontal gyrus (IFG). Phrasal rate responses were also observed over the pSTG bilaterally. Notably, although the sentential and phrasal rate responses were observed in similar cortical areas, electrodes showing phrasal rate responses only partially overlapped with electrodes showing sentential rate responses in the pSTG (Fig. 6). For electrodes showing a significant response at either the sentential rate or the phrasal rate, the strength of the sentential rate response was negatively correlated with the strength of the phrasal rate response (R = −0.32, P = 0.004, bootstrap). This phenomenon demonstrates spatially dissociable neural tracking of the sentential and phrasal structures.

Furthermore, some electrodes with a significant sentential or phrasal rate response showed no significant syllabic rate response (P < 0.05, FDR corrected; Fig. 6). In other words, there are cortical circuits specifically encoding larger, abstract linguistic structures without responding to syllabic-level acoustic features of speech.

In addition, although the syllabic responses were not significantly different (P > 0.05, FDR corrected) for English sentences and the acoustic control in the MEG results, they were dissociable spatially in the ECoG results (Fig. 7). Electrodes showing significant syllabic responses (P < 0.05, FDR corrected) to sentences, but not to the acoustic control, were seen in bilateral pSTG, bilateral anterior STG (aSTG) and left IFG.

We then analyzed neural tracking of the sentential, phrasal and syllabic rhythms in the low-frequency ECoG waveform (Fig. 5), which is a close neural correlate of MEG activity. Fourier analysis was directly applied to the ECoG waveform and the Fourier coefficients at 1, 2 and 4 Hz were extracted. Low-frequency ECoG activity is usually viewed as the dendritic input to a cortical area24. The low-frequency responses are more distributed than high-gamma activity, possibly reflecting the fact that the neural representations of different levels of linguistic structures serve as inputs to broad cortical areas. Sentential and phrasal rate responses were strong in STG, IFG and the temporoparietal junction (TPJ). Compared with the acoustic control, the syllabic-rate response to sentences was stronger in broad cortical areas, including the temporal and frontal lobes (Fig. 7). Similar to the high-gamma activity, the low-frequency responses to the sentential and phrasal structures were not reflected in the same set of electrodes (Fig. 6). For electrodes showing a significant response at either the sentential rate or the phrasal rate, the strength of the sentential rate response was also negatively correlated with the strength of the phrasal rate response (R = −0.21, significantly below 0, P = 0.023, bootstrap).

DISCUSSION
Our data show that the multiple timescales that are required for the processing of linguistic structures of different sizes emerge in cortical networks during speech comprehension. The neural sources for the sentential, phrasal and syllabic rate responses are highly distributed and include cortical areas that have been found to be critical for prosodic (for example, right STG), syntactic and semantic (for example, left pSTG and left IFG) processing9,25–28. Neural integration on different timescales is likely to underlie the transformation from shorter-lived neural representations of smaller linguistic units to longer-lasting neural representations of larger linguistic structures11–14. These results underscore the existence of hierarchical structure building operations in language comprehension1,2 and can be applied to objectively assess language processing in children and difficult-to-test populations, as well as in animal preparations to allow for cross-species comparisons.

Relation to language comprehension
Concurrent neural tracking of hierarchical linguistic structures provides a plausible functional mechanism for temporally integrating smaller linguistic units into larger structures. In this form of concurrent neural tracking, the neural representation of smaller linguistic units is embedded at different phases of the neural activity tracking a higher level structure. Thus, it provides a possible mechanism to transform the hierarchical embedding of linguistic structures into hierarchical embedding of neural dynamics, which may facilitate information integration in time10,11. Low-frequency neural tracking of linguistic structures may further modulate higher frequency neural oscillations29–31, which have been proposed to provide additional roles for structure building7. In addition, multiple resources and computations are needed for syntactic analysis, for example, access to combinatorial syntactic subroutines, and such operations may correspond to neural computations on distinct frequency scales, coordinated by the low-frequency neural tracking of linguistic constituent structures. Furthermore, low-frequency neural activity and oscillations have been hypothesized to be critical mechanisms for generating predictions about future events32. For language processing, it is likely that concurrent neural tracking of hierarchical linguistic structures provides mechanisms to generate predictions on multiple linguistic levels and to allow interactions across linguistic levels33.
Neural entrainment to speech
Recent work has shown that cortex tracks the slow acoustic fluctuations of speech below 10 Hz (refs. 15–18,34,35), and this phenomenon is commonly described as cortical entrainment to the syllabic rhythm of speech. It has been controversial whether such syllabic-level cortical entrainment is related to low-level auditory encoding or to language processing6. Our findings demonstrate that processing goes well beyond stimulus-bound analysis: cortical activity is entrained to larger linguistic structures that are, by necessity, internally constructed, based on syntax. The emergence of slow cortical dynamics provides timescales suitable for the analysis of larger chunk sizes13,14.

A long-lasting controversy concerns how the neural responses to sensory stimuli are related to intrinsic, ongoing neural oscillations. This question is heavily debated for the neural response entrained to the syllabic rhythm of speech36 and can also be asked for neural activity entrained to the time courses of larger linguistic structures. Our experiment was not designed to answer this question; however, we clearly found that cortical speech processing networks have the capacity to generate activity on very long timescales corresponding to larger linguistic structures, such as phrases and sentences. In other words, the timescales of larger linguistic structures fall in the …

… into the neural representation of abstract linguistic structures that are internally constructed on the basis of syntax alone. Although the construction of abstract structures is driven by syntactic analysis, when such structures are built, different aspects of the structure, including semantic information, can be integrated in the neural representation. Indeed, the wide distribution of cortical tracking of hierarchical linguistic structures suggests that it is a general neurophysiological mechanism for combinatorial operations involved in hierarchical linguistic structure building in multiple linguistic processing networks (for example, phonological, syntactic and semantic). Furthermore, coherent synchronization to the correlated linguistic structures in different representational networks, for example, syntactic, semantic and phonological, provides a way to integrate multi-dimensional linguistic representations into a coherent language percept38,40, just as temporal synchronization between cortical networks provides a possible solution to the binding problem in sensory processing41.

Relation to event-related responses
Although many previous neurophysiological studies on structure building have focused on syntactic and semantic violations42–44, fewer have addressed normal structure building; on the lexical-semantic level, the N400/N400m has been identified as a marker of the semantic predictability of words43,45, and its amplitude continuously reduces in a sentence46,47. For syntactic processing, when two words combine into a short phrase, increased activity is seen in the temporal and frontal lobes4. Our results build on and extend these findings by demonstrating structure building at different levels of the linguistic hierarchy, during online comprehension of connected speech materials in which the structural boundaries are neither physically cued nor confounded by the semantic predictability of the individual words (Fig. 3). Note that, although the two Markovian languages (compared in Fig. 3) differed in their transitional probability between acoustic chunks, they both had fully predictable syntactic structures. The equivalence in syntactic predictability is likely to explain the very similar responses between the two conditions.

Lastly, the emergence of slow neural dynamics tracking superordinate stimulus structures is reminiscent of what has been observed during decision making48, action planning49 and music perception50, suggesting a plausible common neural computational framework to integrate information over distinct timescales12. These findings invite MEG and EEG studies to extend from the classic event-related designs to investigating continuous neural encoding of the internally constructed perceptual organization of an information stream.

AUTHOR CONTRIBUTIONS
N.D., L.M. and D.P. conceived and designed the experiments. N.D., H.Z. and X.T. performed the MEG experiments. L.M. performed the ECoG experiment. N.D., L.M. and D.P. wrote the paper. All of the authors discussed the results and edited the manuscript.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

1. Berwick, R.C., Friederici, A.D., Chomsky, N. & Bolhuis, J.J. Evolution, brain, and the nature of language. Trends Cogn. Sci. 17, 89–98 (2013).
2. Chomsky, N. Syntactic Structures (Mouton de Gruyter, 1957).
3. Phillips, C. Linear order and constituency. Linguist. Inq. 34, 37–90 (2003).
4. Bemis, D.K. & Pylkkänen, L. Basic linguistic composition recruits the left anterior temporal lobe and left angular gyrus during both listening and reading. Cereb. Cortex 23, 1859–1873 (2013).
5. Giraud, A.-L. & Poeppel, D. Cortical oscillations and speech processing: emerging computational principles and operations. Nat. Neurosci. 15, 511–517 (2012).
6. Sanders, L.D., Newport, E.L. & Neville, H.J. Segmenting nonsense: an event-related potential index of perceived onsets in continuous speech. Nat. Neurosci. 5, 700–703 (2002).
7. Bastiaansen, M., Magyari, L. & Hagoort, P. Syntactic unification operations are reflected in oscillatory dynamics during on-line sentence comprehension. J. Cogn. Neurosci. 22, 1333–1347 (2010).
8. Buiatti, M., Peña, M. & Dehaene-Lambertz, G. Investigating the neural correlates of continuous speech computation with frequency-tagged neuroelectric responses. Neuroimage 44, 509–519 (2009).
9. Pallier, C., Devauchelle, A.-D. & Dehaene, S. Cortical representation of the constituent structure of sentences. Proc. Natl. Acad. Sci. USA 108, 2522–2527 (2011).
10. Schroeder, C.E., Lakatos, P., Kajikawa, Y., Partan, S. & Puce, A. Neuronal oscillations and visual amplification of speech. Trends Cogn. Sci. 12, 106–113 (2008).
11. Buzsáki, G. Neural syntax: cell assemblies, synapsembles and readers. Neuron 68, 362–385 (2010).
12. Bernacchia, A., Seo, H., Lee, D. & Wang, X.-J. A reservoir of time constants for memory traces in cortical neurons. Nat. Neurosci. 14, 366–372 (2011).
13. Lerner, Y., Honey, C.J., Silbert, L.J. & Hasson, U. Topographic mapping of a hierarchy of temporal receptive windows using a narrated story. J. Neurosci. 31, 2906–2915 (2011).
14. Kiebel, S.J., Daunizeau, J. & Friston, K.J. A hierarchy of time-scales and the brain. PLoS Comput. Biol. 4, e1000209 (2008).
15. Luo, H. & Poeppel, D. Phase patterns of neuronal responses reliably discriminate speech in human auditory cortex. Neuron 54, 1001–1010 (2007).
16. Ding, N. & Simon, J.Z. Emergence of neural encoding of auditory objects while listening to competing speakers. Proc. Natl. Acad. Sci. USA 109, 11854–11859 (2012).
17. Zion Golumbic, E.M. et al. Mechanisms underlying selective neuronal tracking of attended speech at a cocktail party. Neuron 77, 980–991 (2013).
18. Peelle, J.E., Gross, J. & Davis, M.H. Phase-locked responses to speech in human auditory cortex are enhanced during comprehension. Cereb. Cortex 23, 1378–1387 (2013).
19. Pasley, B.N. et al. Reconstructing speech from human auditory cortex. PLoS Biol. 10, e1001251 (2012).
20. Steinhauer, K., Alter, K. & Friederici, A.D. Brain potentials indicate immediate use of prosodic cues in natural speech processing. Nat. Neurosci. 2, 191–196 (1999).
21. Peña, M., Bonatti, L.L., Nespor, M. & Mehler, J. Signal-driven computations in speech processing. Science 298, 604–607 (2002).
22. Saffran, J.R., Aslin, R.N. & Newport, E.L. Statistical learning by 8-month-old infants. Science 274, 1926–1928 (1996).
23. Ray, S. & Maunsell, J.H. Different origins of gamma rhythm and high-gamma activity in macaque visual cortex. PLoS Biol. 9, e1000610 (2011).
24. Einevoll, G.T., Kayser, C., Logothetis, N.K. & Panzeri, S. Modeling and analysis of local field potentials for studying the function of cortical circuits. Nat. Rev. Neurosci. 14, 770–785 (2013).
25. Hagoort, P. & Indefrey, P. The neurobiology of language beyond single words. Annu. Rev. Neurosci. 37, 347–362 (2014).
26. Grodzinsky, Y. & Friederici, A.D. Neuroimaging of syntax and syntactic processing. Curr. Opin. Neurobiol. 16, 240–246 (2006).
27. Hickok, G. & Poeppel, D. The cortical organization of speech processing. Nat. Rev. Neurosci. 8, 393–402 (2007).
28. Friederici, A.D., Meyer, M. & von Cramon, D.Y. Auditory language comprehension: an event-related fMRI study on the processing of syntactic and lexical information. Brain Lang. 74, 289–300 (2000).
29. Canolty, R.T. et al. High gamma power is phase-locked to theta oscillations in human neocortex. Science 313, 1626–1628 (2006).
30. Lakatos, P. et al. An oscillatory hierarchy controlling neuronal excitability and stimulus processing in the auditory cortex. J. Neurophysiol. 94, 1904–1911 (2005).
31. Sirota, A., Csicsvari, J., Buhl, D. & Buzsáki, G. Communication between neocortex and hippocampus during sleep in rodents. Proc. Natl. Acad. Sci. USA 100, 2065–2069 (2003).
32. Arnal, L.H. & Giraud, A.-L. Cortical oscillations and sensory predictions. Trends Cogn. Sci. 16, 390–398 (2012).
33. Poeppel, D., Idsardi, W.J. & van Wassenhove, V. Speech perception at the interface of neurobiology and linguistics. Phil. Trans. R. Soc. Lond. B 363, 1071–1086 (2008).
34. Peña, M. & Melloni, L. Brain oscillations during spoken sentence processing. J. Cogn. Neurosci. 24, 1149–1164 (2012).
35. Gross, J. et al. Speech rhythms and multiplexed oscillatory sensory coding in the human brain. PLoS Biol. 11, e1001752 (2013).
36. Ding, N. & Simon, J.Z. Cortical entrainment to continuous speech: functional roles and interpretations. Front. Hum. Neurosci. 8, 311 (2014).
37. Jackendoff, R. Foundations of Language: Brain, Meaning, Grammar, Evolution (Oxford University Press, 2002).
38. Hagoort, P. On Broca, brain, and binding: a new framework. Trends Cogn. Sci. 9, 416–423 (2005).
39. Cutler, A., Dahan, D. & van Donselaar, W. Prosody in the comprehension of spoken language: a literature review. Lang. Speech 40, 141–201 (1997).
40. Frazier, L., Carlson, K. & Clifton, C. Jr. Prosodic phrasing is central to language comprehension. Trends Cogn. Sci. 10, 244–249 (2006).
41. Singer, W. & Gray, C.M. Visual feature integration and the temporal correlation hypothesis. Annu. Rev. Neurosci. 18, 555–586 (1995).
42. Friederici, A.D. Towards a neural basis of auditory sentence processing. Trends Cogn. Sci. 6, 78–84 (2002).
43. Kutas, M. & Federmeier, K.D. Electrophysiology reveals semantic memory use in language comprehension. Trends Cogn. Sci. 4, 463–470 (2000).
44. Neville, H., Nicol, J.L., Barss, A., Forster, K.I. & Garrett, M.F. Syntactically based sentence processing classes: evidence from event-related brain potentials. J. Cogn. Neurosci. 3, 151–165 (1991).
45. Lau, E.F., Phillips, C. & Poeppel, D. A cortical network for semantics: (de)constructing the N400. Nat. Rev. Neurosci. 9, 920–933 (2008).
46. Halgren, E. et al. N400-like magnetoencephalography responses modulated by semantic context, word frequency and lexical class in sentences. Neuroimage 17, 1101–1116 (2002).
47. Van Petten, C. & Kutas, M. Interactions between sentence context and word frequency in event-related brain potentials. Mem. Cognit. 18, 380–393 (1990).
48. O'Connell, R.G., Dockree, P.M. & Kelly, S.P. A supramodal accumulation-to-bound signal that determines perceptual decisions in humans. Nat. Neurosci. 15, 1729–1735 (2012).
49. Koechlin, E., Ody, C. & Kouneiher, F. The architecture of cognitive control in the human prefrontal cortex. Science 302, 1181–1185 (2003).
50. Nozaradan, S., Peretz, I., Missal, M. & Mouraux, A. Tagging the neuronal entrainment to beat and meter. J. Neurosci. 31, 10234–10240 (2011).
syllables were 75–354 ms in duration (mean duration 224 ms), and were adjusted to 250 ms by truncation or padding silence at the end. The last 25 ms of each syllable were smoothed by a cosine window.

Four-syllable sentences. 50 four-syllable sentences were constructed, in which the first two syllables formed a noun phrase and the last two syllables formed a verb phrase (Supplementary Table 1). The noun phrase could be composed of either a single two-syllable noun or a one-syllable adjective followed by a one-syllable noun. The verb phrase could be composed of either a two-syllable verb or a one-syllable verb followed by a one-syllable noun object. In a normal trial, ten sentences were sequentially played and no acoustic gaps were inserted between sentences (Supplementary Fig. 1a). Because of the lack of phrasal- and sentential-level prosodic cues, the sound intensity of the stimulus, characterized by the sound envelope, fluctuated only at the syllabic rate and not at the phrasal or sentential rate (Supplementary Fig. 2). An outlier trial was the same as a normal trial except that the verb phrases in two sentences were exchanged, creating two nonsense sentences with incompatible subjects and predicates (an example in English would be "new plans rub skin").

Four-syllable verb phrases. Two types of four-syllable verb phrases were created. A type I verb phrase contained a one-syllable verb followed by a three-syllable noun phrase, which could be a compound noun or an adjective + noun phrase (Supplementary Fig. 1b and Supplementary Table 1). A type II verb phrase contained a two-syllable verb followed by a two-syllable noun (Supplementary Fig. 1c; all phrases are listed in Supplementary Table 1). 50 instances were created for each type of verb phrase. In a normal trial, ten phrases of the same type were sequentially presented. An outlier trial was the same as a normal trial except that the verbs in two phrases were exchanged, creating two nonsense verb phrases with incompatible verbs and objects (an example in English would be "drink a long walk").

Two-syllable phrases. The verb phrases (or the noun phrases) in the four-syllable sentences were presented in a sequence (Supplementary Fig. 1d). In a normal trial, 20 different phrases were played. In an outlier trial, one of the 20 phrases was replaced by two random syllables that did not constitute a sensible phrase.

Random syllabic sequences. The random syllabic sequences were generated based on the four-syllable sentences. Each four-syllable sentence was transformed into four random syllables using the following rule: the first syllable in the sentence was replaced by the first syllable of a randomly chosen sentence, the second syllable was replaced by the second syllable of another randomly chosen sentence, and likewise for the third and fourth syllables. This way, if there were any consistent acoustic differences between the syllables at different positions in a sentence, those acoustic differences were preserved in the random syllabic sequences. Each normal trial contained 40 syllables. In outlier trials, four consecutive syllables were replaced by a Chinese idiom.

(N = 20) or a five-syllable verb phrase (N = 25). A four-syllable noun phrase was followed by a three-syllable verb phrase (N = 20) or a four-syllable verb phrase (N = 25). Sentences with different noun phrase durations and verb phrase durations were intermixed. In a normal trial, 10 different sentences were played sequentially, without inserting any acoustic gap between phrases or sentences. In an outlier trial, one sentence was replaced by a sentence with the same syntactic structure but that was semantically anomalous.

AMS. Five sets of AMS were created. Each sentence consisted of three components, C1, C2 and C3. Each component (C1, C2 or C3) was independently chosen from three candidate syllables with equal probability. The grammar of the AMS is illustrated in Supplementary Figure 4a. In the experiments, sentences were played sequentially without any gap between sentences. Since all components were chosen independently and each component was chosen from three syllables with equal probability, all components were equally predictable regardless of their position in a sequence. In other words, P(C1) = P(C2) = P(C3) = P(C2|C1) = P(C3|C2) = P(C1|C3) = 1/3.

All Chinese syllables were synthesized independently and adjusted to 300 ms by truncation or padding silence at the end. In each trial, 60 sentences were played and no additional gap was inserted between sentences. Therefore, the syllables were played at a constant rate of 3.33 Hz and the sentences were played at a constant rate of 1.11 Hz. To make sure that neural encoding of the AMS was not confounded by the acoustic properties of a particular set of syllables, five sets of AMS were created (Supplementary Table 1). No meaningful Chinese expressions are embedded in the AMS sequences.

Stimuli II: English materials. All English materials were synthesized using the MacinTalk Synthesizer (male voice Alex, in Mac OS X 10.7.5).

Four-syllable English sentences. 60 four-syllable English sentences were constructed (Supplementary Table 1), and each syllable was a monosyllabic word. All sentences had the same syntactic structure: adjective/pronoun + noun + verb + noun. Each syllable was synthesized independently, and all the synthesized syllables (250–347 ms in duration) were adjusted to 320 ms by padding silence at the end or truncation. The offset of each syllable was smoothed by a 25-ms cosine window. In each trial, 12 sentences were presented without any acoustic gap between them. In an outlier trial, 3 consecutive words from a random position were replaced by three random words so that the corresponding sentence(s) became ungrammatical.

Shuffled sequences. Shuffled sequences were constructed as an unintelligible sound sequence that preserved the acoustic properties of the sentence sequences. All syllables in the four-syllable English sentences were segmented into five overlapping slices. Each slice was 72 ms in duration and overlapped with neighboring slices for 10 ms. The first 10 ms and the last 10 ms of each slice were smoothed by a linear ramp, except for the onset of the first slice and the offset of the last slice.
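The equal-predictability property of the AMS construction described above, every within- and across-sentence transition probability equal to 1/3, can be checked numerically. This is an illustrative sketch, not the authors' code: the syllable pools below are hypothetical placeholders for the actual Chinese syllables.

```python
import random
from collections import Counter

# Hypothetical syllable pools for one AMS set (placeholders, not the
# actual materials): three candidate syllables per sentence position.
C1_POOL = ["ba", "de", "gu"]
C2_POOL = ["ka", "mo", "ti"]
C3_POOL = ["su", "ne", "lo"]

def make_ams_sequence(n_sentences, rng):
    """Each sentence is C1 C2 C3, every component drawn independently and
    uniformly from its pool, so no transition is informative."""
    seq = []
    for _ in range(n_sentences):
        seq.append(rng.choice(C1_POOL))
        seq.append(rng.choice(C2_POOL))
        seq.append(rng.choice(C3_POOL))
    return seq

rng = random.Random(0)
seq = make_ams_sequence(100_000, rng)

# Estimate P(C2 | C1 = "ba") by counting which C2 follows each "ba".
pairs = Counter((seq[i], seq[i + 1]) for i in range(0, len(seq), 3))
total = sum(n for (c1, _), n in pairs.items() if c1 == "ba")
probs = {c2: pairs[("ba", c2)] / total for c2 in C2_POOL}
print(probs)  # every value close to 1/3
```

The same check applies to P(C3|C2) and to the across-sentence transition P(C1|C3).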
The constant predictability sentences were generated based on the grammar specified in Figure 3a and Supplementary Figure 1e. Listeners were familiarized with the grammar and were able to write down the full grammar table before participating in the experiment. In each trial, ten sentences were separately generated based on the grammar and sequentially presented without any acoustic gap between them.

The other type of Markovian sentences, called the predictable sentences, consisted of a finite number of sentences (N = 25, Supplementary Table 1) that were extensively repeated (11–12 times) in a ~7-min block. In these sentences, the second and the third acoustic chunks were uniquely determined by the first chunk. In each trial, ten different sentences were played sequentially without any acoustic gap between them.

Acoustic analysis. The intensity fluctuation of the sound stimulus is characterized by its temporal envelope. To extract the temporal envelope, the sound signal is first half-wave rectified and then downsampled to 200 Hz. The Discrete Fourier Transform of the temporal envelope (without any windowing) is shown in Figure 1 and Supplementary Figure 2.

Experimental procedures. Seven experiments were run. Experiments 1–4 involved Chinese listeners listening to Chinese materials, experiment 5 involved English listeners listening to Chinese materials, and experiment 6 involved English listeners listening to English materials. Experiment 7 involved Chinese listeners listening to AMS.

In all experiments except for experiment 5, listeners were instructed to detect outlier trials. At the end of each trial, listeners had to report whether it was a normal trial or an outlier trial via button press. Following the button press, the next trial was played after a delay randomized between 800 and 1,400 ms. In experiment 5, listeners performed a syllable counting task described below. Behavioral results are reported in Supplementary Table 2.

Experiment 1. Four-syllable Chinese sentences, four-syllable idioms, random syllabic sequences and backward syllabic sequences were presented in separate blocks. The order of the blocks was counterbalanced across listeners. Listeners took breaks between blocks. In each block, 20 normal trials and ten outlier trials were intermixed and presented in a random order.

Experiment 2. Four-syllable sentences, type I four-syllable verb phrases, type II four-syllable verb phrases, two-syllable noun phrases, and two-syllable verb phrases were presented in separate blocks. The order of the blocks was counterbalanced across listeners. Listeners took breaks between blocks. In each block, 20 normal trials and five outlier trials were intermixed and presented in a random order.

Experiment 3. Sentences with variable durations and syntactic structures, as described above, were played in an intermixed order. Listeners took a break every 25 trials. In total, 80 normal trials and 20 outlier trials were presented.

cyclic way: 1, 2, 1, 2, 1, 2, and report whether the final count was 1 or 2 at the end of each trial via button press. Since each trial contained 179 or 180 rapidly presented syllables, the listeners were not able to count accurately (mean performance 52 ± 9.7%, not significantly above chance, P > 0.8, t test). However, the listeners were asked to follow the rhythm and keep counting even when they lost count. After the first session of the experiment was finished, the listeners were told about the general structure of the AMS, and examples were given based on real Chinese sentences. In the second session of the experiment, the listeners had to learn the 5 sets of AMS separately (lower row, Supplementary Fig. 4b). For each set of the AMS, during training, the listeners listened to 20 sentences from the AMS set in a sequence, with a 300-ms gap inserted between sentences to facilitate learning. Then, the listeners listened to two trials of sentences from the same AMS set, which they had also listened to in the first session (shown by symbol S in Supplementary Fig. 4b). They had to do the same cyclical counting task. However, they were told that the last count was 1 if the last sentence was incomplete and the last count was 2 if the last sentence was complete (mean performance 82 ± 8.0%, significantly above chance, P < 0.2, t test). At the end of the two trials, the listeners had to report the grammar of the AMS, i.e., which three syllables could be the first syllable of a sentence, which three syllables could be the middle one, and which three syllables could be the last one. The grammatical roles of 77 ± 7.6% (mean ± s.e. across subjects) of the syllables were reported correctly.

Neural recordings. Cortical neuromagnetic activity was recorded using a 157-channel whole-head MEG system (KIT) in a magnetically shielded room. The MEG signals were sampled at 1 kHz, with a 200-Hz low-pass filter and a 60-Hz notch filter applied online and a 0.5-Hz high-pass filter applied offline (time delay compensated). The environmental magnetic field was recorded using three reference sensors and regressed out from the MEG signals using time-shifted PCA52. Then, the MEG responses were further denoised using a blind source separation technique, Denoising Source Separation (DSS)53. The MEG responses were decomposed into DSS components using a set of linear spatial filters, and the first 6 DSS components were retained for analysis and transformed back to the sensor space. DSS decomposes multi-channel MEG recordings to extract neural response components that are consistent over trials, and it has been demonstrated to be effective in denoising cortical responses to connected speech18,54,55. The DSS was applied to more accurately estimate the strength of neural activity phase-locked to the stimulus. Even when the DSS spatial filtering process was omitted, for the r.m.s. response over all MEG sensors, the sentential, phrasal and syllabic responses in Figure 1 were still statistically significant (P < 0.001, bootstrap).

Data analysis. Only the MEG responses to normal trials were analyzed.

Frequency domain analysis. In experiments 1, 2, 5 and 6, the linguistic structures of different hierarchies were presented at unique and constant rates
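The rationale of the envelope analysis above can be illustrated with a toy stimulus: if every 250-ms syllable carries the same intensity contour, the envelope spectrum has energy at the 4-Hz syllabic rate but none at the 2-Hz phrasal or 1-Hz sentential rates. This is a sketch with a synthetic half-sine envelope, not the actual stimuli.

```python
import numpy as np

fs = 200          # Hz; the envelope sampling rate used in the Acoustic analysis
syl_dur = 0.25    # 250-ms syllables -> 4-Hz syllabic rate
n_syl = 40        # ten four-syllable sentences per trial

# Synthetic envelope: the same rise-fall contour for every syllable, so the
# acoustics carry a 4-Hz rhythm but no phrasal (2 Hz) or sentential (1 Hz) cue.
t = np.arange(int(syl_dur * fs)) / fs
syl_env = np.sin(np.pi * t / syl_dur)   # one half-sine per syllable
envelope = np.tile(syl_env, n_syl)      # a 10-s trial

# DFT of the envelope, without any windowing
spec = np.abs(np.fft.rfft(envelope))
freqs = np.fft.rfftfreq(len(envelope), 1 / fs)

def power_at(f_hz):
    """Spectral magnitude at the frequency bin closest to f_hz."""
    return spec[np.argmin(np.abs(freqs - f_hz))]

# The syllabic peak dominates; the phrasal and sentential bins are empty.
print(power_at(4.0) > 100 * power_at(2.0), power_at(4.0) > 100 * power_at(1.0))
```

With identical syllable contours the envelope is exactly 4-Hz periodic, so any spectral peak at 1 or 2 Hz in the neural response cannot be inherited from the acoustics.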
The spatial filter w is a 157 × 1 vector (for the 157 sensors), the same size as X(f), and R(f) is a 157 × 157 matrix. The spatial filter can be viewed as a virtual sensor that was optimized to record phase-locked neural activity at each frequency. The power of the scalar output of the spatial filter, |Xᵀ(f)R⁻¹(f)X(f)|², was the power spectral density shown in the figures.

implant and two patients with a right hemisphere implant; additional depth electrodes were implanted for some patients but not analyzed). The electrode locations per patient are shown in Supplementary Figure 5. Electrode localization followed previously described procedures58. In brief, for each patient we obtained pre-op and post-op T1-weighted MRIs, which were co-registered with each other and normalized to an MNI-152 template, allowing the extraction of the electrode location in MNI space.
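The spatial-filter power computation described earlier in this section, |Xᵀ(f)R⁻¹(f)X(f)|², can be sketched with synthetic data. The random complex Fourier coefficients below stand in for real 157-sensor MEG data, and the small regularization term is an assumption added for numerical stability, not something stated in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sensors, n_trials = 157, 20

# Synthetic complex Fourier coefficients at one frequency bin:
# one 157-dimensional vector per trial (stand-in for real MEG data).
trials = (rng.standard_normal((n_trials, n_sensors))
          + 1j * rng.standard_normal((n_trials, n_sensors)))

X = trials.mean(axis=0)           # X(f): trial-averaged 157 x 1 response vector
R = np.cov(trials, rowvar=False)  # R(f): 157 x 157 sensor covariance
R = R + 1e-3 * np.eye(n_sensors)  # assumed regularization before inversion

# Power of the virtual-sensor output, |X^T R^-1 X|^2 (the conjugate
# transpose is used here because the coefficients are complex).
power = np.abs(X.conj() @ np.linalg.solve(R, X)) ** 2
print(power > 0)  # a single positive scalar per frequency bin
```

Evaluating this quantity in every frequency bin below 5 Hz yields a spectrum analogous to the power spectral density plotted in the figures.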
Time domain analysis. The response to each sentence was baseline corrected based on the 100-ms period preceding the sentence onset, for each sensor. To remove the neural response to the 4-Hz isochronous syllabic rhythm and focus on the neural tracking of sentential/phrasal structures, we low-pass filtered the neural response waveforms using a 0.5-s duration linear-phase FIR filter (cut-off 3.5 Hz). The filter delay was compensated by filtering the neural signals twice, once forward and once backward. When the response power at 4 Hz was extracted separately by a Fourier analysis, it did not significantly change as a function of sentence duration (P > 0.19, one-way ANOVA). The r.m.s. of the MEG responses was calculated as the sum of response power (that is, the square of the MEG response) over all sensors, and the r.m.s. response was further low-pass filtered by a 0.5-s duration linear-phase FIR filter (cut-off 3.5 Hz, delay compensated).

A linear decoder was built to decode the duration of sentences. In the decoding analysis, the multi-channel MEG responses were compressed into a single channel, i.e., the first DSS component, and the decoder relied solely on the time course of the neural response. A 2.25-s response epoch was extracted for each sentence, starting from the sentence onset. A leave-one-out cross-validation procedure was employed to evaluate the decoder's performance. Each time, the response to one sentence was used as the testing response, and the responses to all other sentences were treated as the training set. The training signals were averaged for sentences of the same duration, creating a template of the response time course for each sentence duration. The testing response was correlated with all the templates, and the category of the most correlated template was the decoder's output. For example, if the testing response was most correlated with the template for five-syllable sentences, the decoder's output would be that the testing response was generated by a five-syllable sentence.

The ECoG signals were recorded with a Nicolet clinical amplifier at a sampling rate of 512 Hz. The ECoG recordings were re-referenced to the grand average over all electrodes (after removing artifact-laden or noisy channels). Electrodes from different subjects were pooled per hemisphere, resulting in 385/261 electrodes in the left/right hemispheres. High gamma activity was extracted by high-pass filtering the ECoG signal above 70 Hz (with additional notch filters at 120 and 180 Hz). The energy envelope of high gamma activity was extracted by taking the square of the high-gamma response waveform.

ECoG procedures. Participants performed the same task as the healthy subjects in the MEG (Fig. 2e,f). In brief, they listened to a set of English sentences and control stimuli in the first and second blocks. The control stimulus, that is, the shuffled sequences, preserved the syllabic-level acoustic rhythm of English sentences but contained no hierarchical linguistic structure. The procedure was the same as the MEG experiment, except for a familiarization session in which the subjects listened to individual sentences with visual feedback. 60 trials of sentences and control stimuli were played. The ECoG data from each electrode were analyzed separately and converted to the frequency domain via the DFT (frequency resolution 0.071 Hz). A significant response at the syllabic, phrasal or sentential rate was reported if the response power at the target frequency was stronger than the response power averaged over neighboring frequency bins (a 0.5-Hz range above and below the target frequency). The significance level for each electrode was first determined based on a bootstrap procedure that randomly sampled the 60 trials 1,000 times and then underwent FDR correction for multiple comparisons across all electrodes in the same hemisphere.

A Supplementary Methods Checklist is available.
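The leave-one-out template-correlation decoder described in the time-domain analysis can be sketched as follows. The single-channel epochs here are synthetic sinusoids standing in for the first DSS component, and the duration labels and sampling rate are illustrative assumptions, not the actual recorded data.

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 100                     # Hz, illustrative sampling rate
n_samp = int(2.25 * fs)      # 2.25-s response epoch per sentence
durations = (4, 5, 6, 7)     # sentence durations in syllables (illustrative)

# Synthetic single-channel epochs: each duration class gets its own
# underlying waveform plus noise (a stand-in for real MEG responses).
t = np.arange(n_samp) / fs
waves = {d: np.sin(2 * np.pi * t / (0.25 * d)) for d in durations}
labels = np.repeat(durations, 20)
X = np.stack([waves[d] + 0.3 * rng.standard_normal(n_samp) for d in labels])

def decode_loo(X, labels):
    """Leave-one-out decoding: average the training epochs of each duration
    into a template, then assign the label of the most correlated template."""
    n_correct = 0
    for i in range(len(X)):
        train = np.delete(np.arange(len(X)), i)
        templates = {d: X[train][labels[train] == d].mean(axis=0)
                     for d in durations}
        r = {d: np.corrcoef(X[i], tm)[0, 1] for d, tm in templates.items()}
        n_correct += max(r, key=r.get) == labels[i]
    return n_correct / len(X)

acc = decode_loo(X, labels)
print(acc)  # well above the 0.25 chance level for four classes
```

Because the decoder uses only the correlation between a held-out epoch and the class-mean templates, above-chance accuracy implies that the response time course carries information about sentence duration.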
Statistical analysis and significance tests. For spectral peaks (Figs. 1 and 2), a one-tailed paired t test was used to test whether the neural response in a frequency bin was significantly stronger than the average of the neighboring four frequency bins (two bins on each side). Such a test was applied to all frequency bins below 5 Hz, and an FDR correction for multiple comparisons was applied. Except for the analysis of the spectral peaks, two-tailed t tests were applied. For all the t tests applied in this study, data from the two conditions had comparable variance and showed no clear deviation from the normal distribution when checking the histograms. If the t test was replaced by a bias-corrected and accelerated bootstrap, all results remained significant.

In Figure 4, the s.e.m. over subjects was calculated using the bias-corrected and accelerated bootstrap57. In the bootstrap procedure, all the subjects were resampled

51. Oldfield, R.C. The assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia 9, 97–113 (1971).
52. de Cheveigné, A. & Simon, J.Z. Denoising based on time-shift PCA. J. Neurosci. Methods 165, 297–305 (2007).
53. de Cheveigné, A. & Simon, J.Z. Denoising based on spatial filtering. J. Neurosci. Methods 171, 331–339 (2008).
54. Ding, N. & Simon, J.Z. Adaptive temporal encoding leads to a background-insensitive cortical representation of speech. J. Neurosci. 33, 5728–5735 (2013).
55. Ding, N. & Simon, J.Z. Neural coding of continuous speech in auditory cortex during monaural and dichotic listening. J. Neurophysiol. 107, 78–89 (2012).
56. Wang, Y. et al. Sensitivity to temporal modulation rate and spectral bandwidth in the human auditory system: MEG evidence. J. Neurophysiol. 107, 2033–2041 (2012).
57. Efron, B. & Tibshirani, R. An Introduction to the Bootstrap (CRC Press, 1993).
58. Yang, A.I. et al. Localization of dense intracranial electrode arrays using magnetic resonance imaging. Neuroimage 63, 157–165 (2012).