24.900: Introduction to Linguistics

Gabriel Teixeira

Zipfian Analysis of Language Diversity


In the study of linguistics, particularly of language acquisition, it is
widely debated whether other animals have language or whether it is a uniquely
human phenomenon. One case that has been heavily scrutinized is that of
chimpanzees, owing to their many similarities to humans. To answer the
question of whether chimpanzees, children, or both have language, Yang first
distinguishes an innate language from a communication system based on rote
memorization by means of language diversity, a measure of grammatical
complexity that stems from the not entirely understood, but seemingly present,
underlying structure of Zipf's law. Upon applying this definition of language
to the early language acquisition of children and chimpanzees, Yang concludes
that children do indeed acquire the rules of language in a short period of
time, while chimpanzees merely imitate after prolonged exposure.
Yang defines the unequivocal nature of language as the ability to combine
words to express infinitely varied meanings [1]. He claims that this
variability and diversity of language can be measured and modeled using Zipf's
law. Zipf's law states that a word's frequency is inversely proportional to its
rank in a list ordered from most to least frequent, so the product of rank and
frequency is a constant, C. On a logarithmic scale this can be written as
$\log(\text{frequency}) = \log C - \log(\text{rank})$,
which, when plotted, gives a straight line of slope $-1$. Zipf's law also
appears in various other rankings, such as the populations of the largest U.S.
cities, with the largest having
roughly twice the population of the second, and so forth. In linguistics,
however, the implication of Zipf's law is that if the underlying rules of
grammar allow only certain word pairings, there must exist many more
combinations of words arising from mere permutation, not necessarily
grammatical, than grammatical ones. This poses a tremendous apparent challenge
for children learning language: developing an innate understanding of proper
word pairings from very limited and often redundant exposure to language. Yet
all children seem to achieve this, whereas the same task has yet to be
mastered by computers. This observation motivates the definition of language
posited by Yang.
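As a small numerical check of the logarithmic form of Zipf's law, the following Python sketch builds an idealized Zipfian frequency list and confirms three properties stated above: rank times frequency stays constant, the top item is twice as frequent as the second, and a least-squares fit of log frequency against log rank has slope -1. The constant C = 60000, the 1000-word vocabulary, and the function names are hypothetical choices made only for illustration; they are not data or code from Yang's article.

```python
import math

# Hypothetical parameters chosen only for illustration.
C = 60000.0      # the Zipf constant: rank * frequency == C
N_WORDS = 1000   # size of the idealized vocabulary


def zipf_frequencies(n_words: int, c: float) -> list[float]:
    """Idealized Zipfian frequencies: the word of rank r has frequency c / r."""
    return [c / rank for rank in range(1, n_words + 1)]


def loglog_slope(freqs: list[float]) -> float:
    """Ordinary least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


freqs = zipf_frequencies(N_WORDS, C)

# rank * frequency is the same constant C at every rank
print(all(math.isclose(rank * f, C) for rank, f in enumerate(freqs, start=1)))
# the top-ranked item is twice as frequent as the second (cf. the city example)
print(freqs[0] / freqs[1])
# the straight line on a log-log plot has slope -1
print(round(loglog_slope(freqs), 6))
```

Running it prints True, 2.0 and -1.0. Real word counts only approximate this ideal, so a fit to actual text would give a slope near, not exactly at, -1.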
With a definition of language in place, Yang then designs an analysis for
determining whether the speech of children fits this definition. Yang suggests
examining the use of the phrase "a doggie" as opposed to "the doggie", along
with any other pairing of a noun with one of the two articles. Although both
combinations are grammatically correct, it was observed that only 20% to 40%
of children's nouns appear with both articles. This raises the question of
whether this limited diversity indicates memorization by children or rather an
innate preference for one of the article-noun combinations. Yang shows that the
latter is the case for adults, among whom approximately 25% of nouns appear
with both articles; children cannot be expected to show greater diversity, and
thus greater mastery of grammar, than adults. The low percentages can be
explained by two main factors. The first is that many nouns are used only once
(as Zipf's law predicts) and thus cannot possibly appear with both articles,
which brings down the average. The second is that there does appear to be a
preferred combination, where the preferred article
appears roughly twice as often as its counterpart. Yang thus determines that
he must test whether children use the two articles in the same proportions as
adults. To do so, he ran a statistical simulation, akin to flipping a biased
coin weighted with Zipfian probabilities, on a set of speech transcripts from
nine children (three of them just under two years of age) as well as on a
million-word collection of English text. Yang reports very tight agreement
between the two datasets' diversity, with a correlation of 0.977 at the 95%
confidence level, thus showing that children do indeed have language [1].
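The biased-coin simulation described above can be sketched in Python under simplifying assumptions that are ours, not Yang's: noun tokens are drawn from a Zipfian distribution over a hypothetical vocabulary (probability proportional to 1/rank), every noun takes its preferred article "the" with the same probability 2/3 (so the preferred article appears roughly twice as often as the other), and diversity is the fraction of distinct observed nouns that occur with both articles. All names and parameter values are illustrative, and no child or adult data are used. For comparison, `expected_diversity` evaluates an analytic expectation under the same assumptions: a noun of probability p_r, in a sample of S tokens with preferred-article probability p, appears with both articles with probability 1 + (1 - p_r)^S - (1 - p*p_r)^S - (1 - (1 - p)*p_r)^S.

```python
import random

# Illustrative, hypothetical parameters (not taken from Yang's data).
N_NOUNS = 500        # distinct nouns in the idealized vocabulary
N_TOKENS = 2000      # article-noun tokens in the simulated sample
P_PREFERRED = 2 / 3  # chance of the preferred article ("the"): 2:1 odds


def zipf_probs(n_nouns: int) -> list[float]:
    """Normalized Zipfian probabilities: p_r is proportional to 1 / r."""
    weights = [1.0 / r for r in range(1, n_nouns + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def diversity(pairs: list[tuple[str, str]]) -> float:
    """Fraction of distinct nouns that occur with both 'a' and 'the'."""
    articles_by_noun: dict[str, set[str]] = {}
    for article, noun in pairs:
        articles_by_noun.setdefault(noun, set()).add(article)
    both = sum(1 for arts in articles_by_noun.values() if len(arts) == 2)
    return both / len(articles_by_noun)


def simulate_diversity(n_nouns: int, n_tokens: int, p: float, seed: int = 0) -> float:
    """Draw nouns with Zipfian weights and pick each article with a biased coin."""
    rng = random.Random(seed)
    nouns = [f"noun{r}" for r in range(1, n_nouns + 1)]
    sampled = rng.choices(nouns, weights=zipf_probs(n_nouns), k=n_tokens)
    pairs = [("the" if rng.random() < p else "a", noun) for noun in sampled]
    return diversity(pairs)


def expected_diversity(n_nouns: int, n_tokens: int, p: float) -> float:
    """Expected count of nouns seen with both articles divided by the expected
    count of distinct nouns seen (a ratio of expectations, so approximate)."""
    s = n_tokens
    both = seen = 0.0
    for p_r in zipf_probs(n_nouns):
        seen += 1 - (1 - p_r) ** s
        both += 1 + (1 - p_r) ** s - (1 - p * p_r) ** s - (1 - (1 - p) * p_r) ** s
    return both / seen


# Hand-made check: "doggie" occurs with both articles, "ball" with only one.
print(diversity([("a", "doggie"), ("the", "doggie"), ("the", "ball")]))
print(round(simulate_diversity(N_NOUNS, N_TOKENS, P_PREFERRED), 3))
print(round(expected_diversity(N_NOUNS, N_TOKENS, P_PREFERRED), 3))
```

The first line prints 0.5. With these parameters the simulated and analytic diversities should land close to each other and well below 1, even though every noun is free to take either article. This reflects the two factors discussed above: rarely used nouns and a skewed article preference.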
Having determined that the speech of children does indeed fit his definition
of language, Yang goes on to study the case of Nim Chimpsky, the signing
chimpanzee. Nim was raised in a human household, learning American Sign
Language (ASL). Although primates have displayed the ability to pair forms with
meanings, this alone is not indicative of language. At first glance, Nim
seemed to use words like "give" and "more" as a template that was then
followed by "water" or "banana", yet further study revealed that he rarely
created spontaneous combinations of words and instead imitated his trainer.
To determine whether Nim had language, Yang performed the same Zipfian
analysis as on the children. The results reveal a significantly lower
diversity of language than that of children. Ultimately, Yang concludes that
children do in fact have the ability to combine words boundlessly to convey
meaning, whereas Nim merely displayed rote memorization.
Yang postulates that children have language, whereas chimpanzees do not.
Although he asserts that this hypothesis has been confirmed, there are two main
flaws in his analysis. The first is the limited sample size: nine children
and one chimpanzee do not seem to provide nearly enough data when compared
with the million-word collection that was also used. Larger sample sizes might
lend greater statistical significance to Yang's results. The second, and
arguably more critical, flaw is the assumption that Zipf's law accurately
represents the diversity of language. Zipf's law is a phenomenon that is not
well understood, and its acceptance rests on little more than a lack of
contrary evidence; this calls into question the validity of Yang's entire
article. However, if we accept the assumption that Zipf's law holds, the
analyses done by Yang seem to support his hypothesis.
Overall, the study performed by Yang is limited chiefly by its sample size.
Increasing the sample size, especially of the chimpanzees or other primates
studied, would be the next step in solidifying the claims already posited.
Nevertheless, this study marks an important leap in determining the presence
of language. Distinguishing between the memorization of forms as
representations of meaning and the spontaneous acquisition of an innate,
underlying grammatical structure marks a critical point in the study of the
seemingly unique mechanisms of human language acquisition.

References
[1] Charles Yang. Who's afraid of George Kingsley Zipf? Or: Do children
and chimps have language? The Royal Statistical Society, 2013.
