Corpus Linguistics

Małgorzata Warzecha

What is a corpus?

Corpus:
From the Latin for 'body' (plural: corpora), a
corpus is a body of language representative of a
particular variety of language or genre, which is
collected and stored in electronic form for
analysis using concordance software.

CASS: Briefings, Tony McEnery
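Concordance software typically shows every occurrence of a search word with its surrounding context, a format known as keyword-in-context (KWIC). The following is a minimal Python sketch that assumes plain-text input and simple whole-word regular-expression matching. It is only an illustration of the idea, not how any particular concordancer (such as AntConc) is implemented, and the function name `kwic` and the `width` parameter are hypothetical.

```python
import re


def kwic(text, keyword, width=30):
    """Return keyword-in-context (KWIC) lines for each occurrence of `keyword`.

    Illustrative sketch: matching is case-insensitive and whole-word only,
    and context is cut at a fixed number of characters, not words.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        # Up to `width` characters of left and right context.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Flatten newlines so each concordance line stays on a single row.
        left = left.replace("\n", " ")
        right = right.replace("\n", " ")
        # Right-align the left context so the keywords line up in one column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines


if __name__ == "__main__":
    # A toy string standing in for a real corpus.
    sample = (
        "One can perform a task. The orchestra will perform the symphony. "
        "Nobody wanted to perform that duty again."
    )
    for line in kwic(sample, "perform", width=20):
        print(line)
```

Aligning the keyword in a central column is what lets an analyst scan many occurrences at once and spot recurring patterns in the left and right context.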

Types of corpora:

1. Specialised corpus
2. General corpus
3. Multilingual corpus
4. Parallel corpus
5. Learner corpus
6. Historical or diachronic corpus
7. Monitor corpus

What is corpus linguistics?


A theory of language, or a methodology for studying
language?


NOT a theory of language!

but:

a collection of methods for studying language

Corpus linguistics is perhaps best described in
simple terms as the study of language based on
examples of real-life language use.

History of corpus linguistics


1. Early Corpus Linguistics
2. Criticism
3. Modern Approach to Corpus Linguistics

1. Early Corpus Linguistics

Harris (1993: 27) summarises the approach well:


'The approach began ... with a large collection of
recorded utterances from some language, a
corpus. The corpus was subjected to a clear,
stepwise, bottom-up strategy of analysis.'

Examples of early corpus research:
Language acquisition: diary studies in the period 1876-1926; Preyer (1889), Stern (1924)
Spelling conventions: Käding (1897), 11 million German words
Language pedagogy
Comparative linguistics: Eaton (1940)
Syntax and semantics: Fries (1952), a predecessor of Quirk et al.'s
A Comprehensive Grammar of the English Language

[Figure: example of annotation]

2. Criticism by Chomsky
Any natural corpus will be skewed. Some
sentences won't occur because they are obvious,
others because they are false, still others because
they are impolite. The corpus, if natural, will be so
wildly skewed that the description would be no
more than a mere list.
(Chomsky, University of Texas, 1962)

Chomsky: The verb perform cannot be used
with mass word objects: one can perform a task
but one cannot perform labour.
Hatcher: How do you know, if you don't use a
corpus and have not studied the verb perform?
Chomsky: How do I know? Because I am a
native speaker of the English language.
(Hill, 1962)

3. Modern Approach to Corpus Linguistics


He sits in a deep soft armchair, with his eyes
closed and his hands clasped behind his head.
Once in a while he opens his eyes, sits up
abruptly shouting, 'Wow, what a neat fact!', grabs
his pencil, and writes something down ... having
come still no closer to knowing what language is
really like.
Fillmore (1992)

*He shines Tony books.
but:
He gives Keith the stare that works on small
boys.
We didn't know that he owes Dempster a lot of
money.

Why do we use corpus studies?

Corpus data is observable and verifiable.
It is more objective than introspection.
Made-up sentences are artificial and often far removed
from the language that actually occurs in a corpus.
Some types of language data can only be
gathered accurately from a corpus.

Brown and LOB

Brown: American English corpus; around 1 million words
LOB (Lancaster-Oslo/Bergen): British English corpus; also around 1 million words

Disadvantage: small and not up to date
Advantage: similarly structured, so the two can be compared directly

Brown and LOB categories (number of texts: Brown, LOB)

A Press: reportage (44, 44)
B Press: editorial (27, 27)
C Press: reviews (17, 17)
D Religion (17, 17)
E Skills, trades and hobbies (36, 38)
F Popular lore (48, 44)
G Belles lettres, biography, essays (75, 77)
H Miscellaneous (documents, reports, etc.) (30, 30)
J Learned and scientific writings (80, 80)
K General fiction (29, 29)
L Mystery and detective fiction (24, 24)
M Science fiction (6, 6)
N Adventure and western fiction (29, 29)
P Romance and love story (29, 29)
R Humour (9, 9)
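Summing the category counts shows how the two corpora reach their overall size. The following is a small Python sketch that transcribes the text counts from the category list above. It assumes samples of roughly 2,000 words each (the approximate sample length used in both corpora) and groups categories A-J as informative prose and K-R as imaginative prose, the split commonly used to describe Brown and LOB. The names `CATEGORIES`, `WORDS_PER_TEXT` and `count_texts` are illustrative only.

```python
# Text counts per category, transcribed from the list above:
# category code -> (label, Brown count, LOB count)
CATEGORIES = {
    "A": ("Press: reportage", 44, 44),
    "B": ("Press: editorial", 27, 27),
    "C": ("Press: reviews", 17, 17),
    "D": ("Religion", 17, 17),
    "E": ("Skills, trades and hobbies", 36, 38),
    "F": ("Popular lore", 48, 44),
    "G": ("Belles lettres, biography, essays", 75, 77),
    "H": ("Miscellaneous", 30, 30),
    "J": ("Learned and scientific writings", 80, 80),
    "K": ("General fiction", 29, 29),
    "L": ("Mystery and detective fiction", 24, 24),
    "M": ("Science fiction", 6, 6),
    "N": ("Adventure and western fiction", 29, 29),
    "P": ("Romance and love story", 29, 29),
    "R": ("Humour", 9, 9),
}

# Assumption: each text sample is roughly 2,000 words long.
WORDS_PER_TEXT = 2000


def count_texts(codes):
    """Sum the (Brown, LOB) text counts over the given category codes."""
    brown = sum(CATEGORIES[code][1] for code in codes)
    lob = sum(CATEGORIES[code][2] for code in codes)
    return brown, lob


if __name__ == "__main__":
    total = count_texts(CATEGORIES)          # iterating a dict yields its keys
    informative = count_texts("ABCDEFGHJ")   # categories A-J
    imaginative = count_texts("KLMNPR")      # categories K-R
    print("Texts (Brown, LOB):", total)                  # (500, 500)
    print("Informative prose A-J:", informative)         # (374, 374)
    print("Imaginative prose K-R:", imaginative)         # (126, 126)
    print("Approx. words per corpus:", total[0] * WORDS_PER_TEXT)  # 1000000
```

Both corpora contain 500 texts. At about 2,000 words per text, this gives the "around 1 million words" cited for each. The per-category differences (E, F, G) cancel out in the totals.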

Sources
Websites:

https://www.futurelearn.com/courses/corpus-linguistics

http://www.lancaster.ac.uk/

http://www.antlab.sci.waseda.ac.jp/software.html

Literature:

McEnery, T. and Wilson, A. (2001). Corpus Linguistics. Edinburgh: Edinburgh University Press.

Leech, G. (1991). The state of the art in corpus linguistics. In Aijmer, K. and Altenberg, B. (eds.), English Corpus Linguistics: Studies in Honour of Jan Svartvik. London: Longman, 8-29.

CASS: Briefings (2013). The ESRC Centre for Corpus Approaches to Social Science (CASS), Lancaster University, UK.
