Language Assessment - Principles and Classroom Practice

Published by: flywhile on Sep 16, 2010
Copyright:Attribution Non-commercial

CONTENTS
Preface
Text Credits
1 Testing, Assessing, and Teaching
What Is a Test?, 3
Assessment and Teaching, 4
Informal and Formal Assessment, 5
Formative and Summative Assessment, 6
Norm-Referenced and Criterion-Referenced Tests, 7
Approaches to Language Testing: A Brief History, 7
Discrete-Point and Integrative Testing, 8
Communicative Language Testing, 10
Performance-Based Assessment, 10
Current Issues in Classroom Testing, 11
New Views on Intelligence, 11
Traditional and "Alternative" Assessment, 13
Computer-Based Testing, 14
Exercises, 16
For Your Further Reading, 18
2 Principles of Language Assessment
Practicality, 19
Reliability, 20
Student-Related Reliability, 21
Rater Reliability, 21
Test Administration Reliability, 21
Test Reliability, 22
Validity, 22
Content-Related Evidence, 22
Criterion-Related Evidence, 24
Construct-Related Evidence, 25
Consequential Validity, 26
Face Validity, 26
Authenticity, 28
Washback, 28
Applying Principles to the Evaluation of Classroom Tests, 30
1. Are the test procedures practical? 31
2. Is the test reliable? 31
3. Does the procedure demonstrate content validity? 32
4. Is the procedure face valid and "biased for best"? 33
5. Are the test tasks as authentic as possible? 35
6. Does the test offer beneficial washback to the learner? 37
Exercises, 38
For Your Further Reading, 41
3 Designing Classroom Language Tests
Test Types, 43
Language Aptitude Tests, 43
Proficiency Tests, 44
Placement Tests, 45
Diagnostic Tests, 46
Achievement Tests, 47
Some Practical Steps to Test Construction, 48
Assessing Clear, Unambiguous Objectives, 49
Drawing Up Test Specifications, 50
Devising Test Tasks, 52
Designing Multiple-Choice Test Items, 55
1. Design each item to measure a specific objective, 56
2. State both stem and options as simply and directly as possible, 57
3. Make certain that the intended answer is clearly the only correct one, 58
4. Use item indices to accept, discard, or revise items, 58
Scoring, Grading, and Giving Feedback, 61
Scoring, 61
Grading, 62
Giving Feedback, 62
Exercises, 64
For Your Further Reading, 65
4 Standardized Testing
What Is Standardization?, 67
Advantages and Disadvantages of Standardized Tests, 68
Developing a Standardized Test, 69
1. Determine the purpose and objectives of the test, 70
2. Design test specifications, 70
3. Design, select, and arrange test tasks/items, 74
4. Make appropriate evaluations of different kinds of items, 78
5. Specify scoring procedures and reporting formats, 79
6. Perform ongoing construct validation studies, 81
Standardized Language Proficiency Testing, 82
Four Standardized Language Proficiency Tests, 83
Test of English as a Foreign Language (TOEFL®), 84
Michigan English Language Assessment Battery (MELAB), 85
International English Language Testing System (IELTS), 85
Test of English for International Communication (TOEIC®), 86
Exercises, 87
For Your Further Reading, 87
Appendix to Chapter 4:
Commercial Proficiency Tests: Sample Items and Tasks, 88
Test of English as a Foreign Language (TOEFL®), 88
Michigan English Language Assessment Battery (MELAB), 93
International English Language Testing System (IELTS), 96
Test of English for International Communication (TOEIC®), 100
5 Standards-Based Assessment
ELD Standards, 105
ELD Assessment, 106
CASAS and SCANS, 108
Teacher Standards, 109
The Consequences of Standards-Based and Standardized Testing, 110
Test Bias, 111
Test-Driven Learning and Teaching, 112
Ethical Issues: Critical Language Testing, 113
Exercises, 115
For Your Further Reading, 115
6 Assessing Listening
Observing the Performance of the Four Skills, 117
The Importance of Listening, 119
Basic Types of Listening, 119
Micro- and Macroskills of Listening, 121
Designing Assessment Tasks: Intensive Listening, 122
Recognizing Phonological and Morphological Elements, 123
Paraphrase Recognition, 124
Designing Assessment Tasks: Responsive Listening, 125
Designing Assessment Tasks: Selective Listening, 125
Listening Cloze, 125
Information Transfer, 127
Sentence Repetition, 130
Designing Assessment Tasks: Extensive Listening, 130
Dictation, 131
Communicative Stimulus-Response Tasks, 132
Authentic Listening Tasks, 135
Exercises, 138
For Your Further Reading, 139
7 Assessing Speaking
Basic Types of Speaking, 141
Micro- and Macroskills of Speaking, 142
Designing Assessment Tasks: Imitative Speaking, 144
PhonePass® Test, 145
Designing Assessment Tasks: Intensive Speaking, 147
Directed Response Tasks, 147
Read-Aloud Tasks, 147
Sentence/Dialogue Completion Tasks and Oral Questionnaires, 149
Picture-Cued Tasks, 151
Translation (of Limited Stretches of Discourse), 159
Designing Assessment Tasks: Responsive Speaking, 159
Question and Answer, 159
Giving Instructions and Directions, 161
Paraphrasing, 161
Test of Spoken English (TSE®), 162
Designing Assessment Tasks: Interactive Speaking, 167
Interview, 167
Role Play, 174
Discussions and Conversations, 175
Games, 175
Oral Proficiency Interview (OPI), 176
Designing Assessment: Extensive Speaking, 179
Oral Presentations, 179
Picture-Cued Story-Telling, 180
Retelling a Story, News Event, 182
Translation (of Extended Prose), 182
Exercises, 183
For Your Further Reading, 184
8 Assessing Reading
Types (Genres) of Reading, 186
Microskills, Macroskills, and Strategies for Reading, 187
Types of Reading, 189
Designing Assessment Tasks: Perceptive Reading, 190
Reading Aloud, 190
Written Response, 191
Multiple-Choice, 191
Picture-Cued Items, 191
Designing Assessment Tasks: Selective Reading, 194
Multiple-Choice (for Form-Focused Criteria), 194
Matching Tasks, 197
Editing Tasks, 198
Picture-Cued Tasks, 199
Gap-Filling Tasks, 200
Designing Assessment Tasks: Interactive Reading, 201
Cloze Tasks, 201
Impromptu Reading Plus Comprehension Questions, 204
Short-Answer Tasks, 206
Editing (Longer Texts), 207
Scanning, 209
Ordering Tasks, 209
Information Transfer: Reading Charts, Maps, Graphs, Diagrams, 210
Designing Assessment Tasks: Extensive Reading, 212
Skimming Tasks, 213
Summarizing and Responding, 213
Note-Taking and Outlining, 215
Exercises, 216
For Your Further Reading, 217
9 Assessing Writing
Genres of Written Language, 219
Types of Writing Performance, 220
Micro- and Macroskills of Writing, 220
Designing Assessment Tasks: Imitative Writing, 221
Tasks in [Hand] Writing Letters, Words, and Punctuation, 221
Spelling Tasks and Detecting Phoneme-Grapheme Correspondences, 223
Designing Assessment Tasks: Intensive (Controlled) Writing, 225
Dictation and Dicto-Comp, 225
Grammatical Transformation Tasks, 226
Picture-Cued Tasks, 226
Vocabulary Assessment Tasks, 229
Ordering Tasks, 230
Short-Answer and Sentence Completion Tasks, 230
Issues in Assessing Responsive and Extensive Writing, 231
Designing Assessment Tasks: Responsive and Extensive Writing, 233
Paraphrasing, 234
Guided Question and Answer, 234
Paragraph Construction Tasks, 235
Strategic Options, 236
Test of Written English (TWE®), 237
Scoring Methods for Responsive and Extensive Writing, 241
Holistic Scoring, 242
Primary Trait Scoring, 242
Analytic Scoring, 243
Beyond Scoring: Responding to Extensive Writing, 246
Assessing Initial Stages of the Process of Composing, 247
Assessing Later Stages of the Process of Composing, 247
Exercises, 249
For Your Further Reading, 250
10 Beyond Tests: Alternatives in Assessment
The Dilemma of Maximizing Both Practicality and Washback, 252
Performance-Based Assessment, 254
Portfolios, 256
Journals, 260
Conferences and Interviews, 264
Observations, 266
Self- and Peer-Assessments, 270
Types of Self- and Peer-Assessment, 271
Guidelines for Self- and Peer-Assessment, 276
A Taxonomy of Self- and Peer-Assessment Tasks, 277
Exercises, 279
For Your Further Reading, 280
11 Grading and Student Evaluation
Philosophy of Grading: What Should Grades Reflect? 282
Guidelines for Selecting Grading Criteria, 284
Calculating Grades: Absolute and Relative Grading, 285
Teachers' Perceptions of Appropriate Grade Distributions, 289
Institutional Expectations and Constraints, 291
Cross-Cultural Factors and the Question of Difficulty, 292
What Do Letter Grades "Mean"?, 293
Alternatives to Letter Grading, 294
Some Principles and Guidelines for Grading and Evaluation, 299
Exercises, 300
For Your Further Reading, 302
Bibliography
Name Index
Subject Index
PREFACE
The field of second language acquisition and pedagogy has enjoyed a half century
of academic prosperity, with exponentially increasing numbers of books, journals,
articles, and dissertations now constituting our stockpile of knowledge. Surveys of
even a subdiscipline within this growing field now require hundreds of biblio-
graphic entries to document the state of the art. In this melange of topics and issues,
assessment remains an area of intense fascination. What is the best way to assess
learners' ability? What are the most practical assessment instruments available? Are
current standardized tests of language proficiency accurate and reliable? In an era of
communicative language teaching, do our classroom tests measure up to standards
of authenticity and meaningfulness? How can a teacher design tests that serve as
motivating learning experiences rather than anxiety-provoking threats?
All these and many more questions now being addressed by teachers,
researchers, and specialists can be overwhelming to the novice language teacher,
who is already baffled by linguistic and psychological paradigms and by a multitude
of methodological options. This book provides the teacher trainee with a clear,
reader-friendly presentation of the essential foundation stones of language assess-
ment, with ample practical examples to illustrate their application in language class-
rooms. It is a book that simplifies the issues without oversimplifying. It doesn't
dodge complex questions, and it treats them in ways that classroom teachers can
comprehend. Readers do not have to become testing experts to understand and
apply the concepts in this book, nor do they have to become statisticians adept in
manipulating mathematical equations and advanced calculus.
PURPOSE AND AUDIENCE
This book is designed to offer a comprehensive survey of essential principles and
tools for second language assessment. It has been used in pilot forms for teacher-
training courses in teacher certification and in Master of Arts in TESOL programs. As
the third in a trilogy of teacher education textbooks, it is designed to follow my
other two books, Principles of Language Learning and Teaching (Fourth Edition,
Pearson Education, 2000) and Teaching by Principles (Second Edition, Pearson
Education, 2001). References to those two books are sprinkled throughout the cur-
rent book. In keeping with the tone set in the previous two books, this one features
uncomplicated prose and a systematic, spiraling organization. Concepts are intro-
duced with a maximum of practical exemplification and a minimum of weighty def-
inition. Supportive research is acknowledged and succinctly explained without
burdening the reader with ponderous debate over minutiae.
The testing discipline sometimes possesses an aura of sanctity that can cause
teachers to feel inadequate as they approach the task of mastering principles and
designing effective instruments. Some testing manuals, with their heavy emphasis
on jargon and mathematical equations, don't help to dissipate that mystique. By the
end of Language Assessment: Principles and Classroom Practices, readers will have
gained access to this not-so-frightening field. They will have a working knowledge
of a number of useful fundamental principles of assessment and will have applied
those principles to practical classroom contexts. They will have acquired a store-
house of useful, comprehensible tools for evaluating and designing practical, effec-
tive assessment techniques for their classrooms.
PRINCIPAL FEATURES
Notable features of this book include the following:
• clearly framed fundamental principles for evaluating and designing assess-
ment procedures of all kinds
• focus on the most common pedagogical challenge: classroom-based assess-
ment
• many practical examples to illustrate principles and guidelines
• concise but comprehensive treatment of assessing all four skills (listening,
speaking, reading, writing)
• in each skill, classification of assessment techniques that range from con-
trolled to open-ended item types on a specified continuum of micro- and
macroskills of language
• thorough discussion of large-scale standardized tests: their purpose, design,
validity, and utility
• a look at testing language proficiency, or "ability"
• explanation of what standards-based assessment is, why it is so popular, and
what its pros and cons are
• consideration of the ethics of testing in an educational and commercial
world driven by tests
• a comprehensive presentation of alternatives in assessment, namely, portfo-
lios, journals, conferences, observations, interviews, and self- and peer-
assessment
• systematic discussion of letter grading and overall evaluation of student per-
formance in a course
• end-of-chapter exercises that suggest whole-class discussion and individual,
pair, and group work for the teacher education classroom
• a few suggested additional readings at the end of each chapter
WORDS OF THANKS
Language Assessment: Principles and Classroom Practices is the product of many
years of teaching language testing and assessment in my own classrooms. My students
have collectively taught me more than I have taught them, which prompts me to
thank them all, everywhere, for these gifts of knowledge. I am further indebted to
teachers in many countries around the world where I have offered occasional work-
shops and seminars on language assessment. I have memorable impressions of such
sessions in Brazil, the Dominican Republic, Egypt, Japan, Peru, Thailand, Turkey, and
Yugoslavia, where cross-cultural issues in assessment have been especially stimulating.
I am also grateful to my graduate assistant, Amy Shipley, for tracking down
research studies and practical examples of tests, and for preparing artwork for some
of the figures in this book. I offer an appreciative thank you to my friend Maryruth
Farnsworth, who read the manuscript with an editor's eye and artfully pointed out
some idiosyncrasies in my writing. My gratitude extends to my staff at the American
Language Institute at San Francisco State University, especially Kathy Sherak, Nicole
Frantz, and Nadya McCann, who carried the ball administratively while I completed
the bulk of writing on this project. And thanks to my colleague Pat Porter for
reading and commenting on an earlier draft of this book. As always, the embracing
support of faculty and graduate students at San Francisco State University is a con-
stant source of stimulation and affirmation.
H. Douglas Brown
San Francisco, California
September 2003
TEXT CREDITS
Grateful acknowledgment is made to the following publishers and authors for per-
mission to reprint copyrighted material.
American Council on Teaching Foreign Languages (ACTFL), for material from
ACTFL Proficiency Guidelines: Speaking (1986); Oral Proficiency Inventory (OPI):
Summary Highlights.
Blackwell Publishers, for material from Brown, James Dean & Bailey, Kathleen M.
(1984). A categorical instrument for scoring second language writing skills. Language
Learning, 34, 21-42.
California Department of Education, for material from California English
Language Development (ELD) Standards: Listening and Speaking.
Chauncey Group International (a subsidiary of ETS), for material from Test of
English for International Communication (TOEIC®).
Educational Testing Service (ETS), for material from Test of English as a Foreign
Language (TOEFL®); Test of Spoken English (TSE®); Test of Written English (TWE®).
English Language Institute, University of Michigan, for material from Michigan
English Language Assessment Battery (MELAB).
Ordinate Corporation, for material from PhonePass®.
Pearson/Longman ESL, and Deborah Phillips, for material from Phillips, Deborah.
(2001). Longman Introductory Course for the TOEFL® Test. White Plains, NY: Pearson
Education.
Second Language Testing, Inc. (SLTI), for material from Modern Language Aptitude
Test.
University of Cambridge Local Examinations Syndicate (UCLES), for material from
International English Language Testing System.
Yasuhiro Imao, Roshan Khan, Eric Phillips, and Sheila Viotti, for unpublished material.
CHAPTER 1
TESTING, ASSESSING,
AND TEACHING
If you hear the word test in any classroom setting, your thoughts are not likely to be
positive, pleasant, or affirming. The anticipation of a test is almost always accompanied
by feelings of anxiety and self-doubt, along with a fervent hope that you will
come out of it alive. Tests seem as unavoidable as tomorrow's sunrise in virtually
every kind of educational setting. Courses of study in every discipline are marked
by periodic tests, milestones of progress (or inadequacy), and you intensely wish
for a miraculous exemption from these ordeals. We live by tests and sometimes
(metaphorically) die by them.
For a quick revisiting of how tests affect many learners, take the following
vocabulary quiz. All the words are found in standard English dictionaries, so you
should be able to answer all six items correctly, right? Okay, take the quiz and circle
the correct definition for each word.
Circle the correct answer. You have 3 minutes to complete this examination!
1. polygene
   a. the first stratum of lower-order protozoa containing multiple genes
   b. a combination of two or more plastics to produce a highly durable material
   c. one of a set of cooperating genes, each producing a small quantitative effect
   d. any of a number of multicellular chromosomes
2. cynosure
   a. an object that serves as a focal point of attention and admiration; a
      center of interest or attention
   b. a narrow opening caused by a break or fault in limestone caves
   c. the cleavage in rock caused by glacial activity
   d. one of a group of electrical impulses capable of passing through metals
3. gudgeon
   a. a jail for commoners during the Middle Ages, located in the villages
      of Germany and France
   b. a strip of metal used to reinforce beams and girders in building construction
   c. a tool used by Alaskan Indians to carve totem poles
   d. a small Eurasian freshwater fish
4. hippogriff
   a. a term used in children's literature to denote colorful and descriptive
      phraseology
   b. a mythological monster having the wings, claws, and head of a
      griffin and the body of a horse
   c. ancient Egyptian cuneiform writing commonly found on the walls of tombs
   d. a skin transplant from the leg or foot to the hip
5. reglet
   a. a narrow, flat molding
   b. a musical composition of regular beat and harmonic intonation
   c. an Australian bird of the eagle family
   d. a short sleeve found on women's dresses in Victorian England
6. fictile
   a. a short, oblong-shaped projectile used in early eighteenth-century cannons
   b. an Old English word for the leading character of a fictional novel
   c. moldable plastic; formed of a moldable substance such as clay or earth
   d. pertaining to the tendency of certain lower mammals to lose visual
      depth perception with increasing age
Now, how did that make you feel? Probably just the same as many learners
feel when they take many multiple-choice (or shall we say multiple-guess?),
timed, "tricky" tests. To add to the torment, if this were a commercially
administered standardized test, you might have to wait weeks before learning your
results. You can check your answers on this quiz now by turning to page 16. If
you correctly identified three or more items, congratulations! You just exceeded
the average.
Of course, this little pop quiz on obscure vocabulary is not an appropriate
example of classroom-based achievement testing, nor is it intended to be. It's simply
an illustration of how tests make us feel much of the time. Can tests be positive
experiences? Can they build a person's confidence and become learning experiences?
Can they bring out the best in students? The answer is a resounding yes!
Tests need not be degrading, artificial, anxiety-provoking experiences. And that's
partly what this book is all about: helping you to create more authentic, intrinsically
motivating assessment procedures that are appropriate for their context and
designed to offer constructive feedback to your students.
Before we look at tests and test design in second language education, we need
to understand three basic interrelated concepts: testing, assessment, and teaching.
Notice that the title of this book is Language Assessment, not Language Testing.
There are important differences between these two constructs, and an even more
important relationship among testing, assessing, and teaching.
WHAT IS A TEST?
A test, in simple terms, is a method of measuring a person's ability, knowledge, or
performance in a given domain. Let's look at the components of this definition. A
test is first a method. It is an instrument (a set of techniques, procedures, or items)
that requires performance on the part of the test-taker. To qualify as a test, the method
must be explicit and structured: multiple-choice questions with prescribed correct
answers; a writing prompt with a scoring rubric; an oral interview based on a question
script and a checklist of expected responses to be filled in by the administrator.
Second, a test must measure. Some tests measure general ability, while others
focus on very specific competencies or objectives. A multi-skill proficiency test
determines a general ability level; a quiz on recognizing correct use of definite
articles measures specific knowledge. The way the results or measurements are
communicated may vary. Some tests, such as a classroom-based short-answer essay test,
may earn the test-taker a letter grade accompanied by the instructor's marginal
comments. Others, particularly large-scale standardized tests, provide a total numerical
score, a percentile rank, and perhaps some subscores. If an instrument does not
specify a form of reporting measurement (a means for offering the test-taker some
kind of result), then that technique cannot appropriately be defined as a test.
Next, a test measures an individual's ability, knowledge, or performance. Testers
need to understand who the test-takers are. What is their previous experience and
background? Is the test appropriately matched to their abilities? How should
test-takers interpret their scores?
A test measures performance, but the results imply the test-taker's ability, or, to
use a concept common in the field of linguistics, competence. Most language tests
measure one's ability to perform language, that is, to speak, write, read, or listen to a
subset of language. On the other hand, it is not uncommon to find tests designed to
tap into a test-taker's knowledge about language: defining a vocabulary item, reciting
a grammatical rule, or identifying a rhetorical feature in written discourse.
Performance-based tests sample the test-taker's actual use of language, but from
those samples the test administrator infers general competence. A test of reading
comprehension, for example, may consist of several short reading passages each fol-
lowed by a limited number of comprehension questions-a small sample of a
second language learner's total reading behavior. But from the results of that test, the
examiner may infer a certain level of general reading ability.
Finally, a test measures a given domain. In the case of a proficiency test, even
though the actual performance on the test involves only a sampling of skills, that
domain is overall proficiency in a language: general competence in all skills of a
language. Other tests may have more specific criteria. A test of pronunciation might
well be a test of only a limited set of phonemic minimal pairs. A vocabulary test may
focus on only the set of words covered in a particular lesson or unit. One of the
biggest obstacles to overcome in constructing adequate tests is to measure the
desired criterion and not include other factors inadvertently, an issue that is
addressed in Chapters 2 and 3.
A well-constructed test is an instrument that provides an accurate measure of
the test-taker's ability within a particular domain. The definition sounds fairly simple,
but in fact, constructing a good test is a complex task involving both science and art.
ASSESSMENT AND TEACHING
Assessment is a popular and sometimes misunderstood term in current educational
practice. You might be tempted to think of testing and assessing as synonymous
terms, but they are not. Tests are prepared administrative procedures that occur at
identifiable times in a curriculum when learners muster all their faculties to offer
peak performance, knowing that their responses are being measured and evaluated.
Assessment, on the other hand, is an ongoing process that encompasses a much
wider domain. Whenever a student responds to a question, offers a comment, or
tries out a new word or structure, the teacher subconsciously makes an assessment
of the student's performance. Written work, from a jotted-down phrase to a formal
essay, is performance that ultimately is assessed by self, teacher, and possibly other
students. Reading and listening activities usually require some sort of productive
performance that the teacher implicitly judges, however peripheral that judgment
may be. A good teacher never ceases to assess students, whether those assessments
are incidental or intended.
Tests, then, are a subset of assessment; they are certainly not the only form of
assessment that a teacher can make. Tests can be useful devices, but they are only one
among many procedures and tasks that teachers can ultimately use to assess students.
But now, you might be thinking, if you make assessments every time you teach
something in the classroom, does all teaching involve assessment? Are teachers
constantly assessing students with no interaction that is assessment-free?
The answer depends on your perspective. For optimal learning to take place,
students in the classroom must have the freedom to experiment, to try out their own
hypotheses about language without feeling that their overall competence is being
judged in terms of those trials and errors. In the same way that tournament tennis
players must, before a tournament, have the freedom to practice their skills with no
implications for their final placement on that day of days, so also must learners have
ample opportunities to "play" with language in a classroom without being formally
graded. Teaching sets up the practice games of language learning: the opportunities
for learners to listen, think, take risks, set goals, and process feedback from the
"coach" and then recycle through the skills that they are trying to master. (A diagram
of the relationship among testing, teaching, and assessment is found in Figure 1.1.)
[Figure 1.1. Tests, assessment and teaching. Diagram labels: TESTS, ASSESSMENT, TEACHING.]
At the same time, during these practice activities, teachers (and tennis coaches)
are indeed observing students' performance and making various evaluations of each
learner: How did the performance compare to previous performance? Which
aspects of the performance were better than others? Is the learner performing up
to an expected potential? How does the performance compare to that of others in
the same learning community? In the ideal classroom, all these observations feed
into the way the teacher provides instruction to each student.
Informal and Formal Assessment
One way to begin untangling the lexical conundrum created by distinguishing
among tests, assessment, and teaching is to distinguish between informal and formal
assessment. Informal assessment can take a number of forms, starting with
incidental, unplanned comments and responses, along with coaching and other
impromptu feedback to the student. Examples include saying "Good work!" "Did
you say can or can't?" "I think you meant to say you broke the glass,
not you break the glass," or putting a ☺ on some homework.
Informal assessment does not stop there. A good deal of a teacher's informal
assessment is embedded in classroom tasks designed to elicit performance without
recording results and making fixed judgments about a student's competence.
Examples at this end of the continuum are marginal comments on papers,
responding to a draft of an essay, advice about how to better pronounce a word, a
suggestion for a strategy for compensating for a reading difficulty, and showing how
to modify a student's note-taking to better remember the content of a lecture.
On the other hand, formal assessments are exercises or procedures specifically
designed to tap into a storehouse of skills and knowledge. They are systematic,
planned sampling techniques constructed to give teacher and student an appraisal
of student achievement. To extend the tennis analogy, formal assessments are the
tournament games that occur periodically in the course of a regimen of practice.
Is formal assessment the same as a test? We can say that all tests are formal
assessments, but not all formal assessment is testing. For example, you might use a
student's journal or portfolio of materials as a formal assessment of the attainment
of certain course objectives, but it is problematic to call those two procedures
"tests." A systematic set of observations of a student's frequency of oral participation
in class is certainly a formal assessment, but it too is hardly what anyone would call
a test. Tests are usually relatively time-constrained (usually spanning a class period
or at most several hours) and draw on a limited sample of behavior.
Formative and Summative Assessment
Another useful distinction to bear in mind is the function of an assessment: How is
the procedure to be used? Two functions are commonly identified in the literature:
formative and summative assessment. Most of our classroom assessment is formative
assessment: evaluating students in the process of "forming" their competencies
and skills with the goal of helping them to continue that growth process. The
key to such formation is the delivery (by the teacher) and internalization (by the
student) of appropriate feedback on performance, with an eye toward the future
continuation (or formation) of learning.
For all practical purposes, virtually all kinds of informal assessment are (or
should be) formative. They have as their primary focus the ongoing development of
the learner's language. So when you give a student a comment or a suggestion, or
call attention to an error, that feedback is offered in order to improve the learner's
language ability.
Summative assessment aims to measure, or summarize, what a student has
grasped, and typically occurs at the end of a course or unit of instruction. A
summation of what a student has learned implies looking back and taking stock of how
well that student has accomplished objectives, but does not necessarily point the
way to future progress. Final exams in a course and general proficiency exams are
examples of summative assessment.
One of the problems with prevailing attitudes toward testing is the view that
all tests (quizzes, periodic review tests, midterm exams, etc.) are summative. At
various points in your past educational experiences, no doubt you've considered such
tests as summative. You may have thought, "Whew! I'm glad that's over. Now I don't
have to remember that stuff anymore!" A challenge to you as a teacher is to change
that attitude among your students: Can you instill a more formative quality to what
your students might otherwise view as a summative test? Can you offer your
students an opportunity to convert tests into "learning experiences"? We will take up
that challenge in subsequent chapters in this book.
Norm-Referenced and Criterion-Referenced Tests
Another dichotomy that is important to clarify here and that aids in sorting out
common terminology in assessment is the distinction between norm-referenced
and criterion-referenced testing. In norm-referenced tests, each test-taker's score
is interpreted in relation to a mean (average score), median (middle score), standard
deviation (extent of variance in scores), and/or percentile rank. The purpose in such
tests is to place test-takers along a mathematical continuum in rank order. Scores are
usually reported back to the test-taker in the form of a numerical score (for
example, 230 out of 300) and a percentile rank (such as 84 percent, which means
that the test-taker's score was higher than 84 percent of the total number of
test-takers, but lower than 16 percent in that administration). Typical of norm-referenced
tests are standardized tests like the Scholastic Aptitude Test (SAT®) or the Test of
English as a Foreign Language (TOEFL®), intended to be administered to large
audiences, with results efficiently disseminated to test-takers. Such tests must have fixed,
predetermined responses in a format that can be scored quickly at minimum
expense. Money and efficiency are primary concerns in these tests.
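The statistics named above can be made concrete with a short calculation. The following is a minimal sketch, not a description of how any real testing agency reports scores: the function names and the sample scores are invented for illustration, and percentile rank is assumed here to mean the percentage of test-takers who scored strictly below a given score.

```python
from statistics import mean, median, pstdev


def percentile_rank(score, all_scores):
    """Percentage of test-takers whose score is strictly below `score`."""
    below = sum(1 for s in all_scores if s < score)
    return 100.0 * below / len(all_scores)


def norm_referenced_report(score, all_scores):
    """Interpret one score relative to the whole group of test-takers."""
    return {
        "score": score,
        "mean": mean(all_scores),        # average score
        "median": median(all_scores),    # middle score
        "std_dev": pstdev(all_scores),   # spread of scores in this group
        "percentile_rank": percentile_rank(score, all_scores),
    }


# Hypothetical scores (out of 300) from one administration.
scores = [150, 180, 200, 210, 230, 240, 250, 260, 270, 290]
report = norm_referenced_report(230, scores)
print(report)
```

Note that the report says nothing about which objectives the test-taker has mastered; it only locates the score within the group.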
Criterion-referenced tests, on the other hand, are designed to give test-takers
feedback, usually in the form of grades, on specific course or lesson objectives.
Classroom tests involving the students in only one class, and connected to a
curriculum, are typical of criterion-referenced testing. Here, much time and effort on the
part of the teacher (test administrator) are sometimes required in order to deliver
useful, appropriate feedback to students, or what Oller (1979, p. 52) called
"instructional value." In a criterion-referenced test, the distribution of students' scores across
a continuum may be of little concern as long as the instrument assesses appropriate
objectives. In Language Assessment, with an audience of classroom language
teachers and teachers in training, and with its emphasis on classroom-based
assessment (as opposed to standardized, large-scale testing), criterion-referenced testing is
of more prominent interest than norm-referenced testing.
APPROACHES TO LANGUAGE TESTING: A BRIEF HISTORY
Now that you have a reasonably clear grasp of some common assessment terms, we
now turn to one of the primary concerns of this book: the creation and use of tests,
particularly classroom tests. A brief history of language testing over the past
half-century will serve as a backdrop to an understanding of classroom-based testing.
Historically, language-testing trends and practices have followed the shifting
sands of teaching methodology (for a description of these trends, see Brown,
Teaching by Principles [hereinafter TBP], Chapter 2).¹ For example, in the 1950s, an
era of behaviorism and special attention to contrastive analysis, testing focused on
specific language elements such as the phonological, grammatical, and lexical
contrasts between two languages. In the 1970s and 1980s, communicative theories of
language brought with them a more integrative view of testing in which specialists
claimed that "the whole of the communicative event was considerably greater than
the sum of its linguistic elements" (Clark, 1983, p. 432). Today, test designers are still
challenged in their quest for more authentic, valid instruments that simulate
real-world interaction.
Discrete-Point and Integrative Testing
This historical perspective underscores two major approaches to language testing
that were debated in the 1970s and early 1980s. These approaches still prevail today,
even if in mutated form: the choice between discrete-point and integrative testing
methods (Oller, 1979). Discrete-point tests are constructed on the assumption that
language can be broken down into its component parts and that those parts can be
tested successfully. These components are the skills of listening, speaking, reading,
and writing, and various units of language (discrete points) of phonology,
graphology, morphology, lexicon, syntax, and discourse. It was claimed that an
overall language proficiency test, then, should sample all four skills and as many
linguistic discrete points as possible.
Such an approach demanded a decontextualization that often confused the
test-taker. So, as the profession emerged into an era of emphasizing communication,
authenticity, and context, new approaches were sought. Oller (1979) argued that
language competence is a unified set of interacting abilities that cannot be tested
separately. His claim was that communicative competence is so global and requires
such integration (hence the term "integrative" testing) that it cannot be captured in
additive tests of grammar, reading, vocabulary, and other discrete points of language.
Others (among them Cziko, 1982, and Savignon, 1982) soon followed in their
support for integrative testing.
¹ Frequent references are made in this book to companion volumes by the author.
Principles of Language Learning and Teaching (PLLT) (Fourth Edition, 2000) is a
basic teacher reference book on essential foundations of second language acquisition
on which pedagogical practices are based. Teaching by Principles (TBP) (Second
Edition, 2001) spells out that pedagogy in practical terms for the language teacher.

What does an integrative test look like? Two types of tests have historically
been claimed to be examples of integrative tests: cloze tests and dictations. A cloze
test is a reading passage (perhaps 150 to 300 words) in which roughly every sixth
or seventh word has been deleted; the test-taker is required to supply words that fit
into those blanks. (See Chapter 8 for a full discussion of cloze testing.) Oller (1979)
claimed that cloze test results are good measures of overall proficiency. According
to theoretical constructs underlying this claim, the ability to supply appropriate
words in blanks requires a number of abilities that lie at the heart of competence in
a language: knowledge of vocabulary, grammatical structure, discourse structure,
reading skills and strategies, and an internalized "expectancy" grammar (enabling
one to predict an item that will come next in a sequence). It was argued that
successful completion of cloze items taps into all of those abilities, which were said to
be the essence of global language proficiency.
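The fixed-ratio deletion procedure described above is mechanical enough to sketch in code. What follows is an illustrative sketch under stated assumptions, not a procedure taken from this chapter: the function names are hypothetical, punctuation attached to a deleted word is kept in place, and the scoring function assumes an exact-word rule in which only the originally deleted word counts as correct.

```python
import string


def make_cloze(passage, n=7):
    """Blank out every n-th word of `passage`; return (gapped_text, answers)."""
    answers = []
    gapped = []
    for index, word in enumerate(passage.split(), start=1):
        if index % n == 0:
            core = word.strip(string.punctuation)  # the word without commas, periods, etc.
            if core:
                answers.append(core)
                gapped.append(word.replace(core, "______", 1))
                continue
        gapped.append(word)
    return " ".join(gapped), answers


def score_exact_word(responses, answers):
    """Count responses that match the deleted word exactly (ignoring case)."""
    return sum(r.strip().lower() == a.lower() for r, a in zip(responses, answers))


sample = ("one two three four five six seven, eight nine ten "
          "eleven twelve thirteen fourteen.")
text, key = make_cloze(sample, n=7)
print(text)
print(key)
```

A teacher might prefer to leave the first sentence or two intact so that the passage establishes context before the first blank; that refinement is omitted here to keep the sketch short.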
Dictation is a familiar language-teaching technique that evolved into a testing
technique. Essentially, learners listen to a passage of 100 to 150 words read aloud by
an administrator (or audiotape) and write what they hear, using correct spelling. The
listening portion usually has three stages: an oral reading without pauses; an oral
reading with long pauses between every phrase (to give the learner time to write
down what is heard); and a third reading at normal speed to give test-takers a chance
to check what they wrote. (See Chapter 6 for more discussion of dictation as an
assessment device.)
Supporters argue that dictation is an integrative test because it taps into
grammatical and discourse competencies required for other modes of performance in a
language. Success on a dictation requires careful listening, reproduction in writing
of what is heard, efficient short-term memory, and, to an extent, some expectancy
rules to aid the short-term memory. Further, dictation test results tend to correlate
strongly with other tests of proficiency. Dictation testing is usually
classroom-centered since large-scale administration of dictations is quite impractical from a
scoring standpoint. Reliability of scoring criteria for dictation tests can be improved
by designing multiple-choice or exact-word cloze test scoring.
Proponents of integrative test methods soon centered their arguments on what
became known as the unitary trait hypothesis, which suggested an "indivisible"
view of language proficiency: that vocabulary, grammar, phonology, the "four skills,"
and other discrete points of language could not be disentangled from each other in
language performance. The unitary trait hypothesis contended that there is a
general factor of language proficiency such that all the discrete points do not add up to
that whole.
Others argued strongly against the unitary trait position. In a study of students
in Brazil and the Philippines, Farhady (1982) found significant and widely varying
differences in performance on an ESL proficiency test, depending on subjects' native
country, major field of study, and graduate versus undergraduate status. For example,
Brazilians scored very low in listening comprehension and relatively high in reading
comprehension. Filipinos, whose scores on five of the six components of the test
were considerably higher than Brazilians' scores, were actually lower than Brazilians
in reading comprehension scores. Farhady's contentions were supported in other
research that seriously questioned the unitary trait hypothesis. Finally, in the face of
the evidence, Oller retreated from his earlier stand and admitted that "the unitary
trait hypothesis was wrong" (1983, p. 352).
Communicative Language Testing
By the mid-1980s, the language-testing field had abandoned arguments about the
unitary trait hypothesis and had begun to focus on designing communicative
language-testing tasks. Bachman and Palmer (1996, p. 9) include among "fundamental"
principles of language testing the need for a correspondence between
language test performance and language use: "In order for a particular language test to
be useful for its intended purposes, test performance must correspond in
demonstrable ways to language use in non-test situations." The problem that language
assessment experts faced was that tasks tended to be artificial, contrived, and
unlikely to mirror language use in real life. As Weir (1990, p. 6) noted, "Integrative
tests such as cloze only tell us about a candidate's linguistic competence. They do
not tell us anything directly about a student's performance ability."
And so a quest for authenticity was launched, as test designers centered on
communicative performance. Following Canale and Swain's (1980) model of
communicative competence, Bachman (1990) proposed a model of language
competence consisting of organizational and pragmatic competence, respectively
subdivided into grammatical and textual components, and into illocutionary and
sociolinguistic components. (Further discussion of both Canale and Swain's and
Bachman's models can be found in PLLT, Chapter 9.) Bachman and Palmer (1996,
pp. 70f.) also emphasized the importance of strategic competence (the ability to
employ communicative strategies to compensate for breakdowns as well as to
enhance the rhetorical effect of utterances) in the process of communication. All
elements of the model, especially pragmatic and strategic abilities, needed to be
included in the constructs of language testing and in the actual performance
required of test-takers.
Communicative testing presented challenges to test designers, as we will see in
subsequent chapters of this book. Test constructors began to identify the kinds of
real-world tasks that language learners were called upon to perform. It was clear that
the contexts for those tasks were extraordinarily widely varied and that the
sampling of tasks for any one assessment procedure needed to be validated by what
language users actually do with language. Weir (1990, p. 11) reminded his readers that
"to measure language proficiency ... account must now be taken of: where, when,
how, with whom, and why language is to be used, and on what topics, and with what
effect." And the assessment field became more and more concerned with the
authenticity of tasks and the genuineness of texts. (See Skehan, 1988, 1989, for a
survey of communicative testing research.)
Performance-Based Assessment
In language courses and programs around the world, test designers are now tackling
this new and more student-centered agenda (Alderson, 2001, 2002). Instead of just
offering paper-and-pencil selective response tests of a plethora of separate items,
performance-based assessment of language typically involves oral production,
written production, open-ended responses, integrated performance (across skill
areas), group performance, and other interactive tasks. To be sure, such assessment
is time-consuming and therefore expensive, but those extra efforts are paying off in
the form of more direct testing because students are assessed as they perform actual
or simulated real-world tasks. In technical terms, higher content validity (see
Chapter 2 for an explanation) is achieved because learners are measured in the
process of performing the targeted linguistic acts.
In an English language-teaching context, performance-based assessment means
that you may have a difficult time distinguishing between formal and informal
assessment. If you rely a little less on formally structured tests and a little more on
evaluation while students are performing various tasks, you will be taking some
steps toward meeting the goals of performance-based testing. (See Chapter 10 for a
further discussion of performance-based assessment.)
A characteristic of many (but not all) performance-based language assessments
is the presence of interactive tasks. In such cases, the assessments involve learners
in actually performing the behavior that we want to measure. In interactive tasks,
test-takers are measured in the act of speaking, requesting, responding, or in
combining listening and speaking, and in integrating reading and writing.
Paper-and-pencil tests certainly do not elicit such communicative performance.
A prime example of an interactive language assessment procedure is an oral
interview. The test-taker is required to listen accurately to someone else and to
respond appropriately. If care is taken in the test design process, language elicited
and volunteered by the student can be personalized and meaningful, and tasks can
approach the authenticity of real-life language use (see Chapter 7).
CURRENT ISSUES IN CLASSROOM TESTING
The design of communicative, performance-based assessment rubrics continues to
challenge both assessment experts and classroom teachers. Such efforts to improve
various facets of classroom testing are accompanied by some stimulating issues, all
of which are helping to shape our current understanding of effective assessment.
Let's look at three such issues: the effect of new theories of intelligence on the
testing industry; the advent of what has come to be called "alternative" assessment;
and the increasing popularity of computer-based testing.
New Views on Intelligence
Intelligence was once viewed strictly as the ability to perform (a) linguistic and (b)
logical-mathematical problem solving. This "IQ" (intelligence quotient) concept of
intelligence has permeated the Western world and its way of testing for almost a
century. Since "smartness" in general is measured by timed, discrete-point tests
consisting of a hierarchy of separate items, why shouldn't every field of study be so
measured? For many years, we have lived in a world of standardized, norm-referenced
tests that are timed in a multiple-choice format consisting of a multiplicity of
logic-constrained items, many of which are inauthentic.
However, research on intelligence by psychologists like Howard Gardner,
Robert Sternberg, and Daniel Goleman has begun to turn the psychometric world
upside down. Gardner (1983, 1999), for example, extended the traditional view of
intelligence to seven different components.² He accepted the traditional
conceptualizations of linguistic intelligence and logical-mathematical intelligence on which
standardized IQ tests are based, but he included five other "frames of mind" in his
theory of multiple intelligences:
• spatial intelligence (the ability to find your way around an environment, to
form mental images of reality)
• musical intelligence (the ability to perceive and create pitch and rhythmic
patterns)
• bodily-kinesthetic intelligence (fine motor movement, athletic prowess)
• interpersonal intelligence (the ability to understand others and how they
feel, and to interact effectively with them)
• intrapersonal intelligence (the ability to understand oneself and to develop a
sense of self-identity)
Robert Sternberg (1988, 1997) also charted new territory in intelligence
research in recognizing creative thinking and manipulative strategies as part of
intelligence. All "smart" people aren't necessarily adept at fast, reactive thinking. They
may be very innovative in being able to think beyond the normal limits imposed by
existing tests, but they may need a good deal of processing time to enact this
creativity. Other forms of smartness are found in those who know how to manipulate
their environment, namely, other people. Debaters, politicians, successful
salespersons, smooth talkers, and con artists are all smart in their manipulative ability to
persuade others to think their way, vote for them, make a purchase, or do something
they might not otherwise do.
More recently, Daniel Goleman's (1995) concept of "EQ" (emotional quotient)
has spurred us to underscore the importance of the emotions in our cognitive
processing. Those who manage their emotions, especially emotions that can be
detrimental, tend to be more capable of fully intelligent processing. Anger, grief,
resentment, self-doubt, and other feelings can easily impair peak performance in
everyday tasks as well as higher-order problem solving.
These new conceptualizations of intelligence have not been universally
accepted by the academic community (see White, 1998, for example). Nevertheless,
their intuitive appeal infused the decade of the 1990s with a sense of both freedom
and responsibility in our testing agenda. Coupled with parallel educational reforms
at the time (Armstrong, 1994), they helped to free us from relying exclusively on
timed, discrete-point, analytical tests in measuring language. We were prodded to
cautiously combat the potential tyranny of "objectivity" and its accompanying
impersonal approach. But we also assumed the responsibility for tapping into whole
language skills, learning processes, and the ability to negotiate meaning. Our challenge
was to test interpersonal, creative, communicative, interactive skills, and in doing so
to place some trust in our subjectivity and intuition.

² For a summary of Gardner's theory of intelligence, see Brown (2000, pp. 100-102).
Traditional and "Alternative" Assessment
Implied in some of the earlier description of performance-based classroom
assessment is a trend to supplement traditional test designs with alternatives that are more
authentic in their elicitation of meaningful communication. Table 1.1 highlights
differences between the two approaches (adapted from Armstrong, 1994, and Bailey,
1998, p. 207).
Two caveats need to be stated here. First, the concepts in Table 1.1 represent
some overgeneralizations and should therefore be considered with caution. It is
difficult, in fact, to draw a clear line of distinction between what Armstrong (1994) and
Bailey (1998) have called traditional and alternative assessment. Many forms of
assessment fall in between the two, and some combine the best of both.
Second, it is obvious that the table shows a bias toward alternative assessment,
and one should not be misled into thinking that everything on the left-hand side is
tainted while the list on the right-hand side offers salvation to the field of language
assessment! As Brown and Hudson (1998) aptly pointed out, the assessment
traditions available to us should be valued and utilized for the functions that they
provide. At the same time, we might all be stimulated to look at the right-hand list and
ask ourselves if, among those concepts, there are alternatives to assessment that we
can constructively use in our classrooms.
Table 1.1. Traditional and alternative assessment

Traditional Assessment            Alternative Assessment
One-shot, standardized exams      Continuous long-term assessment
Timed, multiple-choice format     Untimed, free-response format
Decontextualized test items       Contextualized communicative tasks
Scores suffice for feedback       Individualized feedback and washback
Norm-referenced scores            Criterion-referenced scores
Focus on the "right" answer       Open-ended, creative answers
Summative                         Formative
Oriented to product               Oriented to process
Non-interactive performance       Interactive performance
Fosters extrinsic motivation      Fosters intrinsic motivation

It should be noted here that considerably more time and higher institutional
budgets are required to administer and score assessments that presuppose more
subjective evaluation, more individualization, and more interaction in the process of
offering feedback. The payoff for the latter, however, comes with more useful
feedback to students, the potential for intrinsic motivation, and ultimately a more
complete description of a student's ability. (See Chapter 10 for a complete treatment
of alternatives in assessment.) More and more educators and advocates for
educational reform are arguing for a de-emphasis on large-scale standardized tests in favor
of building budgets that will offer the kind of contextualized, communicative
performance-based assessment that will better facilitate learning in our schools. (In
Chapter 4, issues surrounding standardized testing are addressed at length.)
Computer-Based Testing
Recent years have seen a burgeoning of assessment in which the test-taker performs
responses on a computer. Some computer-based tests (also known as
"computer-assisted" or "web-based" tests) are small-scale "home-grown" tests available on
websites. Others are standardized, large-scale tests in which thousands or even tens of
thousands of test-takers are involved. Students receive prompts (or probes, as they
are sometimes referred to) in the form of spoken or written stimuli from the
computerized test and are required to type (or in some cases, speak) their responses.
Almost all computer-based test items have fixed, closed-ended responses; however,
tests like the Test of English as a Foreign Language (TOEFL®) offer a written essay
section that must be scored by humans (as opposed to automatic, electronic, or
machine scoring). As this book goes to press, the designers of the TOEFL are on the
verge of offering a spoken English section.
A specific type of computer-based test, a computer-adaptive test, has been
available for many years but has recently gained momentum. In a computer-adaptive
test (CAT), each test-taker receives a set of questions that meet the test
specifications and that are generally appropriate for his or her performance level. The CAT
starts with questions of moderate difficulty. As test-takers answer each question, the
computer scores the question and uses that information, as well as the responses to
previous questions, to determine which question will be presented next. As long as
examinees respond correctly, the computer typically selects questions of greater or
equal difficulty. Incorrect answers, however, typically bring questions of lesser or
equal difficulty. The computer is programmed to fulfill the test design as it
continuously adjusts to find questions of appropriate difficulty for test-takers at all
performance levels. In CATs, the test-taker sees only one question at a time, and the
computer scores each question before selecting the next one. As a result, test-takers
cannot skip questions, and once they have entered and confirmed their answers,
they cannot return to questions or to any earlier part of the test.
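The selection logic just described can be illustrated with a deliberately simplified sketch. It is not how any operational CAT is implemented (real systems rely on statistical models of item difficulty and also use the responses to earlier questions); everything here, including the function name, the five-level item bank, and the one-step-up, one-step-down rule, is an assumption made for illustration.

```python
def run_adaptive_test(item_bank, is_correct, start_level, max_items):
    """Administer up to `max_items` questions, one at a time.

    item_bank  -- dict mapping a difficulty level (int) to a list of question ids
    is_correct -- callable(question_id, level) -> bool, standing in for the examinee
    """
    levels = sorted(item_bank)
    unused = {level: list(item_bank[level]) for level in levels}
    position = levels.index(start_level)  # begin at moderate difficulty
    log = []
    while len(log) < max_items:
        level = levels[position]
        if not unused[level]:
            break  # this sketch simply stops when a level runs out of items
        question = unused[level].pop(0)
        correct = is_correct(question, level)  # scored before the next item is chosen
        log.append((question, level, correct))
        if correct:
            position = min(position + 1, len(levels) - 1)  # harder, or equal at the top
        else:
            position = max(position - 1, 0)  # easier, or equal at the bottom
    return log


# Hypothetical bank: five difficulty levels with three questions each.
demo_bank = {level: [f"L{level}-Q{n}" for n in range(1, 4)] for level in range(1, 6)}
# Hypothetical examinee who can answer anything up to level 3.
demo_log = run_adaptive_test(demo_bank, lambda q, level: level <= 3,
                             start_level=3, max_items=6)
for entry in demo_log:
    print(entry)
```

Because each question is removed from the bank as soon as it is presented, the sketch also reflects the point that test-takers cannot return to an earlier question.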
Computer-based testing, with or without CAT technology, offers these advantages:
• classroom-based testing
• self-directed testing on various aspects of a language (vocabulary, grammar,
discourse, one or all of the four skills, etc.)
• practice for upcoming high-stakes standardized tests
• some individualization, in the case of CATs
• large-scale standardized tests that can be administered easily to thousands of
test-takers at many different stations, then scored electronically for rapid
reporting of results
Of course, some disadvantages are present in our current predilection for
computerizing testing. Among them:
• Lack of security and the possibility of cheating are inherent in
classroom-based, unsupervised computerized tests.
• Occasional quizzes that appear on unofficial websites may be
mistaken for validated assessments.
• The multiple-choice format preferred for most computer-based tests contains
the usual potential for flawed item design (see Chapter 3).
• Open-ended responses are less likely to appear because of the need for
human scorers, with all the attendant issues of cost, reliability, and
turnaround time.
• The human interactive element (especially in oral production) is absent.
More is said about computer-based testing in subsequent chapters, especially
Chapter 4, in a discussion of large-scale standardized testing. In addition, the
following websites provide further information and examples of computer-based tests:
Educational Testing Service                          www.ets.org
Test of English as a Foreign Language                www.toefl.org
Test of English for International Communication      www.toeic.com
International English Language Testing System        www.ielts.org
Dave's ESL Cafe (computerized quizzes)               www.eslcafe.com
Some argue that computer-based testing, pushed to its ultimate level, might
militate against recent efforts to return testing to its artful form of being tailored by
teachers for their classrooms, of being designed to be performance-based, and of
allowing a teacher-student dialogue to form the basis of assessment. This need not
be the case. Computer technology can be a boon to communicative language
testing. Teachers and test-makers of the future will have access to an ever-increasing
range of tools to safeguard against impersonal, stamped-out formulas for assessment.
By using technological innovations creatively, testers will be able to enhance
authenticity, to increase interactive exchange, and to promote autonomy.
* * * * *
As you read this book, I hope you will do so with an appreciation for the place
of testing in assessment, and with a sense of the interconnection of assessment and
teaching. Assessment is an integral part of the teaching-learning cycle. In an
interactive, communicative curriculum, assessment is almost constant. Tests, which are a
subset of assessment, can provide authenticity, motivation, and feedback to the
learner. Tests are essential components of a successful curriculum and one of
several partners in the learning process. Keep in mind these basic principles:
1. Periodic assessments, both formal and informal, can increase motivation by
serving as milestones of student progress.
2. Appropriate assessments aid in the reinforcement and retention of
information.
3. Assessments can confirm areas of strength and pinpoint areas needing further
work.
4. Assessments can provide a sense of periodic closure to modules within a
curriculum.
5. Assessments can promote student autonomy by encouraging students'
self-evaluation of their progress.
6. Assessments can spur learners to set goals for themselves.
7. Assessments can aid in evaluating teaching effectiveness.
Answers to the vocabulary quiz on pages 1 and 2: 1c, 2a, 3d, 4b, 5a, 6c.
EXERCISES
[Note: (I) Individual work; (G) Group or pair work; (C) Whole-class discussion.]
1. (G) In a small group, look at Figure 1.1 on page 5 that shows tests as a subset
of assessment and the latter as a subset of teaching. Do you agree with this
diagrammatic depiction of the three terms? Consider the following classroom
teaching techniques: choral drill, pair pronunciation practice, reading aloud,
information gap task, singing songs in English, writing a description of the
weekend's activities. What proportion of each has an assessment facet to it?
Share your conclusions with the rest of the class.
2. (G) The chart below shows a hypothetical line of distinction between
formative and summative assessment, and between informal and formal assessment.
As a group, place the following techniques/procedures into one of the four
cells and justify your decision. Share your results with other groups and
discuss any differences of opinion.
Placement tests
Diagnostic tests
Periodic achievement tests
Short pop quizzes
Standardized proficiency tests
Final exams
Portfolios
Journals
Speeches (prepared and rehearsed)
Oral presentations (prepared, but not rehearsed)
Impromptu student responses to teacher's questions
Student-written response (one paragraph) to a reading assignment
Drafting and revising writing
Final essays (after several drafts)
Student oral responses to teacher questions after a videotaped lecture
Whole class open-ended discussion of a topic

              Formative      Summative
Informal
Formal
3. (I/C) Review the distinction between norm-referenced and
criterion-referenced testing. If norm-referenced tests typically yield a distribution of
scores that resemble a bell-shaped curve, what kinds of distributions are
typical of classroom achievement tests in your experience?
4. (I/C) Restate in your own words the argument between unitary trait
proponents and discrete-point testing advocates. Why did Oller back down from the
unitary trait hypothesis?
5. (I/C) Why are cloze and dictation considered to be integrative tests?
6. (G) Look at the list of Gardner's seven intelligences. Take one or two
intelligences, as assigned to your group, and brainstorm some teaching activities
that foster that type of intelligence. Then, brainstorm some assessment tasks
that may presuppose the same intelligence in order to perform well. Share
your results with other groups.
7. (C) As a whole-class discussion, brainstorm a variety of test tasks that class
members have experienced in learning a foreign language. Then decide
which of those tasks are performance-based, which are not, and which ones
fall in between.
8. (G) Table 1.1 lists traditional and alternative assessment tasks and
characteristics. In pairs, quickly review the advantages and disadvantages of each, on
both sides of the chart. Share your conclusions with the rest of the class.
9. (C) Ask class members to share any experiences with computer-based testing
and evaluate the advantages and disadvantages of those experiences.
FOR YOUR FURTHER READING
McNamara, Tim. (2000). Language testing. Oxford: Oxford University Press.
One of a number of Oxford University Press's brief introductions to various
areas of language study, this 140-page primer on testing offers definitions
of basic terms in language testing with brief explanations of fundamental
concepts. It is a useful little reference book to check your understanding of
testing jargon and issues in the field.
Mousavi, Seyyed Abbas. (2002). An encyclopedic dictionary of language testing.
Third Edition. Taipei: Tung Hua Book Company.
This publication may be difficult to find in local bookstores, but it is a
highly useful compilation of virtually every term in the field of language
testing, with definitions, background history, and research references. It
provides comprehensive explanations of theories, principles, issues, tools,
and tasks. Its exhaustive 88-page bibliography is also downloadable at
http://www.abbas-mousavi.com. A shorter version of this 942-page tome
may be found in the previous version, Mousavi's (1999) Dictionary of
language testing (Tehran: Rahnama Publications).


PREFACE

The field of second language acquisition and pedagogy has enjoyed a half century of academic prosperity, with exponentially increasing numbers of books, journals, articles, and dissertations now constituting our stockpile of knowledge. Surveys of even a subdiscipline within this growing field now require hundreds of bibliographic entries to document the state of the art. In this melange of topics and issues, assessment remains an area of intense fascination. What is the best way to assess learners' ability? What are the most practical assessment instruments available? Are current standardized tests of language proficiency accurate and reliable? In an era of communicative language teaching, do our classroom tests measure up to standards of authenticity and meaningfulness? How can a teacher design tests that serve as motivating learning experiences rather than anxiety-provoking threats?

All these and many more questions now being addressed by teachers, researchers, and specialists can be overwhelming to the novice language teacher, who is already baffled by linguistic and psychological paradigms and by a multitude of methodological options. This book provides the teacher trainee with a clear, reader-friendly presentation of the essential foundation stones of language assessment, with ample practical examples to illustrate their application in language classrooms. It is a book that simplifies the issues without oversimplifying. It doesn't dodge complex questions, and it treats them in ways that classroom teachers can comprehend. Readers do not have to become testing experts to understand and apply the concepts in this book, nor do they have to become statisticians adept in manipulating mathematical equations and advanced calculus.

PURPOSE AND AUDIENCE

This book is designed to offer a comprehensive survey of essential principles and tools for second language assessment. It has been used in pilot forms for teacher-training courses in teacher certification and in Master of Arts in TESOL programs. As the third in a trilogy of teacher education textbooks, it is designed to follow my other two books, Principles of Language Learning and Teaching (Fourth Edition, Pearson Education, 2000) and Teaching by Principles (Second Edition, Pearson Education, 2001). References to those two books are sprinkled throughout the current book. In keeping with the tone set in the previous two books, this one features uncomplicated prose and a systematic, spiraling organization. Concepts are introduced with a maximum of practical exemplification and a minimum of weighty definition. Supportive research is acknowledged and succinctly explained without burdening the reader with ponderous debate over minutiae.

The testing discipline sometimes possesses an aura of sanctity that can cause teachers to feel inadequate as they approach the task of mastering principles and designing effective instruments. Some testing manuals, with their heavy emphasis on jargon and mathematical equations, don't help to dissipate that mystique. By the end of Language Assessment: Principles and Classroom Practices, readers will have gained access to this not-so-frightening field. They will have a working knowledge of a number of useful fundamental principles of assessment and will have applied those principles to practical classroom contexts. They will have acquired a storehouse of useful, comprehensible tools for evaluating and designing practical, effective assessment techniques for their classrooms.

PRINCIPAL FEATURES

Notable features of this book include the following:
• clearly framed fundamental principles for evaluating and designing assessment procedures of all kinds
• focus on the most common pedagogical challenge: classroom-based assessment
• many practical examples to illustrate principles and guidelines
• concise but comprehensive treatment of assessing all four skills (listening, speaking, reading, writing)
• in each skill, classification of assessment techniques that range from controlled to open-ended item types on a specified continuum of micro- and macroskills of language
• thorough discussion of large-scale standardized tests: their purpose, design, validity, and utility
• a look at testing language proficiency, or "ability"
• explanation of what standards-based assessment is, why it is so popular, and what its pros and cons are
• consideration of the ethics of testing in an educational and commercial world driven by tests
• a comprehensive presentation of alternatives in assessment, namely, portfolios, journals, conferences, observations, interviews, and self- and peer-assessment
• systematic discussion of letter grading and overall evaluation of student performance in a course
• end-of-chapter exercises that suggest whole-class discussion and individual, pair, and group work for the teacher education classroom
• a few suggested additional readings at the end of each chapter

WORDS OF THANKS

Language Assessment: Principles and Classroom Practices is the product of many years of teaching language testing and assessment in my own classrooms. My students have collectively taught me more than I have taught them, which prompts me to thank them all, everywhere, for these gifts of knowledge. I am further indebted to teachers in many countries around the world where I have offered occasional workshops and seminars on language assessment. I have memorable impressions of such sessions in Brazil, the Dominican Republic, Egypt, Japan, Peru, Thailand, Turkey, and Yugoslavia, where cross-cultural issues in assessment have been especially stimulating.

I am also grateful to my graduate assistant, Amy Shipley, for tracking down research studies and practical examples of tests, and for preparing artwork for some of the figures in this book. I offer an appreciative thank you to my friend Maryruth Farnsworth, who read the manuscript with an editor's eye and artfully pointed out some idiosyncrasies in my writing. My gratitude extends to my staff at the American Language Institute at San Francisco State University, especially Kathy Sherak, Nicole Frantz, and Nadya McCann, who carried the ball administratively while I completed the bulk of writing on this project. And thanks to my colleague Pat Porter for reading and commenting on an earlier draft of this book. As always, the embracing support of faculty and graduate students at San Francisco State University is a constant source of stimulation and affirmation.

H. Douglas Brown
San Francisco, California
September 2003

TEXT CREDITS

Grateful acknowledgment is made to the following publishers and authors for permission to reprint copyrighted material.

American Council on Teaching Foreign Languages (ACTFL), for material from ACTFL Proficiency Guidelines: Speaking (1986); Oral Proficiency Inventory (OPI): Summary Highlights.

Blackwell Publishers, for material from Brown, James Dean & Bailey, Kathleen M. (1984). A categorical instrument for scoring second language writing skills. Language Learning, 34, 21-42.

California Department of Education, for material from California English Language Development (ELD) Standards: Listening and Speaking.

Chauncey Group International (a subsidiary of ETS), for material from Test of English for International Communication (TOEIC®).

Educational Testing Service (ETS), for material from Test of English as a Foreign Language (TOEFL®); Test of Spoken English (TSE®); Test of Written English (TWE®).

English Language Institute, University of Michigan, for material from Michigan English Language Assessment Battery (MELAB).

Ordinate Corporation, for material from PhonePass®.

Pearson/Longman ESL, and Deborah Phillips, for material from Phillips, Deborah. (2001). Longman Introductory Course for the TOEFL® Test. White Plains, NY: Pearson Education.

Second Language Testing, Inc. (SLTI), for material from Modern Language Aptitude Test.

University of Cambridge Local Examinations Syndicate (UCLES), for material from International English Language Testing System.

Yasuhiro Imao, Roshan Khan, Eric Phillips, and Sheila Viotti, for unpublished material.

CHAPTER 1
TESTING, ASSESSING, AND TEACHING

If you hear the word test in any classroom setting, your thoughts are not likely to be positive, pleasant, or affirming. The anticipation of a test is almost always accompanied by feelings of anxiety and self-doubt, along with a fervent hope that you will come out of it alive. Tests seem as unavoidable as tomorrow's sunrise in virtually every kind of educational setting. Courses of study in every discipline are marked by periodic tests, milestones of progress (or inadequacy), and you intensely wish for a miraculous exemption from these ordeals. We live by tests and sometimes (metaphorically) die by them.

For a quick revisiting of how tests affect many learners, take the following vocabulary quiz. All the words are found in standard English dictionaries, so you should be able to answer all six items correctly, right? Okay, take the quiz and circle the correct definition for each word.

Circle the correct answer. You have 3 minutes to complete this examination!

1. polygene
a. the first stratum of lower-order protozoa containing multiple genes
b. a combination of two or more plastics to produce a highly durable material
c. one of a set of cooperating genes, each producing a small quantitative effect
d. any of a number of multicellular chromosomes

2. cynosure
a. an object that serves as a focal point of attention and admiration; a center of interest or attention
b. a narrow opening caused by a break or fault in limestone caves
c. the cleavage in rock caused by glacial activity
d. one of a group of electrical impulses capable of passing through metals

3. gudgeon
a. a jail for commoners during the Middle Ages, located in the villages of Germany and France
b. a strip of metal used to reinforce beams and girders in building construction
c. a tool used by Alaskan Indians to carve totem poles
d. a small Eurasian freshwater fish

4. hippogriff
a. a term used in children's literature to denote colorful and descriptive phraseology
b. a mythological monster having the wings, claws, and head of a griffin and the body of a horse
c. ancient Egyptian cuneiform writing commonly found on the walls of tombs
d. a skin transplant from the leg or foot to the hip

5. reglet
a. a narrow, flat molding
b. a musical composition of regular beat and harmonic intonation
c. an Australian bird of the eagle family
d. a short sleeve found on women's dresses in Victorian England

6. fictile
a. a short, oblong-shaped projectile used in early eighteenth-century cannons
b. an Old English word for the leading character of a fictional novel
c. moldable plastic; formed of a moldable substance such as clay or earth
d. pertaining to the tendency of certain lower mammals to lose visual depth perception with increasing age

Now, how did that make you feel? Probably just the same as many learners feel when they take many multiple-choice (or shall we say multiple-guess?), timed, "tricky" tests. To add to the torment, if this were a commercially administered standardized test, you might have to wait weeks before learning your results. You can check your answers on this quiz now by turning to page 16. If you correctly identified three or more items, congratulations! You just exceeded the average.

Of course, this little pop quiz on obscure vocabulary is not an appropriate example of classroom-based achievement testing, nor is it intended to be. It's simply an illustration of how tests make us feel much of the time. Can tests be positive experiences? Can they build a person's confidence and become learning experiences? Can they bring out the best in students? The answer is a resounding yes! Tests need not be degrading, artificial, anxiety-provoking experiences. And that's partly what this book is all about: helping you to create more authentic, intrinsically motivating assessment procedures that are appropriate for their context and designed to offer constructive feedback to your students.

Before we look at tests and test design in second language education, we need to understand three basic interrelated concepts: testing, assessing, and teaching. Notice that the title of this book is Language Assessment, not Language Testing. There are important differences between these two constructs, and an even more important relationship among testing, assessing, and teaching.

WHAT IS A TEST?

A test, in simple terms, is a method of measuring a person's ability, knowledge, or performance in a given domain. Let's look at the components of this definition. A test is first a method. It is an instrument (a set of techniques, procedures, or items) that requires performance on the part of the test-taker. To qualify as a test, the method must be explicit and structured: multiple-choice questions with prescribed correct answers; a writing prompt with a scoring rubric; an oral interview based on a question script and a checklist of expected responses to be filled in by the administrator.

Second, a test must measure. Some tests measure general ability, while others focus on very specific competencies or objectives. A multi-skill proficiency test determines a general ability level; a quiz on recognizing correct use of definite articles measures specific knowledge. The way the results or measurements are communicated may vary. Some tests, such as a classroom-based short-answer essay test, may earn the test-taker a letter grade accompanied by the instructor's marginal comments. Others, particularly large-scale standardized tests, provide a total numerical score, a percentile rank, and perhaps some subscores. If an instrument does not specify a form of reporting measurement (a means for offering the test-taker some kind of result), then that technique cannot appropriately be defined as a test.

Next, a test measures an individual's ability, knowledge, or performance. Testers need to understand who the test-takers are. What is their previous experience and background? Is the test appropriately matched to their abilities? How should test-takers interpret their scores?

A test measures performance, but the results imply the test-taker's ability, or, to use a concept common in the field of linguistics, competence. Most language tests measure one's ability to perform language, that is, to speak, write, read, or listen to a subset of language. On the other hand, it is not uncommon to find tests designed to tap into a test-taker's knowledge about language: defining a vocabulary item, reciting a grammatical rule, or identifying a rhetorical feature in written discourse. Performance-based tests sample the test-taker's actual use of language, but from those samples the test administrator infers general competence. A test of reading comprehension, for example, may consist of several short reading passages each followed by a limited number of comprehension questions, a small sample of a second language learner's total reading behavior. But from the results of that test, the examiner may infer a certain level of general reading ability.
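
As a rough illustration of the reporting formats mentioned above (a total numerical score and a percentile rank), the following Python sketch builds a simple score report. It is not taken from the text: the function names, the ten sample scores, and the 300-point maximum are all hypothetical, and percentile rank is assumed here to mean the percentage of test-takers who scored strictly below a given score (other definitions are in use).

```python
from statistics import mean, median, pstdev


def percentile_rank(score, all_scores):
    """Percentage of scores in all_scores that fall strictly below score.

    Assumption: 'percentile rank' means 'percent of test-takers scoring lower'.
    """
    below = sum(1 for s in all_scores if s < score)
    return 100.0 * below / len(all_scores)


def score_report(score, all_scores, max_score):
    """Build a small report for one test-taker (hypothetical format)."""
    return {
        "total": f"{score} out of {max_score}",
        "percentile_rank": percentile_rank(score, all_scores),
        # Group statistics against which an individual score can be read.
        "mean": mean(all_scores),        # average score
        "median": median(all_scores),    # middle score
        "std_dev": pstdev(all_scores),   # spread of the whole group
    }


# Hypothetical scores for ten test-takers on a 300-point test.
scores = [150, 170, 185, 190, 200, 205, 210, 220, 230, 260]
report = score_report(230, scores, 300)
print(report)
```

With these sample figures, a score of 230 sits above eight of the ten scores, so the sketch reports a percentile rank of 80. A classroom quiz that returns only a letter grade and marginal comments would need none of this machinery, which is part of the contrast drawn above.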

:tice their skiU with no s implications for their final placement on th:11 day of days. Assessment.pIe. arc a subset of assessment.stlldent'i in the classroom must have the freedom to experiment. then. so also must learners have 3ID. Assessing.does all teaching involve assessment? Are teachers con· stantly assesSing students w ith no interaction that is assessment-free ? 'nlC . Reading and listening activities lIsuaUy require some SOrt of productive s perfo rmance that the teacher implicitly judges. One of the biggest obstacles to overcome in conslructing adequate tests is to measure the desired criterion and nOt include other factors inadvertcmiy. they are certainly not the only form of a'i$Cssment that a teacher can make. you might be thinking. But now. Written work-from a joued-down phrase to a formal essay-is performance that u1timatt:1)' is assessed by self. to tryout their own hypotheses about language without feeling that their ove. even though the actual perfonnance on me test involves only a sampling of skills. or tries Out a new word or SlruCfure. however peripheral that judgment may be. ASSESSMENT AND TEACHING Assessment is :1 popular :lnd sometimes misunde rstood Icrm in current educational practice. Tests can be useful devices.Ul~'wer depends on your perspective. Olher tests may have more specific criteria. Tests.4 awrrC I Tf!S. 11lt:: definition sounds fdirly simple. w hether those assessmcnts are incidental or intend cd. if you make assessmen ts every time you teach something in 1. tudents. A well-constructed test is an instnlment that provides an :ICC urate measure of the test-laker's ability within a particular domain. A test of pronunciation might well be a tCSt of only a limited set of phonemic minimal pairs. is an ongoing process that encompasses a much wider domain . offers a comment. teacher. A good teacher never ceases to assess students. constructing a good test is a complex task involving both science and art. 
In the case of a proficiency (CSt .l learning to take place. A vocabulary lesl may focus on on ly the set of words covered in a particular lesson or unit. You might be tempted to think of testing and assessing as synonymous terms.n the same way that tournament tennis pbyers must. bur in fdCt . .lIId Teaching R Finally. o n the other hand. an issue that is addressed in Chapters 2 and 3. opportunities to "play" with language in a classroom Without being formally . a 1. Whenever a student responds to a questiOn. but they are nOL Tests are prepared administr:nivc proccdu. and possibly o ther . I.raU competence is being judged in temlS of those trials and errors. have the freedom to pra<. before a tournament. but they are only one among many procedures and t:lsks that teachers can ultimately uSt: to asscss students. the teacher subconsciously makes an assessmem of the student's performance.eSI measures a given domain. knowing that their responses arc being measured and evaluated. that domain is overnU proficiency in a language-general competence in all skills of a language.fCS that occur at idenrifiable times in a curriculum when learners muster aJl their fuculties to offer peak perfornlance. For optim:.11C dassroom .ing.

Informal assessment does not SLOp there. assessment and teaching At the same time. take risks. not you break the glass. starting with incidemal . teachers (and tennis coaches) are indeed observing snldents' performance and making varions evaluatio ns of cadI learner: How did the performance compare to previolls performance? Which aspects of the performance were better than othe rs? Is tlle learner pedorming up to an c. all these obscrv.lIions feed intO tlle way the teacher provides instruction to each student. think. Teaching sets up the practict: games of language learning: the opportunities fo r learners to listen." o r putting a @ on some homework. 1.PTER 1 T esling. set goals. Informal assessment can (:Ike a number of forms. advice about how to bener pronounce a word.cw.) E:v ASSESSMENT TEACHING F igure 1. Assessing. E.'Cpccred potential? How does the performance compare to that of others in the same learning community? In the ideal dassroom . A good deal of a reacher's informal assessment is embedded in dassroom tasks designed to elicit performance without recording results and making fLxed judgments about a student 's competence. (A diagram of the rel:uionsh ip amo ng testing. unplanned comments and responses. along with coaching and othe r impromptu feedback to the student..xamples at this end of the continuum are marginal comments on papers. and assessment is found in Figure 1. Informal and Formal Assessment One way to begin untangling the lexical conundnml c reated by distinguishing among tests. teaching. and teaching is to distinguish between informal and formal assessment. a . during these practice activities.and then recyde through the skills that they are trying to m:lSter. and process feedback from the "coach. Examples include saying ~ N i ce jobl ~ kGood wo rk! ~k Did you say can o r can 't?" 4l think you meant to say you broke the glass. Tests. 1. and T eaching 5 graded. responding to a drAft of all essay. assessment.

suggestion for a strategy for compensating for a reading difficulty, and showing how to modify a student's note-taking to better remember the content of a lecture.

On the other hand, formal assessments are exercises or procedures specifically designed to tap into a storehouse of skills and knowledge. They are systematic, planned sampling techniques constructed to give teacher and student an appraisal of student achievement. To extend the tennis analogy, formal assessments are the tournament games that occur periodically in the course of a regimen of practice.

Is formal assessment the same as a test? We can say that all tests are formal assessments, but not all formal assessment is testing. For example, you might use a student's journal or portfolio of materials as a formal assessment of the attainment of certain course objectives, but it is problematic to call those two procedures "tests." A systematic set of observations of a student's frequency of oral participation in class is certainly a formal assessment, but it too is hardly what anyone would call a test. Tests are usually relatively time-constrained (usually spanning a class period or at most several hours) and draw on a limited sample of behavior.

Formative and Summative Assessment

Another useful distinction to bear in mind is the function of an assessment: How is the procedure to be used? Two functions are commonly identified in the literature: formative and summative assessment. Most of our classroom assessment is formative assessment: evaluating students in the process of "forming" their competencies and skills with the goal of helping them to continue that growth process. The key to such formation is the delivery (by the teacher) and internalization (by the student) of appropriate feedback on performance, with an eye toward the future continuation (or formation) of learning.

For all practical purposes, virtually all kinds of informal assessment are (or should be) formative. They have as their primary focus the ongoing development of the learner's language. So when you give a student a comment or a suggestion, or call attention to an error, that feedback is offered in order to improve the learner's language ability.

Summative assessment aims to measure, or summarize, what a student has grasped, and typically occurs at the end of a course or unit of instruction. A summation of what a student has learned implies looking back and taking stock of how well that student has accomplished objectives, but does not necessarily point the way to future progress. Final exams in a course and general proficiency exams are examples of summative assessment.

One of the problems with prevailing attitudes toward testing is the view that all tests (quizzes, periodic review tests, midterm exams, etc.) are summative. At various points in your past educational experiences, no doubt you've considered such tests as summative. You may have thought, "Whew! I'm glad that's over. Now I don't have to remember that stuff anymore!" A challenge to you as a teacher is to change that attitude among your students: Can you instill a more formative quality to what
your students might otherwise view as a summative test? Can you offer your students an opportunity to convert tests into "learning experiences"? We will take up that challenge in subsequent chapters in this book.

Norm-Referenced and Criterion-Referenced Tests

Another dichotomy that is important to clarify here and that aids in sorting out common terminology in assessment is the distinction between norm-referenced and criterion-referenced testing. In norm-referenced tests, each test-taker's score is interpreted in relation to a mean (average score), median (middle score), standard deviation (extent of variance in scores), and/or percentile rank. The purpose in such tests is to place test-takers along a mathematical continuum in rank order. Scores are usually reported back to the test-taker in the form of a numerical score (for example, 230 out of 300) and a percentile rank (such as 84 percent, which means that the test-taker's score was higher than 84 percent of the total number of test-takers, but lower than 16 percent in that administration). Typical of norm-referenced tests are standardized tests like the Scholastic Aptitude Test (SAT®) or the Test of English as a Foreign Language (TOEFL®), intended to be administered to large audiences, with results efficiently disseminated to test-takers. Such tests must have fixed, predetermined responses in a format that can be scored quickly at minimum expense. Money and efficiency are primary concerns in these tests.

Criterion-referenced tests, on the other hand, are designed to give test-takers feedback, usually in the form of grades, on specific course or lesson objectives. Classroom tests involving the students in only one class, and connected to a curriculum, are typical of criterion-referenced testing. Here, much time and effort on the part of the teacher (test administrator) are sometimes required in order to deliver useful, appropriate feedback to students, or what Oller (1979, p. 52) called "instructional value." In a criterion-referenced test, the distribution of students' scores across a continuum may be of little concern as long as the instrument assesses appropriate objectives. In Language Assessment, with an audience of classroom language teachers and teachers in training, and with its emphasis on classroom-based assessment (as opposed to standardized, large-scale testing), criterion-referenced testing is of more prominent interest than norm-referenced testing.

APPROACHES TO LANGUAGE TESTING: A BRIEF HISTORY

Now that you have a reasonably clear grasp of some common assessment terms, we now turn to one of the primary concerns of this book: the creation and use of tests, particularly classroom tests. A brief history of language testing over the past half-century will serve as a backdrop to an understanding of classroom-based testing.

Historically, language-testing trends and practices have followed the shifting sands of teaching methodology (for a description of these trends, see Brown,
Teaching by Principles [hereinafter TBP], Chapter 2).1 For example, in the 1950s, an era of behaviorism and special attention to contrastive analysis, testing focused on specific language elements such as the phonological, grammatical, and lexical contrasts between two languages. In the 1970s and 1980s, communicative theories of language brought with them a more integrative view of testing in which specialists claimed that "the whole of the communicative event was considerably greater than the sum of its linguistic elements" (Clark, 1983, p. 432). Today, test designers are still challenged in their quest for more authentic, valid instruments that simulate real-world interaction.

1 Frequent references are made in this book to companion volumes by the author. Principles of Language Learning and Teaching (PLLT) (Fourth Edition, 2000) is a basic teacher reference book on essential foundations of second language acquisition on which pedagogical practices are based. Teaching by Principles (TBP) (Second Edition, 2001) spells out that pedagogy in practical terms for the language teacher.

Discrete-Point and Integrative Testing

This historical perspective underscores two major approaches to language testing that were debated in the 1970s and early 1980s. These approaches still prevail today, even if in mutated form: the choice between discrete-point and integrative testing methods (Oller, 1979). Discrete-point tests are constructed on the assumption that language can be broken down into its component parts and that those parts can be tested successfully. These components are the skills of listening, speaking, reading, and writing, and various units of language (discrete points) of phonology/graphology, morphology, lexicon, syntax, and discourse. It was claimed that an overall language proficiency test, then, should sample all four skills and as many linguistic discrete points as possible.

Such an approach demanded a decontextualization that often confused the test-taker. So, as the profession emerged into an era of emphasizing communication, authenticity, and context, new approaches were sought. Oller (1979) argued that language competence is a unified set of interacting abilities that cannot be tested separately. His claim was that communicative competence is so global and requires such integration (hence the term "integrative" testing) that it cannot be captured in additive tests of grammar, reading, vocabulary, and other discrete points of language. Others (among them Cziko, 1982, and Savignon, 1982) soon followed in their support for integrative testing.

What does an integrative test look like? Two types of tests have historically been claimed to be examples of integrative tests: cloze tests and dictations. A cloze test is a reading passage (perhaps 150 to 300 words) in which roughly every sixth or seventh word has been deleted; the test-taker is required to supply words that fit into those blanks. (See Chapter 8 for a full discussion of cloze testing.) Oller (1979)
claimed that cloze test results are good measures of overall proficiency. According to theoretical constructs underlying this claim, the ability to supply appropriate words in blanks requires a number of abilities that lie at the heart of competence in a language: knowledge of vocabulary, grammatical structure, discourse structure, reading skills and strategies, and an internalized "expectancy" grammar (enabling one to predict an item that will come next in a sequence). It was argued that successful completion of cloze items taps into all of those abilities, which were said to be the essence of global language proficiency.

Dictation is a familiar language-teaching technique that evolved into a testing technique. Essentially, learners listen to a passage of 100 to 150 words read aloud by an administrator (or audiotape) and write what they hear, using correct spelling. The listening portion usually has three stages: an oral reading without pauses; an oral reading with long pauses between every phrase (to give the learner time to write down what is heard); and a third reading at normal speed to give test-takers a chance to check what they wrote. (See Chapter 6 for more discussion of dictation as an assessment device.)

Supporters argue that dictation is an integrative test because it taps into grammatical and discourse competencies required for other modes of performance in a language. Success on a dictation requires careful listening, reproduction in writing of what is heard, efficient short-term memory, and, to an extent, some expectancy rules to aid the short-term memory. Further, dictation test results tend to correlate strongly with other tests of proficiency. Dictation testing is usually classroom-centered since large-scale administration of dictations is quite impractical from a scoring standpoint. Reliability of scoring criteria for dictation tests can be improved by designing multiple-choice or exact-word cloze test scoring.

Proponents of integrative test methods soon centered their arguments on what became known as the unitary trait hypothesis, which suggested an "indivisible" view of language proficiency: that vocabulary, grammar, phonology, the "four skills," and other discrete points of language could not be disentangled from each other in language performance. The unitary trait hypothesis contended that there is a general factor of language proficiency such that all the discrete points do not add up to that whole.

Others argued strongly against the unitary trait position. In a study of students in Brazil and the Philippines, Farhady (1982) found significant and widely varying differences in performance on an ESL proficiency test, depending on subjects' native country, major field of study, and graduate versus undergraduate status. For example, Brazilians scored very low in listening comprehension and relatively high in reading comprehension. Filipinos, whose scores on five of the six components of the test were considerably higher than Brazilians' scores, were actually lower than Brazilians in reading comprehension scores. Farhady's contentions were supported in other research that seriously questioned the unitary trait hypothesis. Finally, in the face of the evidence, Oller retreated from his earlier stand and admitted that "the unitary trait hypothesis was wrong" (1983, p. 352).
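
As a rough illustration of the cloze procedure and the exact-word scoring discussed in this section, the following Python sketch blanks out every nth word of a passage and counts exact matches. It is only a sketch under stated assumptions, not a procedure given in this book: the function names make_cloze and score_exact_word, the fixed deletion interval n (the text says only "roughly every sixth or seventh word"), and the case-insensitive comparison are all illustrative choices.

```python
import re

BLANK = "______"


def make_cloze(passage: str, n: int = 7) -> tuple[str, list[str]]:
    """Blank out every n-th word; return the gapped text and the answer key.

    Assumption: a fixed interval n stands in for "roughly every sixth or
    seventh word"; a real test writer might vary the interval by hand.
    """
    words = passage.split()
    answers = []
    for i in range(n - 1, len(words), n):
        # Separate leading/trailing punctuation from the word itself so the
        # gapped text keeps its commas, periods, and quotation marks.
        lead, core, trail = re.fullmatch(r"(\W*)(.*?)(\W*)", words[i]).groups()
        if not core:
            continue  # token was punctuation only; leave it untouched
        answers.append(core)
        words[i] = f"{lead}{BLANK}{trail}"
    return " ".join(words), answers


def score_exact_word(answers: list[str], responses: list[str]) -> int:
    """Exact-word scoring: a response counts only if it is the deleted word.

    Assumption: comparison ignores case and surrounding whitespace.
    """
    return sum(
        given.strip().lower() == expected.lower()
        for expected, given in zip(answers, responses)
    )


if __name__ == "__main__":
    text = "The quick brown fox jumps over the lazy dog and then runs far away."
    gapped, key = make_cloze(text, n=7)
    print(gapped)
    print(key)
    print(score_exact_word(key, ["the", "home"]))
```

A teacher could loosen score_exact_word to accept any contextually appropriate word; that is a separate scoring decision and is not shown here.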

Communicative Language Testing

By the mid-1980s, the language-testing field had abandoned arguments about the unitary trait hypothesis and had begun to focus on designing communicative language-testing tasks. Bachman and Palmer (1996, p. 9) include among "fundamental" principles of language testing the need for a correspondence between language test performance and language use: "In order for a particular language test to be useful for its intended purposes, test performance must correspond in demonstrable ways to language use in non-test situations." The problem that language assessment experts faced was that tasks tended to be artificial, contrived, and unlikely to mirror language use in real life. As Weir (1990, p. 6) noted, "Integrative tests such as cloze only tell us about a candidate's linguistic competence. They do not tell us anything directly about a student's performance ability."

And so a quest for authenticity was launched, as test designers centered on communicative performance. Following Canale and Swain's (1980) model of communicative competence, Bachman (1990) proposed a model of language competence consisting of organizational and pragmatic competence, respectively subdivided into grammatical and textual components, and into illocutionary and sociolinguistic components. (Further discussion of both Canale and Swain's and Bachman's models can be found in PLLT, Chapter 9.) Bachman and Palmer (1996, pp. 70f) also emphasized the importance of strategic competence (the ability to employ communicative strategies to compensate for breakdowns as well as to enhance the rhetorical effect of utterances) in the process of communication. All elements of the model, especially pragmatic and strategic abilities, needed to be included in the constructs of language testing and in the actual performance required of test-takers.

Communicative testing presented challenges to test designers, as we will see in subsequent chapters of this book. Test constructors began to identify the kinds of real-world tasks that language learners were called upon to perform. It was clear that the contexts for those tasks were extraordinarily widely varied and that the sampling of tasks for any one assessment procedure needed to be validated by what language users actually do with language. Weir (1990, p. 11) reminded his readers that "to measure language proficiency ... account must now be taken of: where, when, how, with whom, and why language is to be used, and on what topics, and with what effect." And the assessment field became more and more concerned with the authenticity of tasks and the genuineness of texts. (See Skehan, 1988, 1989, for a survey of communicative testing research.)

Performance-Based Assessment

In language courses and programs around the world, test designers are now tackling this new and more student-centered agenda (Alderson, 2001, 2002). Instead of just offering paper-and-pencil selective response tests of a plethora of separate items, performance-based assessment of language typically involves oral production,
written production, open-ended responses, integrated performance (across skill areas), group performance, and other interactive tasks. To be sure, such assessment is time-consuming and therefore expensive, but those extra efforts are paying off in the form of more direct testing because students are assessed as they perform actual or simulated real-world tasks. In technical terms, higher content validity (see Chapter 2 for an explanation) is achieved because learners are measured in the process of performing the targeted linguistic acts.

In an English language-teaching context, performance-based assessment means that you may have a difficult time distinguishing between formal and informal assessment. If you rely a little less on formally structured tests and a little more on evaluation while students are performing various tasks, you will be taking some steps toward meeting the goals of performance-based testing. (See Chapter 10 for a further discussion of performance-based assessment.)

A characteristic of many (but not all) performance-based language assessments is the presence of interactive tasks. In such cases, the assessments involve learners in actually performing the behavior that we want to measure. In interactive tasks, test-takers are measured in the act of speaking, requesting, responding, or in combining listening and speaking, and in integrating reading and writing. Paper-and-pencil tests certainly do not elicit such communicative performance.

A prime example of an interactive language assessment procedure is an oral interview. The test-taker is required to listen accurately to someone else and to respond appropriately. If care is taken in the test design process, language elicited and volunteered by the student can be personalized and meaningful, and tasks can approach the authenticity of real-life language use (see Chapter 7).

CURRENT ISSUES IN CLASSROOM TESTING

The design of communicative, performance-based assessment rubrics continues to challenge both assessment experts and classroom teachers. Such efforts to improve various facets of classroom testing are accompanied by some stimulating issues, all of which are helping to shape our current understanding of effective assessment. Let's look at three such issues: the effect of new theories of intelligence on the testing industry; the advent of what has come to be called "alternative" assessment; and the increasing popularity of computer-based testing.

New Views on Intelligence

Intelligence was once viewed strictly as the ability to perform (a) linguistic and (b) logical-mathematical problem solving. This "IQ" (intelligence quotient) concept of intelligence has permeated the Western world and its way of testing for almost a century. Since "smartness" in general is measured by timed, discrete-point tests consisting of a hierarchy of separate items, why shouldn't every field of study be so measured? For many years, we have lived in a world of standardized, norm-referenced
tests that are timed in a multiple-choice format consisting of a multiplicity of logic-constrained items, many of which are inauthentic.

However, research on intelligence by psychologists like Howard Gardner, Robert Sternberg, and Daniel Goleman has begun to turn the psychometric world upside down. Gardner (1983, 1999), for example, extended the traditional view of intelligence to seven different components.2 He accepted the traditional conceptualizations of linguistic intelligence and logical-mathematical intelligence on which standardized IQ tests are based, but he included five other "frames of mind" in his theory of multiple intelligences:

• spatial intelligence (the ability to find your way around an environment, to form mental images of reality)
• musical intelligence (the ability to perceive and create pitch and rhythmic patterns)
• bodily-kinesthetic intelligence (fine motor movement, athletic prowess)
• interpersonal intelligence (the ability to understand others and how they feel, and to interact effectively with them)
• intrapersonal intelligence (the ability to understand oneself and to develop a sense of self-identity)

2 For a summary of Gardner's theory of intelligence, see Brown (2000, pp. 100-102).

Robert Sternberg (1988, 1997) also charted new territory in intelligence research in recognizing creative thinking and manipulative strategies as part of intelligence. All "smart" people aren't necessarily adept at fast, reactive thinking. They may be very innovative in being able to think beyond the normal limits imposed by existing tests, but they may need a good deal of processing time to enact this creativity. Other forms of smartness are found in those who know how to manipulate their environment, namely, other people. Debaters, politicians, successful salespersons, smooth talkers, and con artists are all smart in their manipulative ability to persuade others to think their way, vote for them, make a purchase, or do something they might not otherwise do.

More recently, Daniel Goleman's (1995) concept of "EQ" (emotional quotient) has spurred us to underscore the importance of the emotions in our cognitive processing. Those who manage their emotions, especially emotions that can be detrimental, tend to be more capable of fully intelligent processing. Anger, grief, resentment, self-doubt, and other feelings can easily impair peak performance in everyday tasks as well as higher-order problem solving.

These new conceptualizations of intelligence have not been universally accepted by the academic community (see White, 1998, for example). Nevertheless, their intuitive appeal infused the decade of the 1990s with a sense of both freedom and responsibility in our testing agenda. Coupled with parallel educational reforms at the time (Armstrong, 1994), they helped to free us from relying exclusively on
timed, discrete-point, analytical tests in measuring language. We were prodded to cautiously combat the potential tyranny of "objectivity" and its accompanying impersonal approach. But we also assumed the responsibility for tapping into whole language skills, learning processes, and the ability to negotiate meaning. Our challenge was to test interpersonal, creative, communicative, interactive skills, and in doing so to place some trust in our subjectivity and intuition.

Traditional and "Alternative" Assessment

Implied in some of the earlier description of performance-based classroom assessment is a trend to supplement traditional test designs with alternatives that are more authentic in their elicitation of meaningful communication. Table 1.1 highlights differences between the two approaches (adapted from Armstrong, 1994, and Bailey, 1998, p. 207).

Table 1.1. Traditional and alternative assessment

Traditional Assessment            Alternative Assessment
One-shot, standardized exams      Continuous long-term assessment
Timed, multiple-choice format     Untimed, free-response format
Decontextualized test items       Contextualized communicative tasks
Scores suffice for feedback       Individualized feedback and washback
Norm-referenced scores            Criterion-referenced scores
Focus on the "right" answer       Open-ended, creative answers
Summative                         Formative
Oriented to product               Oriented to process
Non-interactive performance       Interactive performance
Fosters extrinsic motivation      Fosters intrinsic motivation

Two caveats need to be stated here. First, the concepts in Table 1.1 represent some overgeneralizations and should therefore be considered with caution. It is difficult, in fact, to draw a clear line of distinction between what Armstrong (1994) and Bailey (1998) have called traditional and alternative assessment. Many forms of assessment fall in between the two, and some combine the best of both.

Second, it is obvious that the table shows a bias toward alternative assessment, and one should not be misled into thinking that everything on the left-hand side is tainted while the list on the right-hand side offers salvation to the field of language assessment! As Brown and Hudson (1998) aptly pointed out, the assessment traditions available to us should be valued and utilized for the functions that they provide. At the same time, we might all be stimulated to look at the right-hand list and ask ourselves if, among those concepts, there are alternatives to assessment that we can constructively use in our classrooms.

It should be noted here that considerably more time and higher institutional budgets are required to administer and score assessments that presuppose more
subjective evaluation, more individualization, and more interaction in the process of offering feedback. The payoff for the latter, however, comes with more useful feedback to students, the potential for intrinsic motivation, and ultimately a more complete description of a student's ability. (See Chapter 10 for a complete treatment of alternatives in assessment.) More and more educators and advocates for educational reform are arguing for a de-emphasis on large-scale standardized tests in favor of building budgets that will offer the kind of contextualized, communicative performance-based assessment that will better facilitate learning in our schools. (In Chapter 4, issues surrounding standardized testing are addressed at length.)

Computer-Based Testing

Recent years have seen a burgeoning of assessment in which the test-taker performs responses on a computer. Some computer-based tests (also known as "computer-assisted" or "web-based" tests) are small-scale "home-grown" tests available on websites. Others are standardized, large-scale tests in which thousands or even tens of thousands of test-takers are involved. Students receive prompts (or probes, as they are sometimes referred to) in the form of spoken or written stimuli from the computerized test and are required to type (or in some cases, speak) their responses. Almost all computer-based test items have fixed, closed-ended responses; however, tests like the Test of English as a Foreign Language (TOEFL®) offer a written essay section that must be scored by humans (as opposed to automatic, electronic, or machine scoring). As this book goes to press, the designers of the TOEFL are on the verge of offering a spoken English section.

A specific type of computer-based test, a computer-adaptive test, has been available for many years but has recently gained momentum. In a computer-adaptive test (CAT), each test-taker receives a set of questions that meet the test specifications and that are generally appropriate for his or her performance level. The CAT starts with questions of moderate difficulty. As test-takers answer each question, the computer scores the question and uses that information, as well as the responses to previous questions, to determine which question will be presented next. As long as examinees respond correctly, the computer typically selects questions of greater or equal difficulty. Incorrect answers, however, typically bring questions of lesser or equal difficulty. The computer is programmed to fulfill the test design as it continuously adjusts to find questions of appropriate difficulty for test-takers at all performance levels. In CATs, the test-taker sees only one question at a time, and the computer scores each question before selecting the next one. As a result, test-takers cannot skip questions, and once they have entered and confirmed their answers, they cannot return to questions or to any earlier part of the test.

Computer-based testing, with or without CAT technology, offers these advantages:

• classroom-based testing
• self-directed testing on various aspects of a language (vocabulary, grammar, discourse, one or all of the four skills, etc.)
• practice for upcoming high-stakes standardized tests
• some individualization, in the case of CATs
• large-scale standardized tests that can be administered easily to thousands of test-takers at many different stations, then scored electronically for rapid reporting of results

Of course, some disadvantages are present in our current predilection for computerizing testing. Among them:

• Lack of security and the possibility of cheating are inherent in classroom-based, unsupervised computerized tests.
• Occasional "home-grown" quizzes that appear on unofficial websites may be mistaken for validated assessments.
• The multiple-choice format preferred for most computer-based tests contains the usual potential for flawed item design (see Chapter 3).
• Open-ended responses are less likely to appear because of the need for human scorers, with all the attendant issues of cost, reliability, and turnaround time.
• The human interactive element (especially in oral production) is absent.

More is said about computer-based testing in subsequent chapters, especially Chapter 4, in a discussion of large-scale standardized testing. In addition, the following websites provide further information and examples of computer-based tests:

Educational Testing Service                        www.ets.org
Test of English as a Foreign Language              www.toefl.org
Test of English for International Communication    www.toeic.com
International English Language Testing System      www.ielts.org
Dave's ESL Cafe (computerized quizzes)             www.eslcafe.com

Some argue that computer-based testing, pushed to its ultimate level, might militate against recent efforts to return testing to its artful form of being tailored by teachers for their classrooms, of being designed to be performance-based, and of allowing a teacher-student dialogue to form the basis of assessment. This need not be the case. Computer technology can be a boon to communicative language testing. Teachers and test-makers of the future will have access to an ever-increasing range of tools to safeguard against impersonal, stamped-out formulas for assessment. By using technological innovations creatively, testers will be able to enhance authenticity, to increase interactive exchange, and to promote autonomy.

As you read this book, I hope you will do so with an appreciation for the place of testing in assessment, and with a sense of the interconnection of assessment and
teaching. Assessment is an integral part of the teaching-learning cycle. In an interactive, communicative curriculum, assessment is almost constant. Tests, which are a subset of assessment, can provide authenticity, motivation, and feedback to the learner. Tests are essential components of a successful curriculum and one of several partners in the learning process. Keep in mind these basic principles:

1. Periodic assessments, both formal and informal, can increase motivation by serving as milestones of student progress.
2. Appropriate assessments aid in the reinforcement and retention of information.
3. Assessments can confirm areas of strength and pinpoint areas needing further work.
4. Assessments can provide a sense of periodic closure to modules within a curriculum.
5. Assessments can promote student autonomy by encouraging students' self-evaluation of their progress.
6. Assessments can spur learners to set goals for themselves.
7. Assessments can aid in evaluating teaching effectiveness.

Answers to the vocabulary quiz on pages 1 and 2: 1c, 2a, 3d, 4b, 5a, 6c.

EXERCISES

[Note: (I) Individual work; (G) Group or pair work; (C) Whole-class discussion.]

1. (G) In a small group, look at Figure 1.1 on page 5 that shows tests as a subset of assessment and the latter as a subset of teaching. Do you agree with this diagrammatic depiction of the three terms? Consider the following classroom teaching techniques: choral drill, pair pronunciation practice, reading aloud, information gap task, singing songs in English, writing a description of the weekend's activities. What proportion of each has an assessment facet to it? Share your conclusions with the rest of the class.
2. (G) The chart below shows a hypothetical line of distinction between formative and summative assessment, and between informal and formal assessment. As a group, place the following techniques/procedures into one of the four cells and justify your decision. Share your results with other groups and discuss any differences of opinion.

Placement tests
Diagnostic tests
Periodic achievement tests
Short pop quizzes
Standardized proficiency tests
Final exams
Portfolios
Journals
Speeches (prepared and rehearsed)
Oral presentations (prepared, but not rehearsed)
Impromptu student responses to teacher's questions
Student-written response (one paragraph) to a reading assignment
Drafting and revising writing
Final essays (after several drafts)
Student oral responses to teacher questions after a videotaped lecture
Whole-class open-ended discussion of a topic

            Formative    Summative
Informal
Formal

3. (I/C) Review the distinction between norm-referenced and criterion-referenced testing. If norm-referenced tests typically yield a distribution of scores that resemble a bell-shaped curve, what kinds of distributions are typical of classroom achievement tests in your experience?
4. (I/C) Restate in your own words the argument between unitary trait proponents and discrete-point testing advocates. Why did Oller back down from the unitary trait hypothesis?
5. (I/C) Why are cloze and dictation considered to be integrative tests?
6. (G) Look at the list of Gardner's seven intelligences. Take one or two intelligences, as assigned to your group, and brainstorm some teaching activities that foster that type of intelligence. Then, brainstorm some assessment tasks
(2002). tools. issues. Share your resulls with o the r groupS. Taipei : Tung Bua Book Company.xperienced in learning a foreign language. and research references. (G) Table 1. It i. Mousavi's (1999) Dictiollary of lang llage les/ing (Tehran: Rahnama Publications) . (2000). this 140-page primer on testing offers definitions of basic terms in language testing with brief explanations of fundamental concepts. FOR YOUR FURTHER READING McNamara. 7.com. Latlgllage testing. brainstorm a variery o f test tasks that class members have e. and which o nes faU in between. Then decide whi<:h o f those tasks are performance-based . 1 lists traditional and alternative assessment tasks and characteristics.18 CHIoPTrR 1 Testing. Tim. 8. 9 .abbas·mousavi. Share your conclusions with the rest of the class.s a useful little reference book to check your understand ing of testing jargon and issues in the field. 111ird Editio n. 111i5 publicalion may be d ifficult to find in loca l lx>okslores. quickly review the advantages and disad\'antages of cad}. principles. An e'lcyclopedlc dictlo1lary of language testing. (C) As a who le-c. ln pairs. Oxford: Oxford University Press. Its exhaustive BS-page b ibliography is also down loadable at http://www. It provides comprehensive explanations of theories. Seyyed Abbas. (C) Ask class members to share any experiences with computer-based testing and evaluate the adv-dntages and disadvantages of those experiences.lass discussion. but it is a highly useful compilation of Virtually e very term in the field of language testi ng. which are nOt. Mousavi. background history. and Teaching that may presuppose the same intelligence in order to perfo rm well . and tasks. One of a number of Oxford University Press's brief introductions to various arc~ls of language study. o n both sides of the dun. Assessing. w ith definitions. A shorter version of this 942-page lome may be fOUlld in the previous version.
