In the first part of the paper the development of language tests towards
authenticity is surveyed. The advantages and shortcomings of indirect vs
direct (authentic) types are analysed. Whereas indirect tests were more
efficient and lent themselves to psychometric analyses, they did not tap
real-life language. On the other hand, direct tests based on real interactions
aim at capturing real language and its variations in use. In the second part
of the paper two problems involved in authentic tests are addressed:
a) The difficulty of applying appropriate psychometric measures, caused
by the complexity of tapping the whole construct of authentic language,
and b) the large number of test variables which interfere with the authen-
ticity of the language produced and reduce it to authentic test language.
enough so as not to have too many silent moments before the beep
went off.
Suggestions were made as to the need for examining the question with
more appropriate statistical procedures which would estimate the
effect of the testing method on the trait which was being measured.
In fact, research studies which used the multimethod multitrait
validation procedure (Stevenson, 1974; Clifford, 1977; 1978; Bach-
man and Palmer, 1981; Brutsch, 1979) did find that the methods of
testing affected the assessment of the trait and therefore it could not
always be claimed that direct and indirect tests are the same. Other
considerations against the use of indirect tests were brought up as
well. Shohamy (1982) made claims that the degree of correlation
may be a result of the instruction that had taken place; there may be
unique aspects of language skills which are not being tapped by
obtaining the same score on the same oral interview test when he was
tested by two different testers on two different occasions. Moreover,
when the speech style changed from interviewing to reporting the
probability of obtaining the same score in oral production was even
lower. In another study (Shohamy et al., 1983), it was again found
that when four different speech styles were tested, as exemplified
in an interview, a reporting task, various role-play situations and a
group discussion, most test takers did not obtain the same scores in
all the four oral interactions. These findings indicate that the varia-
tions which exist in natural language performance do, in effect, come
across in authentic tests. These variations, however, point to problems
of reliability and variability of authentic tests. It seems that the
stable and reliable scores obtained from indirect tests were partially
due to their low validity: they were not tapping the broader and
fuller construct of real-language use.
Since in language testing we cannot afford to have either low
validity or unstable and unreliable results, we are confronted here
with a serious fundamental problem which has to be solved. One
approach would be to construct special tests for every language
interaction, which would, in combination, reflect all the possible
language variations. This, however, would obviously be an almost
inconceivable task. A more realistic approach would operate under
the assumption that in language output there are some stable elements
which run across all authentic language performances, as well as
others which are variable and specific to each of the language per-
formances. This approach requires the estimation of both the stable
component of language output and the specific fluctuating elements
of a number of commonly used language interactions. The estimation
could be done by administering different types of language tests
which will invariably include the elements believed to be stable,
while manipulating those elements which are believed to be specific,
such as the tester's status, the speech style, the environment, the
mood, etc. By comparing the results, in terms of scores obtained on
such tests, it may be possible to estimate the stable elements versus
the specific ones of various language interactions. This procedure will
hopefully help to solve the problem of reliability. In the meantime,
however, the accuracy of the scores obtained from the authentic
tests remains questionable.
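The estimation described above can be illustrated with a minimal sketch. It is not the procedure used in the studies cited here; it assumes a fully crossed design in which every test taker receives one score on each test, on a common scale, and it applies a simple two-way variance decomposition. The function name `decompose_scores` and the example scores are hypothetical. The variance attached to the test takers is read as the stable component, and the residual is read as the specific component. With only one score per test taker per test, that residual cannot be separated from random error.

```python
from typing import Dict, List


def decompose_scores(scores: List[List[float]]) -> Dict[str, float]:
    """Split score variance into a stable (test-taker) part and a specific part.

    scores[p][t] is the score of test taker p on test t (assumed layout:
    every test taker is scored exactly once on every test, same scale).
    """
    n_persons = len(scores)
    n_tests = len(scores[0])
    if n_persons < 2 or n_tests < 2:
        raise ValueError("need at least two test takers and two tests")
    if any(len(row) != n_tests for row in scores):
        raise ValueError("every test taker needs a score on every test")

    grand = sum(sum(row) for row in scores) / (n_persons * n_tests)
    person_means = [sum(row) / n_tests for row in scores]
    test_means = [sum(row[t] for row in scores) / n_persons for t in range(n_tests)]

    # Between-person sum of squares: how far each test taker's average
    # lies from the grand mean, regardless of the test used.
    ss_person = n_tests * sum((m - grand) ** 2 for m in person_means)

    # Residual sum of squares: what remains after removing the person and
    # test averages. With one score per cell this mixes the person-by-test
    # interaction with random error; the two cannot be separated here.
    ss_resid = sum(
        (scores[p][t] - person_means[p] - test_means[t] + grand) ** 2
        for p in range(n_persons)
        for t in range(n_tests)
    )

    ms_person = ss_person / (n_persons - 1)
    ms_resid = ss_resid / ((n_persons - 1) * (n_tests - 1))

    # Expected mean squares: E[MS_person] = var_specific + n_tests * var_stable.
    # Negative estimates are truncated to zero.
    stable = max((ms_person - ms_resid) / n_tests, 0.0)
    specific = ms_resid
    total = stable + specific
    share = stable / total if total > 0 else 0.0
    return {
        "stable_variance": stable,
        "specific_variance": specific,
        "stable_share": share,
    }


if __name__ == "__main__":
    # Invented scores for four test takers on an interview, a role-play,
    # a reporting task and a group discussion (illustration only).
    example = [
        [4.0, 3.5, 3.0, 4.0],
        [2.0, 2.5, 2.0, 3.0],
        [5.0, 4.0, 4.5, 4.5],
        [3.0, 3.5, 2.5, 2.0],
    ]
    print(decompose_scores(example))
```

A stable share close to 1 would suggest that scores depend mainly on the test taker, whatever the interaction; a share close to 0 would suggest that scores are largely specific to the individual test. Manipulating further elements, such as the tester's status or the environment, would require a design with additional factors, which this sketch does not attempt.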
There are also other psychometric problems which are unique to
authentic tests, such as the issue of dependency of items: real-life
language performance is integrative and therefore does not consist of
independent items. Raatz, in his paper in this volume, deals with
some aspects of this issue and suggests various statistical approaches
for handling the problem. Since, in any case, the accuracy of authen-
tic test scores is still an issue to be solved, users of such tests should
be very cautious in interpreting their results.
reporting task and a group discussion. The first three tests were
administered on a one-to-one basis, while the latter involved four
test takers interacting in a group. The five factors considered respon-
sible for reducing the authentic language into authentic test
language are: the goal of the interaction, the participants, the test
setting, the topic and the time of the tests.
1 The goal of the interaction
We refer here to the fact that in real-life situations people may inter-
act for various purposes, none of which is to obtain a score for their
language performance. In a test, on the other hand, both the tester
and the test taker know that the only purpose of the interaction is
to obtain an assessment of the test taker's language performance. The
tester evaluates the test taker's performance with a score which may
have serious bearing upon the test taker's future. They both know
that they would not have been in the specific situation created by
the test, had there not been a need for an assessment of the test
taker's language performance. While in real-life situations participants
ignore the quality of the language in favour of transmitting the mes-
sage, in a test the quality of the language produced is the central
issue. Thus, both parties are clearly aware of the artificiality of the
test situation they are involved in, which heavily imposes its con-
straints as well as its consequences.
Even in tasks which are communicative and resemble real-life
interactions, the test taker is constantly aware of the fact that the
goal of the interaction is the evaluation of the language produced.
The issue involved in the goal of the interaction is exemplified in
each of the oral tests as follows: in the oral interview the tester is
seldom interested in the test taker's opinions; he elicits them only
in order to obtain sufficient language for assigning a score. In the role
play various roles are assumed by both test taker and tester, only in
order to facilitate the production of a wide range of speech acts
which will provide enough language to enable the tester to assign a
score. In the group discussion partners debate a controversial issue,
not really to convince one another or to reach a consensus. In the
reporting task the goal of the test taker's performance is not to
convey the content of an article to the tester, but rather to perform
a language task. In all these examples the genuine goal of the inter-
action is to prompt the test taker to produce sufficient language so
that he may be assigned a score. Another aspect of this same factor
relates to the rating scale on oral tests. The constant awareness of
the use of the rating scale by the tester may substantially influence
the test taker's language performance.
Thus, unlike real language use, the real goal of the interaction in a
test is the test itself. This undeniable fact is likely to impinge upon
the genuine authenticity of the language produced.
2 The participants
By referring to the participants as one of the factors responsible for
reducing the authenticity of the language used in a test, we recognize
the fact that the tester and test taker would not necessarily be
involved in a similar communicative act with one another in real life.
In the one-to-one tests (interview, role-play, reporting), it is
probably the first time the tester and the test taker have met. They
may be coming from very different and mutually unknown back-
grounds, and they are probably not used to talking to one another.
These factors make the interaction artificial, awkward and difficult.
The unfamiliarity is especially noticeable in the oral interview, in
which the test taker is asked personal and often private questions by
someone he has never met before. For test takers who are not used
to talking openly to strangers, this may be a very embarrassing and
restricting situation: we rarely tell strangers our opinions, thoughts
or problems in first-time encounters. This may therefore have a con-
siderable effect upon the language produced.
In the group discussion the participants know one another, and are
used to having conversations among themselves. In this test, however,
4 The topic
Topic refers to the content of the conversation in the test. In real-
life communicative situations the topic is determined by both
participants, usually in an unplanned manner; it evolves from the cir-
cumstances, the environment and the common background of the
participants. However, in a test situation the topic is determined and
imposed by the tester. In the oral interview the tester often plans
ahead of time what he will be asking in order to conduct the inter-
view. In the role-play test the tester determines for the test taker
what role he will play; in the reporting task the tester decides what
the test taker will be reporting on, and in the group test what the
group will be discussing. These situations hardly ever happen in real-
life communication, where the topic of the interaction is very rarely
determined by external considerations ahead of time. Imposing the
topic artificially is very likely to have an impairing effect on the
authenticity of the language.
5 The time
Time refers to the time limits imposed on the tests. Whereas in most
tests there is a time limit, i.e. the test taker must begin and com-
plete his task in a given period of time, in real-life interactions time
does not play such an important role. It is very likely that the time
limit has some effect on the quality of language produced. Also,
different communication strategies are probably set in motion under
the pressure of limited time.
These five factors, the goal of the interaction, the participants, the
setting, the topic and the time of the test, can be considered threats
to the authenticity of the language produced on these tests. Each of
them may violate the authenticity of the language produced by the
test taker, so that the score assigned as a result of the test taker's
performance on the test is most probably not a true reflection of
the real oral proficiency underlying it. In other words, the test
taker's performance on the authentic communicative language test is
not likely to be a true manifestation of his oral language competence.
Nevertheless, the language output in communicative tests which is
meant to approximate real-life authenticity is in all probability more
authentic than the one produced in tests where talking to a tape
recorder was the standard situational context. There is, however,
still a gap in terms of authenticity between the language produced
on any test whatsoever and the language used in real-life situations.
IV Conclusion
In this paper we first reviewed the trend of development towards
language used in real life, they still have some major deficiencies. One
such deficiency is the lack of measurement and statistical analysis
and the limited empirical evidence to show their psychometric
qualities. There are also difficulties in trying to impose classical
measurement theories on these unique types of tests which aim at
tapping the whole construct of language performance. It was also
pointed out that the language of authentic tests is not a true repre-
sentation of real-life language. It was shown that it is difficult, if not
impossible, to even approximate real-life language use on language
tests. The most we can obtain, at the moment, is authentic test
language.
Two alternative suggestions can be made. If we insist on eliciting
authentic real-life language and not only authentic test language
V References
Bachman, L.F. and Palmer, A.S. 1981: The construct validation of the FSI Oral
Interview. Language Learning 31, 67-86.
Brutsch, S. 1979: Convergent/discriminant validation of prospective teachers'
proficiency in oral and written production of French. University of
Minnesota, doctoral dissertation.
Carroll, J.L. 1973: Foreign language testing; will the persistent problems persist?
In O'Brien, M.C., editor, ATESOL testing in second language teaching:
new dimensions, Dublin: The Dublin University Press.