BIBLIOGRAPHIC REFERENCE:
An Evaluation Framework for Multimodal Interaction (Wechsung, 2013)

Wechsung, I. (2013). An evaluation framework for multimodal interaction. Berlin: Springer.

KEYWORDS OF THE TEXT:

DESCRIPTION OF THE TEXT:

Over the past fifteen years, multimodality has broadened its horizons thanks to the development of technologies such as speech. While input devices such as the mouse and the keyboard were favoured by users decades ago, there is now a growing preference for touch and voice-recognition interfaces.
Another trend that has increased the popularity of speech and weakened the dominance of the keyboard and mouse is the touchscreen; its use has intensified as users have gained access to smartphones, tablets, and touch-enabled computers. (Support this with figures from the Common Sense Media survey; the decline of BlackBerry could also be presented as an example: 9 out of 10 laptop sales in North America are touchscreen models.) (Cite the example of Siri or Watson.)

Definitions of multimodal
But what is multimodal? How can it be defined without entering into exclusionary tensions between psychological and technological orientations?
Multimodal dialogue systems are systems which enable human-machine interaction through a number of media, making use of different sensory channels. The understanding of the term media in the scientific community is, in contrast to the term modality, mostly uniform. Media is associated with the physical realization respectively presentation of information via input and output devices (cf. e.g. Bernsen, 1997; Gibbon, Moore, & Winski, 1998; Hovy & Arens, 1990; Jokinen & Raike, 2003; Sturm, 2005).
Description of other definitions
Thus, the three senses sight, hearing, and touch correspond to the three perceptual channels. Thereby the terms visual and auditive refer to the perception and the sensory modalities; the terms optical and acoustical refer to physical (and not physiological) parameters (Schomaker et al., 1995). According to Charwat's (1992) definition, only three different modalities, corresponding to three different human senses, can be distinguished. Although the aforementioned senses are nowadays those with the highest relevance for human-computer interaction (HCI), at least three more senses (smell, vestibular, taste) are defined in physiology.
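
To make this terminology concrete, here is a minimal Python sketch. The mapping follows the definitions above, but the data structure itself, and the label "mechanical" for the physical side of touch (which the text does not name), are our own assumptions:

    # Illustrative taxonomy only; the labels follow the definitions quoted above.
    PERCEPTUAL_MODALITIES = {
        # modality term (perception) -> sense and physical parameter
        "visual":   {"sense": "sight",   "physical_parameter": "optical"},
        "auditive": {"sense": "hearing", "physical_parameter": "acoustical"},
        "tactile":  {"sense": "touch",   "physical_parameter": "mechanical"},  # "mechanical" is our assumption
    }

    # Senses defined in physiology but so far less relevant for HCI:
    FURTHER_SENSES = ("smell", "vestibular", "taste")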

KEYWORDS

Explanation:
Interaction Performance Aspects on the User Side
All processing steps described above can be mapped to the interaction performance aspects proposed by Möller et al. (2009) and Wechsung et al. (2012a). These aspects are perceptual effort, cognitive workload and response effort.
Perceptual effort. Perceptual effort is the effort required for decoding the system messages, and for understanding and extracting their meaning (Zimbardo, 1995), e.g. listening effort or reading effort. This aspect refers to the perceptual modalities described above. The Borg scale can be used to assess perceptual effort (Borg, 1982).
Cognitive workload. The cognitive workload is defined as the specification of the costs of task performance, such
as the necessary information processing capacity and resources (De Waard, 1996). It refers to the processing codes
and processing stages. An overview of methods assessing cognitive workload is given in De Waard (1996) and Jahn
et al. (2005). A popular method is the NASA-TLX questionnaire (Hart & Staveland, 1988). A lightweight
instrument shown to have excellent psychometric properties (Sauro & Dumas, 2009) even in comparison to more
elaborate measures (De Waard, 1996) is the Rating Scale Mental Effort (RSME) by Zijlstra (1993). Note that the
RSME is also known as the SMEQ (Subjective
Mental Effort Questionnaire).
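
To illustrate how such a questionnaire yields a workload score, here is a short Python sketch of NASA-TLX scoring (Hart & Staveland, 1988): the raw TLX is the unweighted mean of the six subscale ratings, and the weighted variant uses weights obtained from 15 pairwise comparisons of the subscales. The function names and example ratings are ours:

    # Sketch of NASA-TLX scoring; ratings are assumed to be on a 0-100 scale.
    SUBSCALES = ["mental_demand", "physical_demand", "temporal_demand",
                 "performance", "effort", "frustration"]

    def raw_tlx(ratings):
        # Raw ("unweighted") TLX: the mean of the six subscale ratings.
        return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)

    def weighted_tlx(ratings, weights):
        # Weighted TLX: weights stem from 15 pairwise comparisons of the
        # subscales, so each weight is 0-5 and all weights sum to 15.
        assert sum(weights.values()) == 15
        return sum(ratings[s] * weights[s] for s in SUBSCALES) / 15

    # Hypothetical example ratings:
    ratings = {"mental_demand": 70, "physical_demand": 20, "temporal_demand": 55,
               "performance": 30, "effort": 60, "frustration": 40}
    print(raw_tlx(ratings))  # -> 45.83...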
Physical response effort. The physical response effort is the effort required to communicate with the system (Möller et al., 2009), such as the effort required for typing in an answer or pushing a button. This aspect refers to the response codes.
A scale specifically designed to measure physical response effort is, to the author's knowledge, not available. However, the questionnaire proposed in the ITU-T Recommendation P.851 (ITU-T Rec. P.851, 2003) contains items related to physical response effort. An adapted version of the RSME (Zijlstra, 1993) may also be used.
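
A minimal sketch of what such an adapted instrument might look like in analysis code; the original RSME is a single continuous scale from 0 to 150, and rewording it towards physical effort is our own, unvalidated adaptation:

    # Hypothetical adapted RSME item for physical response effort.
    def mean_rsme(ratings):
        # Average RSME-style ratings (each on the 0-150 continuum) across users.
        for r in ratings:
            if not 0 <= r <= 150:
                raise ValueError("RSME ratings lie on a 0-150 continuum")
        return sum(ratings) / len(ratings)

    print(mean_rsme([25, 40, 57]))  # -> 40.67 on average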

The degree of interference (and consequently the workload and the effort) increases with the degree to which different tasks or information refer to the same processing dimensions (see Sec. "Processing Steps on the User Side").
To measure performance on the user side, peri-physiological parameters, derived from the user's body, can be used. These measures include pupil diameter, eye tracking, and psycho-physiological measures like electrocardiography (ECG), electromyography (EMG), electroencephalography (EEG) and electro-dermal activity (EDA) (Schleicher, 2009). Generally, these measures are rather unspecific, and the valence of a situation (positive or negative) is not determinable even for EMG measures (Mahlke & Minge, 2006). Consequently, drawing inferences based solely on these methods is difficult. Other possible data sources are log-files, which may be employed to record task success, task duration, or modality choice. Please note that for all the performance aspects, questionnaires using self-report are mentioned. Self-reports require users to judge their own performance. If such measurements are taken, the experienced workload is measured. The experienced performance and the performance assessed via the indirect measurements described above do not necessarily correspond.
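
As a minimal sketch of the log-file idea, assuming a hypothetical record format (real systems will use their own schema), task success, task duration, and modality choice can be derived as follows:

    from collections import Counter

    # Hypothetical interaction log; one record per task.
    log = [
        {"task": 1, "t_start": 0.0,  "t_end": 12.4, "success": True,  "modality": "speech"},
        {"task": 2, "t_start": 13.0, "t_end": 30.9, "success": False, "modality": "touch"},
        {"task": 3, "t_start": 31.5, "t_end": 40.1, "success": True,  "modality": "touch"},
    ]

    task_success_rate = sum(r["success"] for r in log) / len(log)               # -> 0.67
    mean_task_duration = sum(r["t_end"] - r["t_start"] for r in log) / len(log) # seconds per task
    modality_choice = Counter(r["modality"] for r in log)                       # Counter({'touch': 2, 'speech': 1})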

Processing Steps on the System Side


On the system side, the following processing steps have been identified based on the frameworks of Lopez Cozar and Araki (2005) and Herzog and Reithinger (2006).
Input Processing. In the first step, the input of the various sensors (e.g. microphones, face recognition, gesture recognition) is processed (Herzog & Reithinger, 2006). The input is decoded into a format understandable to the system, e.g. from acoustics to a text string in the case of speech input.
Modality Specific Interpretation. In this step, the transformed input is further transformed into symbolic information and meaning is provided to the data (Herzog & Reithinger, 2006). For example, a sequence of words is analysed to obtain its meaning (Lopez Cozar & Araki, 2005).
Fusion. This is the stage in which the meaning obtained from the different sensors is merged and combined into one coherent representation, in order to acquire the user's intention (Lopez Cozar & Araki, 2005; Herzog & Reithinger, 2006).
Dialogue Management. The dialogue management decides on the next steps or actions to be taken, in order to maintain dialogue coherence and lead the dialogue to the intended goal (Gibbon, Moore, & Winski, 1998; Lopez Cozar & Araki, 2005; Herzog & Reithinger, 2006).
Fission. The fission operation selects the modalities used to present the output and their coordination (Lopez Cozar & Araki, 2005; Herzog & Reithinger, 2006).
Modality Specific Response Generation. After fission, modality-specific responses are generated; here the abstract output information is transformed into media objects understandable to the user (Lopez Cozar & Araki, 2005; Herzog & Reithinger, 2006).
Output Rendering. Finally, output rendering takes place: the actual presentation of the coordinated system response in the defined media channels, such as speakers and displays (Lopez Cozar & Araki, 2005; Herzog & Reithinger, 2006).
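
To summarize these processing steps, the following Python sketch strings them together; all names and stub behaviours are our own illustration of the cited frameworks (Lopez Cozar & Araki, 2005; Herzog & Reithinger, 2006), not an actual implementation:

    def input_processing(raw):
        # Input processing: decode raw sensor data into a system-readable
        # format, e.g. acoustics -> text string for speech input.
        return {"modality": raw["modality"], "decoded": raw["data"]}

    def interpretation(decoded):
        # Modality-specific interpretation: turn the decoded input into
        # symbolic information, e.g. analyse a word sequence for its meaning.
        return {"modality": decoded["modality"], "meaning": decoded["decoded"].lower()}

    def fusion(interpretations):
        # Fusion: merge the meanings from all sensors into one coherent
        # representation of the user's intention.
        return " + ".join(i["meaning"] for i in interpretations)

    def dialogue_management(intention):
        # Dialogue management: decide the next action so the dialogue stays
        # coherent and moves towards the intended goal.
        return {"action": "confirm", "content": intention}

    def fission(action):
        # Fission: select the output modalities and their coordination.
        return [("speech", action["content"]), ("display", action["content"])]

    def response_generation(modality, content):
        # Modality-specific response generation: transform abstract output
        # into a media object the user can understand.
        return "[{}] {}".format(modality, content)

    def output_rendering(media_objects):
        # Output rendering: present the coordinated response on the chosen
        # media channels (here simply printed).
        for obj in media_objects:
            print(obj)

    # One pass through the chain for a combined speech + pointing input:
    inputs = [{"modality": "speech",  "data": "PUT THAT THERE"},
              {"modality": "gesture", "data": "POINT(120, 45)"}]
    interpreted = [interpretation(input_processing(r)) for r in inputs]
    action = dialogue_management(fusion(interpreted))
    output_rendering(response_generation(m, c) for m, c in fission(action))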
