General Co-Chairs: Hideo Saito, Keio University, Japan & Jean-Marc Seigneur, University of
Geneva, Switzerland
Program Co-Chairs: Guillaume Moreau, Ecole Centrale de Nantes, France & Pranav Mistry, MIT
Media Lab, USA
Organisation Chair: Jean-Marc Seigneur, University of Geneva, Switzerland
Augmented/Mixed Reality Co-Chairs: Guillaume Moreau, Ecole Centrale de Nantes, France &
Masahiko Inami, Keio University, Japan
Brain Computer Interface Co-Chairs: Karla Felix Navarro, University of Technology Sydney,
Australia and Ed Boyden, MIT Media Lab, USA
Biomechanics and Human Performance Chair: Guillaume Millet, Laboratoire de Physiologie de
l'Exercice de Saint-Etienne, France
Wearable Computing Chair: Bruce Thomas, University of South Australia
Security and Privacy Chair: Jean-Marc Seigneur, University of Geneva, Switzerland
Program Committee:
Peter Froehlich, Forschungszentrum Telekommunikation Wien, Austria
Pranav Mistry, MIT Media Lab, USA
Jean-Marc Seigneur, University of Geneva, Switzerland
Guillaume Moreau, Ecole Centrale de Nantes, France
Guillaume Millet, Laboratoire de Physiologie de l'Exercice de Saint-Etienne, France
Jacques Lefaucheux, JLX3D, France
Christian Jensen, Technical University of Denmark
Jean-Louis Vercher, CNRS et Université de la Méditerranée, France
Steve Marsh, National Research Council Canada
Didier Seyfried, INSEP, France
Hideo Saito, Keio University, Japan
Narayanan Srinivasan, University of Allahabad, India
Qunsheng Peng, Zhejiang University, China
Karla Felix Navarro, University of Technology Sydney, Australia
Brian Caulfield, University College Dublin, Ireland
Masahiko Inami, Keio University, Japan
Ed Boyden, MIT Media Lab, USA
Bruce Thomas, University of South Australia
Franck Multon, Université de Rennes 2, France
Yanjun Zuo, University of North Dakota, USA
ACM Press
The Association for Computing Machinery
2 Penn Plaza, Suite 701
New York New York 10121-0701
ACM COPYRIGHT NOTICE. Copyright © 2010 by the Association for Computing Machinery,
Inc. Permission to make digital or hard copies of part or all of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed for profit
or commercial advantage and that copies bear this notice and the full citation on the first
page. Copyrights for components of this work owned by others than ACM must be honored.
Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to
redistribute to lists, requires prior specific permission and/or a fee. Request permissions from
Publications Dept., ACM, Inc., fax +1 (212) 869-0481, or permissions@acm.org.
For other copying of articles that carry a code at the bottom of the first or last page,
copying is permitted provided that the per-copy fee indicated in the code is paid
through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923,
+1-978-750-8400, +1-978-750-4470 (fax).
Acknowledgments
Many thanks to: the eHealth division of the European Commission, which circulated the
call for papers in its official lists of events; the municipality of Megève and Megève
Tourisme, who helped organise the conference; the EU-funded FP7-ICT-2007-2-
224024 PERIMETER project, which partially funds the organisation chair, as well as the
University of Geneva, where he is affiliated; the French association for virtual reality
(AFRV), which organised the industrial and scientific session; the ACM, which published the
proceedings of the conference in its online library; the French “pôle de compétitivité”
Sporaltec, which sponsored the best paper award; and all the program committee members,
who reviewed the submitted papers and circulated the CFP to their contacts.
Table of Contents
Article 1: “ExoInterfaces: Novel Exoskeleton Haptic Interfaces for Virtual Reality,
Augmented Sport and Rehabilitation”, Dzmitry Tsetserukou, Katsunari Sato and Susumu
Tachi.
Article 5: “Relevance of EEG Input Signals in the Augmented Human Reader”, Inês
Oliveira, Ovidiu Grigore, Nuno Guimarães and Luís Duarte.
Article 6: “Brain Computer Interfaces for Inclusion”, Paul McCullagh, Melanie Ware,
Gaye Lightbody, Maurice Mulvenna, Gerry McAllister and Chris Nugent.
Article 7: “Emotion Detection using Noisy EEG Data”, Mina Mikhail, Khaled El-Ayat,
Rana El Kaliouby, James Coan and John J.B. Allen.
Article 8: “World’s First Wearable Humanoid Robot that Augments Our Emotions”,
Dzmitry Tsetserukou and Alena Neviarouskaya.
Article 10: “Airwriting Recognition using Wearable Motion Sensors”, Christoph Amma,
Dirk Gehrig and Tanja Schultz.
Article 11: “Augmenting the Driver’s View with Real-Time Safety-Related Information”,
Peter Fröhlich, Raimund Schatz, Peter Leitner, Stephan Mantler and Matthias Baldauf.
Article 12: “An Experimental Augmented Reality Platform for Assisted Maritime
Navigation”, Olivier Hugues, Jean-Marc Cieutat and Pascal Guitton.
Article 13: “Skier-ski System Model and Development of a Computer Simulation Aiming
to Improve Skier’s Performance and Ski”, François Roux, Gilles Dietrich and Aude-
Clémence Doix.
Article 14: “T.A.C: Augmented Reality System for Collaborative Tele-Assistance in the
Field of Maintenance through Internet.” Sébastien Bottecchia, Jean Marc Cieutat and
Jean Pierre Jessel.
Article 16: “Partial Matching of Garment Panel Shapes with Dynamic Sketching
Design”, Shuang Liang, Rong-Hua Li, George Baciu, Eddie C.L. Chan and Dejun Zheng.
Article 17: “Fur Interface with Bristling Effect Induced by Vibration”, Masahiro
Furukawa, Yuji Uema, Maki Sugimoto and Masahiko Inami.
Article 19: “The Reading Glove: Designing Interactions for Object-Based Tangible
Storytelling”, Joshua Tanenbaum, Karen Tanenbaum and Alissa Antle.
Article 22: “Bouncing Star Project: Design and Development of Augmented Sports
Application Using a Ball Including Electronic and Wireless Modules”, Osamu Izuta,
Toshiki Sato, Sachiko Kodama and Hideki Koike.
Article 23: “On-line Document Registering and Retrieving System for AR Annotation
Overlay”, Hideaki Uchiyama, Julien Pilet and Hideo Saito.
Article 24: “Augmenting Human Memory using Personal Lifelogs”, Yi Chen and Gareth
Jones.
Article 25: “Aided Eyes: Eye Activity Sensing for Daily Life”, Yoshio Ishiguro, Adiyan
Mujibiya, Takashi Miyaki and Jun Rekimoto.
ExoInterfaces: Novel Exoskeleton Haptic Interfaces for
Virtual Reality, Augmented Sport and Rehabilitation
Dzmitry Tsetserukou (Toyohashi University of Technology, 1-1 Hibarigaoka, Tempaku-cho, Toyohashi, Aichi, 441-8580 Japan; dzmitry.tsetserukou@erc.tut.ac.jp)
Katsunari Sato (University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656 Japan; Katsunari_Sato@ipc.i.u-tokyo.ac.jp)
Susumu Tachi (Keio University, 4-1-1 Hiyoshi, Kohoku-ku, Yokohama, 223-8526 Japan; tachi@tachilab.org)
Figure 1. Structure and action of a skeletal muscle. [Diagram labels: Origin, Muscle, Tendon, Insertion; Tflex = Fflex × dflex, Text = Fext × dext, Tload = Fload × dload, Tnet = Tflex − Text.]

Main functions of the muscles are contraction for locomotion and skeletal movement. A muscle generally attaches to the skeleton at both ends. The Origin is the muscle attachment point to the more stationary bone. The other muscle attachment point, to the bone that moves as the muscle contracts, is the Insertion. Muscle is connected to the periosteum through tendon (connective tissue in the shape of a strap or band). The muscle with tendon in series acts …

FlexTorque is made up of two DC motors (muscles) fixedly mounted in a plastic motor holder unit, belts (tendons), and two belt fixators (Insertions). The operation principle of the haptic display is as follows. When a DC motor is activated, it pulls the belt and produces a force Fflex, generating the flexor torque Tflex. The oppositely placed DC motor generates the extensor torque Text. Therefore, the couple of antagonistic actuators produces a net torque Tnet at the operator's elbow joint. We defined the position of the Insertion point to be near the wrist joint in order to develop a large torque at the elbow joint.

Figure 2. FlexTorque on the human's arm surface.

The position of the operator's arm when flexor torque is generated is shown in Figure 3 (where θ stands for the angle of forearm rotation in relation to the upper arm).

[Force diagram: directions of belt tension and motor shaft rotation; belt tension Ft with components Ftx and Fty; moment arms df and dt; belt length l; angle α; torques Tm and Tn.]

The belt tension is

F_t = T_m i / r,  (1)

where Tm is the motor torque, i is the gear ratio, and r is the shaft radius.

The net torque Tn acting at the elbow joint is:

T_n = F_ty d_f = F_t d_f cos(α),  (2)

where df is the moment arm.

The angle α varies according to the relative position of the forearm and upper arm. It can be found using the following equation:

α = cos⁻¹((l² + d_f² − d_t²) / (2 l d_f)),  (3)

where dt is the distance from the pivot to the Origin, and l is the length of the belt, which can be calculated from the rotation angle of the motor shaft.

The detailed view of FlexTorque is presented in Figure 5.

Figure 5. 3D exploded view of the driving unit of FlexTorque. [Labels: pulleys, timing belt, supporter, shaft, stopper.]

Each unit is compact and light in weight (60 grams). This was achieved due to the use of plastic and duralumin materials in manufacturing the main components. The Supporter surface has a concave profile to match the curvature of the human arm surface (Figure 6).

Figure 6. Driving unit of FlexTorque.

The essential advantage of the structure of the FlexTorque device is that the heaviest elements (DC motors, shafts, and pulleys) are located on the part of the upper arm nearest to the shoulder. Therefore, the operator's arm undergoes very small additional loading. The rest of the components (belts, belt fixators) are light in weight and do not load the operator's muscles considerably. We propose the term "Karate (empty hand) Haptics" for such novel devices, because they allow presenting forces to the human arm without additional interfaces in the human hands. The developed apparatus features extremely safe force presentation to the human's arm. In case of overloading, the belt is physically disconnected from the motor and the safety of the human is guaranteed.
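Equations (1)-(3) can be checked numerically. Below is a minimal sketch (ours, not the authors'; the parameter values are illustrative assumptions) that evaluates the belt tension, the belt angle, and the resulting net elbow torque:

```python
import math

def belt_force(T_m, i, r):
    """Eq. (1): belt tension F_t from motor torque T_m, gear ratio i, shaft radius r."""
    return T_m * i / r

def belt_angle(l, d_f, d_t):
    """Eq. (3): angle alpha from the triangle formed by belt length l,
    insertion distance d_f, and origin distance d_t (law of cosines)."""
    return math.acos((l**2 + d_f**2 - d_t**2) / (2 * l * d_f))

def elbow_torque(T_m, i, r, l, d_f, d_t):
    """Eq. (2): net elbow torque T_n = F_t * d_f * cos(alpha)."""
    return belt_force(T_m, i, r) * d_f * math.cos(belt_angle(l, d_f, d_t))

# Illustrative values only (not from the paper): 50 mNm motor, 1:20 gearhead,
# 10 mm shaft radius, 0.30 m belt, insertion 0.25 m and origin 0.10 m from the pivot.
print(elbow_torque(T_m=0.05, i=20, r=0.01, l=0.30, d_f=0.25, d_t=0.10))  # ~23.8 Nm
```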
Vibration of the human arm (e.g., simulating driving a heavy truck) can be realized through alternating, repeated jerks of torque from the antagonistic motors. Thus, the operator can perceive the roughness of the road surface.

FlexTorque also enables the creation of muscle stiffness. By contracting the belts before a perturbation occurs, we can increase the joint stiffness. For example, during a collision of the human hand with a moving object in the Virtual Environment, the tension of the belt of one driving unit drops abruptly and the tension of the belt pulling the forearm in the direction of the impact force increases quickly.

Figure 7. Augmented Arm Wrestling and Augmented Collision.
Contact and collision with a virtual object can be presented through FlexTorque as well. In the case of collision, the limb must be at rest. In such a case, the net torque produced by the muscles is opposed by another equal but opposite torque Tload. Similarly to the human muscles, the net torque produced by the haptic display restrains the further movement of the user's arm.

3. APPLICATIONS
The main features of FlexTorque are: (1) it presents high-fidelity kinesthetic sensation to the user according to the interactive forces; (2) it does not restrict the motion of the human arm; (3) it has a wearable design; (4) it is extremely safe in operation; (5) it does not require a lot of storage space. These advantages allow a wide range of applications in virtual and augmented reality systems and introduce a new way of game playing.

Here we summarize the possible applications of the haptic display FlexTorque:
1) Virtual and Augmented Environments (presentation of physical contact to the human's arm, muscle stiffness, object weight, collision, etc.).
2) Augmented Sport and Games (enhancing the immersive experience of sport and games through force feedback).
3) Rehabilitation (users with physical impairments can easily control the torque applied to the arm/leg/palm while performing therapeutic exercises).
4) Haptic navigation for blind persons (an obstacle detected by a camera is transformed into a force restricting the arm motion in the direction of the object).

A number of games for augmented sport experiences, which provide a natural, realistic, and intuitive feeling of immersion into the virtual environment, can be implemented. The Arm Wrestling game that mimics the real physical experience is currently under development (Figure 7). The user wearing FlexTorque and a head-mounted display (HMD) can play either with a virtual character or with a remote friend for a more personal experience. The virtual representations of the players' arms are shown on the HMD. While playing against a friend, the user sees the motion of the arms and experiences the reaction force from the rival.

4. USER STUDY AND FUTURE RESEARCH
The FlexTorque haptic interface was demonstrated at SIGGRAPH ASIA 2009 [1,2,10]. To maintain the alignment of the extensor belt on the elbow, thus avoiding slippage, the user wears a specially designed pad equipped with guides.

We designed three games with haptic feedback. We developed the Gun Simulator game with recoil imitation (Figure 8). A quick single jerk of the forearm simulates the recoil force of a gun. A high-frequency series of impulsive forces exerted on the forearm imitates shooting with a machine gun; in this case the upper motor is supplied with short ramp impulses of current.

Figure 8. The Gun Simulator game.

In the Teapot Fishing game, the player casts a line by quickly flicking the rod towards the water (Figure 9).
Figure 10. The Virtual Gym game.

In total, more than 100 persons experienced the novel haptic interface FlexTorque. We got very positive feedback from the users and companies. While discussing possible useful applications with visitors, games for physical sport exercises and rehabilitation were frequently mentioned. The majority of users reported that the device presented force feedback in a very realistic manner.

5. DESIGN OF MULTIPURPOSE HAPTIC DISPLAY FlexTensor
The motivation behind the development of FlexTensor (a haptic display that uses a flexible belt to produce tension force) was to achieve realistic feedback by using a simple and easy-to-wear haptic display.

The multipurpose application is realized by means of fixing different elements of FlexTensor (i.e., the middle of the belt, or the Origin/Insertion points) in the particular application. The structure of FlexTensor is similar to the flexor part of the FlexTorque haptic display. The main differences are: (1) the belt connects the movable points on the human arm; (2) both attachment points of the belt have embedded DC motors.

In the haptic display FlexTorque, the function of each attachment point is predetermined (Figure 11). The configuration of FlexTensor allows each point to perform the function of Insertion or Origin depending on the purpose of the application (Figure 12). This greatly enlarges the range of FlexTensor applications in Virtual Reality.

Figure 12. Kinematic diagram of FlexTensor and the human arm. [Labels: hands, Origin/Insertion points, belt tensions Ft1 and Ft2, motor torques Tm1 and Tm2, left and right arms, shoulder joints.]

In the configuration when the middle of the belt is not fixed, FlexTensor presents an external force resisting the expansion of the human arms (basic configuration). This action can be used for simulation of the breaststroke swimming technique, when the swimmer sweeps the hands out in the water to their widest point (Figure 13).

[Figure 13: breaststroke technique; external forces FEXT applied to the hands through the belt.]

The configuration in which the middle of the belt is fixed by the user standing on the band with both (or one) feet enables presentation of object weight (Figure 14). The tension of the belt represents the magnitude of the gravity force acting on the human arms. The fixation of the middle of the belt can also be positioned on the human neck (for simulation of human arm lifting) or on the waist (for simulation of the resistance of the environment in the direction of arm stretching, e.g., in the case of contact with a virtual wall).
Figure 14. Application of FlexTensor for weight presentation and strength training exercise. [Biceps curl exercise; the middle of the belt is fixed; labels: motor, belt, mg.]

In the case when the palm of one arm is placed on some part of the body (e.g., waist, neck), this attachment point becomes the Origin. Such actions as unsheathing a sword can be simulated by stretching out the unfixed arm. FlexTensor can also interestingly augment a 3D archery game by presenting the tension force between the arms.

The illusion of simultaneous pulling of both hands can be implemented by exerting different values of the forces Ft1 and Ft2 in the basic configuration (see Figure 12). The illusion of being pulled to the left side or to the right side can be achieved when Ft1 > Ft2 and Ft1 < Ft2, respectively.

The developed apparatus features extremely safe force presentation to the human's arm. In case of overloading, physical disconnection of the belt from the motor protects the user from injury.

6. CONCLUSIONS
The novel haptic interfaces FlexTorque and FlexTensor suggest new possibilities for highly realistic, very natural physical interaction in virtual environments, augmented sport, and augmented game applications.

A number of new games for sport experiences, which provide a natural, realistic, and intuitive feeling of physical immersion into the virtual environment, can be implemented (such as skiing, biathlon (skiing with rifle shooting), archery, tennis, sword dueling, driving simulators, etc.).

The future goal is the integration of an accelerometer and MEMS gyroscopes into the holder and fixator of FlexTorque and into FlexTensor for capturing complex movement and recognizing the gestures of the user. The new version of FlexTorque and FlexTensor (ExoInterface) will take advantage of exoskeletons (strong force feedback) and the Wii Remote interface (motion-sensing capabilities).

We expect that FlexTorque and FlexTensor will support future interactive techniques in the fields of robotics, virtual reality, sport simulators, and rehabilitation.

8. REFERENCES
[1] FlexTorque. Games presented at SIGGRAPH Asia 2009. http://www.youtube.com/watch?v=E6a5eCKqQzc
[2] FlexTorque. Innovative Haptic Interface. 2009. http://www.youtube.com/watch?v=wTZs_iuKG1A&feature=related
[3] Hayashi, T., Kawamoto, H., and Sankai, Y. 2005. Control method of robot suit HAL working as operator's muscle using biological and dynamical information. In Proceedings of the International Conference on Intelligent Robots and Systems (Edmonton, Canada, August 2-6, 2005). IROS '05. IEEE Press, New York, 3063-3068.
[4] Jeong, Y., Lee, Y., Kim, K., Hong, Y-S., and Park, J-O. 2001. A 7 DOF wearable robotic arm using pneumatic actuators. In Proceedings of the International Symposium on Robotics (Seoul, Korea, April 19-21, 2001). ISR '01. 388-393.
[5] Lee, S., Park, S., Kim, W., and Lee, C-W. 1998. Design of a force reflecting master arm and master hand using pneumatic actuators. In Proceedings of the IEEE International Conference on Robotics and Automation (Leuven, Belgium, May 16-20, 1998). ICRA '98. IEEE Press, New York, 2574-2579.
[6] Murayama, J., Bougrila, L., Luo, Y., Akahane, K., Hasegawa, S., Hirsbrunner, B., and Sato, M. 2004. SPIDAR G&G: a two-handed haptic interface for bimanual VR interaction. In Proceedings of EuroHaptics (Munich, Germany, June 5-7, 2004). Springer, Heidelberg, 138-146.
[7] PHANTOM OMNI haptic device. SensAble Technologies. http://www.sensable.com/haptic-phantom-omni.htm
[8] Raytheon Sarcos Exoskeleton. Raytheon Company. http://www.raytheon.com/newsroom/technology/rtn08_exoskeleton/
[9] Richard, P., Chamaret, D., Inglese, F-X., Lucidarme, P., and Ferrier, J-L. 2006. Human scale virtual environment for product design: effect of sensory substitution. The International Journal of Virtual Reality, 5(2), 37-44.
[10] Tsetserukou, D., Sato, K., Neviarouskaya, A., Kawakami, N., and Tachi, S. 2009. FlexTorque: innovative haptic interface for realistic physical interaction in Virtual Reality. In Proceedings of the 2nd ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Technologies in Asia (Yokohama, Japan, December 16-19, 2009), Emerging Technologies. ACM Press, New York, 69.
[11] Wii Remote. Nintendo Co. Ltd. http://www.nintendo.com/wii/what/accessories
PossessedHand: A Hand Gesture Manipulation System
using Electrical Stimuli
ABSTRACT
Acquiring knowledge about the timing and speed of hand gestures is important for learning physical skills, such as playing musical instruments, performing arts, and making handicrafts. However, it is difficult to use devices that dynamically and mechanically control a user's hand for learning, because such devices are very large and hence unsuitable for daily use. In addition, since glove-type devices interfere with actions such as playing musical instruments, performing arts, and making handicrafts, users tend to avoid wearing these devices. To solve these problems, we propose PossessedHand, a device with a forearm belt, for controlling a user's hand by applying electrical stimulus to the muscles around the forearm of the user. The dimensions of PossessedHand are 10 × 7.0 × 8.0 cm, and the device is portable and suited for daily use. The electrical stimuli are generated by an electronic pulse generator and transmitted from 14 electrode pads. Our experiments confirmed that PossessedHand can control the motion of 16 joints in the hand. We propose an application of this device to help a beginner learn how to play musical instruments such as the piano and koto.
General Terms
Design
Figure 1: Interaction examples of PossessedHand. (a) A feedback system. (b) A navigation system.

Keywords
interaction device, output device, wearable, hand gesture, electrical stimuli
1. INTRODUCTION
Although a number of input systems for hand gestures have been proposed, very few output systems have been proposed for hand gestures. If a computer system controls a user's hand, the system can also be used to provide feedback to various interaction systems, such as systems for recognizing virtual objects (Fig. 1-a) and navigation (Fig. 1-b), assistant systems for playing musical instruments, and a substitute sensation system for the visually impaired and hearing impaired. In this paper, we propose PossessedHand, a device with a forearm belt, for controlling a user's hand by applying electrical stimulus to the muscles around the forearm.
2. PHASE OF DEVELOPMENT
There are four phases for controlling the hand posture. In
this research, we confirm the phase for which PossessedHand
can be used. Thereafter, we propose interaction systems
based on PossessedHand.
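To make the control flow concrete, here is a hedged sketch of the kind of software loop such a device implies. The pad indices, pulse width, and burst rate are invented for illustration, and the hardware call is a stub; none of this reproduces the authors' actual driver.

```python
import time

# Hypothetical mapping from target joint motion to electrode pad index;
# the real device selects among 14 pads on a forearm belt.
PAD_FOR_MOTION = {"index_flex": 3, "wrist_extend": 11}

def send_pulse(pad: int, width_us: int):
    """Stub for the pulse generator driver; prints instead of stimulating."""
    print(f"pulse: pad={pad} width={width_us}us")

def stimulate(pad: int, pulses: int, width_us: int = 200, rate_hz: float = 40.0):
    """Emit a burst of stimulation pulses on one pad (parameters are illustrative)."""
    for _ in range(pulses):
        send_pulse(pad, width_us)
        time.sleep(1.0 / rate_hz)  # inter-pulse interval sets the burst rate

# Cue an index-finger flexion, e.g. to indicate the next piano key.
stimulate(PAD_FOR_MOTION["index_flex"], pulses=20)
```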
ABSTRACT
There is a trend these days to add emotional characteristics as new features into human-computer interaction, to equip machines with more intelligence when communicating with humans. Besides traditional audio-visual techniques, physiological signals provide a promising alternative for automatic emotion recognition. Ever since Dr. Picard and colleagues brought forward the initial concept of physiological-signal-based emotion recognition, various studies have been reported following the same system structure. In this paper, we implement a novel 2-stage architecture for the emotion recognition system in order to improve the performance when dealing with a multi-subject context, which is a more realistic practical implementation. Instead of directly classifying data from all the mixed subjects, a step is added beforehand to transform a traditional subject-independent case into several subject-dependent cases, by assigning each incoming sample to an existing subject model using a Gaussian Mixture Model (GMM). For simultaneous classification of four affective states, the correct classification ratio (CCR) shows a significant improvement from 80.7% to over 90%, which supports the feasibility of the system.

Categories and Subject Descriptors
H.1.2 [Models and Principles]: User/Machine Systems - Human information processing; G.3 [Probability and Statistics]: Multivariate statistics

1. INTRODUCTION
Emotion awareness has become one of the most innovative features in human-computer interaction in order to achieve more natural and intelligent communication. Among the various approaches to automatic emotion recognition, numerous efforts have been devoted to audiovisual channels such as facial expressions [4, 6] or speech [2, 12, 16]. Recently, physiological signals, as an alternative channel for emotional communication, have gradually earned attention in the field of emotion recognition. Starting from the series of publications authored by Dr. Picard and her colleagues at the Massachusetts Institute of Technology (MIT) Laboratory [17, 18, 19], several interesting findings have been reported indicating that certain affective states can be recognized by means of heart rate (HR), skin conductivity (SC), temperature (Tmp), muscle activity (EMG), and respiration velocity (Rsp). They also elaborated a complete physiological-signal-based emotion recognition procedure, which gave great inspiration to the followers [15, 9, 24, 7, 10].

There is one particular issue, first noted during the description of the affective data collection in [19], that turns out to be a major obstacle to the development of a general methodology for multi-subject emotion recognition using physiological signals. This issue, which we refer to as "individual differences", can be briefly explained as the intricate variety of individual behaviors among subjects. On one hand, the problem reflects the different interpretation of emotions across individuals within the same culture [19]. It may therefore complicate the signal processing and classification procedures when the goal is to examine whether subjects elicit similar physiological patterns for the same emotion. Fortunately, thanks to the vast body of studies in psychology, such as the proposal of six basic emotions by Ekman [4, 5] or the development of the International Affective Picture System (IAPS), this aspect of "individual differences" has been to some extent alleviated by the employment of scientific categorization of emotions and the usage of standardized emotion elicitation facilities.
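As a sketch of the 2-stage idea described above (our illustration, not the authors' code; it assumes scikit-learn and pre-segmented per-subject training data), stage one routes an incoming sample to the best-matching subject GMM, and stage two applies that subject's own classifier:

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import KNeighborsClassifier

def train_two_stage(per_subject_data):
    """per_subject_data: {subject_id: (X, y)} with X of shape (n_samples, 36)."""
    gmms, clfs = {}, {}
    for sid, (X, y) in per_subject_data.items():
        # n_components=4 is an assumption, not a value from the paper.
        gmms[sid] = GaussianMixture(n_components=4, covariance_type="diag",
                                    random_state=0).fit(X)
        clfs[sid] = KNeighborsClassifier(n_neighbors=7).fit(X, y)  # k=7 as in Table 5
    return gmms, clfs

def predict_two_stage(gmms, clfs, x):
    # Stage 1: pick the subject model with the highest GMM log-likelihood.
    sid = max(gmms, key=lambda s: gmms[s].score(x.reshape(1, -1)))
    # Stage 2: classify with that subject's own (subject-dependent) classifier.
    return clfs[sid].predict(x.reshape(1, -1))[0]

# Toy usage with random data for two subjects and four affective states.
rng = np.random.default_rng(0)
data = {s: (rng.normal(s, 1.0, (40, 36)), rng.integers(0, 4, 40)) for s in (0, 3)}
gmms, clfs = train_two_stage(data)
print(predict_two_stage(gmms, clfs, rng.normal(0, 1.0, 36)))
```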
Study | Setting | Classifier(s) | Feature selection | Reported results
Picard et al. [19] | single-subject; 8 emotions using guided imagery | DFA and QDF | SFFS and Fisher | all classes: 81.25%
Haag et al. [9] | single-subject; arousal/valence using IAPS | MLP | none | arousal: 96.58%; valence: 89.93%
Gu et al. [7] | single-subject; arousal/valence using IAPS | SVM | none | arousal: 85.71%; valence: 78.57%
Wagner et al. [24] | single-subject; 6 emotions using music | kNN, LDF and MLP | SFFS, Fisher and ANOVA | no feature reduction: 80%; with feature reduction: 92%
Nasoz et al. [15] | multi-subject; 6 emotions using movie clips | kNN, DFA and MBG | none | kNN: 71.6%; DFA: 74.3%
Gu et al. [8] | multi-subject; arousal/valence using IAPS | kNN, fkNN, LDF and QDF | GA | no feature reduction: valence 64.2%, arousal 62.8%; with feature reduction: valence 76.1%, arousal 78%
Kim and André [10] | multi-subject; 4 EQs on the arousal/valence plane using music | pLDA | SBS | 4 classes: subject-independent 70%; subject-dependent 95%
For a preliminary study of the proposed 2-stage emotion recognition system, we adopted the time-domain statistical feature sets proposed by Picard et al. [19], because these features have appeared in several previous studies and have shown the ability to classify affective states [19, 9, 7, 8, 14]. Six features were extracted from each physiological signal (HR, BVP, SC, EMGz, EMGc and Rsp) using the formulas depicted in Table 2. In all, there were 36 features prepared for each data entry.

Table 2 (excerpt). Time-domain statistical features:
The mean of the absolute values of the 1st differences of x̃(n): δ̃ = (1/(N−1)) Σ_{n=1}^{N−1} |x̃(n+1) − x̃(n)|
The mean of the absolute values of the 2nd differences of x(n): γ = (1/(N−2)) Σ_{n=1}^{N−2} |x(n+2) − x(n)|
The mean of the absolute values of the 2nd differences of x̃(n): γ̃ = (1/(N−2)) Σ_{n=1}^{N−2} |x̃(n+2) − x̃(n)|
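These statistics are direct to compute. The sketch below assumes, following Picard et al.'s feature set, that the remaining two of the six features are the signal mean and standard deviation and that x̃ is the z-scored signal; those assumptions are ours, since the excerpt shows only the three difference formulas:

```python
import numpy as np

def picard_features(x: np.ndarray) -> np.ndarray:
    """Six time-domain statistics per signal: mean, std, and mean absolute
    1st/2nd differences of the raw and normalized (z-scored, assumed) signal."""
    x_t = (x - x.mean()) / x.std()
    d1  = np.mean(np.abs(np.diff(x)))          # delta
    d1t = np.mean(np.abs(np.diff(x_t)))        # delta~  (shown in Table 2)
    d2  = np.mean(np.abs(x[2:] - x[:-2]))      # gamma   (shown in Table 2)
    d2t = np.mean(np.abs(x_t[2:] - x_t[:-2]))  # gamma~  (shown in Table 2)
    return np.array([x.mean(), x.std(), d1, d1t, d2, d2t])

# 36 features = 6 statistics x 6 signals (HR, BVP, SC, EMGz, EMGc, Rsp).
signals = [np.random.randn(1000) for _ in range(6)]
features = np.concatenate([picard_features(s) for s in signals])
print(features.shape)  # (36,)
```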
The classification goal is to simultaneously differentiate four types of affective states (EQ1 to EQ4 in Figure 3). All of the classification procedures are conducted under 10-fold cross-validation.

Table 5: Recognition results of the open-set experiment using the proposed 2-stage system (k = 7 for kNN). [Recoverable entries: Experiment I, sub A: 0.832; Experiment II, sub B: 0.963, sub E: 0.975.]
Relevance of EEG Input Signals
in the Augmented Human Reader
Inês Oliveira (CICANT, University Lusófona, Campo Grande 376, 1749-024 Lisbon, Portugal; ines.oliveira@ulusofona.pt)
Ovidiu Grigore, Nuno Guimarães, Luís Duarte (LASIGE/FCUL, University of Lisbon, Campo Grande, 1749-016 Lisbon, Portugal; {ogrigore | nmg}@di.fc.ul.pt)
ABSTRACT
This paper studies the discrimination of electroencephalographic (EEG) signals based on their capacity to identify silent attentive visual reading activities versus non-reading states.

The use of physiological signals is growing in the design of interactive systems due to their relevance in improving the coupling between user states and application behavior.

Reading is pervasive in visual user interfaces. In previous work, we integrated EEG signals in prototypical applications designed to analyze reading tasks. This work searches for the signals that are most relevant for reading detection procedures. More specifically, this study determines which features, input signals, and frequency bands are more significant for discrimination between reading and non-reading classes. This optimization is critical for an efficient and real-time implementation of EEG processing software components, a basic requirement for future applications.

We use probabilistic similarity metrics, independent of the classification algorithm. All analyses are performed after determining the power spectrum density of the delta, theta, alpha, beta, and gamma rhythms. The results about the relevance of the input signals are validated against functional neurosciences knowledge.

The experiments were performed in a conventional HCI lab, with non-clinical EEG equipment and setup. This is an explicit and voluntary condition. We anticipate that future mobile and wireless EEG capture devices will allow this work to be generalized to common applications.

Categories and Subject Descriptors
H.5.2 [Information Interfaces and Presentation]: User Interfaces - user centered design, evaluation, interaction styles.

General Terms
Design, Experimentation, Human Factors, Measurement

Keywords
Reading Detection, HCI, EEG Processing and Classification, Similarity Metrics, Feature Relevance Measurement.

1. INTRODUCTION
The understanding and use of human physical and physiological states in computational systems increases the coupling between the user and the application behavior. The integration of physiological signals in applications is relevant in the design of universally-accessible interactive systems and will become more relevant as new computing paradigms such as ubiquitous computing [7] and ambient intelligence [1],[14] develop.

The use of neurophysiological signals, and in particular electroencephalograms (EEG), has been widely reported in the context of an important example of coupled interaction systems: BCIs [4],[5],[16]. These interfaces explore the information at its source, the brain. EEG signals are frequently chosen because of their fine temporal resolution and non-invasiveness [9], and also due to the relatively low cost of capture device settings.

Visual user interfaces often require reading skills. The users' reading flow is highly influenced by their concentration and attention while interacting with applications. The application's visual characteristics and the users' cognitive state can decrease readability and degrade the interaction.

Augmented reading applications should adapt to the user's reading flow through the detection of reading and non-reading states. Reading flow analysis also improves the understanding of the users' cognitive state while interacting with the applications and improves the current empirical style of usability testing [9]. In previous work, we integrated EEG signals in two prototypical applications designed to analyze and assist reading tasks. These applications are briefly described further down in this paper.

This paper focuses on the discrimination of EEG signals based on their relevance with respect to the identification of silent attentive reading versus non-reading tasks, thereby finding the importance of each EEG signal for the reading detection procedure. The ultimate goal of this study is to allow a robust selection and weighting of input signals, which we deem critical for a feasible, efficient, and real-time implementation of EEG processing software components, our augmentation approach.

The EEG processing literature generally refers to feature vectors of some extent. We have dealt with data dimensionality reduction in the processing pipeline by using Principal Component Analysis [9]. PCA does not consider the spatial distribution of the input signals nor functional neurosciences knowledge. Neurosciences map cognitive processes into skull areas. Quantifying the importance of each input signal in relation to reading detection will help verify which electrodes and frequency bands are more involved in the reading cognitive process, and builds on the functional neurosciences knowledge.
The analysis of EEG signal relevance is performed after determining the power spectrum density (PSD) of the delta, theta, alpha, beta, and gamma rhythms (the known EEG frequency bands) in each of the captured EEG streams. We then apply probabilistic similarity measures [10], which are independent of the classification algorithm, to each of these streams to detect the main differences and to discriminate between visual reading and non-reading activities. All results obtained about the importance of the input signals are provided and crossed against functional neurosciences knowledge.

Our experiments were performed in a conventional HCI lab, with non-clinical EEG capture equipment. This is not a limitation to overcome but rather a feature and an a priori requirement of our design. Even if the results can be further validated in clinical settings (in vitro), our goal is to address real-life situations (in vivo), which have harsher stability, noise, and artifact conditions. We predict that future mobile and wireless EEG capture devices will allow the generalization and extension of this work to common tools and applications. The broader goal of this work is to design and develop usable and robust software components for integration in interactive systems that reach higher adaptation levels through this augmentation approach.

2. EXPERIMENTAL SETTINGS
EEG signals were captured using MindSet-1000, a simple digital system for EEG mapping with 16 channels, connected to a PC using a SCSI interface. These channels are connected through pure tin electrodes (sensors) to a cap made of elastic fabric, produced by Electro-Cap International.

Figure 2 shows the electrode mapping used in our study. The EEG signals are amplified in a differential manner relative to the ear electrodes and are sampled at a 256 Hz frequency. All requirements indicated by suppliers and technicians were fulfilled [9]. These included "grounding" the subjects and keeping the impedance of each electrode below 6000 Ω, through the thorough application of conductive gel.

The first 5000 ms and the last 3000 ms of each trial are discarded to avoid possible artifacts caused by the start and end of the recording process. To assure the reliability of the capture procedure, the experiment was also tested using a professional medical capture device, in use in a hospital, whose setup was entirely prepared and tuned by expert technicians [9]. The results obtained with both capture devices were validated by an EEG specialist and a consistent set of sample results was produced.

2.1 Read and Not Read Experience
The capture experiments, object of the relevance analysis described in this paper, were based on the presentation of alternating blank and text screens containing about 40 lines of daily news text. The duration of these screens differed according to the ability to keep subjects concentrated on the task [9]. Text screens were presented for longer periods (30 s) than blank screens (20 s). These types of periods were interlaced: one reading text sample, followed by 2 watch-only blank screens, and again back to reading. All these periods were captured separately, allowing a small resting period where the signal was not recorded.

Each capture trial included approximately 120 s of both sample classes. All data was recorded, without any previous or special training, from a right-handed female subject, mid-thirties and without known vision disabilities (see the discussion on this choice in the final section).

2.2 Assisted Reading Prototypes
In the context of these experiences, we designed simple prototype tools. ReadingTester tests in real time "reading event scripts", sequences of events with a certain duration that are generated by the application. The subject is exposed to these events, and simultaneously the EEG is captured and analyzed. A detection performance report is built when the detection process stops.
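For reference, the per-band PSD values used in the analysis can be approximated from a 256 Hz channel with a Welch estimate; this is a sketch assuming SciPy, and the band edges are the conventional ones rather than values stated in the paper:

```python
import numpy as np
from scipy.signal import welch

BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta1": (13, 20), "gamma": (30, 45)}  # conventional edges (assumed)

def band_powers(x: np.ndarray, fs: float = 256.0) -> dict:
    """Integrate the Welch PSD of one EEG channel over each rhythm's band."""
    f, psd = welch(x, fs=fs, nperseg=int(2 * fs))  # 2-second windows
    return {name: np.trapz(psd[(f >= lo) & (f < hi)], f[(f >= lo) & (f < hi)])
            for name, (lo, hi) in BANDS.items()}

print(band_powers(np.random.randn(256 * 30)))  # 30 s of synthetic signal
```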
From this similarity-like measure, several dissimilarity coefficients have been derived. The Chernoff coefficient (CC) of order t is defined as [5]:

CC_t(p, q) = Σ_x p(x)^t q(x)^(1−t)

This measure is related to the KL divergence through its slope at t = 0; it is smaller than the KL divergence and is less sensitive than the KL divergence to outlier values [8].

There is also a special symmetric case for t = 1/2, named the Bhattacharyya Coefficient (BC), defined by [10]:

BC(p, q) = Σ_x √(p(x) q(x))

BC measures the amount of overlap between two probability distributions.

3.1.4 Minkowski's Based Measures
The Minkowski Lp distance, with p = {1, 2, 3, …}, is defined in [2][5]:

L_p(u, v) = (Σ_x |u(x) − v(x)|^p)^(1/p)

All Minkowski measures are symmetric and differ only in the way they amplify the effect of outlier values. Minkowski distances of first and second order, the L1 and L2 distances, are also known as the Manhattan and Euclidean distance, respectively.

3.3 ANOVA Analysis
To statistically validate our conclusions we performed Variance Analysis, also known as ANOVA. It analyses the variation present in our experiments by statistically testing whether the statistical parameters of our groups of measures (bands, electrodes, etc.) are consistent, assuming that the sampled populations are normally distributed. If, for instance, this consistency holds for two electrodes or bands, then we can safely consider them correctly ranked.

ANOVA results are put into a graphic or table (Figure 4). The center line in the graphic represents the mean of each group, the polygon lines above and below it show the mean +/− variance values, and the line segments delimit the confidence interval.
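For discretized (histogram) distributions these measures reduce to a few lines; a minimal sketch, assuming both inputs are normalized probability vectors of equal length:

```python
import numpy as np

def chernoff(p, q, t=0.5):
    """Chernoff coefficient of order t; t = 1/2 gives the Bhattacharyya coefficient."""
    return np.sum(p**t * q**(1.0 - t))

def bhattacharyya(p, q):
    """Overlap between two distributions: 1 when identical, 0 when disjoint."""
    return np.sum(np.sqrt(p * q))

def minkowski(p, q, order=2):
    """L_p distance; order=1 is Manhattan, order=2 is Euclidean."""
    return np.sum(np.abs(p - q)**order)**(1.0 / order)

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.1, 0.6, 0.3])
print(bhattacharyya(p, q), chernoff(p, q, 0.25), minkowski(p, q, 1))
```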
[Bar charts: average importance rank per electrode (FP1, FP2, F7, F3, F4, F8, T3, C3, C4, T4, T5, P3, P4, T6, O1, O2) and average rank per band.]

Table 2. The 10 highest average feature relevances.
Rank | Relevance | Electrode | Band
1 | 79.4 | O1 | Alpha
2 | 77.8 | P3 | Alpha
3 | 74.9 | O1 | Beta1
4 | 74.8 | P3 | Theta
5 | 73.9 | O2 | Alpha
6 | 73.5 | O1 | Theta
7 | 72.4 | T5 | Alpha
8 | 69.0 | O2 | Beta1
9 | 68.1 | O2 | Theta
10 | 66.3 | T5 | Theta

This ranking reinforces all the previous discussion, because all these values are located in the left hemisphere, and α and θ are the most frequent bands. It also shows that the averaging introduced in the previous analyses may minimize the importance of certain electrodes, namely P3, which appears twice in the top 10.

Figure 8. ANOVA for the δ(1), θ(2), α(3), β1(4) and γ(5) bands.

The average ranks of θ and α were relatively higher and differentiated from the rest of the bands. The γ band performed poorly, showing the lowest rank and the widest variation. According to the previous reasoning about the ANOVA table results, we can also state that the statistical parameters of these groups are consistent, in spite of F being close to its critical value.

To further detail this analysis we performed Multiple Comparisons: a technique that complements ANOVA and looks for specific significant differences between pairs of groups by checking the means among them. Figure 9 contains the multiple comparison results for delta, theta, alpha, and beta1. Each line segment represents the comparison interval of each group.
5.4 ANOVA Analysis Results
We performed several ANOVA test runs with different groups of measures, namely: left versus right hemisphere, skull areas, bands, electrodes, and features.

The ANOVA graphic and table for the left and right hemispheres were presented above (section 3). These calculations were performed after averaging the ranks of all features related with each hemisphere. As we stated before, the results in the table indicate that the statistical parameters of the analyzed groups are consistent. This conclusion is reinforced by the graphic, which shows that the average ranks of both groups are statistically distinct with no possible overlap. We can also see that the left hemisphere importance is significantly higher than that of the right hemisphere.

Figure 9. Multiple Comparison for δ(1), θ(2), α(3) and β1(4).

The δ and β1 band comparison intervals were significantly different from the ones determined for the θ and α rhythms. This also means that the θ and α bands were significantly higher and distinct from the rest of the rhythms and, for this reason, they appear to be more relevant for classifying reading versus non-reading tasks.

Figure 10 below displays the ANOVA results for specific skull areas: front polar, frontal, central, temporal, occipital, and parietal regions. These calculations were performed after averaging the ranks of all features related with each area.

Figure 10. ANOVA for frontal polar (1), frontal (2), central (3), temporal (4), occipital (5) and parietal (6) areas.
SV | SS | DF | MS | F | P | Crit. F
Between Groups | 4893.1 | 5 | 978.6 | 50.9 | 9.5E-17 | 2.4
Within Groups | 807.4 | 42 | 19.2 | | |
Total | 5700.5 | 47 | | | |

These groups' statistical parameters are also consistent, as in the previous tables, since F is significantly higher than its critical value and P is extremely small. In accordance with our previous results, we obtained average ranks relatively higher and distant from the remaining regions for the front polar and occipital areas.

We then repeated the ANOVA process for all input signals, using the average ranks of all features related with each electrode (see Figure 11). We did not discard any input signal at this stage, in order to verify the averaging effect that we could get in the previous calculations. These results confirmed the previous discussion about areas. Front polar and occipital electrodes revealed higher ranks than the remaining electrodes, in spite of not being distant enough, especially front polar.

Figure 11. ANOVA for FP1(1), FP2(2), F7(3), F3(4), F4(5), F8(6), T3(7), C3(8), C4(9), T4(10), T5(11), P3(12), P4(13), T6(14), O1(15) and O2(16).
SV | SS | DF | MS | F | P | Crit. F
Between Groups | 15849.8 | 15 | 1056.7 | 31.1 | 5.9E-33 | 1.8
Within Groups | 3810.4 | 112 | 34.0 | | |
Total | 19660.2 | 127 | | | |

The values in the table also confirm these rankings as statistically consistent. F is once more greater than its critical value and P is very small. We then applied multiple comparisons to better analyze the differences among electrodes (see Figure 12), and obtained approximately three groups: occipital, front polar, and the remaining electrodes. Only for the occipital electrodes was the comparison interval significantly different from the remaining electrodes group.

Figure 12. Multiple Comparison for FP1(1), FP2(2), F7(3), F3(4), F4(5), F8(6), T3(7), C3(8), C4(9), T4(10), T5(11), P3(12), P4(13), T6(14), O1(15) and O2(16).

Finally, we applied ANOVA to individual features, reducing their number to 16 by applying the previous conclusions (see Figure 13). Features were restricted to the front polar and occipital areas, and we also discarded the γ band.

The table supports that these rankings are statistically consistent, but we got here the lowest F value. However, F is still greater than its critical value, and the probability of F being smaller than its critical value is very small (P).

The δ band features from both occipital electrodes (9 and 13) worked poorly and showed great variability. But the remaining features of these input signals were very concentrated and showed a relative distance with regard to the rest of the groups. The variation of the front polar related features (1 to 8) was more significant, especially for the δ and β1 bands.

Figure 13. ANOVA for FP1(1 to 4), FP2(5 to 8), O1(9 to 12) and O2(13 to 16) with bands δ, θ, α and β1, respectively.
SV | SS | DF | MS | F | P | Crit. F
Between Groups | 17535.6 | 15 | 1169.0 | 6.7 | 4.59E-10 | 1.8
Within Groups | 19584.3 | 112 | 174.9 | | |
Total | 37119.9 | 127 | | | |
We presented results that reinforce that the left hemisphere is dominant in reading tasks. We showed that its input signals consistently revealed higher dissimilarities between reading and non-reading samples than their homologues in the right hemisphere. The results also indicated the front polar and occipital areas, especially the latter, as well as the α and θ band related features, as being more relevant than the remaining values. In opposition to some related work [12],[13], the γ and δ band results consistently performed poorly. In summary, we can state that: for EEG-based silent reading detection, use mainly O1(θ, α, β1) and O2(α).
8. REFERENCES
[1] Aarts, E., Encarnação, J., True Visions, The Emergence of Ambient Intelligence, Springer, 2006.
[2] Bizas, E., Simos, G., Stam, C.J., Arvanitis, S., Terzakis, D., Micheloyannis, S., EEG Correlates of Cerebral Engagement in Reading Tasks, Brain Topography, Vol. 12, 1999.
[3] Oliveira, I., Lopes, R., Guimarães, N. M., Development of a Biosignals Framework for Usability Analysis (Short Paper), ACM SAC'09 HCI Track, 2009.
[4] Wolpaw, J. R. et al., "Brain-Computer Interface Technology: A Review of the First International Meeting", IEEE Transactions on Rehabilitation Engineering, Vol. 8, 2000.
[5] Millán, J.R., "Adaptive Brain Interfaces", Communications of the ACM, 2003.
[6] Jung, J., Mainy, N., Kahane, P., Minotti, L., Hoffmann, D., Bertrand, O., Lachaux, J., "The Neural Bases of Attentive Reading".
[11] Popescu, F., Fazli, S., Badower, Y., Blankertz, B., Müller, K., "Single Trial Classification of Motor Imagination Using 6 Dry EEG Electrodes", PLoS ONE 2(7): e637, 2007.
[12] Shlens, J., "Notes on Kullback-Leibler Divergence and Likelihood Theory", Systems Neurobiology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037, 2007.
[13] Sternberg, R. J., Cognitive Psychology, Thomson Wadsworth, 2003.
[14] Streitz, N., Kameas, A., Mavrommati, I., The Disappearing Computer: Interaction Design, System Infrastructures and Applications for Smart Environments, Springer, 2007.
[15] Topsøe, F., Jensen-Shannon Divergence and Norm-Based Measures of Discrimination and Variation, Technical report, Department of Mathematics, University of Copenhagen, 2003.
[16] Keirn, Z.A., Aunon, J.I., "A New Mode of Communication between Man and His Surroundings", IEEE Transactions on Biomedical Engineering, Vol. 37, 1990.
Brain Computer Interfaces for Inclusion
P. J. McCullagh, M. P. Ware, G. Lightbody
Computing & Engineering, University of Ulster
Shore Road, Jordanstown, Co. Antrim BT37 0QB, UK
Tel: +44 (0)28 90368873, +44 (0)28 90366045
pj.mccullagh@ulster.ac.uk, mp.ware@ulster.ac.uk, g.lightbody@ulster.ac.uk
2. ARCHITECTURE DESIGN
3. The status of devices as they join or leave the smart home network is updated via the xml menu definition file, where devices are either enabled or disabled as appropriate. The IGUI is informed as device status is modified so that the menu can be re-parsed and the display updated accordingly.
4. For the purpose of receiving incoming messages, the IGUI implements two listening threads: one dedicated to listening for incoming BCI2000 data packets, on thread UDPListener, and one dedicated to listening for incoming UAI redisplay events and unpackaged BCI2000 communications, on thread EventQueueListener. Clearly, it does not make sense to allow the user to issue a command as the menu display is being updated; the device that the user may wish to interact with may no longer be available. Neither does it make sense for the menu to be redisplayed at the same time a user command is being processed, as the outcome may affect the menu display. For this reason, mediation has to take place between events raised on either thread, and each event is processed sequentially.

Interaction with BCI2000 is based upon the reception and sending of data packets. The Internet Protocol (IP) address and communication port of a computer supporting BCI2000 are known to the IGUI. Using these, a thread is initiated for the purpose of listening for incoming packets. On packet reception, the data is unpacked and the nature of the incoming user command determined. The appropriate message is placed on the EventQueue. The UAI may also write an event to the EventQueue; essentially, the UAI indicates when and how the menu should be re-parsed and redisplayed. The IGUI EventQueueListener monitors the EventQueue length. If an event is detected in the queue, the IGUI reads the event, instigates the appropriate processing, and removes the event from the queue.
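The mediation described above amounts to serializing events from both listener threads through a single shared queue. A minimal sketch of the pattern (thread and event names follow the text; everything else is illustrative):

```python
import queue
import threading

events = queue.Queue()  # shared EventQueue: BCI2000 commands and UAI redisplay events

def process_command(cmd): print("command:", cmd)
def reparse_and_redisplay(menu): print("redisplay:", menu)

def udp_listener():
    # Stand-in for the UDPListener thread: would block on BCI2000 packets.
    events.put(("user_command", "select"))

def event_queue_listener():
    # Sequential processing guarantees a command is never handled
    # while the menu display is being updated, and vice versa.
    while True:
        kind, payload = events.get()
        if kind == "user_command":
            process_command(payload)
        elif kind == "redisplay":
            reparse_and_redisplay(payload)
        events.task_done()

threading.Thread(target=event_queue_listener, daemon=True).start()
udp_listener()
events.put(("redisplay", "Bedroom1"))
events.join()  # wait until both queued events have been handled
```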
implementation uses static xml. Provision has been made in the
IGUI interface for the passing of an object containing a similar command arrows presented on the screen point to four
xml declaration for dynamic content. Dynamic content of the peripheral LEDs. For the purpose of interface testing and to
same format can be parsed and displayed using existing provide a potential support facility for the carers of users, the
mechanisms. Dynamic xml is relevant where content may be four command arrows can be activated using a standard mouse.
subject to frequent change, such as listing available files on a Under different BCI paradigms arrows would still be present on
media server (e.g. movie titles). the screen but they would function in a slightly different
<menu_list_item>
  <label>Bedroom1</label>
  <enabled>True</enabled>
  <sticky>False</sticky>
  <icon>Bedroom1.jpg</icon>
  <on_selection>
    <menu_list_item>
      <label>Lighting</label>
      <device_Id>x10Light1</device_Id>
      <enabled>True</enabled>
      <sticky>False</sticky>
      <icon>Bedroom1/Lighting.jpg</icon>
      <on_selection>
        <command>BinaryLight.toggle_power</command>
      </on_selection>
    </menu_list_item>
  </on_selection>
</menu_list_item>
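For illustration, a declaration of this form can be parsed into a simple tree with the Python standard library. This is a sketch only: the element names follow the listing above, the file name and the dictionary representation are our own assumptions.

import xml.etree.ElementTree as ET

def parse_menu_item(elem):
    """Recursively convert a <menu_list_item> element into a dict."""
    item = {
        "label": elem.findtext("label"),
        "enabled": elem.findtext("enabled") == "True",
        "sticky": elem.findtext("sticky") == "True",
        "icon": elem.findtext("icon"),
        "device_id": elem.findtext("device_Id"),   # leaf items only
        "command": None,
        "children": [],
    }
    selection = elem.find("on_selection")
    if selection is not None:
        item["command"] = selection.findtext("command")      # leaf
        item["children"] = [parse_menu_item(child)           # node
                            for child in selection.findall("menu_list_item")]
    return item

root = parse_menu_item(ET.parse("menu.xml").getroot())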
Figure 3: Intuitive Graphical Command Interface for
BRAIN Project Application
The xml menu declaration represents menu content. The user needs a mechanism for manipulating content in as effective a manner as possible, in order to traverse the menu hierarchy and to pass correctly formulated commands to the UAI. Currently the supported BCI paradigm implements high-frequency SSVEP as the mechanism for user interaction, but it is anticipated that other BCI paradigms ('oddball' stimulus and intended movement) will be supported over time. Studies have reported that up to 48 LEDs can be used; these operated between 6-15Hz at increments of 0.195Hz [22]. However, making this many signals available to the user in a meaningful manner using a conventional screen interface requires a degree of mapping which may be beyond both the interface and the user's capabilities and inclinations. Similarly, many devices (cameras, mp3 players, printers) with restricted input capabilities use a four-way command mapping as their interface of choice. Using such a command interface it is possible to cycle through lists of menu items (left/right), to select commands or further menu options (down), and to reverse selections or exit the system (up). Using fewer than four commands places an exponentially growing command burden upon the user, as cycle commands (left/right) increase and selection commands (down) cannot be applied in a single action. It was therefore decided that a four-way command interface would be optimal. The LEDs are placed at the periphery of the screen with command icons central to the display [23].
The command interface (Figure 3) displays the icons relating to three menu items central to the screen. The central icon represents the current menu item; as such it is highlighted using a lighter coloured frame. Icons to either side provide list orientation to the user, suggesting progression and alternative options. Under the current SSVEP paradigm the four command arrows presented on the screen point to four peripheral LEDs. For the purpose of interface testing, and to provide a potential support facility for the carers of users, the four command arrows can also be activated using a standard mouse. Under different BCI paradigms arrows would still be present on the screen but they would function in a slightly different manner. Under P300 it is anticipated that an additional module would govern the time-sequenced animation of the arrows, thereby providing a synchronised stimulus for the 'odd-ball' effect. Alternatively, voluntary responses can be used to control cursor movement towards the arrows (ERD/ERS intended movement paradigm).
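The four-way traversal just described reduces to a small state machine over the parsed menu tree. The sketch below uses the dictionaries produced by the parsing sketch earlier; it is an illustration of the left/right/down/up semantics, not the project's actual implementation.

class MenuNavigator:
    """Four-way traversal of a menu tree: left/right cycle through
    siblings, down selects (descend or issue a command), up reverses."""

    def __init__(self, root):
        self.path = [root]   # ancestors of the current sibling list
        self.index = 0       # position within the current sibling list

    def current(self):
        return self.path[-1]["children"][self.index]

    def left(self):
        self.index = (self.index - 1) % len(self.path[-1]["children"])

    def right(self):
        self.index = (self.index + 1) % len(self.path[-1]["children"])

    def down(self, send_command):
        item = self.current()
        if item["children"]:               # node: descend one level
            self.path.append(item)
            self.index = 0
        elif item["command"]:              # leaf: pass command to the UAI
            send_command(item["device_id"], item["command"])

    def up(self):
        if len(self.path) > 1:             # reverse the last selection
            self.path.pop()
            self.index = 0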
The screen displays location-level menu items. At this menu level the icons are photographs which in a real-world context could relate to the user's own home. The use of photographs and real-world images is intended to make use of the interface more intuitive and to reduce the cognitive load of interacting with the menu structure. At a lower menu level general concept images have been used, albeit still in a photographic format: individual lights, for example, are not represented; instead a universal image of a light bulb is used. It is felt that at this menu level the user will already have grasped the intent of the interface. Furthermore, this approach makes the concept of device interaction uniform at the command level, covering both a tangibly visible interaction such as turning a light on and an invisible interaction such as turning the volume of a device up. Once again the flexibility of the application is demonstrated: the look and feel of the interface can be modified by simply replacing the graphics files. For instance, a 'younger' look and feel can be obtained by replacing photographic representations with cartoon-style drawings. The on-screen icons are supported by associated labels, represented by tags in the xml menu declaration. The labels are used to make the meaning explicit; however, the interface has been devised in such a way as to ensure that literacy is not required.
3. Application Interface
Smart Homes are environments facilitated with technology that acts in a protective and proactive function to assist inhabitants in managing their daily lives according to their individual needs. Smart Home technology has predominately been applied to assist with monitoring vulnerable groups, such as people with early-stage dementia [24] and older people in general [25], by optimising the environment for safety. The link between BCI and Smart Homes is obvious, as it provides a way to interact with the environment using direct brain control of actuators. Our contribution uses a software engineering approach, building an architecture which connects the BCI channel to the standard interfaces used in Smart Homes so that control, when established, can be far reaching and tuned to the needs of the individual, be it for entertainment, assistive devices or environmental control. Thus a link to the standards and protocols used in Smart Home implementations is important.
A BCI-Smart Home channel could allow users to realize several common tasks: for instance, to switch lights or devices on/off, adjust thermostats, raise/lower blinds, and open/close doors and windows. Video cameras could be used to identify a caller at the front door, and to grant access if appropriate. The same functions achieved with a remote control could be realized (i.e. for a television: control the volume, change the channel, 'mute'). In a media system, the user could play desired music tracks.
3.1 Standards and Protocols
While the underlying transmission media and protocols are largely unimportant from a BCI user perspective, the number of standards provides an interoperability challenge for the software engineer. Open standards are preferred. A number of standards bodies are involved; the main authorities are the Institute of Electrical and Electronics Engineers (IEEE), the International Telecommunication Union (ITU - home networking) and the International Standards Organisation (ISO). Industry provides additional de-facto standards. Given the slow 'user channel', BCI interaction with the control aspects of domotic networks requires high reliability, with available bit-rate transmission being of much lesser importance.
Domotic standards for home automation are based on either wired or wireless transmission. Wired is the preferred mode for 'new-build' Smart Homes, where an information network may be installed as a 'service' similar to electricity or mains water supply. Wireless networks can be used to retrofit existing buildings and are more flexible, but are more prone to domestic interference, overlap and 'black spots' where communication is not possible. Wireless networks normally work using radio frequency (RF) transmission and can use Industrial Scientific and Medicine (ISM) frequencies (2.4GHz band) or proprietary frequencies and protocols. Infra-red uses higher frequencies which are short range and travel in straight lines (e.g. a remote control for a television).
The Universal Plug and Play (UPnP) architecture offers pervasive peer-to-peer network connectivity of PCs, intelligent appliances, and wireless devices. UPnP is a distributed, open networking architecture that uses TCP/IP and HTTP protocols to enable seamless proximity networking in addition to control and data transfer among networked devices in the home. UPnP does not specify or constrain the design of an API for applications running on control points; a web browser may be used to control a device interface. UPnP provides an interoperable specification, offering the possibility of 'wrapping' other technologies (e.g. where a device is not UPnP compliant). UPnP enables data communication between any two devices under the command of any control device on the network. UPnP technology can run on any medium (category 3 twisted pair, power lines (PLC), Ethernet, infra-red (IrDA), Wi-Fi, Bluetooth). No device drivers are used; common protocols are used instead.
The UPnP architecture supports zero-configuration, invisible networking and automatic discovery, whereby a device can dynamically join a network, obtain an IP address, announce its name, convey its capabilities upon request, and learn about the presence and capabilities of other devices. Dynamic Host Configuration Protocol (DHCP) and Domain Name System (DNS) servers are optional and are only used if they are available on the network. A device can leave a network smoothly and automatically without leaving any unwanted state information behind. UPnP networking is based upon IP addressing: each device has a DHCP client and searches for a server when it is first connected to the network. If no DHCP server is available, that is, the network is unmanaged, the device assigns itself an address. If during the DHCP transaction the device obtains a domain name, the device should use that name in subsequent network operations; otherwise, it should use its IP address.
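Discovery in UPnP is performed with SSDP over multicast UDP. The following sketch issues a standard M-SEARCH and collects the unicast responses; the multicast group and port are fixed by the UPnP specification, while the timeout and search target chosen here are typical values rather than anything taken from the paper.

import socket

SSDP_ADDR = ("239.255.255.250", 1900)    # standard SSDP multicast group

def discover(search_target="ssdp:all", timeout=3.0):
    """Send an SSDP M-SEARCH and collect unicast responses."""
    request = "\r\n".join([
        "M-SEARCH * HTTP/1.1",
        f"HOST: {SSDP_ADDR[0]}:{SSDP_ADDR[1]}",
        'MAN: "ssdp:discover"',
        "MX: 2",                          # seconds devices may delay
        f"ST: {search_target}",
        "", "",
    ]).encode("ascii")

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    sock.sendto(request, SSDP_ADDR)
    responses = []
    try:
        while True:
            data, addr = sock.recvfrom(4096)
            responses.append((addr, data.decode("ascii", "replace")))
    except socket.timeout:
        pass
    return responses

for addr, resp in discover():
    print(addr, resp.splitlines()[0])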
OSGi (originally the Open Services Gateway initiative) is middleware for the Java platform. OSGi technology provides a service-oriented, component-based environment for developers and offers standardized ways to manage the software lifecycle. The OSGi platform allows applications to be built from components: two (or more) components can interact through interfaces explicitly declared in configuration files (in xml). In this way, OSGi is an enabler of expanded modular development at runtime. Where modules exist in a distributed environment (over a network), web services may be used for implementation. The OSGi UPnP Service maps devices on a UPnP network to the Service Registry.
3.2 Interoperability with an existing smart home interface
It is important that the architecture developed can interoperate with existing and future assistive technology. A BRAIN partner, The Cedar Foundation, has sheltered apartments (Belfast) which are enabled for non-BCI Smart Home control. Each apartment is fully networked with the European Installation Bus (EIB) for home and building automation [26]. Into this, peripherals are connected which can be operated via infra-red remote control [27]. These peripherals, when activated, carry out tasks that tenants are not physically able to perform; examples include door access, window and blind control, heating and lighting control, and access to entertainment. Whilst this was 'state of the art' technology at the time of development, KNX has since replaced EIB as the choice for open-standard connectivity. This reinforces the need for interoperability within a modular architecture if BCI is to be introduced to the existing configuration.
3.3 A Universal Application Interface
The Universal Application Interface (UAI) aims to interconnect heterogeneous devices from different technologies integrated in the home network, and to provide a common controlling interface for the rest of the system layers. Figure 4 illustrates the interfaces between BCI2000, the IGUI, the menu definition and the UAI. UAI control is based on the UPnP specification, which provides protocols for addressing and notification, and provides transparency to high-level programming interfaces. The UAI maps requests to events, generates the response to the user's interaction, and advertises applications according to device. The UAI infers the device type and services during the discovery process, including for non-UPnP devices, which can be wrapped as UPnP devices with the automatic deployment of device proxies.
By replacing a communication wrapper, the IGUI could interface with a different BCI package; by replacing the UAI, the IGUI could be harnessed for other control purposes, for example driving a robot. It is also possible for the IGUI to be substituted and for a different command interface to call the services of the UAI.
The UAI is also flexible: additional standards can be added without modifying the core command processing and device handling modules. By incorporating new standards it is possible to interact with an increasing number of devices without radically modifying other aspects of the application architecture. By presenting an architecture which facilitates the upgrading of existing standards it is also possible to interact with existing devices in a more efficient manner.
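The proxy idea can be read as a classic adapter: a non-UPnP device is hidden behind the same abstract interface that native UPnP devices present to the command-processing core. The following Python sketch illustrates the pattern only; the interface, class names and the X10 call are placeholders, as the paper does not describe the proxy API.

from abc import ABC, abstractmethod

class DeviceService(ABC):
    """Common interface the UAI core programs against."""
    @abstractmethod
    def invoke(self, action: str) -> None: ...

class UPnPBinaryLight(DeviceService):
    def __init__(self, control_url):
        self.control_url = control_url
    def invoke(self, action):
        # A real implementation would send a SOAP action to control_url.
        print(f"SOAP {action} -> {self.control_url}")

class X10LightProxy(DeviceService):
    """Wraps a non-UPnP (X10) light so it looks like any other service."""
    def __init__(self, house_code):
        self.house_code = house_code
    def invoke(self, action):
        if action == "toggle_power":
            # Placeholder for the X10 serial/power-line command.
            print(f"X10 toggle {self.house_code}")

registry = {"x10Light1": X10LightProxy("A1")}
registry["x10Light1"].invoke("toggle_power")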
Additional authors: M.D. Mulvenna, H.G. McAllister, C.D. Nugent, Faculty of Computing and Engineering, University of Ulster, Shore Road, Jordanstown, Co. Antrim BT37 0QB, UK.
Emotion Detection using Noisy EEG Data
ABSTRACT
Emotion is an important aspect in the interaction between humans. It is fundamental to human experience and rational decision-making. There is great interest in detecting emotions automatically. A number of techniques have been employed for this purpose using channels such as voice and facial expressions. However, these channels are not very accurate because they can be affected by users' intentions. Other techniques use physiological signals along with electroencephalography (EEG) for emotion detection. However, these approaches are not very practical for real-time applications because they either ask the participants to reduce any motion and facial muscle movement or reject EEG data contaminated with artifacts. In this paper, we propose an approach that analyzes highly contaminated EEG data produced from a new emotion elicitation technique. We also use a feature selection mechanism to extract features that are relevant to the emotion detection task based on neuroscience findings. We reached an average accuracy of 51% for the joy emotion, 53% for anger, 58% for fear and 61% for sadness.
Categories and Subject Descriptors
I.5.2 [Pattern Recognition]: Design Methodology - Classifier design and evaluation, Feature evaluation and selection
Keywords
Affective Computing, Brain Signals, Feature Extraction, Support Vector Machines
1. INTRODUCTION
Over the past two decades, there has been an increasing interest in developing systems that will detect and distinguish people's emotions automatically. Emotions are fundamental to human experience, influencing cognition, perception, and everyday tasks such as learning, communication, and even rational decision-making. However, studying emotions is not an easy task, as emotions are both mental and physiological states associated with a wide variety of feelings, thoughts, and behaviors [15].
Many have attempted to capture emotions automatically. Developing computerized systems and devices that can automatically capture human emotional behavior is the purpose of affective computing. Affective computing attempts to identify physiological and behavioral indicators related to, arising from, or influencing emotion or other affective phenomena [14]. It is an interdisciplinary field that requires knowledge of psychology, computer science and the cognitive sciences.
Because of its many potential applications, affective computing is a rapidly growing field. For example, emotion assessment can be integrated into human-computer interaction systems in order to make them more comparable to human-human interaction. This could enhance the usability of systems designed to improve the quality of life for disabled people who have difficulty communicating their affective states. Another emerging application that makes use of emotional responses is quantifying customers' experiences. Automated prediction of the customer's experience is important because current evaluation methods, such as relying on customers' self reports, are very subjective: people do not always feel comfortable revealing their true emotions, and may inflate their degree of happiness or satisfaction in self reports [21].
There are two main approaches for eliciting participants' emotions. The first method presents a provoking auditory or visual stimulus to elicit specific emotions. This method is used by almost all studies in the literature [9, 17, 2, 1, 18, 13, 19]. The second approach builds on the facial feedback paradigm, which shows that facial expressions are robust elicitors of emotional experiences.
In the famous study by Strack, Martin & Stepper [20], the authors attempted to provide a clear assessment of the theory that voluntary facial expressions can result in an emotion. They devised a cover story that would ensure the participants adopted the desired facial posing without being able to perceive either the corresponding emotion or the researchers' real motive. Each participant was asked to hold a pen in his mouth in different ways that result in different facial poses. Participants who held the pen in a way resulting in a smile reported a more positive experience than those who held it in a position that resulted in a frown. This study was followed up by different psychologists, including Ekman et al. [6], who found that emotions generated with a directed facial action task result in a finer distinction between emotions. However, this approach contaminates brain signals with facial muscle artifacts, which is why it has not been taken up by computer scientists.
We decided to explore this second approach because it helps make our system closer to an actual real-time emotion detection system, since there will be many facial muscle movements and other artifacts contaminating the EEG data.
Our work extends existing research in three principal ways. First, we are the first in the computer science field to use voluntary facial expressions as a means for enticing emotions. Although this contaminates the EEG with noise, it helps to test our approach in an unconstrained environment where the users were not given any special instructions about reducing head motions or facial expressions, which makes our dataset close to a real-time application. Second, we are using a new technique for selecting features relevant to the emotion detection task that is based on neuroscience findings. Finally, we tested our approach on a large dataset of 36 subjects and we were able to differentiate between four different emotions with an accuracy that ranges from 51% to 61%, which is equal to or higher than other related works.
The paper is organized as follows: section 2 surveys related work on different channels used for emotion detection, especially those that use EEG. Section 3 discusses the corpus of EEG we are using and how the different emotions were elicited. Section 4 describes the different noise sources that contaminate EEG signals. Section 5 gives an overview of our methodology for emotion detection using EEG. Experimental evaluation and results are presented in section 6. Section 7 concludes the paper and outlines future directions in the area of emotion detection using EEG.
2. RELATED WORKS
There is much work in the field of emotion and cognitive state detection by analyzing facial expressions and/or speech. Some of these systems showed a lot of success, such as those discussed in [7, 10]. The system proposed by El Kaliouby and Robinson [7] uses automated inference of cognitive mental states from observed facial expressions and head gestures in video, whereas the system proposed by Kim et al [10] makes use of multimodal fusion of different timescale features of the speech. They also make use of the meaning of the words to infer both the angry and neutral emotions. Although facial expressions are considered to be a very powerful means for humans to communicate their emotions [21], the main drawback of using facial expressions or speech recognition is the fact that they are not reliable indicators of emotion, because they can either be faked by the user or may not be produced as a result of the detected emotion.
Based on the cognitive theory of emotion, the brain is the center of every human action [17]. Consequently, emotions and cognitive states can be detected through analyzing physiological signals generated from the central nervous system, such as brain signals recorded using EEG. However, there is not much work done in this area of research. Thanks to the success of brain computer interface systems, a few new studies have been done to find the correlation between different emotions and EEG signals. Most of these studies combine EEG signals with other physiological signals generated from the peripheral nervous system [1, 2].
One of the earliest attempts to prove that EEG signals can be used for emotion detection was proposed by Chanel et al [2], who tried to distinguish among excitement, neutral and calm signals. They compared the results of three emotion detection classifiers: the first one was trained on EEG signals, the second was trained on peripheral signals such as body temperature, blood pressure and heart beats, and the third was trained on both EEG and peripheral signals. In order to stimulate the emotion of interest, the user is seated in front of a computer and is shown an image to inform him/her which type of emotion s/he has to think of. They then captured the signals from 64 different channels that cover the whole scalp, in order to capture all the rhythmic activity of the brain neurons. For feature extraction, they transformed the signal into the frequency domain and used the power spectrum as the EEG features. Finally, they used a Naive Bayes classifier, which resulted in an average accuracy of 54% compared to only 50% for a classifier trained on physiological signals. Combining both types of signals resulted in a boost of accuracy that reached up to 72%. The problem with the research done by Chanel et al [2] is that using 64 channels for recording EEG, as well as other electrodes to capture physiological signals, makes this approach impractical to use in real-time situations.
Ansari et al [1] improved on the work done by Chanel et al [2]. They proposed using the Synchronization Likelihood (SL) method as a multichannel measurement, which, along with anatomical knowledge, allowed them to reduce the number of channels from 64 to only 5 with a slight decrease in accuracy and a huge improvement in performance. The goal was to distinguish between three emotions: exciting-positive, exciting-negative and calm. For signal acquisition, they acquired the signal from (AFz, F4, F3, CP5, CP6). For feature extraction, they used sophisticated techniques such as Hjorth Parameters and Fractal Dimensions, and they then applied Linear Discriminant Analysis (LDA) as their classification technique. The results showed an average accuracy of 60% when using 5 channels compared to 65% when using 32 channels.
A different technique was taken by Musha et al [13]. They used 10 electrodes (FP1, FP2, F3, F4, T3, T4, P3, P4, O1, and O2) in order to detect four emotions: anger, sadness, joy and relaxation. They rejected frequencies lower than 5 Hz because they are affected by artifacts, and frequencies above 20 Hz because they claim that the contributions of these frequencies to detecting emotions are small. They then collected their features from the theta, alpha and beta ranges and performed cross correlation on each channel.
The EEG corpus we use was collected at the University of Arizona by Coan et al. [4]. Tin electrodes in a stretch-lycra cap (Electrocap, Eaton, Ohio) were placed on each participant's head. EEG was recorded at 25 sites (FP1, FP2, F3, F4, F7, F8, Fz, FTC1, FTC2, C3, C4, T3, T4, TCP1, TCP2, T5, T6, P3, P4, Pz, O1, O2, Oz, A1, A2) and referenced online to Cz.
3.1 Participants
This database contains EEG data recorded from thirty-six participants (10 men and 26 women). All participants were right handed. The age of the participants ranged from 17 to 24 years, with a mean age of 19.1. The ethnic composition of the sample was 2.7% African American, 2.7% Asian, 18.9% Hispanic, and 75.7% Caucasian.
… feel like taking any kind of action, like doing anything? If anything was reported, participants were then asked to rate its intensity on a scale of 1 to 7 (1 = no experience at all; 7 = an extremely intense experience).
For each participant, we have four files, each indicating one of the four emotions. Each file has two minutes of recording. These two minutes do not fully represent the emotions: human coders were used to code the start and the end of each emotion, and we used one minute of recording between the start and the end. In order to have more than one sample per emotion per participant, we worked on two 30-second epochs.
Figure 2: Multistage approach for emotion detection using EEG (signal preprocessing, alpha band extraction, classification).
4. TYPES OF NOISE
4.1 Technical Artifacts
Technical artifacts are usually related to the environment where the signals are captured. One source of technical noise is the electrodes themselves [12]: if the electrodes are not properly placed on the surface of the scalp, or if the resistance between the electrode and the surface of the scalp exceeds 5 kohm, the result is heavy contamination of the EEG. Another source of technical artifact is line noise. This noise occurs due to A/C power supplies, which may contaminate the signal with 50/60 Hz interference if the acquisition electrodes are not properly grounded. Our EEG database is contaminated with the 60 Hz frequency.
4.2 Physiological Artifacts
Another source of noise is physiological artifacts. These include eye blinking, eye movements, electromyography (EMG), motion, pulse and sweat artifacts [12].
The problem with eye blinks is that they produce a signal with a high amplitude that is usually much greater than the amplitude of the EEG signals of interest. Eye movements are similar to, or even stronger than, eye blinks.
The EMG or muscle activation artifact can happen due to muscle activity such as movement of the neck or of facial muscles. This can affect the data coming from some channels, depending on the location of the moving muscles. As for the motion artifact, it takes place if the subject is moving while EEG is being recorded: the data obtained can be corrupted due to the signals produced while the person is moving, or due to the possible movement of electrodes. Other involuntary types of artifacts are pulse and sweat artifacts. The heart is continuously beating, causing the vessels to expand and contract, so if the electrodes are placed near blood vessels, the data coming from them will be affected by the heartbeat. Sweat artifacts can affect the impedance of the electrodes used in recording the brain activity; subsequently, the data recorded can be noisy or corrupted. These different types of noise make the processing of EEG a difficult task, especially in a real-time environment where there is no control over the environment or the subject.
Our dataset is heavily contaminated with facial muscle artifacts. Despite this highly noisy dataset, we are trying to achieve reasonable detection accuracy, and a low false positive rate, for the four emotions (anger, fear, joy and sadness), so that we can integrate our emotion detection approach into real-time affective computing systems.
5. APPROACH FOR EMOTION DETECTION USING EEG
As shown in Fig. 2, we use a multilevel approach for analyzing EEG to infer the emotions of interest. The recorded EEG signals are first passed through the signal preprocessing stage, in which they pass through a number of filters for noise removal. After that, relevant features are extracted from the signals, and finally we use support vector machines for classification.
5.1 Signal Preprocessing
Fig. 2 shows the three stages that the EEG data passes through during the signal preprocessing stage. Our EEG data are referenced online to Cz. Following the recommendation of Reid et al [16], who pointed out that this reference scheme did not correlate particularly well, an offline average reference is computed for the data by subtracting from each site the average activity of all scalp sites. To reduce the amount of data to be analyzed, the data is downsampled from 1024 Hz to 256 Hz.
Our dataset is heavily contaminated with facial muscle and eye blink artifacts. Moreover, there are segments that are highly contaminated with artifacts and are marked for removal. Instead of rejecting such segments, we included them in our analysis so that our approach can generalize to real-time applications. Since most of the previously mentioned artifacts appear at low frequencies, we used a band-pass finite impulse response filter that removed the frequencies below 3 Hz and above 30 Hz.
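The preprocessing chain just described (average re-reference, downsampling, 3-30 Hz FIR band-pass) can be expressed compactly with SciPy. This is a sketch under the paper's stated rates; the filter order and window design are our choices, not the authors'.

import numpy as np
from scipy.signal import decimate, firwin, filtfilt

def preprocess(eeg_1024):
    """eeg_1024: array of shape (n_channels, n_samples) at 1024 Hz."""
    # Offline average reference: subtract the mean of all scalp sites.
    eeg = eeg_1024 - eeg_1024.mean(axis=0, keepdims=True)
    # Downsample 1024 Hz -> 256 Hz (factor 4, with anti-aliasing).
    eeg = decimate(eeg, 4, axis=-1)
    # Zero-phase FIR band-pass keeping 3-30 Hz.
    taps = firwin(numtaps=257, cutoff=[3.0, 30.0], pass_zero=False, fs=256)
    return filtfilt(taps, [1.0], eeg, axis=-1)

# Example: 25 channels of 30 s of synthetic data.
filtered = preprocess(np.random.randn(25, 30 * 1024))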
Figure 3: Applying FFT to overlapping windows (2-second windows with 1-second overlap over each 30-second epoch).
Figure 4: A comparison of the classification accuracy of the joy emotion using a linear SVM kernel on two different feature selection criteria (alpha band only vs. alpha band + asymmetry).
Figure 5: Per-run overall accuracy and accuracy for the presence and absence of the joy emotion, using a linear SVM kernel.
5.2 Feature Extraction
Our approach divides each 30-second data epoch into 29 overlapping windows (Figure 3), and the features described below are computed from the FFT of each window.
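A sketch of the windowing step as we read it from Figure 3 (2 s windows stepped by 1 s, hence 29 windows per 30 s epoch); the alpha-band quantities named in Section 5.2.1 are then derived from the per-window spectra. Function names and array layout are ours.

import numpy as np

FS = 256

def fft_windows(epoch, win_sec=2, step_sec=1):
    """Apply the FFT to overlapping windows of one 30 s epoch.

    epoch: (n_channels, 30 * FS) array. Returns complex spectra of
    shape (n_windows, n_channels, win_sec * FS // 2 + 1)."""
    win, step = win_sec * FS, step_sec * FS
    n_windows = (epoch.shape[-1] - win) // step + 1   # 29 for 30 s
    spectra = [np.fft.rfft(epoch[:, i*step : i*step + win], axis=-1)
               for i in range(n_windows)]
    return np.stack(spectra)

def alpha_features(spectra, win_sec=2):
    """Mean power and phase in the 8-13 Hz alpha band per channel."""
    freqs = np.fft.rfftfreq(win_sec * FS, d=1.0 / FS)
    alpha = (freqs >= 8) & (freqs <= 13)
    power = (np.abs(spectra[..., alpha]) ** 2).mean(axis=0)
    phase = np.angle(spectra[..., alpha]).mean(axis=0)
    return power, phase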
5.2.1 Feature Reduction Using the Alpha Band
We made use of the study by Kostyunina et al. [11] in order to reduce our feature set. Kostyunina et al. [11] showed that emotions such as joy, aggression and intention result in an increase in alpha power, whereas emotions such as sorrow and anxiety result in a decrease in alpha power. As a result of this conclusion, we focused our feature extraction on the power and phase of the alpha band only, which ranges from 8 Hz to 13 Hz, for the 25 channels. We used further features such as the mean phase, the mean power, the peak frequency, the peak magnitude and the number of samples above zero. Making use of the study by Kostyunina et al. [11] helped in decreasing the number of features from 146025 to 10150.
5.2.2 Feature Reduction Using EEG Scalp Asymmetries
Another important piece of research that we made use of in order to reduce our feature set is that done by Coan et al. [4], who showed that positive emotions are associated with relatively greater left frontal brain activity, whereas negative emotions are associated with relatively greater right frontal brain activity. They also showed that the decrease in activation in other regions of the brain, such as the central, temporal and mid-frontal regions, was smaller than in the frontal region. This domain-specific knowledge helped us in decreasing the number of features from 10150 to only 3654.
The asymmetry feature between electrodes i and j at frequency n is obtained using the following equation:
c(n, i, j) = X_i(f_n) - X_j(f_n)
in which X_i(f_n) is the frequency power at electrode i in the nth bin. This equation is applied to scalp-symmetric electrode pairs only, such as (C3, C4), (FP1, FP2), etc.
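Given per-electrode frequency powers, the asymmetry features are simple pairwise differences restricted to symmetric pairs. A sketch, with the pair list abbreviated (the full 25-site montage has more pairs):

import numpy as np

# Symmetric scalp pairs (abbreviated).
SYMMETRIC_PAIRS = [("C3", "C4"), ("FP1", "FP2"), ("F3", "F4"),
                   ("T3", "T4"), ("P3", "P4"), ("O1", "O2")]

def asymmetry_features(power, channel_index):
    """c(n, i, j) = X_i(f_n) - X_j(f_n) over symmetric pairs only.

    power: (n_channels, n_freq_bins) frequency power per electrode.
    channel_index: dict mapping electrode name -> row in `power`."""
    feats = [power[channel_index[i]] - power[channel_index[j]]
             for i, j in SYMMETRIC_PAIRS
             if i in channel_index and j in channel_index]
    return np.concatenate(feats)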
5.3 Support Vector Machines (SVMs)
For classification, we used support vector machines (SVMs). SVM is a supervised learning technique: given a training set of feature vectors, SVMs attempt to find a hyperplane such that the two classes are separable, and given a new feature vector, SVMs predict to which class this new feature vector belongs. SVMs view the input data, the FFT features, as two sets of vectors in an n-dimensional space, and construct a separating hyperplane in that space that maximizes the margin between the two data sets; a good hyperplane is one that has the highest distance to the different points in the different classes [8]. We built eight different binary classifiers: for each emotion, we used two different classifiers, the first trained on the alpha-band extracted features only and the second trained on the scalp-asymmetry extracted features. For each classifier, we used linear, polynomial and radial kernels.
6. EXPERIMENTAL RESULTS
The experiment included 36 participants with 265 samples (66 samples representing joy, 64 representing sadness, 65 representing fear and 70 representing anger). We started by building a joy emotion classifier, for which all the samples representing joy are considered positive samples and all other samples represent negative samples.
Six different classifiers were built: two classifiers with a linear kernel (one for each set of features), two with a radial kernel and two with a polynomial kernel. The SVM classifiers with a polynomial kernel did not converge, whereas the classifiers with a radial kernel resulted in a very low accuracy of almost 0%.
To test our classifiers, we used 20-fold cross validation, in which we divided our 265 samples into testing samples (10%) and training samples (90%), so that the samples used for training are different from those used for testing. We repeated this approach 20 times, during which the testing and training samples were selected randomly, and we made sure that the training and testing samples were different in each run.
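The evaluation protocol (20 repetitions of a random 90/10 split with a linear-kernel SVM) corresponds to the following scikit-learn sketch. The feature matrix X and binary labels y (e.g. joy vs. not joy) are assumed to be prepared as in Section 5; everything else is a plausible rendering, not the authors' code.

import numpy as np
from sklearn.model_selection import ShuffleSplit
from sklearn.svm import SVC

def evaluate(X, y, runs=20):
    """Linear-kernel SVM over 20 random 90/10 train/test splits."""
    splitter = ShuffleSplit(n_splits=runs, test_size=0.10, random_state=0)
    overall, presence, absence = [], [], []
    for train, test in splitter.split(X):
        clf = SVC(kernel="linear").fit(X[train], y[train])
        pred, truth = clf.predict(X[test]), y[test]
        overall.append((pred == truth).mean())
        presence.append((pred[truth == 1] == 1).mean())  # e.g. joy
        absence.append((pred[truth == 0] == 0).mean())   # not joy
    return [float(np.mean(v)) for v in (overall, presence, absence)]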
Figure 6: Per-run classification accuracy (overall, presence, absence) of the anger emotion using a linear SVM kernel on the second feature set (alpha band + asymmetry).
Figure 7: Per-run classification accuracy (overall, presence, absence) of the fear emotion using a linear SVM kernel on the second feature set (alpha band + asymmetry).
Figure 8: A comparison of the classification accuracy of the sad emotion using a linear SVM kernel on the second set of features (alpha band + asymmetry).
Fig. 4 compares the true positive, false negative and overall detection rates for joy. We found that the use of the alpha band combined with the EEG scalp difference resulted in better detection accuracy than using the alpha band only. This again shows that using neuroscience findings in feature selection helps in decreasing the size of the feature set and results in better classification accuracies. We also found that the radial kernel, for both types of features, resulted in 0% accuracy for joy and in a very high classification accuracy of almost 100% for the not-joy class. The average detection accuracy is 51% and 83% for the presence of joy and not joy respectively, using the linear kernel.
Fig. 5 shows the average overall detection accuracy, the average detection accuracy of the presence of joy, and the average detection accuracy of the absence of joy, using a linear SVM kernel. The average overall detection accuracy represents the number of correctly classified samples, whether joy or not joy, divided by the total number of testing samples, which is 27 samples (10% of the total). The average detection accuracy of the presence of joy is the number of correctly classified joy samples divided by the total number of joy samples in the testing set. Finally, the average detection accuracy of the absence of joy is the number of correctly classified not-joy samples divided by the total number of not-joy samples in the testing set. From the graph, it can be deduced that the true negative rate is in the range of 77% to 95%, which means that the false positive rate is very low, in the range of 5% to 23%.
We applied the same approach for building classifiers for the anger, fear and sad emotions. Fig. 6, Fig. 7 and Fig. 8 show the classification accuracies of the linear SVM kernel on the second set of features (alpha + asymmetry) for the anger, fear and sad emotions respectively.
It is observed that the accuracy of the linear kernel on the second feature set (alpha + asymmetry) is higher than that of the linear kernel on the first feature set (alpha band only) for the joy, anger and sad emotions, whereas for the fear emotion the linear kernel on the first feature set (alpha band only) is the more accurate.
The reason why the accuracies of anger, fear, joy and sadness range from 30% to 72.6% can be explained by the fact that voluntary facial expressions may affect the emotional state of people differently and with different intensities. Coan and Allen [3], who experimented on the same dataset, reported that the dimensions of experience vary as a function of specific emotions and individual differences when self reports were compared against the emotions intended to be elicited with certain facial expressions. Table 1 shows the report rates for the different emotions: self reports differed from the intended emotions in 48% of the samples. In this work, we did not ignore the samples for which self reports did not match the elicited emotions. It may have increased the accuracy if we had used only the samples for which the participants felt and reported the same emotion as the intended one. Also, the accuracy may be affected if the samples used are the ones for which the participants reported the emotions with high intensities.
Table 2 shows a comparison of the average detection accuracy for the four emotions. For each emotion, we report the results of the linear SVM kernel on two feature sets: using the alpha band only, and using the alpha band along with scalp asymmetries. For each feature set, the percentage of presence of joy, for instance, is computed as
(Σ_{i=1}^{N} F(i)) × 100 / N
where F(i) is 1 if joy sample number i was correctly classified and 0 otherwise, and N is the number of all the joy samples over the 20 different runs. The overall accuracy is the number of samples, whether joy or not joy, that are correctly classified, divided by the total number of samples in the 20 different runs.
Table 1: Self Report Rates by Emotion. The rate column reflects the percentage of samples for which self reports were the same as the target emotion.

  Emotion           Rate
  Anger             65.7%
  Fear              61.8%
  Joy               50.0%
  Sadness           30.6%
  Overall Average   52.0%

Table 2: Results of emotion classification using linear SVM kernels on two different feature sets: the alpha band only, and the alpha band plus scalp asymmetries.

  Emotion    Alpha: presence / overall    Alpha + asymmetry: presence / overall
  Anger      38% / 73%                    53% / 74%
  Fear       58% / 79%                    38% / 77%
  Joy        38% / 73%                    51.2% / 74%
  Sadness    48% / 77%                    61% / 79%
7. CONCLUSION
The goal of this research is to study the possibility of classifying four different emotions from brain signals elicited by voluntary facial expressions. We proposed an approach that is applied to a noisy database of brain signals. Testing on a large corpus of 36 subjects, and using two different techniques for feature extraction that rely on domain knowledge, we reached an accuracy of 53%, 58%, 51% and 61% for the anger, fear, joy and sadness emotions respectively.
7.1 Future Directions
One of the areas where we can enhance this study is to reduce the number of features. This can be done by reducing the number of channels. We will work on studying the effect of reducing the number of channels on the classification accuracy. Reducing the number of channels will help us reduce the processing time and make the classification task more portable; hence, it can be used in real-time applications.
Another way to achieve better classification results is to improve our preprocessing stage. This can be done by using Independent Component Analysis (ICA). ICA is a computational method that can extract the different components of the signals; for instance, ICA can separate EEG and physiological noise from the recorded signals.
Finally, it will be interesting to compare the results achieved by our methodology with an emotion detection system that relies on facial expressions.
8. REFERENCES
[1] K. Ansari-Asl, G. Chanel, and T. Pun. A channel selection method for EEG classification in emotion assessment based on synchronization likelihood. In Eusipco 2007, 15th Eur. Signal Proc. Conf.
[2] G. Chanel, J. Kronegg, D. Grandjean, and T. Pun. Emotion assessment: Arousal evaluation using EEG's and peripheral physiological signals. Lecture Notes in Computer Science, 4105:530, 2006.
[3] J. Coan and J. Allen. Varieties of emotional experience during voluntary emotional facial expressions. Ann. NY Acad. Sci., 1000:375–379, 2003.
[4] J. Coan, J. Allen, and E. Harmon-Jones. Voluntary facial expression and hemispheric asymmetry over the frontal cortex. Psychophysiology, 38(06):912–925, 2002.
[5] P. Ekman, W. Friesen, and J. Hager. Facial Action Coding System. 1978.
[6] P. Ekman, R. Levenson, and W. Friesen. Autonomic nervous system activity distinguishes among emotions. Science, 221(4616):1208–1210, 1983.
[7] R. El Kaliouby and P. Robinson. Mind reading machines: automated inference of cognitive mental states from video. In 2004 IEEE International Conference on Systems, Man and Cybernetics, volume 1.
[8] S. Gunn. Support Vector Machines for Classification and Regression. ISIS Technical Report, 14, 1998.
[9] K. Kim, S. Bang, and S. Kim. Emotion recognition system using short-term monitoring of physiological signals. Medical and Biological Engineering and Computing, 42(3):419–427, 2004.
[10] S. Kim, P. Georgiou, S. Lee, and S. Narayanan. Real-time emotion detection system using speech: Multi-modal fusion of different timescale features. In IEEE 9th Workshop on Multimedia Signal Processing (MMSP 2007), pages 48–51, 2007.
[11] M. Kostyunina and M. Kulikov. Frequency characteristics of EEG spectra in the emotions. Neuroscience and Behavioral Physiology, 26(4):340–343, 1996.
[12] J. Lehtonen. EEG-based brain computer interfaces. Helsinki University of Technology, 2002.
[13] T. Musha, Y. Terasaki, H. Haque, and G. Ivamitsky. Feature extraction from EEGs associated with emotions. Artificial Life and Robotics, 1(1):15–19, 1997.
[14] R. Picard. Affective Computing. MIT Press, 1997.
[15] R. Plutchik. A general psychoevolutionary theory of emotion. Theories of Emotion, 1, 1980.
[16] S. Reid, L. Duke, and J. Allen. Resting frontal electroencephalographic asymmetry in depression: Inconsistencies suggest the need to identify mediating factors. Psychophysiology, 35(04):389–404, 1998.
[17] D. Sander, D. Grandjean, and K. Scherer. A systems approach to appraisal mechanisms in emotion. Neural Networks, 18(4):317–352, 2005.
[18] A. Savran, K. Ciftci, G. Chanel, J. Mota, L. Viet, B. Sankur, L. Akarun, A. Caplier, and M. Rombaut. Emotion detection in the loop from brain signals and facial images. In Proc. of eNTERFACE 2006, 2006.
[19] K. Schaaff and T. Schultz. Towards an EEG-based emotion recognizer for humanoid robots. In The 18th IEEE International Symposium on Robot and Human Interactive Communication, pages 792–796, 2009.
[20] F. Strack, L. Martin, and S. Stepper. Inhibiting and facilitating conditions of the human smile: A nonobtrusive test of the facial feedback hypothesis. Journal of Personality and Social Psychology, 54(5):768–777, 1988.
[21] F. Strack, N. Schwarz, B. Chassein, D. Kern, and D. Wagner. The salience of comparison standards and the activation of social norms: consequences for judgments of happiness and their communication. 1989.
World’s First Wearable Humanoid Robot that
Augments Our Emotions
Dzmitry Tsetserukou Alena Neviarouskaya
Toyohashi University of Technology University of Tokyo
1-1 Hibarigaoka, Tempaku-cho, Toyohashi, Aichi, 7-3-1 Hongo, Bunkyo-ku, Tokyo,
441-8580 Japan 113-8656 Japan
dzmitry.tsetserukou@erc.tut.ac.jp lena@mi.ci.i.u-tokyo.ac.jp
ABSTRACT
In this paper we propose a conceptually novel approach to reinforcing (intensifying) one's own feelings and to reproducing (simulating) the emotions felt by the partner during online communication, through a wearable humanoid robot. The core component, the Affect Analysis Model, automatically recognizes nine emotions from text. The detected emotion is stimulated by innovative haptic devices integrated into the robot. The implemented system can considerably enhance the emotionally immersive experience of real-time messaging. Users can not only exchange messages but also emotionally and physically feel the presence of the communication partner (e.g., a family member, friend, or beloved person).
3. ARCHITECTURE OF WEARABLE HUMANOID ROBOT
A humanoid robot is an electro-mechanical machine with an overall appearance based on that of the human body, and artificial intelligence allowing complex interaction with tools and the environment. The field of humanoid robotics is advancing rapidly (e.g., ASIMO, HRP-4C); however, such robots have not yet found practical application in our homes (high price, large size, safety problems, etc.). Recent science fiction movies explore a future vision of co-existence of human beings and robots; in the movie "Surrogates", humans withdrew from everyday life almost …
The structure of the wearable humanoid robot iFeel_IM! is shown in Figure 2. As can be seen, the structure is based on that of the human body and includes such parts as head, brain, heart, hands, chest, back, abdomen, and sides.
In iFeel_IM!, great importance is placed on the automatic sensing of emotions conveyed through textual messages in the 3D virtual world Second Life (artificial intelligence), the visualization of the detected emotions by avatars in the virtual environment, the enhancement of the user's affective state, and the reproduction of the feeling of social touch (e.g., a hug) by means of haptic stimulation in the real world. The architecture of iFeel_IM! is presented in Figure 3.
Figure 3. Architecture of the iFeel_IM!. In order to communicate through iFeel_IM!, users have to wear innovative affective
haptic devices (HaptiHeart, HaptiHug, HaptiButterfly, HaptiTickler, HaptiTemper, and HaptiShiver) developed by us.
As the medium for communication we employ Second Life, which allows users to flexibly create their online identities (avatars) and to play various animations of the avatars (e.g., facial expressions and gestures) by typing special abbreviations in a chat window.
The control of the conversation is implemented through a Second Life object called EmoHeart (invisible in the case of the 'neutral' state) attached to the avatar's chest. In addition to communicating with the system for textual affect sensing (the Affect Analysis Model), EmoHeart is responsible for sensing symbolic cues or keywords of the 'hug' communicative function conveyed by text, and for visualization (triggering the related animation) of 'hugging' in Second Life. The results from the Affect Analysis Model (dominant emotion and intensity) and from EmoHeart ('hug' communicative function) are stored along with the chat messages in a file on the local computer of each user.
The Haptic Devices Controller analyses these data in real time and generates control signals for the Digital/Analog converter (D/A), which then feeds the Driver Box for the haptic devices with control cues. Based on the transmitted signal, the corresponding haptic device (HaptiHeart, HaptiHug, HaptiButterfly, HaptiTickler, HaptiTemper, or HaptiShiver) worn by the user is activated.
4. AFFECT RECOGNITION FROM TEXT
The Affect Analysis Model [20] senses nine emotions conveyed through text ('anger', 'disgust', 'fear', 'guilt', 'interest', 'joy', 'sadness', 'shame', and 'surprise'), as well as a 'neutral' state. The affect recognition algorithm, which takes into account the specific style and evolving language of online conversation, consists of five main stages: (1) symbolic cue analysis; (2) syntactical structure analysis; (3) word-level analysis; (4) phrase-level analysis; and (5) sentence-level analysis. Our Affect Analysis Model was designed based on the compositionality principle, according to which we determine the emotional meaning of a sentence by composing the pieces that correspond to lexical units or other linguistic constituent types, governed by the rules of aggregation, propagation, domination, neutralization, and intensification at various grammatical levels. Analyzing each sentence in sequential stages, this method is capable of processing sentences of different complexity, including simple, compound, complex (with complement and relative clauses), and complex-compound sentences.
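As a toy illustration of the compositional idea only (the actual AAM uses rich lexicons, a syntactic parser and five full analysis stages), a sentence-level estimate can be built from word-level annotations, with a phrase-level rule such as negation modifying them first. Every lexicon entry and rule below is invented for illustration.

# Toy word-level lexicon: word -> (emotion, intensity in [0, 1]).
LEXICON = {"happy": ("joy", 0.8), "afraid": ("fear", 0.7),
           "angry": ("anger", 0.9)}
NEGATIONS = {"not", "never"}

def analyze(sentence):
    """Very small stand-in for the multi-stage AAM pipeline."""
    words = sentence.lower().strip(".!?").split()
    estimates = []
    for pos, word in enumerate(words):
        if word in LEXICON:
            emotion, strength = LEXICON[word]
            # Phrase-level rule: a preceding negation neutralizes.
            if pos > 0 and words[pos - 1] in NEGATIONS:
                continue
            estimates.append((emotion, strength))
    if not estimates:
        return ("neutral", 0.0)
    # Sentence level: the dominant emotion is the strongest estimate.
    return max(estimates, key=lambda e: e[1])

print(analyze("I am not happy but I am angry!"))  # ('anger', 0.9)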
To measure the accuracy of the proposed emotion recognition algorithm, we extracted 700 sentences from a collection of diary-like blog posts provided by BuzzMetrics (http://www.nielsenbuzzmetrics.com). We focused on online diary and personal blog entries, which are typically written in a free style and are rich in emotional colouration. Three independent annotators labelled the sentences with one of the nine emotions (or neutral) and a corresponding intensity value.
We developed two versions of the Affect Analysis Model (AAM), differing in the syntactic parser employed during the second stage of the affect recognition algorithm: (1) AAM with the commercial parser Connexor Machinese Syntax (http://www.connexor.eu) (AAM-CMS); (2) AAM with the GNU GPL licensed Stanford Parser (http://nlp.stanford.edu/software/lex-parser.shtml) (AAM-SP). The performance of AAM-CMS and AAM-SP was evaluated against two 'gold standard' sets of sentences: (1) 656 sentences on which two or three human raters completely agreed; (2) 249 sentences on which all three human raters completely agreed. An empirical evaluation of the AAM algorithm showed promising results regarding its capability to accurately classify affective information in text from an existing corpus of informal online communication (AAM-CMS achieves an accuracy of 81.5%).
5. EmoHeart
… (indicating the strength of the emotion, namely 'low', 'middle', or 'high'). If no emotion is detected in the text, the EmoHeart remains invisible and the avatar's facial expression remains neutral. Examples of avatar facial expressions and EmoHeart textures are shown in Figure 4.
During a two-month period (December 2008 - January 2009), 89 Second Life users became owners of EmoHeart, and 74 of them actually communicated using it. Text messages along with the results from the AAM were stored in an EmoHeart log database. Of all sentences, 20% were categorized as emotional by the AAM and 80% as neutral (Figure 5). We observed that the percentage of sentences annotated with positive emotions ('joy', 'interest', 'surprise') essentially prevailed (84.6%) over sentences annotated with negative emotions ('anger', 'disgust', 'fear', 'guilt', 'sadness', 'shame'). We believe that this dominance of positivity expressed through text is due to the nature and purpose of online communication media.
Figure 5: Distribution of emotion categories among sentences in the EmoHeart log (joy dominant at 68.8%).
The developed HaptiHug is capable of generating strong pressure while being lightweight and compact. Such features of the haptic hug display as visual representation of the partner, social pseudo-haptic touch, and pressure patterns similar to those of human-human interaction greatly increase the immersion into the physical contact of partners while hugging.
[Table: comparison of the haptic displays in terms of weight (kg), overall size (m), wearable design, generated pressure (kPa), actuators (DC motors, vibro motors, air pumps), visual representation of the partner, social pseudo-touch, and basis in a human-human hug; the original column layout could not be recovered.]
The heart sounds are generated by the beating heart and the flow of blood through it. There are two major sounds heard in the normal heart, often described as a 'lub' and a 'dub' (the 'lub-dub' sound occurs in sequence with each heartbeat). The first heart sound (lub), commonly termed S1, is caused by the sudden block of reverse blood flow due to the closure of the mitral and tricuspid valves at the beginning of ventricular contraction. The second heart tone (dub), or S2, results from the sudden block of reversing blood flow at the end of ventricular contraction [3].
We developed the heart imitator HaptiHeart to produce special heartbeat patterns according to the emotion to be conveyed or elicited (sadness is associated with a slightly intense heartbeat, anger with a quick and violent heartbeat, fear with an intense heart rate). We take advantage of the fact that our heart naturally synchronizes with the heart of a person we hold or hug; thus, the heart rate of a user is influenced by haptic perception of the beat rate of the HaptiHeart. Furthermore, false heart beat feedback can be directly interpreted as a real heart beat, so it can change the emotional perception.
The HaptiHeart consists of two modules: a flat speaker FPS 0304 and a speaker holder. The flat speaker's size (66.5 x 107 x 8 mm) and rated input power of 10 W allowed us to design a powerful and relatively compact HaptiHeart device. It is able to produce a realistic heartbeat sensation with high fidelity. The 3D model of the HaptiHeart is presented in Figure 13.
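Since HaptiHeart drives a flat speaker, an emotion-specific heartbeat can be synthesized as an audio-rate waveform containing an S1 and an S2 pulse per period. The sketch below is only an illustration of that idea: the rates, amplitudes and pulse shapes are our assumptions, not the patterns actually used in iFeel_IM!.

import numpy as np

FS = 8000  # audio sampling rate (Hz)

# Illustrative mappings: sadness = slow/soft, fear = fast, anger = strong.
PATTERNS = {"sadness": (55, 0.4), "fear": (110, 0.7), "anger": (95, 1.0)}

def thump(amp, dur=0.06):
    """One damped low-frequency pulse (a crude 'lub' or 'dub')."""
    t = np.arange(int(dur * FS)) / FS
    return amp * np.exp(-t / 0.02) * np.sin(2 * np.pi * 45 * t)

def heartbeat(emotion, seconds=5):
    """Synthesize `seconds` of S1/S2 heartbeat for the given emotion."""
    bpm, amp = PATTERNS[emotion]
    signal = np.zeros(seconds * FS)
    period = 60.0 / bpm
    t = 0.0
    while t + period < seconds:
        for offset, gain in ((0.0, 1.0), (0.3 * period, 0.6)):  # S1, S2
            pulse = thump(amp * gain)
            i = int((t + offset) * FS)
            signal[i:i + pulse.size] += pulse
        t += period
    return signal  # feed to the D/A converter driving the flat speaker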
Figure 4. Overview of our system's software components and their communication (gaze detection, Bluetooth sender, location and orientation detection, visibility detection, text-to-speech engine).
4. USER INTERACTION
After the calibration process, the presented equipment is ready for outdoor usage. As previously mentioned, our goal is the gaze-sensitive exploration of an urban environment, providing the user with a 'sixth sense' for georeferenced information.
When to trigger which suitable action in an eye-gaze-based system is a commonly investigated and discussed issue, known as the 'Midas Touch' problem. A good solution must not render void the intuitive interaction approach of such an attentive interface by increasing the user's cognitive load or disturbing her gaze pattern. At the same time, the unintended invocation of an action must be avoided.
The task of object selection on a computer screen investigated by Jacob [9] might seem related to our scenario of mobile urban exploration, where we want to select real-world objects to learn more about annotated POIs. Jacob suggests either using a keyboard to explicitly execute the selection of a viewed item via a key press or, preferably, applying a dwell time to detect a focused gaze and fire the action thereafter. In Jacob's experiment, users were provided with visual feedback about the current selection and were therefore able to easily correct errors.
Due to our mobile scenario, we want to keep the involved equipment as lightweight as possible, sparing an additional keyboard or screen. Therefore, we rely on an explicit eye-based action to trigger a query for the currently viewed object. As though the user would memorize the desired object, closing her eyes for two seconds triggers the selection. In technical terms, the spatial query is executed for the last known global gaze direction if the user's tracked eye could not be detected during the last two seconds. An invocation of the query engine is marked in the log file with a special status flag.
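The two-second trigger can be expressed as a small watchdog over the stream of tracking results. In the sketch below the timing comes from the text; the class, the callback signature and the status flag name are illustrative assumptions.

import time

DWELL = 2.0  # seconds without a tracked eye that trigger the query

class BlinkTrigger:
    """Fires the spatial query when the tracked eye has been lost
    (interpreted as deliberately closed) for DWELL seconds."""

    def __init__(self, query_engine):
        self.query_engine = query_engine
        self.last_seen = time.monotonic()
        self.last_gaze = None     # last known global gaze direction
        self.fired = False

    def update(self, eye_detected, gaze_vector=None):
        now = time.monotonic()
        if eye_detected:
            self.last_seen, self.last_gaze = now, gaze_vector
            self.fired = False
        elif (not self.fired and self.last_gaze is not None
              and now - self.last_seen >= DWELL):
            # Query for the last known gaze; mark the log entry.
            self.query_engine(self.last_gaze, status_flag="BLINK_SELECT")
            self.fired = True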
The mobile application may invoke a remote visibility detection service via a 3G network. This service takes the user's current view into account: by passing a location and an orientation (in our case the global eye-gaze vector) to this HTTP service, a list of the POIs currently visible in this direction is returned. The engine makes use of a 2.5D block model, i.e. each building in the model is represented by a two-dimensional footprint polygon which is extruded by a height value. Based on this model, POIs with a clear line-of-sight to the user, as well as POIs located inside visible buildings, can be determined.
The names of the POIs returned by the visibility detection service are then extracted and fed into the text-to-speech engine for voice output. If a new query is triggered during the output, the text-to-speech engine is interrupted and restarted with the new results. The auditory output is possible either via the mobile's built-in loudspeakers or via attached earphones.
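Such a query amounts to a single HTTP request parameterized by position and bearing. A sketch with a hypothetical endpoint and response format follows; the paper specifies neither, so the URL, parameter names and JSON payload are assumptions.

import json
import urllib.parse
import urllib.request

# Hypothetical endpoint of the remote visibility detection service.
SERVICE_URL = "http://example.org/visibility"

def visible_pois(lat, lon, bearing_deg):
    """Return the POIs visible from (lat, lon) along the gaze bearing."""
    params = urllib.parse.urlencode(
        {"lat": lat, "lon": lon, "bearing": bearing_deg})
    with urllib.request.urlopen(f"{SERVICE_URL}?{params}", timeout=5) as resp:
        return json.load(resp)   # assumed: a JSON list of POI names

for name in visible_pois(45.856, 6.617, 135.0):
    print(name)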
5. TOUR ANALYSIS
During the usage of our KIBITZER, all sensor values are continuously recorded to a log file. These datasets, annotated with corresponding time stamps, enable a complete reconstruction of the user's tour for later analysis.
To efficiently visualize a log file's content, we implemented a converter tool that generates a KML file from a passed log file. KML is an XML-based format for geographic annotations and visualizations with support for animations.
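A log-to-KML converter of this kind reduces to templating placemarks from the logged samples. The sketch below assumes a simple log format with time, latitude and longitude fields; the real log holds more sensor values (orientation, gaze, status flags).

KML_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
{placemarks}
</Document></kml>"""

def log_to_kml(samples):
    """samples: iterable of dicts with 'time', 'lat', 'lon' keys
    (assumed log format). Returns a KML string of timestamped points."""
    placemarks = "\n".join(
        "<Placemark><TimeStamp><when>{time}</when></TimeStamp>"
        "<Point><coordinates>{lon},{lat},0</coordinates></Point>"
        "</Placemark>".format(**s)
        for s in samples)
    return KML_TEMPLATE.format(placemarks=placemarks)

with open("tour.kml", "w") as f:
    f.write(log_to_kml([{"time": "2010-04-02T10:00:00Z",
                         "lat": 45.856, "lon": 6.617}]))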
6. CONCLUSIONS AND OUTLOOK
In this paper, we introduced KIBITZER, a wearable gaze-sensitive
visualizations with the support of animations. The resulting tour
system for the exploration of urban surroundings, and presented
video can be played using Google Earth [8] and shows the user’s
related work in the field of eye-based applications. Wearing our
orientation and gaze from an exocentric (‘third person’)
proposed headpiece, the user’s eye-gaze is analyzed to implicitly
perspective (Figure 5). The displayed human model is orientated
scan her visible surroundings for georeferenced digital
according to the captured compass values; its gaze ray is
information. Offering speech-auditory feedback via loudspeakers
corrected by the calculated gaze deviations. The invocation of the
or earphones, the user is unobtrusively informed about POIs in
visibility detection service, i.e. the gaze-based selection of an
their current gaze direction. Additionally, we offer tools to
object, is marked by a different-colored gaze ray.
reconstruct a user’s recorded tour visualizing her eye-gaze. These
animations are not only useful for accuracy tests during
development but rather aim at a later automated tour analysis, e.g.
to identify areas of interest.
Experiences from first functional tests and reconstructed tour
videos showed that the proposed system’s overall accuracy is
sufficient for determining POIs in the user’s gaze. However, in
some trials the built-in compass was heavily influenced by
magnetic fields resulting in wrong POI selections. This problem
could be solved by complementing the system with a more robust
external compass.
During these tests we observed some minor limitations of the
chosen vision-based gaze tracking approach and the blinking
interaction. In rare cases, unfavorable reflections caused by direct
sunlight prevented a correct detection of the user’s pupil and
therefore, interfered the gaze tracking. Obviously, at night the
usage of such a vision-based system is not feasible without any
artificial light source.
Figure 5. Screenshot of a KML animation Our proposed research prototype is a first step towards the
reconstructed from the logged tour data. exploitation of a user’s eye-gaze in mobile urban exploration
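The converter tool described above reduces to a few lines. The sketch below assumes a hypothetical CSV log layout (timestamp, latitude, longitude, heading) and emits only a bare gaze track rather than the full animated KML with the human model that the real tool produces.

# Sketch of a log-to-KML converter (log record layout is an assumption).
import csv

KML_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document><name>Reconstructed tour</name>
    <Placemark><name>Gaze track</name>
      <LineString><coordinates>
{coords}
      </coordinates></LineString>
    </Placemark>
  </Document>
</kml>"""

def log_to_kml(log_path, kml_path):
    with open(log_path) as f:
        rows = list(csv.reader(f))
    # KML expects "lon,lat,alt" triples, one per vertex of the track.
    coords = "\n".join("%s,%s,0" % (lon, lat) for _, lat, lon, _ in rows)
    with open(kml_path, "w") as f:
        f.write(KML_TEMPLATE.format(coords=coords))

# e.g. log_to_kml("tour.log", "tour.kml"), then open the result in Google Earth.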
6. CONCLUSIONS AND OUTLOOK
In this paper, we introduced KIBITZER, a wearable gaze-sensitive system for the exploration of urban surroundings, and presented related work in the field of eye-based applications. Wearing our proposed headpiece, the user's eye-gaze is analyzed to implicitly scan her visible surroundings for georeferenced digital information. Offering speech-auditory feedback via loudspeakers or earphones, the user is unobtrusively informed about POIs in her current gaze direction. Additionally, we offer tools to reconstruct a user's recorded tour, visualizing her eye-gaze. These animations are not only useful for accuracy tests during development but rather aim at a later automated tour analysis, e.g. to identify areas of interest.

Experiences from first functional tests and reconstructed tour videos showed that the proposed system's overall accuracy is sufficient for determining POIs in the user's gaze. However, in some trials the built-in compass was heavily influenced by magnetic fields, resulting in wrong POI selections. This problem could be solved by complementing the system with a more robust external compass.

During these tests we observed some minor limitations of the chosen vision-based gaze tracking approach and the blinking interaction. In rare cases, unfavorable reflections caused by direct sunlight prevented a correct detection of the user's pupil and therefore interfered with the gaze tracking. Obviously, at night the usage of such a vision-based system is not feasible without an artificial light source.

Our proposed research prototype is a first step towards the exploitation of a user's eye-gaze in mobile urban exploration scenarios and is therefore deliberately designed for experimentation. The current system, built from off-the-shelf hardware components, provides a complete framework to study possible gaze-based interaction techniques. With the future arrival of smart glasses or even intelligent contact lenses, the required equipment is expected to become more comfortable to wear, if not almost unnoticeable.

Applying the presented system, we will evaluate the usability and effectiveness of eye-gaze-based mobile urban exploration in upcoming user tests. We will set a special focus on the acceptance of the currently implemented 'blinking' action and on the investigation of alternative interaction techniques, respectively. Inspired by 'mouse-over' events known from Web sites, such as switching an image when moving the mouse cursor over a sensitive area, implicit gaze feedback is conceivable. When a user glances at an object, she might be notified about the availability of annotated digital information by a beep or tactile feedback. The combination of our gaze-based system with a brain-computer interface to estimate a gaze's intention and thus trigger an according action is another promising direction for future research.
7. ACKNOWLEDGMENTS
This work has been carried out within the projects WikiVienna and U0, which are financed in parts by Vienna's WWTF funding program, by the Austrian Government and by the City of Vienna within the competence center program COMET.

8. REFERENCES
[1] Barakonyi, I., Prendinger, H., Schmalstieg, D., and Ishizuka, M. 2007. Cascading Hand and Eye Movement for Augmented Reality Videoconferencing. In Proc. of 3D User Interfaces, 71-78.
[2] Bolt, R.A. 1982. Eyes at the Interface. In Proc. of Human Factors in Computer Systems Conference, 360-362.
[3] Bulling, A., Ward, J.A., Gellersen, H., and Tröster, G. 2009. Eye Movement Analysis for Activity Recognition. In Proc. of the 11th International Conference on Ubiquitous Computing, 41-50.
[4] Duchowski, A.T. 2002. A breadth-first survey of eye tracking applications. In Behavior Research Methods, Instruments, & Computers (BRMIC), 34(4), 455-470.
[5] Feiner, S., MacIntyre, B., Höllerer, T., and Webster, A. 1997. A Touring Machine: Prototyping 3D Mobile Augmented Reality Systems for Exploring the Urban Environment. In Personal and Ubiquitous Computing, Vol. 1, No. 4, 208-217.
[6] Fitts, P.M., Jones, R.E., and Milton, J.L. 1950. Eye movements of aircraft pilots during instrument-landing approaches. In Aeronautical Engineering Review 9(2), 24-29.
[7] Fröhlich, P., Simon, R., and Baillie, L. 2009. Mobile Spatial Interaction. Personal and Ubiquitous Computing, Vol. 13, No. 4, 251-253.
[8] Google Earth. http://earth.google.com. Accessed January 7, 2010.
[9] Jacob, R.J.K. 1990. What you look at is what you get: eye movement-based interaction techniques. In Proc. of the SIGCHI Conference on Human Factors in Computing Systems, 11-18.
[10] Kooper, R., and MacIntyre, B. 2003. Browsing the Real-World Wide Web: Maintaining Awareness of Virtual Information in an AR Information Space. In International Journal of Human-Computer Interaction, Vol. 16, No. 3, 425-446.
[11] Morimoto, C.H., and Mimica, M.R.M. 2005. Eye gaze tracking techniques for interactive applications. In Computer Vision and Image Understanding, Vol. 98, No. 1, 4-24.
[12] Park, H.M., Lee, S.H., and Choi, J.S. 2008. Wearable Augmented Reality System using Gaze Interaction. In Proc. of the 7th IEEE/ACM International Symposium on Mixed and Augmented Reality, 175-176.
[13] Reitmayr, G., and Schmalstieg, D. 2004. Collaborative Augmented Reality for Outdoor Navigation and Information Browsing. In Proc. of Symposium on Location Based Services and TeleCartography, 31-41.
[14] Schmalstieg, D., and Wagner, D. 2007. The World as a User Interface: Augmented Reality for Ubiquitous Computing. In Proc. of Symposium on Location Based Services and TeleCartography, 369-391.
[15] Simon, R. 2006. The Creative Histories Mobile Explorer - Implementing a 3D Multimedia Tourist Guide for Mass-Market Mobile Phones. In Proc. of EVA.
[16] Simon, R., and Fröhlich, P. 2007. A Mobile Application Framework for the Geo-spatial Web. In Proc. of the 16th International World Wide Web Conference, 381-390.
[17] SMI iView X™ HED. http://www.smivision.com/en/eye-gaze-tracking-systems/products/iview-x-hed.html. Accessed January 7, 2010.
[18] Vertegaal, R. 2002. Designing Attentive Interfaces. In Proc. of the 2002 Symposium on Eye Tracking Research & Applications, 23-30.
[19] Ware, C., and Mikaelian, H.T. 1987. An evaluation of an eye tracker as a device for computer input. In Proc. of the ACM CHI + GI '87 Human Factors in Computing Systems Conference, 183-188.
[20] Wikitude. http://www.mobilizy.com/wikitude.php. Accessed January 7, 2010.
Airwriting Recognition using Wearable Motion Sensors
We also do not get the 3D trajectory easily. While it is theoretically possible to reconstruct the trajectory from a 6 DOF sensor like the one we use by applying a strapdown inertial navigation algorithm, it is practically a hard task because of sensor drift and noise. A standard strapdown algorithm integrates the angular rate once to obtain the attitude of the sensor, then the gravitational acceleration can be subtracted from the acceleration signals and finally double integration of the acceleration yields the position. This

[Figure: accelerometer signals ax, ay, az in mg over time in s.]

1 www.analog.com
2 http://www.wortschatz.uni-leipzig.de/html/wliste/html
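A naive version of this strapdown computation can be sketched in a few lines. The integration scheme below (simple Euler steps, no drift correction or re-orthonormalization, assumed gravity constant and sample layout) is illustrative only and will drift exactly as the text warns.

import numpy as np

G = np.array([0.0, 0.0, 9.81])          # gravity along the world z-axis

def skew(w):
    # Cross-product (skew-symmetric) matrix of a 3-vector.
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def strapdown(acc, gyro, dt):
    # acc, gyro: (N, 3) arrays of accelerometer (m/s^2, gravity included)
    # and angular-rate (rad/s) samples in the sensor frame.
    R = np.eye(3)                       # sensor-to-world attitude
    v = np.zeros(3)                     # velocity in the world frame
    p = np.zeros(3)                     # position in the world frame
    path = []
    for a, w in zip(acc, gyro):
        R = R @ (np.eye(3) + skew(w) * dt)   # integrate angular rate once
        a_lin = R @ a - G                    # subtract gravitational accel.
        v = v + a_lin * dt                   # first integration of accel.
        p = p + v * dt                       # second integration: position
        path.append(p.copy())
    return np.array(path)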
[Figure 3: System Overview. Pipeline: Data Acquisition → Sensor Values → Preprocessing → Normalized Features → Decoding (HMM + Language Model) → Recognized Character or Word. Motion data is gathered by the sensors on the glove. The raw data is sent to a computer and preprocessed, resulting in a feature vector. The feature vectors are classified using an HMM decoder in combination with a language model. The recognized character or word is the output of the system.]
[Figure 4: Context model for the word "TO" (T → repos → O). It consists of the independent models for the graphemes "T" and "O" with 7 states and the 2-state model for the repositioning in between.]
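The concatenation scheme of Figure 4 can be made concrete with a small sketch. Only the state counts (7 per grapheme, 2 for repositioning) come from the caption; the left-to-right transition structure, the 0.6 self-transition probability and the matrix-based model format are illustrative assumptions.

import numpy as np

def left_to_right(n_states, p_stay=0.6):
    # Linear left-to-right transition matrix; the last state stays
    # absorbing until the block is linked into a larger chain.
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i] = p_stay
        A[i, i + 1] = 1.0 - p_stay
    A[-1, -1] = 1.0
    return A

def word_model(word, grapheme_states=7, repos_states=2, p_stay=0.6):
    # Concatenate per-grapheme HMMs with repositioning models in between,
    # mirroring the context model for "TO" (T -> repos -> O).
    blocks, labels = [], []
    for i, ch in enumerate(word):
        if i > 0:
            blocks.append(left_to_right(repos_states, p_stay))
            labels += ["repos"] * repos_states
        blocks.append(left_to_right(grapheme_states, p_stay))
        labels += [ch] * grapheme_states
    n = len(labels)
    A = np.zeros((n, n))
    off = 0
    for b in blocks:
        k = b.shape[0]
        A[off:off + k, off:off + k] = b
        if off + k < n:                    # hand over to the next block
            A[off + k - 1, off + k - 1] = p_stay
            A[off + k - 1, off + k] = 1.0 - p_stay
        off += k
    return A, labels

A, states = word_model("TO")
print(len(states))   # 16 states: 7 for "T", 2 for "repos", 7 for "O"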
Writer     Writer-dependent    Writer-independent      Writer-independent
                               (right-handers only)    (all writers)
F (RH)     91.3                77.2                    77.1
G (RH)     94.9                80.6                    78.8
H (RH)     91.8                72.9                    73.8
I (RH)     96.4                86.0                    87.8
J (RH)     93.9                84.3                    84.6
Average    94.8                81.9                    78.2

Table 2: Results of the character recognition experiments (recognition rates in %), writer-dependent and writer-independent. The second column of the table shows the writer-dependent results. The third column shows the writer-independent results when leaving out the left-hander; the fourth column shows the writer-independent results for all writers.

[Figure 6: Results of the writer-independent character recognition on right-handers dependent on the total amount of Gaussians per HMM (recognition rate in % over the total number of GMMs, 10-100). If different parameter combinations had the same total amount of Gaussians, the performance range is shown as a vertical bar. A polynomial fitted on the data illustrates the tendency.]

case on the right-hander data.

It is not surprising that the recognition performance drops when testing on the left-handed person, since the writing style of this person differs in a fundamental way from the writing of the right-handed test persons. All horizontal strokes are written in the opposite direction and all circular letters are also written in the opposite direction.
The main problems of the right-hander-only systems are ambiguities in characters and writing variants of different writers. Figure 7 shows the confusion matrix for the cross validation on the right-handed writers. First of all, there are problems with similar graphemes like the pairs (P, D) and (X, Y). The similarities are obvious. In the case of P and D, the only difference is the length of the second stroke (the arc). In the case of X and Y, depending on how people write it, the only difference is the length of the first stroke (from upper left to lower right). One should notice that the writers did not get any kind of visual feedback on their writing. This probably leads to even more ambiguous characters than when writing with visual feedback. The pair (N, W) is also subject to frequent misclassification. The reason becomes obvious when considering the way the character N was written by some test persons. Four of the nine right-handers started the N in the upper left corner. They moved down to the lower left corner and up to the upper left corner again before writing the diagonal stroke. An N written this way has the same stroke sequence as a W. Figure 8 illustrates this ambiguity.

[Figure 7: Accumulated confusion matrix (reference vs. hypothesis, letters A-Z) for the cross validation of the right-handed writers. The confusion matrices of the tests on each writer were summed together.]

We see that most classification errors arise from the differences in writing style between the individual writers. The test persons do write characters in different ways even under the constraint of block letters. Some of the variants have a very similar motion sequence to variants of different characters observed by other writers. This leads to more ambiguities than in the writer-dependent case. On the single character level, it is hard to solve this problem. But when switching to recognition of whole words, the context information should help dealing with these ambiguities.
[Figure 8: Writing variants of N (a), (b) compared to the stroke sequence of W (c). The allographs (b) and (c) have the same basic stroke sequence.]

We also investigated the effect of using only accelerometer data as features. We would be able to keep the sensor flat and cheaper if we do not use gyroscopes. We compared the results of using accelerometers (axyz) and gyroscopes (gxyz) to the results of using only accelerometers. Table 3 shows the results of this comparison. We see that accelerometer-only performance is worse than with the full sensor setup, but depending on the application, this might still be acceptable.

Data    Features      States    GMM    Rec. Rate
RH      axyz          15        6      76.8
RH      axyz, gxyz    15        6      81.9

Table 3: Comparison of sensor configurations on writer-independent character recognition. Accelerometer and gyroscope features are compared to accelerometer-only features.

Train     Test      Word Training Iterations    Recognition Rate
Set A     Set B     0                           74.2
Set B     Set A     0                           71.8
Average             0                           73.0
Set A     Set B     10                          97.5
Set B     Set A     10                          97.5
Average             10                          97.5

Table 4: Results of the writer-dependent word recognition. The character models were already trained on character data. The number of training iterations in the table corresponds only to further training on words.

is inaccurate and thus character models can also profit from word training. We can see that word recognition performance is in the same range as character recognition performance for this writer. Typical misclassifications occur by confusing words that barely differ, like "as" and "was" or "job" and "jobs".
8. DEMONSTRATOR
Finally, we built an online demo system to showcase the applicability of the recognizer. The trained models from the word recognition experiments were taken and a small vocabulary was used. The system can recognize the words necessary to form the sentences "Welcome to the CSL" and "This is our Airwriting System". The demonstration system uses a laptop with a 2.4 GHz Intel Core 2 Duo processor and 2 GB RAM for the data recording and the recognition. The system runs stably and few recognition errors occur. A demonstration video of the system can be seen on our website 3 .
ABSTRACT
In the last couple of years, in-vehicle information systems have advanced in terms of design and technical sophistication. This trend manifests itself in the current evolution of navigation devices towards advanced 3D visualizations as well as real-time telematics services. We present important constituents for the design space of realistic visualizations in the car and introduce realization potentials in advanced vehicle-to-infrastructure application scenarios. To evaluate this design space, we conducted a driving simulator study, in which the in-car HMI was systematically manipulated with regard to its representation of the outside world. The results show that in the context of safety-related applications, realistic views provide higher perceived safety than traditional visualization styles, despite their higher visual complexity. We also found that the more complex the safety recommendation the HMI has to communicate, the more drivers perceive a realistic visualization as a valuable support. In a comparative inquiry after the experiment, we found that egocentric and bird's eye perspectives are preferred to top-down perspectives for safety-related in-car information systems.

Author Keywords
User studies, Telematics, Realistic Visualization

ACM Classification Keywords
H.5.1. Information Interfaces and Presentation: Multimedia Information Systems—Artificial, augmented, and virtual realities; H.5.2. Information Interfaces and Presentation: User Interfaces—GUI

INTRODUCTION
In-vehicle information systems, such as personal navigation devices, built-in driver assistance units and Smartphones, have become standard equipment in today's cars - and their capabilities are quickly evolving. The most obvious advances are related to the visual presentation at the in-vehicle human-machine interface (HMI). On the consumer mass market, we see a clear trend towards increasingly realistic representations of the driver's outside world, including textured 3D renderings of highway junctions, road details, mountains, and buildings [14]. Arrows and icons are exactly overlaid over the virtual representation of the driver's field of view to aid in navigation tasks. This development towards realistic visualization is further strengthened by the advent of augmented reality navigation systems on market-available handheld devices (e.g. [12]).

Up to now, such realistic visualizations are mostly applied to navigation. However, with emerging co-operative vehicle-to-infrastructure or vehicle-to-vehicle communications technology [4,16,20,18], they will also become relevant for delivering more advanced safety-related services. For example, drivers could be notified about sudden incidents and provided with recommendations on how to react accordingly. In this context, the major challenge is the fact that the driver actions required can be fairly unusual and unexpected, and thus might not be adequately understood or implemented. For example, drivers may be asked to stop before a tunnel on the emergency lane due to an accident ahead.

In this application context, realistic visualization could represent both merit and demerit: information attached to a quasi-realistic mapping of the outside reality might be recognized more quickly than with today's schematic visualizations, but on the other hand the wealth of details might as well hamper the identification of task-relevant information. It should be clear that the effects of realistic visualizations on usability and user experience must be fully understood before recommending their use in millions of cars. In order to achieve this goal, systematic and reflective user-oriented research is needed.
In this paper, we present an experimental study to evaluate the influence of realistic visualizations on perceived driving safety and satisfaction. We were interested in finding out whether realistic visualizations provide an added value in terms of safety and user experience, or whether they are just "eye-candy" that could even endanger the driver and other traffic participants. We first specify the basic elements and characteristics of realistic visualizations. Departing from this, we formulate a set of research questions and describe the method of an experimental study to address them. We finally provide a detailed description of the results and suggestions for further research.

In current systems, such additional virtual information typically relates to route indications, congestion information, as well as information on points-of-interest. Typical means of augmentation are color-coded lines or arrows, icons, text and numbers. Visualization approaches reach from colored overlays over the road to virtual "follow-me cars" implicitly indicating the speed and direction (compare [13,8]).

Constituents           Characteristics
Map representation     Schematic 2D | Untextured 3D | Textured 3D
Viewing perspective    Top-Down | Bird's Eye | Egocentric

Table 1: Typical application scenarios for safety-related traffic telematics services, their urgency and related opportunities for realistic visualization.
Experimental application scenarios
The test users were exposed to three safety-related application scenarios, as specified in Table 1: navigation with unexpected route change, lane utilization, and urgent incident warning. The dramaturgical design of these scenarios followed a three-phase structure: the initial phase, the critical moment, and the final phase.

In the initial phase, users were driving for about 1 km along the highway, following the routing instructions of their in-car information system. Then, when entering a predefined zone, a warning was presented to the user, consisting of a short audio signal, a text message and an icon (see Figure 2). The first line of the text message recommends an action to the driver, together with an indication of distance. The second line provides information on the cause for the given recommendation.

The critical phase was between the point of the warning reception and the point at which the action requested in the respective scenario (the respective turn, lane selection, or emergency stop) should have been performed at the latest.

The final phase mostly served as a way to let users naturally finish their driving task. For example, in the lane change scenarios, the driver passed the partly blocked road section and was then told about the scenario end.

Experimental visualization styles
Four visualization variants were specified: 'none' (as a control condition), 'conventional', 'realistic', and 'interleaved'. Each visualization style was then realized for the three application scenarios, resulting in 12 different combinations. Figure 3 illustrates the realization of the conventional and the realistic view for navigation, lane change, and urgent incident. In the 'none' variant, the map area was filled with grey color. In the interleaved variant, the conventional view was shown in the initial and the final phase, and the realistic view in the critical phase.

Procedure and measures
The overall duration of a test (from the participant's entering to leaving the test room) was approximately two hours. A test assistant was present to conduct the interview, to provide task instructions, and to note specific observations made during the experiment. Each individual test consisted of an introduction phase, in which the test persons were briefed about the goals and procedure of the test, and data on demographics and previous experiences was gathered. Then, the participants were enabled to familiarize themselves with the driving simulator and with the HMI. To minimize a potential habituation effect, it was assured that the users were informed about and had actively used each visualization and each application scenario. The subsequent phases of the study will now be described in detail.
Figure 3: HMI Screenshots from the IVIS simulation for different application scenarios
Experimental part
The two independent factors of the experimental part were visualization and safety scenario. Each participant was driving 12 conditions, the product of the 4 visualization variants and the 3 scenario types. (This way, participants encountered every possible combination of visualization and scenario types.) In order to avoid order effects, the sequence of conditions was varied systematically. At the start of each condition, the car was "parked" on the emergency lane of a highway. The participant was instructed to drive along the highway and to follow the instructions on the HMI as accurately as possible.

In the critical phase of each driving condition, the experimenter assessed task completion. Task completion was given if the subjects generally followed the system instructions (taking the right exit, selecting the right lane, and performing the emergency stop on the right lane). Furthermore, the test facilitator noted incidents that occurred during the driving situation.
Post-condition Questionnaire
To capture the immediate driving- and HMI-related impressions, the participants filled out a questionnaire after each of the 12 conditions. The first question aimed at understanding the general support perceived in the driving situation. The two subsequent questions were designed to understand the visualization's support for identifying the driving-task-relevant information (a potential problem area of detailed realistic visualizations) and its support for finding matches between the road situation and the HMI display (a potential advantage of realistic visualizations).

Final interview
The final interview aimed at gathering the participants' overall reflections on the driving situations experienced in the different conditions. The first two questions directly addressed the potential strengths by asking: "Did realistic visualizations support you in finding accordances between the road situation and the HMI display?", and the weaknesses: "Did realistic visualizations deter you from identifying the task-relevant details in the necessary time span?"

Due to the realistic nature of the test, the 12 visualization variants tested represent specific prototypical combinations of constituents. In order to also obtain a rough understanding of the impact of the constituents of realistic visualizations in isolation, a systematic comparison was performed, based on an illustrated questionnaire. Due to their importance, 'map representation' and 'viewing perspective' were selected as the constituents of interest in the interview. Regarding the 'viewing perspective', the users were shown three clusters of screenshots of 2D and 3D views in navigation, lane change and urgent incident warning scenarios, one cluster only including the top-down, the other only the bird's eye, and the third only the egocentric perspective. The participants were then asked to provide a ranking of the three different perspectives, with regard to their assumed support in the driving situations. The same principle was applied for 'map representation'.

RESULTS
In this section, the results from the experimental part, the post-experimental inquiry, and the comparative inquiry are described. The statistical analysis was based on the data from 28 participants. Mean differences were calculated with non-parametrical techniques for dependent samples (Friedman and Wilcoxon tests). In all figures, the error bars represent 95% confidence intervals. Throughout the measures used in the study we did not find age- or gender-specific differences.

Experimental part
Task completion
Our results are characterized by a very high task completion ratio across all test conditions: 99.4% of the navigation, lane utilization and urgent stop recommendations were generally followed. We found no significant differences between the different visualization styles.

Figure 4 presents an overview of participants' mean ratings of the four different visualization styles on three scales: perceived general support in the respective driving situation, support for identifying the relevant details, and matching with the outside real world. On all three scales, participants rated those visualizations without a real-world representation worse than all others. Participants consistently judged the realistic view as more supportive than the conventional view (all differences significant, p < .05). On none of the three scales could any difference be found between the realistic and the interleaved visualizations.

[Figure 4: Mean post-condition ratings on the visualization styles, with regard to perceived general support in the driving situation, the support for identifying relevant details and for matching the virtual representation with the real world.]
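The dependent-samples tests named above map directly onto standard SciPy calls. The sketch below runs them on fabricated rating arrays purely to illustrate the analysis pipeline; it does not use the study's data.

import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

# Fabricated placeholder ratings: 28 participants, four visualization
# styles, 20-point scale. NOT the study's data.
rng = np.random.default_rng(0)
none_, conv, real, inter = (rng.integers(1, 21, 28) for _ in range(4))

# Omnibus test across the four dependent samples.
chi2, p = friedmanchisquare(none_, conv, real, inter)
print("Friedman: chi2=%.2f, p=%.3f" % (chi2, p))

# Pairwise follow-up, e.g. realistic vs. conventional.
w, p_pair = wilcoxon(real, conv)
print("Wilcoxon realistic vs. conventional: W=%.1f, p=%.3f" % (w, p_pair))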
Figure 5 again shows the perceived general support in the driving situation, but here separated by the three safety scenarios. The ratings are mostly consistent throughout all safety scenarios. A notable exception was observed when looking at the difference between the conventional and the realistic view: the conventional view was rated significantly lower in the urgent incident and lane utilization scenarios, but not in the navigation scenario (p < .001, p < .004, p = .082). When directly comparing the rating values for the conventional visualizations between the different scenarios, the conventional visualization was rated better in the navigation than in the urgent incident scenario (p < .01). The mean ratings in the lane utilization scenario also tended to be lower, but the difference did not reach significance (p = .065).

[Figure 5: Mean post-condition ratings on the visualization styles, with regard to perceived general support in the driving situation, separated by the three safety scenarios.]

Observations
The main observations of incidents that had been noted during the test conditions were as follows:

No visualization: When being confronted with on-screen navigation instructions, drivers did not encounter notable problems. In the two other scenarios, subjects often appeared to be confused about how they should behave correctly. They were unsure about where exactly to change lanes or where to stop (but as indicated above, the vast majority stopped on the right lane). Several users also got noticeably excited after receiving a warning and very attentively looked onto the road situation, to look for the announced incident.

Conventional: During navigation, no notable problems were observed. However, in the other two scenarios many users were unsure about where to stop or which lane to take. This was obviously due to the rather schematic visualization on the 2D map.

Realistic: In the realistic view conditions, we noticed that users tried to follow the indicated arrow as closely as possible. In the urgent incident scenario, this attitude sometimes resulted in driving significantly slower to stop exactly at the indicated location. However, this behavior was mostly observed the first and second time a realistic view was used.

Interleaved: The switch from the conventional to the realistic view was noticed well by the drivers. In general, the observations made in the critical moment were similar to the ones made for the realistic visualization.

Participant impressions
The participants' comments provided after using the visualizations were as follows:

No visualization: The vast majority of users stated that without a real-world visualization it was difficult to follow the lane utilization and urgent incident recommendations on the HMI. Real-world visualizations were basically regarded as a standard feature for every form of navigation device.

A few participants stated that in principle it could suffice to provide safety warnings without a real-world representation, but that in this case a combination with audio output would be necessary. Furthermore, they wished the icon to be placed at a more prominent place on the screen (interestingly, many participants only took notice of the icon in the no-visualization condition).

Conventional: The majority of participants complained about the experienced difficulties in interpreting the overlaid lines and icons on the schematic 2D map, when following lane utilization and emergency stop recommendations. Furthermore, users of the latest navigation systems criticized the relatively low number of displayed details on the map and the lack of a car position item. What was often positively valued was the good foresight provided by the bird's eye perspective.

Realistic: Many participants stated that they felt safe when using the realistic visualization. A very often mentioned reason was that the "1:1" match with the outside world improved orientation. They would have liked to see even more spatially-referenced annotations, such as a blocking icon directly placed on the respective lane. The display of many details was not seen as distracting from the relevant information. The few critical remarks were related to less foresight, as compared to the conventional view.

Interleaved: Participants provided similar comments with regard to the interleaved as to the realistic view. The switch was not seen as an added value by the participants. Many stated that they would have preferred a continuously displayed realistic view.
Final Interview
The participants widely stated that realistic visualizations had enabled them to find a match between the HMI and the real road situation (mean rating of 16.11 on a 20-point scale, SD = 3.8). Similarly, many participants stated that realistic visualizations had not hindered them in finding the relevant details on the screen display (mean rating of 5.5 on a 20-point rating scale, SD = 4.4).

Comparative inquiry
Figure 6 shows the ranking results from the comparative inquiry on the perspectives top-down, bird's eye, and egocentric, with regard to their assumed support in the driving situations. Overall, the top-down perspective was rated significantly lower than the other perspectives (both p < .001). The ratings for the bird's eye and egocentric perspectives did not differ significantly from each other; however, the navigation scenario again differed from the other two scenarios: here the bird's eye view was preferred to the egocentric perspective (Z = -2.05, p < .05).

The comparative inquiry on the map representation revealed a strong preference for 3D over 2D (77.4% vs. 22.6%; Z = -3.49, p < .001). Again, the navigation scenario differed from lane utilization and urgent incident: here a difference between 3D and 2D could not be found.

CONCLUSIONS
In the following, the results are summarized with regard to the research questions:

Q1: Real-world visualization in general (baseline)
The results suggest that an HMI is perceived to support a driver better in following safety-related recommendations if it displays a real-world visualization, as compared to a pure textual and iconic message. A map appears to be regarded as a standard HMI feature, and it helps to better orientate oneself. The added value of such a real-world representation is consistently supported by user ratings and comments. On the other hand, our task completion results show that the pure display of text and an icon obviously suffices to correctly follow a recommendation, at least in low-complexity driving situations.

Q2: Realistic vs. conventional visualization
We found that realistic visualizations are perceived as an added value when presenting safety-related recommendations on the HMI, as compared to conventional visualizations. This is a result that was not easily predictable: in principle, the many 'irrelevant' details shown in realistic visualizations could as well have been assumed to be disturbing. We also found that realistic views do not decrease task completion, at least in simple scenarios.

Q5: Influence of safety scenarios
Throughout the study, we found that drivers felt even more supported by realistic visualizations when they had to follow urgent and non-standard instructions in the urgent incident and lane utilization scenarios. While drivers in principle followed the general instructions correctly, they often felt insecure when choosing the right lane or place to stop.
DISCUSSION
The experiment presented in this paper is the first comprehensive evaluation of the suitability of different visualization styles and their constituents for safety-related in-car information applications. The goal was to overcome the current scarcity of prescriptive knowledge on this important and safety-relevant topic.

Our simulator study results show that realistic HMI visualization styles have a significant positive impact on the user experience. In comparison to other visualization styles, realistic views provided added value in terms of driver support and perceived safety, beyond a purely aesthetic function as visual enhancement or "eye candy". These utilitarian benefits materialized particularly in more acute safety-critical scenarios which required effective and timely action by the driver. Furthermore, we did not find any evidence for a negative impact of realistic views on participants, e.g. in terms of diminished task performance, distraction by visual clutter or reduced safety. Our findings may thus challenge conventional recommendations which postulate the simplification and reduction of visual HMI designs [6]. In the light of our results, the application of realistic views in safety contexts should be considered again on a broader level. We therefore suggest further systematic research on the merits and demerits of realistic visualizations for in-vehicle navigation and safety applications.

Our results also show that compared to traditional navigation, safety scenarios have different properties, and consequently different visualization requirements: in the navigation scenario, users saw no additional benefit of realistic views over conventional, schematic ones. However, with rising urgency of the scenarios, participants found realistic views to be significantly more useful. This shows not only that reality views provide tangible benefits for the driver, but also that safety-related HMI represents an application class distinct from pure navigation, requiring dedicated user experience research.

Our study participants were only exposed to relatively simple environments (highway) and tasks (such as stopping on the emergency lane). This may explain the observed insensitivity of users' (near to perfect) task completion rate to visualization style. Thus, our results should not be generalized towards more challenging, high-complexity scenarios. Under high strain and cognitive load, users might change preferences and perform better with other or even without HMI visualizations. Future studies should extend and validate the design space towards such higher complexity demands.

In this study, we were deliberately interested in understanding the effects of certain prototypical extreme variants (no visualization, conventional, realistic and interleaved views). Obviously, further visualization variants are possible in this context. Most importantly, we want to stress the fact that these three styles represent idealized variants highly suitable for experimental testing, but which in practice are rather encountered as downgraded or simplified implementations. For example, visualizations currently marketed as "reality views" actually still have many aspects of schematic representations: in many cases they do not display the current situation, but only display 3D templates or 2D images of prototypical junctions. To advance towards safe and satisfactory realistic visualizations in the car, the results clearly encourage the scientific advancement and understanding of the design space for realistic visualizations.

ACKNOWLEDGMENTS
This work has been carried out within the projects REALSAFE and U0, which are financed in parts by ASFiNAG AG, Kapsch TrafficCom AG, nast consulting, the Austrian Government and by the City of Vienna within the competence center program COMET.

REFERENCES
1. Allen, R.W., Cook, M.L., Rosenthal, T.J. (2007). Application of driving simulation to road safety. Special Issue in Advances in Transportation Studies 2007.
2. Böhm, M., Fuchs, S., Pfliegl, R. (2009). Driver Behavior and User Acceptance of Cooperative Systems based on Infrastructure-to-Vehicle Communication. Proc. TRB 88th Annual Meeting.
3. Burnett, G. (2008). "Designing and Evaluating In-Car User Interfaces". In: J. Lumsden (Ed.), Handbook of Research on User Interface Design and Evaluation for Mobile Technology. Idea Group Inc (IGI), 2008.
4. COOPERS project: http://www.coopers-ip.eu/
5. Crampton, J. (1992). A cognitive analysis of wayfinding expertise. Cartographica 29(3): 46-65.
6. European Commission. Commission Recommendation of 22 December 2006 on safe and efficient in-vehicle information and communication systems: Update of the European Statement of Principles (ESOP) on human machine interface. Commission document C (2006) 7125 final, Brussels.
7. Janssen, W. (2007). Proposal for common methodologies for analysing driver behavior. EU-FP6 project HUMANIST. Deliverable 3.2.
8. Levy, M., Dascalu, S., Harris, F.C. (2005). ARS VEHO: Augmented Reality System for VEHicle Operation. Proc. Computers and Their Applications, 2005.
9. Martens, M.H., Oudenhuijzen, A.J.K., Janssen, W.H., and Hoedemaeker, M. (2006). Expert evaluation of the TomTom device: location, use and default settings. TNO memorandum TNO-DV3 2006 M048.
10. McDonald, M., Piao, J., Fisher, G., Kölbl, R., Selhofer, A., Dannenberg, S., Adams, C., Richter, T., Leonid, E., Bernhard, N. (2007). Summary Report on Safety Standards and Indicators to Improve the Safety on Roads, Report D5-2100. COOPERS project.
11. Medenica, Z., Palinko, O., Kun, O., and Paek, T. (2009). Exploring In-Car Augmented Reality Navigation Aids: A Pilot Study. EA Ubicomp.
12. Mobilizy: http://www.mobilizy.com/drive
13. Narzt, W., Pomberger, G., Ferscha, A., Kolb, D., Müller, R., Wieghardt, J., Hörtner, H., Lindinger, C. (2006). Augmented reality navigation systems. Universal Access in the Information Society 4(3): 177-187.
14. Navigon: www.navigon.com
15. Patterson, T. (2002). Getting Real: Reflecting on the New Look of National Park Service Maps. Proc. Mountain Cartography Workshop of the International Cartographic Association. www.mountaincartography.org/mt_hood/pdfs/patterson1.pdf
16. REALSAFE project: https://portal.ftw.at/projects/all/realsafe/
17. Ruddle, R.A., Payne, S.J., Jones, D.M. (1997). Navigating buildings in "desk-top" virtual environments: Experimental investigations using extended navigational experience. Journal of Experimental Psychology: Applied 3(2): 143-159.
18. TomTom HD Traffic: www.tomtom.com
19. Wang, Y., Zhang, W., Wu, S., and Guo, Y. (2007). Simulators for Driving Safety Study - A Literature Review. In: R. Shumaker (Ed.): Virtual Reality, Proc. HCII 2007, LNCS 4563, pp. 584-593, 2007.
20. Vehicle Infrastructure Integration (VII) initiative: http://www.vehicle-infrastructure.org
An Experimental Augmented Reality Platform for Assisted Maritime Navigation

Olivier Hugues, MaxSea - ESTIA Recherche, Bidart, France, +33 5 59 41 70 96, o.hugues@net.estia.fr
Jean-Marc Cieutat, ESTIA Recherche, Bidart, France, +33 5 59 43 84 75, j.cieutat@estia.fr
Pascal Guitton, University Bordeaux 1 (LaBRI) & INRIA, Bordeaux, France, +33 5 40 00 69 18, guitton@labri.fr
General Terms
Experimentation, Human Factor, Security.

Keywords
Augmented Reality, Mixed Environment, Image Processing, Human Factor, Combining exteroceptive data.

1. INTRODUCTION
The continuous progress of new technologies has led to a proliferation of increasingly smart and powerful portable devices. The capabilities of devices on board a ship now enable crews to be offered a processing quality and volume of information until now unrivalled. In a hostile environment such as the sea, users need a relevant flow of information. Computer-assisted vessel management is therefore increasingly widespread and digitalisation is an inescapable development. The three main aims are as follows:
1. Improved safety (property, environment and people)
2. Increased gains from productivity (fishing, etc.)
3. The representations required for environmental control (orientation, location and direction)

[Figure 1. Garmin] [Figure 2. MaxSea] [Figure 3. Coastal Explorer] [Figure 4. Furuno]

These environments enable navigation to be greatly improved by only showing the necessary information, e.g. by combining satellite photos of the earth and nautical charts, like the PhotoFusion in Figure 5 proposed by MaxSea [14].
6. FUTURE WORK
Given the exploratory nature of this platform, we consider several fields of work important. These can be divided into two categories. The first refers to the technology and the second relates to the human factor.

From the technological point of view, aligning the real world and the virtual world remains a challenge, which the boat's movements do not facilitate. The precision of GPS data and recognizing shapes in image analysis are complex issues which still need to be dealt with.

Concerning the human factor, we propose determining the extent to which this platform helps users to satisfy their orientation needs. In which conditions is it more natural to use a VE or AR to navigate, and to which extent is it possible to contextualize the information?

Secondly, we would like to extend our platform to enable it to be generalized as an AR platform with a VE, which can be used both at sea (on boats) and on land (by car or by foot).

[Figure 14. VE augmented video flow thumbnail image.]
[11] Garmin. http://www.garmin.com/garmin/cms/site/fr, October 2009.
[12] Goleman, D. Emotional Intelligence. New York: Bantam Books, 1995.
[13] Nigay, L., Renevier, P., Marchand, T., Salembier, P., and Pasqualetti, L. La réalité cliquable: Instrumentation d'une activité de coopération en situation de mobilité. Conférence IHM-HCI 2001, Lille (2001), 147-150.
[14] MaxSea. http://www.maxsea.fr, October 2009.
[15] Milgram, P., and Kishino, F. Taxonomy of mixed reality visual displays. IEICE Transactions on Information Systems E77-D, 12 (December 1994), 1-15.
[16] Fuchs, P. Les interfaces de la réalité virtuelle. La Presse de l'Ecole des Mines de Paris, ISBN 2-9509954-0-3 (1996).
[17] Schall, G., Mendez, E., Kruijff, E., Veas, E., Junghanns, S., Reitinger, B., and Schmalstieg, D. Handheld augmented reality for underground infrastructure visualization. Personal and Ubiquitous Computing, Special Issue on Mobile Spatial Interaction 13, 4 (May 2008), 281-291.
[20] Technology Systems Inc. Augmented reality for marine navigation. Tech. rep., LookSea, 2001.
[21] Hoff, W.A., Nguyen, K., and Lyon, T. Computer Vision-Based Registration Techniques for Augmented Reality. Intelligent Robots and Computer Vision XV, in Intelligent Systems and Advanced Manufacturing, SPIE 2904 (November 1996), 538-548.
[22] Wuest, H., Vial, F., and Stricker, D. Adaptive Line Tracking with Multiple Hypotheses for Augmented Reality. In ISMAR '05: Proceedings of the Fourth IEEE and ACM International Symposium on Mixed and Augmented Reality (Washington, DC, USA, IEEE Computer Society, 2005), 62-69.
[23] Zendjebil, I.M., Ababsa, F., Didier, J., Vairon, J., Frauciel, L., and Guitton, P. Outdoor Augmented Reality: State of the Art and Issues. 10th ACM/IEEE Virtual Reality International Conference (VRIC 2008), Laval, France.
[24] SAFRAN, UFO Detection. http://www.safran-group.com
Skier-ski system model and development of a computer
simulation aiming to improve skier’s performance and ski
During the first experiment, we worked with the FFS (French Ski Federation) and videotaped nine athletes during a Giant Slalom ISF race. The French teams' director obtained the agreement of the referee of the race to put video cameras on the sides of the slope. The aim was to measure kinematic data of the skiers' techniques during a real race situation, to find out how alpine skiers start a turn and steer toward an efficient trajectory.

Finally, for our last experiment, three male alpine skiers and one female alpine skier participated in the study. They skied on the Giant Slalom and Slalom.

Kinematic data were acquired the same way, except for the video cameras. They were put on scaffolding to get rid of the snow spray, due to ski turns, which kept lower focal points from being seen. We used two Kistler on-board force plates to measure ski-boot→ski torques and snow→ski torques. Force plates were inserted between the ski-boot and the ski. Skiers had to carry the data acquisition units of the on-board balance. Data was sampled at 936 Hz.

[Figure 1. Unit data acquisition, focal points and on-board balance.]

To collect 3D kinematic data, the measurement area was first calibrated. The area marking was realized with focal points made of tennis balls with fluorescent yellow and black squares, impaled on wooden sticks jabbed into the snow in the sight-line of the video cameras. Focal points were also put on the bottom of the internal piquet of the upper gate, and on the bottom of the internal piquet of the lower gate.

Focal point data from the slalom gates and from the subjects were input into the 3D Vision software and treated by a DLT algorithm.

We calculated both the position and the progression of the center of mass for each experiment. To determine the center of mass position, we had to calculate the weights m1, m2, m3, …, mn of each segment of the skier and of each piece of ski gear, then determine the positions of the centers of gravity M1, M2, M3, …, Mn of each body segment and piece of ski equipment in a space defined according to a frame. The position of the skier-ski system's center of gravity is calculated according to the positions PM1, PM2, PM3, …, PMn of each center of gravity of body segments and ski equipment.

[Figure 2. Skier-skis' system and sub-systems.]
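Written out, the computation just described is the mass-weighted mean over the n body segments and pieces of ski equipment; in LaTeX notation:

\[
P_{\mathrm{COM}} \;=\; \frac{\sum_{i=1}^{n} m_i \, P_{M_i}}{\sum_{i=1}^{n} m_i}
\]

where m_i is the mass of segment i and P_{M_i} the position of its center of gravity in the calibrated frame.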
3. COMPUTER SIMULATOR
3.1 Tridimensional modeling
We consider the skier-ski system as an articulated system divided into several sub-systems (figure 2). The first one, where the snow→ski torque is applied, is itself made of the sub-systems skis and lower limbs. The second one, which is articulated with the previous one at each coxal joint, is constituted by the trunk, the head and the upper limbs.

We assume that we can better improve ski development and training by separating skier and skis, contrary to a study about the effects of ski and snow cover on the alpine skiing turn which, instead of studying the skier, used a sledge as the skier [2].

Regarding the kinematic study, the skier system is represented as a mechanical structure made of many stiff and rigid segments articulated with each other. Their movements are determined by the degrees of freedom allowed by skeletal anatomy. The torques applied on the joints have a muscular origin.

The ski-boot→ski torque is made up of the three components of the support reaction (on the three axes) and of three torques compounding the resulting torque, which result from the distance between the application point of the support reaction and the origin of the binding mounting mark (engraved on the topsheet by the ski constructor).

3.2 Computer representation
From the first model obtained with the 3D Vision software (figure 3), we determined the position of some segmental centers of mass, using an anthropometric model [6] [7]. Then, we calculated the global center of mass. Those segments were chosen because they can produce the eight fundamental techniques that the skier uses to control him/herself.

With the new software ID3D (the latest version of the 3D Vision software), we can drive the progression of each segment of the jumping jack with the kinematics captured from the analysis of the skier's movements. We can provide the jumping jack with the anthropometric characteristics modeled by Hanavan and apply the context's torques that constrain the jumping jack. We can then measure data.

With the computer simulator, we aim at two goals. The first is didactic: to show coaches, trainees or athletes (beginners or experts) some biomechanical causes which affect the skier-skis system and whose understanding is useful either to give instructions or to act. The second one is technological; it consists in imposing technical instructions on the computer jumping jack, or modifying some equipment characteristics, to measure their consequences. The goal is then to make assumptions about the evolution of techniques, ski modeling, and the relations between the evolution of the ski structure and the torques produced during a specific situation.
[Figure 3. First modeling.]

The software works with a DLT ("Direct Linear Transformation") algorithm. It connects the real coordinates (according to a known frame) of focal points which are in the common optical field of the video cameras with the coordinates, collected on the computer screen, of each focal point recorded by each video camera. This calculation enables the position of a moving point to be rebuilt in the real space of experimentation. The point which moves on the screen then owns the 3D characteristics of the real moving point's kinematics.
The 3D computer representation distinguishes each body segment, the coxal joint, the progression of the center of gravity, the feet trajectory, and each ski-boot→ski torque (represented by arrows, figure 4).

4. RESULTS
Figure 4 shows the segmental modeling with the resultant forces of the ski→ski-boot torques, and also the direction of the global acceleration of the center of mass. We calculated the variations of joint angles to underline the technique used by each subject to pilot himself. We made statistical comparisons which objectify our empirical model. The two next graphics (figures 5 and 6) show in 2D the variations of lateral knee inclination against the on-edge angle. The graphics show a dispersion of size and timing but also a similar shape, showing that the action is realized by every racer. So, varying the on-edge angle is a fundamental technique, and it feeds our biomechanical and technological models.

[Figure 4. Jumping jack, computer simulator.]
[Figure 5. Angle right shin tip up.]
6. DISCUSSION
The system modeling, from this experimental method, now seems possible. This study went further than the results obtained with a 3D motion analysis only [3] because it has highlighted some factors to improve performance. Nevertheless, it keeps pursuing the investigation of the interactions of the mechanical and geometric characteristics of the skis between the ski→ski-boot torque and the torque applied on the ski by the snow cover. Snow cover properties change the on-edge angle in the steering phase and the loading on the ski [2].

Let us recall that the technique is defined according to articular or material marks taken from the body or the equipment. It corresponds to a technique seen as pertinent for performance with current skis, and it also corresponds to a goal intended to vary the characteristics of ski-snow efforts or aerodynamic efforts.

[Figure 9. Picture of the winner.]

This technical model of the skier is a tool for the coach and the skier to improve training. The body technique described is referred to the middle of the ski-boot of the same side because the articulated skier-skis system is mostly guided by each effort at the ski-snow cover contact. Mostly, because the aerodynamic strain, which depends on the speed and the skier's shape, also affects the system's guiding, but more weakly than the ski-snow efforts. It has also been shown that the sagittal balance (what we call forward-backward inclination) is an important factor for performance [1].

It is still hard to predict the loads applied on the skis by the skier through an electromyographic study, because estimations from EMG are barely reliable [11].

With the computer simulator, it is possible to apply on the ski the torques measured between the ski→ski-boot torque and the torque applied on the ski by the snow cover. It is also possible to apply the measured on-edge angle. That way, the static load repartition is known. The computer simulator is capable of knowing the dynamic torques of skis and bindings (on-board balance measurements). Then, we can link the loads applied on the ski by the skier and by the snow to the ski materials and the ski structure. The simulator manipulates torques and skier-skis system characteristics. We can thus find out what structure/material of the ski improves the skier's performance.

[Figure 10. Picture of our athlete.]
We can notice that the weight, which can be measured by scales, changes if the skier does flexing, extension, and/or segmental movements.

While not replacing the 3D analysis of skiers' motion in its context and the measurement of the external constraints applied on skiers, the computer simulator will allow univocal constraints to be imposed on the skier-ski system, reducing experimental uncertainties, and assumptions to be made easily. That way, with coaches, ski constructors, and researchers, studies will be led to design and evaluate experimental protocols to improve ski development.

Let us add that the computer simulator is not only thought for alpine skiing, but also for board and wheeled sports. We can manipulate the movement of each body segment (on the jumping jack) or the characteristics of the ski gear to detect and even simulate performance variations.

7. ACKNOWLEDGMENTS
Our thanks go first to the memory of Alain Durey; to Rossignol Ltd. for providing us with devices; and to the French Skiing Federation for allowing us to work with athletes.

8. REFERENCES
[1] Müller, E., Schwameder, H. 2003. Biomechanical Aspects of New Techniques in Alpine Skiing and Ski-jumping. Journal of Sports Sciences, 21, 679-692.
[2] Nachbauer, W., Kaps, P., Heinrich, D., Mössner, M., Schindelwig, K., Schretter, H. 2006. Effects of Ski and Snow Properties on the Turning of Alpine Skis - A Computer Simulation. Journal of Biomechanics, 39, Suppl. 1, 6900.
[3] Brodie, M., Walmsley, A., Page, W. 2008. Fusion Motion Capture: A Prototype System Using Inertial Measurement Units and GPS for the Biomechanical Analysis of Ski Racing. Journal of Sports Technology, 1, 17-28.
[4] Maesani, M., Dietrich, G., Hoffmann, G., Laffont, I., Hanneton, S., Roby-Brami, A. 2006. Inverse Dynamics for 3D Upper Limb Movements - A Critical Evaluation from Electromagnetic 6D Data Obtained in Quadriplegic Patients. Ninth Symposium on 3D Analysis of Human Movement. Valenciennes.
[5] Abdel-Aziz, Y.I., Karara, H.M. 1971. Direct Linear Transformation from Comparator Coordinates into Object Space Coordinates in Close-Range Photogrammetry.
[6] Hanavan, E.P. 1964. A Mathematical Model of the Human Body. AMRL-TR-64-102, AD-608-463. Aerospace Medical Research Laboratories, Wright-Patterson Air Force Base, Ohio.
[7] Miller, D.I., Morrison, W. 1975. Prediction of Segmental Parameters using the Hanavan Human Body Model. Med. Sci. Sports 7, 207-212.
[8] Kapandji, I.A. 1982. Physiologie Articulaire. Maloine S.A. éditeur, Paris.
[9] Roux, F. 2000. Actualisation des Savoirs technologiques pour la Formation des Entraîneurs de Ski Alpin de Compétition. Doctoral Thesis. University of Paris Orsay XI.
[10] Cotelli, C. 2008. Sci Moderno. Mulatero Editore.
[11] Buchanan, T.S., Llyod, D.G., Manal, K., Besier, T.F. 2005. Estimation of Muscle Forces and Joint Moments Using a Forward-Inverse Dynamics Model. Official Journal of the American College of Sports Medicine, 1911-1916.
T.A.C: Augmented Reality System for Collaborative Tele-Assistance in the Field of Maintenance through Internet
Sébastien Bottecchia, ESTIA RECHERCHE - IRIT, Technopôle Izarbel, 64210 Bidart (France), (+33)5 59 43 85 11, s.bottecchia@estia.fr
Jean-Marc Cieutat, ESTIA RECHERCHE, Technopôle Izarbel, 64210 Bidart (France), (+33)5 59 43 84 75, j.cieutat@estia.fr
Jean-Pierre Jessel, IRIT, 118, Route de Narbonne, 31000 Toulouse (France), (+33)5 61 55 63 11, jessel@irit.fr
Designing and Evaluating Advanced Interactive
Experiences to increase Visitor’s Stimulation in a Museum
Bénédicte Schmitt (1), (2), Cedric Bach (1), (3), Emmanuel Dubois (1), Francis Duranthon (4)
benedicte.schmitt@irit.fr, cedric.bach@irit.fr, emmanuel.dubois@irit.fr, francis.duranthon@cict.fr
ABSTRACT
In this paper, we describe the design and a pilot study of two Mixed Interactive Systems (MIS), interactive systems combining digital and physical artifacts. These MIS aim at stimulating visitors of a Museum of Natural History about a complex phenomenon. This phenomenon is pond eutrophication, a breakdown of a dynamical equilibrium caused by human activities: this breakdown results in a pond unfit for life. This paper discusses the differences between these two MIS prototypes, the design process that led to their implementation, and the dimensions used to evaluate these prototypes: user experience (UX), usability of the MIS, and the users' understanding of the eutrophication phenomenon.

Categories and Subject Descriptors
H.5.2 [User Interfaces]: Prototyping, Evaluation/Methodology, User-centered design, Theory and methods.

General Terms
Design, Experimentation, Human Factors.

Keywords
Mixed Interactive Systems, Advanced Interactive Experience, co-design, museology, eutrophication

1. INTRODUCTION
During the past years, increasing numbers of cultural interactive experiences have been produced, in particular in museology. A major goal of this trend is to increase the involvement of visitors during a visit, in order to make them actors of their own museum experience. Different attempts have been introduced in museums. Guides [23] are used to provide additional information about the exhibited objects by means of digital comments. Games [23], [25] can propose challenges to visitors, encouraging them to learn through play and adding new interest to the exhibition. More advanced forms of interactive systems, called Mixed Interactive Systems (MIS) [6], have also been developed to serve this goal. Mixed Interactive Systems combine digital and physical artifacts. Examples of MIS include Augmented Reality (AR), Mixed Reality (MR) and tangible user interfaces (TUI). The interest of such advanced interactive experiences is that rather than manipulating technological devices, visitors handle physical objects related to the exhibits, such as wooden blocks to create programs that control a robot on display [9], or physical objects that trigger different phenomena in the environment [19], tightly coupled to the display and animation of digital artifacts carrying predefined knowledge. Users can explore the advanced interactive experience created by the system and discover its content. The limits of such approaches mainly lie in the fact that they do not propose clear challenges: the proposed tasks are open-ended and users can terminate them whenever they want. However, involving physical artifacts in an interactive experience is strongly in line with the most recent trends of museology: indeed, Wagensberg [22] develops an approach for the modern museum in which it is recommended to keep real objects or phenomena at the center of exhibits. Among the above systems, only advanced interactive experiences prompt visitors to manipulate real objects or real phenomena.

But who are museum visitors? Most of the research about learning in museums is dedicated to children. However, Wagensberg [22] points out the universality of the museum audience. In addition, a recent study by Hornecker [10] shows a high interest in using Tangible User Interfaces (TUI) in museums: they are universal and can engage a range of visitor profiles. Investigating the use of interactive systems for adults therefore appears as a required complement to existing studies related to children (Figure 1). TUI are thus good candidates for providing a fun experience while enhancing the process of teaching complex natural phenomena to adult visitors.

Nevertheless, introducing such technology also raises the problem of evaluation. In such a context, usability evaluation methods (UEM) are still required to study the usability of the application, i.e. how efficient, effective and easy to learn it is [11]. In addition, evaluating the user experience (UX) is equally important, because visiting a museum is a leisure activity rather than a working task. Evaluating such advanced interactive experiences therefore requires studying how visitors feel about their experience [12].

In this paper, we focus on the introduction and evaluation of Mixed Interactive Systems in a Museum of Natural History. These MIS are used to illustrate and teach a complex phenomenon: pond eutrophication.

We first motivate the use of MIS in this context and briefly review the goals of usability and UX evaluations. We then present the basis of pond eutrophication and the principles of the co-design process we applied to design and implement two different MIS. These MIS, namely MISE and MISEP, are introduced, illustrated and finally compared in order to assess their adequacy with the eutrophication context. Results of a pilot study focusing on usability and UX with these prototypes are also presented.

2. RELATED WORK
As previously shown, advanced interactive systems have been developed in domains such as cultural activity and informal learning. Here we discuss the limitations of these systems, but also the advantages that explain their introduction.

2.1 Use, limitations and advantages of MIS
Complex or abstract concepts, as in informal learning, are often difficult to explain or to understand. MIS provide tools to simulate natural phenomena and make these concepts reachable by users. Manipulating physical and digital artifacts is one of the most interesting characteristics of MIS, making these concepts easier to understand and to perceive, as users can experience them [18][21]. A study by Kim et al. [16] shows that MIS support the designer's cognitive activities and spatial cognition by giving a "sense of presence". MIS make use of three-dimensional interfaces to provide a sense of reality that other systems cannot offer.

MIS have some limitations. The first one is that physical objects do not provide a trace of actions [15]: once an action is performed on an object, the object itself is not able to provide any information about its previous state to the user or even to the computer system. A second limitation with the use of physical objects involved in MIS is to ensure that their planned use is perceivable and understandable by the users: by essence, physical objects have no means to express how to act on them. It is therefore required to base their use on affordable actions, behaviors and representations. Involving multiple physical objects is another limitation, because the user will have to grab and release several objects and find a place to put them. In addition, technological limitations exist, such as the detection and localization of the required physical artifacts.

However, MIS also have the advantage of prompting people, as the technology is embedded into the physical environment [7]. Users can experience a new concept with less reluctance or without feeling any constraints. They can manipulate objects without barriers that separate the digital and physical worlds. Another advantage of MIS is their affordance, as objects provide some common representations. Furthermore, they provide some opportunities not present in desktop interfaces [15]. Users can explore the physical objects and evaluate which actions to perform. This can also enable several groups of users, e.g. novices and children, to use MIS more intuitively. In other words, MIS are potentially more universal than desktop interfaces [16]. Universality and accessibility of MIS are two major advantages which can be used to help users familiarize themselves with complex or abstract concepts. We hypothesize that museums could benefit from these advantages to engage visitors with their exhibits. The impact of MIS on users is an interesting topic to measure, as they engage users. The next section presents the reasons that encourage us to use both UEM and UX evaluation methods.

Figure 1. "Mixed Interactive Systems for Eutrophication with Palette" (MISEP): an example of a TUI for Museums

2.2 Usability and UX evaluations
These past years, a new concept has emerged: user experience. As it is recent, user experience (UX) has no consensual definition yet. In particular, Bevan [3] studies the difference between usability and UX evaluation methods and first highlights different possible interpretations of UX in the literature: the goal of UX can be to (a) improve human performance or to (b) improve user satisfaction in terms of use and appropriation of the interactive system. Hassenzahl points out the role of hedonic and pragmatic goals; on this basis, UX can be considered as the subjective aspects of a system [8]. Moreover, according to the results of a survey by Wechsung et al. [24], the interest in usability focuses mainly on designing better products, whereas in UX it is generally more linked to concepts with emotional content (e.g. fun and joy). Finally, in the context of informal learning systems, speed and accuracy are no longer the only goals; such systems should also bring new knowledge and stimulate users' emotions, which are indeed evaluated through UX methods: UX evaluations are expected to extract users' feelings and opinions about the system. In short, one can say that usability measurements reveal problems related to the system behavior, while measurements of UX highlight some additional perspectives to understand its impact.

3. MUSEOGRAPHIC CONSIDERATIONS
The aim of our collaboration with the Natural History Museum of Toulouse is to make visitors aware of the eutrophication process by explaining this phenomenon. It seems primordial to show that a pond is not just a waterbody, but a complex and dynamical system (Figure 2). A pond can live 100 years or more and fill up naturally over the years. However, this filling can be slowed or accelerated by human activities. These activities impact parameters involved in the eutrophication process: if humans add weed killer or pump water, the pond disappears faster; if humans remove mud, the pond disappears slower. These parameters are, for example: water temperature, oxygen rate, water level, mud level.

Most visitors are unaware of the effects of their activities on a pond, and making them actors of an advanced interactive experience can make them aware of eutrophication. However, visitors cannot experience the real phenomenon with real objects, since its lifetime is long and it is complicated to insert a real pond into a museum. As demonstrated previously, MIS represent a fit solution for museums. We decided to simulate the real phenomenon on a digital pond and to put forward physical objects to represent human activities. We faced two different solutions: either all human activities can be made physically, or a physically manipulated figure can select digital human activities and impact the digital environment. The interest of MIS is also that visitors can interact with a realistic pond to better observe the evolution of the ecosystem.

The design process, described in the next section, takes these museographic requirements into consideration.
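To make the action-to-parameters coupling staged by the exhibit concrete, here is a minimal sketch in Python. It is our illustration, not the museum's implementation: every parameter name, value and rate below is invented.

```python
# Hypothetical pond state: the paper lists the parameters but not their values.
POND = {"water_level": 1.0, "mud_level": 0.2,
        "oxygen_rate": 0.8, "water_temperature": 15.0}

# Each visitor action nudges several parameters of the environment,
# which is the generic activity the exhibit stages.
ACTIONS = {
    "add_weed_killer": {"oxygen_rate": -0.1, "mud_level": +0.05},
    "pump_water":      {"water_level": -0.1, "water_temperature": +0.5},
    "remove_mud":      {"mud_level": -0.1},
}

def apply_action(pond, action):
    """Return the pond state after one visitor action."""
    new_state = dict(pond)
    for parameter, delta in ACTIONS[action].items():
        new_state[parameter] += delta
    return new_state

state = apply_action(POND, "pump_water")  # pumping water: pond disappears faster
print(state)
```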
4. DESIGN PROCESS
We use a specific co-design process [1], as our collaboration with the museum involves a multidisciplinary team, composed of museographic experts, ergonomists and designers, to design our prototypes. This co-design process facilitates the communication between the participants and is adapted to Mixed Interactive Systems, the systems we decided to design. Furthermore, compared with software design processes, the present process focuses more on pedagogic, museographic and visitors' requirements than on engineering the software, as the question of technology is postponed to the end of the development cycle. Finally, in contrast with traditional HCI processes, this design process primarily supports exploration of initial expectations rather than just user requirements. It then turns these expectations into interaction considerations and finally iterates to finalize the application.

This co-design process consists of four phases we will define and illustrate: preliminary analysis, analysis of interactive principles, optimization and production (Figure 3). This process has the advantage of guiding the design team throughout these phases, and particularly of facilitating the transformation of the requirements into an interactive experience.

[Figure 3. Overview of the co-design process for Museums: an iterative cycle of Design, Evaluation, Implementation and Production.]

In the preliminary analysis phase, we analyzed the museologic domain to list all its activities, constraints and targeted users. The objective of this first phase is to extract some generic activities that can be applied to other themes, so as to respect the consistency of an interactive exhibit. For our prototypes, the generic activity is "Making an action on an environment has a perceptible consequence on many different objects of this environment".

The aim of the analysis of interactive principles phase is to define how to make one of the needs of the domain interactive. All the elements listed in the previous phase are taken into account. The minimal functions of the system which are necessary to make the generic activity interactive are identified in this phase. For our prototypes, the minimal functions can be expressed as: "To show that an action on an environment has a perceptible consequence on many different objects of this environment, the system should allow an action to be performed, present a flexible environment, distinguish impacted objects from the environment, and present attributes of these objects". A set of recommendations to stage the minimal functions is also listed, for example for our prototypes: "To put across that impacted parameters are constituent of the environment, the system should mark the link between these parameters and the pond".

The elements of these above phases are the same for the two prototypes, as museographic and visitors' requirements have to be well understood before interaction technique problematics and technical questions.

The optimization phase aims at designing the interaction with the system. Our prototypes result from two different optimization phases, as the analyses of this phase do not deal with the same problems: the second prototype should interact directly with the digital 3D pond and put forward fewer devices. The main concept of this phase is to use participatory design involving end users. This iterative phase improves different dimensions of the ongoing prototype, like social or educational dimensions. Users can participate in designing the prototype by means of some creative methods like brainstorming and focus groups, and assess it by participating in user tests.

The last phase leads to the production of one designed prototype.
5. PROTOTYPES
The eutrophication process is complex to explain, and we designed two alternative prototypes that differ regarding interaction space and forms of coupling between the two worlds. These aspects, detailed in the next sections, can impact the understanding of this process, and our objective is to determine the role of these different aspects.

…objects are used as an input. Visitors observe the effect of their actions on the digital environment, which includes the elements found in MISE and, additionally, a garden and a field.
Shuang Liang, Rong-Hua Li, George Baciu, Eddie C.L. Chan, Dejun Zheng
Department of Computing
The Hong Kong Polytechnic University
Hung Hom, Kowloon, Hong Kong
{cssliang, csrhli, csgeorge, csclchan, csdzheng}@comp.polyu.edu.hk
…activity and split strokes where pen speed reaches the minimum:

t_i = \sum_{k=2}^{n} \frac{c_i}{\sqrt{(x_k - x_{k-1})^2 + (y_k - y_{k-1})^2}}, \quad 1 \le i \le n

where s is the number of sample points. Note that h(α, β) denotes the minimum calibration distance between two turning angle sets of primitives.

The similarity between two bi-segments can be derived from the above dissimilarity distance as follows:

sim(B_1, B_2) = \begin{cases} 0, & \text{if } dis(B_1, B_2) > \sigma \\ 1 - dis(B_1, B_2)/\sigma, & \text{otherwise} \end{cases}   (6)

where σ is a threshold introduced from our experiments to normalize similarity and make it fall onto the range [0, 1].
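A direct transcription of Equation (6) into code may make the thresholding behavior concrete. This is our sketch, assuming dis(B1, B2) has already been computed by the dissimilarity measure above:

```python
def bi_segment_similarity(dis: float, sigma: float) -> float:
    """Map a bi-segment dissimilarity distance to a similarity in [0, 1].

    Implements Equation (6): similarity is 0 when the distance exceeds
    the threshold sigma, and decreases linearly from 1 to 0 otherwise.
    """
    if dis > sigma:
        return 0.0
    return 1.0 - dis / sigma
```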
7. EXPERIMENTAL SETUP
We carry out a partial matching experiment to verify the effectiveness and efficiency of our proposed method. We conducted our experiment with 200 panel shapes, collected from ten experienced panel designers. The machine we used in our experiment was an HP TouchSmart with an AMD Turion X2 RM-74 (2.2 GHz) CPU and 2 GB memory. We implemented our system using Visual C++ in the Microsoft Windows Vista operating system environment.

The experiment was first carried out by collecting sketch samples with traditional CAD software. As mentioned, we invited garment designers to freehand sketch the usual, common and standard panel shapes. We collected 20 sample sketches from each designer. A panel database with 200 panel shapes was thus successfully established. Figure 8 shows some panel shape examples in our database.

Second, we need to build up a feature database based on the panel database. As can be seen in Figure 8, different panel shapes have different features. We apply our proposed bi-segment shape descriptors to extract features from the panel shapes. We sample 5 points on each segment in our experiment, and get a 13-dimension numerical vector for each bi-segment.

Finally, we evaluate the effectiveness of our proposed partial matching. Similar to the main performance metrics of interest in general information retrieval, we test the effectiveness of partial matching by recall and precision rate. Recall is defined as the ratio between the number of relevant returned shapes and the total number of relevant shapes, while precision is defined as the ratio between the number of relevant returned shapes and the total number of returned shapes. Obviously, when more items are returned, recall will be increased but precision will be decreased. In a recall/precision graph, a higher curve signifies a higher recall/precision value. We will further describe and analyze the results in the next section.
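These two definitions translate directly into code; a minimal sketch (ours, with hypothetical set-valued inputs):

```python
def recall_precision(returned: set, relevant: set) -> tuple:
    """Recall and precision as defined above.

    recall    = |returned ∩ relevant| / |relevant|
    precision = |returned ∩ relevant| / |returned|
    """
    hits = len(returned & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(returned) if returned else 0.0
    return recall, precision

print(recall_precision({"p1", "p2", "p3"}, {"p2", "p3", "p4"}))  # (0.667, 0.667)
```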
8. RESULTS AND ANALYSIS
In this section we evaluate the effectiveness of our proposed partial matching in terms of recall and precision rate. In Section 8.1, we compare the recall rate with partial shapes varying from one to four bi-segments. In Section 8.2, we compare the precision rate with four different descriptors: the attributed strings descriptor [8], a geometric descriptor, a topological descriptor and our proposed bi-segment descriptor. In Section 8.3, we test the response time of our partial matching garment design system.

8.1 Recall Rate
Figure 9 shows the relationship of the number of retrieved shapes to recall rate, using partial shapes with different numbers of bi-segments. As we can see, along with the completion of the drawing process, the partial matching recall rate increases gradually. The more complete the input panel shape is, the more clearly the user's intention is expressed. Perhaps the most important point to be stated is that a partial matching system with a high recall rate returns the wanted shapes. We can see that four bi-segments always outperform one bi-segment at every setting. When returning 20 shapes, an input partial shape with four bi-segments can achieve a 90% recall rate.

8.2 Precision Rate
Figure 10 depicts the relationship between precision and recall rate using four different descriptors: attributed strings, the geometric descriptor, the topological descriptor and our proposed bi-segment descriptor. We average the results and plot these four curves with the different features used in partial matching. As can be seen in Figure 10, the attributed strings descriptor has only 73% precision. Due to its neglect of topological characteristics, it cannot fully express the panel content. Clearly, our proposed bi-segment descriptor has the best performance and achieves on average 20% more precision than the other three descriptors.

8.3 Response Time
…matching accuracy of the proposed method.

[Figure 9: Performance of partial matching system with increasing integrity of input shapes. Recall (y-axis) vs. numbers of retrieved shapes (x-axis: 5, 10, 15, 20), with curves for one, two, three and four bi-segments.]

[Figure 10: Precision (y-axis) for the bi-segment descriptor, attributed strings, geometric descriptor and topological descriptor.]

For future work, it is promising to extend the application domains of our shape descriptor, e.g. to mechanical drawing.

10. ACKNOWLEDGEMENT
This work is supported by the Research Grants Council of

11. REFERENCES
[1] S. Berretti, A. D. Bimbo, and P. Pala. Retrieval by shape similarity with perceptual distance and effective indexing. IEEE Transactions on Multimedia, 2(4), 2000.
[2] L. Chen, R. Feris, and M. Turk. Efficient partial shape matching using Smith-Waterman algorithm. 2008.
[3] Y. Chi and M. Leung. Part-based object retrieval in cluttered environment. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(5):890–895, 2007.
[4] P. Decaudin, D. Julius, J. Wither, L. Boissieux, A. Sheffer, and M.-P. Cani. Virtual garments: A fully geometric approach for clothing design. 25, 2006.
[5] L. J. Latecki, V. Megalooikonomou, Q. Wang, and D. Yu. An elastic partial shape matching technique.
ABSTRACT
Wearable computing technology is one of the methods that can augment the information processing ability of humans. However, in this area, a soft surface is often necessary to maximize the comfort and practicality of such wearable devices. Thus, in this paper, we propose a soft surface material, with an organic bristling effect achieved through mechanical vibration, as a new user interface. We have used fur in order to exhibit the visually rich transformation induced by the bristling effect while also achieving the full tactile experience and benefits of soft materials. Our method needs only a layer of fur and simple vibration motors. The hairs of the fur instantly bristle with only horizontal mechanical vibration. The vibration is provided by a simple vibration motor embedded below the fur material. This technology has significant potential as a garment textile or as a general soft user interface.

Figure 1. Soft and Flexible User Interface suitable for Wearable Computing Inspired by Hair Erection.

General Terms
Soft User Interface, Pet Robot, Visual and Haptic Design, Computational Fashion
ABSTRACT
In this research, we aim to realize a gustatory display that enhances our experience of enjoying food. However, generating a sense of taste is very difficult, because the human gustatory system is quite complicated and is not yet fully understood. This is so because gustatory sensation is based on chemical signals, whereas visual and auditory sensations are based on physical signals. In addition, the brain perceives flavor by combining the senses of gustation, smell, sight, warmth, memory, etc. The aim of our research is to apply the complexity of the gustatory system in order to realize a pseudo-gustatory display that presents flavors by means of visual feedback. This paper reports on the prototype system of such a display, which enables us to experience various tastes without changing their chemical composition, through the superimposition of virtual color. The fundamental thrust of our experiment is to evaluate the influence of cross-sensory effects by superimposing virtual color onto actual drinks and recording the responses of subjects who drink them. On the basis of the experimental results, we conclude that visual feedback sufficiently affects our perception of flavor to justify the construction of pseudo-gustatory displays.

Categories and Subject Descriptors
H.5.2 [INFORMATION INTERFACES AND PRESENTATION]: User Interfaces – Theory and methods.

General Terms
Experimentation, Human Factors.

Keywords
Gustatory Display, Pseudo-Gustation, Cross-sensory Perception.

1. INTRODUCTION
Because it has recently become easy to manipulate visual and auditory information on a computer, many research projects have used computer-generated virtual reality to study the input and output of haptic and olfactory information in order to realize more realistic applications [1]. Few of these studies, however, have dealt with gustatory information, and there have been rather few display systems that present gustatory information. One reason for this is that gustatory sensation is based on chemical signals while visual and auditory sensations are based on physical signals, which introduces difficulties to the presentation of a wide variety of gustatory information.

Moreover, in the human brain's perception of flavor, the sense of gustation is combined with the senses of smell, sight, warmth, memory and so on. Because the gustatory system is so complicated, the realization of a stable and reliable gustatory display is also difficult.

Our hypothesis is that the complexity of the gustatory system can be applied to the realization of a pseudo-gustatory display that presents the desired flavors by means of a cross-modal effect. In a cross-modal effect, our perception of a sensation through one sense is changed due to other stimuli that are simultaneously received through other senses. The McGurk effect [2] is a well-known example of a cross-modal effect. The visual input from the articulatory movements of lips saying “gaga” was dubbed over by auditory input saying “baba”. Subjects who were asked to report what they heard reported that they heard “dada”, which shows that seeing the movement of the lips can interfere with the process of phoneme identification.

By using this effect, we may induce people to experience different flavors when they taste the same chemical substance. For example, although every can of “Fanta” (produced by the Coca-Cola
Company) contains almost the same chemical substances in almost
the same combination, we appreciate the different flavors of orange,
grape, and so on. It is thus conceivable that the color and scent of a
drink have a crucial impact on our interpretation of flavor, which is
not based entirely on the ingredients of the drink.
Therefore, for the realization of a novel gustatory display system,
we need to establish a method that permits people to experience a
variety of flavors not by changing the chemical substances they
ingest, but by changing only the other sensory information that
accompanies these substances.
In this paper, we first introduce the knowledge from conventional studies about the influence that other senses have on gustation. Next, based on this knowledge, we propose a method that changes the flavor that people experience from a drink by controlling the color of the drink with an LED. We then report the results of an experiment that investigates how people experience the flavor of a drink with color superimposed upon it, using the proposed method and the same drink colored with dye. Finally, we evaluate the proposed method by comparing these results and discuss its validity and future avenues of research.

2. GUSTATORY SENSATION AND CROSS-SENSORY EFFECT
The fundamental tastes are considered as a basis for presenting various tastes, playing a role analogous to RGB in a visual display system. There are several theories regarding the number of fundamental tastes. The four fundamental tastes theory includes sweet, salty, sour and bitter, while the five fundamental tastes theory adds a fifth taste sensation, umami, to these four. Moreover, a number of research reports have indicated that "fundamental tastes do not exist because gustation is a continuum", or that "the acceptor sites of sweetness or bitterness are not located in one place [3]". It can thus be said that there is no definition of fundamental tastes that is accepted by all researchers [4].

In any case, what is commonly called taste signifies a perceptual experience that involves the integration of various sensations. We perceive "taste" not just as the simple sensation of the gustatory cells located on our tongue, but rather as a complex, integrated perception of our gustatory, olfactory, visual and thermal sensations, as well as our sense of texture and hardness, our memory of other foods, and so on. When we use the common word flavor, then, we are in fact referring to what is a quite multi-faceted sensation.

It is therefore difficult to perceive gustatory sensation in isolation from any other sensation unless we undergo special training or have a remarkably developed capacity. This suggests, however, that it is possible to change the flavor that people experience from foods by changing the feedback they receive through another modality. While it is difficult to present various tastes through a change in chemical substances, it is possible to induce people to experience various flavors without changing the chemical ingredients, by changing only the other sensory information that they experience.

The reason for this is that the olfactory sense, above all other senses, is most closely related to our perception of taste. This relationship between gustatory and olfactory sensation is commonly known, as illustrated by our pinching our nostrils when we eat food that we find displeasing. We also know that without olfaction, we hardly experience any taste at all. Moreover, as Prescott explained, the smells we experience through our nose stimulate gustatory sensation as much as the tasting by our tongue [5]. Rozin reported that when people were provided with olfactory stimulation, they said that the sensation evoked in their mouth was a gustatory sensation, even if the stimulation itself did not evoke such a sensation [6]. Furthermore, there is a report that 80% of what we call "tastes" have their roots in olfactory sensation [7].

On the other hand, it is well known that humans have a robust tendency to rely upon visual information more than other forms of sensory information under many conditions. As in the abovementioned studies of gustation, many studies have explored the effect of visual stimuli on our perception of "palatability". Kazuno examined whether the color of a jelly functioned as a perceptual cue for our interpretation of its taste [8]. His survey suggests that the color of food functions as a perceptual cue more strongly than its taste and smell.

These studies, then, indicate the possibility of changing the flavor that people experience with foods by changing the color of the foods. It is not difficult to quickly change the color of a food, and the three primary colors, which can be blended to create all colors, are well known. Thus, if we can change the experience of taste by changing the color of a food, this is the key to the creation of a pseudo-gustatory display, because it is easy to present visual information. Our research, therefore, focuses on a technological application of the influence of colors on gustatory sensation.

Figure 1: Aji Bag and Coloring Device method (ABCD method)

3. PSEUDO-GUSTATORY DISPLAY BASED ON CROSS-SENSORY EFFECT EVOKED BY SUPERIMPOSITION OF VIRTUAL COLOR ONTO ACTUAL DRINKS
In this paper, we propose a method that can induce people to experience various tastes only through the controlled superimposition of color upon the same drink by means of an LED. To do this, we invented the Aji (Aji means taste in Japanese) Bag and Coloring Device (ABCD) (Fig. 1) as a means of changing the color of a drink without changing its chemical composition. In the ABCD method, a small plastic bag filled with a liquid to be drunk
drink in our implementation because it would be safe if our
subjects happened to ingest it. Our prototype system of a pseudo-
gustatory display using the ABCD method is shown in Figure 3.
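The paper does not give the LED color values it uses; purely as an illustration of the control mapping a pseudo-gustatory display needs (target flavor to superimposed color), one might write something like the following, where every flavor name and RGB triple is invented:

```python
# Illustrative only: hypothetical flavor-to-color mapping for the LED.
FLAVOR_TO_RGB = {
    "orange": (255, 120, 0),
    "grape":  (128, 0, 128),
    "lemon":  (230, 255, 0),
}

def led_color_for(flavor: str) -> tuple:
    """Return the RGB triple to superimpose on the drink for a target flavor."""
    return FLAVOR_TO_RGB[flavor]

print(led_color_for("grape"))  # (128, 0, 128)
```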
4. EVALUATING CROSS-SENSORY PERCEPTION OF SUPERIMPOSED VIRTUAL COLOR

[Figure legend: ■ Sweetness, ■ Sourness]
ABSTRACT
There is some concern regarding the effect of smartphones and other wearable devices that use wireless communication and are worn by users very close to their body. In this paper, we propose a new network switching selection model and its algorithms that minimize the non-ionizing radiation of these devices during use. We validate the model and its algorithms with a proof-of-concept implementation on the Android platform.

Categories and Subject Descriptors
C.1.2 [Network Architecture and Design]: Wireless Communication. H.1.2 [User/Machine Systems]: Human Factors. K.4.1 [Public Policy Issues]: Human Safety. K.6.2 [Installation Management]: Performance and usage measurement.

General Terms
Algorithms, Management, Measurement, Performance, Human Factors

Keywords
electrosmog, wireless hand-over.
1. INTRODUCTION
There are more and more wireless products carried by users, from broadly used mobile phones to more specific devices such as cardio belts and watches to monitor heart rate whilst practicing sport. These devices use different wireless technologies to communicate with each other and with their remote Internet servers, for example to store sport session data. These devices bring interesting features to users. However, there is some rising concern about the effect of the non-ionizing electromagnetic radiation of wireless devices on the user's health. These electromagnetic radiation exposures are generally coined “electrosmog”. Non-ionizing radiation means that it does not carry enough energy per quantum to remove an electron from an atom or molecule.

Section 2 presents the related work. Section 3 surveys the main wireless technologies used by mobile devices from the point of view of their electromagnetic radiated emission. In Section 4, we propose our network switching selection model and algorithms to minimize exposure to electrosmog, and we validate them by discussing a proof-of-concept implementation. Section 5 concludes the paper.

2. RELATED WORK
The potential harmful effects of electrosmog have been researched on many occasions, and there are still doubts regarding these effects beyond the transformation of electromagnetic energy into thermal energy in tissues. However, even a sceptical recent survey [1] underlines that the precautionary principle, meaning efforts for minimizing exposure, should be followed, especially for teenagers.

One of the first means to reduce exposure, besides not using a phone or using it only when needed and in good conditions (close to the base station...), is to use a mobile phone with a low Specific Absorption Rate (SAR). However, as the SAR indicated on mobile phones is measured at their full power strength, some phones with higher SAR may better manage their power strength and end up emitting less than phones with lower SAR that emit more often at full power even if it is not needed. In the USA, the FCC has set a SAR limit of 1.6 W/kg, averaged over a volume of 1 gram of tissue in the head and over any 6 minute period. In Europe, the ICNIRP limit is 2 W/kg, averaged over a volume of 10 grams of tissue in the head and over any 6 minute period. Interestingly, the iPhone user manual underlines that it may give a higher SAR than the regulation if used in direct contact with the body: “for body-worn operation, iPhone's SAR measurement may exceed the FCC exposure guidelines if positioned less than 15 mm (5/8 inch) from the body” [2].

As it is less common to stay very close to a mobile phone mast for a long time, working on reducing the emission of the phone that is carried all day long very close to the human body should have more effect for most users. However, Crainic et al. [3] have investigated parallel cooperative meta-heuristics to reduce exposure to the electromagnetic fields generated by mobile phone antennas at planning time, whilst still meeting coverage and service quality constraints. This is different from our approach, which focuses on reducing the electromagnetic fields generated by the users' devices at use time.

Algorithms and systems to seamlessly switch between the available networks to remain always best connected are being researched [4-7]. They mainly focus on quality of service (QoS) for decision-making. In this paper, we add another dimension to the network selection issue and underline that electrosmog exposure should also be taken into account.
3. WIRELESS TECHNOLOGIES SURVEY
There are different wireless technologies that can be used for communication by the devices carried by users. In this section, we survey the main ones, including those that are less well known by the general public but are still important given the increasing number of wearable communicating sport/health devices.

3.1 WI-FI
The most widespread wireless network technology on portable computer devices is Wi-Fi. Wi-Fi is unlicensed. There are different types of Wi-Fi networks: for example, Wi-Fi IEEE 802.11b and 802.11g use the 2.4 GHz band and 802.11a uses the 5 GHz band. 5 GHz signals are absorbed more readily by solid objects in their path due to their smaller wavelength, and for the same power they propagate less far than 2.4 GHz signals. The average Wi-Fi range is between 30 m and 100 m. Mobile computer devices that integrate Wi-Fi are not made to seamlessly switch between nearby available Wi-Fi networks. There are a number of security issues with Wi-Fi: WEP encryption has been broken for a while; WPA encryption creates separate channels for each user, but most public Wi-Fi access points only ask for authentication and do not encrypt afterwards. Although privacy is beyond the scope of the paper, it is important to note that for privacy's sake, and given the sensitive aspects of some communicated information such as a heart rate profile, secure networks should be considered. WiMAX is different from Wi-Fi, is more dedicated to long-range systems covering kilometers, and is rarely integrated in mobile devices for now. The peak power of Wi-Fi 802.11b/g is 100 mW and of 802.11a is 1 W. Kang and Gandhi [8] found that the near-field SAR exposure to a Wi-Fi 100 mW patch antenna radiating from a laptop computer placed 10 mm below a planar phantom is 2.82 W/kg 1g SAR and 1.61 W/kg 10g SAR at 2.45 GHz, and 1.64 W/kg 1g SAR and 0.53 W/kg 10g SAR at 5.25 GHz. A French organization study found that all the Wi-Fi 2.4 GHz cards studied are under the 2 W/kg 10g SAR limit, from 0.017 to 0.192 W/kg [9], at less than 12.5 cm.
3.2 GSM, UMTS/GPRS, 3G…
Regarding mobile phones, although more and more smartphones integrate Wi-Fi, their most widespread wireless network technology remains the one provided by their telecom operator: GSM (around 900 MHz or 1800 MHz; maximum distance to cell tower from 1 km to 20 km [10]), GPRS, EDGE, UMTS 3G (around 2 GHz; from 144 kB/s in moving vehicles to more than 2 MB/s for stationary users [11]; maximum distance to cell tower from 0.5 km to 5 km [10])… The telecom operators have paid licenses to be able to use them. There is different encryption for each user using a cell. Mobile phones switch seamlessly between GSM/3G cells, and more and more mobile phones now integrate Wi-Fi. However, only a few phones and telecom providers allow users to start a phone call on Wi-Fi and switch seamlessly to GSM/3G when the user leaves the Wi-Fi zone. It is also difficult for users to switch to networks other than the ones provided by their telecom provider. From an energy consumption point of view, according to Balasubramanian et al. [12], for 10 kB data size, GSM consumes around 3.25 times less than 3G and 1.5 times less than Wi-Fi (if the cost of scan and transfer is taken into account). However, for 500 kB+ data size, GSM consumes as much as 3G and twice as much as Wi-Fi (even if the cost of scan and transfer is taken into account). 3G consumes around 1.9 times more than Wi-Fi (if the cost of scan and transfer is taken into account). It is worth noting that the energy spent on GSM/3G networks can vary a lot depending on the distance between the user and the network antenna, as the distance can be quite long compared to Wi-Fi: for example, for GSM 900, a phone's power output may be reduced by 1000 times if it is close to the base station with a good signal. The peak handset power limit is 2 W for GSM 900 and 1 W for GSM 1800. The peak power of 3G UMTS is 125 mW at 1.9 GHz. It has been found that in rural areas, the highest power level for GSM was used about 50% of the time, while the lowest power was used only 3% of the time. The corresponding numbers for the city area were approximately 25% and 22%. The results showed that high mobile phone output power is more frequent in rural areas, whereas the other factors (length of call, moving or stationary, indoor or outdoor) were of less importance [13]. Factors that may influence the power control are the distance between handset and base station and the attenuation of the signal, the length of calls (the phone transmits at the maximum allowed power level at the onset of each call), and the change of connecting base station, “hand-over” (the phone will temporarily increase output power when connecting to a new base station). Hand-overs are made when the mobile phone is moved from one cell covered by one base station to another cell, but may also occur on demand from the base station owing to a high load on the network at busy hours. The iPhone 3G user guide indicates that the 10g SAR is 0.235 W/kg for GSM 900, 0.780 W/kg for GSM 1800, 0.878 for UMTS 2100, and 0.371 for Wi-Fi. It was worse for the iPhone, with a 1.388 W/kg 1g SAR for UMTS 1900 and a 0.779 W/kg 1g SAR for Wi-Fi. Combined with the fact that its user guide mentions that the SAR might be higher at closer than 1.5 cm, and that both 3G and Wi-Fi may be enabled at the same time, this means that the iPhone can have a 1g SAR much higher than the 1.6 W/kg limit: above 1.388 + 0.779 = 2.167 W/kg.
3.3 Bluetooth, Zigbee, ANT…
Bluetooth, based on IEEE 802.15.1, also uses the 2.4 GHz band, with a data rate of around 1 MB/s. A large number of mobile phones integrate Bluetooth. Discovery and association of Bluetooth devices are not designed to be seamless. Bluetooth 2.1+ pairing uses a form of public key cryptography and is less prone to Wi-Fi types of attacks [14]. The peak power of Bluetooth ranges from 1 mW to 2.5 mW. The normal range of Bluetooth is around 10 m, which is lower than Wi-Fi. With its lower distances, Bluetooth has lower consumption than Wi-Fi: around 3 to 5 times lower according to Su Min et al. [15]. However, for resource-constrained wearable devices such as heart belts and cardio/GPS watches, Bluetooth still consumes too much energy. This is the reason a new Bluetooth specification called “Bluetooth low energy” has been released recently; it would consume between 1% and 50% of normal Bluetooth depending on the application [16]. “Bluetooth low energy” is more seamless: it can support connection setup and data transfer as low as 6 ms. “Bluetooth low energy” can use AES-128 encryption. As Bluetooth consumed too much energy for resource-constrained devices, other networking technologies have been used. Zigbee, based on IEEE 802.15.4-2003, runs at 868 MHz in Europe, 915 MHz in the USA and Australia, and 2.4 GHz in most other places. Zigbee consumes around 10 to 14 times less than Wi-Fi according to Su Min et al. [15]. The downside of Zigbee is that it has a much lower data rate, from 20 kB/s to 250 kB/s. Another main network technology used in many sport/health monitoring devices is ANT, which is proprietary. ANT and Zigbee can send data in less than 10 ms. However, ANT can send bigger files faster, as its transmission rate is 1 MB/s, which means lower energy to submit large files than Zigbee [17]. Fyfe reports even lower energy consumption for ANT compared to Zigbee for small data sizes (8 bytes) [18]. Anyway, Zigbee and ANT are not available on mobile phones. “Bluetooth low energy” seems a good candidate to replace ANT and Zigbee due to its openness and the number of products already using Bluetooth. Martínez-Búrdalo et al. [19] have found that Bluetooth generates a very low 10g SAR of around 0.037 W/kg. Unfortunately, none of these networking technologies are considered as main connecting technologies, maybe due to their limited range.
3.4 Comparison Summary
[Table 1. Networking Technologies Comparison. Column headings recovered from the source: Network; 10g SAR (W/kg); Maximum Distance (m); Energy Consumption (Wi-Fi reference); Openness; Security.]

Based on the networking technologies that we have surveyed in the previous sections, exposure can be significantly reduced by choosing among the different networking technologies available. On recent mobile phones, there are 4 main choices: GSM, 2G, 3G and Wi-Fi. However, it may be cumbersome for the user to learn which networking technology to choose depending on what they are doing with their phone, and to constantly switch manually from one network to another. Fortunately, recent mobile phone operating systems such as Android provide an Application Programming Interface (API) that allows third-party applications to switch from one networking technology to another. In this section, we first describe our network selection model and its algorithms, and then we explain how we have validated our approach with a proof-of-concept application implemented on an Android phone.

4.1 Network Switching Selection Model and Algorithms
We define Ni as a network i among a set of n available networks, with i in [1; n]. Each network Ni is associated with a 10g SAR in W/kg, defined as SARi, for the specific device carried by the user. The related work surveyed above has underlined that different mobile devices have different SARs.

In this case, the optimal policy to minimise the electromagnetic radiation from the mobile device is to select the network with the lowest SAR; with the networks sorted by increasing SARi:

Nchosen = N1
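A minimal sketch of this baseline policy (our illustration, not the authors' Android implementation), using the per-network 10g SAR figures quoted above for the iPhone 3G:

```python
def choose_network(networks: dict) -> str:
    """Pick the available network with the lowest 10g SAR.

    `networks` maps a network name to its SAR in W/kg for the device at
    hand.  Sorting by SAR and taking the first element implements
    Nchosen = N1.
    """
    ranked = sorted(networks.items(), key=lambda item: item[1])
    return ranked[0][0]

# 10g SAR values quoted above from the iPhone 3G user guide.
available = {"GSM 900": 0.235, "GSM 1800": 0.780,
             "UMTS 2100": 0.878, "Wi-Fi": 0.371}
print(choose_network(available))  # -> "GSM 900"
```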
[Figure: ball state diagram with states Static, Thrown, Rolled and Floating; transitions are triggered when S < Ss.]

Space Ball 2
In "SPACE BALL 2", we use the ball which has the sound sensor
and the wireless communication module inside it. This application Figure 9. Description of Rules to Get Points and Make New
generates dynamic CG effects on the play field that change in Targets in Space Ball 2
sync with the ball's characteristic motion as detected by the ball
states recognition program. We prepared three different ball states,
"bounce", "rolled", and "flying" which are detected by the
program. The program then uses the position information of the
ball (this is achieved through the high speed camera) as
parameters to decide the direction of the game. Table 3 and Figure
4 show how the direction of game is determined by using ball’s
information. This application is designed as a multi-player
cooperative game. There is a time limit of 60 seconds per game. A
player can score by hitting the ball on the target projected CG spot.
The targets, of the same color, are displayed on the play field.
Their color and placement can be changed when player bounces
the ball outside the field, (at this moment the color of the ball also
changes). The players can choose their favorite placement of the
target spots, making it easier to get high scores by changing the Figure 10. Floor Coordination
ball's color though dribbling it on the floor with their hands.
Hitting a target in one bounce, or rolling the ball on a line of
targets generates higher scores. Figure 9 shows the rules of how to
get points and how to make new targets, as well as how to make
the time limit longer.
Sound effects have an important role in Space Ball 2. We applied up-beat music as the basic BGM during play. This up-beat music is aimed at making people excited during the game. On top of the continuous BGM, we added four different sound effects in accordance with the ball's bounce. Sounds differ based on the context of the scene, letting players know what happened in their game (e.g., they changed the target coordinates, they got points by hitting the target, or the ball simply bounced inside the play field but failed to hit the target). Each sound was designed and recorded beforehand and plays when a bounce occurs with no delay.
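The ball state recognition is only outlined in this excerpt; a toy classifier along those lines might look as follows, where the sound threshold plays the role of Ss in the state diagram above, and all names and thresholds are our assumptions:

```python
def classify_ball_state(sound_level, position_trace, s_threshold, floor_z=0.0):
    """Toy classifier for the ball states driving the game logic.

    Assumptions (ours, for illustration): a bounce produces an impact
    sound above `s_threshold`; a ball whose tracked height stays at
    floor level is rolling; otherwise it is flying.  The real system
    fuses the in-ball sound sensor with high-speed camera tracking.
    """
    if sound_level >= s_threshold:
        return "bounce"
    if all(abs(z - floor_z) < 0.05 for (_, _, z) in position_trace):
        return "rolled"
    return "flying"

# A loud impact reading is classified as a bounce.
print(classify_ball_state(0.9, [(0.1, 0.2, 0.0)], s_threshold=0.5))
```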
1 http://www.cdvp.dcu.ie/iCLIPS
2 http://research.microsoft.com/en-us/um/cambridge/projects/sensecam
by iconized attributes, including people, location, actions on documents and time stamps, and then allow filtering/searching for an action on the document.

Most work in PLL capture to date has focused on short term studies of a few days or a week or two of data. To support research exploring technologies to truly augment human memory, it is our belief that much longer term PLL archives are needed for research. As should be obvious, capturing PLLs using current technologies requires a considerable investment by the subject capturing their own PLL. Software must be monitored and data carefully archived; more demandingly though, the batteries on peripheral devices must be charged regularly and important data uploaded to secure, reliable storage locations. The iCLIPS project has so far gathered PLLs of 20 months duration from 3 subjects. Our experiences in capturing and archiving this data are described in detail in [11].

3. MEMORY SUPPORT NEEDS
Since people will only turn to memory aid tools when they feel unconfident about or incapable of retrieving a piece of information from their memory, we believe that a sound understanding of the memory problems people usually encounter in their daily life will provide a guide to the functionality of memory aid tools.

In this section we first explain memory problems and the mechanisms which cause them based on psychology research, we then review existing studies exploring normal people's memory failures and needs for memory aid tools in daily life, and finally we discuss the possible functions that PLLs may be able to provide for augmenting human memory.

3.1 Theoretical Review
Memory is a cognitive ability to encode, store and retrieve information. Encoding is the process of converting externally sensed stimuli into signals which the neural system in the brain can interpret, and then absorbing the newly received information into long term storage, termed long term memory (LTM). Retrieval is the process of bringing back information from LTM storage. Different types of retrieval are used for different types of memory. The two basic categories of memory systems are procedural memory and declarative memory. Procedural memory is also called implicit memory, meaning that it is usually retrieved without explicit awareness or mental effort. Examples include memory of motor skills, oral language, and memory of some types of routines. Procedural memory usually requires minimal cognitive resources and is very durable. It has been found that even people with serious global memory impairments have preserved procedural memory. For this reason, memory aids for procedural memory are not explored in this paper. Declarative memory, as opposed to procedural memory, usually involves explicit awareness during encoding and retrieval. There are two major types of declarative memory: semantic memory, meaning memory of facts, and episodic memory, referring to the memory of experiences, which is usually related to temporal context. Most of our memory problems are declarative memory problems.

Although most memory problems can only be observed during retrieval, since current techniques are not advanced enough to know what is happening in the human mind, failures at any stage can cause problems in memory. For example, failure to encode encountered information makes the information unavailable in one's memory. In The Seven Sins of Memory [12], Schacter characterizes seven daily memory problems: transience, absent-mindedness, blocking, misattribution, suggestibility, bias, and persistence. These sins generally fall into three categories of memory problems, namely: forgetting (transience, absent-mindedness, blocking), false memory (misattribution, suggestibility, bias), and the inability to forget (persistence). In the remainder of this section, we explain the mechanisms behind these memory sins (problems), and discuss the possible solutions that PLLs can offer.

Table 1. Seven Sins of Memory
transience: the gradual loss of memory over time
absent-mindedness: the inability to retrieve a memory due to a lack of attention while encoding the information
blocking: the failure to retrieve encoded information from memory due to the interference of similar information retrieved or encoded before (proactive) or after it (retroactive)
misattribution: remembering information without correctly recollecting where the information is from
suggestibility: reconstructing a set of information with false elements, which come from the suggested cues at the time of retrieval
bias: the influence of one's current emotions or knowledge on the retrieved or reconstructed memory
persistence: the inability to forget things which one wants to forget

Encoding newly encountered information or thoughts requires processing them in a short term memory (STM) system, which is called working memory (WM). The WM system comprises subsystems including separate short term storage channels for visuospatial and acoustic (sound) information, and an episodic buffer which links newly incoming information with what is already in long term storage. WM also has a central executive module which assigns cognitive resources (especially attention) to the channels [13, 14]. Thus the absence of attention can reduce encoding efficiency, or even cause encoding failure for information input at that time (this is the "absent-mindedness" of the seven sins of memory). Information which was paid more attention is more likely to be better encoded, and therefore more likely to be better remembered. It has been suggested that emotion can often influence attention at encoding, and therefore influence the memory of items.

Regarding LTM, it has been argued that information in human memory exists in an associative network; the activation of one piece of information (by external input, e.g. presenting that information again) can trigger the memory of its linked nodes [15]. The stronger the link, the more likely the trigger is to happen. This is why recall is easier when a cue is presented (cued recall) than when there is none (free recall). It has been suggested by many psychology studies that it is the lack of proper links to information, rather than the loss of the memory of the information itself, that causes "forgetting". Since one node of memory may be linked to several other nodes, it is important that only the required
information be triggered. Thus, inhibition is an important function of human memory. However, it may also induce 'blocking'. A classic example is the 'tip of the tongue' (TOT) phenomenon, where one is unable to recall the name of some well remembered information, feeling that the memory is being temporarily blocked.

False memory, meaning memory errors or inaccurate recollection, also arises due to the structure of the associative memory network. According to Loftus [16], every time a piece of memory is retrieved, it is actually reconstructed from associated small nodes of information. False memories can bring various problems in daily life. For example, "misattribution" by witnesses can cause serious legal problems if a witness does not know whether the source is from reality, or was in a dream, on TV, or even imagined.

As for the sin of persistence, this is a problem of mental well-being as much as a cognitive problem with memory. The reason for persistence is that unwanted and sometimes even traumatic memories are so well encoded, rehearsed and consolidated that they cannot easily be buried or erased. According to theories of forgetting, these memories can be "blocked" if external cues form strong links with memories of other experiences, ideally happy experiences. Therefore, people who rehearse more happy memories may find these helpful in replacing their memories of traumatic experiences. The question of which pieces of happy memory to present is beyond the scope of our work, and is left to clinical psychologists.

In summary, there are two main reasons for difficulty in retrieving a memory, namely: absence of the memory due to failure at encoding, or the lack of proper and strong cues to link to and access the correct pieces of memory. For memory problems arising from both causes, PLLs may have the potential to provide supplements. Data in PLLs can provide details which one failed to encode due to "absent-mindedness", or which have faded in one's memory over time. It can also provide cues for memories which have been "blocked".
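To make the associative-network account above concrete, the toy sketch below models memories as nodes with weighted links and estimates how strongly a cue activates a target; all nodes and weights here are invented for illustration, not taken from the psychology literature.

    # Toy associative-memory network: nodes linked with weighted edges.
    # Retrieval likelihood grows with cue-to-target link strength.
    # Nodes and weights are made up for illustration.
    memory_links = {
        "Jack":       {"phone call": 0.8, "sunny day": 0.3},
        "sunny day":  {"photo": 0.6},
        "phone call": {"photo": 0.7},
    }

    def activation(cue: str, target: str, visited=None) -> float:
        """Strongest activation path from a cue to a target (spreading activation)."""
        if visited is None:
            visited = set()
        visited.add(cue)
        best = 0.0
        for node, weight in memory_links.get(cue, {}).items():
            if node == target:
                best = max(best, weight)
            elif node not in visited:
                best = max(best, weight * activation(node, target, visited))
        return best

    # Cued recall: a strong external cue ("Jack") activates the photo memory.
    print(activation("Jack", "photo"))   # 0.8 * 0.7 = 0.56
    # Free recall has no cue, so the photo may stay below threshold:
    # stored but "forgotten", exactly the gap a lifelog cue is meant to bridge.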
3.2 Empirical Studies
In this section, we further explore the needs for memory aids through some documented empirical studies, and use the results of this work to focus our investigation.

In [17], Elsweiler et al explored people's daily memory problems with a diary study in working settings with 25 participants from various backgrounds. They concluded that the participants' diary inputs could be split into 3 categories of memory problem: retrospective memory problems (47% of their data entries), prospective memory problems (29%), and action slips (24%), the latter being a type of prospective memory failure caused by firmly routine actions rooted in procedural memory. Since prospective memory failures and action slips usually happen before the person is made aware of them by experiencing the consequent error caused by the problem, it is unlikely that people will actively seek help from memory aids in these cases, unless the memory aid is proactive and intelligent enough to understand what is going on.

Lamming et al [18] also carried out a diary study to explore possible memory problems during work, and found that the most frequently occurring memory problems include: forgetting a person's name, forgetting a paper document's location, and forgetting a word or phrase. Prospective memory problems were also found to be frequent and usually severe.

The diary study by Hayes et al [19] took a more direct approach and explored the situations in which people wanted to use their memory aid tool, a mobile audio recording device called the Personal Audio Loop, to recollect the recorded past. The questions in their diary study not only covered memory failures, but also how much time the participants would be willing to spend on recalling such content. Their results showed that for neutral events, people would spend an average of 336 seconds (σ = 172) finding the required information from voice recordings. 62% of the reported reasons for returning to an audio recording were "cannot remember"; 33% of that 62% were transience-type retrieval failures, while 29% of the 62% were due to failures of encoding (e.g. absent-mindedness). Another 26% of their reasons for searching recorded audio were to present the recordings themselves to other people. Finally, 12% of recordings were marked as important before recording. While the reasons for rehearsing these predicted important records were not described, these results indicate that important events are likely to be reviewed, and that people may want to "rehearse" recordings of important parts to consolidate their memory of information encountered during the period. Due to limitations of the information they record (selective audio recording), and the specific tool they use, the scenarios in which people may need memory aids might be limited. For example, when the experience is largely made of visual memory, audio records may not be helpful or desired.

3.3 Summary
While all of the above studies successfully discovered some daily memory problems, the non-monitored self-reporting approach is limited in that people can only report their needs for memory support when they are aware of a difficulty in retrieving a memory. While it is true that people may only seek help for specific parts of their memory when they realize that they have problems recollecting these pieces of information, they are not always very clear about what they actually want to retrieve until they bring back the piece of memory. For example, sometimes people just want to review (mentally) some past episodes for fun or out of nostalgia. They usually look at photos or objects which are related to past events, and which bring them more vivid memories of past experiences. Due to the richness of their data, lifelogs can provide more details about the past than any physical memento can.

In short, lifelogs are a good resource for supporting retrospective memory problems, including those we have gradually forgotten, distorted, or missed while encoding, and for consolidating memory of useful information. They can also be used to provide digital copies of episodes (e.g. when we need to give a video record of a meeting to someone who failed to attend), or to provide memory cues that trigger a person's organic memory of the original information, experiences, emotions, or even thoughts. Lifelogs might also be able to improve a subject's memory capability by training them to elaborate on or associate pieces of information.

Indeed, supporting people's memory is not only a matter of finding the missing or mistaken parts of memory for them, but also of improving their long term memory capabilities. It has been argued that better memory is often related to the ability to associate things, and to make decisions about which information to retrieve. For example, older people usually have less elaborated memory [20].
In the study reported in [21], psychologists found that people with highly elaborated daily schemas recall activities from the previous week better than people with poorly elaborated schemas. Therefore, memory-supporting tools may be able to assist people to associate things in order to elaborate and consolidate their memories, which can facilitate retrieval by strengthening the links between memories and the cues that lifelog systems can provide, potentially enhancing their efficiency at performing various tasks.

4. GUIDELINES FOR DEVELOPING LIFELOG APPLICATIONS
Based on the previous sections, lifelogs should be able to provide the following:

• Memory cues, rather than external copies of episodic memory.
• Information or items themselves: semantic memory support, when one needs exact details about previously encountered information, or when one needs the original digital item, e.g. a document.

Whether it is the information itself which is needed, or the memory triggered by the target, it is important that these items or this information can be retrieved when needed, and that relevant retrieved results can be recognized by the user. Indeed, what to retrieve, and even what to capture and store in lifelogs, depends on what needs to be presented to the user to serve the desired memory aid functions.

4.1 Presenting
There are basically two rules for presenting information:

1. Provide useful information as memory cues
When items are presented to the user, it is desirable that the information shown can be recognized by the user as what they want. If the retrieval targets are cues that are expected to be useful triggers of the user's own memory about something, e.g. experiences which cannot be copied digitally, it is also essential that the retrieved targets are good memory cues for the memory that the user wants to recall, e.g. the memory of an experience.

Lamming et al [18] suggested that memory supporting tools should not only provide the episodes or information one forgets, but also episodic cues, including other episodes with the temporal relationships among them, together with information about the characteristics of these episodes. It is suggested in [8] that the features usually visible in episodic memory cues are: who (a face, any people in the background), where (a room, objects and landmarks in the environment), when (time stamps, light conditions, season, clothing and hair styles), and what (any visible actions, the weather, etc.).

2. Avoid information overload
It is also necessary to avoid information overload when presenting material as a memory aid. In [22], it was found that when unnecessary information is reduced and important parts of the information are played more slowly, their memory aid application achieved its best results. We suggest that text or static images, which can be used as a summary of events, can also be good at reducing information overload compared to viewing videos (e.g. [10]). This requires that the system either detect the important parts, or digitize and textualize describable features of physical world entities or events to facilitate retrieval. The term digitize in this paper means representing the existence of a physical world entity as a digital item, e.g. an image or a line of data in a database. These items can then be searched directly using certain features (cues), rather than via the features of the episodes in which the information was encountered, e.g. the features of a person and a corresponding profile. Overall, appropriate cues really depend on what people tend to remember. Therefore it is important to explore the question of what people usually remember about the target.

4.2 Data Capture
In principle, the more information that is captured and stored in lifelogs, the greater the chance that the required information can be found in the data collection. However, the more data that is collected, the more the noise level may also increase and impose a greater burden on the life logger. In order for a PLL to support the above memory augmenting functions, the following data channels are needed:

1. Visual
For the majority of individuals, most information is input via our eyes; therefore it is important that encountered visual information be captured. While video can capture almost every moment when it is recording, watching video streams imposes a heavy information load, whereas browsing static images or photos is a much easier job. Some automatic capture cameras have been shown to provide rich memory cues [23]. The Microsoft SenseCam is one such wearable camera, which automatically captures images throughout the wearer's day. It takes VGA quality images at up to 10 images per minute. Image capture can be triggered either by a sensed change in the environment or by a fixed timeout, as sketched below. Other examples include the EyeTap [24] and The Other Brother [25].
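The following sketch illustrates a SenseCam-style trigger policy (capture on a sensed environmental change or on a fixed timeout); the thresholds and the sensor and camera hooks are placeholder assumptions, not the device's actual firmware.

    # Sketch of a SenseCam-style capture policy: take a picture when the
    # environment changes noticeably, or when a fixed timeout expires.
    # read_light(), read_accel() and capture_image() are placeholder hooks.
    import time

    LIGHT_DELTA = 50     # change in light level that triggers a capture
    ACCEL_DELTA = 0.5    # change in acceleration that triggers a capture
    TIMEOUT_S = 6.0      # fixed timeout (at most ~10 images per minute)

    def capture_loop(read_light, read_accel, capture_image):
        last_light, last_accel = read_light(), read_accel()
        last_shot = time.monotonic()
        while True:
            light, accel = read_light(), read_accel()
            changed = (abs(light - last_light) > LIGHT_DELTA or
                       abs(accel - last_accel) > ACCEL_DELTA)
            if changed or time.monotonic() - last_shot > TIMEOUT_S:
                capture_image()
                last_shot = time.monotonic()
                last_light, last_accel = light, accel
            time.sleep(0.1)

The fixed timeout guarantees a baseline of coverage, while the sensed-change rule concentrates images around the moments most likely to become memory cues.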
2. Speech
Another important source of information in daily life is audio. For example, much useful information comes from conversations. However, as mentioned previously, continuous audio recording has been argued to be intrusive and unacceptable to surrounding people. For this reason, it is difficult to carry out continuous audio recording. Some existing studies, such as [9] discussed earlier, record audio for limited significant periods; however, we chose not to do this, since it requires active decisions about when to begin and end capture, and careful choices to avoid privacy problems. We prefer continuous and passive capture modes which are non-intrusive. An alternative source of much of the information traditionally conveyed in spoken conversation is now appearing in digital text communications, as described in the next section.

3. Textual (especially from digital born items)
Nowadays, we communicate more and more with digital messages (email, instant messages, and text messages). These content sources contain an increasing portion of the information used in daily life which used to come from spoken conversations.
These digital resources, usually in the form of text, have less noise from the surrounding environment and irrelevant people, and are therefore less likely to intrude on a third person's privacy. Text extracted from communication records (e.g. emails, text messages) can even be used to help narrate events and represent computer activities so as to trigger related episodic memory (e.g. [10]).

4. Context
As mentioned earlier, context information such as location and the people present can provide important memory cues for events [26]. Such information is therefore both important for presenting events and useful for retrieving items related to events.

4.3 Retrieval
The final and possibly most challenging component of an augmented memory application built on a PLL is retrieval. It is essential that useful information be retrieved efficiently and accurately from a PLL archive in response to the user's current information needs. In order to be used most efficiently by the user, retrieval must have a high level of precision so as not to overload the user's working memory. It is recognized that a key feature of good problem solving is the ability of an individual to retrieve highly relevant information so that they do not have to expend effort on selecting pertinent information from among related information which is of no direct use in the current situation. Being able to filter out non-relevant information is an important feature of good problem solving.

Finding relevant information in such enormous data collections to serve a user's needs is very challenging. The characteristics of PLLs mean that they pose a number of challenges for retrieval which differ from those in more familiar search scenarios such as search of the World Wide Web. Among these features are that: items will often not have formal textual descriptions; many items will be very similar, repeatedly covering common features of the user's life; related items will often not be joined by links; and the archive will contain much non-useful data that the user will never wish to retrieve. The complex and heterogeneous nature of these archives means that we can consider them a labyrinth of partially connected related information [27]. The challenge for PLL search is to guide the owner to elements which are pertinent to their current context, in the same way as their own biological memory does in a more complex and integrated fashion.

Traditional retrieval methods require users to generate a search query to seek the desired information. Thus they rely on the user's memory to recall information related to the target in order to form a suitable search query. Often, however, the user may have a very poor recollection of the item from their past that they wish to locate. In this case, the system should provide search options for features that people tend to remember. For example, the location of an event and the people attending it may be well remembered, so the search engine should enable search using this information. In fact, the user may not even be aware of or remember that an item was captured and is available for retrieval, or even that a particular event occurred at all, so they will not even look for this item without assistance.

We can illustrate some of the challenges posed by PLL retrieval using an example. Consider a scenario where someone is looking for a particular photo from her PLL archive. All she remembers about the picture is that the last time she viewed it, the sun was glaring in the window and she was talking on the phone to her friend Jack. Conventional search techniques would not be capable of retrieving the correct photo based on these context criteria that are unrelated to its contents. Use of the remembered context would enable her to search for pictures viewed while speaking with Jack while the weather was sunny. The notion of using context to aid retrieval in this and other domains is not new. Context is a crucial component of memory for recollection of items we wish to retrieve from a PLL. In previous work we examined the use of various forms of context data, or combinations of them, for retrieval from a PLL [28]. This work illustrated that in some situations a user can remember context features such as time and location much better than the exact content of a search item, and that incorporating this information in the search process can improve retrieval accuracy when looking for partially remembered items.

Ideally, as argued by Rhodes [29], a memory augmentation system should provide information proactively according to the user's needs in their current situation. Many studies in ubiquitous computing have been devoted to detecting such events. One example is retrieving a recording related to an object when someone touches the object, with the sensor information passed to the retrieval system as a query: the Ubiquitous Memories system [7] automatically retrieves video recordings that were automatically tagged to target objects when those objects were touched. Face detection techniques are used in [8] to tag a person related to a memory, and to enable automatic retrieval of personal information triggered by detection of that face.

Satisfying the need for high precision retrieval from PLLs discussed earlier requires search queries to be as rich as possible, including as much information as possible about the user's information need, and then exploiting this information to achieve the highest possible effectiveness in the search process. Our underlying search system is based on the BM25F extension to the standard Okapi probabilistic information retrieval model [30]. BM25F is designed to combine multiple fields from documents (content and context) in a theoretically well motivated way for improved retrieval accuracy. BM25F was originally developed for search of web-type documents which, as outlined above, have very different characteristics to a lifelog. Thus we are also interested in work such as [31], which explores ways of combining multiple fields for retrieval in the domain of desktop search. Our current research is extending our earlier work, e.g. [28], to investigate retrieval behaviour using our experimental PLL collections and to explore new retrieval models specifically developed for this data. In addition, PLL search can also include features such as biometric measures to help in the location of highly relevant information [4].
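To illustrate the core idea of BM25F [30], the sketch below folds per-field term frequencies into a single weighted frequency before applying BM25-style saturation; the field names, weights, and the simplified length normalization are our assumptions, not the configuration used in iCLIPS.

    # Sketch of BM25F-style scoring: per-field term frequencies are combined
    # into one weighted frequency before BM25 saturation, rather than scoring
    # each field separately. Field names and weights are illustrative.
    import math

    FIELD_WEIGHTS = {"content": 1.0, "location": 2.0, "people": 2.5}
    K1, B = 1.2, 0.75

    def bm25f_score(query_terms, doc_fields, df, n_docs, avg_len):
        """doc_fields: {field: list of terms}; df: document frequency per term."""
        # Weighted document length for normalization.
        dl = sum(FIELD_WEIGHTS[f] * len(terms) for f, terms in doc_fields.items())
        score = 0.0
        for t in query_terms:
            # Weighted term frequency across fields (the core BM25F idea).
            tf = sum(FIELD_WEIGHTS[f] * terms.count(t)
                     for f, terms in doc_fields.items())
            if tf == 0:
                continue
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1)
            norm = K1 * ((1 - B) + B * dl / avg_len)
            score += idf * tf / (norm + tf)
        return score

Weighting context fields such as location and people more heavily than raw content reflects the observation above that these remembered features are often the user's strongest handle on a lifelog item.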
5. iCLIPS - A PROTOTYPE PLL SEARCH SYSTEM
The iCLIPS project at DCU is developing technologies for effective search of PLLs. To support this research, three researchers are carrying out long term lifelog data collection. As outlined in Section 2, these collections already include 20 months of data, including visual capture of physical world events with Microsoft SenseCams [32], full indexing of accessed information on computers and mobile phones, and context data including location via GPS and people present via Bluetooth. The Microsoft
SenseCam also captures sensor information such as light status and movement (accelerometer). Our system indexes every computer activity and SenseCam image with time stamps and context data including location, people, and weather. It enables search of these files by textual content and by the context data described above.

Part of our work continues to focus on the development of novel, effective search algorithms to best exploit content and context for PLL search. The other focus of the project is the development of a prototype system to explore user interaction with a PLL to satisfy their desire for information derived from their previous life experiences.

One of the reasons for the success of popular established search engines such as Google is that their interface is simple to use. Once a few concepts have been understood, users are able to use these search engines to support their information search activities. However, simple interfaces to existing collections work well to a large extent due to the features of the data being searched and the background of the users. In the case of web search engines, the domain knowledge, search experience and technical background of searchers is very varied. However, the size of the collection being searched, with its inherent redundancy of data and with information often repeated in different forms in multiple documents, means that pieces of information are accessible from different sources using a wide range of queries from users with differing linguistic sophistication and knowledge of the domain. Additionally, in the case of the web, link structures generated by the community of web authors can be exploited to direct searchers to authoritative or popular pages. In the case of specialised collections, such as medical or legal collections, users are typically domain experts who will use a vocabulary well matched to that of the documents in the collection. As outlined in Section 4.3, the characteristics of PLL collections are quite different to conventional search collections. An interface to search a PLL collection requires that the user can enter queries using a range of content and context features. The memory associations between partially remembered life events mean that more sophisticated interfaces supporting browsing of the PLLs using different facets are likely to be needed to support the satisfaction of user information needs. Essentially, users need an interface that enables them to explore the labyrinth of their memory using different recalled facets of their experiences.
Figure 1 shows our prototype interface for use of a PLL as a daily memory aid for normal people. In particular, it aims to serve the functions of: providing specific information or digital items to supplement the parts of memory which are not available to be retrieved; and providing cues of specific episodes to assist the user in rehearsing experiences from that period. It also seeks to assist users in improving memory capability through repeatedly associating events or information. This interface requires user effort to look for or choose the information to be presented, thus both searching and browsing panels are included.

Search
The interface provides a range of search options to cater for the different types of information people may be able to recall about episodes or items, such as location, people present, weather conditions and date/time. We understand the burden of trying to recall and enter all of these details for a single search, so we adopt the virtues of navigation, and put more weight on the
presentation and browsing of results. This is particularly important in cases where overly general search queries may bring too many results for easy presentation. For example, sometimes people just want to have a look at what happened during a certain period, e.g. when they were in middle school, and enter a time-based query such as "year 1998"; this may result in a huge amount of result data being retrieved, which must then be explored by the user.

Navigation
To avoid information overload when a large number of items are returned as results, and to provide instant memory cues at each small step, we adopt the advantages of location-based hierarchical folder structures to let users navigate and browse search results, grouped either temporally or by attributes such as location or the core people in attendance, as sketched below. Based on the psychology literature, we believe that when, where and who are well remembered features of episodes; therefore, grouping items based on these features makes it easier for users to remember and know where their target is. It also enables them to jump to other results which have similar attributes (e.g. in the same location, or with the same group of people). By doing so, we also expect the system to help people remember more context data for each event or item, generating more useful associations in their memory and elaborating them.
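A minimal sketch of this attribute-based grouping follows; the item fields ("timestamp", "location", "people") are assumed for illustration and are not the project's actual schema.

    # Sketch of grouping retrieved lifelog items into browsable folders by
    # well-remembered attributes (where, who, when). Item fields are assumed.
    from collections import defaultdict

    def group_results(items, attribute):
        """Group items by 'location', 'people' or 'month' into folders."""
        folders = defaultdict(list)
        for item in items:
            if attribute == "month":
                key = item["timestamp"][:7]       # e.g. "1998-06"
            else:
                key = item[attribute]
            folders[key].append(item)
        return folders

    items = [
        {"timestamp": "1998-06-12T10:30", "location": "school", "people": "Jack"},
        {"timestamp": "1998-06-15T09:00", "location": "school", "people": "Anna"},
    ]
    by_place = group_results(items, "location")   # one folder per location

Regrouping the same result set under a different attribute is what lets a user jump sideways to items sharing a location or companion, as described above.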
Presenting results
When presenting the results, we provide context cues to help people recognize their target and related folders more easily. Since temporally adjacent activities are argued to be good episodic memory cues, the system enables a preview of folders by presenting landmark events or computer activities (if there are any) on a timeline. A "term cloud" (a group of selected keywords, similar to a conventional "tag cloud") of the computer activities is also presented as text below the timeline; by clicking a word, its frequency of appearance is displayed. Again, this is designed to provide more memory cues for recalling what the user was doing with documents which contain such keywords. For example, one may remember that the target needed was previously encountered during a period when he/she read a lot about "SenseCam". The names of locations and people are also included in the term clouds for the same reason; a sketch of the keyword selection follows.
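The sketch below shows one plausible way to build such a term cloud by frequency counting; the tokenization and stop-word list are simplifications, not the project's actual pipeline.

    # Sketch of building a "term cloud" for a result folder: count words in
    # the computer activities of the period and keep the most frequent ones.
    # The stop list and tokenization are simplifications.
    from collections import Counter
    import re

    STOP = {"the", "a", "of", "and", "to", "in", "is", "for"}

    def term_cloud(documents, k=20):
        counts = Counter()
        for text in documents:
            for word in re.findall(r"[a-z]+", text.lower()):
                if word not in STOP:
                    counts[word] += 1
        # Clicking a word in the interface would display counts[word].
        return counts.most_common(k)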
Investigation," in Designing for Reflection on Experience,
Due to the complex functions provided in the interface, it is not Workshop at CHI 2009, Boston, MA, U.S.A., 2009.
suitable for portable or wearable devices. Thus it is not aimed at [2] Berry, E., et al., "The use of a wearable camera,
solving memory problems which need solution urgently while SenseCam, as a pictorial diary to improve
the person is away from computers. Alternative interfaces autobiographical memory in a patient with limbic
potentially automatically taking account of current user context encephalitis: A preliminary report," Neuropsychological
(location, associates nearby, and time) would be needed for Rehabilitation: An International Journal, vol. 17, pp. 582
mobile interaction is planned to a part of our further study. - 601, 2007.
[3] Bell, G. and Gemmell, J., Total Recall: Dutton 2009.
We are currently undertaking user studies to evaluate the [4] Kelly, L. and Jones, G., "Examining the Utility of
prototype system. These evaluations include the reliability with Affective Response in Search of Personal Lifelogs," in 5th
which episodes in the results can be recognized from the Workshop on Emotion in HCI, British HCI Conference,
features presented to the searcher, whether they feel that it is Cambridge, U.K, 2009.
easy to recall at least one piece of information required by the [5] Devaul, R. W., "The memory glasses: wearable computing
search fields, and the effectiveness of the retrieval algorithms. If for just-in-time memory support," Massachusetts Institute
these functions are fully working, we can explore how the life of Technology, 2004.
loggers prefer to use these data in supporting their memory, and [6] Lee, H., et al., "Constructing a SenseCam visual diary as a
what functions they may want to use in different situations, with media process," Multimedia Systems, vol. 14, pp. 341-349,
our system and our data collection. 2008.
[7] Kawamura, T., et al., "Ubiquitous Memories: a memory externalization system using physical objects," Personal and Ubiquitous Computing, vol. 11, pp. 287-298, 2007.
[8] Farringdon, J. and Oni, V., "Visual Augmented Memory (VAM)," in Proceedings of the 4th IEEE International Symposium on Wearable Computers, 2000.
[9] Vemuri, S., et al., "iRemember: a personal, long-term memory prosthesis," in Proceedings of the 3rd ACM Workshop on Continuous Archival and Retrieval of Personal Experiences, Santa Barbara, California, USA, 2006.
[10] Lamming, M. and Flynn, M., "Forget-me-not: intimate computing in support of human memory," in Proceedings of the FRIEND21 Symposium on Next Generation Human Interfaces, Tokyo, Japan, 1994.
[11] Byrne, D., et al., "Multiple Multimodal Mobile Devices: Lessons Learned from Engineering Lifelog Solutions," in Handbook of Research on Mobile Software Engineering: Design, Implementation and Emergent Applications, IGI Publishing, 2010.
[12] Schacter, D. L., The Seven Sins of Memory. Boston: Houghton Mifflin, 2001.
[13] Baddeley, A., "The episodic buffer: a new component of working memory?," Trends in Cognitive Sciences, vol. 4, pp. 417-423, 2000.
[14] Baddeley, A. D., et al., "Working Memory," in Psychology of Learning and Motivation, vol. 8, Academic Press, 1974, pp. 47-89.
[15] Anderson, J. and Bower, G., Human Associative Memory: A Brief Edition. Lawrence Erlbaum, 1980.
[16] Loftus, E., "Memory Distortion and False Memory Creation," vol. 24, 1996, pp. 281-295.
[17] Elsweiler, D., et al., "Towards memory supporting personal information management tools," Journal of the American Society for Information Science and Technology, vol. 58, pp. 924-946, 2007.
[18] Lamming, M., et al., "The Design of a Human Memory Prosthesis," The Computer Journal, vol. 37, pp. 153-163, 1994.
[19] Hayes, G. R., et al., "The Personal Audio Loop: Designing a Ubiquitous Audio-Based Memory Aid," 2004, pp. 168-179.
[20] Rankin, J. L. and Collins, M., "Adult Age Differences in Memory Elaboration," Journal of Gerontology, vol. 40, pp. 451-458, 1985.
[21] Eldridge, M. A., et al., "Autobiographical memory and daily schemas at work," Memory, vol. 2, pp. 51-74, 1994.
[22] Hirose, Y., "iFlashBack: A Wearable Electronic Mnemonics to Retain Episodic Memory Visually Real by Video Aided Rehearsal," in Proceedings of the 2005 IEEE Conference on Virtual Reality, 2005.
[23] Sellen, A. J., et al., "Do life-logging technologies support memory for the past?: an experimental study using SenseCam," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, San Jose, California, USA, 2007.
[24] Mann, S., "Continuous lifelong capture of personal experience with EyeTap," in Proceedings of the 1st ACM Workshop on Continuous Archival and Retrieval of Personal Experiences, New York, New York, USA, 2004.
[25] Helmes, J., et al., "The other brother: re-experiencing spontaneous moments from domestic life," in Proceedings of the 3rd International Conference on Tangible and Embedded Interaction, Cambridge, United Kingdom, 2009.
[26] Tulving, E., Elements of Episodic Memory. New York: Oxford University Press, 1983.
[27] Kelly, L. and Jones, G. J. F., "Venturing into the labyrinth: the information retrieval challenge of human digital memories," in Workshop on Supporting Human Memory with Interactive Systems, Lancaster, UK, 2007.
[28] Kelly, L., et al., "A study of remembered context for information access from personal digital archives," in Proceedings of the Second International Symposium on Information Interaction in Context, London, United Kingdom, 2008.
[29] Rhodes, B. J., "The wearable remembrance agent: a system for augmented memory," in Proceedings of the 1st IEEE International Symposium on Wearable Computers, 1997.
[30] Robertson, S., et al., "Simple BM25 extension to multiple weighted fields," in Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, Washington, D.C., USA, 2004.
[31] Kim, J., et al., "A Probabilistic Retrieval Model for Semistructured Data," in Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval, Toulouse, France, 2009.
[32] Gemmell, J., et al., "Passive capture and ensuing issues for a personal lifetime store," in Proceedings of the 1st ACM Workshop on Continuous Archival and Retrieval of Personal Experiences, New York, New York, USA, 2004.
Aided Eyes: Eye Activity Sensing for Daily Life
vergences. Eye movement has several speeds, and there are several types of high-speed eye movements. For example, the microsaccade frequency is more than 1000 Hz, and an eye blink takes around 150 ms. The method must therefore distinguish precisely between eye movements and blinks for accurate detection of eye movements. Further, the human viewing angle is almost 160° for each eye, so a 5° resolution is sufficient for information extraction, because this system aims not only at high accuracy but at extracting information with a daily-usable system, using a combination of eye activity information and image processing methods.

3.2 Eye-tracking Technology Candidates
There are several types of eye trackers. In this study, we consider four different trackers:

Camera based system: Video-based systems [9, 11] can capture a gaze texture image. This is the most commonly used tracker; however, it requires an extremely sophisticated optics system with a light source, lenses, and half mirrors. Additionally, it requires a large (table-top size) measurement system for quick eye movements (over 1000 Hz). It is possible to develop a smaller system; however, such a system currently cannot measure high-speed eye movements.

Search coil and optical lever: These methods [13, 18] are used for laboratory experiments in a certain region of space. However, they are not user friendly, as the users are expected to wear special contact lenses that use negative pressure on their eyes.

Electrooculogram (EOG): Eyes have a steady electric potential field, and this electric signal can be derived by using two pairs of contact electrodes placed on the skin around one eye. This is a very lightweight approach [2] and can work even if the eyes are closed. However, it requires an eye blink detection method and has other issues; for example, electrodes are required, and the signal is affected by electrical noise.

Infrared corneal limbus tracker: An infrared corneal limbus tracker [14] is also a very lightweight tracker. It can be built using a light source (infrared LED) and light sensors (phototransistors), and it requires only very low computational power. This approach is also affected by noise from environmental light. However, it is a very simple approach: no electrodes are required, and it can sufficiently detect eye blinks. Therefore, it is highly constructable for daily use.

We therefore use an infrared corneal limbus tracker in our study. This method has a lower accuracy than the search coil and optical lever methods. However, our purpose is to extract significant information; hence, the accuracy of this method can be enhanced by combining image processing methods and contextual information such as eye direction.

3.3 Prototype of Eye Activity Sensor
Four phototransistors and two infrared LEDs are mounted on the eyeglasses, as shown in Figure 2. A small camera is mounted on the glasses for recording surrounding information, not for eye tracking. An infrared LED and four phototransistors are mounted on the inside of the glasses.

The infrared light is reflected by the eye surface and received by the phototransistors. These sensor values are fed to an instrumentation amplifier and analog/digital (AD) conversion, and then input to the microprocessing unit (MPU). In this study, an Atmel ATmega128 is used for the MPU and AD conversion. The MPU clock frequency is 16 MHz, and the AD conversion time is 16 μs per channel.

Before the measurement, the head position and the display are fixed for calibration, and the display then shows the targets to be gazed at (Figure 3). The sensor wearer gazes at the target object on the display, and the MPU records the sensor values. The calibration grid has 240 points (W 20 points x H 12 points), and each point is gazed at for 1 second. After the calibration, the system estimates the gaze direction using the recorded data: the recorded data and the current sensor values are compared first, and then the center of gravity is calculated from the result in order to estimate the gaze direction, as sketched below. This simple method is sufficient for this research, because only the gazed area in the picture needs to be known by the information extraction system.

Figure 3: A calibration method for the gaze recognizer system. The head position and the display are fixed for calibration, and then the display shows targets. A user gazes at the target object on the display and the MPU records the sensor values.
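The sketch below illustrates a centre-of-gravity gaze estimate of the kind described: the current four-channel reading is compared against the readings recorded at the 20 x 12 calibration grid, and a similarity-weighted centroid of the grid points is returned. The weighting scheme is our assumption, not the authors' exact computation.

    # Sketch of a calibration-based gaze estimate: compare the current
    # 4-channel sensor reading against readings recorded at the 20 x 12
    # grid of calibration targets, and take the similarity-weighted centre
    # of gravity of the grid points. The weighting is an assumption.
    import numpy as np

    def estimate_gaze(reading, calib_readings, calib_points):
        """reading: (4,) values; calib_readings: (240, 4); calib_points: (240, 2)."""
        dist = np.linalg.norm(calib_readings - reading, axis=1)
        weights = 1.0 / (dist + 1e-6)    # closer calibration samples weigh more
        weights /= weights.sum()
        return weights @ calib_points    # (x, y) centre of gravity on the display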
3.4 Life Events Extracting System
When the infrared limbus tracking method is used, the sensor value changes rapidly during an eye blink; the swing takes approximately 150 ms, as shown in Figure 4. Therefore, the system can simply distinguish between blinks and other eye movements, as sketched below. Further, the system extracts information such as faces, texts, and pre-registered objects. Pre-registered objects are recognized in real time from the user's visual attention area: we use fast object recognition based on the SURF [1] descriptor, matching images restricted to the gazed area against a database of past images (Figure 5).

Figure 6: An example graph of eye blink frequency
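A possible reading of this blink discrimination is sketched below: a blink appears as a rapid swing in the sensor value lasting roughly 150 ms, so rapid changes within a plausible duration window are labelled blinks. The thresholds are illustrative, not taken from the paper.

    # Sketch of blink detection on the limbus-tracker signal: a blink shows
    # up as a rapid swing in sensor value lasting roughly 150 ms, which
    # separates it from slower gaze shifts. Thresholds are illustrative.
    SLOPE_THRESHOLD = 200     # sensor units per sample marking a "rapid" change
    BLINK_MS = (80, 250)      # plausible blink duration window around 150 ms

    def detect_blinks(samples, dt_ms):
        """samples: raw sensor values at a fixed rate; returns blink starts (ms)."""
        blinks, start = [], None
        for i in range(1, len(samples)):
            rapid = abs(samples[i] - samples[i - 1]) > SLOPE_THRESHOLD
            if rapid and start is None:
                start = i                     # rapid swing begins
            elif not rapid and start is not None:
                dur = (i - start) * dt_ms
                if BLINK_MS[0] <= dur <= BLINK_MS[1]:
                    blinks.append(start * dt_ms)
                start = None
        return blinks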
Face recognition using Haar-like features from the OpenCV Library is implemented for logging "when I meet someone". This method first extracts the human face, and then the system records the time, location, and face image. Additionally, text logging with the OCR engine tesseract-ocr is implemented: the system extracts a clipped image of the gazed area and recognizes the characters in it.

Figure 7: Photographs of experimental environment
The IDs of these extracted objects are logged together with the time, the actual images, and the eye direction when the system detects the pre-registered objects, as shown in Figure 9. Figure 10 shows optical character reading of gazed information: an image of the gazed area is clipped, and characters are extracted from the clipped image. Additionally, the face image is extracted along with the actual time, as shown in Figure 11. Usually, when multiple people stand in front of the camera, such as in a city or a meeting room, a normally recorded video image does not tell you who you are looking at. However, this method can pick out who you are looking at by using gaze information, and our system can handle multiple objects appearing in the head-mounted camera view. Finally, these three kinds of data (objects, text, and faces) are logged automatically, as sketched below.

Figure 10: An example image of OCR extraction for a clipped image selected by gaze information, using tesseract-ocr