XVIII Congreso de la Asociación Española para el Doc - (PG 122 - 161)

XVIII CONGRESO DE LA SOCIEDAD ESPAÑOLA PARA EL PROCESAMIENTO DEL LENGUAJE NATURAL 1
Copyright © 2012. Universitat Jaume I. Servei de Comunicació i Publicacions.
XVIII Congreso de la Asociación Española para el Procesamiento del Lenguaje Natural, edited by Llavorí, Rafael Berlanga, et al., Universitat Jaume I. Servei de Comunicació i Publicacions, 2012. ProQuest Ebook Central, http://ebookcentral.proquest.com/lib/bibliotecauptsp/detail.action?docID=4184256.
Created from bibliotecauptsp on 2019-09-28 09:36:19.
TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
0.619    0.039    0.468      0.619   0.533      0.79      música
0.318    0.049    0.316      0.318   0.317      0.635     economía
0.503    0.085    0.565      0.503   0.532      0.709     entretenimiento
0.814    0.192    0.721      0.814   0.765      0.811     política
0.354    0.014    0.386      0.354   0.37       0.67      cine
0.241    0.017    0.175      0.241   0.203      0.612     literatura
0.442    0.102    0.551      0.442   0.491      0.67      otros
0.162    0.013    0.194      0.162   0.176      0.575     tecnología
0.5      0.009    0.419      0.5     0.456      0.745     deportes
0.409    0.014    0.5        0.409   0.45       0.698     fútbol
0.584    0.117    0.579      0.584   0.578      0.734     Weighted Avg.
Table 2: Detail of Configuration 2 of topic detection with Complement Naive Bayes.
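As a quick consistency check on Table 2, the sketch below recomputes the F-Measure of two rows from their Precision and Recall values. It assumes the F-Measure column is the balanced F-measure (the harmonic mean of precision and recall), which the table does not state explicitly; the function name `f_measure` is illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from two rows of Table 2.
rows = {"música": (0.468, 0.619), "política": (0.721, 0.814)}

for label, (p, r) in rows.items():
    # Rounded to three decimals, as in the table.
    print(label, round(f_measure(p, r), 3))
```

Under that assumption, both recomputed values agree with the F-Measure column of the table to three decimals.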

Configuration number 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Parameters
N-gram 1 1 1 1 1 1 1 1 2 2 1 1 2 1
Only n-gram
Lemma/Stem ( L /S) L L L S L L L L L L L S S L
Use input data X X X X X X X X X X X X X X
Affective dictionary X X X X X X X X X X X
SMS X X X X X X X X X X X
Word types (Adj, Verb) X X X X X X X X X X
Correct words X X
Weight X X X
Negation X X X X X X X X X X X X
Classifiers (Accuracy)
Ibk 31,32 31,32 29,78 31,32 31,32 31,32 32,47 31,32 31,52 32,47 31,32 28,78 29,08 29,78
ComplementNaiveBayes 30,18 29,88 17,93 28,74 30,13 30,23 28,49 30,18 28,74 28,49 30,23 16,88 39,49 17,93
NaiveBayesMultinomial 32,82 32,97 32,97 33,37 32,77 32,87 32,52 32,82 32,87 32,52 32,87 32,52 42,38 32,97
RandomCommittee 33,72 34,16 38,24 34,61 34,31 33,67 34,41 34,36 34,01 34,41 33,67 38,34 38,14 38,24
SMO 39,79 39,64 41,93 38,94 39,59 39,6 29,24 39,74 38,3 39,24 39,6 41,38 41,43 41,93
Figure 2: Accuracy (%) of different configurations for sentiment analysis in the small data set.
TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
0.285    0.073    0.368      0.285   0.321      0.763     negative+
0.43     0.174    0.354      0.43    0.389      0.736     negative
0.064    0.028    0.145      0.064   0.089      0.577     neutral
0.14     0.047    0.317      0.14    0.194      0.616     positive
0.715    0.261    0.461      0.715   0.561      0.798     positive+
0.469    0.138    0.525      0.469   0.495      0.782     none
0.424    0.146    0.404      0.424   0.4        0.738     Weighted Avg.
Table 3: Detail of Configuration 13 of sentiment analysis with Naive Bayes Multinomial.
The L2F Strategy for Sentiment Analysis and Topic Classification
Fernando Batista (1, 2), Ricardo Ribeiro (1, 2)
(1) L2F - INESC-ID, Lisbon, Portugal, R. Alves Redol, 9, 1000-029 Lisboa, Portugal
(2) ISCTE-IUL - Lisbon University Institute, Av. Forças Armadas, 1649-026 Lisboa
Fernando.Batista@inesc-id.pt, Ricardo.Ribeiro@inesc-id.pt
Abstract: This paper describes the strategy used by the L2F team for performing automatic
sentiment analysis and topic classification over Spanish Twitter data. The L2F system achieved
the best results for the topic classification contest, and the second place in terms of sentiment
analysis. Apart from describing the best strategies for each one of the tasks, this paper also
reports several other experiments that were conducted and contributed to selecting the best
strategies, thus allowing a better understanding about the preferred options.
Keywords: Sentiment analysis, Topic detection, Twitter data, Logistic Regression models.
1 Introduction

By providing revolutionary means for people to communicate and interact, social networks have become part of the daily life of a large number of people. Each social network targets different audiences, offering a range of unique services that people find useful in the course of their lives. Twitter offers a simple way for people to express themselves, by means of small text messages of at most 140 characters, which can be used to express anything.

Twitter can be accessed in numerous ways, ranging from computers to mobile phones and other mobile devices. That is particularly important because accessing and producing content becomes a trivial task, so Twitter assumes an important part in people's lives. One strong advantage of Twitter over other means of communication is its ability to rapidly propagate such content and make it available to specific communities, selected based on their interests.

The huge amount of data, constantly being produced on a daily basis, makes it impracticable to process such content manually. For that reason, it becomes urgent to apply automatic processing strategies that can handle, and take advantage of, such an amount of data. However, processing Twitter is anything but an easy task, not only because of specific phenomena that can be found in the data, but also because it requires processing a continuous stream of data, and possibly storing such data in a way that it can be accessed in the future.

This paper tackles two well-known Natural Language Processing (NLP) tasks, commonly applied both to written and speech corpora: sentiment analysis and topic detection. The two tasks have been applied to Spanish Twitter data provided in the context of a contest proposed by “TASS – workshop on Sentiment Analysis”, a satellite event of the SEPLN [1] 2012 conference.

This paper is organised as follows: Section 2 presents a brief description of the data. Section 3 describes some of the strategies that have been tested, as well as the best strategies. Section 4 presents and analyses a number of experiments, and reports the results for each one of the final approaches. Section 5 presents some conclusions and discusses future work.

[1] http://www.daedalus.es/TASS

2 Data

As previously mentioned, the data consists of the Twitter data released in the context of the TASS contest. The provided training data consists of an XML file, containing about 7200 tweets, each one labelled in terms of sentiment polarity and in terms of the corresponding topics. We decided to consider the first 80% of the data for training (5755 tweets) and the remaining 20% for development (1444 tweets).
The provided test data is also available in XML, and consists of about 60800 unlabelled tweets. The goal consists in providing automatic sentiment and topic classification for that data.

Each tweet in the labelled data is annotated in terms of polarity, using one of six possible values: NONE, N, N+, NEU, P, P+ (see Section 3.2 for the meaning of the values). Moreover, each annotation is also marked as AGREEMENT or DISAGREEMENT, depending on whether or not all the annotators performed the annotation coherently. In what concerns topic detection, each tweet was annotated with one or more topics, from a list of 10 possible topics: política (politics), otros (others), entretenimiento (entertainment), economía (economics), música (music), fútbol (football), cine (movies), tecnología (technology), deportes (sports), and literatura (literature).

It is also important to mention that, besides the tweets, an XML file was made available with some information about each of the users that authored at least one of the tweets in the data. In particular, this information includes the type of user, assuming one of three possible values – periodista (journalist), famoso (famous person), and politico (politician) – which may provide valuable information for these tasks.

Apart from the data provided, some experiments described in this paper also made use of Sentiment Lexicons in Spanish [2], a resource created at the University of North Texas (Perez-Rosas et al., 2012). From this resource, only the most robust part was used: the fullStrengthLexicon, which contains about 1346 words automatically labelled with sentiment polarity.

[2] http://lit.csci.unt.edu/

3 Approach

We have decided to consider both tasks as classification tasks, thus sharing the same method. Our most successful and recent experiments cast the problem as a binary classification problem, which aims at discriminating between two possible classes. Binary classifiers are easier to develop, offer faster convergence rates, and can be executed in parallel. The final results are then generated by combining all the different binary classifiers. The remainder of this section describes the method and the architecture of the system when applied to each one of the tasks.

3.1 Maximum Entropy models

We have adopted an approach based on logistic regression classification models, which corresponds to the maximum entropy (ME) classification for independent events, first applied to natural language problems by Berger et al. (1996). This approach provides a clean way of expressing and combining different characteristics of the information. An ME model estimates the conditional probability of the events given the corresponding features. This approach provides probabilistic classifications, a generalization of Boolean classification, yielding probability distributions over the classes. The single-best class corresponds to the class with the highest probability.

The ME models used in this study were trained using the MegaM tool (Daumé, 2004), which uses an efficient implementation of conjugate gradient (for binary problems).

3.2 Sentiment analysis

As previously mentioned, the sentiment classification considers 6 possible classes: N, N+ (negative polarity); P, P+ (positive polarity); NEU (contains both positive and negative sentiments); NONE (without polarity information). The plus sign (+) signals the sentiment intensity.

The first interesting results were achieved by combining 5 different binary classifiers, one for each class. A first classifier <NONE, other> was used to discriminate between NONE and any other class. Two other classifiers, <other, N> and <other, P>, make it possible to detect negative and positive sentiments, respectively. These latter two classifiers make it possible to distinguish between three classes: Positive, Negative, and Neutral. By combining them with the first classifier, one can now discriminate between four classes: NONE, Negative, Positive and Neutral. Finally, two other classifiers, <N, N+> and <P, P+>, capture the sentiment intensity. Only tweets annotated as N or N+ were used for training the <N, N+> classifier, and only tweets marked as P or P+ were used for training the <P, P+> classifier. That is different from the first three classifiers, which have used all the available data for training.
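One way to read the five-classifier combination described above is sketched below in Python. This is an illustration under stated assumptions, not the L2F implementation: the function and parameter names are hypothetical, and each argument stands for the boolean decision of one binary classifier. The rule for the case where neither the negative nor the positive detector fires (mapped here to NEU) is an assumption, since the text does not spell it out.

```python
def combine_sentiment(is_none: bool, is_neg: bool, is_pos: bool,
                      neg_strong: bool, pos_strong: bool) -> str:
    """Map five binary decisions onto one of the six polarity labels.

    is_none    -- the NONE-vs-other classifier chose NONE
    is_neg     -- the other-vs-N classifier chose N
    is_pos     -- the other-vs-P classifier chose P
    neg_strong -- the N-vs-N+ classifier chose N+
    pos_strong -- the P-vs-P+ classifier chose P+
    """
    if is_none:
        return "NONE"
    if is_neg and is_pos:
        # Both polarities detected: NEU holds positive and negative sentiment.
        return "NEU"
    if not is_neg and not is_pos:
        # ASSUMPTION: with no polarity detected but NONE rejected, fall back to NEU.
        return "NEU"
    if is_neg:
        return "N+" if neg_strong else "N"
    return "P+" if pos_strong else "P"


# Example: a tweet flagged as negative only, with strong intensity.
print(combine_sentiment(False, True, False, True, False))
```

The intensity classifiers are consulted only after a single polarity has been chosen, which mirrors the fact that they were trained on N/N+ and P/P+ tweets only.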
Figure 1 – Approach for sentiment analysis.

After some other experiments, we observed similar results by removing the first classifier and using the second and third classifiers to also indicate if no sentiment was present. The idea is that the classifiers <other, N> and <other, P> can, in fact, discriminate between four classes, by considering the class NONE whenever both return "other". Figure 1 illustrates the adopted configuration, where only four binary classifiers are used.

3.3 Topic classification

For this task we have created 10 distinct binary classifiers, each one for a different topic. Each classifier selects its corresponding topic and, in case no topic was selected, the most probable topic is then selected based on the available classification probabilities. Figure 2 illustrates this simple classification process.

Figure 2 – Approach for topic classification. [Diagram: the tweet data is fed to classifiers C1, C2, ..., C10, labelled Other, Eco, ..., Cin.]

4 Experiments

This section describes the steps taken, features that have been used, experiments that have been conducted, and the final submitted runs.

4.1 Tweet content pre-processing

The content of each tweet was first tokenized using twokenize, a tokenization tool for English tweets [3], with some minor modifications for dealing with Spanish instead of English.

[3] By Brendan O'Connor (brenocon@gmail.com)

4.2 Features

Most of the features were used both for sentiment analysis and for topic detection, with small differences, especially concerning the use of punctuation marks. The following features, concerning the tweet text, were used for each tweet:
• Punctuation marks were only used as features for the sentiment task, but not for the topic detection task.
• All words after the words "nunca" (never) or "no" (no) were prefixed by "NO_" until reaching some punctuation mark or until reaching the end of the tweet.
• Each token starting with "http:" was converted into the token "HTTP". However, the weight of such token was reduced (from the standard 1.0 to 0.9).
• All tokens starting with "#" were expanded into two tokens, one with and the other without the "#". A lesser weight was given to the stripped version of the token.
• All tokens starting with "@" were used, but the token "@USER" was introduced as well, with a smaller weight.
• All words with more than 3 repeating letters were also used. However, whenever they occur, two more features are produced: "LONG_WORD" with a lower weight, and the corresponding word without repetitions with a high weight (3.0).
• All cased words were used, but the corresponding lowercase words were used as well. Uppercase words were also assigned a higher weight, since they are often used for emphasis.

Apart from the features extracted from the text, two more features were used:
• Username of the author of the tweet.
• Usertype, corresponding to the user classification, according to the file users-info.xml.

Some of the previous features were also combined as bigrams for some experiments. Feature bigrams involve the following tokens: HTTP, words starting with # without the leading #, @USER, LONG_WORD, and all other words converted to lowercase.

4.3 Sentiment analysis

Our baseline for this task corresponds to the initial results achieved using the approach previously described, and corresponds to 52.5 Accuracy (Acc) in the development set. The baseline was then further improved to 53.6 Acc (+1.1) by using the tweet's author name. Adding the user type improved the results even further to 54.2 (+0.6). Finally, the best results were achieved by also providing punctuation marks as features, corresponding to 55.1 Acc (+0.9).

4.4 Topic classification

Differences across experiments were always subtle, because improvements in one classifier may worsen results in another classifier. However, adding the author's name produced slightly better results but, contrary to what was expected, providing the user type as a feature did not improve results. Adding punctuation marks decreased the overall performance.

4.5 Submitted Runs

For the final evaluation contest, we have submitted three different runs, each corresponding to the previously described approaches. Table 1 summarises the obtained results, where "Sent" corresponds to sentiment classification results, and "Topic" corresponds to topic detection results. The results obtained for the development set are shown in parentheses. It is important to mention that, according to the table, results over the development set are consistent with results over the test set. Sentiment lexicons turned out not to be helpful for this sentiment analysis task, as can be seen, especially, by observing the results over the development set. In what concerns topic classification, combining features as bigrams proved to be a good solution for the test set, contrary to the development set.

                    Sent         Topic
Unigrams only       63.4 (55.2)  64.9 (43.2)
Unigrams, Bigrams   62.2 (53.8)  65.4 (42.5)
Sentiment lexicon   63.2 (54.8)
Table 1 - Submitted runs.

5 Conclusions

The paper describes the L2F team strategy for automatic sentiment analysis and topic classification on the TASS contest. The system achieved the best results for topic classification, and the second place for sentiment analysis.

One possible future direction for improving the current results would be to use the remaining information available. For example, the use of the sentiment polarity type (AGREEMENT, DISAGREEMENT), together with other information about the user (e.g. number of tweets, number of followers, number of following), would probably have an impact on the results.

Acknowledgements

This work was partially supported by national funds through FCT – Fundação para a Ciência e Tecnologia, under project PEst-OE/EEI/LA0021/2011, and by DCTI - ISCTE-IUL – Lisbon University Institute.

Bibliography

A. L. Berger, S. A. D. Pietra, and V. J. D. Pietra. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1):39–71, 1996.

H. Daumé III. Notes on CG and LM-BFGS optimization of logistic regression. http://hal3.name/megam/, 2004.

V. Perez-Rosas, C. Banea, and R. Mihalcea. Learning sentiment lexicons in spanish. Proc. of the 8th International Conference on Language Resources and Evaluation (LREC’12), Istanbul, Turkey, May 2012.
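The token-level features listed in Section 4.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the L2F implementation. The weights 1.0, 0.9 and 3.0 come from the text; the values of `LOWER` and `HIGHER` are placeholders, because the text only says "lesser", "smaller", "lower" and "higher". A whitespace split stands in for the modified twokenize tokenizer, and all names are hypothetical.

```python
import re
from typing import Dict

# Weights given in the text: standard 1.0, HTTP 0.9, de-repeated word 3.0.
STANDARD, HTTP_W, DEREP_W = 1.0, 0.9, 3.0
# ASSUMPTION: placeholder values; the text does not give these numbers.
LOWER, HIGHER = 0.5, 2.0

NEGATORS = {"no", "nunca"}
PUNCT = re.compile(r"^[.,;:!?¡¿]+$")
REPEAT = re.compile(r"(.)\1{3,}")  # same character more than 3 times in a row


def extract_features(tweet: str, use_punct: bool = True) -> Dict[str, float]:
    """Return a feature -> weight map; use_punct=False mimics the topic task."""
    feats: Dict[str, float] = {}

    def add(name: str, weight: float) -> None:
        # Keep the largest weight if a feature is produced more than once.
        feats[name] = max(weight, feats.get(name, 0.0))

    negated = False
    for tok in tweet.split():  # stand-in for twokenize
        if PUNCT.match(tok):
            negated = False  # punctuation closes the negation scope
            if use_punct:
                add(tok, STANDARD)
        elif tok.startswith("http:"):
            add("HTTP", HTTP_W)
        elif tok.startswith("#"):
            add(tok, STANDARD)
            add(tok[1:], LOWER)  # stripped hashtag, lesser weight
        elif tok.startswith("@"):
            add(tok, STANDARD)
            add("@USER", LOWER)
        else:
            prefix = "NO_" if negated else ""
            if REPEAT.search(tok):
                add("LONG_WORD", LOWER)
                add(prefix + REPEAT.sub(r"\1", tok).lower(), DEREP_W)
            add(prefix + tok, HIGHER if tok.isupper() else STANDARD)
            add(prefix + tok.lower(), STANDARD)
            if tok.lower() in NEGATORS:
                negated = True
    return feats


features = extract_features("no me gusta nada , pero GOOOOOL #futbol @pepe http://t.co/x")
print(sorted(features.items()))
```

In this sketch the negation scope starts after "no" or "nunca" and ends at the next punctuation token, so "gusta" becomes "NO_gusta" while "pero" is left unprefixed.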
Sentiment Analysis of Twitter messages based on Multinomial Naive Bayes
Análisis del Sentimiento de mensajes de Twitter con Multinomial Naive Bayes

Alexandre Trilla, Francesc Alías
GTM - Grup de Recerca en Tecnologies Media
LA SALLE - UNIVERSITAT RAMON LLULL
Quatre Camins 2, 08022 Barcelona (Spain)
atrilla@salle.url.edu, falias@salle.url.edu

Abstract: This article adapts a Text Classification scheme based on Multinomial Naive Bayes to deal with Twitter messages labelled with six classes of sentiment as well as with their topic. The effectiveness of this scheme is evaluated using the TASS-SEPLN Twitter dataset and it achieves a maximum macroaveraged F1 measure rate of 36.28%.
Keywords: Sentiment Analysis, Text Classification, Machine Learning, Twitter
1 Introduction

The sentiment classification framework we present in the TASS-SEPLN competition is fundamentally influenced by our previous results in short-text Sentiment Analysis (Trilla and Alías, 2012), which are published in the main stream of the SEPLN 2012 conference. Given a short-text scenario like this one based on Twitter messages, where the amount of available textual instances to train the classifier (e.g., 7219 examples) is much smaller than the dimensionality of the feature space to represent the texts (e.g., 29685 dimensions, considering several feature representations), the most effective scheme (both in accuracy and speed) is buttressed by Multinomial Naive Bayes (MNB) operating on a binary-weighted unigram space (Trilla and Alías, 2012).

In this work, we present a summary of the learning strategy of our approach in Section 2, the preliminary results that we obtained with the target Twitter-based dataset in Section 3, and we explain the conclusions that can be derived from the results provided by the contest organisers in Section 4.

2 Multinomial Naive Bayes

The Multinomial Naive Bayes (MNB) is a probabilistic generative approach that builds a language model assuming conditional independence among the linguistic features. Therefore, no sense of history, sequence or order is introduced in this model. In reality, this assumption does not hold for textual data (Pang, Lee, and Vaithyanathan, 2002), but even though the probability estimates are of low quality because of this oversimplified model, its classification decisions (based on Bayes' decision rule) are surprisingly good (Manning, Raghavan, and Schütze, 2008). The MNB combines efficiency (it has an optimal time performance) with good accuracy, hence it is often used as a baseline in Text Classification and Sentiment Analysis research (Sebastiani, 2002; Manning, Raghavan, and
UNED en TASS 2012: Sistema para la Clasificación de la Polaridad y Seguimiento de Temas∗
UNED at TASS 2012: Polarity Classification and Trending Topic System

Tamara Martín-Wanton, Jorge Carrillo de Albornoz
UNED NLP & IR Group
Juan del Rosal, 16, 28040 Madrid, Spain
tmartin@lsi.uned.es, jcalbornoz@lsi.uned.es
Abstract: Social media, such as blogs, forums, and social networks, offer an excellent place for sharing information and connecting people. The information in these media (usually referred to as user generated content) is of great interest for both companies and individuals. However, the huge amount of information that is generated needs to be efficiently processed to be of real use. In this context, two Natural Language Processing tasks, topic detection and polarity classification, become highly relevant. Topic detection involves exploring the web in search of content related to a given topic. Polarity classification, in turn, is a sentiment analysis task concerned with the problem of determining the polar orientation (i.e., positive or negative) of a text. These two tasks are the focus of the TASS-SEPLN competition. In this paper, we present the participation of the UNED group in this competition. For topic detection, we present a system based on a probabilistic model (Twitter-LDA). For polarity classification, we propose an emotional concept-based method. The experimental results show the adequacy of our approach for the task.
Keywords: social media, topic detection, polarity classification
∗ This research was partially supported by the Spanish Ministry of Education (FPI grant nr BES-2011-044328), the Spanish Ministry of Science and Innovation (Holopedia Project, TIN2010-21128-C02) and the European Union (FP7-ICT-2011-7 - Language technologies - nr 288024 (LiMoSINe)).

1. Introduction

Enormously popular “social media”, such as blogs, forums, or real-time social networking sites, offer a place for sharing information as it happens and for connecting people in real time, often making lasting friendships and contacts, and spreading a wealth of the latest news about real-world events and topics dominating social discussions. This spread of new social media channels has produced a huge amount of so-called user generated content, which has motivated many natural language processing tasks, such as sentiment analysis, topic detection, product comparison
or opinion summarization. In this paper we also an interesting point to consider. For su-
focus on sentiment analysis and topic detec- re, any opinionated sentence can be classified
tion. into positive or negative, but it is clear that
With more than 140 million active users not all sentences express the same negative or
and 340 million tweets a day (as of March positive intensity. So, sentiment analysis sys-
2012), Twitter presents the most popular and tems should include semantic-level analysis in
interesting social media channel from a re- order to solve word ambiguity and correctly
search perspective. However, due to its cha- capture the meaning of each word according
racteristics, it is the most noisy and intensive to its context. Also, complex linguistic pro-
stream of new content, which makes users to cessing is needed to deal with problems such
face a challenge when they want to find the as the effect of negations and intensifiers. Mo-
most interesting themes in few time. reover, understanding the emotional meaning
Topic exploration is a laborious and time- of the different textual units is important to
consuming task, usually involving several accurately determine the overall polarity of a
searches. Users are particularly interested in text and its degree.
emergent topics that arise from recent events
but the representation of the data and the In this paper, we present a combined sys-
search results of the social media sites do not tem that has as main objectives analyzing the
support this kind of information. Detecting sentiments of tweets written in Spanish, and
and characterizing emerging topics of discus- grouping them into a set of given topics. To
sion through analysis of Internet data is of accomplish the first objective we have adap-
great interest to particular users and for bu- ted an existing emotional concept-based sys-
sinesses. For example, a market analyst may tem for sentiment analysis to classify tweets
want to review technical and news-related li- in Spanish. The original method makes use of
terature for recent trends that will impact the an affective lexicon to represent the text as
companies he is watching and reporting on. the set of emotional meanings it expresses,
The manual review of all the available data is along with advanced syntactic techniques to
simply not feasible. Human experts who are identify negations and intensifiers, their sco-
tasked with identifying emerging events need pe and their effect on the emotions affected
to rely on automated systems as the amount by them. Besides, the method addresses the
of information available in digital format in- problem of word ambiguity, taking into ac-
creases. count the contextual meaning of terms by
C
op On the other hand, sentiment analysis is using a word sense disambiguation algorithm.
yri
gh concerned with the problem of discovering For the second objective, detection of topics,
t
© emotional meanings in text. This discipline we first build for each topic of the task a lexi-
20
12
has gained much attention from the research con of words that best describe it, thus repre-
.
U
community in recent years, mainly due to its senting each topic as a ranking of discrimina-
ni many practical applications and the increa- tive words. Moreover, a set of events is retrie-
ve
rsi sing availability of user generated content. ved based on a probabilistic approach that
tat
Ja One of the most popular sentiment analy- was adapted to the characteristics of Twitter.
u
m sis tasks is polarity classification, which at- To determine which of the topics corresponds
e
I. tempts to classify texts according to the posi- to each event, the topic with the highest sta-
S
er
tivity or negativity of the opinions expressed tistical correlation was obtained comparing
ve in them (Pang, Lee, y Vaithyanathan, 2002; the ranking of words of each topic and the
i
de Turney, 2002; Esuli y Sebastiani, 2006; Wil- ranking of words most likely to belong to the
C
o son, Wiebe, y Hoffmann, 2009; Wiegand et event.
m
un al., 2010). Determining polarity might seem
ic
ac an easy task, as many words have some pola- The paper is organized as follows. Section
ió
i
rity by themselves. However, words do not al- 2 introduces previous work in topic detection
P
ub
ways express the same emotions, and in most and sentiment analysis. Section 3 presents
lic cases the polarity of a word depends on the the combined system. Section 4 describes and
ac
io context in which the word is used. So, terms discusses the results obtained by the system
ns
that clearly denote negative feelings can be in the TASS-SEPLN challenge. Finally, sec-
neutral, or even positive, depending on their tion 5 provides concluding remarks and outli-
context. The degree or strength of polarity is nes future work.
2. Related work

The topic of a tweet is a latent feature and can be inferred by analyzing its content. Modeling Twitter content requires methods that are suitable for short texts with heterogeneous vocabulary. Recent work shows that one such method is Latent Dirichlet Allocation (LDA) (Blei, Ng, and Jordan, 2003) and its extensions (Weng et al., 2010). A direct application is to use the traditional LDA model to discover topics from tweets by treating each tweet as a single document, but this method is unlikely to work well, given that tweets are very short (often containing only a single sentence).

To overcome this difficulty, some previous studies proposed to consider all the tweets of a user as a single document (Weng et al., 2010; Hong and Davison, 2010). This treatment can be regarded as an application of the author-topic model (Steyvers et al., 2004) to tweets, where each document (tweet) has a single author. However, the aggregated tweets of a single user may cover a diverse range of topics, so this model does not exploit the following important property of tweets: a single tweet is usually about a single topic. We apply a modified author-topic model called Twitter-LDA, introduced by Zhao et al. (2011), which assumes a single topic assignment for an entire tweet.

Concerning polarity classification, this task is usually formulated as a supervised ML problem with two classes (i.e. positive and negative), although a more fine-grained classification is sometimes considered (e.g. strongly-negative, negative, neutral, positive and strongly-positive). Traditional approaches represent the text as a bag of word frequencies, n-grams, or even more complex lexical features, such as phrases and information extraction patterns (Pang, Lee, and Vaithyanathan, 2002; Dave, Lawrence, and Pennock, 2003; Riloff, Patwardhan, and Wiebe, 2006). The main drawback of approaches based on word frequencies is that they are highly dependent on the application domain.

An alternative to word-based learning is sentiment-based learning. That is, instead of representing the text as a bag of words, the text is modeled as a set of polar expressions (Das and Chen, 2001; Wilson, Wiebe, and Hoffmann, 2009). Polar expressions are words that carry a prior polarity. For example, like or good are positive polar expressions, while hate or bad are negative ones. However, the approaches above work with words instead of senses, disregarding the contextual meaning of such words and the fact that a word may have several senses, some of which may have different polarities. On the other hand, even though the use of polarity-based lexicons is quite frequent, few works employ more fine-grained emotional resources. There seems to be an assumption that emotional classification depends exclusively on the polar orientation of the words or concepts within the text, regardless of the sentiments or emotions they express. However, it is clear, for instance, that the words cancer and cold, though both having a negative orientation, express different emotions: a cancer is usually associated with fear and sadness, while a cold is better associated with displeasure or dislike.

Finally, in spite of their importance for sentiment analysis tasks, linguistic modifiers such as negation or intensifiers have attracted less attention and are usually addressed in a very naive fashion. For example, negation is mostly considered a simple polarity shifter (Das and Chen, 2001), while intensifiers are all considered as amplifiers or diminishers that apply one fixed value to all positive words and another fixed value to all negative words (Polanyi and Zaenen, 2006).

3. UNED system

In this section we present the system for topic detection and polarity classification.

For the task of topic detection, our system has three stages. In the first one (Section 3.1.1), the system uses a corpus of tweets labeled with topics to obtain a ranking of important words for every topic. Note that this stage can be done off-line. The second stage (Section 3.1.2) consists in, given a set of tweets, obtaining clusters of tweets that discuss the same event, for example: reviews of a novel, recommendations of a book, comments about an author. Finally, in the third stage (Section 3.1.3), we identify which of the 10 topics each event belongs to.

In polarity classification, our main concern is to analyze whether a complex emotional concept-based approach intended for classifying product reviews can be applied to classify tweets in Spanish. To this aim, we have adapted the approach presented in (Carrillo de Albornoz, Plaza, and Gervás, 2010) for polarity and intensity classification of opinionated texts. The main idea of this method is to extract the WordNet concepts in a sentence that entail an emotional meaning, assign them an emotion within a set of categories from an affective lexicon, and use this information as the input to a machine learning algorithm. The strengths of this approach, in contrast to simpler strategies, are: (1) the use of WordNet and a word sense disambiguation algorithm, which allows the system to work with concepts rather than terms, (2) the use of emotions instead of terms as classification attributes, and (3) the processing of negations and intensifiers to invert, increase or decrease the intensity of these emotions. This system has been shown to outperform previous systems which aim to solve the same task.

3.1. Topic detection

3.1.1. Topic representation

Intuitively, the words that best describe a topic are the words that occur relatively more frequently in the tweets labeled with this topic than in the tweets labeled with a different topic. Based on this intuition, we obtain a weighted vector of important words for a topic using the Kullback-Leibler Divergence (KLD). The KLD measures the relative entropy between two probability distributions. We calculate for every topic t the KLD scores of its lexical units as:

$$KLD_t(w) = P_R(w) \times \log \frac{P_R(w)}{P_N(w)}$$

where:

• $P_R(w)$ is the probability of the lexical unit w occurring in the relevant documents (tweets labeled with the topic t), calculated as $f_R(w)/R$, where $f_R(w)$ is the frequency of occurrence of w in the relevant set and R is the number of terms in the relevant set;

• $P_N(w)$ is the probability of the lexical unit w occurring in the non-relevant documents (tweets labeled with a different topic), calculated as $f_N(w)/N$, where $f_N(w)$ is the frequency of occurrence of w in the non-relevant set and N is the number of terms in the non-relevant set.

In this way, we have for each topic the ranking of words that best describes it.

3.1.2. Event detection

In order to obtain the events we use an approach based on a latent variable topic model, namely Latent Dirichlet Allocation (LDA) (Blei, Ng, and Jordan, 2003). It is an unsupervised machine learning technique which uncovers information about latent topics across a corpus. We use a variant of LDA proposed by Zhao et al. (2011) that is adapted to the characteristics of Twitter: tweets are short (140-character limit) and a single tweet tends to be about a single topic.

The model is based on the following assumptions. There is a set of topics T in Twitter, each represented by a word distribution. Each user has her topic interests modeled by a distribution over the topics. When a user wants to write a tweet, she first chooses a topic based on her topic distribution. Then the user chooses a bag of words one by one based on the chosen topic. However, not all words in a tweet are closely related to the topic of that tweet; some are background words commonly used in tweets on different topics. Therefore, for each word in a tweet, the user first decides whether it is a background word or a topic word and then chooses the word from the respective word distribution. The process is described as follows:

1. Draw $\phi_B \sim Dir(\beta)$, $\pi \sim Dir(\gamma)$
2. For each topic $t \in T$,
   (a) draw $\phi_t \sim Dir(\beta)$
3. For each user $u \in U$,
   (a) draw $\theta_u \sim Dir(\alpha)$
   (b) for each tweet $d_{u,m}$
       i. draw $z_{u,m} \sim Multi(\theta_u)$
       ii. for each word $w_{u,m,n}$
           A. draw $y_{u,m,n} \sim Bernoulli(\pi)$
           B. draw $w_{u,m,n} \sim Multi(\phi_B)$ if $y_{u,m,n} = 0$ and $w_{u,m,n} \sim Multi(\phi_{z_{u,m}})$ if $y_{u,m,n} = 1$

where $\phi_t$ denotes the word distribution for topic t; $\phi_B$ the word distribution for background words; $\theta_u$ denotes the topic distribution of user u; and $\pi$ denotes a Bernoulli distribution that governs the choice between background words and topic words. After applying the Twitter-LDA model, a topic is represented as a vector of probabilities over the space of words.

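The generative process of Section 3.1.2 can be illustrated with a minimal forward-sampling sketch. This is not the system's implementation: the system fits the model to real tweets with Gibbs sampling, whereas the code below only simulates the assumed generative story on a toy vocabulary. The function names (`dirichlet`, `generate_corpus`), the toy vocabulary and the default hyperparameter values are hypothetical and chosen only for illustration.

```python
import random


def dirichlet(concentration, size, rng):
    """Draw one sample from a symmetric Dirichlet via normalized Gamma variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    if total == 0.0:  # guard against underflow for very small concentrations
        return [1.0 / size] * size
    return [d / total for d in draws]


def generate_corpus(vocab, n_topics, tweets_per_user, words_per_tweet,
                    alpha=0.5, beta=0.1, gamma=20.0, seed=0):
    """Sample synthetic tweets following steps 1-3 of the generative process."""
    rng = random.Random(seed)
    n_words = len(vocab)
    # Step 1: background word distribution and background/topic switch.
    phi_b = dirichlet(beta, n_words, rng)
    pi = dirichlet(gamma, 2, rng)  # pi[0] = P(y = 0), pi[1] = P(y = 1)
    # Step 2: one word distribution per topic.
    phi = [dirichlet(beta, n_words, rng) for _ in range(n_topics)]
    corpus = []
    # Step 3: each user has a topic distribution; each tweet gets ONE topic.
    for user, n_tweets in enumerate(tweets_per_user):
        theta = dirichlet(alpha, n_topics, rng)
        for _ in range(n_tweets):
            z = rng.choices(range(n_topics), weights=theta)[0]
            words, flags = [], []
            for _ in range(words_per_tweet):
                # y = 0 selects the background distribution, y = 1 the topic.
                y = rng.choices([0, 1], weights=pi)[0]
                dist = phi[z] if y == 1 else phi_b
                words.append(rng.choices(vocab, weights=dist)[0])
                flags.append(y)
            corpus.append({"user": user, "topic": z,
                           "words": words, "is_topic_word": flags})
    return corpus


if __name__ == "__main__":
    toy_vocab = ["gol", "liga", "novela", "autor", "hoy", "rt"]
    for tweet in generate_corpus(toy_vocab, n_topics=2,
                                 tweets_per_user=[2, 1], words_per_tweet=4):
        print(tweet)
```

The point of the sketch is the single topic draw `z` per tweet, which is what distinguishes this model from standard LDA, where every word would draw its own topic.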
3.1.3. Trending topics

Lastly, we need to obtain a mapping between the topics of the task and the events retrieved by the Twitter-LDA method. Note that we have two rankings: the ranking of words that best describes each topic, obtained from the training data, and, for each event of Twitter-LDA, the probability that each word belongs to that event. The latter may be construed as the importance of the word in the event, i.e. a ranking of the words of the event. Therefore, using a measure of correlation between rankings, we can obtain for each event the topic to which it relates.

A rank correlation is the relationship between different rankings of the same set of items (here, rankings of words). A rank correlation coefficient measures the degree of similarity between two rankings. We use one of the most popular rank correlation statistics: the Kendall rank correlation coefficient, commonly referred to as Kendall’s tau ($\tau$) coefficient. Given two rankings on the same domain (on the same set of objects), Kendall’s rank correlation coefficient $\tau$ is defined as:

$$\tau = \frac{n_c - n_d}{\frac{1}{2} n (n - 1)}$$

where $n_c$ is the number of concordant pairs, $n_d$ is the number of discordant pairs and n is the number of ranked objects. A concordant (discordant) pair is an ordered pair of objects which has the same (opposite) order in both rankings. Kendall’s $\tau$ is normalized in the interval $\langle -1, 1 \rangle$. In the case of maximum similarity between two rankings, $\tau = 1$ (the rankings are identical). In the case of maximum dissimilarity, $\tau = -1$ (one ranking is the reverse of the other).

Thus, for each event retrieved by Twitter-LDA, we calculate the correlation with each of the topics and choose the topic with the highest value.

3.2. Sentiment Analysis

The original method presented in (Carrillo de Albornoz, Plaza, and Gervás, 2010) has been modified to improve the scope detection approach for negation and intensifiers in order to deal with the effect of subordinate sentences and special punctuation marks. Also, the presented approach uses the SentiSense affective lexicon (Carrillo de Albornoz, Plaza, and Gervás, 2012), which is a lexicon specifically designed for opinionated texts. SentiSense attaches an emotional category from a set of 14 emotions to WordNet concepts. SentiSense also includes the antonym relationship between emotional categories, which allows us to capture the effect of some linguistic modifiers such as negation. We have adapted the system to work with Spanish texts, as the original system was conceived for English. The method comprises four steps that are described below.

3.2.1. Pre-processing: POS Tagging and Concept Identification

The first step aims to translate each text to its conceptual representation in order to work at the concept level in the next steps and avoid word ambiguity. To this end, the input text is split into sentences and the tokens are tagged with their POS using the FreeLing library (Carreras et al., 2004). In this step, the syntax tree of each sentence is also retrieved using the FreeLing chunk parser. With this information, the system next maps each token to its appropriate WordNet concept using the UKB algorithm (Agirre and Soroa, 2009) as included in the FreeLing library. In addition, to enrich the emotion identification step, the hypernyms of each concept are retrieved from WordNet.

3.2.2. Emotion Identification

Once the concepts are identified, the next step maps each WordNet synset to its corresponding emotional category in the SentiSense affective lexicon, if any. The emotional categories of the hypernyms are also retrieved. We hypothesize that the hypernyms of a concept entail the same emotions as the concept itself, but with decreasing intensity as we move up in the hierarchy. So, when no entry is found in the SentiSense lexicon for a given concept, the system retrieves the emotional category associated to its nearest hypernym, if any. However, only a certain level of hypernymy is accepted, since an excessive generalization introduces noise in the emotion identification. This parameter has been empirically set to 3.

In order to accomplish this step for Spanish texts we have automatically translated the SentiSense lexicon to the Spanish language. To do this, we have first automatically updated the synsets in SentiSense to their WordNet 3.0 version using the WordNet mappings. In particular, for nouns and verbs we use the mappings provided by the WordNet team¹ and, for adjectives and adverbs, the UPC mappings². In this automatic process we have only found 15 labeled synsets without a direct mapping, which were removed in the new SentiSense version. Finally, in order to translate the SentiSense English version to Spanish we use the Multilingual Central Repository (MCR) (Gonzalez-Agirre, Laparra, and Rigau, 2012). The MCR is an open source database that integrates WordNet versions for five different languages: English, Spanish, Catalan, Basque and Galician. The Inter-Lingual-Index (ILI) allows the automatic translation of synsets from one language to another.

¹ WordNet mappings. http://wordnet.princeton.edu/wordnet/download/.
² Universidad Politécnica de Cataluña mappings. http://nlp.lsi.upc.edu/web/index.php?option=com_content&task=view&id=21&Itemid=57.

3.2.3. Post-processing: Negation and Intensifiers

In this step, the system has to detect and resolve the effect of negations and intensifiers over the emotions discovered in the previous step. This process is important, since these linguistic modifiers can change the polarity and intensity of the emotional meaning of the text. Clearly, the text "Recio no tiene indicios potentes para denunciar a los responsables de los ERE" ("Recio has no strong evidence to report those responsible for the ERE") entails a different polarity than the text "Recio tiene indicios potentes para denunciar a los responsables de los ERE" ("Recio has strong evidence to report those responsible for the ERE"), and sentiment analysis systems must be aware of this fact.

To this end, our system first identifies the presence of modifiers using a list of common negation and intensification tokens. In this list, each intensifier is assigned a value that represents its weight or strength. The scope of each modifier is determined using the syntax tree of the sentence in which the modifier occurs. We take as scope all descendant leaf nodes of the common ancestor between the modifier and the word immediately after it that lie to the right of the modifier. However, this process may introduce errors in special cases, such as subordinate sentences or those containing punctuation marks. In order to avoid this, our method includes a set of rules to delimit the scope in such cases. These rules are based on specific tokens that usually mark the beginning of a different clause (e.g., porque, hasta, por qué, aunque, etc.). Since some of these delimiters are ambiguous, their POS is used to disambiguate them. Once the modifiers and their scope are identified, the system resolves their effect over the emotions that they affect in the text. The effect of negation is addressed by substituting the emotions assigned to the concepts by their antonyms. In the case of the intensifiers, the concepts that fall into the scope of an intensifier are tagged with the corresponding percentage weight in order to increase or diminish the intensity of the emotions assigned to the concepts.

In order to adapt the present method to Spanish texts, a list of common negation tokens in Spanish (such as no, nunca, nada, nadie, etc.) and a list of common intensifiers (más, menos, bastante, un poco, etc.) were developed, based on the original lists of negation and intensifier signals from (Carrillo de Albornoz, Plaza, and Gervás, 2010). In order to determine the scope of each modifier, the syntax tree as generated by the FreeLing library is used.

3.2.4. Classification

In the last step, all the information generated in the previous steps is used to translate each text into a Vector of Emotional Intensities (VEI), which will be the input to a machine learning algorithm. The VEI is a vector of 14 positions, each of them representing one of the emotional categories of the SentiSense affective lexicon. The values of the vector are generated as follows:

• For each concept, $C_i$, labeled with an emotional category, $E_j$, the weight of the concept for that emotional category, $weight(C_i; E_j)$, is set to 1.0.

• If no emotional category was found for the concept, and it was assigned the category of its first labeled hypernym, $hyper_i$, then the weight of the concept is computed as:

$$weight(C_i; E_j) = \frac{1}{depth(hyper_i) + 1}$$

• If the concept is affected by a negation and the antonym emotional category, $E_{anton_j}$, was used to label the concept, then the weight of the concept is multiplied by $\alpha = 0.6$. This value has been empirically determined in previous studies. It is worth mentioning that the experiments have shown that α values
below 0.5 decrease performance sharply, while performance drops gradually for values above 0.6.

• If the concept is affected by an intensifier, then the weight of the concept is increased/decreased by the intensifier percentage, as shown in:

$$weight(C_i; E_j) = weight(C_i; E_j) \times (100 + \text{intensifier percentage})\,\%$$

• Finally, the position in the VEI of the emotional category assigned to the concept is incremented by the weight previously calculated.

4. Evaluation and discussion

This section presents the evaluation of our system in the context of the TASS-SEPLN competition.³ The data set consists of tweets written in Spanish by nearly 200 well-known personalities and celebrities of the world. The set is divided into two sets: training (7,210 tweets) and test (60,798 tweets). Each tweet is tagged with its global polarity, indicating whether the text expresses a positive, negative or neutral sentiment, or no sentiment at all. Five levels have been defined: strong positive (P+), positive (P), neutral (NEU), negative (N) and strong negative (N+), plus one additional no sentiment tag (NONE). Each tweet of the corpus has been semiautomatically assigned to one or several of 10 possible topics: sports, music, literature, soccer, politics, economy, art, entertainment, music and technology.

³ The task guidelines may be found in http://www.daedalus.es/TASS/tasks.php

4.1. Topic detection

The aim of this task is to automatically identify the topic of each tweet. We run Twitter-LDA with 500 iterations of Gibbs sampling. After trying a few different numbers of topics, we empirically set the number of topics to 100. We set α to 50.0/|T|, β to a smaller value of 0.01 and γ to 20, as suggested by Zhao et al. (2011). Table 1 shows the results obtained by our system. Due to space constraints we only show the result reached by our system and by the system ranked in first place. It may be seen that the result of our system is not satisfactory. We believe that this behavior is due to the vocabulary that was obtained for each topic in the representation stage (see Section 3.1.1). For example, the topic literature has only 99 tweets, and most are references or comments about a book or novel. Most of the time the topic can be deduced from a single word of the tweet: novel, book, author, readings, literary. Therefore, when building the vocabulary there are few words that really belong to this topic.

An important improvement to enrich the vocabulary could be to add words from an external resource (e.g., WordNet Domains (Magnini and Cavaglia, 2000)) and names of entities related to the domain (which could be extracted from Wikipedia).

System         Precision (%)   Rank
L2F - INESC    65.37           1
UNED           30.98           15

Table 1: Trending topic coverage

4.2. Polarity classification

In the TASS-SEPLN competition, polarity classification is evaluated as two different tasks. The first task consists in automatically classifying each tweet into one of the 5 polarity levels previously mentioned. However, prior to this classification, the task requires filtering out those tweets which do not express any sentiment (i.e., those tagged as NONE). To perform this filtering, our system simply considers an extra class, NONE, so that it classifies the tweets into six classes. Next, the tweets classified as NONE are ignored for evaluation purposes.

The results of the two variants of the UNED system for this task are shown in Table 2. The first uses the logistic model in Weka as the ML algorithm, while the second uses the J48 algorithm. As may be observed, accuracy for both algorithms is over 52 %, which we consider high given the complexity of the task and the high number of polarity classes that are taken into account. Our results in this task are therefore quite satisfactory, as evidenced by the fact that our runs are ranked 7th and 8th in the competition (of 20 systems)⁴. Moreover, the results of the two runs are quite similar, regardless of the ML algorithm that is used, which seems to indicate that our emotion-based representation is correctly capturing the polarity of the text.

⁴ The competition ranking may be found in http://www.daedalus.es/TASS/participation.php

System            Accuracy (%)   Rank
Elhuyar F.        65.29          1
UNED-Logistic     53.82          7
UNED-J48-Graft    52.54          8

Table 2: 5-classes polarity detection

System            Accuracy (%)   Rank
Elhuyar F.        71.12          1
UNED-Logistic     59.03          7
UNED-J48-Graft    58.77          8

Table 3: 3-classes polarity detection

The second task consists in classifying each tweet into 3 polarity classes. To this end, only the tweets tagged as positive, neutral and negative are considered. The results of the two UNED runs for this task are shown in Table 3. As expected, the results are better than those obtained in the previous task, since the number of polarity classes is lower and thus the task is simpler. Again, the two variants of our system are ranked 7th and 8th among the 20 participants. However, it is worth mentioning that, even if we consider these results to be quite positive, the original system has shown significantly better accuracy when evaluated over other data sets (in particular, sets of different product reviews). This is due to two main facts. First, the system is expected to work better when classifying product reviews than more general texts (such as the tweets at hand), since product reviews express the user’s satisfaction or dissatisfaction with the different product attributes and therefore employ a highly emotive language. Second, the coverage of the affective lexicon, SentiSense, for the evaluation data sets is quite poor (only around 12 % of the words are labeled), and therefore a large number of tweets are not labeled with any emotion. This was expected, since SentiSense is specially designed for processing product reviews. Therefore, taking this low coverage into account, we expect that expanding the coverage of SentiSense will allow us to significantly improve the classification results.

5. Conclusions

This paper presents the contribution of the UNED group to the tasks of sentiment analysis and trending topics at the TASS workshop. The results have shown that the method for determining the polarity of the tweets performs reasonably well, taking into account that the system was originally conceived for English texts. This may have influenced the results because it was necessary to make an automatic translation of the SentiSense affective lexicon and also to use the Spanish version of WordNet, which has considerably less coverage than the English version. However, topic detection obtained quite poor results, and we believe that what should be improved is the representation of the topics. Thus, as future work we plan to use other sources and/or resources in order to retrieve better discriminative words for each topic.

References

Agirre, E. and A. Soroa. 2009. Personalizing PageRank for word sense disambiguation. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics.

Blei, David M., Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. J. Mach. Learn. Res., 3:993–1022, March.

Carreras, X., I. Chao, Ll. Padró, and M. Padró. 2004. FreeLing: An open-source suite of language analyzers. In Proceedings of the 4th International Conference on Language Resources and Evaluation.

Carrillo de Albornoz, J., L. Plaza, and P. Gervás. 2010. A hybrid approach to emotional sentence polarity and intensity classification. In Proceedings of the 14th Conference on Computational Natural Language Learning, pages 153–161.

Carrillo de Albornoz, J., L. Plaza, and P. Gervás. 2012. SentiSense: An easily scalable concept-based affective lexicon for sentiment analysis. In Proceedings of the 8th International Conference on Language Resources and Evaluation.

Das, Sanjiv and Mike Chen. 2001. Yahoo! for Amazon: Extracting Market Sentiment
from Stock Message Boards. In Proceedings of the Asia Pacific Finance Association Annual Conference (APFA).

Dave, Kushal, Steve Lawrence, and David M. Pennock. 2003. Mining the peanut gallery: opinion extraction and semantic classification of product reviews. In Proceedings of the 12th International Conference on World Wide Web, WWW ’03, pages 519–528, New York, NY, USA. ACM.

Esuli, Andrea and Fabrizio Sebastiani. 2006. Determining term subjectivity and term orientation for opinion mining. In Proceedings EACL-06, the 11th Conference of the European Chapter of the Association for Computational Linguistics, Trento, Italy.

Gonzalez-Agirre, A., E. Laparra, and G. Rigau. 2012. Multilingual central repository version 3.0: upgrading a very large lexical knowledge base. In Proceedings of the Sixth International Global WordNet Conference.

Hong, Liangjie and Brian D. Davison. 2010. Empirical study of topic modeling in Twitter. In Proceedings of the First Workshop on Social Media Analytics, SOMA ’10, pages 80–88, New York, NY, USA. ACM.

Magnini, Bernardo and Gabriela Cavaglia. 2000. Integrating subject field codes into WordNet. In Proceedings of LREC 2000.

in Natural Language Processing, EMNLP ’06, pages 440–448, Stroudsburg, PA, USA. Association for Computational Linguistics.

Steyvers, Mark, Padhraic Smyth, Michal Rosen-Zvi, and Thomas Griffiths. 2004. Probabilistic author-topic models for information discovery. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’04, pages 306–315, New York, NY, USA. ACM.

Turney, Peter. 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL 2002), pages 417–424, Philadelphia, USA.

Weng, Jianshu, Ee P. Lim, Jing Jiang, and Qi He. 2010. TwitterRank: finding topic-sensitive influential twitterers. In Proceedings of the Third ACM International Conference on Web Search and Data Mining, WSDM ’10, pages 261–270, New York, NY, USA. ACM.

Wiegand, Michael, Alexandra Balahur, Benjamin Roth, Dietrich Klakow, and Andrés Montoyo. 2010. A survey on the role of negation in sentiment analysis. In Proceedings of the Workshop on Negation and Speculation in Natural Language Processing, NeSp-NLP ’10, pages 60–68, Stroudsburg, PA.

12 Pang, Bo, Lillian Lee, y Shivakumar Vaith-
.
U yanathan. 2002. Thumbs up? sentiment Wilson, Theresa, Janyce Wiebe, y Paul Hoff-
ni
ve
classification using machine learning tech- mann. 2009. Recognizing contextual
rsi niques. En Proceedings of the Conferen- polarity: An exploration of features for
tat
Ja ce on Empirical Methods in Natural Lan- phrase-level sentiment analysis. Compu-
u
m guage Processing (EMNLP 2002), p tational Linguistics, 35(3):399–433.
e
I. áginas Zhao, Wayne Xin, Jing Jiang, Jianshu Weng,
S
er 79–86, Philadelphia, USA. Jing He, Ee-Peng Lim, Hongfei Yan, y
ve
i Polanyi, Livia y Annie Zaenen. 2006. Con- Xiaoming Li. 2011. Comparing twitter
de
C textual valence shifters. En W. Bruce and traditional media using topic models.
o
m Croft James Shanahan Yan Qu, y Janyce En Proceedings of the 33rd European con-
un
ic Wiebe, editores, Computing Attitude and ference on Advances in information re-
ac
ió
Affect in Text: Theory and Applications, trieval, ECIR’11, paginas 338–349, Berlin,
i
P
volumen 20 de The Information Retrieval Heidelberg. Springer-Verlag.
ub Series. Springer Netherlands, paginas 1–
lic
ac 10.
io
ns
Riloff, Ellen, Siddharth Patwardhan, y Jany-
ce Wiebe. 2006. Feature subsumption for
opinion analysis. En Proceedings of the
2006 Conference on Empirical Methods
UNED @ TASS: Using IR techniques for topic-based sentiment analysis through divergence models

Ángel Castellanos González, Juan Cigarrán Recuero, Ana García Serrano
Universidad Nacional de Educación a Distancia
C/ Juan del Rosal 16, Madrid
acastellanos@lsi.uned.es, juanci@lsi.uned.es, agarcia@lsi.uned.es
Abstract: In this paper we present the work carried out for the Workshop on Sentiment Analysis at SEPLN (TASS). The workshop focuses on sentiment analysis on Twitter, at both the tweet level and the topic level. Our proposal addresses sentiment and topic detection from an Information Retrieval (IR) perspective based on language models. The Kullback-Leibler Divergence (KLD) is used to generate both the polarity models and the topic models that are later used in the IR process. To improve the accuracy of the results, we propose several approaches that build the language models not only from the full textual content of each tweet but, as an alternative, from the named entities or adjectives detected in it. Results show that modeling the tweet set with named entities and adjectives improves the final precision, which indicates that these are more representative than common terms. Overall results are promising (5th and 4th position in the respective proposed tasks), indicating that an approach based on IR and language models may be an alternative to the more common state-of-the-art proposals centered on classical classification techniques.

Keywords: Opinion Mining, Language Divergences, Kullback-Leibler Divergence, Topic Detection, POS Tagging.
1 Introduction

Sentiment analysis is the automatic detection of the general public's opinion about a particular topic or product. It is often also referred to as opinion mining.

Sentiment analysis techniques are similar to text classification techniques. Given a text, or a set of texts, the idea is to automatically assign them a class, in this case an opinion (e.g. positive, negative, neutral). This classification can be carried out at several levels: topic level, document level, sentence level or feature level. However, sentiment analysis needs much higher precision than traditional text classification, mainly because a small change in the text can reflect a large change in the opinion (e.g. "I like the photograph" vs. "I do not like the photograph").

Several approaches have traditionally been proposed for sentiment analysis. The most common machine learning techniques have been used, such as Naïve Bayes (Melville and Gryc, 2009), Support Vector Machines (SVM) (Xia, Zong and Li, 2011) (Zhang et al., 2011), multi-class SVM (Xu et al., 2011), K-Nearest Neighbors (KNN) (Tan and Zhang, 2008) or neural networks (Jian, Chen and Han-Shi, 2010), as well as semantic information to enrich the associated textual information (Wu and Shen, 2009). Among the more elaborate alternatives, we can highlight approaches centered on the analysis of lexical information (Yu and Hatzivassiloglou, 2003) or on low-level grammatical information such as part-of-speech (POS) tags.

Other proposals use complementary information. The work presented in (Jia, Yu and Meng, 2009) detects and exploits the occurrence of negations in the sentiment analysis process. (Choi, Breck and Cardie, 2006) study the use of entities and relations associated with the expression of opinions. Finally, (Joshi and Penstein-Rose, 2009) use syntactic dependency relations to identify opinions, while (Soo-Min and Hovy, 2006) use them to extract opinions and topics from online news.

One of the most interesting levels, and the focus of one of the TASS tasks, is the analysis of the polarity of a specific topic (Trending Topic Coverage Task). In this context it becomes especially important to be able to analyze automatically the content of the messages published in social networks in order to identify their topic. Work in this direction has applied classical text mining techniques. The problem is that, unlike in traditional text mining, messages in social networks (e.g. tweets) tend to be short, which hinders a contextualized analysis. This limitation brings other problems with it, such as the use of abbreviations, slang or vocabulary specific to these networks.

One solution proposed for these problems is the application of Topic Models (Blei and Lafferty, 2009). This technique generates models associated with the different topics identified and then classifies new messages into one of them. For building Topic Models, Latent Dirichlet Allocation (LDA) (Blei, Ng and Jordan, 2003) has become a standard modeling technique (Liu, Niculescu-Mizil and Gryc, 2009) (Phan, Nguyen and Horiguchi, 2008), as has its extension known as the Author-Topic Model (Rosen-Zvi et al., 2010).

Unlike the previous works, which approach sentiment analysis as a classification problem, this paper presents an IR-based approach in which a set of tweets is represented and indexed using language models. Topics and polarity are assigned to a specific tweet by using its content as a query against the index that stores the previously generated models.

The research carried out in this work focuses on the initial representation process and, more specifically, on selecting the most suitable terminology for building the models, considering not only common terms but also terms with a specific syntactic function within the tweet, such as adjectives or named entities.

The rest of the paper is organized as follows. Section 2 presents the system developed to carry out the work.
Sections 3 and 4 describe the experiments performed and discuss their results. Finally, Section 5 presents the conclusions and future lines of work.

2 System Description

As stated above, the system applies an IR-based approach. The set of training tweets provided by the organizers is used as the basis for generating the models and the indexes. Three main phases can be distinguished: 1) preprocessing, in which the original set of tweets is filtered and tagged; 2) modeling, in which the language models that represent both the topics and the different polarities are generated; and 3) categorization, in which each tweet of the test set provided by the organizers is assigned a specific polarity and topic using an IR approach where each tweet is used as a query to the system.

2.1 Preprocessing

This stage preprocesses the set of training tweets provided by the organizers and consists of the following steps:

• Cleaning of the tweet content: removal of special characters (periods, commas, etc.), removal of stop words and removal of Twitter-specific terms (mentions, hashtags and retweets).
• POS tagging of the tweets: the content of the tweets is POS-tagged to identify the named entities and adjectives present. This is done with the annotation tools presented in (Hernández-Aranda, Granados and García-Serrano, 2012), based on the Stilus tool developed by Daedalus (http://www.daedalus.es/productos/stilus/).
• Storage of the tweets: to make it easier to obtain the language models, the results of the annotations made in the previous step are stored together with the information about their polarity and topic.

2.2 Modeling

The modeling phase takes as input the results of the previous section, and its ultimate goal is to generate the language models that represent the different polarity levels and the different topics considered in the competition.

The main idea is to divide the set of tweets into subsets according to their polarity (and analogously for the topics), adding the content of each tweet to its corresponding subset. As a result, each polarity and each topic is represented by a set of preprocessed tweets that is later used to generate the models. This modeling is applied independently for polarity and for topics using the Kullback-Leibler Divergence (KLD) (Kullback and Leibler, 1951), as detailed for a similar approach in (Castellanos, Cigarrán and García-Serrano, 2012).

KLD makes it possible to rank the terms of each subset by their representativeness according to formula (1), where p_D(t) is the probability that term t appears in the subset S and p_C(t) is the probability that the same term t appears in the rest of the subsets.

KLD(t) = p_D(t) \cdot \log \frac{p_D(t)}{p_C(t)}    (1)

2.2.1 Generated models

Following the modeling approach presented, several models have been generated that take into account different elements or features of the tweets, both for polarity and for topic detection. For the polarity detection task the following models were generated:

• Content-based KLD model (MP1): KLD is applied to all the terms of the set of tweets associated with each of the polarities identified in the task.
• Adjective KLD model (MP2): this second model restricts the terms to which KLD is applied.
  Only the adjectives identified in the POS tagging stage, over the set of tweets associated with each polarity, are used. This model aims to expose the direct relationship between the polarity of a tweet and the adjectives it contains.
• Filtered adjective KLD model (MP3): the KLD models are generated as in the previous model (MP2). The difference is that, after the models are generated, those for the positive and negative polarities (P+, P, N and N+) are refined by removing the terms that belong to the neutral polarity. The aim is to reduce noise by removing adjectives that do not define a positive or negative polarity even though they appear in tweets labeled as such. For example, in the tweet:
  'Buen día todos! Lo primero mandar un abrazo grande a Miguel y a su familia @libertadmontes Hoy podría ser un día para la grandeza humana.'
  labeled with polarity P+, the adjectives buen, grande and humana appear. While buen ("good") and grande ("big") may really represent a very positive polarity, humana ("human") has no associated polarity despite being in a tweet labeled as very positive.

To detect the topic of the tweets, which is needed for the second proposed task, the following models were generated:

• Content-based KLD model (MT1): similar to the MP1 model used for polarity, but applying KLD to the set of terms of the tweets that belong to each topic. This is intended to obtain the most representative terms of each of the topics of the task.
• Named-entity KLD model (MT2): in this case, KLD is applied only to the named entities identified in the POS tagging process.

All the models presented have been indexed with Apache Solr, and their KLD values normalized, in order to carry out the retrieval process described in the following subsection.

2.3 Categorization of the Tweets

This process uses a classical Information Retrieval approach, with the previously indexed models and each tweet of the test set treated as an independent query. The result is a ranking whose elements correspond to the models associated with the different polarities or topics. Categorization selects the first result returned by the system as the indicator of the polarity or topic of the tweet used as the query. For more detail on the different approaches to categorizing the tweets, see the Experiments section.

3 Experiments

A total of 16 experiments (RUNs) were submitted: 4 for the Sentiment Analysis task and 12 for the Trending Topic Coverage task. The following subsections detail the experiments for each task.

3.1 Sentiment Analysis

This task focuses on the automatic identification of the polarity of each tweet. The values it can take are: Very Positive (P+), Positive (P), Neutral (NEU), None (NONE), Negative (N) and Very Negative (N+). Four experiments were submitted for this task:

TASK1_RUN_01: The content of the tweet is queried against the index that holds the MP1 model. This RUN is the baseline for the other approaches.
TASK1_RUN_02: The adjectives with which each tweet has been tagged are queried against the index that stores the MP2 model. The aim is to study how important adjectives are in establishing the polarity of a tweet and whether they are more reliable than the content itself.
TASK1_RUN_03: The adjectives associated with each tweet are queried against the index that stores the MP3 model. The aim is to check the effect of the adjective filtering proposed in Section 2.2.1.
TASK1_RUN_04: A tweet may have no adjectives that match the MP2 and MP3 models. In the two previous RUNs (02 and 03) this situation was handled by setting NONE as the default polarity. In this RUN it is handled by issuing a second query against the MP1 model, based on the full content of the tweet.

3.2 Trending Topic Coverage

This task consists of identifying the topic of a message (a tweet in this case) and then applying a polarity analysis for each topic. Twelve experiments were submitted for this task:

TASK2_RUN_01 – TASK2_RUN_04: These 4 RUNs share the same strategy for detecting the topic: the content of each tweet is queried against the index that stores the MT1 model. They differ in the method used to detect the polarity, applying, in the same order, the 4 polarity detection approaches presented in Section 3.1.
TASK2_RUN_05 – TASK2_RUN_08: In these 4 RUNs the topic is detected by querying the named entities (NE) present in each tweet against the index that stores the MT2 model. When no named entities are detected, the tweet is assigned to the topic otros ("others"). Again, the 4 RUNs differ in the polarity detection method applied.
TASK2_RUN_09 – TASK2_RUN_12: In these RUNs the topic is detected from the named entities the tweets contain. The difference lies in the tweets in which no NE is detected: in this case a second query is issued using the content of the tweet (MT1 model) as the indicator of the topic. The 4 RUNs differ in the way the polarity is detected.

4 Results

Table 1 shows the results obtained in the different runs, using precision as the metric. Note that, when evaluating the results of the Trending Topic Coverage task, the organizers grouped the experiments according to the method applied to detect the topic. Note also that the results of the Sentiment Analysis task are evaluated both with 5 polarity levels (P+, P, NEU, N and N+) and with 3 levels (P+ and P, NEU, N+ and N).

Sentiment Analysis (5 levels)
Run              Precision
TASK1_RUN_01     0.3998
TASK1_RUN_02     0.4041
TASK1_RUN_03     0.3947
TASK1_RUN_04     0.3859

Sentiment Analysis (3 levels)
Run              Precision
TASK1_RUN_01     0.4043
TASK1_RUN_02     0.4361
TASK1_RUN_03     0.5008
TASK1_RUN_04     0.4120

Trending Topic Coverage
Run              Precision
TASK2_RUN_01     0.4051
TASK2_RUN_02     0.4051
TASK2_RUN_03     0.4051
TASK2_RUN_04     0.4051
TASK2_RUN_05     0.4526
TASK2_RUN_06     0.4526
TASK2_RUN_07     0.4526
TASK2_RUN_08     0.4526
TASK2_RUN_09     0.4224
TASK2_RUN_10     0.4224
TASK2_RUN_11     0.4224
TASK2_RUN_12     0.4224

Table 1: Results obtained

4.1 Sentiment Analysis

First, looking only at sentiment analysis, the results indicate that using adjectives to generate the model (RUN 02, 03 and 04) obtains better results than using the full content of the tweets (RUN 01). This holds for all three runs with 3 levels; with 5 levels only RUN 02 is above the baseline. The advantage of adjectives is attributed to terminological diversity. Since the models were created from the training set, a new term that appears in the test set contributes nothing to the retrieval process. Because the terminology associated with the contents is larger than that of the adjectives, a new term is more likely to appear in the content of a tweet than a new adjective.

Looking at the RUNs against the adjective-based models, the handling of tweets without indexed adjectives gives a result contrary to what might initially be expected.
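The two ways of handling a tweet without indexed adjectives, described in Section 3.1, can be sketched as follows. This is only a minimal illustration and not the system itself: the actual models are indexed in Apache Solr, whereas here each model is a plain dictionary of normalized term weights and the score is assumed to be the sum of the weights of the query terms. The names `rank_models`, `top_label`, `classify_default_none` and `classify_with_fallback`, as well as the toy weights, are hypothetical.

```python
from typing import Dict, List, Optional

# A model maps each term to its (normalized) weight for one polarity.
Model = Dict[str, float]


def rank_models(query_terms: List[str], models: Dict[str, Model]) -> List[str]:
    """Return the labels of the models that match the query, best first.

    Assumption: the score of a model is the sum of the weights of the
    query terms it contains; a real IR engine applies its own scoring.
    """
    scores = {
        label: sum(model.get(term, 0.0) for term in query_terms)
        for label, model in models.items()
    }
    matching = [label for label, score in scores.items() if score > 0.0]
    return sorted(matching, key=lambda label: scores[label], reverse=True)


def top_label(query_terms: List[str], models: Dict[str, Model]) -> Optional[str]:
    """Label of the first ranked model, or None if no model matches."""
    ranking = rank_models(query_terms, models)
    return ranking[0] if ranking else None


def classify_default_none(adjectives: List[str],
                          adjective_models: Dict[str, Model]) -> str:
    """RUN 02/03 strategy: query with the adjectives, default to NONE."""
    return top_label(adjectives, adjective_models) or "NONE"


def classify_with_fallback(adjectives: List[str], content_terms: List[str],
                           adjective_models: Dict[str, Model],
                           content_models: Dict[str, Model]) -> str:
    """RUN 04 strategy: if the adjective query fails, query the content model."""
    label = top_label(adjectives, adjective_models)
    if label is None:
        label = top_label(content_terms, content_models)
    return label or "NONE"


# Toy models with made-up weights, for illustration only.
adjective_models = {"P": {"buen": 0.9, "grande": 0.6}, "N": {"malo": 0.8}}
content_models = {"P": {"abrazo": 0.5, "buen": 0.4}, "N": {"crisis": 0.7}}

tweet_terms = ["crisis", "gobierno"]  # content terms of a tweet
tweet_adjectives: List[str] = []      # no adjectives were tagged in it

print(classify_default_none(tweet_adjectives, adjective_models))
print(classify_with_fallback(tweet_adjectives, tweet_terms,
                             adjective_models, content_models))
```

With these toy models the first strategy answers NONE, while the second falls back to the content model and answers N; which of the two is preferable in practice is an empirical question that depends on the corpus.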
When the tweet to classify contains no adjectives indexed by the model, assigning the NONE polarity (RUN 02 and 03) produces better results than issuing a new query with the full content of the tweet (RUN 04). This can be observed with both 5 and 3 polarity levels. This behavior may again be due to the terminological diversity introduced by the test corpus, which means that the language models originally defined over the training content do not produce satisfactory results.

Finally, it is interesting to observe the behavior of the system with the adjective models MP2 and MP3. Here the results are contradictory: with 5 polarity levels, not filtering the adjectives is slightly better (0.4041 vs. 0.3947), whereas with only 3 levels filtering is clearly better (0.5008 vs. 0.4361). More thorough experimentation would be needed to draw a conclusion on this point.

4.2 Trending Topic Coverage

Since the results are grouped by the way the topics are detected (regardless of the polarity detection method), this analysis focuses on topic detection.

The best results are obtained when named entities are taken into account (RUNs 05 to 12) rather than the full content of the tweets (RUNs 01 to 04). As with adjectives for polarity, when detecting the topic of a tweet it seems more advisable to use only the named entities it contains. Again, the diversity of the terminology could be related to this phenomenon.

Focusing only on the RUNs that use named entities (RUNs 05 to 12), the behavior is again similar to that observed when adjectives are used to detect polarity: when the tweet contains no named entities, it is preferable to assign the default topic otros (0.4526) than to use its content to establish the topic (0.4224).

5 Conclusions and Future Work

This paper has presented a modeling technique based on language divergences to detect the polarity and topic of a tweet, using different types of information. On the one hand, the textual content of the tweets themselves has been used; on the other, the adjectives or named entities detected in them by a POS tagging process.

The proposed method is computationally simple, which makes it possible to apply it in real time to a constant stream of tweets. Given this simplicity, and that this work is at an early stage, the results obtained are satisfactory (5th best group in polarity with 3 levels and 4th best in polarity taking topics into account).

The analysis of the results confirms our intuition that, for this corpus, adjectives are more representative of the polarity of a tweet than its content, and likewise named entities for the topic. To put these results in context, the problem of terminological diversity must be taken into account: the greater the terminological diversity, the harder categorization with language models built on the training set becomes.

The results leave the door open to many interesting modifications of the proposed system. This work has not used the social information available about users (information, types, relations), so it would be interesting to extend the modeling applied to topics and polarities to model the users themselves. This would make it possible to investigate the relationship between topic and polarity for a given user (e.g. a user who mostly publishes positive tweets about sports).

Likewise, given the results obtained using the content of the tweets, another interesting extension would be to go deeper into the treatment of content. The models are currently generated from unigrams; it would be worth extending the modeling to n-grams. POS tagging is another aspect open to improvement. Besides adjectives and named entities, one could experiment with verbs, expressions or noun phrases as representatives of a polarity or topic. Other methods, such as semantic expansion techniques with external sources, would also be advisable.
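The modeling step of Section 2.2 can be illustrated with a short sketch. It ranks the terms of one subset by the per-term contribution p_D(t)·log(p_D(t)/p_C(t)), the usual per-term form of the Kullback-Leibler divergence, and then applies an MP3-style filter that drops from a polarity model the terms of the neutral model. Several details are assumptions and are not taken from the paper: probabilities are estimated as relative term frequencies, terms unseen in the rest of the collection receive a small smoothing probability `epsilon`, only terms with a positive score are kept, and the function names and toy adjective lists are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def term_probabilities(tweets: Iterable[List[str]]) -> Dict[str, float]:
    """Relative frequency of each term over a collection of tokenized tweets."""
    counts = Counter(term for tweet in tweets for term in tweet)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def kld_model(subset: List[List[str]], rest: List[List[str]],
              epsilon: float = 1e-6) -> Dict[str, float]:
    """Score each term of `subset` with p_D(t) * log(p_D(t) / p_C(t)).

    p_D is estimated on the subset and p_C on the remaining subsets.
    `epsilon` (an assumption) stands in for p_C(t) when t is unseen there.
    Only terms that are more frequent in the subset (score > 0) are kept.
    """
    p_d = term_probabilities(subset)
    p_c = term_probabilities(rest)
    scores = {t: p * math.log(p / p_c.get(t, epsilon)) for t, p in p_d.items()}
    return {t: s for t, s in scores.items() if s > 0.0}


def normalize(model: Dict[str, float]) -> Dict[str, float]:
    """Scale the scores so that the best term has weight 1.0."""
    top = max(model.values(), default=0.0)
    return {t: s / top for t, s in model.items()} if top else {}


def filter_neutral(model: Dict[str, float],
                   neutral: Dict[str, float]) -> Dict[str, float]:
    """MP3-style refinement: drop the terms that also appear in the neutral model."""
    return {t: s for t, s in model.items() if t not in neutral}


# Toy adjective lists per polarity, made up for illustration.
by_polarity = {
    "P+": [["buen", "grande", "humana"], ["buen", "humana"]],
    "NEU": [["humana", "social"], ["humana"]],
    "N": [["malo", "grande"], ["malo"]],
}

models = {}
for label, subset in by_polarity.items():
    rest = [tw for other, tws in by_polarity.items() if other != label for tw in tws]
    models[label] = normalize(kld_model(subset, rest))

filtered = filter_neutral(models["P+"], models["NEU"])
print(sorted(filtered, key=filtered.get, reverse=True))
```

In this toy data humana is ranked in both the P+ and the NEU models, so the filter removes it from P+ and leaves buen and grande, mirroring the example tweet of Section 2.2.1.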
Bibliography

Blei, D. and Lafferty, J. 2009. Topic models. Text Mining: Theory and Application.
Blei, D., Ng, A. and Jordan, M. 2003. Latent Dirichlet allocation. The Journal of Machine Learning Research (3): 993–1022.
Castellanos, A., Cigarrán, J. and García-Serrano, A. 2012. Generación de un corpus de usuarios basado en divergencias del lenguaje. In Proceedings of the Segundo Congreso Español de Recuperación de Información (CERI 2012).
Choi, Y., Breck, E. and Cardie, C. 2006. Joint extraction of entities and relations for opinion recognition. In Proceedings of EMNLP 2006.
Hernández-Aranda, D., Granados, R. and García-Serrano, A. 2012. Servicios de anotación y búsqueda para corpus multimedia. Revista de la Sociedad Española para el Procesamiento del Lenguaje Natural, SEPLN (49).
Jia, L., Yu, C. and Meng, W. 2009. The Effect of Negation on Sentiment Analysis and Retrieval Effectiveness. In Proceedings of CIKM 2009.
Jian, Z., Chen, X. and Han-Shi, W. 2010. Sentiment classification using the theory of ANNs. The Journal of China Universities of Posts and Telecommunications (17)(Suppl.): 58–62.
Joshi, M. and Penstein-Rose, C. 2009. Generalizing dependency features for opinion mining. In Proceedings of ACL/IJCNLP 2009.
Kullback, S. and Leibler, R.A. 1951. On information and sufficiency. Annals of Mathematical Statistics, 22(1): 79–86.
Liu, Y., Niculescu-Mizil, A. and Gryc, W. 2009. Topic-link LDA: joint models of topic and author community. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 665–672.
Melville, P. and Gryc, W. 2009. Sentiment Analysis of Blogs by Combining Lexical Knowledge with Text Classification. In Proceedings of KDD 2009.
Phan, X., Nguyen, L. and Horiguchi, S. 2008. Learning to classify short and sparse text & web with hidden topics from large-scale data collections. In Proceedings of the 17th International Conference on World Wide Web, pages 91–100.
Rosen-Zvi, M., Chemudugunta, C., Griffiths, T., Smyth, P. and Steyvers, M. 2010. Learning author-topic models from text corpora. ACM Transactions on Information Systems, 28(1): 1–38.
Soo-Min, K. and Hovy, E. 2006. Extracting opinions, opinion holders, and topics expressed in online news media text. In Proceedings of the ACL/COLING Workshop on Sentiment and Subjectivity in Text.
Tan, S. and Zhang, J. 2008. An empirical study of sentiment analysis for Chinese documents. Expert Systems with Applications (34): 2622–2629.
Wu, C. and Shen, L. 2009. A New Method of Using Contextual Information to Infer the Semantic Orientations of Context Dependent Opinions. In Proceedings of the 2009 International Conference on Artificial Intelligence and Computational Intelligence.
Xia, R., Zong, C. and Li, S. 2011. Ensemble of feature sets and classification algorithms for sentiment classification. Information Sciences (181): 1138–1152.
Xu, K., Shaoyi-Liao, S., Li, J. and Song, Y. 2011. Mining comparative opinions from customer reviews for Competitive Intelligence. Decision Support Systems (50): 743–754.
Yu, H. and Hatzivassiloglou, V. 2003. Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2003), pages 129–136.
Zhang, Z., Ye, Q., Zhang, Z. and Li, Y. 2011. Sentiment classification of Internet restaurant reviews written in Cantonese. Expert Systems with Applications.
SINAI at TASS 2012

Eugenio Martínez Cámara, M. Ángel García Cumbreras
M. Teresa Martín Valdivia, L. Alfonso Ureña López
Departamento de Informática, Escuela Politécnica Superior de Jaén
Universidad de Jaén, E-23071 – Jaén
{emcamara, maite, magc, laurena}@ujaen.es

Abstract: This paper describes the participation of the SINAI research group of the University of Jaén in the first edition of the workshop on Sentiment Analysis at the SEPLN congress (TASS 2012). The workshop proposes two tasks: the first focuses on the polarity classification of a corpus of Spanish tweets, and the second involves identifying the topics to which the tweets belong. For the first task we chose a supervised machine learning approach, with SVM as the classification algorithm, and ran several experiments that include bags of positive and negative words. For the second task we also used SVM, combined with bags of words for each of the topics in order to improve the classification performance.

Keywords: Twitter, Sentiment Analysis, Opinion Mining, Supervised Machine Learning, SVM
1 Introduction

This paper presents the experiments and results obtained in the Workshop on Sentiment Analysis at SEPLN (TASS 2012). Specifically, we took part in the two proposed tasks: Sentiment Analysis and Trending Topic Coverage.

Sentiment Analysis (SA), also known as Opinion Mining (OM), has become a promising research discipline within Natural Language Processing (NLP) and Data Mining. It is usually defined as the computational treatment of the subjective information present in any kind of document (Pang & Lee, 2008).

The proliferation of user-generated web content in blogs, wikis, forums and social networks has led companies, researchers and organizations to take an interest in analysing and monitoring all the information that circulates on the web. As a result, scientific papers, conferences and projects related to OM appear in more and more venues.

On the other hand, OM has mainly been applied to long texts, such as blog posts or
opinion articles. However, owing to the enormous success of social networks, interest in analysing opinions in short texts is growing exponentially.

Although this is a relatively new research area, there is a large body of work on SA, and more specifically on polarity classification. Two ways of approaching the problem can be distinguished. The first is based on machine learning techniques, which use a collection of documents to train a classifier (Pang et al., 2002). The second is based on the concept of semantic orientation, which requires no training of any algorithm but must take into account the orientation of words (positive or negative) (Turney, 2002). In this work we try to combine both approaches in order to improve the precision of the systems.

Moreover, most research has focused on English texts, although other languages such as Chinese and Spanish are increasingly present on the Internet. The workshop on which this paper is based uses a Spanish corpus, which makes it even more attractive from a scientific point of view.

The rest of the paper is organized as follows. The next section briefly reviews the state of the art in OM on Twitter. Section 3 gives a short introduction to text categorization on Twitter. Section 4 describes the corpus and the data preparation process. Sections 5 and 6 present the experiments and results for the two tasks addressed.

2 Opinion analysis on Twitter

Social networks have become an excellent data source for OM and a fundamental tool through which people express their opinions. Internet users increasingly turn to social networks to express their feelings and impressions about any subject. One example is micro-blogging networks such as Twitter, where users express their opinions on the most varied topics in real time. In Spain the presence of Twitter has grown steadily, and since 2010 companies, politicians and users in general have been realizing the true potential of this social network. It would be very useful to determine the polarity of these opinions automatically, which would make it possible to build systems that study and analyse citizens' voting intentions, consumers' opinions about a particular product or service, or people's mood.

Tweets have characteristics that set them apart from the opinions and comments found in forums and web pages. Comments and opinions written on the Internet are usually fairly long texts in which users try to summarize what they think about a given subject, whereas tweets tend to be written in informal language and are limited to 140 characters. In addition, many tweets do not express opinions but situations that happen to users. Finally, tweets have to be analysed at sentence level rather than at document level.

Although most Twitter users come from the USA, its presence in the rest of the world has kept growing, and Spain is a clear example. Because the surge in popularity of this social network is relatively recent in non-English-speaking countries, the few published research papers on SA and Twitter all deal with English. Some works, for example, use Twitter as a corpus. (Petrovic, Osborne, Lavrenko, 2010) build a large corpus of 97 million tweets. (Pak & Paroubek, 2010) describe how to automatically generate a corpus of positive, negative and neutral tweets, which they then use to train a sentiment classifier. (Go, Bhayani, Huang, 2009) use machine learning techniques to build a classifier that determines the polarity of tweets. To label the corpus into negative and positive tweets they follow the strategy described in (Read, 2005).
(Jansen, Zhang, Sobel, Chowdury, 2009) show that micro-blogging sites are a very useful marketing tool and indicate that tweets can be regarded as Electronic Word Of Mouth (EWOM). Following this line of using Twitter as one more marketing tool, (Asur & Huberman, 2010) use a corpus of tweets about a set of films released in late 2009 and early 2010 to show the correlation between the number of tweets about a film and their polarity, on the one hand, and its box-office takings in the first two weeks after release, on the other. (Bollen, Mao, Zeng, 2011) investigate the possible correlation between the mood expressed on Twitter and stock market movements.

The study of trends in political opinion has also attracted the scientific community. (Tumasjan, Sprenger, Sandner, Welpe, 2010) carry out an interesting study with more than 100,000 tweets about the 2009 German federal parliamentary elections. They try to measure voting intention simply by counting the number of mentions each party receives. The result is quite surprising, since the error is below 2%. (O’Connor, Balasubramanyan, Routledge, Smith, 2010) analyse opinions on politics and on commercial products and compare the results obtained from Twitter with those from polls. (Diakopoulos & Shamma, 2010) classify the polarity of tweets posted during the 2008 United States presidential debate, which was broadcast on television and followed by many people on Twitter.

3 Text categorization on Twitter

Polarity classification is without doubt the most studied task in SA. However, there are many other interesting applications (recommender systems, collaborative filtering, etc.) that can benefit from research on other kinds of tasks, such as subjective information extraction or irony detection. One of these tasks consists of categorizing, into a set of predefined classes, texts extracted from potentially subjective sites such as opinion blogs or social networks. Several works study this problem. For example, (Sriram et al., 2010) propose an approach that categorizes tweets, according to the text they contain, into a predefined set of generic classes such as news, events, opinions, deals or private messages. (Garcia et al., 2010) compare two social networks (Blippr and Twitter), classifying into the categories Movies, Books, Music, Apps and Games. The results for both social networks are very similar in each category. Finally, (Jung, 2011) applies several named entity recognition (NER) techniques to categorize text in social networks.

4 Data preparation

The TASS 2012 workshop provides two data sets, one for training and one for testing. Table 1 and Table 2 show some characteristics of the two corpora.

Class    Tweets    Share
P         1,019   14.12%
P+        1,764   24.44%
NEU         610    8.45%
N         1,221   16.91%
N+          903   12.51%
NONE      1,702   23.58%
Total     7,219     100%
Table 1: Number of tweets per class in the training corpus for Task 1

Class    Tweets    Share
P         1,487    2.45%
P+       20,744   34.12%
NEU       1,304    2.15%
N        11,286   18.56%
N+        4,556    7.49%
NONE     21,415   35.23%
Total    60,798     100%
Table 2: Number of tweets per class in the test corpus for Task 1

The tables show that neither corpus is balanced. This matters little for testing, but it is critical for training, since the classifier is very likely to over-train on the polarity levels with the largest number of tweets in the training set.
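
To make the imbalance concrete, the following minimal sketch recomputes the class shares from the counts in Table 1 and derives inverse-frequency class weights. The weighting scheme and the function names are illustrative assumptions; the paper does not report using class weights.

```python
# Tweet counts per polarity class, copied from Table 1 (training corpus).
TRAIN_COUNTS = {
    "P": 1019, "P+": 1764, "NEU": 610, "N": 1221, "N+": 903, "NONE": 1702,
}


def class_shares(counts):
    """Return the percentage share of each class, rounded to two decimals."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


def inverse_frequency_weights(counts):
    """Hypothetical class weights: total / (number_of_classes * class_count).

    Not taken from the paper; shown only as one common way to offset imbalance.
    """
    total = sum(counts.values())
    return {label: total / (len(counts) * n) for label, n in counts.items()}


shares = class_shares(TRAIN_COUNTS)
weights = inverse_frequency_weights(TRAIN_COUNTS)
for label in TRAIN_COUNTS:
    print(f"{label:5s} {shares[label]:6.2f}%  weight={weights[label]:.2f}")
```

Under this assumed scheme the rarest class (NEU) receives the largest weight and the most frequent class (P+) the smallest.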
Before feeding the data to the polarity classifier, and later to the category classifier, the set of tweets went through a cleaning process intended to remove as much noise as possible. The process consists of the following steps:
1. Removal of characters that are neither letters of the Spanish alphabet nor digits. In the experiments that study the influence of exclamation marks, these are removed only after they have been processed.
2. Normalization of laughter expressions, so that all of them are represented by a single expression.
3. Normalization of words with repeated letters. Every letter repeated more than three times is reduced to two repetitions, so that the result is treated as a different word from the original but takes the same form regardless of the number of repetitions.
After this preprocessing, the data are ready to be fed to both the polarity classifier and the topic classifier.

5 Task 1: Polarity classification

The first task proposed in the workshop consists of developing a system able to identify the intensity of the opinion or emotion expressed in a set of tweets. More technically, the goal of the task is to classify Spanish tweets, which do not belong to any specific domain, into five polarity levels plus a class for tweets without opinion:
1. NONE: Objective text, or text that expresses no opinion.
2. N+: Negative opinion with a certain degree of intensity.
3. N: Negative opinion.
4. NEU: Subjective text in which the author does not state a clear position.
5. P: Positive opinion.
6. P+: Positive opinion with a certain degree of intensity.

To solve this problem we decided to follow a strategy based on supervised machine learning. A supervised method requires building a model from a data set in which every object is labelled with the class it belongs to, usually called the training set. Obtaining such a data set is often a problem, but in this case it was provided by the workshop organizers. The next step was to select the system configuration that gave the best results with that training set, that is, to carry out an evaluation of the classifier. The classification algorithm chosen is SVM (Support Vector Machines) (Vapnik, 1995), and more specifically the SVMLight¹ implementation. SVM was chosen because of the good results it usually achieves in SA work, many of which can be found in (Pang & Lee, 2008). In addition, our team has used SVM successfully in several SA studies (Martínez-Cámara et al., 2011a), (Martínez-Cámara et al., 2011b).

¹ http://www.csie.ntu.edu.tw/~cjlin/libsvm/

Once the data had been cleaned as described in Section 4, we designed a set of 33 experiments, ranging from the baseline, in which the tweets are only tokenized, to configurations that add the number of positive and negative words appearing in the tweets. The features used to evaluate the classifier configuration were:
1. Unigrams: Each tweet is tokenized, and the TF metric is used to represent each unigram. TF refers to the relative frequency of each unigram in the tweet. The choice of TF rather than, for example, TF-IDF is based on previous work in which TF always gave the best results.
2. Emoticons: The number of positive or negative emoticons appearing in the tweet is added as a feature. For this purpose we used a bag
of positive and negative emoticons, specifically those listed in Table 3.
3. Positive and negative words: Some of the evaluations include as features the number of positive and negative words appearing in the tweets, following a bag-of-words approach. No bag of positive and negative words was found for Spanish, so we decided to machine-translate the one presented in (Hu & Liu, 2004), which can be downloaded from the website of one of its authors².
4. Intensity: Building on the bag of words, we added as an extra feature the number of positive and negative words with repeated characters, in order to model the intensity of the opinion or emotion the author wants to express.
In one set of experiments, opinion words with repeated characters received no special treatment, while in another set one extra unit was added to the counter of positive or negative words whenever the term had repeated letters. Intensity can also be expressed through exclamation marks, so in another set of experiments the counter of positive or negative words was increased by one additional unit whenever the word was accompanied by an exclamation mark.
We also experimented with the effect of negation particles in the tweet. When a negation particle precedes an opinion word, the word's contribution is added to the counter opposite to its own category. That is, if a positive word is accompanied by a negation, its contribution is recorded in the negative-word counter instead of the positive one.

² http://www.cs.uic.edu/~liub/FBS/opinion-lexicon-English.rar

Machine learning tasks related to text processing usually apply a stopper and a stemmer to reduce the number of lexical features. In the experiments carried out for the workshop only the effect of stemming was tested, because in previous research we concluded that using a stopper is counterproductive for Opinion Analysis on Twitter.

The evaluation procedure chosen for the classifier was K-fold cross-validation with K=10.

Positive emoticons: :) : ) :-) ;) ;-) =) ^_^ :-D :D :d =D C: Xd XD xD (x (= ^^ ô^ ’u’ n_n *-* *O* *o* *_*
Negative emoticons: :-( :( :(( : ( D: Dx ’n’ :\ /: ):-/ :’ =’[ :_( /T_T TOT ;_;
Table 3: Positive and negative emoticons

5.1 Task 1 results

Combining all the features above produced 33 experiments, four of which were selected for submission to the workshop.

The results obtained during the evaluation of the classifier are shown in Figure 1. From all these classifier configurations we selected the four with the highest F1 value. Their characteristics are:
1. EXP 2: Each tweet is represented as a set of unigrams whose importance is given by their TF value. Stemming is applied before classification.
2. EXP 6: The same as the previous one, but web addresses are also normalized, so that every URL is replaced by URL. Mentions are normalized as well, so each expression of the form
@nombre_usuario is replaced by MENTION.
3. EXP 4: The same as EXP 6, but only mentions are normalized.
4. EXP 19: In addition to the lexical features included in the three previous configurations, this experiment adds as features the number of positive and negative emoticons and the number of positive and negative words. If any of those terms has repeated characters, the word is counted twice.

As can be seen, the classifier configurations that gave the best results are the simplest ones. EXP 19 is the only one that includes features carrying some semantic information, namely the number of emoticons and of positive and negative words.

We believe that the approach based on bags of positive and negative words did not work as initially expected because of the quality of the translation of the original term list, and because of the natural differences between expressions that may indicate an opinion in English but not in Spanish. This suggests that Spanish linguistic resources for Opinion Analysis need to improve and that, in the meantime, research should continue on the use of other, higher-quality resources in English.

The results obtained on the test data, as published by the organizers, are:

Experiment   EXP 19
P             0.94%
P+           60.99%
NEU           0.38%
N            33.61%
N+           32.00%
NONE         71.52%
Total        54.68%
Table 4: Results of EXP 19

The largest errors occur between neighbouring classes, such as NONE and NEU, and P and P+. The error between NONE and NEU is very probably due to the large difference between the number of NEU and NONE tweets, so the classifier has most likely over-trained on the NONE class. The error in the positive classes is very probably due to the set of positive words used, and to the double weight assigned to positive words with repeated letters.

Table 5 shows the results for the configuration corresponding to EXP 6. In this case, since there is no semantic feature, the over-fitting on the NONE class is much more evident.

Experiment   EXP 6
P             0.27%
P+            4.69%
NEU              0%
N             0.12%
N+            0.37%
NONE         96.48%
Total        35.65%
Table 5: Results of EXP 6

Table 6 contains the results of EXP 2. Here the over-fitting on the NONE class is even greater. Moreover, since this configuration takes only lexical features into account, it can be said that the only classes with a more distinctive vocabulary are P and P+, since they are the only ones in which a few
tweets were classified correctly.

Experiment
P
P+
NEU
N
N+
NONE
Total
Table 6: Results of EXP 2

Table 7 shows the results of the EXP 4 configuration. EXP 4 behaves similarly to EXP 2, but this time some tweets of the N and N+ classes are classified correctly. This may be because web addresses are not normalized, which may have favoured those classes and slightly harmed the classification of the NONE class.

Experiment
P
P+
NEU
N
N+
NONE
Total
Table 7: Results of EXP 4

6 Task 2: Topic categorization

This task evaluates the performance of a classifier that correctly identifies the topic of a tweet and, depending on that topic, analyses the polarity of the tweet. The evaluation uses the same metrics defined for the previous task (precision, recall and F1).

For this task we developed a multi-label classification system which, taking the categories of the training tweets as a basis, categorizes a new tweet. The categories are: cine (film), deportes (sports), economía (economy), entretenimiento (entertainment), fútbol (football), literatura (literature), música (music), otros (other), política (politics) and tecnología (technology).

The classification system used for the official experiments is a machine learning system based on SVM that takes several features for training. The features processed are unigrams, at the lexical level, of the tweets with or without preprocessing. Preliminary experiments were run with other machine learning systems, and SVM obtained the best results. The tweets are processed in the same way as described for Task 1.

In addition, two bags of relevant words were generated for each category to improve system performance. The first bag of words was obtained from Google AdWordsKeyWordTool³, which takes a term and returns the n directly related ideas. The second bag of words was obtained from the hashtags of the training tweets, taking for each category the hashtags that appear only in tweets of that category.

³ Available at https://adwords.google.com/o/KeywordTool

Depending on the features, the tweet processing and the bags of words, several experiments were run on the training tweets with cross-validation (10-fold cross validation). Based on these preliminary results, the following official experiments were submitted:
- Top-sinai-1 (baseline). The tweets are not processed and no bags of words are used.
- Top-sinai-2. The tweets are processed but neither stopper nor stemmer is applied. No bags of words are used.
- Top-sinai-3. The tweets are processed and stopper and stemmer are applied. No bags of words are used.
- Top-sinai-4. The tweets are processed and stopper and stemmer are applied. The hashtags are used as bags of words and are added to each training tweet depending on its topic.
- Top-sinai-5. The tweets are processed and stopper and stemmer are applied. The hashtags and the AdWords terms are used as bags of words and are added to each training tweet depending on its topic.

These experiments were submitted as the official runs.
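
The hashtag-based bag of words described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the hashtag regular expression, the lower-casing and the toy tweets are all hypothetical, and a tweet labelled with several topics is assumed to be passed as one (text, topic) pair per topic.

```python
import re
from collections import defaultdict

# Assumed hashtag pattern: '#' followed by word characters.
HASHTAG_RE = re.compile(r"#\w+")


def exclusive_hashtags(tweets):
    """Map each topic to the hashtags that occur only in tweets of that topic.

    `tweets` is an iterable of (text, topic) pairs.
    """
    topics_by_tag = defaultdict(set)
    for text, topic in tweets:
        for tag in HASHTAG_RE.findall(text.lower()):
            topics_by_tag[tag].add(topic)
    bags = defaultdict(set)
    for tag, topics in topics_by_tag.items():
        if len(topics) == 1:  # keep hashtags seen in exactly one topic
            bags[next(iter(topics))].add(tag)
    return dict(bags)


def augment(text, topic, bags):
    """Append the topic's hashtag bag to a training tweet (sorted for determinism)."""
    extra = " ".join(sorted(bags.get(topic, ())))
    return f"{text} {extra}".strip()


# Invented toy data, for illustration only.
train = [
    ("Gran partido hoy #liga #madrid", "fútbol"),
    ("Nuevo disco #rock #madrid", "música"),
    ("Golazo #liga", "fútbol"),
]
bags = exclusive_hashtags(train)
print(bags)
print(augment("Vaya final", "fútbol", bags))
```

In the toy data, #madrid appears under two topics and is therefore excluded from both bags, which mirrors the "hashtags that appear only in tweets of that category" criterion.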
The official results obtained, in terms of precision, were:

Run id
top-sinai-1
top-sinai-2
top-sinai-3
top-sinai-4
top-sinai-5
Table 8: Official results for Task 2

Analysing the categories assigned automatically by our system, we observed that most of the test tweets were labelled with the categories "otros", "política" and "entretenimiento". For this reason we computed labelling statistics for the training collection, which are shown in Table 9.

From these results it is not hard to conclude that the category distribution is highly unbalanced for a standard machine learning system with no additional information about the category at training time. For example, with a training subset of 1.11% (literatura), it is practically impossible for a test tweet to be classified into that category.

Category          Tweets
cine              183
deportes          101
economía          525
entretenimiento   1,2
fútbol            225
literatura
música            4
política          2,715
otros             1,6
tecnología        145
TOTAL             7,2
Table 9: Number of tweets per category

We are carrying out a deeper analysis of the results in order to draw further conclusions and identify work needed to improve the system, although we are almost certain that improving the classifier will involve training different general and specific categories with other, external material.

Acknowledgements

This research has been partially funded by the European Regional Development Fund (FEDER), through the project TEXT-COOL 2.0 (TIN2009-13391-C04-02) of the Spanish Government, and by the European Commission under the Seventh Framework Programme (FP7 - 2007-2013) through the project FIRST (FP7-287607).

References

Asur, S., Huberman, B. A. (2010). Predicting the Future with Social Media. 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology. Vol. 1, pp. 492-499.
Bollen, J., Mao, H., Zeng, X. (2011). Twitter mood predicts the stock market. Journal of Computational Science. Vol. 2, No. 1, pp. 1-8.
Diakopoulos, N. A., Shamma, D. A. (2010). Characterizing debate performance via aggregated twitter sentiment. CHI ’10: Proc. of the 28th International Conf. on Human Factors in Computing Systems. New York, NY, USA. ACM. pp. 1195–1198.
Garcia, S., O'Mahony, M. P., Smyth, B. (2010). Towards tagging and categorization for micro-blog. 21st National Conference on Artificial Intelligence and Cognitive Science (AICS 2010), Galway, Ireland, 30 August - 1 September, 2010.
Go, A., Bhayani, R., Huang, L. (2009). Twitter sentiment classification using distant supervision. Technical report, Stanford Digital Library Technologies Project.
Hu, M., Liu, B. (2004). Mining and Summarizing Customer Reviews. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004). Seattle, Washington, USA.
Jansen, B., Zhang, M., Sobel, K., Chowdury, A. (2009). Twitter power: tweets as electronic word of mouth. Journal of the American Society for Information Science and Technology.
Jung, J. J. (2011). Towards Named Entity Recognition Method for Microtexts in Online Social Networks: A Case Study of Twitter. 2011 International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 563-564.
Martínez-Cámara, E., Martín-Valdivia, M. T., Perea-Ortega, J. M., Ureña-López, L. A. (2011b). Técnicas de clasificación de opiniones aplicadas a un corpus en español. Revista de Procesamiento de Lenguaje Natural. Vol. 47. Sociedad Española para el Procesamiento de Lenguaje Natural.
Martínez-Cámara, E., Martín-Valdivia, M. T., Ureña-López, L. A. (2011a). Opinion classification techniques applied to a Spanish corpus. Natural Language Processing and Information Systems. Springer. pp. 169-176.
O’Connor, B., Balasubramanyan, R., Routledge, B. R., Smith, N. A. (2010). From Tweets to polls: Linking text sentiment to public opinion time series. International AAAI Conference on Weblogs and Social Media, Washington, D.C.
Pak, A., Paroubek, P. (2010). Twitter as a corpus for sentiment analysis and opinion mining. Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC’10), European Language Resources Association (ELRA), Valletta, Malta, pp. 19–21.
Pang, B., Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval 2(1-2), pp. 1-135.
Pang, B., Lee, L., Vaithyanathan, S. (2002). Thumbs up? Sentiment classification using machine learning techniques. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics. pp. 79–86.
Petrovic, S., Osborne, M., Lavrenko, V. (2010). The Edinburgh Twitter corpus. SocialMedia Workshop: Computational Linguistics in a World of Social Media, pp. 25–26.
Read, J. (2005). Using emoticons to reduce dependency in machine learning techniques for sentiment classification. Proceedings of the ACL Student Research Workshop, pp. 43–48.
Sriram, B., Fuhry, D., Demir, E., Ferhatosmanoglu, H., Demirbas, M. (2010). Short text classification in twitter to improve information filtering. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval (SIGIR '10). ACM,
Tumasjan, A., Sprenger, T. O., Sandner, P. G., Welpe, I. M. (2010). Predicting elections with Twitter: What 140 characters reveal about political sentiment. International AAAI Conference on Weblogs and Social Media, Washington, D.C.
Turney, P. D. (2002). Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL). ACL. Morristown, NJ, USA. pp. 417–424.
Vapnik, V. (1995). The Nature of Statistical Learning Theory. Springer-Verlag, New York.
Lexicon-Based Sentiment Analysis of Twitter Messages in Spanish
Análisis de Sentimiento basado en lexicones de mensajes de Twitter
en
español
Antonio Moreno-Ortiz, Chantal Pérez Hernández
Facultad de Filosofía y Letras Universidad de
Málaga
{amo, mph}@uma.es
Resumen: Los enfoques al análisis de sentimiento basados en lexicones difieren de los más
usuales enfoques basados en aprendizaje de máquina en que se basan exclusivamente en
recursos que almacenan la polaridad de las unidades léxicas, que podrán así ser identificadas en
los textos y asignárseles una etiqueta de polaridad mediante la cual se realiza un cálculo que
arroja una puntuación global del texto analizado. Estos sistemas han demostrado un rendimiento
similar a los sistemas estadísticos, con la ventaja de no requerir un conjunto de datos de
entrenamiento. Sin embargo, pueden no resultar ser óptimos cuando los textos de análisis son
extremadamente cortos, tales como los generados en algunas redes sociales, como Twitter. En
este trabajo llevamos a cabo tal evaluación de rendimiento con la herramienta Sentitext, un
sistema de análisis de sentimiento del español.
Palabras clave: análisis de sentimiento basado en lexicones, analítica de texto, textos cortos,
Twitter, evaluación de rendimiento.
Abstract: Lexicon-based approaches to Sentiment Analysis (SA) differ from the more common machine-learning based approaches in that the former rely solely on previously generated lexical resources that store polarity information for lexical items, which are then identified in the texts, assigned a polarity tag, and finally weighed, to come up with an overall score for the text. Such SA systems have been proved to perform on par with supervised, statistical systems, with the added benefit of not requiring a training set. However, it remains to be seen whether such lexically-motivated systems can cope equally well with extremely short texts, as generated on social networking sites, such as Twitter. In this paper we perform such an evaluation using Sentitext, a lexicon-based SA tool for Spanish.
Keywords: lexicon-based sentiment analysis, text analytics, short texts, Twitter, performance evaluation.

1 Introduction

1.1 Approaches to Sentiment Analysis

Within the field of sentiment analysis it has become a commonplace assertion that successful results depend to a large extent on building systems that have been specifically developed for a particular subject domain. This view is no doubt determined by the methodological approach that most such systems employ, i.e., supervised, statistical machine learning techniques. Such approaches have indeed proven to be quite successful in the past (Pang & Lee, 2004; Pang & Lee, 2005). In fact, machine learning techniques, in any of their flavors, have proven extremely useful, not only in the field of sentiment analysis, but in most text mining and information retrieval applications, as well as a wide range of data-intensive computational tasks. However, their obvious disadvantage in terms of functionality is their limited applicability to subject domains other than the one they were designed for. Although interesting research has been done aimed at extending domain applicability (Aue & Gamon 2005), such efforts have shown limited success. An important variable for these approaches is the amount of labeled text available for training the classifier, although they perform well in terms of recall even with relatively small training sets (Andreevskaia & Bergler, 2007).

On the other hand, a growing number of
initiatives in the area have explored the possibilities of employing unsupervised lexicon-based approaches. These rely on dictionaries where lexical items have been assigned either polarity or a valence¹, which has been extracted either automatically from other dictionaries, or, more uncommonly, manually. The works by Hatzivassiloglou & McKeown (1997) and Turney (2002) are perhaps classical examples of such an approach. The most salient work in this category is Taboada et al. (2011), whose dictionaries were created manually and use an adaptation of Polanyi & Zaenen's (2006) concept of Contextual Valence Shifters to produce a system for measuring the semantic orientation of texts, which they call SO-CAL(culator). This is exactly the approach we used in our Sentitext system for Spanish (Moreno-Ortiz et al., 2010).

¹ Although the terms polarity and valence are sometimes used interchangeably in the literature, especially by those authors developing binary text classifiers, we restrict the usage of the former to non-graded, binary assignment, i.e., positive or negative, whereas the latter is used to refer to an n-point semantic orientation scale.

Combining both methods (machine learning and lexicon-based techniques) has been explored by Kennedy & Inkpen (2006), who also employed contextual valence shifters, although they limited their study to one particular subject domain (the traditional movie reviews), using a "traditional" sentiment lexicon (the General Inquirer), which resulted in the "term-counting" (in their own words) approach.

The degree of success of knowledge-based approaches varies depending on a number of variables, of which the most relevant is no doubt the quality and coverage of the lexical resources employed, since the actual algorithms employed to weigh positive against negative segments are in fact quite simple.

Another important variable concerning sentiment analysis is the degree of accuracy that the system aims to achieve. Most work in the field has focused on the "Thumbs up or thumbs down" approach, i.e., coming up with a positive or negative rating. Turney's (2002) work, from which the name derives, is no doubt the most representative. A further step involves an attempt to compute not just a binary classification of documents, but a numerical rating on a scale. The rating inference problem was first posed by Pang & Lee (2005), and the approach is usually referred to as "seeing stars" in reference to this work.

1.2 Sentiment Analysis for Spanish

Work within the field of Sentiment Analysis for Spanish is, by far, scarcer than for English. Cruz et al. (2008) developed a document classification system for Spanish similar to Turney (2002), i.e., unsupervised, though they also tested a supervised classifier that yielded better results. In both cases, they used a corpus of movie reviews taken from the Spanish Muchocine website. Boldrini et al. (2009) carried out a preliminary study in which they used machine learning techniques to mine opinions in blogs. They created a corpus for Spanish using their Emotiblog system, and discussed the difficulties they encountered while annotating it. Balahur et al. (2009) also presented a method of emotion classification for Spanish, this time using a database of culturally dependent emotion triggers.

Finally, Brooke et al. (2009) adapted a lexicon-based sentiment analysis system for English (Taboada et al., 2006, 2011) to Spanish by automatically translating the core lexicons and adapting other resources in various ways. They also provide an interesting evaluation that compares the performance of both the original (English) and translated (Spanish) systems using both machine learning methods (specifically, SVM) and their own lexicon-based semantic orientation calculation algorithm, the above-mentioned SO-CAL. They found that their own weighting algorithm, which is based on the same premises as our system (see below), achieved better accuracy for both languages, but the accuracy for Spanish was well below that for English.

Our system, Sentitext (Moreno-Ortiz et al., 2010, 2011), is very similar to Brooke et al.'s in design: it is also lexicon-based and it makes use of a similar calculation method for semantic orientation. It differs in that the lexical knowledge has been acquired semi-automatically and then fully manually revised from the ground up over a long period of time, with a strong commitment to both coverage and quality. It makes no use of user-provided, explicit ratings that supervised systems typically rely on for the training process, and it produces an index of semantic orientation based on weighing positive against negative text segments, which is then transformed into a ten-point scale and a five-star rating system.

2 Sentiment Analysis with Sentitext

Sentitext is a web-based, client-server application written in C++ (main code) and Python (server). The only third-party component in the system is Freeling (Atserias et al., 2006, Padró, 2011), a powerful, accurate, multi-language NLP suite of tools, which we use for basic morphosyntactic analysis.

Currently, only one client application is available, developed in Adobe Flex, which takes an input text and returns the results of the analysis in several numerical and graphical ways, including visual representations of the text segments that were identified as sentiment-laden². Lexical information is stored in a relational database (MySQL).

² The application can be accessed and tested online at http://tecnolengua.uma.es/sentitext

Being a linguistically-motivated sentiment analysis system, special attention is paid to the representation and management of the lexical resources. The underlying design principle is to isolate lexical knowledge from processing as much as possible, so that the processors can use the data directly from the database. The idea behind this design is that all lexical sources can be edited at any time by any member of the team, which is facilitated by a PHP interface specifically developed to this end (GDB). This kind of flexibility would not be possible with the monolithic design typical of proof-of-concept systems.

2.1 Lexical resources

Sentitext relies on three major sources: the individual words dictionary (words), the multiword expressions dictionary (mwords), and the context rules set (crules), which is our implementation of Contextual Valence Shifters.

The individual words dictionary currently contains over 9,400 items, all of which are labeled for valence. The acquisition process for this dictionary was inspired by the bootstrapping method recurrently found in the literature (e.g., Riloff & Wiebe, 2003, Gamon & Aue, 2005). Lexical items in both dictionaries in our database were assigned one of the following valences: -2, -1, 0, 1, 2. Since the words dictionary contains only sentiment-carrying items, no 0-valence word is present.

The most similar sentiment analysis system to ours (Taboada et al., 2011) uses a scale from -5 to 5, which makes sense for a number of graded sets of near synonyms such as those given as examples by the authors (p. 273). In our opinion, however, as more values are allowed, it becomes increasingly difficult to decide on a specific one while maintaining a reasonable degree of objectivity and agreement among different (human) acquirers, especially when there is no obvious graded set of related words, which is very often the case.

There are two ways in which the original valence of a word or phrase can be modified by the immediately surrounding context: the valence can change in degree (intensification or downtoning), or it may be inverted altogether. Negation is the simplest case of valence inversion.

The idea of Contextual Valence Shifters (CVS) was first introduced by Polanyi & Zaenen (2006), and implemented for English by Andreevskaia & Bergler (2007) in their CLaC System, and by Taboada et al. (2011) in their Semantic Orientation CALculator (SO-CAL). To the best of our knowledge, apart from Brooke et al.'s (2009) adaptation of the SO-CAL system, Sentitext is the only sentiment analysis system to implement CVS for Spanish natively.

2.2 Global Sentiment Value

Sentitext provides results as a number of metrics in the form of an XML file which is then used to generate the reports and graphical representations of the data. The crucial bit of information is the Global Sentiment Value (GSV), a numerical score (on a 0-10 scale) for the sentiment of the input text. Other data include the total number of words, total number of lexical words (i.e., content, non-grammatical words), number of neutral words, etc.

To arrive at the global value, a number of scores are computed beforehand, the most important of which is what we call Affect Intensity, which modulates the GSV to reflect the percentage of sentiment-conveying words the text contains.

Before we explain how this score is obtained, it is worth stressing the fact that we do not count words (whether positive, negative, or neutral), but text segments that correspond to lexical units (i.e., meaning units from a lexicological perspective).

As we mentioned before, items in our dictionaries are marked for valence with values
in the range -2 to 2. Intensification context rules can add up to three marks, for a maximum score of 5 (negative or positive) for any given segment.

The simplest way of computing a global value for sentiment would be to add negative values on the one hand and positive values on the other, and then establish it by simple subtraction. However, as others have noted (e.g., Taboada et al. 2011), things are rather more complicated than that. Our Affect Intensity measure is an attempt to capture the impact that different proportions of sentiment-carrying segments have in a text. We define Affect Intensity simply as the percentage of sentiment-carrying segments. Affect Intensity is not used directly in computing the global value for the text, however; an intermediate step consists of adjusting the upper and lower limits (initially -5 and 5). The Adjusted Limit equals the initial limit unless the Affect Intensity is greater than 25 (i.e., over 25% of the text's lexical items are sentiment-carrying). Obviously, this figure is arbitrary, and has been arrived at simply by trial and error. The Adjusted Limit is obtained by dividing the Affect Intensity by 5 (since there are 5 possible negative and positive valence values).

A further variable needs some explaining. Our approach to computing the GSV is similar to Polanyi & Zaenen's (2006) original method, in which equal weight is given to positive and negative segments, but it differs in that we place more weight on extreme values. This is motivated by the fact that it is relatively uncommon to come across such values (e.g. "extremely wonderful"), so when they do appear, it is a clear marker of positive sentiment. Other implementations of Contextual Valence Shifters (Taboada et al. 2011) have put more weight only on negative segments when modified by valence shifters (up to 50% more weight), operating under the so-called "positive bias" assumption (Kennedy & Inkpen 2006), i.e., negative words and expressions appear more rarely than positive ones, and therefore have a stronger cognitive impact, which should be reflected in the final sentiment score.

In our implementation, equal weight is placed on positive and negative values. However, we do not simply assign more weight to both extremes of the scale (-5 and 5); we place increasingly more weight on each value toward both ends of the scale.

The resulting method for obtaining the Global Sentiment Value for a text is defined as:

$$GSV = \frac{\left(\sum_{i=1}^{5} 2.5 \cdot i \cdot N_i + \sum_{i=1}^{5} 2.5 \cdot i \cdot P_i\right) \cdot AI}{5 \cdot (LS - NS)} \qquad (1)$$

where $N_i$ is the number of each of the negative valences found, and $P_i$ is the equivalent for positive values. The sum of both sets is then multiplied by the Affect Intensity ($AI$). $LS$ is the number of lexical segments and $NS$ is the number of neutral ones. Although not expressed in the equation, the number of possible scale points (5) needs to be added to the resulting score, which, as mentioned before, is on a 0-10 scale.

3 Task description

The evaluation experiment described in this paper was performed as conceived for the TASS Workshop on Sentiment Analysis, a satellite event of the SEPLN 2012 Conference. Two tasks were proposed by the organizers. The main task consisted of performing an automatic analysis of a large corpus of Twitter messages (over 60,000), the aim of which was to determine the polarity of each individual message. The second (optional) task, which we did not undertake, added topic identification in conjunction with its polarity. Results were to be rated following standard information retrieval evaluation metrics: precision, recall and F-measure.

A smaller, polarity-tagged corpus was also provided for those using machine learning classifiers. Since our tool does not require such a resource, we decided to not even download this training corpus in order to avoid bias and make it a truly blind test. With hindsight, though, we might have benefited from this training corpus by studying what was meant by neutral vs. none polarity.

The test corpus was provided in XML format, and included, for each tweet, the tweet id, the username, the date, and the language specification (largely redundant, since all tweets were in the Spanish language). From the tweet id, the corresponding status could be downloaded by participants using the Twitter API, thus conforming to the restrictions of its terms of use.

For the main task, polarity assignment, participants were asked to assign one of six valid tags: P+, P, NEU, N, N+, and NONE, where P stands for "positive" and N for "negative". Thus, the test involved not just polarity classification, but intensification on a
scale, too. Further, a neutral (NEU) tag and a NONE tag were also possible. This is, in our view, the trickiest part of the test, since no indications were given as to what exactly was meant by either. As we will see in the next section, this issue was to a large extent the reason for our relatively poor results.

Furthermore, results were evaluated in two separate tests. The first would offer metrics for the above-mentioned five plus NONE levels (5L+N), whereas a second one would disregard intensity and evaluate for polarity only for a total of three levels (P, NEU, N) plus NONE (3L+N).

This twofold evaluation makes sense especially in the context of Twitter messages, where hardly any context can be provided given the extremely reduced number of available words for each text. Thus, it is very difficult to distinguish between a negative and a very negative status update; sometimes it is even hard to understand the user's implications, since many are prone to using personal ways of expressing themselves, connotations, ironic remarks, etc., which can only be understood by the user's close followers, who share the same social circle.

4 Analysis of results

Although it might seem obvious, it is worth stressing that lexicon-based systems rely heavily on the availability of a certain number of words on which to apply the weighing operations. As described in section 2.2 above, Sentitext basically computes its GSV index by weighing the number and valences of polarity words and phrases against the number of lexical segments found in the text. Although it does include threshold control (the Affect Intensity index discussed in 2.2 above) for varying text lengths, this threshold was designed to be applied to larger texts, considering "short" the average length of a media article or blog entry.

However, Twitter, with its 140 character limit, involves a radically different concept of "short". The average number of lexical segments per tweet, i.e., individual words and identified multiword expressions, that we obtained in our analysis of the test set was 14.1, whereas the average number of polarity-conveying segments was 5.5. This is a very high ratio indeed, implying that social networking sites are commonly used for expressing sentiments and opinions. This is in accord with what many scholars have found when analyzing SNS content (e.g., Siemens, 2011). Sentitext's Affect Intensity, i.e., the control threshold, is established at 25%, which, in our experience, is rarely reached except for extremely short texts with a high emotional load. These data are summarized in Table 1.

            N         %        AVG/tweet
Lexical     857,727   100      14.1
Polarity    337,238   39.32    5.5
Neutral     520,489   60.68    8.6

Table 1: Polarity of text segments

It is therefore not surprising that our analysis of this Twitter test set yields an average Affect Intensity of 19.22, which is extremely high, especially if we bear in mind that 38.4% of the tweets have an Affect Intensity of 0, that is, they are neutral.

As for the tweet classification task itself, we show and discuss the results in the following section, where we also offer figures of a more typical evaluation scenario in which texts are categorized as negative, neutral, or positive, i.e., there is no intensification for polarity categories and no distinction between the NEU and NONE categories (both are considered as neutral).

4.1 Three levels + NONE test

Our results for the 3L+N test are summarized in Table 2 below, where the shaded rows show the hit rate for each of the four categories.

Actual   Pred.   N        %/actual
N        *       15,840   100.00
N        N       8,848    55.86
N        NEU     5,083    32.09
N        P       1,909    12.05
N        NONE    0        0
NEU      *       1,302    100.00
NEU      N       271      20.81
NEU      NEU     647      49.69
NEU      P       384      29.49
NEU      NONE    0        0
P        *       22,231   100.00
P        N       1,068    4.80
P        NEU     7,879    35.44
P        P       13,284   59.75
P        NONE    0        0
NONE     *       21,411   100.00
NONE     N       2,610    12.19
