Trend Mining
Vasileios Lampos
July, 2013
V. Lampos bill@lampos.net Exploiting Human-Generated Text for Trend Mining 1/47
Outline
Motivation, Aims [Facts, Questions]
Data
Nowcasting Events
Extracting Mood Patterns
Inferring Voting Intention
Conclusions
Facts
We started to work on those ideas back in 2008, when...
[Plot: actual vs. inferred rainfall rates for Bristol]
Figure 3: Inferred rainfall rates for Bristol, UK (October, 2009)
Methodology (1/5) Text in Vector Space
Candidate features (n-grams): $C = \{c_i\}$
Set of Twitter posts for a time interval $u$: $P^{(u)} = \{p_j\}$
Frequency of $c_i$ in $p_j$:
$$g(c_i, p_j) = \begin{cases} \varphi & \text{if } c_i \in p_j, \\ 0 & \text{otherwise.} \end{cases}$$
$g$ is Boolean: the maximum value for $\varphi$ is 1
Score of $c_i$ in $P^{(u)}$:
$$s\left(c_i, P^{(u)}\right) = \frac{\sum_{j=1}^{|P^{(u)}|} g(c_i, p_j)}{|P^{(u)}|}$$
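As a minimal sketch of the score defined above, assuming each post has already been tokenised into a set of n-grams (the function name `ngram_score` and the toy posts are illustrative, not part of the original work):

```python
def ngram_score(candidate, posts, phi=1.0):
    """s(c, P) = sum_j g(c, p_j) / |P|, with g = phi if c occurs in p_j, else 0."""
    if not posts:
        return 0.0
    hits = sum(phi for post in posts if candidate in post)
    return hits / len(posts)

# Hypothetical toy posts for one time interval, each a set of 1-grams.
posts = [{"flu", "bed"}, {"rain", "bristol"}, {"flu", "fever"}, {"sunny"}]
scores = {c: ngram_score(c, posts) for c in ["flu", "rain", "snow"]}
```

With the Boolean weighting (phi = 1), the score is simply the fraction of posts in the interval that contain the candidate.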
Methodology (2/5)
Set of time intervals: $U = \{u_k\}$ (1 hour, 1 day, ...)
Time series of candidate feature scores:
$$X^{(U)} = \left[\mathbf{x}^{(u_1)} \; \ldots \; \mathbf{x}^{(u_{|U|})}\right]^T,$$
where
$$\mathbf{x}^{(u_i)} = \left[s\left(c_1, P^{(u_i)}\right) \; \ldots \; s\left(c_{|C|}, P^{(u_i)}\right)\right]^T$$
Target variable (event):
$$\mathbf{y}^{(U)} = \left[y_1 \; \ldots \; y_{|U|}\right]^T$$
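The matrix of feature scores and the target vector can be assembled as in the sketch below. It assumes Boolean weighting and posts tokenised into sets; `score_matrix`, the toy posts and the target values are hypothetical.

```python
import numpy as np

def score_matrix(posts_by_interval, candidates):
    """Rows are time intervals u_1..u_|U|; columns are candidates c_1..c_|C|."""
    X = np.zeros((len(posts_by_interval), len(candidates)))
    for k, posts in enumerate(posts_by_interval):
        for i, c in enumerate(candidates):
            # Boolean g: fraction of posts in the interval that contain c
            X[k, i] = sum(c in p for p in posts) / len(posts) if posts else 0.0
    return X

# Two hypothetical time intervals and a made-up target value for each.
posts_by_interval = [
    [{"flu", "bed"}, {"rain"}],
    [{"flu"}, {"flu", "fever"}, {"sunny"}, {"rain"}],
]
candidates = ["flu", "rain"]
X = score_matrix(posts_by_interval, candidates)
y = np.array([12.0, 30.0])
```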
Methodology (3/5) Feature selection
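Solve the following optimisation problem:
$$\min_{\mathbf{w}} \left\|X^{(U)}\mathbf{w} - \mathbf{y}^{(U)}\right\|_2^2 \quad \text{s.t. } \|\mathbf{w}\|_1 \le t, \quad t = \alpha \cdot \|\mathbf{w}_{OLS}\|_1, \quad \alpha \in (0, 1].$$
LASSO, with the equivalent penalised objective $\left\|X^{(U)}\mathbf{w} - \mathbf{y}^{(U)}\right\|_2^2 + \lambda\|\mathbf{w}\|_1$ (Tibshirani, 1996)
The inferred sparsity pattern may deviate from the true model, e.g., when predictors are highly correlated (Zhao and Yu, 2006)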
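To make the sparsity-inducing effect concrete, here is a small sketch of the penalised LASSO objective solved by cyclic coordinate descent. The solver choice, the value of the penalty `lam` and the synthetic data are assumptions for illustration; the slides do not state which solver was used.

```python
import numpy as np

def soft_threshold(a, k):
    """Shrink a towards zero by k; values within [-k, k] become exactly 0."""
    return np.sign(a) * max(abs(a) - k, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise ||X w - y||_2^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Residual with feature j's current contribution added back
            residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ residual
            w[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
    return w

# Synthetic data: only features 0 and 3 truly drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
true_w = np.array([2.0, 0.0, 0.0, -1.5, 0.0, 0.0, 0.0, 0.0])
y = X @ true_w + 0.01 * rng.normal(size=60)
w = lasso_cd(X, y, lam=5.0)
```

The L1 penalty drives the weights of irrelevant features to zero, which is what turns the regression into a feature selection step.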
[...] $p$ candidate features, $N$ samples, empirical loss $L(\mathbf{w})$ and $\|\mathbf{w}\|_1 \le W_1$ (Bartlett, Mendelson and Neeman, 2011)
Harry Potter Effect!
The Harry Potter effect (1/2)
Figure 4: Events co-occurring (correlated) with the inference target may affect feature selection, especially when the sample size is small.
[Plot: event score against day number (2009) for Flu (England & Wales), Hypothetical Event I and Hypothetical Event II]
(Lampos, 2012a)
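The Harry Potter effect (2/2)
Table 1: Top 1-grams correlated with flu rates in England/Wales (06–12/2009)

| 1-gram | Event | Corr. Coef. |
|---|---|---|
| latitud | Latitude Festival | 0.9367 |
| flu | Flu epidemic | 0.9344 |
| swine | Flu epidemic | 0.9212 |
| harri | Harry Potter Movie | 0.9112 |
| slytherin | Harry Potter Movie | 0.9094 |
| potter | Harry Potter Movie | 0.8972 |
| benicassim | Benicàssim Festival | 0.8966 |
| graduat | Graduation (?) | 0.8965 |
| dumbledor | Harry Potter Movie | 0.8870 |
| hogwart | Harry Potter Movie | 0.8852 |
| quarantin | Flu epidemic | 0.8822 |
| gryffindor | Harry Potter Movie | 0.8813 |
| ravenclaw | Harry Potter Movie | 0.8738 |
| princ | Harry Potter Movie | 0.8635 |
| swineflu | Flu epidemic | 0.8633 |
| ginni | Harry Potter Movie | 0.8620 |
| weaslei | Harry Potter Movie | 0.8581 |
| hermion | Harry Potter Movie | 0.8540 |
| draco | Harry Potter Movie | 0.8533 |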
Solution: ground truth with some degree of variability
(Lampos, 2012a)
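The effect can be reproduced in miniature: ranking features purely by their correlation with the target cannot tell a related term from an unrelated event that happens to peak at the same time. The sketch below uses made-up series and a hypothetical helper `rank_by_correlation`.

```python
import numpy as np

def rank_by_correlation(X, y, names):
    """Return (name, Pearson r) pairs sorted by decreasing correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    r = (Xc.T @ yc) / denom
    order = np.argsort(-r)
    return [(names[i], float(r[i])) for i in order]

# Hypothetical target with a single peak, and three feature time series.
y = np.array([1.0, 2.0, 8.0, 9.0, 3.0, 1.0])
X = np.column_stack([
    [0.1, 0.2, 0.9, 1.0, 0.3, 0.1],   # "flu": genuinely related to the target
    [0.0, 0.1, 0.7, 0.9, 0.2, 0.0],   # "potter": unrelated event, same timing
    [0.5, 0.4, 0.5, 0.4, 0.5, 0.4],   # "weather": no relation to the target
])
ranking = rank_by_correlation(X, y, ["flu", "potter", "weather"])
```

Both "flu" and "potter" end up with a very high coefficient here, so a short sample with a single peak gives the selector no basis for preferring one over the other.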
About n-grams
1-grams
[Plots: actual vs. inferred values over 30 days. (a) Central England/Wales (flu); (b) South England (flu), flu rate against days; (c) Bristol (rain), rainfall rate (mm) against days]
Figure 7: Examples of flu and rainfall rate inferences from Twitter content
(Lampos and Cristianini, 2012)
Performance figures
Table 2: RMSE for flu rates inference (5-fold cross validation), 50m tweets, 21/06/2009–19/04/2010
Method 1-grams 2-grams Hybrid
Baseline
Classification and Regression Tree (Breiman et al., 1984) & (Sutton, 2005)
Flu Detector
URL: http://geopatterns.enm.bris.ac.uk/epidemics
Figure 8: Flu Detector uses the content of Twitter to nowcast flu rates in several UK regions
(Lampos, De Bie and Cristianini, 2010)
Extracting Mood Patterns from
Human-Generated Content
Computing a mood score
Table 4: Mood terms from WordNet Affect

| Fear | Sadness | Joy | Anger |
|---|---|---|---|
| afraid | depressed | admire | angry |
| fearful | discouraged | cheerful | despise |
| frighten | disheartened | enjoy | enviously |
| horrible | dysphoria | enthusiastic | harassed |
| panic | gloomy | exciting | irritate |
| ... | ... | ... | ... |
| (92 terms) | (115 terms) | (224 terms) | (146 terms) |
Mood score computation for a time interval $d$ using $n$ mood terms:
$$ms_d = \frac{1}{n} \sum_{i=1}^{n} \frac{c_i^{(t_d)}}{N(t_d)}$$
$c_i^{(t_d)}$: count of term $i$ in the Twitter corpus of day $d$
$N(t_d)$: number of tweets for day $d$
Using the sample of $d$ days, compute a standardised mood score:
$$ms_d^{std} = \frac{ms_d - \mu_{ms}}{\sigma_{ms}}$$
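A minimal sketch of the two steps above, assuming the per-day term counts are already available as an array (the function names and toy counts are illustrative, and the population standard deviation is an assumption, since the slides do not say which estimator was used):

```python
import numpy as np

def mood_scores(term_counts, tweets_per_day):
    """term_counts[d, i] = c_i^(t_d); tweets_per_day[d] = N(t_d). Returns ms_d."""
    term_counts = np.asarray(term_counts, dtype=float)
    tweets_per_day = np.asarray(tweets_per_day, dtype=float)
    return (term_counts / tweets_per_day[:, None]).mean(axis=1)

def standardise(ms):
    """Subtract the sample mean and divide by the (population) standard deviation."""
    return (ms - ms.mean()) / ms.std()

# Three hypothetical days, two mood terms.
counts = [[10, 5], [20, 10], [30, 45]]
tweets = [100, 100, 150]
ms = mood_scores(counts, tweets)
ms_std = standardise(ms)
```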
The mood of the nation (1/5)
Figure 9: Daily time series (actual & their 14-point moving average) for the mood
of Joy based on Twitter content geo-located in the UK
August 2012
We turned our attention to the issue of public mood, or sentiment. Our goal was to analyse the sentiment expressed in the collective discourse that constantly streams through Twitter. Or, as we called it, "the mood of the nation".

We used tweets sampled from the 54 largest cities in the UK over a period of 30 months. There were more than 9 million different users, and 484 million tweets. It is important to notice that studies of this kind rely on very efficient methods of data management and text mining, which we have been refining for years, during our studies of news content [5], as well as social media content. Our infrastructure is based on a central database, and multiple independent modules that can annotate the data [6].

Notice also that the period we analysed goes from July 2009 to January 2012, a period marked by economic downturn and some social tensions. This will become relevant when analysing our findings.

There are standard methods in text analysis to detect sentiment: they are used mostly in marketing research, when analysts want to know the opinion of users of a certain camera, or viewers of a certain TV show. Each of the basic emotions (fear, joy, anger, sadness) is associated with a list of words, generated by a combination of manual and automatic methods, and successively benchmarked on a test set. This is called citation-sentiment analysis.

We did not want to develop a new method for sentiment analysis, so we directly applied a standard one to the textual stream generated by UK Twitter users. We sampled the tweet-stream every 3 to 5 minutes, specifying location to within 10 km of an urban centre. Our word-list contained 146 anger words, 92 fear words, 224 joy words and 115 sadness words. They can be found at the WordNet-Affect website (http://wordnet.princeton.edu) [7].

In the flu project we had a ground truth, of independently-measured flu cases. This time around we did not, as no one seems to be constantly measuring sentiment in the general population. This means that the methods and the conclusions will be of a different nature. Whereas in the flu project the list of keywords (whose frequency is used to compute the flu score) is discovered by our algorithm, with the goal of maximising correlation with the ground truth, in the mood project we had to feed the key words in ourselves (we got them from citation-sentiment analysis, as mentioned above) and we have no ground truth to compare the result with.

By applying these tools to a time series of about two and a half years of Twitter content we found that each of the four key emotions changes over time, in a manner that is partly predictable (or at least interpretable). We were reassured to find there was a periodic peak of joy around Christmas (Figure 2), surely due to greetings messages, and a periodic peak of fear around Halloween, again probably due to increased usage of certain keywords such as "scary". These were sanity checks, which showed us that word-counting methods can provide a reasonable approach to sentiment or mood analysis. How far Christmas greetings accurately represent real joy, as opposed to duty and wishful thinking, is of course another question. We do not expect that a high frequency of the word "happy" necessarily signifies happier mood in the population. Our measures of mood are not perfect, but these effects could be filtered away by a more sophisticated tool designed to ignore conventional expressions such as "Happy New Year". It is, however, a remarkable observation that certain days have reliably similar values in different years. This suggests that we have reduced statistical errors to a very low level.

But what came out most strongly is the sharp transition, towards a more negative mood, that started in the week of October 20th, 2010. This was the week that the government of Prime Minister David Cameron announced massive cuts in public spending. It was a clear change point that we could validate by a statistical test. It was, if you like, the moment that people realised that austerity was not just for others; it would be affecting their own lives too. The effects of that major shift in collective mood are still felt today.

We also found a sustained growth in anger (Figure 4) in the weeks leading up to the summer riots of August 2011, when parts of London and several other cities across England suffered widespread violence, looting and arson.

It is interesting that the growth in anger seems to have started before the riots themselves, but this does not mean that we could
Figure 1. A word cloud automatically generated from Twitter traffic. The larger the word, the greater the correlation with flu epidemics. Upside-down words have negative correlations
Figure 2. Plot of the time series representing levels of joy estimator over a period of 2 years. Notice the peaks corresponding to Christmas and New Year, Valentine's day and the Royal Wedding
[Plot: "933 Day Time Series for Joy in Twitter Content"; normalised emotional valence against date (Jul 09 to Jan 12); raw joy signal and 14-day smoothed joy; annotated events: Christmas, Valentine's Day, Easter, Halloween, Royal Wedding, cuts, riots]
(Lansdall-Welfare, Lampos and Cristianini, 2012a&b)
The mood of the nation (2/5)
Figure 10: Daily time series (actual & their 14-point moving average) for the
mood of Anger based on Twitter content geo-located in the UK
have predicted them. Discovering an interesting correlation after the fact can be of great help to social scientists and other scholars, when interpreting those events, but is very different from predicting the events. There have been other increases in anger before, without this leading to any riots. As there is no official record of public mood, we need to be contented with finding correlations between trends in the time series of each emotion and events in the external world. We can find peaks of emotion for the death of Amy Winehouse, and of Osama Bin Laden; during the run-up to the Royal Wedding in April 2011 people felt calmer.

After the collection and the analysis part, we considered how to best visualise our results. With big data this is always a consideration. The data sets are so large, and the possible interactions they represent can be so complex, that graphic displays are becoming the norm. We are dealing with emotions; and we found an open source tool that represents emotions by a cartoon of a face whose expression depends on degrees of anger, joy, surprise, fear, sadness and disgust. It is called the Grimace Project (http://grimace-project.net), and we used it in conjunction with timelines. The end result can be used by the public as well as by researchers. Figure 3 is taken from our mood browser tool, which is live and interactive at http://mediapatterns.enm.bris.ac.uk/mood/. If you visit the site and drag the cursor along the time-line to October 2010, you will easily identify the week of the spending cuts: you will see the face suddenly wince.

There are some important considerations to make and lessons to learn, from the point of view of data analysis. The first is that the social sciences can now enter a data-driven phase, but this will require vast amounts of non-traditional data. The exploitation of big data will require the use of multiple tools, from different fields. Data management, data mining, text mining and data visualisation all seem to be as necessary as the statistical analysis part.

The second consideration is a caveat: since we did not choose the parameters of the mood system so as to correlate our score to the same score for the general UK population, we cannot claim that our mood scores were calibrated to compensate for the various and obvious biases we have in the data collection (unlike in the flu study). So all that we can claim at best is that we have measured the mood of city-dwelling Twitter users. They tend to be young; they tend to be savvy and techno-literate; they are definitely a biased sample of the UK population, although a large one, since we included posts by more than 9 million individual users.

Finally, there is the obvious caveat that goes with every statistical study: correlations, as we all know, are not causations. Even if there was an increase in anger and fear after the spending cuts were announced, how do we know that this was due to the announcement? Many other factors could have caused it. This is where data analysis must stop, and the interpretation of social scientists must begin. But at least we have collected and digested 484 million tweets for them, so that they can focus on the relevant questions. Big data can change the way social science is performed, but will not replace statistical common sense.
References
1. Weingrill, T., Gray, D.A., Barrett, L. and Henzi, S.P. (2004) Fecal cortisol levels in free-ranging female chacma baboons: relationship to dominance, reproductive state and environmental factors. Hormones and Behavior, 45(4), 259–269.
2. Giannone, D., Reichlin, L. and Small, D. (2008) Nowcasting: The real-time informational content of macroeconomic data. Journal of Monetary Economics, 55(4), 665–676.
3. Lampos, V. and Cristianini, N. (2011) Nowcasting events from the Social Web with statistical learning. ACM Transactions on Intelligent Systems and Technology, 3(4).
4. Ginsberg, J., Mohebbi, M.H., Patel, R.S., Brammer, L., Smolinski, M.S. and Brilliant, L. (2009) Detecting influenza epidemics using search engine query data. Nature, 457(7232), 1012–1014.
5. http://mediapatterns.enm.bris.ac.uk
6. http://www.tijldebie.net/system/files/SIGMOD_[...]_demo_Ilias.pdf
7. Strapparava, C. and Valitutti, A. (2004) WordNet-Affect: an affective extension of WordNet. In Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004), Lisbon, May, pp. 1083–1086.
Thomas Lansdall-Welfare, Vasileios Lampos and Nello
Cristianini are at the Intelligent Systems Laboratory
at the University of Bristol.
Figure 3. Visualisation of overall mood levels for the UK over 2 years using timeline plots and the Grimace tool for facial expressions. The facial expression refers to October 27th, 2010. Visit mediapatterns.enm.bris.ac.uk/mood
Figure 4. Plot of the time series for anger estimator over 2 and a half years. Notice visible change points corresponding to spending cuts and riots
[Plot: "933 Day Time Series for Anger in Twitter Content"; normalised emotional valence against date (Jul 09 to Jan 12); raw anger signal and 14-day smoothed anger; annotated events: Christmas, Valentine's Day, Easter, Halloween, Royal Wedding, cuts, riots]
(Lansdall-Welfare, Lampos and Cristianini, 2012a&b)
The mood of the nation (3/5)
Window of 100 days: 50 before & after the point of interest
$$\Delta ms_i^{std} = \operatorname{mean}\left(ms^{std}_{i+1, \ldots, i+50}\right) - \operatorname{mean}\left(ms^{std}_{i-50, \ldots, i-1}\right)$$
[Plot: "Rate of Mood Change by Day using the Difference in 50-day Mean"; difference in mean against date (Jul 09 to Jan 12) for Anger and Fear, with the date of the budget cuts and the date of the riots marked]
Figure 11: Change point detection using a 100-day moving window
(Lansdall-Welfare, Lampos and Cristianini, 2012a)
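The moving-window statistic can be sketched as follows, assuming a standardised daily mood series is available as an array. The function name `mean_shift` and the synthetic step series are illustrative; days without a full window on both sides are left undefined (NaN).

```python
import numpy as np

def mean_shift(ms_std, half_window=50):
    """Mean of the next `half_window` days minus mean of the previous ones."""
    ms_std = np.asarray(ms_std, dtype=float)
    delta = np.full(ms_std.shape, np.nan)
    for i in range(half_window, len(ms_std) - half_window):
        after = ms_std[i + 1 : i + 1 + half_window].mean()    # days i+1 .. i+50
        before = ms_std[i - half_window : i].mean()           # days i-50 .. i-1
        delta[i] = after - before
    return delta

# Synthetic series with a step change: 150 days at 0, then 150 days at 1.
series = np.concatenate([np.zeros(150), np.ones(150)])
delta = mean_shift(series)
peak_day = int(np.nanargmax(delta))
```

The statistic peaks where the two halves of the window differ most, which is how a change point such as the spending cuts would show up.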
The mood of the nation (4/5)
Figure 12: Projections of 4-dimensional mood score signals (joy, sadness, anger and fear) on their top-2 principal components (PCA); Twitter content from 2011
[Scatter plots of the 2nd against the 1st principal component: (a) Days of the week (2011), one point per weekday from Saturday to Friday; (b) Days of the year (2011), points labelled by day number 1 to 365]
Cluster I
New Year (1), Valentine's (45), Christmas Eve (358), New Year's Eve (365)
Cluster II
O. Bin Laden's death (122), Winehouse's death + Breivik (204), UK riots (221)
(Lampos, 2012a)
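A projection of this kind can be sketched with a plain SVD-based PCA. The random matrix below is only a stand-in for the real 365 x 4 matrix of daily mood scores, and `project_top2` is a hypothetical helper name.

```python
import numpy as np

def project_top2(M):
    """Project the rows of M (days x mood dimensions) onto the top-2 principal components."""
    centred = M - M.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:2].T

rng = np.random.default_rng(1)
mood = rng.normal(size=(365, 4))   # placeholder for (joy, sadness, anger, fear) per day
projected = project_top2(mood)
```

Each row of `projected` gives the 2-D coordinates of one day, which is what a scatter plot of days in the principal-component plane displays.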
The mood of the nation (5/5)
URL: http://geopatterns.enm.bris.ac.uk/mood
Figure 13: Mood of the Nation uses the content of Twitter to nowcast mood
rates in several UK regions
(Lampos, 2012a)
Circadian mood patterns (1/3)
Compute 24-h mood score patterns
Mood score computation for a time interval $u$ = 24 hours using $n$ mood terms (WordNet) and a sample of $D$ days:
$$M_s(u) = \frac{1}{|D|} \sum_{j=1}^{|D|} \left( \frac{1}{n} \sum_{i=1}^{n} sf_i^{(t_{j,u})} \right)$$
$$sf_i^{(t_{d,u})} = \frac{f_i^{(t_{d,u})} - \bar{f}_i}{\sigma_{f_i}}, \quad i \in \{1, ..., n\}.$$
$f_i^{(t_{d,u})}$: normalised frequency of a mood term $i$ during time interval $u$ in day $d \in D$
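The computation can be sketched as below under two assumptions that the slide does not spell out: that $u$ indexes the 24 hourly intervals of a day, and that each term's mean and standard deviation are taken over all days and intervals. The array layout and the random data are illustrative only.

```python
import numpy as np

def circadian_pattern(f):
    """f[d, u, i]: normalised frequency of term i in hourly interval u of day d."""
    f = np.asarray(f, dtype=float)
    mean_i = f.mean(axis=(0, 1))      # per-term mean (assumed over all days and hours)
    std_i = f.std(axis=(0, 1))        # per-term standard deviation
    sf = (f - mean_i) / std_i         # standardised frequencies sf_i
    return sf.mean(axis=(0, 2))       # average over days and terms: M_s(u)

rng = np.random.default_rng(2)
f = rng.random(size=(30, 24, 5))      # 30 days, 24 hourly intervals, 5 mood terms
pattern = circadian_pattern(f)
```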
Circadian mood patterns (2/3)
[Plots: Fear, Sadness, Joy and Anger scores against hourly intervals (3 to 24), shown separately for Winter and Summer and for the Aggregated Data]
Figure 14: Circadian (24-hour) mood patterns based on UK Twitter content
Circadian mood patterns (3/3)
Figure 15: Autocorrelation of circadian mood patterns based on hourly lags
revealing daily and weekly periodicities
[Plots: autocorrelation against lag in hours (1 to 168), with confidence bound, for (a) Fear, (b) Sadness, (c) Joy and (d) Anger]
Further analysis available in (Lampos, Lansdall-Welfare, Araya and Cristianini, 2013)
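As a rough illustration of how hourly-lag autocorrelation exposes a daily cycle, the sketch below computes the sample autocorrelation of a synthetic 24-hour sinusoid. The confidence bound uses the common white-noise approximation 1.96/sqrt(N), which is an assumption here and not necessarily the bound used in the figure.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample autocorrelation of x for lags 1..max_lag (index 0 holds lag 1)."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = xc @ xc
    return np.array([(xc[:-lag] @ xc[lag:]) / denom for lag in range(1, max_lag + 1)])

hours = np.arange(24 * 28)                      # four weeks of hourly samples
signal = np.sin(2 * np.pi * hours / 24)         # synthetic daily rhythm
acf = autocorrelation(signal, max_lag=168)
conf_bound = 1.96 / np.sqrt(len(signal))        # approximate 95% bound for white noise
```

For a series with a daily rhythm, the autocorrelation is strongly positive at a lag of 24 hours and strongly negative at 12 hours.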
Emotion in Books
Input: Google Ngram corpus of 5m digitised books (Michel et al., 2010)
Tool: WordNet Affect (Strapparava and Valitutti, 2004)
[Plots against year, 1900 to 2000: (a) Joy minus Sadness (z-scores); (b) Use of emotion-related terms through time, Emotion minus Random (z-scores), for All, Fear and Disgust; (c) American versus British English, American minus British (z-scores)]
Figure 16: Emotion trends in 20th century books
(Acerbi, Lampos, Garnett and Bentley, 2013)
Inferring Voting Intention from
Social Media Content
... and a new way for modelling text regression
Motivations and Aims
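$$\{\mathbf{w}^*, \mathbf{u}^*, \beta^*\} = \underset{\mathbf{w}, \mathbf{u}, \beta}{\operatorname{argmin}} \sum_{i=1}^{n} \left(\mathbf{u}^T Q_i \mathbf{w} + \beta - y_i\right)^2 + \psi(\mathbf{w}, \rho_1) + \psi(\mathbf{u}, \rho_2)$$
$Q_i$: $X$ for time instance $i$, $\mathbf{y} \in \mathbb{R}^n$: response variable (voting intention)
$\mathbf{w} \in \mathbb{R}^m$, $\mathbf{u} \in \mathbb{R}^p$: word and user weights, $\beta \in \mathbb{R}$: bias
$\psi(\cdot)$: a regularisation function
Elastic Net (Zou and Hastie, 2005) for $\psi(\cdot)$
Bilinear Elastic Net (BEN) (Lampos, Preoţiuc-Pietro and Cohn, 2013)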
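The objective is convex in the word weights when the user weights are fixed, and vice versa, so one common way to fit it is to alternate between the two. The sketch below illustrates that idea only: it swaps in a plain ridge (squared L2) penalty for the elastic net to keep the code short, and the function names, penalty values and synthetic data are all assumptions rather than the published BEN implementation.

```python
import numpy as np

def ridge_with_bias(Z, y, lam):
    """Solve min ||Z a + b - y||^2 + lam * ||a||^2 with an unpenalised bias b."""
    z_mean, y_mean = Z.mean(axis=0), y.mean()
    Zc, yc = Z - z_mean, y - y_mean
    a = np.linalg.solve(Zc.T @ Zc + lam * np.eye(Z.shape[1]), Zc.T @ yc)
    return a, y_mean - z_mean @ a

def bilinear_fit(Q, y, lam_w=1e-3, lam_u=1e-3, n_iter=20):
    """Q: (n, p, m) stack of user-by-word matrices Q_i; y: (n,) responses."""
    n, p, m = Q.shape
    u = np.ones(p) / p                                   # start from uniform user weights
    for _ in range(n_iter):
        # Fix u: the model is linear in w with features Q_i^T u
        w, beta = ridge_with_bias(np.einsum("ipm,p->im", Q, u), y, lam_w)
        # Fix w: the model is linear in u with features Q_i w
        u, beta = ridge_with_bias(np.einsum("ipm,m->ip", Q, w), y, lam_u)
    return w, u, beta

def bilinear_predict(Q, w, u, beta):
    """Compute u^T Q_i w + beta for every time instance i."""
    return np.einsum("p,ipm,m->i", u, Q, w) + beta

# Synthetic data: 40 time instances, 3 users, 4 words.
rng = np.random.default_rng(3)
Q = rng.random(size=(40, 3, 4))
y = bilinear_predict(Q, rng.normal(size=4), rng.normal(size=3), 0.5)
w, u, beta = bilinear_fit(Q, y)
mse = float(np.mean((bilinear_predict(Q, w, u, beta) - y) ** 2))
```

Each half-step solves a regularised least-squares problem exactly, so the penalised training objective never increases from one iteration to the next.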
The Bilinear Model Multi-Task Learning (2/2)
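Apply $\ell_1/\ell_2$ regulariser (Argyriou, Evgeniou and Pontil, 2008)
Extends the notion of Group LASSO (Yuan and Lin, 2006) for a $\tau$-dimensional $\mathbf{y}$
Bilinear Group $\ell_1/\ell_2$ (BGL)
$$\{W^*, U^*, \boldsymbol{\beta}^*\} = \underset{W, U, \boldsymbol{\beta}}{\operatorname{argmin}} \sum_{t=1}^{\tau} \sum_{i=1}^{n} \left(\mathbf{u}_t^T Q_i \mathbf{w}_t + \beta_t - y_{ti}\right)^2 + \lambda_1 \sum_{j=1}^{m} \|W_j\|_2 + \lambda_2 \sum_{k=1}^{p} \|U_k\|_2,$$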
W = [www
1
... www