Is word-of-mouth correlated to General Election results?
The results are in.
 
Our experiment to determine whether there is a correlation between the volume of Twitter mentions for election candidates and the election results is complete, and it produced the following results:
 
1. Individual seat predictions were 69% accurate
(av. sample size = 677 tweets per constituency)
2. Regional party performance predictions were 87.5% accurate
(av. sample size = 37,000 tweets per region)
3. National share-of-vote predictions were 90.5% accurate (an average error of 1.75 points per party, lower than most opinion polls; sample size = 2,010,000 tweets)
 
From  these  results  we  can  draw  three  key  insights:  
 
1. There is a correlation between the number of Twitter mentions and a candidate winning the seat.
2. The model is better at predicting national and regional trends than predicting the outcome of individual local contests.
3. There is a strong correlation between the sample size of tweets analysed and accuracy. Despite instances in which the model is susceptible to disproportionate media activity, it gives accurate insight into trends at a national and regional level.
 
This  leads  to  the  following  conclusions:  
 
1. The experiment succeeded in predicting the national vote with accuracy comparable to opinion polls.
2. The data accurately indicated party performances at a regional level.
3. The larger the sample size of tweets, the higher the accuracy of the predictions. National and regional trends clearly influence local outcomes, yet with the smaller sample sizes available at a local level the distribution of this influence cannot be fully assessed by measuring buzz alone.
 
The results present an interesting correlation between Twitter mentions and electoral success, suggesting that social media 'buzz' on platforms like Twitter is a good indicator of election performance and a useful gauge of the public mood.
 
Recap  of  the  predictive  modelling  experiment.  
 
From Tuesday 30 March until the election we counted the mentions of candidates on Twitter and modelled predictions for the constituency, regional and national votes based on this data. The aim of the study was to assess whether the frequency of Twitter mentions of candidates could help to predict which candidates would be successful.
 
The data set covered all 433 constituencies represented on Twitter, i.e. candidates mentioned on Twitter could be attributed to 433 of the 650 UK constituencies. In total, 2,010,000 tweets were processed over the four-week study period.
 
The full methodology we used can be found at:
http://www.scribd.com/doc/29154537/Tweetminster-Predicts
 
This experiment was not a polling exercise or a statistical analysis project. As a result we do not present margin-of-error or standard-deviation calculations, nor do we claim statistical significance for the results. However, the results are too accurate to be accounted for by chance or coincidence, and the level of accuracy of the predictions strongly suggests that the predictive power of Twitter is reliable.
 
 
Methodology  notes:  
 
• All data was gathered by querying the Twitter API.
• The mathematical methodology is simple addition (e.g. 1 + 1 = 2) of candidate mentions (see the sketch after this list).
• The candidate with the most mentions was predicted the winner in each seat.
• The national share-of-vote percentages were calculated from the percentage of mentions for each party within the total sample.
• The regions used for the regional breakdowns are the standard definitions of UK mainland regions.
• The national vote was calculated by looking at the percentage breakdown of party mentions (by candidate) in each of the analysed seats and calculating the percentages of party mentions within the 433 seats on Twitter.
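To illustrate this counting approach, the following is a minimal Python sketch rather than the code used in the study: it assumes tweets have already been matched to a candidate, party and constituency, and the sample records shown are purely illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical input: tweets already matched to a candidate, party and constituency.
tweets = [
    {"candidate": "Jane Doe", "party": "Labour", "constituency": "Anytown"},
    {"candidate": "John Smith", "party": "Conservative", "constituency": "Anytown"},
    {"candidate": "Jane Doe", "party": "Labour", "constituency": "Anytown"},
]

# Simple addition of candidate mentions, per constituency and per party.
mentions = defaultdict(Counter)   # constituency -> candidate -> mention count
party_mentions = Counter()        # party -> mention count across the whole sample
for t in tweets:
    mentions[t["constituency"]][t["candidate"]] += 1
    party_mentions[t["party"]] += 1

# The candidate with the most mentions is predicted the winner in each seat.
predicted_winners = {
    seat: counts.most_common(1)[0][0] for seat, counts in mentions.items()
}

# National share of voice: percentage of mentions for each party in the total sample.
total = sum(party_mentions.values())
national_share = {party: 100 * n / total for party, n in party_mentions.items()}

print(predicted_winners)
print(national_share)
```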
 
 
National  party  shares  of  vote  predictions  
 
To predict the top-line national figures no weighting was applied: we recorded 2,010,000 mentions, counted the mentions of candidates in each of the 433 constituencies analysed, and repeated this count each week to include candidates joining Twitter during the study period.
 
The percentage of mentions for each party in the 433 constituencies gave the following projected share of the vote (actual 6 May figures, with the error in our prediction in brackets, are shown in the second row):

Party            Conservatives   Labour    Liberal Democrats   Others
Predicted        35%             30%       27%                 8%
Actual (6 May)   37% (-2)        30% (0)   24% (+3)            10% (-2)
 
This gives the predictive model an average accuracy of 90.5%, or an average error of 1.75 points per party.
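As a short worked check of these headline figures: the average error follows directly from the table above, and one reading that reproduces the 90.5% figure (an assumption for illustration, since the accuracy formula is not spelled out here) is the mean of per-party accuracies measured relative to the actual shares.

```python
# Worked check of the headline figures. The per-party accuracy formula below
# (100% minus the error relative to the actual share) is an assumption used to
# illustrate one reading that reproduces the 90.5% average.
predicted = {"CON": 35, "LAB": 30, "LDEM": 27, "OTH": 8}
actual    = {"CON": 37, "LAB": 30, "LDEM": 24, "OTH": 10}

errors = {p: abs(predicted[p] - actual[p]) for p in predicted}
avg_error = sum(errors.values()) / len(errors)       # (2 + 0 + 3 + 2) / 4 = 1.75

accuracies = [100 * (1 - errors[p] / actual[p]) for p in predicted]
avg_accuracy = sum(accuracies) / len(accuracies)     # approximately 90.5

print(avg_error, round(avg_accuracy, 1))             # 1.75 90.5
```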
 
Compared with polling predictions, our experiment was less accurate than ICM (average error 1.25), on a par with Ipsos MORI, Populus and Harris (1.75), and more accurate than YouGov, ComRes and Opinium (2.25) and Angus Reid and TNS BMRB (3.25). (Source: http://ukpollingreport.co.uk/blog/archives/2692)
 
During the four weeks of our study, the top-line figures varied as follows:

Conservatives        34%   36%   35%   33%   35% (nc)
Labour               35%   33%   32%   30%   30% (nc)
Liberal Democrats    22%   23%   28%   26%   27% (+1)
Others               9%    10%   7%    9%    8% (-1)
 
Verdict: the key insight here is that the media buzz around Nick Clegg after the TV leaders' debates did not translate into actual votes cast, suggesting that media attention inflated mentions of Lib Dem candidates relative to their eventual vote.
 
 
 
 
Regional  party  performance  predictions  
 
The seat wins predicted within each UK region allowed us to make the following predictions for party performance:
 
• SNP not gaining new seats (validated by election result: the SNP made no gains)
• Labour and Liberal Democrats performing well in Scotland (validated by election result: both emerged with their total number of seats intact, bucking the national trend)
• No  significant  change  in  Plaid  Cymru  support  (validated  by  election  result:  
they  gained  one  seat)  
• Liberal  Democrats  to  hold  ground  against  the  Conservatives  in  the  South  
West  (validated  by  election  result:    there  was  only  a  1%  LDEM  to  CON  swing  
in  the  South  West)  
• Labour to perform better in London than the polls were forecasting (validated by election result: the LAB to CON swing was 2.5%, compared to 6.1% in the rest of England and 5.03% across the country as a whole)
• Conservatives  to  perform  well  in  the  East  Midlands  (validated  by  election  
result:  In  the  East  Midlands  the  Conservatives  gained  12  seats  and  the  LAB  to  
CON  swing  was  6.7%)  
• Conservatives  to  perform  well  in  Wales  (validated  by  election  result:  in  
Wales,  the  Conservatives  gained  5  seats  and  the  LAB  to  CON  swing  was  
5.6%).  
• Conservatives  to  gain  a  few  seats  in  Scotland  (not  validated  by  election  
result:  they  didn’t).  
 
Verdict: 87.5% of the predicted regional trends (seven of the eight predictions above) were accurate, suggesting that Twitter mentions give good insight into party performance across UK regions.
 
For each of these regional predictions we analysed an average of 37,000 tweets per region. This suggests the sample size for these predictions gave an accurate insight into trends missed by some opinion polls. It also challenges the perception that Twitter is primarily a London-centric platform with significantly less relevance to the rest of the UK.
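As a minimal sketch of how regional tallies can be derived from the constituency-level predictions (the region lookup and the sample data are illustrative assumptions, not the study's own data):

```python
from collections import Counter, defaultdict

# Hypothetical inputs: the party of the predicted winner in each seat, plus a
# lookup from constituency to standard UK mainland region.
predicted_party = {"Anytown": "Labour", "Otherton": "Conservative"}   # seat -> party
region_of = {"Anytown": "London", "Otherton": "East Midlands"}        # seat -> region

# Tally predicted seat wins per party within each region.
regional_wins = defaultdict(Counter)
for seat, party in predicted_party.items():
    regional_wins[region_of[seat]][party] += 1

for region, wins in regional_wins.items():
    print(region, dict(wins))
```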
 
Constituency-level predictions
 
For constituency-by-constituency predictions, the most-mentioned candidate in each constituency was predicted the seat winner. Some filtering of the sample was necessary to ensure we predicted seats using consistent sets of data and to reduce errors due to unequal representation. The seats were therefore grouped by:

1. Number of mentions in seats where each of the 3 major parties (Lib Dem, Con, Lab) had a candidate represented (128 seats)
2. Number of mentions in seats where at least one candidate from any of the three major parties was mentioned (367 seats)
 
The  results  showed:  
 
1. In 69% of seats where each of the main parties had a candidate on Twitter, the most-mentioned candidate won (128 seats, av. sample size 677 mentions).
2. In 55% of seats where at least one candidate from any of the 3 major parties was on Twitter, the most-mentioned candidate won (367 seats, av. sample size 313).
 
Verdict: Seats with only one candidate mentioned on Twitter are harder to predict than seats with 2 or more candidates mentioned, partly because of the smaller sample size of tweets. Comparing seats with different numbers of candidates mentioned reduces the average accuracy of the predictions, because less representative samples confuse the results. It is advisable to filter seats into groups with similar levels of representation to increase the accuracy of seat-by-seat predictions.
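The filtering and scoring described above can be sketched as follows; the seat records shown (per-candidate mention counts, parties represented and the actual winner) are illustrative assumptions rather than the study's data:

```python
# Hypothetical seat records: candidate -> (party, mention count), plus the actual winner.
seats = {
    "Seat A": {
        "candidates": {"W": ("Green", 900), "X": ("Labour", 400),
                       "Y": ("Conservative", 300), "Z": ("Lib Dem", 250)},
        "actual_winner": "W",
    },
    "Seat B": {
        "candidates": {"P": ("Independent", 1200), "Q": ("Labour", 500)},
        "actual_winner": "Q",
    },
}

MAJOR = {"Labour", "Conservative", "Lib Dem"}

def predict(seat):
    """The most-mentioned candidate is predicted the seat winner."""
    return max(seat["candidates"], key=lambda c: seat["candidates"][c][1])

def accuracy(selected):
    """Share of selected seats where the prediction matched the actual winner."""
    hits = sum(predict(seats[s]) == seats[s]["actual_winner"] for s in selected)
    return 100 * hits / len(selected)

# Group 1: seats where every major party has a represented candidate.
group1 = [s for s, d in seats.items()
          if MAJOR <= {party for party, _ in d["candidates"].values()}]
# Group 2: seats where at least one major-party candidate is mentioned.
group2 = [s for s, d in seats.items()
          if MAJOR & {party for party, _ in d["candidates"].values()}]

print(accuracy(group1), accuracy(group2))
```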
 
• The  accurate  prediction  of  Caroline  Lucas  winning  in  Brighton  Pavilion  
demonstrates  that  in  seats  where  most  candidates  are  on  Twitter  and  the  
sample  size  is  significant,  the  model  is  more  accurate.  
• The incorrect prediction of Esther Rantzen winning in Luton South shows that the model is susceptible to media frenzies that generate considerable buzz online without corresponding support at the ballot box.
 
 
Conclusions  
 
The accuracy of the national share-of-vote and regional party performance predictions suggests that there is a strong correlation between online buzz (candidate mentions) and party performance. This conclusion is backed up by the fact that the Twitter model's predictions closely resembled opinion poll forecasts and the actual votes cast on May 6th.
 
It is extremely unlikely that these numbers are coincidentally accurate against both forecasts and actual events; they appear to demonstrate the 'wisdom of crowds'.
 
The results also strongly suggest that the demographic make-up and political preferences of Twitter users are not necessarily significant factors when predicting national and regional trends using large samples of data mined from Twitter posts.
 
The accuracy of the predictions in the Twitter experiment was similar to (and in some cases better than) that of demographically weighted opinion polls. This supports the case that measurements made through data mining in social media channels can be as reliable as traditional opinion polling techniques when the sample size is sufficiently large.
 
This study makes a robust argument that data such as the volume of posts, the reach of messages through retweets and the influence of individual Twitter users within the sample are insightful indicators of public opinion and behaviour.
 
The results also clearly demonstrate that predictions made from small samples are susceptible to disproportionate media activity; an extreme case of this skewing was Esther Rantzen's candidacy in Luton South. This type of skewing involves such small numbers that it does not affect the conclusions drawn from large samples (i.e. the national and regional predictions).
 
The accuracy of the predictions would be improved by applying human insight to filter out such anomalies, because they are easy to spot.
 
While national and regional trends obviously affect local outcomes to some extent, our assessment of the constituency-level predictions shows that representation of all candidates in a geographically focused sample would give significantly more accurate constituency-level forecasts.
 
The findings of the study are therefore similar to those of an HP experiment that predicted box-office success with 97.3% accuracy (http://www.fastcompany.com/1604125/twitter-predicts-box-office-sales-better-than-anything-else). Based on our experimental results, the same methodology would find it harder to predict a film's income city by city, but would have greater success at predicting performance on a regional basis, with sample sizes in the tens of thousands.
 
This kind of large-sample predictive model can accurately predict the success of multiple parties at a national and regional level. At a local level, with small samples, the predictions were 55-69% accurate; however, the accuracy of predicting the overall result of the UK election was very high (90.5%). We can therefore conclude that the larger the sample size, the more accurate the predictive power of Twitter analysis, and that it is unaffected by demographics when measured using the frequency-of-mentions methodology in our predictive modelling experiment.
 
About  Tweetminster  
Established  in  December  2008,  Tweetminster  is  a  media  utility  that  aims  to  make  UK  
politics  more  open  and  social.  
 
You  can  use  Tweetminster  to:  
 
• Find  and  follow  MPs  and  PPCs  on  Twitter:  http://tweetminster.co.uk/    
• Access  curated  lists  of  relevant  news,  commentary  and  politicians  
http://twitter.com/tweetminster    
• Measure  the  pulse  of  UK  politics  in  real  time:  dynamically  analyse  and  make  
sense  of  information  and  data  around  political  conversations  and  news  
stories:  http://search.tweetminster.co.uk/pages/about  
 
Find  out  more:  www.tweetminster.co.uk        
Follow  us  on  Twitter:  www.twitter.com/tweetminster  
