
COS 326

Database Systems
Lecture 19
Big Data and
Big Data Analytics (2)
Notes
14 October 2015

Admin matters: next 3 weeks


Week | Date   | Day | Topic
10   | 13 Oct | Tue | L18: Big Data Analytics; presentation for topic 15
10   | 14 Oct | Wed | L19: Big Data Analytics; presentation for topic 16
10   | 16 Oct | Fri | No prac
11   | 20 Oct | Tue | L20: Guest lecture: SAP
11   | 21 Oct | Wed | L21: Data analytics: Data mining; presentation for topic 17
11   | 23 Oct | Fri | No prac (project day for Computer Science)
12   | 27 Oct | Tue | L22: Class test 3: data analytics; presentation for topic 18
12   | 28 Oct | Wed | L23: Data analytics: Data mining; presentation for topics 19, 20

Outline
Last lecture:
1. Technologies supporting Big data storage & analytics
MapReduce computation framework
NoSQL big database management systems (BDMSes)
NewSQL big database management systems (BDMSes)

2. What types of analytics for big data?

This lecture:
Case study:
Analysis of microblog data: Twitter
sentiment analysis of microblogs

Pearson Education Limited 1995, 2005

RECAP: on Big Data


Sources of Big Data:
Web-generated structured & unstructured data e.g.
e-commerce purchasing histories

social media: Facebook, Twitter, LinkedIn, YouTube, etc.

Some processing activities for big data:


(1) descriptive analytics
(2) predictive analytics
e.g. sentiment analysis for microblogs (e.g. Twitter)

Case study: Analytics for Twitter


Twitter: http://www.twitter.com
1. Why do people tweet?
Notable users of Twitter:
Pope Francis: 78.4 million followers
Barack Obama: 640 thousand followers
2. Format of a tweet: max 140 characters, possibly including
emoticons such as a smiley :-) or a sad face :-( to express sentiment

3. Value of tweets to businesses:


used by market researchers in business organisations to ask:
(a) what are customers saying about our products & services?
(b) what are customers saying about our competitors' products & services?

Twitter statistics

Twitter was launched in 2006


Twitter statistics (source: Twitter, April 2010):
106 million registered users
180 million unique visitors every month
300,000 new users signing up every day
600 million queries received daily via Twitter's search engine
3 billion requests per day via the Twitter API
37% of active users used mobile phones to send requests
approx. 200 million tweets per day (big data)

More recently:
the number of regular Twitter users has been estimated at more than 200 million.
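To see why "approx. 200 million tweets per day" counts as big data, a quick back-of-envelope calculation helps. This is a minimal sketch under loudly stated assumptions: every tweet is taken to use all 140 characters at one byte per character (ASCII), and the JSON metadata the API attaches to each tweet is ignored, so real storage needs are considerably higher.

```python
# Back-of-envelope sizing for "approx. 200 million tweets per day".
# ASSUMPTIONS: every tweet uses all 140 characters and each character takes
# one byte (ASCII); metadata returned by the API is ignored.
TWEETS_PER_DAY = 200_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
MAX_CHARS = 140

tweets_per_second = TWEETS_PER_DAY / SECONDS_PER_DAY
text_bytes_per_day = TWEETS_PER_DAY * MAX_CHARS

print(f"{tweets_per_second:,.0f} tweets per second on average")  # 2,315
print(f"{text_bytes_per_day / 1e9:.0f} GB of tweet text per day")  # 28
```

Even this text-only estimate means roughly 28 GB arriving every day at over two thousand tweets per second, which is the kind of velocity and volume that motivates the NoSQL and MapReduce technologies from the last lecture.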

Twitter adoption in SA
Adoption in South Africa:

Businesses, governments and non-government organisations have a Twitter & Facebook presence.

Adoption statistics for 2014 (source: Fuseware and World Wide Worx, 2014):
9.4 million active users of Facebook and
5.5 million users of Twitter in South Africa.
93% of major RSA brands use Facebook and 79% use Twitter.

Twitter analytics
Two approaches to analysis:
(1) Online analytics:
(i) Subscribe to a service for social media data analytics
(ii) use service to obtain analysis reports & Twitter data

(2) Offline analytics:


(i) register with Twitter
(ii) use Twitter APIs to obtain data & store it in a DB
e.g. NoSQL DB
(iii) conduct analysis on the data
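Step (ii) of offline analytics can be sketched in Python as follows, assuming the tweets have already been downloaded as JSON. The sample records and the field subset used (`id_str`, `created_at`, `text`, `user.screen_name`) are illustrative assumptions, and `TweetStore` is a hypothetical in-memory stand-in for a NoSQL document store, not a real database client.

```python
import json

# Hypothetical, simplified tweet records in the JSON shape returned by the
# REST API (only a few fields; real responses carry many more).
RAW_RESPONSE = """[
  {"id_str": "1", "created_at": "Wed Oct 14 09:15:00 +0000 2015",
   "text": "Loving the new phone :-)", "user": {"screen_name": "alice"}},
  {"id_str": "2", "created_at": "Wed Oct 14 10:40:00 +0000 2015",
   "text": "Battery died again :-(", "user": {"screen_name": "bob"}}
]"""


class TweetStore:
    """In-memory stand-in for a NoSQL document store: each tweet is kept
    as a schema-free document keyed by its id."""

    def __init__(self):
        self._docs = {}

    def upsert(self, doc):
        # Keying on id_str makes repeated downloads idempotent.
        self._docs[doc["id_str"]] = doc

    def find(self, predicate):
        return [d for d in self._docs.values() if predicate(d)]

    def __len__(self):
        return len(self._docs)


store = TweetStore()
for tweet in json.loads(RAW_RESPONSE):
    store.upsert(tweet)
store.upsert(json.loads(RAW_RESPONSE)[0])  # duplicate download: still 2 documents

bob_tweets = store.find(lambda d: d["user"]["screen_name"] == "bob")
print(len(store), bob_tweets[0]["text"])
```

Storing whole documents rather than mapping them onto relational tables means new or optional tweet fields need no schema change, which is one reason document stores suit this step.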

Online analytics: Twitter data (1)


(i) Subscribe to a service for social media data analytics
(ii) use service to obtain analysis reports
Service name (and purpose) | URL & examples of services provided / report types
Sentiment140 (sentiment analysis) | URL: http://www.sentiment140.com
    Performs sentiment analysis on the tweets returned for a query supplied by the user (for free).
Twitonomy (overall view of a Twitter account) | URL: http://www.twitonomy.com
    Analyses a Twitter account. Provides the following for free:
    1. number of tweets per day, mentions, retweets and favourited tweets (for a given period)
    2. charts showing tweet frequencies by day of the week and time of day
    3. platforms most tweeted from (e.g. Twitter for iPhone, Twitter web client)
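The Twitonomy report types listed above are descriptive analytics that can also be computed offline. Below is a minimal sketch over a hypothetical three-tweet sample; it assumes the REST API's `created_at` layout, reduces the HTML `source` field to its label, and treats a text starting with "RT @" as a retweet (a common convention, not the only way Twitter marks retweets). Favourite counts are omitted.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample of one account's tweets (simplified fields). In the REST
# API the "source" field is an HTML link; here it is reduced to its label.
TWEETS = [
    {"created_at": "Mon Oct 12 08:05:00 +0000 2015",
     "text": "RT @news: Results are out", "source": "Twitter for iPhone"},
    {"created_at": "Mon Oct 12 13:30:00 +0000 2015",
     "text": "Thanks @alice and @bob!", "source": "Twitter Web Client"},
    {"created_at": "Wed Oct 14 08:45:00 +0000 2015",
     "text": "Lecture 19 today", "source": "Twitter for iPhone"},
]

TWITTER_TIME = "%a %b %d %H:%M:%S %z %Y"  # layout of the created_at field
DAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")  # locale-independent


def describe(tweets):
    """Twitonomy-style summary of a list of tweets from one account."""
    times = [datetime.strptime(t["created_at"], TWITTER_TIME) for t in tweets]
    days_covered = (max(times).date() - min(times).date()).days + 1
    return {
        "tweets_per_day": len(tweets) / days_covered,
        # Convention: a retweet's text starts with "RT @".
        "retweets": sum(t["text"].startswith("RT @") for t in tweets),
        # Count every @-token, including those inside retweets.
        "mentions": sum(w.startswith("@") for t in tweets for w in t["text"].split()),
        "by_weekday": Counter(DAY_NAMES[ts.weekday()] for ts in times),
        "by_hour": Counter(ts.hour for ts in times),
        "platforms": Counter(t["source"] for t in tweets).most_common(),
    }


summary = describe(TWEETS)
print(summary["tweets_per_day"], summary["retweets"], summary["mentions"])
print(summary["by_weekday"], summary["platforms"])
```

The `by_weekday` and `by_hour` counters are exactly the data behind the day-of-week and time-of-day charts; plotting them is a separate step.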

4.2 Analysis of social network data: Twitter (2)


Online tools for analysis of Twitter data

Sentiment140: http://www.sentiment140.com

Performs sentiment analysis on the tweets returned for a query supplied by the user (for free), e.g.
[Figure: available languages]


4.2 Analysis of Twitter data (3)


Twitonomy
URL: http://www.twitonomy.com
Analyse a Twitter account. Provides the following for free:

1. number of tweets per day, mentions, retweets and favourited tweets

ORSSA 2015 presentation 15


September 2015

4.2 Analysis of social network data: Twitter (4)


Twitonomy: Analyse a Twitter account. Provides the following for free:
2. Charts showing tweet frequencies by day of the week and time of day


4.2 Analysis of social network data: Twitter (5)


Twitonomy: Analyse a Twitter account. Provides the following
for free:
3. platforms most tweeted from (e.g. Twitter for iPhone,
Twitter web client)

Can download tweets in MS Excel format for further analysis.

Offline analysis of Twitter data


Twitter: http://www.twitter.com
(2) Offline analytics:
(i) register with Twitter
(ii) use Twitter APIs to obtain data & store it in a DB
e.g. NoSQL DB
(iii) conduct analysis on the data
Examples of analysis:
a. descriptive statistics
b. sentiment analysis
c. graph mining, e.g. for community discovery
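For graph mining, one simple starting point is a mention graph: users are nodes and an @-mention links two users. The sketch below uses hypothetical (author, text) pairs and, as the simplest stand-in for community discovery, groups users into connected components; dedicated community-detection methods (e.g. modularity-based ones) would instead split large components into denser groups.

```python
from collections import defaultdict

# Hypothetical (author, text) pairs; an @-mention links the author and the
# mentioned user with an undirected edge.
TWEETS = [
    ("alice", "@bob did you see the match?"),
    ("bob", "@carol yes, great game"),
    ("dave", "@erin meeting at 10"),
    ("frank", "no mentions here"),
]


def mention_graph(tweets):
    """Build an undirected adjacency-set graph from @-mentions."""
    graph = defaultdict(set)
    for author, text in tweets:
        graph.setdefault(author, set())  # keep users who mention nobody
        for word in text.split():
            target = word[1:].rstrip(".,:;!?")
            if word.startswith("@") and target:
                graph[author].add(target)
                graph[target].add(author)
    return graph


def communities(graph):
    """Group users into connected components with an iterative DFS."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(graph[node] - seen)
        groups.append(sorted(group))
    return groups


print(communities(mention_graph(TWEETS)))
```

Here alice, bob and carol form one group through a chain of mentions, dave and erin another, and frank is isolated.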


Twitter: Facilities available for developers


https://dev.twitter.com/overview/documentation

Twitter APIs:
(1) REST APIs

provide programmatic access to read & write Twitter data


responses available in JSON
identify Twitter applications & users using OAuth

(2) Streaming APIs

continuously deliver new responses to REST API queries over a long-lived HTTP connection
receive updates on latest tweets matching a search query

OAuth:
applications send secure, authorised requests to the Twitter APIs
an application must be registered before it can access the Twitter APIs
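As an illustration of how a registered application authenticates, here is a minimal sketch of the application-only OAuth 2.0 token request, based on Twitter's developer documentation of around this period (the endpoint and flow may since have changed). It only builds the request and does not send it; the consumer key and secret are hypothetical placeholders obtained when registering the application. Note that application-only auth identifies the application alone; acting on behalf of a user requires OAuth 1.0a signed requests instead.

```python
import base64
import urllib.parse
import urllib.request

TOKEN_URL = "https://api.twitter.com/oauth2/token"


def bearer_token_request(consumer_key, consumer_secret):
    """Build (but do not send) the application-only OAuth 2.0 token request."""
    # URL-encode key and secret, join them with ":", then Base64-encode.
    credentials = "{}:{}".format(
        urllib.parse.quote(consumer_key, safe=""),
        urllib.parse.quote(consumer_secret, safe=""),
    )
    encoded = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=b"grant_type=client_credentials",
        headers={
            "Authorization": "Basic " + encoded,
            "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8",
        },
        method="POST",
    )


req = bearer_token_request("my-key", "my-secret")  # hypothetical credentials
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

If sent (e.g. with `urllib.request.urlopen(req)`), the JSON response contains a bearer token that later REST API calls pass in an `Authorization: Bearer ...` header.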

Sentiment Analysis for microblogs


Sentiment Analysis (defined):

Given a tweet on a topic of interest (e.g. to a market researcher):


determine if the sentiment (opinion) of the tweet is:

positive,

negative, or

neutral.
The effect of one tweet may be small, but the combined effect of many tweets is significant.
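Emoticons such as :-) and :-( give a crude first cut at the three classes defined above. The following is a minimal rule-based sketch, assuming a small hand-picked emoticon list (the regular expressions are hypothetical, not a standard lexicon). It is not a predictive model, but emoticon labels of this kind are sometimes used as noisy training labels for one.

```python
import re

# ASSUMPTION: a tiny hand-picked emoticon list; real systems use larger lexicons.
POSITIVE = re.compile(r"[:;]-?[)D]")  # :-)  :)  ;)  :D  :-D
NEGATIVE = re.compile(r":-?\(")       # :-(  :(


def emoticon_sentiment(tweet):
    """Label a tweet 'positive', 'negative' or 'neutral' from its emoticons."""
    pos = bool(POSITIVE.search(tweet))
    neg = bool(NEGATIVE.search(tweet))
    if pos and not neg:
        return "positive"
    if neg and not pos:
        return "negative"
    return "neutral"  # no emoticon, or mixed signals


for t in ["Great service today :-)", "Still waiting for a refund :(", "New store opens Monday"]:
    print(emoticon_sentiment(t), "|", t)
```

Most tweets carry no emoticon at all and fall into "neutral" here, which is why a trained classifier is needed for realistic coverage.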

Analysis methods:

Use text mining methods to create a predictive (classification) model
that classifies tweets as having positive, negative or neutral sentiment.

Traditionally, text mining has been used for document classification.
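As one concrete example of a text-mining classifier, here is a multinomial Naive Bayes sketch with add-one (Laplace) smoothing. Naive Bayes is a common baseline for text classification and is swapped in here for illustration; the lecture does not prescribe a specific algorithm. The six training tweets are toy data invented for the example, and the `NaiveBayes` class is hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy training set; a real model needs thousands of labelled tweets.
TRAIN = [
    ("love this phone, great battery", "positive"),
    ("great service and friendly staff", "positive"),
    ("terrible battery, hate it", "negative"),
    ("worst service ever, so slow", "negative"),
    ("store opens at nine tomorrow", "neutral"),
    ("new model announced today", "neutral"),
]


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples):
        self.class_counts = Counter(label for _, label in examples)
        self.word_counts = defaultdict(Counter)
        for text, label in examples:
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            words = self.word_counts[label]
            denom = sum(words.values()) + len(self.vocab)
            # log P(label) + sum of log P(word | label); unseen words count as 0 + 1
            score = math.log(n / total)
            for w in tokenize(text):
                score += math.log((words[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = NaiveBayes().fit(TRAIN)
for tweet in ["great battery, love it", "so slow, terrible", "announced at nine"]:
    print(model.predict(tweet), "|", tweet)
```

Working in log probabilities avoids numerical underflow when many word probabilities are multiplied, and smoothing stops a single unseen word from zeroing out a class.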

Sentiment Analysis for microblogs


Using a predictive model to classify tweets
[Figure: tweets to be classified -> predictive (classification) model -> +ve sentiment tweets / -ve sentiment tweets / neutral sentiment tweets]

Essay presentation

Topic 16

