
Big Data

for Decision Makers

Big Data
is just a buzzword

Relevant data sets are small

Gartner's
Hype Cycle

2014 to present:
disillusionment phase

Big Data
Be Data Driven

Cross out Big Data, replace it with Data Driven


What is Data Driven?
Tools, ability, and culture that act on data
Essentially:
Mindset / Culture
Ability
Tools

Big Data at a Glance

Still worth covering first

Common definition: Big Data = the Five Vs

...and then derive Value (both definitions) out of these four Vs

Volume
Rule of thumb:
1 TB
or
1 billion rows
or
analyzing it doesn't fit in one server's memory
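
The rule of thumb above can be written down as a quick check. This is a minimal sketch, not a formal definition: the function name, the parameters, and the default assumption that the analysis needs roughly as much memory as the raw data are all illustrative choices that are not part of the slides.

```python
TERABYTE = 10**12   # 1 TB, decimal convention (assumption)
BILLION = 10**9     # 1 billion rows


def exceeds_volume_rule_of_thumb(size_bytes, row_count, server_memory_bytes,
                                 working_set_bytes=None):
    """Return True if any of the three Volume criteria is met.

    working_set_bytes is the memory the analysis needs; if it is not
    given, we assume (hypothetically) that it equals the raw data size.
    """
    if working_set_bytes is None:
        working_set_bytes = size_bytes
    return (
        size_bytes >= TERABYTE                      # about 1 TB or more
        or row_count >= BILLION                     # about 1 billion rows or more
        or working_set_bytes > server_memory_bytes  # does not fit in one server's memory
    )


# A 5 GB table with 2 million rows on a 64 GB server: not "big" by this rule.
print(exceeds_volume_rule_of_thumb(5 * 10**9, 2_000_000, 64 * 10**9))
```

Any single criterion is enough, which matches the "or" between the three items on the slide.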

Variety
Text
Numbers
Geolocation
Time-series
Image
Audio
Video
Biometrics

Velocity
Batch to Stream to Real-time

Veracity
Accessible
Accurate
Coherent
Complete
Consistent
Defined
Relevant
Timely

Value
Both meanings:
Value = a number, a figure, a measurement
Value = worth, benefit, merit

Disillusionment, i.e. "njekethek" (Javanese slang, roughly "so that's all it turned out to be")


The majority of what is needed is Analytics
The majority of Analytics is based on BI (Business Intelligence)
The majority of what is needed for BI is Data Governance
The majority of what it takes to deliver is Data Engineering
The majority of Data Engineering is use-case specific
The majority of Machine Learning use cases can be handled with a simple statistical model

Enough of the hype

Let's talk about being Data Driven


What is Data Driven?
Tools, ability, and culture that act on data
Essentially:
Mindset / Culture
Ability
Tools

Where are we now?
A more data-driven organization covers the lower levels and invests more heavily in the upper levels

Obstacles to becoming data driven

Data Quality

accessible

accurate

coherent

complete

consistent

defined

relevant

timely
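
Some of the quality facets listed above can be measured directly. The sketch below covers only two of them, "complete" and "timely", and assumes records arrive as a list of dictionaries; the function names, field names, and the sample orders are hypothetical and exist only for illustration.

```python
from datetime import datetime, timedelta


def completeness(records, required_fields):
    """Share of records in which every required field is present and non-empty."""
    if not records:
        return 0.0
    complete = sum(
        all(r.get(f) not in (None, "") for f in required_fields)
        for r in records
    )
    return complete / len(records)


def timeliness(records, ts_field, now, max_age):
    """Share of records whose timestamp is no older than max_age."""
    if not records:
        return 0.0
    fresh = sum(
        1 for r in records
        if r.get(ts_field) is not None and now - r[ts_field] <= max_age
    )
    return fresh / len(records)


# Hypothetical sample data
now = datetime(2024, 1, 10)
orders = [
    {"id": 1, "amount": 120.0, "updated": datetime(2024, 1, 9)},
    {"id": 2, "amount": None, "updated": datetime(2024, 1, 8)},
    {"id": 3, "amount": 75.5, "updated": datetime(2023, 12, 1)},
    {"id": 4, "amount": 30.0, "updated": None},
]
print(completeness(orders, ["id", "amount"]))
print(timeliness(orders, "updated", now, timedelta(days=7)))
```

Facets such as "relevant" or "defined" depend on business context and documentation, so they are not something a script like this can score on its own.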

Data Collection

Factor | Description | Priority
Urgency | Data is really, really needed | High
Value | Data will deliver high value | High
Cross-team | Data is needed by multiple teams | High
Ephemeral | Data is ephemeral / streaming | High
Enrichment | Data augments the value of existing data | Medium
Ease of (re)use | Data is easy to process (with the existing system) | Medium
Historical availability | Past data is retrievable | Medium
Workaround | Data can be replaced with some workaround | Low
Quality | Data is of low quality | Low
Maintainability | Data is difficult to maintain (e.g. from scraping) | Low
Usage | Data would be rarely used | Low
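
The factor table above can be turned into a simple lookup for comparing candidate data sources. This is only a sketch: the slides give a priority per factor but do not say how to combine several factors, so the ordering used here (more High factors first, then more Medium, then fewer Low) is an assumption, and the candidate names are made up.

```python
from collections import Counter

# Priority per factor, taken from the table
FACTOR_PRIORITY = {
    "urgency": "High",
    "value": "High",
    "cross-team": "High",
    "ephemeral": "High",
    "enrichment": "Medium",
    "ease of (re)use": "Medium",
    "historical availability": "Medium",
    "workaround": "Low",
    "quality": "Low",
    "maintainability": "Low",
    "usage": "Low",
}


def tally(factors):
    """Count how many of the applicable factors are High, Medium, and Low."""
    return Counter(FACTOR_PRIORITY[f] for f in factors)


def rank(candidates):
    """Order candidate data sources; the sort key is an assumed heuristic."""
    def key(item):
        counts = tally(item[1])
        return (counts["High"], counts["Medium"], -counts["Low"])
    return [name for name, _ in sorted(candidates.items(), key=key, reverse=True)]


# Hypothetical candidates and the factors that apply to each
candidates = {
    "scraped prices": ["value", "maintainability", "workaround"],
    "old logs": ["historical availability", "usage"],
    "clickstream": ["urgency", "ephemeral", "cross-team"],
}
print(rank(candidates))
```

The tally alone is often enough for a discussion; the ranking is just one way to break ties.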

Analyst Organization Structure

Centralized: there is a central team to which analysts report

Decentralized: analysts are scattered and owned by functional teams

Hybrid: there is a central team, but analysts work day-to-day in functional teams

Consulting: analysts are allocated in a project-based / consultative structure

Functional: the central team sits under one functional team

Center of excellence: analytics done by a central team + analysts in functional teams

Accenture survey: higher engagement and satisfaction for employees

Centralized vs Decentralized

Centralized team pros:

Clear / uniform career path

Standardized tools and training

Standardized data & metrics

Redundancy of domain knowledge (if one leaves, others still have knowledge)

Decentralized team pros:

Full-time access, fewer interruptions & less bureaucracy, faster turnaround

Deeper domain knowledge

What is Data Driven?


Tools, ability, and culture that act on data

Mindset / Culture
Ability
Tools

Culture - it's a top-down thing

Data: the right data, high-quality data, accessible, queryable

People: who design the metrics, extract the right data, analyze it

Principles: continuous experimentation & improvement, analysis drives action

Culture - strategies for building it


by Economist Intelligence Unit

Culture - data driven decision making

What is Data Driven?


Tools, ability, and culture that act on data

Mindset / Culture
Ability
Tools

Ability
Math / Statistics / Machine Learning
Data Visualization
Programming
Data Engineering
Business Skills

Ability - Math, Stats, Machine Learning


Data Scientist = understands statistics better than a programmer, programs better than a statistician

Probability
Bayesian
Distributions
Statistical Tests
Simulation / Monte Carlo
Regressions (linear, logistic)

Feature selection
Classification
Clustering
Outlier detection

Types of analyses that use it

Descriptive: describes what is there

Exploratory: finds out what is going on

Inferential: draws conclusions from a sample

Predictive: predicts the value of a particular variable

Causal: looks for cause and effect between variables

Mechanistic: determines exactly which change, of what size, causes which effect, of what size
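
Three of the analysis types above (descriptive, inferential, predictive) can be illustrated with nothing more than basic statistics. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the confidence interval uses a normal approximation with z = 1.96, and the prediction is an ordinary least-squares line through made-up numbers.

```python
from math import sqrt
from statistics import mean, stdev


def describe(values):
    """Descriptive: summarize what is in the data."""
    return {"n": len(values), "mean": mean(values), "stdev": stdev(values)}


def mean_confidence_interval(sample, z=1.96):
    """Inferential: estimate the population mean from a sample.

    Uses a normal approximation (an assumption); z=1.96 gives roughly 95%.
    """
    m = mean(sample)
    half_width = z * stdev(sample) / sqrt(len(sample))
    return (m - half_width, m + half_width)


def fit_line(xs, ys):
    """Predictive: least-squares line y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Made-up numbers for illustration only
daily_orders = [2, 4, 4, 4, 5, 5, 7, 9]
print(describe(daily_orders))
print(mean_confidence_interval(daily_orders))

slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope * 5 + intercept)  # predicted y for x = 5
```

Causal and mechanistic analyses need more than this, typically controlled experiments or a model of the underlying process.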

Ability - Data Visualization

Case in point: the pie chart is the worst type of data visualization

Ability - Programming

Ability - Data Engineering

Ability - Business Skills

Qualities to look for in an analyst

Numerate

Systematic, detail-oriented

Skeptical about results, double-checking habit

Confident, have faith in data & analysis

Curious, creative

Good communicator

Patient

Passionate about data

Loves to learn

Pragmatic, business-savvy

What is Data Driven?


Tools, ability, and culture that act on data

Mindset / Culture
Ability
Tools

Tools
Use the right data landscape for your use case
Batch? Streaming?
How much data? How fast?
Aggregate specific metrics? Data exploration?
Ad hoc? Periodic?
Use visualization tools
Consider budget vs time investment

mattturck.com

insightdataengineering.com

Some leading tools


Local / single-machine data analysis (< 10 GB of data in one go):
Multiple machines data analysis:
Analytics database:
Big data data warehouse:
Publish-subscribe messaging:
Data visualization: d3.js
Scripting notebook:
Transactional DB:

Visualization Tools
Enterprise / Proprietary

Tableau
TIBCO Spotfire
QlikView
TIBCO Jaspersoft (Enterprise Edition)
Periscopedata.com
RJMetrics.com CloudBI
Looker.com

Free / Open Source


TIBCO Jaspersoft (Community Edition)
Google Fusion Tables
Plot.ly

What do we need?
The essence of all this

Most of what you'll need is BI
Data Science: exploring unknowns
vs
Analytics / Business Intelligence: specific metrics

Even in Data Science,
most of what you'll need is a simple statistical model

Data Scientist = ?
Statistics > coders

Coding > statisticians

Analytics and/or Business Intelligence

To-do for corporates
Track data - as much as possible, from the start - Variety, not necessarily Volume
Be Data Driven - set strategy and make decisions based on data
Focus on quality: Veracity & Value
But don't worry too much either:
most companies can't do this yet or haven't started

...Big Data myths that are wrong


http://data-informed.com/20-claims-about-big-data-and-why-they-all-are-wrong/

Closing

ainunnajib@gmail.com
WA/Telegram +65 8113 1571
