Week 1
- Understanding Big Data
- Introduction to HDFS
- Playing around with Cluster
- Data loading Techniques
Week 2
- Map-Reduce Basics, types and formats
- Use-cases for Map-Reduce
- Analytics using Pig
- Understanding Pig Latin
Week 3
Week 4
- Zookeeper, Sqoop, Flume
- Debug MapReduce programs in Eclipse
- Real world Datasets and Analysis
- Planning a career in Big Data
Facebook Example
- Facebook users spend 10.5 billion minutes (almost 20,000 years) online on the social network.
- An average of 3.2 billion likes and comments are posted on Facebook every day.
Twitter Example
- Twitter has over 500 million registered users.
- The USA accounts for 141.8 million of them, or 27.4 percent of all Twitter users, well ahead of Brazil, Japan, the UK and Indonesia.
- 79% of US Twitter users are more likely to recommend brands they follow.
- 67% of US Twitter users are more likely to buy from brands they follow.
- 57% of all companies that use social media for business use Twitter.
Hadoop Users
http://wiki.apache.org/hadoop/PoweredBy
Projected size of the digital universe in 2015: 7.9 ZB
The world's information doubles every two years.
Over the next 10 years:
- The number of servers worldwide will grow by 10x
- The amount of information managed by enterprise data centers will grow by 50x
- The number of files enterprise data centers handle will grow by 75x
Source: http://www.emc.com/leadership/programs/digital-universe.htm, which was based on the 2011 IDC Digital Universe Study
Why DFS?
Task: read 1 TB of data
- 1 machine (4 I/O channels, each channel 100 MB/s): 45 minutes
- 10 machines (4 I/O channels each, each channel 100 MB/s): 4.5 minutes
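The arithmetic behind these figures can be sketched as below. This is a minimal illustration, assuming 1 TB = 1,000,000 MB, perfectly parallel reads, and no network or coordination overhead; the function and variable names are hypothetical. Under those assumptions the result comes out near 42 minutes and 4.2 minutes, which the slide rounds to 45 and 4.5.

```python
def read_time_minutes(data_mb, machines, channels_per_machine=4,
                      mb_per_sec_per_channel=100):
    """Idealized time to read data_mb megabytes, split evenly across all channels."""
    # Aggregate throughput: every channel on every machine reads in parallel.
    throughput_mb_per_sec = machines * channels_per_machine * mb_per_sec_per_channel
    seconds = data_mb / throughput_mb_per_sec
    return seconds / 60


ONE_TB_IN_MB = 1_000_000  # assumption: decimal terabyte

single = read_time_minutes(ONE_TB_IN_MB, machines=1)
cluster = read_time_minutes(ONE_TB_IN_MB, machines=10)

print(f"1 machine:   {single:.1f} minutes")
print(f"10 machines: {cluster:.1f} minutes")
print(f"speed-up:    {single / cluster:.0f}x")
```

The point of the comparison is the ratio rather than the exact minutes: spreading the data over 10 machines gives 10 times the aggregate I/O bandwidth, which is the motivation for a distributed file system.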
What is Hadoop?
Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model.
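The definition does not name the programming model; assuming it refers to Map-Reduce, which the course outline covers, the idea can be sketched in plain Python as a word count. This is only an illustration of the map, shuffle, and reduce steps on one machine, not Hadoop's API; all function names here are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # In a real cluster the map and reduce calls would run on different
    # machines; here they run sequentially in one process.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


counts = word_count(["big data big cluster", "data node"])
print(counts)
```

Because each map call looks at one line and each reduce call looks at one key, the work can be split across many machines without the functions needing to know about each other.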
Companies using Hadoop:
- Yahoo
- Google
- Facebook
- Amazon
- AOL
- IBM
- And many more at http://wiki.apache.org/hadoop/PoweredBy
Hadoop Eco-System