
Various big data tools

 Tableau
 Keen IO
 Heap
 Google Analytics
 Crazy Egg
 Hadoop
Reign of big data

 The term big data was first used to refer to increasing data volumes in the
mid-1990s.
 In 2001, Doug Laney, then an analyst at consultancy Meta Group Inc.,
expanded the notion of big data to also include increases in the variety of
data being generated by organizations and the velocity at which that data
was being created and updated.
 Those three factors -- volume, velocity and variety -- became known as
the 3Vs of big data, a concept Gartner popularized after acquiring Meta Group
and hiring Laney in 2005.
 Separately, the Hadoop distributed processing framework was launched as
an Apache open source project in 2006, planting the seeds for a clustered
platform built on top of commodity hardware and geared to run big data
applications.
Contd.

 By 2011, big data analytics began to take a firm hold in organizations and the
public eye, along with Hadoop and various related big data technologies that
had sprung up around it.
 Initially, as the Hadoop ecosystem took shape and started to mature, big data
applications were primarily the province of large internet and e-commerce
companies, such as Yahoo, Google and Facebook, as well as
analytics and marketing services providers.
 In ensuing years, though, big data analytics has increasingly been embraced
by retailers, financial services firms, insurers, healthcare organizations,
manufacturers, energy companies and other mainstream enterprises.
Types of analytics

 Descriptive analytics: "what has happened" (data aggregation, summarisation,
data mining).
 Predictive analytics: "what might happen" (regression).
 Prescriptive analytics: "what should we do" (optimization,
recommendation).
Need for Data Analytics

 Data Analytics refers to qualitative and quantitative techniques and processes
used to enhance productivity and business gain.
 Data is extracted and categorised to identify and analyse behavioural data
and patterns; the techniques used vary according to a particular business's
need or requirement.
 Data Analytics is needed in Business to Consumer applications (B2C).
 Organisations collect data from customers, businesses, the economy and
practical experience.
 Once gathered, the data is processed and categorised as required, and then
analysed to study purchase patterns and similar behaviour.
 Data Science involves extracting trends, patterns and useful information
from a set of existing data, which is of no use unless it is analysed.
Contd.

 It is a kind of business intelligence that is now used for gaining profits and
making better use of resources.
 It can also help improve managerial operations and take organisations to the
next level.
You may be wondering why there is so much hype about big data.
The reasons companies are inclined to adopt big data are:

Reasons               Big Data benefits
Timely                Gain instant insights from diverse data sources
Better analytics      Improvement of business performance through real-time analytics
Vast amount of data   Big data technologies manage huge amounts of data
Insights              Can provide better insights with the help of unstructured and semi-structured data
Decision-making       Helps mitigate risk and make smart decisions through proper risk analysis
Tools

 YARN: responsible for allocating system resources to the various applications running
in a Hadoop cluster and scheduling tasks to be executed on different cluster nodes.
 MapReduce: a software framework that allows developers to write programs that process
massive amounts of unstructured data in parallel across a distributed cluster of processors or
stand-alone computers.
 Spark: an open-source parallel processing framework that enables users to run large-scale
data analytics applications across clustered systems.
 HBase: a column-oriented key/value data store built to run on top of the Hadoop Distributed
File System (HDFS).
 Hive: an open-source data warehouse system for querying and analyzing large datasets stored
in Hadoop files.
 Kafka: a distributed publish-subscribe messaging system designed to replace
traditional message brokers.
 Pig: an open-source technology that offers a high-level mechanism for the parallel
programming of MapReduce jobs to be executed on Hadoop clusters.
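As a rough illustration of the MapReduce idea listed above, here is a minimal pure-Python sketch of a word count. It does not use Hadoop or any real MapReduce API: the function names (map_phase, shuffle, reduce_phase, word_count) and the input documents are assumptions made for illustration, and everything runs in a single process rather than across a cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data tools", "big data analytics"])
print(counts)
```

In a real cluster the map and reduce steps would run in parallel on different nodes; the sketch only shows how the work is split into the two kinds of step.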
Applications

 Public sector services
 Healthcare
 Learning services
 Insurance services
 Industry and natural resources
 Transportation services
 Banking and fraud detection
Health care

 A large number of medical devices are now big-data oriented.
 Data is now used to such an extent that a doctor can prescribe medicine
without visiting a patient in a remote location, by reading the heartbeat and
temperature from a monitoring watch worn on the patient's wrist.
 Nanobots are miniature robots under development that are intended to boost
immunity in the human body by fighting bacteria and other harmful germs.
 They have their own sensors and are expected to be well suited to delivering
chemotherapy.
 Nanobots are biotech robots intended to carry oxygen, destroy germs and
repair tissue.
Public sector

 Big data has a wide range of uses in the government sector, including energy
exploration, fraud detection and financial market analysis.
Learning

 Applications such as Bubble Score allow teachers to deliver multiple-choice
assessments through mobile devices and score paper tests through the cameras
of mobile phones.
 Beyond reforming coursework and the grading process, data-driven classrooms
open up an understanding of what children learn, when they learn it and to
what extent.
 Sometimes a student submits a friend's homework as their own. The copier is
then praised, while the student who did the work is punished.
 In such situations, big data supports cross-checking assignments to find out
whose writing matches the writing in the assignment.
Insurance services

 Big data also enables better customer retention for insurance companies.
 Big data is used in the industry to provide customer insights for more
transparent and simpler products, by analysing and predicting customer
behaviour from data obtained from websites, including social media, and from
CCTV footage.
Industry and natural resources

 A great deal of data from the manufacturing industry goes untapped.
 Leaving this data unused prevents improvements in product quality, energy
efficiency, reliability and profit margins.
 In the natural resources industry, big data enables predictive modelling to
support decision making, by ingesting and integrating huge amounts of
geospatial data, graphical data, text and temporal data.
Transportation

 The public sector uses big data for traffic management, route planning,
intelligent transport systems and congestion management.
 The private sector uses big data for revenue management, technological
improvements, logistics and competitive advantage.
 Individuals use big data for route planning to save fuel and time, and for
travel arrangements such as sightseeing.
Contributions in finance & crime detection
 In the banking sector, big data helps uncover fraudulent activity, such as
the misuse of credit cards and debit cards.
 In business, big data helps in understanding customers' shopping patterns and
competitors' CRM tactics, so that companies can apply them to improve their
own sales.
Statistics

 Statistics is the science of collecting, organising, presenting, analysing and
interpreting data to help in making more effective decisions.
 Statistical analysis is used to manipulate, summarise and investigate data so
that information useful for decision making is obtained.
 In statistical analysis you will be able to get a variety of answers.
 It is basically used to understand complex real-world problems and to make
it simpler to reach useful decisions.
Contd…

 Its functions and algorithms can be used to analyse primary data, build
statistical models and predict outcomes.
 An analysis of any situation can be done in 2 ways: statistical analysis and
non-statistical analysis.
 Statistical analysis is the science of collecting, exploring and presenting
large amounts of data to identify patterns and trends (also called
quantitative analysis).
 Non-statistical analysis provides generic information and includes text,
sound, still images and moving images (also called qualitative analysis).
Contd….

 Although both kinds of analysis are useful, statistical analysis gives more
insight and a clearer picture, which makes it more widely applicable in business.
 There are 2 major categories of statistics:
 Descriptive statistics
 Inferential statistics
Descriptive Statistics

 It helps organize data and focuses on the main characteristics of the data.

[Diagram: Descriptive statistics -> Characteristics of data]
Contd…

 It provides a summary of the data, numerically or graphically.
 Numerical measures such as the average, mode, standard deviation and
correlation are used to describe the features of the data set.
 Suppose you want to study the height of students in a classroom.
 In descriptive statistics you would first record the height of every student
and then find the average, maximum and minimum.
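The classroom height example could be sketched in Python as follows. The heights are made-up values in centimetres and the variable names are illustrative only.

```python
from statistics import mean

# Hypothetical heights (cm) of every student in the classroom.
heights_cm = [150, 162, 158, 171, 165, 154]

# Descriptive statistics summarise the data that was actually collected.
summary = {
    "average": mean(heights_cm),
    "max": max(heights_cm),
    "min": min(heights_cm),
}
print(summary)
```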
Inferential statistics

 Inferential statistics generalises from a sample to the larger dataset and
applies probability theory to draw a conclusion.
 It allows you to infer population parameters based on sample statistics and to
model relationships within the data.
 Modelling allows you to develop mathematical equations which describe the
interrelationships between 2 or more variables.
 Taking the same example:
 In inferential statistics we would categorise heights as small, medium and
tall, and then take only a small sample of students to study the height of
all students.
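As a hedged sketch of inferring a population parameter from a sample, the code below estimates the average height of a simulated population from 30 randomly chosen students. The population, the sample size and the use of 1.96 (a normal approximation for a 95% interval) are all assumptions made for illustration; with a sample this small a t-based multiplier would give a slightly wider interval.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: heights (cm) of 1000 students.
population = [random.gauss(160, 8) for _ in range(1000)]

# Measure only a small random sample instead of everyone.
sample = random.sample(population, 30)

sample_mean = mean(sample)
standard_error = stdev(sample) / sqrt(len(sample))

# Approximate 95% confidence interval for the population mean.
low = sample_mean - 1.96 * standard_error
high = sample_mean + 1.96 * standard_error
print(f"estimated mean height: {sample_mean:.1f} cm ({low:.1f} to {high:.1f})")
```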
Statistical trends….

 Insurance
 Stock market
 Genetics
 Medical studies
 Shopping
 Weather forecasting
Related terms…

 Population
 Sample
 Variable
 Quantitative variable
 Qualitative variable
 Discrete variable
 Continuous variable
Contd..

 Population: the group from which data is to be collected.
 Sample: a subset of the population.
 Variable: a feature or characteristic of any member of a population that
differs in quality or quantity from another member (gender, language, region,
age, designation, etc.).
 Quantitative variable: a variable differing in quantity (the weight of a
person, the number of people in a car).
 Qualitative variable: a variable differing in quality (colour, the degree of
damage to a car in an accident).
Contd..

 Discrete variable: one in which no value can be assumed between two adjacent
values (the number of children in a family).
 Continuous variable: one in which any value can be assumed between two given
values (the time taken for a 100 metre run).
4 types of statistical measures used to describe data
 Measures of frequency
 Measures of central tendency
 Measures of spread
 Measures of position
Measures of frequency

 Frequency of data indicates the number of occurrences of any particular data
value in the given dataset.
 The measures of frequency are count and percentage.
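A minimal sketch of the two frequency measures, using a made-up list of grades as the dataset:

```python
from collections import Counter

# Hypothetical dataset: one grade per student.
grades = ["A", "B", "A", "C", "B", "A", "B", "A"]

# Count: number of occurrences of each value.
counts = Counter(grades)

# Percentage: each count as a share of all observations.
percentages = {value: 100 * n / len(grades) for value, n in counts.items()}

for value in sorted(counts):
    print(value, counts[value], f"{percentages[value]}%")
```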
Measures of central tendency

 It indicates whether the data values accumulate in the middle of the
distribution or toward the ends.
 The measures of central tendency are the mean, mode and median.
 Mean (also known as the average).
 Mode (the value which appears most frequently in the given dataset).
 Median (the middle value of the given dataset once it is sorted).
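The three measures can be computed with Python's standard statistics module. The data values below are invented for illustration.

```python
from statistics import mean, median, mode

data = [2, 3, 3, 5, 7]  # hypothetical observations

average = mean(data)      # sum of the values divided by how many there are
middle = median(data)     # middle value of the sorted data
most_common = mode(data)  # value that occurs most often

print(average, middle, most_common)
```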
Measures of spread

 Spread describes how similar or varied the set of observed values for a
particular variable is.
 The measures of spread are standard deviation (the square root of the
variance; it measures how far the data deviate from the mean),
 variance (the mean of the squared deviations of the measurements from the
mean),
 and quartiles (which show how spread out the given data is).
 The measures of spread are also called measures of dispersion.
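A short sketch of these measures using Python's statistics module. The data is made up, and it is treated as a whole population, so pvariance and pstdev are used; for a sample drawn from a larger population, variance and stdev would be the usual choice.

```python
from statistics import pstdev, pvariance, quantiles

# Hypothetical observations, treated as the whole population.
data = [2, 4, 4, 4, 5, 5, 7, 9]

variance = pvariance(data)         # mean of the squared deviations from the mean
std_dev = pstdev(data)             # square root of the variance
quartiles = quantiles(data, n=4)   # three cut points: Q1, median, Q3

print(variance, std_dev, quartiles)
```

Note that several conventions exist for computing quartiles, so other tools may report slightly different values for Q1 and Q3 on the same data.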
Measures of position

 Position identifies the exact location of a particular data value in the given
data set.
 The measures of position are percentiles, quartiles and standard scores.
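As an illustration of measures of position, the sketch below computes a percentile rank and a standard score (z-score) for individual values. The helper names percentile_rank and standard_score are hypothetical, the data is made up, and the percentile rank uses the "less than or equal to" convention, which is only one of several in use.

```python
from statistics import mean, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations


def percentile_rank(values, x):
    """Percentage of observations that are less than or equal to x."""
    return 100 * sum(v <= x for v in values) / len(values)


def standard_score(values, x):
    """Number of population standard deviations that x lies from the mean."""
    return (x - mean(values)) / pstdev(values)


rank_of_7 = percentile_rank(data, 7)
z_of_9 = standard_score(data, 9)
print(rank_of_7, z_of_9)
```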
Random variables

 A variable is a quantity which can change and take different values.
 A random variable is a variable whose value is determined by a random
experiment.
 E.g. drawing two cards from a deck of cards: what are the chances of getting
two aces?
 Discrete probability distribution: a table or a formula that lists the
probabilities for each outcome of the random variable, X.
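The two-aces question can be worked through as a small discrete probability table. In this sketch X is the number of aces among two cards drawn without replacement from a standard 52-card deck; the function name aces_distribution is an assumption for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from math import comb


def aces_distribution(draws=2, aces=4, deck=52):
    """Return P(X = k) for k aces among `draws` cards drawn without replacement."""
    total = comb(deck, draws)  # number of equally likely hands
    return {
        k: Fraction(comb(aces, k) * comb(deck - aces, draws - k), total)
        for k in range(draws + 1)
    }


distribution = aces_distribution()
for k, probability in distribution.items():
    print(f"P(X = {k}) = {probability}")
```

The probability of two aces comes out as 1/221, which matches the direct calculation (4/52) x (3/51).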
Example of statistical analysis

 What is the weight of a mouse?
 Is this a statistical question if I am doing this experiment with several mice?
 A statistical question should give us a variety of answers, and those answers
must form a varied distribution.
 Display the data in a dot plot. Identify any clusters, peaks or gaps.
 So first we collect the data and then we organise it.
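A dot plot of this kind can be sketched in plain text. The mouse weights below are invented values in grams, and dot_plot is a hypothetical helper that prints one row per whole-gram value so that clusters, peaks and gaps are visible.

```python
from collections import Counter


def dot_plot(values):
    """Return one text row per integer value in range, with one mark per observation."""
    counts = Counter(values)
    return [
        f"{v:>3} | {'*' * counts[v]}".rstrip()
        for v in range(min(values), max(values) + 1)
    ]


# Hypothetical mouse weights in grams.
weights = [18, 19, 19, 20, 20, 20, 21, 25]

rows = dot_plot(weights)
print("\n".join(rows))
```

With this data the marks cluster around 18 to 21 g, peak at 20 g, and leave a gap between 22 g and 24 g before the single mouse at 25 g.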
