BIG DATA
BY K.DAVID
DMCA-40
Introduction
The Big Data revolution promises to transform how we live, work, and think by
enabling process optimization, empowering insight discovery and improving decision
making.
The ability to extract value from Big Data depends on data analytics, which is
considered to be the core of the Big Data revolution.
Data analytics involves various approaches, technologies, and tools such as those from text
analytics, business intelligence, data visualization, and statistical analysis. Machine
learning (ML) is a fundamental component of data analytics and is expected to be one of the
main drivers of the Big Data revolution. The reason for this is its ability to learn from data
and provide data-driven insights, decisions, and predictions.
ML Challenges Originating From Big Data
Definition
Big Data is often described by its dimensions, which are referred to as
its V’s. Earlier definitions of Big Data focused on three V’s (volume,
velocity, and variety); however, a more commonly accepted definition
now relies upon the following four V’s: volume, velocity, variety, and
veracity.
The first and the most talked about characteristic of Big Data
is volume: it is the amount, size, and scale of the data.
Processing Performance:-
One of the main challenges encountered in computations with
Big Data comes from the simple principle that scale, or
volume, adds computational complexity.
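
As a rough illustration of how volume adds computational complexity, the sketch below counts the work done by a hypothetical algorithm that compares every pair of samples. The pairwise step is an assumption made for illustration and is not taken from the slides; the point is only that ten times more data can mean roughly a hundred times more work.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of distinct pairs among n samples: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Assumed dataset sizes, chosen only to show the growth rate.
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7,} samples -> {pairwise_comparisons(n):>13,} pairwise comparisons")
```

Any method whose cost grows faster than linearly with the number of samples shows the same pattern, which is why approaches that work on small datasets may become impractical at Big Data scale.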
Class Imbalance:-
As datasets grow larger, the assumption that the data are
uniformly distributed across all classes is often broken.
This leads to a challenge referred to as class imbalance.
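
A minimal sketch of why class imbalance matters, assuming a made-up dataset of 990 "normal" and 10 "fraud" labels (both the labels and the counts are hypothetical). A baseline that always predicts the majority class looks accurate while never detecting the minority class.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Ratio of the largest class count to the smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def majority_baseline_scores(labels, minority):
    """Accuracy and minority-class recall of always predicting the majority class."""
    counts = Counter(labels)
    majority = counts.most_common(1)[0][0]
    predictions = [majority] * len(labels)
    accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
    # Count minority samples that the baseline labels correctly.
    minority_hits = sum(p == y == minority for p, y in zip(predictions, labels))
    recall = minority_hits / counts[minority]
    return accuracy, recall


# Hypothetical, heavily imbalanced labels.
labels = ["normal"] * 990 + ["fraud"] * 10
accuracy, recall = majority_baseline_scores(labels, minority="fraud")
print(f"imbalance ratio: {imbalance_ratio(labels):.0f}:1")
print(f"accuracy: {accuracy:.2%}, minority recall: {recall:.2%}")
```

Accuracy alone hides the problem here, so per-class measures such as recall are usually more informative on imbalanced data.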
Curse of Dimensionality:-
It refers to difficulties encountered when working in
high-dimensional space. Here, dimensionality describes
the number of features or attributes present in the dataset.
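
One commonly cited symptom of the curse of dimensionality is that distances between points become hard to tell apart as the number of features grows. The sketch below illustrates this under stated assumptions: points are drawn uniformly at random from the unit hypercube, and `relative_contrast` is a hypothetical helper name, not something from the slides.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from a random query point."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    nearest, farthest = min(distances), max(distances)
    return (farthest - nearest) / nearest


for dim in (2, 10, 100, 1000):
    print(f"{dim:>5} features -> relative contrast {relative_contrast(dim):.3f}")
```

In low dimensions the nearest point is much closer than the farthest one; in high dimensions the contrast shrinks, which weakens methods that rely on distance or similarity.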
Feature Engineering:-
1) Deep Learning
Deep learning is an approach from the representation
learning family of machine learning. Representation learning
is also often referred to as feature learning.
This type of learning is used when the data size within the
target domain is insufficient or the learning task is different.