‘72124, 15:00
What Is Big Data? Introduction and Application of Big Data
What Is Big Data? Introduction, Uses, and Applications
We produce an enormous amount of data every day, whether we know it or not. Every click on the internet, every bank transaction, every video we watch on YouTube, every email we send, and every like on our Instagram posts becomes data for tech companies. With so much data being collected, it makes sense for companies to use it to better understand their customers and their behavior. This is why the popularity of data science has multiplied in the last few years. Let's try to understand what big data is, along with its benefits and uses!
This article was published as a part of the Data Science Blogathon.
What Is Big Data?
Big Data is exactly what the name suggests: a "big" amount of data. Big Data means a dataset that is large in volume and more complex. Because of this volume and complexity, traditional data processing software cannot handle it. Big Data simply means datasets containing a large amount of diverse data, both structured and unstructured.
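To make the distinction concrete, here is a minimal Python sketch of how similar customer information might look as structured data (fixed columns), semi-structured data (JSON whose fields can vary), and unstructured data (free text). The field names and sample values are hypothetical and chosen only for illustration.

```python
import csv
import io
import json

# Hypothetical sample data, invented for illustration only.

# Structured: fixed columns, every row has the same fields.
structured_csv = "customer_id,amount,currency\n101,25.50,USD\n102,9.99,USD\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: self-describing JSON whose fields can vary per record.
semi_structured = json.loads('{"customer_id": 101, "tags": ["video", "like"]}')

# Unstructured: free text with no predefined schema.
unstructured = "Loved the product, but delivery took too long."

# Structured data can be queried by column name directly.
total_spent = sum(float(row["amount"]) for row in rows)

# Unstructured data needs extra processing (here, naive whitespace
# tokenizing) before anything can be computed from it.
word_count = len(unstructured.split())

print(f"total spent: {total_spent:.2f}")
print(f"tags: {semi_structured['tags']}")
print(f"words in review: {word_count}")
```

The point of the sketch is that structured rows can be aggregated right away, while free text has to be parsed or tokenized first.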
https://www.analyticsvidhya.com/blog/2021/05/what-is-big-data-introduction-uses-and-applications/
... effectively using Big Data Analytics. Companies try to identify patterns and extract insights from this sea of data so that they can act on them and solve the problems at hand.
Although companies have been collecting huge amounts of data for decades, the concept of Big Data only gained popularity in the early to mid-2000s. Companies realized how much data was being collected every day and how important it was to use it effectively.
The 5 Vs of Big Data
1. Volume refers to the amount of data being collected. The data may be structured or unstructured.
2. Velocity refers to the rate at which data arrives.
3. Variety refers to the different kinds of data (data types, formats, etc.) that arrive for analysis.
In recent years, two additional Vs of data have also emerged: value and veracity.
4. Value refers to the usefulness of the collected data.
5. Veracity refers to the quality of the data coming from different sources.
[Figure: the Vs of Big Data, including variety, velocity, and value]
Time needed: 15 minutes
Big data involves collecting, processing, and analyzing large amounts of data from multiple sources to uncover patterns, relationships, and insights that can inform decision-making. The process involves several steps:
1. Data Collection
Big data is collected from various sources, such as social media, sensors, transactional systems, customer reviews, and other sources.
2. Data Storage
The collected data then needs to be stored so that it can be easily accessed and analyzed later. This often requires specialized storage technologies capable of handling large volumes of data.
3. Data Processing
Once the data is stored, it needs to be processed before it can be analyzed. This involves cleaning and organizing the data to remove any errors or inconsistencies and transforming it into a format suitable for analysis.
4. Data Analysis
After the data has been processed, it is time to analyze it using tools like statistical models and machine learning algorithms to identify patterns, relationships, and trends.
5. Data Visualization
The insights derived from data analysis are then presented in visual formats such as graphs, charts, and dashboards, making it easier for decision-makers to understand and act upon them.
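The five steps above can be sketched end to end in a few lines of Python. This is a toy, single-machine illustration using only the standard library, not how a production big data stack is built; the event records, field names, and function names are all hypothetical.

```python
import json
import statistics
from collections import defaultdict

# Hypothetical raw events, standing in for data collected from several sources.
RAW_EVENTS = [
    '{"source": "web", "user": "a", "amount": "12.5"}',
    '{"source": "app", "user": "b", "amount": "7.0"}',
    '{"source": "web", "user": "c", "amount": "bad-value"}',  # inconsistent record
    '{"source": "app", "user": "d", "amount": "20.5"}',
]

def collect():
    """Step 1: gather raw records (here, from an in-memory list)."""
    return list(RAW_EVENTS)

def store(raw_records):
    """Step 2: keep the records somewhere accessible (here, parsed dicts in a list)."""
    return [json.loads(r) for r in raw_records]

def process(records):
    """Step 3: clean the data by dropping records whose amount is not numeric."""
    cleaned = []
    for rec in records:
        try:
            cleaned.append({**rec, "amount": float(rec["amount"])})
        except ValueError:
            continue
    return cleaned

def analyze(records):
    """Step 4: compute the mean amount per source."""
    by_source = defaultdict(list)
    for rec in records:
        by_source[rec["source"]].append(rec["amount"])
    return {src: statistics.mean(vals) for src, vals in by_source.items()}

def visualize(summary):
    """Step 5: render a tiny text bar chart, one '#' per whole unit."""
    return [f"{src:<4}{'#' * int(avg)} {avg:.2f}" for src, avg in sorted(summary.items())]

summary = analyze(process(store(collect())))
chart = visualize(summary)
print("\n".join(chart))
```

Each function maps to one step, so any stage can be swapped for a real system (for example, a message queue for collection or a distributed store for storage) without changing the overall shape of the pipeline.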
Use Cases
... to solve problems, and have more data to test their hypotheses on.
Customer Experience
Customer experience is a major field that has been
revolutionized with the advent of Big Data. Companies are
collecting more data about their customers and their
preferences than ever. This data is being leveraged in a
positive way, by giving personalized recommendations
and offers to customers, who are more than happy to
allow companies to collect this data in return for the
personalized services. The recommendations you get on
Netflix, or Amazon/Flipkart are a gift of Big Data!
Machine Learning
Machine Learning is another field that has benefited greatly from the increasing popularity of Big Data. More data means larger datasets to train our ML models, and a model trained on more data generally performs better. Also, with the help of Machine Learning, we can now automate tasks that were previously done manually, all thanks to Big Data.
Demand Forecasting
... purchases. This helps companies build forecasting models that predict future demand so that they can scale production accordingly. It helps companies, especially those in manufacturing, reduce the cost of storing unsold inventory in warehouses.
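As a rough illustration of the forecasting idea, the sketch below predicts next month's demand as the average of the most recent months (a simple moving average). Real forecasting models are usually more sophisticated; the technique, the function name, and the sales figures here are assumptions made only for this example.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or len(history) < window:
        raise ValueError("need at least `window` observations")
    recent = history[-window:]
    return sum(recent) / window

# Hypothetical monthly unit sales, invented for illustration.
monthly_sales = [120, 135, 150, 160, 155, 165]

forecast = moving_average_forecast(monthly_sales, window=3)
print(f"forecast for next month: {forecast:.1f} units")
```

A manufacturer could compare such a forecast with current stock to decide how much to produce, which is the inventory-cost argument made above.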
Big data also has extensive use in applications such as
product development and fraud detection.
How to Store and Process Big Data?
The volume and velocity of Big Data can be huge, which makes it almost impossible to store in traditional data warehouses. Although some sensitive information can be stored on company premises, for most of the data, companies have to opt for cloud storage or Hadoop.
Cloud storage allows businesses to store their data on the internet with the help of a cloud service provider (like Amazon Web Services, Microsoft Azure, or Google Cloud Platform) that takes responsibility for managing and storing the data. The data can be accessed easily and quickly through an API.
Hadoop serves a similar purpose, giving you the ability to store and process large amounts of data at once. Hadoop is a free, open-source software framework. It allows users to process large datasets across clusters of computers.
1. Apache Hadoop is an open-source big data tool designed to store and process large amounts of data across multiple servers. Hadoop comprises a distributed file system (HDFS) and a MapReduce processing engine.
2. Apache Spark is a fast, general-purpose cluster computing system that supports in-memory processing to speed up iterative algorithms. Spark can be used for batch processing, real-time stream processing, machine learning, graph processing, and SQL queries.
3. Apache Cassandra is a distributed NoSQL database management system designed to handle large amounts of data across commodity servers with high availability and fault tolerance.
4. Apache Flink is an open-source streaming data processing framework that supports batch processing, real-time stream processing, and event-driven applications. Flink provides low-latency, high-throughput data processing with fault tolerance and scalability.
5. Apache Kafka is a distributed streaming platform that enables publishing and subscribing to streams of records in real time. Kafka is used for building real-time data pipelines and streaming applications.
6. Splunk is a software platform used for searching, monitoring, and analyzing machine-generated big data in real time. Splunk collects and indexes data from various sources and provides insights into operational and business intelligence.
7. Talend is an open-source data integration platform that enables organizations to extract, transform, and load (ETL) data from various sources into target systems. Talend supports big data technologies such as Hadoop, Spark, Hive, Pig, and HBase.
8. Tableau is a data visualization and business intelligence tool that allows users to analyze and share data using interactive dashboards, reports, and ...
... Google BigQuery.
9. Apache NiFi is a data flow management tool used for automating the movement of data between systems. NiFi supports big data technologies such as Hadoop, Spark, and Kafka and provides real-time data processing and analytics.
10. QlikView is a business intelligence and data visualization tool that enables users to analyze and share data using interactive dashboards, reports, and charts. QlikView supports big data platforms such as Hadoop, and provides real-time data processing and analytics.
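Item 1 above mentions Hadoop's MapReduce processing engine. The idea behind that model (map each input to key-value pairs, group the pairs by key, then reduce each group) can be sketched in plain Python. This is a single-process illustration of the pattern, not the Hadoop API; the function names and input documents are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the values for each key."""
    return {key: sum(values) for key, values in grouped.items()}

# Hypothetical input "splits"; on a real cluster each would live on a different node.
documents = ["big data is big", "data is everywhere"]

mapped = chain.from_iterable(map_phase(doc) for doc in documents)
word_counts = reduce_phase(shuffle_phase(mapped))
print(word_counts)
```

Because each document is mapped independently and each key is reduced independently, both phases can in principle run in parallel on different machines, which is what makes the model suitable for clusters.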
Big Data Best Practices
To effectively manage and utilize big data, organizations
should follow some best practices:
+ Define clear business objectives: Organizations should define clear business objectives while collecting and analyzing big data. This can help avoid wasting time and resources on irrelevant data.
+ Collect and store relevant data only: It is important to collect and store only the relevant data that is required for analysis. This can help reduce data storage costs and improve data processing efficiency.
+ Ensure data quality: It is critical to ensure data quality by removing errors, inconsistencies, and duplicates from the data before storage and processing.
+ Use appropriate tools and technologies: Organizations must use appropriate tools and technologies for collecting, storing, processing, and analyzing big data. This includes specialized software, hardware, and cloud-based technologies.
+ Establish data security and privacy policies: Big data often contains sensitive information, and therefore organizations must establish rigorous data security and privacy policies to protect this data from unauthorized access or misuse.
... to identify patterns and predict future trends in big data. Organizations must leverage these technologies to gain actionable insights from their data.
+ Focus on data visualization: Data visualization can simplify complex data into intuitive visual formats such as graphs or charts, making it easier for decision-makers to understand and act upon the insights derived from big data.
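The data quality practice in the list above (removing errors, inconsistencies, and duplicates before storage) can be illustrated with a short sketch. The validation rules, field names, and sample records are hypothetical; real pipelines would apply rules specific to their own data.

```python
def clean_records(records):
    """Drop invalid rows, normalize inconsistent formatting, and remove duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        email = (rec.get("email") or "").strip().lower()  # normalize formatting
        age = rec.get("age")
        # Error check: require a plausible email and a non-negative integer age.
        if "@" not in email or not isinstance(age, int) or age < 0:
            continue
        if email in seen:  # duplicate check, keyed on the normalized email
            continue
        seen.add(email)
        cleaned.append({"email": email, "age": age})
    return cleaned

# Hypothetical customer records, invented for illustration.
raw = [
    {"email": "Ana@example.com ", "age": 31},
    {"email": "ana@example.com", "age": 31},   # duplicate after normalization
    {"email": "not-an-email", "age": 40},      # invalid email
    {"email": "bo@example.com", "age": -5},    # invalid age
    {"email": "cy@example.com", "age": 27},
]

cleaned = clean_records(raw)
print(cleaned)
```

Normalizing before checking for duplicates matters: the first two records only match once case and whitespace differences are removed.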
Challenges
1. Data Growth
Managing datasets containing terabytes of information can be a big challenge for companies. As datasets grow in size, storing them becomes not only a challenge but also an expensive affair.
To overcome this, companies are now starting to pay attention to data compression and de-duplication. Data compression reduces the number of bits that the data needs, which reduces the space consumed. Data de-duplication is the process of making sure duplicate and unwanted data does not reside in our database.
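Both techniques can be tried out with Python's standard library. The sketch below removes exact duplicates by content hash and then compresses what remains with zlib; the payloads are hypothetical, and the savings on real data depend heavily on how repetitive it is.

```python
import hashlib
import zlib

def deduplicate(blobs):
    """Keep one copy of each distinct blob, keyed by its SHA-256 digest."""
    unique = {}
    for blob in blobs:
        digest = hashlib.sha256(blob).hexdigest()
        unique.setdefault(digest, blob)
    return list(unique.values())

# Hypothetical log payloads; repetitive data like this tends to compress well.
blobs = [
    b"status=ok;" * 200,
    b"status=ok;" * 200,     # exact duplicate of the first payload
    b"status=error;" * 200,
]

unique_blobs = deduplicate(blobs)
compressed = [zlib.compress(blob) for blob in unique_blobs]

raw_bytes = sum(len(b) for b in blobs)
stored_bytes = sum(len(c) for c in compressed)
print(f"raw: {raw_bytes} bytes, stored after de-dup and compression: {stored_bytes} bytes")

# zlib compression is lossless: decompressing returns the original bytes.
restored = [zlib.decompress(c) for c in compressed]
```

Hash-based de-duplication only catches byte-for-byte identical copies; near-duplicates (for example, the same record with different formatting) need normalization first.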
2. Data Security
Data security is often given low priority in the Big Data workflow, which can backfire. With such a large amount of data being collected, security challenges are bound to come up sooner or later.
Mining of sensitive information, fake data generation, and lack of cryptographic protection (encryption) are some of the challenges businesses face when trying to adopt Big Data techniques.
Companies need to understand the importance of data security and prioritize it. To help them, there are now professional Big Data consultants that help ...
3. Data Integration
Data comes in from many different sources (social media applications, emails, customer verification documents, survey forms, etc.). Combining and reconciling all of this data often becomes a major operational challenge for companies.
There are several Big Data solution vendors that offer ETL (Extract, Transform, Load) and data integration solutions to companies that are trying to overcome data integration problems. There are also several APIs that have already been built to tackle issues related to data integration.
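The ETL pattern mentioned above can be sketched with the standard library alone: extract rows from two differently shaped sources, transform them into one schema, and load them into a target table. The sources, field names, and table layout are hypothetical, and an in-memory SQLite database stands in for a real target system.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sources with different formats and field names.
CRM_CSV = "customer_id,full_name\n1,Ana Ruiz\n2,Bo Chen\n"
SURVEY_JSON = '[{"cust": 1, "score": 9}, {"cust": 2, "score": 6}]'

def extract():
    """Extract: read raw rows from each source."""
    crm_rows = list(csv.DictReader(io.StringIO(CRM_CSV)))
    survey_rows = json.loads(SURVEY_JSON)
    return crm_rows, survey_rows

def transform(crm_rows, survey_rows):
    """Transform: reconcile key names and types, then join on customer id."""
    scores = {int(row["cust"]): row["score"] for row in survey_rows}
    return [
        (int(row["customer_id"]), row["full_name"], scores.get(int(row["customer_id"])))
        for row in crm_rows
    ]

def load(records):
    """Load: write the unified records into a target table (in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, score INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", records)
    conn.commit()
    return conn

conn = load(transform(*extract()))
result = conn.execute("SELECT id, name, score FROM customers ORDER BY id").fetchall()
print(result)
```

Most of the reconciliation work happens in the transform step, where the two sources' different key names (`customer_id` and `cust`) and types are brought into one schema.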
Advantages and Disadvantages of Big Data
Advantages of Big Data
+ Improved decision-making: Big data can provide insights and patterns that help organizations make more informed decisions.
+ Increased efficiency: Big data analytics can help organizations identify inefficiencies in their operations and improve processes to reduce costs.
... campaigns that are relevant to individual customers, resulting in better customer engagement and loyalty.
+ New revenue streams: Big data can uncover new business opportunities, enabling organizations to create new products and services that meet market demand.
+ Competitive advantage: Organizations that can effectively leverage big data have a competitive advantage over those that cannot, as they can make faster, more informed decisions based on data-driven insights.
Disadvantages of Big Data
+ Privacy concerns: Collecting and storing large amounts of data can raise privacy concerns, particularly if the data includes sensitive personal information.
+ Risk of data breaches: Big data increases the risk of data breaches, leading to loss of confidential data and negative publicity for the organization.
+ Technical challenges: Managing and processing large volumes of data requires specialized technologies and skilled personnel, which can be expensive and time-consuming.
+ Difficulty in integrating data sources: Integrating data from multiple sources can be challenging, particularly if the data is unstructured or stored in different formats.
+ Complexity of analysis: Analyzing large datasets can be complex and time-consuming, requiring specialized skills and expertise.
Implementation Across Industries
Here are the top 10 industries that use big data to their advantage:
Industry | Use of big data
Healthcare | ... outcomes, identify trends and patterns, and develop personalized treatment
Retail | Track and analyze customer data to personalize marketing campaigns, improve inventory management, and enhance customer experience (CX)
Finance | Detect fraud, assess risks, and make informed investment decisions
Manufacturing | Optimize supply chain processes, reduce costs, and improve product quality through predictive maintenance
Transportation | Optimize routes, improve fleet management, and enhance safety by predicting accidents before they happen
Energy | Monitor and analyze energy usage patterns, optimize production, and reduce waste through predictive analytics
Telecommunications | Manage network traffic, improve service quality, and reduce downtime through predictive maintenance and outage prediction
Government and Public Sector | Address issues such as crime prevention, traffic management improvement, and natural disaster prediction
Advertising and Marketing | Understand consumer behavior, target specific audiences, and measure the effectiveness of campaigns
Education | Personalize learning experiences, monitor student progress, and improve teaching methods through adaptive learning
The Future of Big Data
The volume of data produced every day keeps increasing as digitalization grows. More and more businesses are starting to move from traditional data storage and analysis methods to cloud solutions. Companies are starting to realize the importance of data. All of this implies one thing: the future of Big Data looks promising! It will change the way businesses operate and decisions are made.
End Notes
In this article, we discussed what we mean by Big Data, structured and unstructured data, some ... cloud platforms and Hadoop. If you are interested in learning more about the uses of big data, sign up for our Blackbelt Plus program. Get your personalized career roadmap, master all the skills you lack with the help of a mentor, and solve complex projects with expert guidance. Enroll today!
Frequently Asked Questions
Q1. What is big data in simple words?
A. Big data refers to the large volume of structured and unstructured data generated by individuals, organizations, and machines.
Q2. What is an example of big data?
A. An example of big data would be analyzing the vast amounts of data collected from social media platforms such as Facebook or Twitter to identify customer sentiment toward a particular product or service.
Q3. What are the 3 types of big data?
A. The three types of big data are structured data, unstructured data, and semi-structured data.
Q4. What is big data used for?
A. Big data is used for various purposes, such as improving business operations, understanding customer behavior, predicting future trends, and developing new products or services, among others.
The media shown in this article are not owned by Analytics Vidhya and are used at the author's discretion.