Advanced Big Data Science using Python-R-Hadoop-Spark (2/3)

Total Duration: 90 hours + Practice

Hadoop Core Components: HDFS
HDFS Overview & Data Storage in HDFS
Getting Data into Hadoop from the Local Machine and Vice Versa (Data Loading Techniques)

Hadoop Core Components: MapReduce (YARN)
MapReduce Overview (Traditional Way vs. MapReduce Way)
Concept of Mapper & Reducer
Understanding the MapReduce Program Skeleton
Running a MapReduce Job from the Command Line

Hadoop Data Analysis Tools: Hadoop-Pig
Introduction to Pig: MapReduce vs. Pig, Pig Use Cases
Pig Latin Program & Execution
Pig Latin: Relational Operators, File Loaders, GROUP Operator, COGROUP Operator, Joins and COGROUP, UNION, Diagnostic Operators, Pig UDFs
Use Pig to Automate the Design and Implementation of MapReduce Applications
Data Analysis Using Pig

Hadoop Data Analysis Tools: Hadoop-Hive
Introduction to Hive: Hive vs. Pig, Hive Use Cases
Discuss the Hive Data Storage Principle
Explain the File Formats and Record Formats Supported by the Hive Environment
Perform Operations with Data in Hive
HiveQL: Joining Tables, Dynamic Partitioning, Custom Map/Reduce Scripts
Hive Scripts, Hive UDFs

Hadoop Data Analysis Tools: Impala
Introduction to Impala & Architecture
How Impala Executes Queries and Its Importance
Hive vs. Pig vs. Impala
Extending Impala with User-Defined Functions
Improving Impala Performance

Hadoop Data Analysis Tools: HBase (NoSQL Database)
Introduction to NoSQL Databases, Types, and HBase
HBase vs. RDBMS, HBase Components, HBase Architecture
HBase Cluster Deployment

Hadoop: Introduction to Other Apache Projects
Introduction to ZooKeeper/Oozie/Sqoop/Flume

Spark: Introduction
Introduction to Apache Spark
Streaming Data vs. In-Memory Data
MapReduce vs. Spark
Modes of Spark
Spark Installation Demo
Overview of Spark on a Cluster
Spark Standalone Cluster

Spark: Spark in Practice
Invoking the Spark Shell
Creating the Spark Context
Loading a File in the Shell
Performing Some Basic Operations on Files in the Spark Shell
Building a Spark Project with sbt
Running a Spark Project with sbt
Caching Overview
Distributed Persistence
Spark Streaming Overview (Example: Streaming Word Count)

Spark: Spark Meets Hive
Analyze Hive and Spark SQL Architecture
Analyze Spark SQL
Context in Spark SQL
Implement a Sample Example for Spark SQL
Integrating Hive and Spark SQL
Support for JSON and Parquet File Formats
Implement Data Visualization in Spark
Loading of Data
Hive Queries through Spark
Performance Tuning Tips in Spark
Shared Variables: Broadcast Variables & Accumulators

Data Science Using Spark and Python
Hadoop-Python Integration
Spark-Python Integration (PySpark)

Spark-Python: Machine Learning and Predictive Modeling Basics
Introduction to Machine Learning & Predictive Modeling
Types of Business Problems: Mapping of Techniques
Major Classes of Learning Algorithms: Supervised vs. Unsupervised Learning
Different Phases of Predictive Modeling (Data Pre-processing, Sampling, Model Building, Validation)
Overfitting (Bias-Variance Trade-off) & Performance Metrics
Types of Validation (Bootstrapping, K-Fold Validation, etc.)
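The MapReduce topics in this syllabus (the concept of mapper and reducer, and the program skeleton) can be illustrated without a cluster. What follows is a minimal pure-Python sketch of the map, shuffle/sort and reduce phases for a word count. The function names `mapper`, `shuffle`, `reducer` and `word_count` are hypothetical and are not part of any Hadoop API. On a real cluster, Hadoop performs the shuffle/sort step itself between the map tasks and the reduce tasks.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (word, 1) for every whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort phase: group all values emitted for the same key, sorted by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases the way a MapReduce job does."""
    mapped = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(mapped)
    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(sample))
```

For the three sample lines, the sketch counts "the" three times and "fox" twice, because the mapper lower-cases every word before emitting it.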
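The "Types of Validation" topic (bootstrapping, K-fold validation) can be made concrete with a small K-fold cross-validation sketch in plain Python. This is an illustrative stand-in, not scikit-learn's `KFold` or any Spark MLlib API. The helper names `k_fold_indices` and `cross_val_mse` are hypothetical. The "model" is an assumed placeholder baseline that predicts the mean of the training targets, so the example focuses on how the folds are formed.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def k_fold_indices(
    n_samples: int, k: int, shuffle: bool = True, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    if shuffle:
        # Seeded shuffle so the split is reproducible.
        random.Random(seed).shuffle(indices)
    # The first (n_samples % k) folds get one extra sample,
    # so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def cross_val_mse(y: Sequence[float], k: int, seed: int = 0) -> float:
    """Average validation MSE of a baseline that predicts the training mean."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(y), k, seed=seed):
        prediction = sum(y[i] for i in train_idx) / len(train_idx)
        mse = sum((y[i] - prediction) ** 2 for i in val_idx) / len(val_idx)
        scores.append(mse)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    y = [3.0, 5.0, 4.0, 6.0, 5.0, 7.0, 6.0, 8.0, 7.0, 9.0]
    for fold, (train, val) in enumerate(k_fold_indices(len(y), k=3)):
        print(f"fold {fold}: train={len(train)} validation={len(val)}")
    print(f"mean 3-fold MSE of the mean baseline: {cross_val_mse(y, k=3):.3f}")
```

Every sample lands in exactly one validation fold. Averaging the per-fold scores therefore gives a less split-dependent estimate than a single hold-out set. Bootstrapping, by contrast, would resample the training set with replacement.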