
Advanced Big Data Science using Python-R-Hadoop-Spark (1/3)

Total Duration: 90 hours + Practice


Introduction to Data Science
What is Data Science?
Data Science vs. Analytics vs. Data Warehousing, OLAP, MIS Reporting
Relevance in industry and need of the hour
Types of problems and objectives in various industries
How leading companies are harnessing the power of Data Science
Different phases of a typical Analytics/Data Science project

Python: Introduction & Essentials
Overview of Python - Starting Python
Introduction to Python Editors & IDEs (Canopy, PyCharm, Jupyter, Rodeo, IPython etc.)
Custom Environment Settings
Concept of Packages/Libraries - Important packages (NumPy, SciPy, scikit-learn, Pandas, Matplotlib, etc.)
Installing & loading Packages & Name Spaces
Data Types & Data objects/structures (Tuples, Lists, Dictionaries)
List and Dictionary Comprehensions
Variable & Value Labels - Date & Time Values
Basic Operations - Mathematical - string - date
Reading and writing data
Simple plotting
Control flow
Debugging
Code profiling

Python: Accessing/Importing and Exporting Data
Importing Data from various sources (CSV, TXT, Excel, Access etc.)
Database Input (Connecting to database)
Viewing Data objects - subsetting, methods
Exporting Data to various formats

Python: Data Manipulation & Cleansing
Cleansing Data with Python
Data Manipulation steps (Sorting, filtering, duplicates, merging, appending, subsetting, derived variables, sampling, Data type conversions, renaming, formatting etc.)
Data manipulation tools (Operators, Functions, Packages, control structures, Loops, arrays etc.)
Python Built-in Functions (Text, numeric, date, utility functions)
Python User Defined Functions
Stripping out extraneous information
Normalizing data
Formatting data
Important Python Packages for data manipulation (Pandas, NumPy etc.)

Python: Data Analysis & Visualization
Introduction to exploratory data analysis
Descriptive statistics, Frequency Tables and summarization
Univariate Analysis (Distribution of data & Graphical Analysis)
Bivariate Analysis (Cross Tabs, Distributions & Relationships, Graphical Analysis)
Creating Graphs (Bar/pie/line chart/histogram/boxplot/scatter/density etc.)
Important Packages for Exploratory Analysis (NumPy Arrays, Matplotlib, Pandas and scipy.stats etc.)

Python: Basic Statistics
Basic Statistics - Measures of Central Tendencies and Variance
Building blocks - Probability Distributions - Normal distribution - Central Limit Theorem
Inferential Statistics - Sampling - Concept of Hypothesis Testing
Statistical Methods - Z/t-tests (One sample, independent, paired), ANOVA, Correlation and Chi-square

Python: Polyglot Programming
Making Python talk to other languages and database systems
How R and Python play with each other, and why it's essential to know both

Hadoop: Introduction to Hadoop & Ecosystem
Introduction to Hadoop
Hadoopable Problems - Uses of Big Data analytics in various industries like Telecom, E-commerce, Finance and Insurance etc.
Problems with Traditional Large-Scale Systems & Existing Data analytics Architecture
Key technology foundations required for Big Data
Comparison of traditional data management systems with Big Data management systems
Evaluate key framework requirements for Big Data analytics
Apache projects in the Hadoop Ecosystem
Hadoop Ecosystem & Hadoop 2.x core components
Explain the relevance of real-time data
Explain how to use Big Data and real-time data as a Business planning tool
