Advance Big Data Science using Python-R-Hadoop-Spark (1/3)
Total Duration: 90 hours + Practice
Introduction to Data Science
- What is Data Science?
- Data Science vs. Analytics vs. Data Warehousing, OLAP, MIS
- Relevance in industry and need of the hour
- Types of problems and objectives in various industries
- How leading companies are harnessing the power of Data Science
- Different phases of a typical Analytics/Data Science project

Python: Introduction & Essentials
- Overview of Python - Starting Python
- Introduction to Python Editors & IDEs (Canopy, PyCharm, Jupyter, Rodeo, IPython etc.)
- Custom Environment Settings
- Concept of Packages/Libraries - Important packages (NumPy, SciPy, scikit-learn, Pandas, Matplotlib, etc.)
- Installing & loading Packages & Namespaces
- Data Types & Data objects/structures (Tuples, Lists, Dictionaries)
- List and Dictionary Comprehensions
- Variable & Value Labels - Date & Time Values
- Basic Operations - Mathematical - string - date
- Reading and writing data
- Simple plotting
- Control flow
- Debugging
- Code profiling

Python: Accessing/Importing and Exporting Data
- Importing Data from various sources (CSV, TXT, Excel, Access etc.)
- Database Input (Connecting to database)
- Viewing Data objects - subsetting, methods
- Exporting Data to various formats
- Reporting

Python: Data Manipulation & Cleansing
- Cleansing Data with Python
- Data Manipulation steps (Sorting, filtering, duplicates, merging, appending, subsetting, derived variables, sampling, Data type conversions, renaming, formatting etc.)
- Data manipulation tools (Operators, Functions, Packages, control structures, Loops, arrays etc.)
- Python Built-in Functions (Text, numeric, date, utility functions)
- Python User Defined Functions
- Stripping out extraneous information
- Normalizing data
- Formatting data
- Important Python Packages for data manipulation (Pandas, NumPy etc.)

Python: Data Analysis & Visualization
- Introduction to exploratory data analysis
- Descriptive statistics, Frequency Tables and summarization
- Univariate Analysis (Distribution of data & Graphical Analysis)
- Bivariate Analysis (Cross Tabs, Distributions & Relationships, Graphical Analysis)
- Creating Graphs - Bar/pie/line chart/histogram/boxplot/scatter/density etc.
- Important Packages for Exploratory Analysis (NumPy Arrays, Matplotlib, Pandas and scipy.stats etc.)

Python: Basic Statistics
- Basic Statistics - Measures of Central Tendency and Variance
- Building blocks - Probability Distributions - Normal distribution - Central Limit Theorem
- Inferential Statistics - Sampling - Concept of Hypothesis Testing
- Statistical Methods - Z/t-tests (One sample, independent, paired), ANOVA, Correlation and Chi-square

Python: Polyglot Programming
- Making Python talk to other languages and database systems
- How R and Python work together, and why it is essential to know both

Hadoop: Introduction to Hadoop & Ecosystem
- Introduction to Hadoop
- Hadoopable Problems - Uses of Big Data analytics in various industries like Telecom, E-commerce, Finance and Insurance etc.
- Problems with Traditional Large-Scale Systems & Existing Data analytics Architecture
- Key technology foundations required for Big Data
- Comparison of traditional data management systems with Big Data management systems
- Evaluate key framework requirements for Big Data analytics
- Apache projects in the Hadoop Ecosystem
- Hadoop Ecosystem & Hadoop 2.x core components
- Explain the relevance of real-time data
- Explain how to use Big Data and real-time data as a Business planning tool
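The "List and Dictionary Comprehensions" topic can be illustrated with a short sketch of the kind of exercise it covers. The records below are hypothetical sample data; the example shows a filtered list comprehension, a dictionary comprehension, and a nested comprehension that groups names by city.

```python
# Hypothetical records: (name, city, score)
records = [
    ("Asha", "Delhi", 82),
    ("Ravi", "Mumbai", 67),
    ("Meera", "Delhi", 91),
    ("John", "Pune", 58),
]

# List comprehension with a filter: names of everyone scoring 70 or more
passed = [name for name, _city, score in records if score >= 70]

# Dictionary comprehension: look up a score by name
score_by_name = {name: score for name, _city, score in records}

# Nested comprehension: group names by city (cities in sorted order)
by_city = {
    city: [name for name, c, _score in records if c == city]
    for city in sorted({c for _name, c, _score in records})
}

print(passed)         # ['Asha', 'Meera']
print(score_by_name)  # {'Asha': 82, 'Ravi': 67, 'Meera': 91, 'John': 58}
print(by_city)        # {'Delhi': ['Asha', 'Meera'], 'Mumbai': ['Ravi'], 'Pune': ['John']}
```

Comprehensions replace an explicit loop plus append/assignment with a single expression, which is usually shorter and easier to read for simple transformations.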
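For "Database Input (Connecting to database)", a minimal sketch using Python's built-in `sqlite3` module is shown below. An in-memory SQLite database stands in for a real database server (an assumption for illustration); the `sales` table and its values are hypothetical. The query uses a `?` placeholder rather than string concatenation.

```python
import sqlite3

# In-memory SQLite database standing in for a real database (assumption)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (region TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("South", 80.5), ("North", 99.5), ("East", 60.0)],
)
conn.commit()

# Parameterized query: total sales per region, counting only amounts above 70
rows = cur.execute(
    "SELECT region, SUM(amount) FROM sales "
    "WHERE amount > ? GROUP BY region ORDER BY region",
    (70,),
).fetchall()
conn.close()

print(rows)  # [('North', 219.5), ('South', 80.5)]
```

With a DB-API connection such as this one, `pandas.read_sql` can load a query result straight into a DataFrame instead of a list of tuples.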
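The cleansing topics "Stripping out extraneous information", "Normalizing data" and "Formatting data" can be sketched on a messy currency column. The raw strings and the `Rs` prefix are hypothetical, and min-max scaling is only one common reading of "normalizing"; the syllabus does not say which method is meant.

```python
import re

def parse_amount(text):
    """Return the numeric value in a messy amount string, or None if missing."""
    # Strip an optional "Rs"/"Rs." label and surrounding whitespace
    body = re.sub(r"^\s*Rs\.?\s*", "", text).strip()
    # Drop thousands separators
    body = body.replace(",", "")
    # Treat anything that is not a plain decimal number as missing
    if not re.fullmatch(r"\d+(\.\d+)?", body):
        return None
    return float(body)

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1] (min-max normalization)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

# Hypothetical raw column as it might arrive from a CSV export
raw = ["  Rs. 1,200 ", "Rs 850", " 2,050.50", "n/a", "Rs.  999 "]

amounts = [parse_amount(r) for r in raw]       # [1200.0, 850.0, 2050.5, None, 999.0]
valid = [a for a in amounts if a is not None]  # drop missing entries
scaled = min_max_scale(valid)                  # smallest -> 0.0, largest -> 1.0
formatted = [f"{a:,.2f}" for a in valid]       # ['1,200.00', '850.00', '2,050.50', '999.00']
print(formatted)
```

In pandas the same steps are usually written with vectorized string methods on a column, but the logic of stripping, validating, rescaling and formatting is the same.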
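Measures of central tendency and variance, and the one-sample Z-test listed under "Z/t-tests", can be sketched with Python's standard `statistics` module. The sample data, the hypothesized mean of 32 and the known population standard deviation of 4 are assumptions for illustration. The standard library has no Student t distribution, so t-tests would instead use `scipy.stats.ttest_1samp`, `ttest_ind` or `ttest_rel`.

```python
import math
from statistics import NormalDist, mean, median, mode, stdev, variance

# Hypothetical sample of delivery times (minutes)
data = [31, 35, 29, 40, 35, 33, 38, 35, 30, 34]

# Measures of central tendency
print("mean:", mean(data))
print("median:", median(data))
print("mode:", mode(data))

# Measures of spread (sample statistics, n - 1 denominator)
print("variance:", variance(data))
print("std dev:", stdev(data))

def one_sample_z_test(sample, mu0, sigma):
    """Two-sided one-sample z-test of H0: population mean == mu0,
    assuming the population standard deviation sigma is known."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Assumed: hypothesized mean of 32 minutes, known population sigma of 4 minutes
z, p = one_sample_z_test(data, mu0=32, sigma=4)
print(f"z = {z:.3f}, p = {p:.4f}")
print("reject H0 at 5%" if p < 0.05 else "fail to reject H0 at 5%")
```

The test statistic is $z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$, and the two-sided p-value is $2(1 - \Phi(|z|))$, where $\Phi$ is the standard normal CDF.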