Ebook2,038 pages22 hours

R: Unleash Machine Learning Techniques

Name: R: Unleash Machine Learning Techniques
Author: Brett Lantz
ISBN: 9781787128286

By Brett Lantz, Lesmeister Cory, Dipanjan Sarkar and Raghav Bali

Rating: 0 out of 5 stars

()

Read preview

About this ebook

Aimed for intermediate-to-advanced people (especially data scientist) who are already into the field of data science

Skip carousel

LanguageEnglish

PublisherPackt Publishing

Release dateOct 24, 2016

ISBN9781787128286

Author

Brett Lantz

"Brett Lantz has spent the past 10 years using innovative data methods to understand human behavior. A sociologist by training, he was first enchanted by machine learning while studying a large database of teenagers' social networking website profiles. Since then, he has worked on interdisciplinary studies of cellular telephone calls, medical billing data, and philanthropic activity, among others. When he's not spending time with family, following college sports, or being entertained by his dachshunds, he maintains dataspelunking.com, a website dedicated to sharing knowledge about the search for insight in data."

Related to R

Related ebooks

Skip carousel

Mastering Machine Learning with R
Ebook
Mastering Machine Learning with R
byLesmeister Cory
Rating: 0 out of 5 stars
0 ratings
Learning Predictive Analytics with R
Ebook
Learning Predictive Analytics with R
byMayor Eric
Rating: 0 out of 5 stars
0 ratings
Handbook of Statistical Analysis and Data Mining Applications
Ebook
Handbook of Statistical Analysis and Data Mining Applications
byRobert Nisbet
Rating: 4 out of 5 stars
4/5
Practical Data Analysis Cookbook
Ebook
Practical Data Analysis Cookbook
byTomasz Drabas
Rating: 0 out of 5 stars
0 ratings
R: Recipes for Analysis, Visualization and Machine Learning
Ebook
R: Recipes for Analysis, Visualization and Machine Learning
byAtmajitsinh Gohil
Rating: 0 out of 5 stars
0 ratings
Mastering Predictive Analytics with R
Ebook
Mastering Predictive Analytics with R
byRui Miguel Forte
Rating: 4 out of 5 stars
4/5
Data Mining Applications with R
Ebook
Data Mining Applications with R
byYanchang Zhao
Rating: 4 out of 5 stars
4/5
Building a Recommendation System with R
Ebook
Building a Recommendation System with R
byGorakala Suresh K.
Rating: 0 out of 5 stars
0 ratings
R Machine Learning Essentials
Ebook
R Machine Learning Essentials
byUsuelli Michele
Rating: 0 out of 5 stars
0 ratings
Mastering Scientific Computing with R
Ebook
Mastering Scientific Computing with R
byPaul Gerrard
Rating: 3 out of 5 stars
3/5
Mastering Python for Data Science
Ebook
Mastering Python for Data Science
bySamir Madhavan
Rating: 3 out of 5 stars
3/5
Learning Data Mining with Python
Ebook
Learning Data Mining with Python
byRobert Layton
Rating: 0 out of 5 stars
0 ratings
R Object-oriented Programming
Ebook
R Object-oriented Programming
byKelly Black
Rating: 3 out of 5 stars
3/5
Mastering Data Mining with Python – Find patterns hidden in your data
Ebook
Mastering Data Mining with Python – Find patterns hidden in your data
byMegan Squire
Rating: 0 out of 5 stars
0 ratings
Practical Data Analysis - Second Edition
Ebook
Practical Data Analysis - Second Edition
byHector Cuesta
Rating: 0 out of 5 stars
0 ratings
Practical Machine Learning Cookbook
Ebook
Practical Machine Learning Cookbook
byAtul Tripathi
Rating: 0 out of 5 stars
0 ratings
Simulating Data with SAS
Ebook
Simulating Data with SAS
byRick Wicklin
Rating: 0 out of 5 stars
0 ratings
Simulation for Data Science with R
Ebook
Simulation for Data Science with R
byMatthias Templ
Rating: 0 out of 5 stars
0 ratings
Scala for Machine Learning
Ebook
Scala for Machine Learning
byNicolas Patrick R.
Rating: 0 out of 5 stars
0 ratings
DATA MINING and MACHINE LEARNING: CLUSTER ANALYSIS and kNN CLASSIFIERS. Examples with MATLAB
Ebook
DATA MINING and MACHINE LEARNING: CLUSTER ANALYSIS and kNN CLASSIFIERS. Examples with MATLAB
byCésar Pérez López
Rating: 0 out of 5 stars
0 ratings
Preparing Data for Analysis with JMP
Ebook
Preparing Data for Analysis with JMP
byRobert Carver
Rating: 0 out of 5 stars
0 ratings
Learning R for Geospatial Analysis
Ebook
Learning R for Geospatial Analysis
byMichael Dorman
Rating: 0 out of 5 stars
0 ratings
Practical Data Analysis with JMP, Third Edition
Ebook
Practical Data Analysis with JMP, Third Edition
byRobert Carver
Rating: 0 out of 5 stars
0 ratings
A Python Data Analyst’s Toolkit: Learn Python and Python-based Libraries with Applications in Data Analysis and Statistics
Ebook
A Python Data Analyst’s Toolkit: Learn Python and Python-based Libraries with Applications in Data Analysis and Statistics
byGayathri Rajagopalan
Rating: 0 out of 5 stars
0 ratings
Learning R Programming
Ebook
Learning R Programming
byKun Ren
Rating: 5 out of 5 stars
5/5
Learning Predictive Analytics with Python
Ebook
Learning Predictive Analytics with Python
byKumar Ashish
Rating: 0 out of 5 stars
0 ratings
Introduction to Statistical and Machine Learning Methods for Data Science
Ebook
Introduction to Statistical and Machine Learning Methods for Data Science
byCarlos Andre Reis Pinheiro
Rating: 0 out of 5 stars
0 ratings
Data Science: Concepts and Practice
Ebook
Data Science: Concepts and Practice
byVijay Kotu
Rating: 3 out of 5 stars
3/5
IBM SPSS Modeler Cookbook
Ebook
IBM SPSS Modeler Cookbook
byMeta S. Brown
Rating: 0 out of 5 stars
0 ratings
Python: Real-World Data Science
Ebook
Python: Real-World Data Science
byRobert Layton
Rating: 0 out of 5 stars
0 ratings

Programming For You

Skip carousel

Python: For Beginners A Crash Course Guide To Learn Python in 1 Week
Ebook
Python: For Beginners A Crash Course Guide To Learn Python in 1 Week
byTimothy C. Needham
Rating: 4 out of 5 stars
4/5
Python Programming : How to Code Python Fast In Just 24 Hours With 7 Simple Steps
Ebook
Python Programming : How to Code Python Fast In Just 24 Hours With 7 Simple Steps
byJason Scotts
Rating: 4 out of 5 stars
4/5
HTML & CSS: Learn the Fundaments in 7 Days
Ebook
HTML & CSS: Learn the Fundaments in 7 Days
byMichael Knapp
Rating: 4 out of 5 stars
4/5
Learn to Code. Get a Job. The Ultimate Guide to Learning and Getting Hired as a Developer.
Ebook
Learn to Code. Get a Job. The Ultimate Guide to Learning and Getting Hired as a Developer.
byGwendolyn Faraday
Rating: 5 out of 5 stars
5/5
SQL: For Beginners: Your Guide To Easily Learn SQL Programming in 7 Days
Ebook
SQL: For Beginners: Your Guide To Easily Learn SQL Programming in 7 Days
byi Code Academy
Rating: 5 out of 5 stars
5/5
101 Amazing Nintendo NES Facts: Includes facts about the Famicom
Ebook
101 Amazing Nintendo NES Facts: Includes facts about the Famicom
byJimmy Russell
Rating: 4 out of 5 stars
4/5
Java for Beginners: A Crash Course to Learn Java Programming in 1 Week
Ebook
Java for Beginners: A Crash Course to Learn Java Programming in 1 Week
byBrady Ellison
Rating: 5 out of 5 stars
5/5
Excel Essentials: A Step-by-Step Guide with Pictures for Absolute Beginners to Master the Basics and Start Using Excel with Confidence
Ebook
Excel Essentials: A Step-by-Step Guide with Pictures for Absolute Beginners to Master the Basics and Start Using Excel with Confidence
byNigel Tillery
Rating: 0 out of 5 stars
0 ratings
The JavaScript Workshop: Learn to develop interactive web applications with clean and maintainable JavaScript code
Ebook
The JavaScript Workshop: Learn to develop interactive web applications with clean and maintainable JavaScript code
byJoseph Labrecque
Rating: 5 out of 5 stars
5/5
CODING FOR ABSOLUTE BEGINNERS: How to Keep Your Data Safe from Hackers by Mastering the Basic Functions of Python, Java, and C++ (2022 Guide for Newbies)
Ebook
CODING FOR ABSOLUTE BEGINNERS: How to Keep Your Data Safe from Hackers by Mastering the Basic Functions of Python, Java, and C++ (2022 Guide for Newbies)
byEric Vargas
Rating: 0 out of 5 stars
0 ratings
SQL QuickStart Guide: The Simplified Beginner's Guide to Managing, Analyzing, and Manipulating Data With SQL
Ebook
SQL QuickStart Guide: The Simplified Beginner's Guide to Managing, Analyzing, and Manipulating Data With SQL
byWalter Shields
Rating: 4 out of 5 stars
4/5
Python Programming For Beginners: Learn The Basics Of Python Programming (Python Crash Course, Programming for Dummies)
Ebook
Python Programming For Beginners: Learn The Basics Of Python Programming (Python Crash Course, Programming for Dummies)
byJames Tudor
Rating: 5 out of 5 stars
5/5
Python Programming for Beginners: A Comprehensive Crash Course With Practical Exercises to Quickly Learn Coding and Programming for Data Analysis and Machine Learning
Ebook
Python Programming for Beginners: A Comprehensive Crash Course With Practical Exercises to Quickly Learn Coding and Programming for Data Analysis and Machine Learning
byAnthony Adams
Rating: 4 out of 5 stars
4/5
The Advanced Roblox Coding Book: An Unofficial Guide, Updated Edition: Learn How to Script Games, Code Objects and Settings, and Create Your Own World!
Ebook
The Advanced Roblox Coding Book: An Unofficial Guide, Updated Edition: Learn How to Script Games, Code Objects and Settings, and Create Your Own World!
byHeath Haskins
Rating: 5 out of 5 stars
5/5
Pokemon Go: Guide + 20 Tips and Tricks You Must Read Hints, Tricks, Tips, Secrets, Android, iOS
Ebook
Pokemon Go: Guide + 20 Tips and Tricks You Must Read Hints, Tricks, Tips, Secrets, Android, iOS
byGame Guidez
Rating: 5 out of 5 stars
5/5
Data Science from Scratch: The #1 Data Science Guide for Everything A Data Scientist Needs to Know: Python, Linear Algebra, Statistics, Coding, Applications, Neural Networks, and Decision Trees
Ebook
Data Science from Scratch: The #1 Data Science Guide for Everything A Data Scientist Needs to Know: Python, Linear Algebra, Statistics, Coding, Applications, Neural Networks, and Decision Trees
bySteven Cooper
Rating: 4 out of 5 stars
4/5
Coding All-in-One For Dummies
Ebook
Coding All-in-One For Dummies
byNikhil Abraham
Rating: 4 out of 5 stars
4/5
Python Machine Learning By Example
Ebook
Python Machine Learning By Example
byYuxi (Hayden) Liu
Rating: 4 out of 5 stars
4/5
Learn SQL in 24 Hours
Ebook
Learn SQL in 24 Hours
byAlex Nordeen
Rating: 5 out of 5 stars
5/5
HTML & CSS QuickStart Guide: The Simplified Beginners Guide to Developing a Strong Coding Foundation, Building Responsive Websites, and Mastering the Fundamentals of Modern Web Design
Ebook
HTML & CSS QuickStart Guide: The Simplified Beginners Guide to Developing a Strong Coding Foundation, Building Responsive Websites, and Mastering the Fundamentals of Modern Web Design
byDavid DuRocher
Rating: 4 out of 5 stars
4/5
Microsoft Office 365 Bible: 10:1 Mastery | Excel in Your Profession, Enhance Time Management, and Foster Exceptional Collaboration [III EDITION]: Career Elevator
Ebook
Microsoft Office 365 Bible: 10:1 Mastery | Excel in Your Profession, Enhance Time Management, and Foster Exceptional Collaboration [III EDITION]: Career Elevator
byKevin Pitch
Rating: 5 out of 5 stars
5/5
Linux: Learn in 24 Hours
Ebook
Linux: Learn in 24 Hours
byAlex Nordeen
Rating: 5 out of 5 stars
5/5
PYTHON: Practical Python Programming For Beginners & Experts With Hands-on Project
Ebook
PYTHON: Practical Python Programming For Beginners & Experts With Hands-on Project
byMark Chan
Rating: 5 out of 5 stars
5/5
Python Machine Learning - Third Edition: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2, 3rd Edition
Ebook
Python Machine Learning - Third Edition: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2, 3rd Edition
bySebastian Raschka
Rating: 5 out of 5 stars
5/5
Excel : The Ultimate Comprehensive Step-By-Step Guide to the Basics of Excel Programming: 1
Ebook
Excel : The Ultimate Comprehensive Step-By-Step Guide to the Basics of Excel Programming: 1
byKevin Clark
Rating: 5 out of 5 stars
5/5
C# 7.0 All-in-One For Dummies
Ebook
C# 7.0 All-in-One For Dummies
byBill Sempf
Rating: 0 out of 5 stars
0 ratings
Grokking Algorithms: An illustrated guide for programmers and other curious people
Ebook
Grokking Algorithms: An illustrated guide for programmers and other curious people
byAditya Bhargava
Rating: 4 out of 5 stars
4/5
Python for Beginners. A Smarter Way to Learn Python in 5 Days and Remember it Longer. With Easy Step by Step Guidance and Hands on Examples. (Python Crash Course-Programming for Beginners)
Ebook
Python for Beginners. A Smarter Way to Learn Python in 5 Days and Remember it Longer. With Easy Step by Step Guidance and Hands on Examples. (Python Crash Course-Programming for Beginners)
byArthur T. Brooks
Rating: 0 out of 5 stars
0 ratings
SQL All-in-One For Dummies
Ebook
SQL All-in-One For Dummies
byAllen G. Taylor
Rating: 3 out of 5 stars
3/5
ReactJS by Example - Building Modern Web Applications with React
Ebook
ReactJS by Example - Building Modern Web Applications with React
byVipul A M
Rating: 4 out of 5 stars
4/5

Related podcast episodes

Skip carousel

#43 Election Forecasting and Polling
Podcast episode
#43 Election Forecasting and Polling
byDataFramed
0 ratings
0% found this document useful
MLOps Coffee Sessions #11: Analyzing “Continuous Delivery and Automation Pipelines in ML" // Part 3
Podcast episode
MLOps Coffee Sessions #11: Analyzing “Continuous Delivery and Automation Pipelines in ML" // Part 3
byMLOps.community
0 ratings
0% found this document useful
Defining Success: Metrics and KPIs - Adam Sroka
Podcast episode
Defining Success: Metrics and KPIs - Adam Sroka
byDataTalks.Club
0 ratings
0% found this document useful
051: Strategy evaluation techniques, flaws and solutions with Dave Walton: Today we’re covering a topic which can really be a concern for traders of all levels, from beginner to pro, and that is the topic of strategy evaluation. Have you ever found that real-life performance does not match expected results? Or perhaps you...
Podcast episode
051: Strategy evaluation techniques, flaws and solutions with Dave Walton: Today we’re covering a topic which can really be a concern for traders of all levels, from beginner to pro, and that is the topic of strategy evaluation. Have you ever found that real-life performance does not match expected results? Or perhaps you...
byBetter System Trader
0 ratings
0% found this document useful
Similarities and Differences between ML and Analytics - Rishabh Bhargava
Podcast episode
Similarities and Differences between ML and Analytics - Rishabh Bhargava
byDataTalks.Club
0 ratings
0% found this document useful
Humans in the Loop - Lina Weichbrodt
Podcast episode
Humans in the Loop - Lina Weichbrodt
byDataTalks.Club
0 ratings
0% found this document useful
MLOps Coffee Sessions #10 Analyzing the Article “Continuous Delivery and Automation Pipelines in Machine Learning" // Part 2
Podcast episode
MLOps Coffee Sessions #10 Analyzing the Article “Continuous Delivery and Automation Pipelines in Machine Learning" // Part 2
byMLOps.community
0 ratings
0% found this document useful
The Art & Science of Finding You Top Performers: The Art & Science of Finding You Top Performers Advanced Insights into Data Analysis and Optimization with Dr. Ellis Welcome to this episode of Seller Sessions, where we dive deep into the nuanced world of data analysis and optimisation with the...
Podcast episode
The Art & Science of Finding You Top Performers: The Art & Science of Finding You Top Performers Advanced Insights into Data Analysis and Optimization with Dr. Ellis Welcome to this episode of Seller Sessions, where we dive deep into the nuanced world of data analysis and optimisation with the...
bySeller Sessions Amazon FBA and Private Label
0 ratings
0% found this document useful
Conquering the Last Mile in Data - Caitlin Moorman
Podcast episode
Conquering the Last Mile in Data - Caitlin Moorman
byDataTalks.Club
0 ratings
0% found this document useful
Becoming a Data-led Professional - Arpit Choudhury
Podcast episode
Becoming a Data-led Professional - Arpit Choudhury
byDataTalks.Club
0 ratings
0% found this document useful
Data Observability - Barr Moses
Podcast episode
Data Observability - Barr Moses
byDataTalks.Club
0 ratings
0% found this document useful
Amazon PPC Automation with Micheal Hecker – SS012: Micheal has big data and a 5 step approach for Amazon PPC Automation to optimise and automate your ad spend. We cover keyword research and search term reports, how to calculate keyword relevance and product data optimisation along with ad group...
Podcast episode
Amazon PPC Automation with Micheal Hecker – SS012: Micheal has big data and a 5 step approach for Amazon PPC Automation to optimise and automate your ad spend. We cover keyword research and search term reports, how to calculate keyword relevance and product data optimisation along with ad group...
bySeller Sessions Amazon FBA and Private Label
0 ratings
0% found this document useful
15: Lifecycle: A Martech Saga part 4: Picking the right MQL model: You need a good MQL model so that marketing leads make it to sales and get followed up. There are a lot of ways to define MQLs and pass them over. It’s very common to have a lead scoring model, and it’s the best way to get to build a scalable, highly auto
Podcast episode
15: Lifecycle: A Martech Saga part 4: Picking the right MQL model: You need a good MQL model so that marketing leads make it to sales and get followed up. There are a lot of ways to define MQLs and pass them over. It’s very common to have a lead scoring model, and it’s the best way to get to build a scalable, highly auto
byHumans of Martech
0 ratings
0% found this document useful
Privacy-aware Data Pipelines with Skyflow’s Piper Keyes: A data analytics pipeline is important to modern businesses because it allows them to extract valuable insights from the large amounts of data they generate and collect on a daily basis. This leads to better decision making, improved efficiency, and ...
Podcast episode
Privacy-aware Data Pipelines with Skyflow’s Piper Keyes: A data analytics pipeline is important to modern businesses because it allows them to extract valuable insights from the large amounts of data they generate and collect on a daily basis. This leads to better decision making, improved efficiency, and ...
byPartially Redacted: Data Privacy, Security & Compliance
0 ratings
0% found this document useful
Privacy Threat Modeling with DoorDash’s Nandita Rao Narla: Privacy threat modeling is a structured approach to identifying and assessing potential privacy risks associated with a particular system, application, or process. It involves analyzing how personal data flows through a system, identifying potential ...
Podcast episode
Privacy Threat Modeling with DoorDash’s Nandita Rao Narla: Privacy threat modeling is a structured approach to identifying and assessing potential privacy risks associated with a particular system, application, or process. It involves analyzing how personal data flows through a system, identifying potential ...
byPartially Redacted: Data Privacy, Security & Compliance
0 ratings
0% found this document useful
Making Sense of Chaos with Doyne Farmer
Podcast episode
Making Sense of Chaos with Doyne Farmer
byThinkers & Ideas
0 ratings
0% found this document useful
Breaking Down Today’s Machine Learning Technology with Christina Pawlikowski: Melissa Perri is joined by Christina Pawlikowski, a teaching fellow at Harvard and co-founder of Causal, to help demystify machine learning and AI on this episode of Product Thinking.
Podcast episode
Breaking Down Today’s Machine Learning Technology with Christina Pawlikowski: Melissa Perri is joined by Christina Pawlikowski, a teaching fellow at Harvard and co-founder of Causal, to help demystify machine learning and AI on this episode of Product Thinking.
byProduct Thinking
0 ratings
0% found this document useful
Product Owners in Data Science - Anna Hannemann
Podcast episode
Product Owners in Data Science - Anna Hannemann
byDataTalks.Club
0 ratings
0% found this document useful
66: A guide to data models and dynamic dashboards for marketers
Podcast episode
66: A guide to data models and dynamic dashboards for marketers
byHumans of Martech
0 ratings
0% found this document useful
Better Done Than Perfect. Using Surveys for Customer Success with Moritz Dausinger: Today we have another episode of Better Done Than Perfect. Listen in as we talk with Moritz Dausinger, founder of Refiner. Moritz shares the story behind his survey tool, when and how to survey your users, and many other tips for making the most of the survey data.
Podcast episode
Better Done Than Perfect. Using Surveys for Customer Success with Moritz Dausinger: Today we have another episode of Better Done Than Perfect. Listen in as we talk with Moritz Dausinger, founder of Refiner. Moritz shares the story behind his survey tool, when and how to survey your users, and many other tips for making the most of the survey data.
byUI Breakfast: UI/UX Design and Product Strategy
0 ratings
0% found this document useful
Q4: Scott Sanderson – Portfolio Optimization: Risk Preferences In, Trades Out: CWT x Quantopian Pt. 4 – When one has a price model that they think will work well for forecasting returns, the next step is to actually trade it. This isn’t that simple for a variety of reasons. For one thing, you need to...
Podcast episode
Q4: Scott Sanderson – Portfolio Optimization: Risk Preferences In, Trades Out: CWT x Quantopian Pt. 4 – When one has a price model that they think will work well for forecasting returns, the next step is to actually trade it. This isn’t that simple for a variety of reasons. For one thing, you need to...
byChat With Traders
0 ratings
0% found this document useful
Keeping ourselves honest when we work with observational healthcare data: The abundance of data in healthcare, and the valu…
Podcast episode
Keeping ourselves honest when we work with observational healthcare data: The abundance of data in healthcare, and the valu…
byLinear Digressions
0 ratings
0% found this document useful
Building Data Science Practice - Andrey Shtylenko
Podcast episode
Building Data Science Practice - Andrey Shtylenko
byDataTalks.Club
0 ratings
0% found this document useful
Building Data Science Practice - Andrey Shtylenko
Podcast episode
Building Data Science Practice - Andrey Shtylenko
byDataTalks.Club
0 ratings
0% found this document useful
Justin Dux - Be The Match: Sign-up for The Technopath Way Weekly Newsletter here: technopath.ac-page.com/the-technopath-way-sign-up Be The Match How can I check if I’m still registered in the database if I signed up years ago? You can call this number to find out: 1 (800)...
Podcast episode
Justin Dux - Be The Match: Sign-up for The Technopath Way Weekly Newsletter here: technopath.ac-page.com/the-technopath-way-sign-up Be The Match How can I check if I’m still registered in the database if I signed up years ago? You can call this number to find out: 1 (800)...
byThe Technopath Way: Productivity through tech for nonprofits
0 ratings
0% found this document useful
511. ANSWER TRAPS ON STANDARDIZED TESTS: The difficulty of individual multiple-choice questions depends on far more than just the questions themselves, as any test taker who has struggled to differentiate between several tempting choices can attest. Amy and Mike invited educator Vinny Madera...
Podcast episode
511. ANSWER TRAPS ON STANDARDIZED TESTS: The difficulty of individual multiple-choice questions depends on far more than just the questions themselves, as any test taker who has struggled to differentiate between several tempting choices can attest. Amy and Mike invited educator Vinny Madera...
byTests and the Rest: College Admissions Industry Podcast
0 ratings
0% found this document useful
11: Which MCAT Materials are the Best for Me?: Session 11 Your questions, answered here on the OldPreMeds Podcast. Ryan and Rich again dive into the forums over at OldPreMeds.org where they pull a question and deliver the answers right on to you. Today, they discuss about the best resource...
Podcast episode
11: Which MCAT Materials are the Best for Me?: Session 11 Your questions, answered here on the OldPreMeds Podcast. Ryan and Rich again dive into the forums over at OldPreMeds.org where they pull a question and deliver the answers right on to you. Today, they discuss about the best resource...
byOldPreMeds Podcast
0 ratings
0% found this document useful
Jeremiah Lowin – Machine Learning in Investing – [Invest Like the Best, EP.105]: My guest this week is one of my best and oldest friends, Jeremiah Lowin. Jeremiah has had a fascinating career, starting with advanced work in statistics before moving into the risk management field in the hedge fund world. Through his career he has studi
Podcast episode
Jeremiah Lowin – Machine Learning in Investing – [Invest Like the Best, EP.105]: My guest this week is one of my best and oldest friends, Jeremiah Lowin. Jeremiah has had a fascinating career, starting with advanced work in statistics before moving into the risk management field in the hedge fund world. Through his career he has studi
byInvest Like the Best with Patrick O'Shaughnessy
0 ratings
0% found this document useful
Scaling Relationship on Learning Mathematical Reasoning with Large Language Models: Mathematical reasoning is a challenging task for large language models (LLMs), while the scaling relationship of it with respect to LLM capacity is under-explored. In this paper, we investigate how the pre-training loss, supervised data amount, and a...
Podcast episode
Scaling Relationship on Learning Mathematical Reasoning with Large Language Models: Mathematical reasoning is a challenging task for large language models (LLMs), while the scaling relationship of it with respect to LLM capacity is under-explored. In this paper, we investigate how the pre-training loss, supervised data amount, and a...
byPapers Read on AI
0 ratings
0% found this document useful
Ep. 65 - Data Modeling
Podcast episode
Ep. 65 - Data Modeling
byWhat's Your Baseline? Enterprise Architecture & Business Process Management Demystified
0 ratings
0% found this document useful

Skip carousel

01 Giving Data Collectors—and Donors—a Real-Time Rush
Fast Company
Article
01 Giving Data Collectors—and Donors—a Real-Time Rush
Mar 20, 2017
7 min read
How Mature Is Your Organisation With Regards To Digital And Web Analytics?
NZ Marketing
Article
How Mature Is Your Organisation With Regards To Digital And Web Analytics?
Jun 9, 2021
1 min read
The Infrastructure of an AI Factory
Techfastly
Article
The Infrastructure of an AI Factory
Mar 3, 2021
Data is a crucial element for machine learning algorithms. It can be considered as a fuel of AI factories. Collection of useful data and feeding it into frameworks and models is the foremost step. Data acts as a case or example that the algorithms re
1 min read
How And Where You Use Machine-learning
APC
Article
How And Where You Use Machine-learning
Oct 7, 2019
4 min read
Q&A
Rotman Management
Article
Q&A
May 1, 2023
Describe the capability that companies like Netflix, UPS, Amazon and Caesars Entertainment have in common. These are all leading firms in their industries with respect to leveraging analytics as a source of competitive advantage. We now have so much
7 min read
How To Make Sense From And With AI ?
The European Business Review
Article
How To Make Sense From And With AI ?
Sep 25, 2021
4 min read
Machine Learning – With Zero Programming
APC
Article
Machine Learning – With Zero Programming
Aug 12, 2019
6 min read
How Spooky Science Helps Us Peer Inside The Planets
All About Space
Article
How Spooky Science Helps Us Peer Inside The Planets
Dec 3, 2020
An assistant professor of computational science at the EPFL research centre in Lausanne, Switzerland, involved in the current research on metallic hydrogen. Could you explain how the machine-learning techniques used in your research work? Why were th
1 min read
Leadership Forum: Making Digital Transformation A Reality
Rotman Management
Article
Leadership Forum: Making Digital Transformation A Reality
Jan 1, 2018
Glenda Crisp Senior Vice President and Chief Data Officer, TD Bank Group + Connie Bonello Associate Partner, Financial Services, IBM Canada IN MOST OF TODAY’S ORGANIZATIONS, data underpins every transaction, operation and interaction. And yet, the ab
8 min read
Machine Learning And Investing: The Cautious Seldom Err Or Write Great Poetry
Finweek - English
Article
Machine Learning And Investing: The Cautious Seldom Err Or Write Great Poetry
Oct 18, 2019
5 min read
Better Together: Behavioural Science + Data Science
Rotman Management
Article
Better Together: Behavioural Science + Data Science
May 1, 2020
IMAGINE THIS SCENARIO: You are designing a new customer experience to drive a shift in customer behaviour. You have reviewed the reports and dashboards describing current behaviour. You have asked customers how they felt and incorporated their feedba
5 min read
Are You Making the Most of Your Data?
Rotman Management
Article
Are You Making the Most of Your Data?
Jan 1, 2023
8 min read
How To Train Computers Faster For ‘Extreme’ Datasets
Futurity
Article
How To Train Computers Faster For ‘Extreme’ Datasets
Dec 12, 2019
4 min read
The Deep Learning Revolution For Artificial Intelligence
Facility Management
Article
The Deep Learning Revolution For Artificial Intelligence
Mar 28, 2019
3 min read
Pivoting To First-party Data
NZ Marketing
Article
Pivoting To First-party Data
Jun 9, 2021
5 min read
A Method Of Foresight In The Field Of Strategy Innovation
The European Business Review
Article
A Method Of Foresight In The Field Of Strategy Innovation
Feb 25, 2021
This research consisted of two phases. Phase 1. Information gathering: systematic collection of relevant signals and trends, informed by academic knowledge and consulting company reports. Phase 2. Diagnosis is a three-step exercise: In-Depth Ana
1 min read
Generative AI: What Leaders Need To Know
Rotman Management
Article
Generative AI: What Leaders Need To Know
Jan 1, 2024
12 min read
Web App Security
Linux Format
Article
Web App Security
Jun 29, 2021
8 min read
There’s A New Career In Town
True Love
Article
There’s A New Career In Town
Oct 21, 2019
2 min read
CULTURE SHIFT – An Indispensable Shift To Building An AI-Powered Organisation
Techfastly
Article
CULTURE SHIFT – An Indispensable Shift To Building An AI-Powered Organisation
May 3, 2021
5 min read
Deconstructing Management Analytics
Rotman Management
Article
Deconstructing Management Analytics
Sep 1, 2022
7 min read
Essentially There Many Fundamental Questions
The European Business Review
Article
Essentially There Many Fundamental Questions
May 25, 2021
1 What we are trying to assess? The answer appears to be no: selectors are still interested in an individual’s ability, personality and motivation as well as their integrity and health. Whilst new concepts appear every so often (e.g agility, resilien
1 min read
Analytics Adolescent Or Mature Metrics Master?
NZ Marketing
Article
Analytics Adolescent Or Mature Metrics Master?
Jun 25, 2020
An analytics company that specialise in advanced web analytics, Absolute Analytics, operates on the premise that every business has the right to actionable data and measurable results. From audit stage and set-up to training and reporting, these Alph
4 min read
The Era of Human + Machine Innovation
Rotman Management
Article
The Era of Human + Machine Innovation
Jan 1, 2019
Interview by Karen Christensen In today's environment, organizations that don't keep up with customers' evolving needs are doomed. What is the best way to get a handle on these evolving needs? The first step in understanding your customers is to acce
5 min read
Manipulate Data Like A Pro With Pandas
Linux Format
Article
Manipulate Data Like A Pro With Pandas
Jul 27, 2021
7 min read
Federated Learning Uses The Data Right On Our Devices
Futurity
Article
Federated Learning Uses The Data Right On Our Devices
Jul 21, 2022
2 min read
Data Analytics: From Bias to Better Decisions
Rotman Management
Article
Data Analytics: From Bias to Better Decisions
Sep 1, 2018
7 min read
Getting The edge
The European Business Review
Article
Getting The edge
Feb 25, 2021
7 min read
Methodology
India Today
Article
Methodology
Aug 29, 2020
2 min read
WHAT EVERY MANAGER SHOULD KNOW ABOUT HUMAN-CENTERED AI: A Manager’s Introduction to Human-Centered Artificial Intelligence
The European Business Review
Article
WHAT EVERY MANAGER SHOULD KNOW ABOUT HUMAN-CENTERED AI: A Manager’s Introduction to Human-Centered Artificial Intelligence
Dec 3, 2019
9 min read

Related categories

Skip carousel

Reviews for R

Rating: 0 out of 5 stars

0 ratings

0 ratings0 reviews

Book preview

R - Brett Lantz

R: Unleash Machine Learning Techniques

Credits

Preface

What this learning path covers

What you need for this learning path

Who this learning path is for

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

I. Module 1

1. Getting Started with R and Machine Learning

Delving into the basics of R

Using R as a scientific calculator

Operating on vectors

Special values

Data structures in R

Vectors

Creating vectors

Indexing and naming vectors

Arrays and matrices

Creating arrays and matrices

Names and dimensions

Matrix operations

Lists

Creating and indexing lists

Combining and converting lists

Data frames

Creating data frames

Operating on data frames

Working with functions

Built-in functions

User-defined functions

Passing functions as arguments

Controlling code flow

Working with if, if-else, and ifelse

Working with switch

Loops

Advanced constructs

lapply and sapply

apply

tapply

mapply

Next steps with R

Getting help

Handling packages

Machine learning basics

Machine learning – what does it really mean?

Machine learning – how is it used in the world?

Types of machine learning algorithms

Supervised machine learning algorithms

Unsupervised machine learning algorithms

Popular machine learning packages in R

Summary

2. Let's Help Machines Learn

Understanding machine learning

Algorithms in machine learning

Perceptron

Families of algorithms

Supervised learning algorithms

Linear regression

K-Nearest Neighbors (KNN)

Collecting and exploring data

Normalizing data

Creating training and test data sets

Learning from data/training the model

Evaluating the model

Unsupervised learning algorithms

Apriori algorithm

K-Means

Summary

3. Predicting Customer Shopping Trends with Market Basket Analysis

Detecting and predicting trends

Market basket analysis

What does market basket analysis actually mean?

Core concepts and definitions

Techniques used for analysis

Making data driven decisions

Evaluating a product contingency matrix

Getting the data

Analyzing and visualizing the data

Global recommendations

Advanced contingency matrices

Frequent itemset generation

Getting started

Data retrieval and transformation

Building an itemset association matrix

Creating a frequent itemsets generation workflow

Detecting shopping trends

Association rule mining

Loading dependencies and data

Exploratory analysis

Detecting and predicting shopping trends

Visualizing association rules

Summary

4. Building a Product Recommendation System

Understanding recommendation systems

Issues with recommendation systems

Collaborative filters

Core concepts and definitions

The collaborative filtering algorithm

Predictions

Recommendations

Similarity

Building a recommender engine

Matrix factorization

Implementation

Result interpretation

Production ready recommender engines

Extract, transform, and analyze

Model preparation and prediction

Model evaluation

Summary

5. Credit Risk Detection and Prediction – Descriptive Analytics

Types of analytics

Our next challenge

What is credit risk?

Getting the data

Data preprocessing

Dealing with missing values

Datatype conversions

Data analysis and transformation

Building analysis utilities

Analyzing the dataset

Saving the transformed dataset

Next steps

Feature sets

Machine learning algorithms

Summary

6. Credit Risk Detection and Prediction – Predictive Analytics

Predictive analytics

How to predict credit risk

Important concepts in predictive modeling

Preparing the data

Building predictive models

Evaluating predictive models

Getting the data

Data preprocessing

Feature selection

Modeling using logistic regression

Modeling using support vector machines

Modeling using decision trees

Modeling using random forests

Modeling using neural networks

Model comparison and selection

Summary

7. Social Media Analysis – Analyzing Twitter Data

Social networks (Twitter)

Data mining @social networks

Mining social network data

Data and visualization

Word clouds

Treemaps

Pixel-oriented maps

Other visualizations

Getting started with Twitter APIs

Overview

Registering the application

Connect/authenticate

Extracting sample tweets

Twitter data mining

Frequent words and associations

Popular devices

Hierarchical clustering

Topic modeling

Challenges with social network data mining

References

Summary

8. Sentiment Analysis of Twitter Data

Understanding Sentiment Analysis

Key concepts of sentiment analysis

Subjectivity

Sentiment polarity

Opinion summarization

Feature extraction

Approaches

Applications

Challenges

Sentiment analysis upon Tweets

Polarity analysis

Classification-based algorithms

Labeled dataset

Support Vector Machines

Ensemble methods

Boosting

Cross-validation

Summary

II. Module 2

1. Introducing Machine Learning

The origins of machine learning

Uses and abuses of machine learning

Machine learning successes

The limits of machine learning

Machine learning ethics

How machines learn

Data storage

Abstraction

Generalization

Evaluation

Machine learning in practice

Types of input data

Types of machine learning algorithms

Matching input data to algorithms

Machine learning with R

Installing R packages

Loading and unloading R packages

Summary

2. Managing and Understanding Data

R data structures

Vectors

Factors

Lists

Data frames

Matrixes and arrays

Managing data with R

Saving, loading, and removing R data structures

Importing and saving data from CSV files

Exploring and understanding data

Exploring the structure of data

Exploring numeric variables

Measuring the central tendency – mean and median

Measuring spread – quartiles and the five-number summary

Visualizing numeric variables – boxplots

Visualizing numeric variables – histograms

Understanding numeric data – uniform and normal distributions

Measuring spread – variance and standard deviation

Exploring categorical variables

Measuring the central tendency – the mode

Exploring relationships between variables

Visualizing relationships – scatterplots

Examining relationships – two-way cross-tabulations

Summary

3. Lazy Learning – Classification Using Nearest Neighbors

Understanding nearest neighbor classification

The k-NN algorithm

Measuring similarity with distance

Choosing an appropriate k

Preparing data for use with k-NN

Why is the k-NN algorithm lazy?

Example – diagnosing breast cancer with the k-NN algorithm

Step 1 – collecting data

Step 2 – exploring and preparing the data

Transformation – normalizing numeric data

Data preparation – creating training and test datasets

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Transformation – z-score standardization

Testing alternative values of k

Summary

4. Probabilistic Learning – Classification Using Naive Bayes

Understanding Naive Bayes

Basic concepts of Bayesian methods

Understanding probability

Understanding joint probability

Computing conditional probability with Bayes' theorem

The Naive Bayes algorithm

Classification with Naive Bayes

The Laplace estimator

Using numeric features with Naive Bayes

Example – filtering mobile phone spam with the Naive Bayes algorithm

Step 1 – collecting data

Step 2 – exploring and preparing the data

Data preparation – cleaning and standardizing text data

Data preparation – splitting text documents into words

Data preparation – creating training and test datasets

Visualizing text data – word clouds

Data preparation – creating indicator features for frequent words

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Summary

5. Divide and Conquer – Classification Using Decision Trees and Rules

Understanding decision trees

Divide and conquer

The C5.0 decision tree algorithm

Choosing the best split

Pruning the decision tree

Example – identifying risky bank loans using C5.0 decision trees

Step 1 – collecting data

Step 2 – exploring and preparing the data

Data preparation – creating random training and test datasets

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Boosting the accuracy of decision trees

Making mistakes more costlier than others

Understanding classification rules

Separate and conquer

The 1R algorithm

The RIPPER algorithm

Rules from decision trees

What makes trees and rules greedy?

Example – identifying poisonous mushrooms with rule learners

Step 1 – collecting data

Step 2 – exploring and preparing the data

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Summary

6. Forecasting Numeric Data – Regression Methods

Understanding regression

Simple linear regression

Ordinary least squares estimation

Correlations

Multiple linear regression

Example – predicting medical expenses using linear regression

Step 1 – collecting data

Step 2 – exploring and preparing the data

Exploring relationships among features – the correlation matrix

Visualizing relationships among features – the scatterplot matrix

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Model specification – adding non-linear relationships

Transformation – converting a numeric variable to a binary indicator

Model specification – adding interaction effects

Putting it all together – an improved regression model

Understanding regression trees and model trees

Adding regression to trees

Example – estimating the quality of wines with regression trees and model trees

Step 1 – collecting data

Step 2 – exploring and preparing the data

Step 3 – training a model on the data

Visualizing decision trees

Step 4 – evaluating model performance

Measuring performance with the mean absolute error

Step 5 – improving model performance

Summary

7. Black Box Methods – Neural Networks and Support Vector Machines

Understanding neural networks

From biological to artificial neurons

Activation functions

Network topology

The number of layers

The direction of information travel

The number of nodes in each layer

Training neural networks with backpropagation

Example – Modeling the strength of concrete with ANNs

Step 1 – collecting data

Step 2 – exploring and preparing the data

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Understanding Support Vector Machines

Classification with hyperplanes

The case of linearly separable data

The case of nonlinearly separable data

Using kernels for non-linear spaces

Example – performing OCR with SVMs

Step 1 – collecting data

Step 2 – exploring and preparing the data

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Summary

8. Finding Patterns – Market Basket Analysis Using Association Rules

Understanding association rules

The Apriori algorithm for association rule learning

Measuring rule interest – support and confidence

Building a set of rules with the Apriori principle

Example – identifying frequently purchased groceries with association rules

Step 1 – collecting data

Step 2 – exploring and preparing the data

Data preparation – creating a sparse matrix for transaction data

Visualizing item support – item frequency plots

Visualizing the transaction data – plotting the sparse matrix

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Sorting the set of association rules

Taking subsets of association rules

Saving association rules to a file or data frame

Summary

9. Finding Groups of Data – Clustering with k-means

Understanding clustering

Clustering as a machine learning task

The k-means clustering algorithm

Using distance to assign and update clusters

Choosing the appropriate number of clusters

Example – finding teen market segments using k-means clustering

Step 1 – collecting data

Step 2 – exploring and preparing the data

Data preparation – dummy coding missing values

Data preparation – imputing the missing values

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Summary

10. Evaluating Model Performance

Measuring performance for classification

Working with classification prediction data in R

A closer look at confusion matrices

Using confusion matrices to measure performance

Beyond accuracy – other measures of performance

The kappa statistic

Sensitivity and specificity

Precision and recall

The F-measure

Visualizing performance trade-offs

ROC curves

Estimating future performance

The holdout method

Cross-validation

Bootstrap sampling

Summary

11. Improving Model Performance

Tuning stock models for better performance

Using caret for automated parameter tuning

Creating a simple tuned model

Customizing the tuning process

Improving model performance with meta-learning

Understanding ensembles

Bagging

Boosting

Random forests

Training random forests

Evaluating random forest performance

Summary

12. Specialized Machine Learning Topics

Working with proprietary files and databases

Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files

Querying data in SQL databases

Working with online data and services

Downloading the complete text of web pages

Scraping data from web pages

Parsing XML documents

Parsing JSON from web APIs

Working with domain-specific data

Analyzing bioinformatics data

Analyzing and visualizing network data

Improving the performance of R

Managing very large datasets

Generalizing tabular data structures with dplyr

Making data frames faster with data.table

Creating disk-based data frames with ff

Using massive matrices with bigmemory

Learning faster with parallel computing

Measuring execution time

Working in parallel with multicore and snow

Taking advantage of parallel with foreach and doParallel

Parallel cloud computing with MapReduce and Hadoop

GPU computing

Deploying optimized learning algorithms

Building bigger regression models with biglm

Growing bigger and faster random forests with bigrf

Training and evaluating models in parallel with caret

Summary

III. Module 3

1. A Process for Success

The process

Business understanding

Identify the business objective

Assess the situation

Determine the analytical goals

Produce a project plan

Data understanding

Data preparation

Modeling

Evaluation

Deployment

Algorithm flowchart

Summary

2. Linear Regression – The Blocking and Tackling of Machine Learning

Univariate linear regression

Business understanding

Multivariate linear regression

Business understanding

Data understanding and preparation

Modeling and evaluation

Other linear model considerations

Qualitative feature

Interaction term

Summary

3. Logistic Regression and Discriminant Analysis

Classification methods and linear regression

Logistic regression

Business understanding

Data understanding and preparation

Modeling and evaluation

The logistic regression model

Logistic regression with cross-validation

Discriminant analysis overview

Discriminant analysis application

Model selection

Summary

4. Advanced Feature Selection in Linear Models

Regularization in a nutshell

Ridge regression

LASSO

Elastic net

Business case

Business understanding

Data understanding and preparation

Modeling and evaluation

Best subsets

Ridge regression

LASSO

Elastic net

Cross-validation with glmnet

Model selection

Summary

5. More Classification Techniques – K-Nearest Neighbors and Support Vector Machines

K-Nearest Neighbors

Support Vector Machines

Business case

Business understanding

Data understanding and preparation

Modeling and evaluation

KNN modeling

SVM modeling

Model selection

Feature selection for SVMs

Summary

6. Classification and Regression Trees

Introduction

An overview of the techniques

Regression trees

Classification trees

Random forest

Gradient boosting

Business case

Modeling and evaluation

Regression tree

Classification tree

Random forest regression

Random forest classification

Gradient boosting regression

Gradient boosting classification

Model selection

Summary

7. Neural Networks

Neural network

Deep learning, a not-so-deep overview

Business understanding

Data understanding and preparation

Modeling and evaluation

An example of deep learning

H2O background

Data preparation and uploading it to H2O

Create train and test datasets

Modeling

Summary

8. Cluster Analysis

Hierarchical clustering

Distance calculations

K-means clustering

Gower and partitioning around medoids

Gower

PAM

Business understanding

Data understanding and preparation

Modeling and evaluation

Hierarchical clustering

K-means clustering

Clustering with mixed data

Summary

9. Principal Components Analysis

An overview of the principal components

Rotation

Business understanding

Data understanding and preparation

Modeling and evaluation

Component extraction

Orthogonal rotation and interpretation

Creating factor scores from the components

Regression analysis

Summary

10. Market Basket Analysis and Recommendation Engines

An overview of a market basket analysis

Business understanding

Data understanding and preparation

Modeling and evaluation

An overview of a recommendation engine

User-based collaborative filtering

Item-based collaborative filtering

Singular value decomposition and principal components analysis

Business understanding and recommendations

Data understanding, preparation, and recommendations

Modeling, evaluation, and recommendations

Summary

11. Time Series and Causality

Univariate time series analysis

Bivariate regression

Granger causality

Business understanding

Data understanding and preparation

Modeling and evaluation

Univariate time series forecasting

Time series regression

Examining the causality

Summary

12. Text Mining

Text mining framework and methods

Topic models

Other quantitative analyses

Business understanding

Data understanding and preparation

Modeling and evaluation

Word frequency and topic models

Additional quantitative analysis

Summary

A. R Fundamentals

Introduction

Getting R up and running

Using R

Data frames and matrices

Summary stats

Installing and loading the R packages

Summary

A. Bibliography

Index

R: Unleash Machine Learning Techniques

Find out how to build smarter machine learning systems with R. Follow this three module course to become a more fluent machine learning practitioner

A course in three modules

BIRMINGHAM - MUMBAI

R: Unleash Machine Learning Techniques

All rights reserved. No part of this course may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this course to ensure the accuracy of the information presented. However, the information contained in this course is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this course.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this course by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Published on: September 2016

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78712-734-0

www.packtpub.com

Credits

Authors

Raghav Bali

Dipanjan Sarkar

Brett Lantz

Cory Lesmeister

Reviewers

Alexey Grigorev

Vijayakumar Nattamai Jawaharlal

Kent S. Johnson

Mzabalazo Z. Ngwenya

Anuj Saxena

Vikram Dhillon

Miro Kopecky

Pavan Narayanan

Doug Ortiz

Shivani Rao, PhD

Content Development Editor

Parshva Sheth

Graphics

Abhinash Sahu

Production Coordinator

Melwyn Dsa

Preface

Machine learning is a very broad topic. The following quote sums it up nicely: The first problem facing you is the bewildering variety of learning algorithms available. Which one to use? There are literally thousands available, and hundreds more are published each year. (Domingo, P., 2012.) It would therefore be irresponsible to try and cover everything in the chapters that follow because, to paraphrase Frederick the Great, we would achieve nothing. With this constraint in mind, we hope to provide a solid foundation of algorithms and business considerations that will allow the reader to walk away and, first of all, take on any machine learning tasks with complete confidence, and secondly, be able to help themselves in figuring out other algorithms and topics. Essentially, if this course significantly helps you to help yourself, then I would consider this a victory. Don't think of this course as a destination but rather, as a path to self-discovery.

What this learning path covers

Module 1, R Machine Learning By Example, Data science and machine learning are some of the top buzzwords in the technical world today. From retail stores to Fortune 500 companies, everyone is working hard to make machine learning give them data-driven insights to grow their businesses. With powerful data manipulation features, machine learning packages, and an active developer community, R empowers users to build sophisticated machine learning systems to solve real-world data problems. This module takes you on a data-driven journey that starts with the very basics of R and machine learning and gradually builds upon the concepts to work on projects that tackle real-world problems.

Module 2, Machine Learning with R, Machine learning, at its core, is concerned with the algorithms that transform information into actionable intelligence. This fact makes machine learning well-suited to the present-day era of big data. Without machine learning, it would be nearly impossible to keep up with the massive stream of information. Given the growing prominence of R—a cross-platform, zero-cost statistical programming environment—there has never been a better time to start using machine learning. R offers a powerful but easy-to-learn set of tools that canassist you with finding data insights. By combining hands-on case studies with the essential theory that you need to understand how things work under the hood, this book provides all the knowledge that you will need to start applying machine learning to your own projects.

Module 3 Mastering Machine Learning with R, The world of R can be as bewildering as the world of machine learning! There is seemingly an endless number of R packages with a plethora of blogs, websites, discussions, and papers of various quality and complexity from the community that supports R. This is a great reservoir of information and probably R's greatest strength, but I've always believed that an entity's greatest strength can also be its greatest weakness. R's vast community of knowledge can quickly overwhelm and/or sidetrack you and your efforts. Show me a problem and give me ten different R programmers and I'll show you ten different ways the code is written to solve the problem. As I've written each chapter, I've endeavored to capture the critical elements that can assist you in using R to understand, prepare, and model the data. I am no R programming expert by any stretch of the imagination, but again, I like to think that I can provide a solid foundation herein. Another thing that lit a fire under me to write this book was an incident that happened in the hallways of a former employer a couple of years ago. My team had an IT contractor to support the management of our databases. As we were walking and chatting about big data and the like, he mentioned that he had bought a book about machine learning with R and another about machine learning with Python. He stated that he could do all the programming, but all of the statistics made absolutely no sense to him. I have always kept this conversation at the back of my mind throughout the writing process. It has been a very challenging task to balance the technical and theoretical with the practical. One could, and probably someone has, turned the theory of each chapter to its own book. I used a heuristic of sorts to aid me in deciding whether a formula or technical aspect was in the scope, which was would this help me or the readers in the discussions with team members and business leaders? If I felt it might help, I would strive to provide the necessary details. I also made a conscious effort to keep the datasets used in the practical exercises large enough to be interesting but small enough to allow you to gain insight without becoming overwhelmed.

This book is not about big data, but make no mistake about it, the methods and concepts that we will discuss can be scaled to big data. In short, this module will appeal to a broad group of individuals, from IT experts seeking to understand and interpret machine learning algorithms to statistical gurus desiring to incorporate the power of R into their analysis. However, even those that are well-versed in both IT and statistics—experts if you will—should be able to pick up quite a few tips and tricks to assist them in their efforts.

What you need for this learning path

This software applies to all the chapters of the book:

Windows / Mac OS X / Linux

R 3.2.0 (or higher)

RStudio Desktop 0.99 (or higher)

For hardware, there are no specific requirements, since R can run on any PC that has Mac, Linux, or Windows, but a physical memory of minimum 4 GB is preferred to run some of the iterative algorithms smoothly.

Who this learning path is for

Aimed for intermediate-to-advanced people (especially data scientist) who are already into the field of data science

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail <feedback@packtpub.com>, and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

Log in or register to our website using your e-mail address and password.

Hover the mouse pointer on the SUPPORT tab at the top.

Click on Code Downloads & Errata.

Enter the name of the book in the Search box.

Select the book for which you're looking to download the code files.

Choose from the drop-down menu where you purchased this book from.

Click on Code Download.

You can also download the code files by clicking on the Code Files button on the book's webpage at the Packt Publishing website. This page can be accessed by entering the book's name in the Search box. Please note that you need to be logged in to your Packt account.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR / 7-Zip for Windows

Zipeg / iZip / UnRarX for Mac

7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at

https://github.com/PacktPublishing/R-Maching-Learning-Techniques

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <copyright@packtpub.com> with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at <questions@packtpub.com>, and we will do our best to address the problem.

Part I. Module 1

R Machine Learning By Example

Understand the fundamentals of machine learning with R and build your own dynamic algorithms to tackle complicated real-world problems successfully

Chapter 1. Getting Started with R and Machine Learning

This introductory chapter will get you started with the basics of R which include various constructs, useful data structures, loops and vectorization. If you are already an R wizard, you can skim through these sections and dive right into the next part which talks about what machine learning actually represents as a domain and the main areas it encompasses. We will also talk about different machine learning techniques and algorithms used in each area. Finally, we will conclude by looking at some of the most popular machine learning packages in R, some of which we will be using in the subsequent chapters.

If you are a data or machine learning enthusiast, surely you would have heard by now that being a data scientist is referred to as the sexiest job of the 21st century by Harvard Business Review.

Note

Reference: https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/

There is a huge demand in the current market for data scientists, primarily because their main job is to gather crucial insights and information from both unstructured and structured data to help their business and organization grow strategically.

Some of you might be wondering how machine learning or R relate to all this! Well, to be a successful data scientist, one of the major tools you need in your toolbox is a powerful language capable of performing complex statistical calculations and working with various types of data and building models which help you get previously unknown insights and R is the perfect language for that! Machine learning forms the foundation of the skills you need to build to become a data analyst or data scientist, this includes using various techniques to build models to get insights from data.

This book will provide you with some of the essential tools you need to be well versed with both R and machine learning by not only looking at concepts but also applying those concepts in real-world examples. Enough talk; now let's get started on our journey into the world of machine learning with R!

In this chapter, we will cover the following aspects:

Delving into the basics of R

Understanding the data structures in R

Working with functions

Controlling code flow

Taking further steps with R

Understanding machine learning basics

Familiarizing yourself with popular machine learning packages in R

Delving into the basics of R

It is assumed here that you are at least familiar with the basics of R or have worked with R before. Hence, we won't be talking much about downloading and installations. There are plenty of resources on the web which provide a lot of information on this. I recommend that you use RStudio which is an Integrated Development Environment (IDE), which is much better than the base R Graphical User Interface (GUI). You can visit https://www.rstudio.com/ to get more information about it.

Note

For details about the R project, you can visit https://www.r-project.org/ to get an overview of the language. Besides this, R has a vast arsenal of wonderful packages at its disposal and you can view everything related to R and its packages at https://cran.r-project.org/ which contains all the archives.

You must already be familiar with the R interactive interpreter, often called a Read-Evaluate-Print Loop (REPL). This interpreter acts like any command line interface which asks for input and starts with a > character, which indicates that R is waiting for your input. If your input spans multiple lines, like when you are writing a function, you will see a + prompt in each subsequent line, which means that you didn't finish typing the complete expression and R is asking you to provide the rest of the expression.

It is also possible for R to read and execute complete files containing commands and functions which are saved in files with an .R extension. Usually, any big application consists of several .R files. Each file has its own role in the application and is often called as a module. We will be exploring some of the main features and capabilities of R in the following sections.

Using R as a scientific calculator

The most basic constructs in R include variables and arithmetic operators which can be used to perform simple mathematical operations like a calculator or even complex statistical calculations.

> 5 + 6 [1] 11 > 3 * 2 [1] 6 > 1 / 0 [1] Inf

Remember that everything in R is a vector. Even the output results indicated in the previous code snippet. They have a leading [1] symbol indicating it is a vector of size 1.

You can also assign values to variables and operate on them just like any other programming language.

> num <- 6 > num ^ 2 [1] 36 > num [1] 6 # a variable changes value only on re-assignment > num <- num ^ 2 * 5 + 10 / 3 > num [1] 183.3333

Operating on vectors

The most basic data structure in R is a vector. Basically, anything in R is a vector, even if it is a single number just like we saw in the earlier example! A vector is basically a sequence or a set of values. We can create vectors using the : operator or the c function which concatenates the values to create a vector.

> x <- 1:5 > x [1] 1 2 3 4 5 > y <- c(6, 7, 8 ,9, 10) > y [1] 6 7 8 9 10 > z <- x + y > z [1] 7 9 11 13 15

You can clearly in the previous code snippet, that we just added two vectors together without using any loop, using just the + operator. This is known as vectorization and we will be discussing more about this later on. Some more operations on vectors are shown next:

> c(1,3,5,7,9) * 2 [1] 2 6 10 14 18 > c(1,3,5,7,9) * c(2, 4) [1] 2 12 10 28 18 # here the second vector gets recycled

Output:

> factorial(1:5) [1] 1 2 6 24 120 > exp(2:10) # exponential function [1] 7.389056 20.085537 54.598150 148.413159 403.428793 1096.633158 [7] 2980.957987 8103.083928 22026.465795 > cos(c(0, pi/4)) # cosine function [1] 1.0000000 0.7071068 > sqrt(c(1, 4, 9, 16)) [1] 1 2 3 4 > sum(1:10) [1] 55

You might be confused with the second operation where we tried to multiply a smaller vector with a bigger vector but we still got a result! If you look closely, R threw a warning also. What happened in this case is, since the two vectors were not equal in size, the smaller vector in this case c(2, 4) got recycled or repeated to become c(2, 4, 2, 4, 2) and then it got multiplied with the first vector c(1, 3, 5, 7 ,9) to give the final result vector, c(2, 12, 10, 28, 18). The other functions mentioned here are standard functions available in base R along with several other functions.

Tip

Downloading the example code

You can download the code files by following these steps:

Log in or register to our website using your e-mail address and password.

Hover the mouse pointer on the SUPPORT tab at the top

Click on Code Downloads & Errata

Enter the name of the book in the Search box

Select the book for which you're looking to download the code files

Choose from the drop-down menu where you purchased this book from

Click on Code Download

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR / 7-Zip for Windows

Zipeg / iZip / UnRarX for Mac

7-Zip / PeaZip for Linux

Special values

Since you will be dealing with a lot of messy and dirty data in data analysis and machine learning, it is important to remember some of the special values in R so that you don't get too surprised later on if one of them pops up.

> 1 / 0 [1] Inf > 0 / 0 [1] NaN > Inf / NaN [1] NaN > Inf / Inf [1] NaN > log(Inf) [1] Inf > Inf + NA [1] NA

The main values which should concern you here are Inf which stands for Infinity, NaN which is Not a Number, and NA which indicates a value that is missing or Not Available. The following code snippet shows some logical tests on these special values and their results. Do remember that TRUE and FALSE are logical data type values, similar to other programming languages.

> vec <- c(0, Inf, NaN, NA) > is.finite(vec) [1] TRUE FALSE FALSE FALSE > is.nan(vec) [1] FALSE FALSE TRUE FALSE > is.na(vec) [1] FALSE FALSE TRUE TRUE > is.infinite(vec) [1] FALSE TRUE FALSE FALSE

The functions are pretty self-explanatory from their names. They clearly indicate which values are finite, which are finite and checks for NaN and NA values respectively. Some of these functions are very useful when cleaning dirty data.

Data structures in R

Here we will be looking at the most useful data structures which exist in R and use using them on some fictional examples to get a better grasp on their syntax and constructs. The main data structures which we will be covering here include:

Vectors

Arrays and matrices

Lists

Data frames

These data structures are used widely inside R as well as by various R packages and functions, including machine learning functions and algorithms which we will be using in the subsequent chapters. So it is essential to know how to use these data structures to work with data efficiently.

Vectors

Just like we mentioned briefly in the previous sections, vectors are the most basic data type inside R. We use vectors to represent anything, be it input or output. We previously saw how we create vectors and apply mathematical operations on them. We will see some more examples here.

Creating vectors

Here we will look at ways to initialize vectors, some of which we had also worked on previously, using operators such as : and functions such as c. In the following code snippet, we will use the seq family of functions to initialize vectors in different ways.

> c(2.5:4.5, 6, 7, c(8, 9, 10), c(12:15)) [1] 2.5 3.5 4.5 6.0 7.0 8.0 9.0 10.0 12.0 13.0 14.0 15.0 > vector(numeric, 5) [1] 0 0 0 0 0 > vector(logical, 5) [1] FALSE FALSE FALSE FALSE FALSE > logical(5) [1] FALSE FALSE FALSE FALSE FALSE > # seq is a function which creates sequences > seq.int(1,10) [1] 1 2 3 4 5 6 7 8 9 10 > seq.int(1,10,2) [1] 1 3 5 7 9 > seq_len(10) [1] 1 2 3 4 5 6 7 8 9 10

Indexing and naming vectors

One of the most important operations we can do on vectors involves subsetting and indexing vectors to access specific elements which are often useful when we want to run some code only on specific data points. The following examples show some ways in which we can index and subset vectors:

> vec <- c(R, Python, Julia, Haskell, Java, Scala) > vec[1] [1] R > vec[2:4] [1] Python Julia Haskell > vec[c(1, 3, 5)] [1] R Julia Java > nums <- c(5, 8, 10, NA, 3, 11) > nums [1] 5 8 10 NA 3 11 > which.min(nums) # index of the minimum element [1] 5 > which.max(nums) # index of the maximum element [1] 6 > nums[which.min(nums)] # the actual minimum element [1] 3 > nums[which.max(nums)] # the actual maximum element [1] 11

Now we look at how we can name vectors. This is basically a nifty feature in R where you can label each element in a vector to make it more readable or easy to interpret. There are two ways this can be done, which are shown in the following examples:

> c(first=1, second=2, third=3, fourth=4, fifth=5)

Output:

> positions <- c(1, 2, 3, 4, 5) > names(positions) NULL > names(positions) <- c(first, second, third, fourth, fifth) > positions

Output:

> names(positions) [1] first second third fourth fifth > positions[c(second, fourth)]

Output:

Thus, you can see, it becomes really useful to annotate and name vectors sometimes, and we can also subset and slice vectors using element names rather than values.

Arrays and matrices

Vectors are one dimensional data structures, which means that they just have one dimension and we can get the number of elements they have using the length property. Do remember that arrays may also have a similar meaning in other programming languages, but in R they have a slightly different meaning. Basically, arrays in R are data structures which hold the data having multiple dimensions. Matrices are just a special case of generic arrays having two dimensions, namely represented by properties rows and columns. Let us look at some examples in the following code snippets in the accompanying subsection.

Creating arrays and matrices

First we will create an array that has three dimensions. Now it is easy to represent two dimensions in your screen, but to go one dimension higher, there are special ways in which R transforms the data. The following example shows how R fills the data (column first) in each dimension and shows the final output for a 4x3x3 array:

> three.dim.array <- array( + 1:32, # input data + dim = c(4, 3, 3), # dimensions + dimnames = list( # names of dimensions + c(row1, row2, row3, row4), + c(col1, col2, col3), + c(first.set, second.set, third.set) + ) + ) > three.dim.array

Output:

Like I mentioned earlier, a matrix is just a special case of an array. We can create a matrix using the matrix function, shown in detail in the following example. Do note that we use the parameter byrow to fill the data row-wise in the matrix instead of R's default column-wise fill in any array or matrix. The ncol and nrow parameters stand for number of columns and rows respectively.

> mat <- matrix( + 1:24, # data + nrow = 6, # num of rows + ncol = 4, # num of columns + byrow = TRUE, # fill the elements row-wise + ) > mat

Output:

Names and dimensions

Just like we named vectors and accessed element names, will perform similar operations in the following code snippets. You have already seen the use of the dimnames parameter in the preceding examples. Let us look at some more examples as follows:

> dimnames(three.dim.array)

Output:

> rownames(three.dim.array) [1] row1 row2 row3 row4 > colnames(three.dim.array) [1] col1 col2 col3 > dimnames(mat) NULL > rownames(mat) NULL > rownames(mat) <- c(r1, r2, r3, r4, r5, r6) > colnames(mat) <- c(c1, c2, c3, c4) > dimnames(mat)

Output:

> mat

Output:

To access details of dimensions related to arrays and matrices, there are special functions. The following examples show the same:

> dim(three.dim.array) [1] 4 3 3 > nrow(three.dim.array) [1] 4 > ncol(three.dim.array) [1] 3 > length(three.dim.array) # product of dimensions [1] 36 > dim(mat) [1] 6 4 > nrow(mat) [1] 6 > ncol(mat) [1] 4 > length(mat) [1] 24

Matrix operations

A lot of machine learning and optimization algorithms deal with matrices as their input data. In the following section, we will look at some examples of the most common operations on matrices.

We start by initializing two matrices and then look at ways of combining the two matrices using functions such as c which returns a vector, rbind which combines the matrices by rows, and cbind which does the same by columns.

> mat1 <- matrix( + 1:15, + nrow = 5, + ncol = 3, + byrow = TRUE, + dimnames = list( + c(M1.r1, M1.r2, M1.r3, M1.r4, M1.r5) + ,c(M1.c1, M1.c2, M1.c3) + ) + ) > mat1

Output:

> mat2 <- matrix( + 16:30, + nrow = 5, + ncol = 3, + byrow = TRUE, + dimnames = list( + c(M2.r1, M2.r2, M2.r3, M2.r4, M2.r5), + c(M2.c1, M2.c2, M2.c3) + ) + ) > mat2

Output:

> rbind(mat1, mat2)

Output:

> cbind(mat1, mat2)

Output:

> c(mat1, mat2)

Output:

Now we look at some of the important arithmetic operations which can be performed on matrices. Most of them are quite self-explanatory from the following syntax:

> mat1 + mat2 # matrix addition

Output:

> mat1 * mat2 # element-wise multiplication

Output:

> tmat2 <- t(mat2) # transpose > tmat2

Output:

> mat1 %*% tmat2 # matrix inner product

Output:

> m <- matrix(c(5, -3, 2, 4, 12, -1, 9, 14, 7), nrow = 3, ncol = 3) > m

Output:

> inv.m <- solve(m) # matrix inverse > inv.m

Output:

> round(m %*% inv.m) # matrix * matrix_inverse = identity matrix

Output:

The preceding arithmetic operations are just some of the most popular ones amongst the vast number of functions and operators which can be applied to matrices. This becomes useful, especially in areas such as linear optimization.

Lists

Lists are a special case of vectors where each element in the vector can be of a different type of data structure or even simple data types. It is similar to the lists in the Python programming language in some aspects, if you have used it before, where the lists indicate elements which can be of different types and each have a specific index in the list. In R, each element of a list can be as simple as a single element or as complex as a whole matrix, a function, or even a vector of strings.

Creating and indexing lists

We will get started with looking at some common methods to create and initialize lists in the following examples. Besides that, we will also look at how we can access some of these list elements for further computations. Do remember that each element in a list can be a simple primitive data type or even complex data structures or functions.

> list.sample <- list( + 1:5,

+ c(first, second, third),

+ c(TRUE, FALSE, TRUE, TRUE), + cos, + matrix(1:9, nrow = 3, ncol = 3) + ) > list.sample

Output:

> list.with.names <- list( + even.nums = seq.int(2,10,2), + odd.nums = seq.int(1,10,2), + languages = c(R, Python, Julia, Java), + cosine.func = cos + ) > list.with.names

Output:

> list.with.names$cosine.func function (x) .Primitive(cos) > list.with.names$cosine.func(pi) [1] -1 > > list.sample[[4]] function (x) .Primitive(cos) > list.sample[[4]](pi) [1] -1 > > list.with.names$odd.nums [1] 1 3 5 7 9 > list.sample[[1]] [1] 1 2 3 4 5 > list.sample[[3]] [1] TRUE FALSE TRUE TRUE

You can see from the preceding examples how easy it is to access any element of the list and use it for further computations, such as the cos function.

Combining and converting lists

Now we will take a look at how to combine several lists together into one single list in the following examples:

> l1 <- list( + nums = 1:5, + chars = c(a, b, c, d, e), + cosine = cos + ) > l2 <- list( + languages = c(R, Python, Java), + months = c(Jan, Feb, Mar, Apr), + sine = sin + ) > # combining the lists now > l3 <- c(l1, l2) > l3

Output:

It is very easy to convert lists in to vectors and vice versa. The following examples show some common ways we can achieve this:

> l1 <- 1:5 > class(l1) [1] integer > list.l1 <- as.list(l1) > class(list.l1) [1] list > list.l1

Output:

> unlist(list.l1) [1] 1 2 3 4 5

Data frames

Data frames are special data structures which are typically used for storing data tables or data in the form of spreadsheets, where each column indicates a specific attribute or field and the rows consist of specific values for those columns. This data structure is extremely useful in working with datasets which usually have a lot of fields and attributes.

Creating data frames

We can create data frames easily using the data.frame function. We will look at some following examples to illustrate the same with some popular superheroes:

> df <- data.frame( + real.name = c(Bruce Wayne, Clark Kent, Slade Wilson, Tony Stark, Steve Rogers), + superhero.name = c(Batman, Superman, Deathstroke, Iron Man, Capt. America), + franchise = c(DC, DC, DC, Marvel, Marvel), + team = c(JLA, JLA, Suicide Squad, Avengers, Avengers), + origin.year = c(1939, 1938, 1980, 1963, 1941) + ) > df

Output:

> class(df) [1] data.frame > str(df)

Output:

> rownames(df) [1] 1 2 3 4 5 > colnames(df)

Output:

> dim(df) [1] 5 5

The str function talks in detail about the structure of the data frame where we see details about the data present in each column of the data frame. There are a lot of datasets readily available in R base which you can directly load and start using. One of them is shown next. The mtcars dataset has information about various automobiles, which was extracted from the Motor Trend U.S. Magazine of 1974.

> head(mtcars) # one of the datasets readily available in R

Output:

Operating on data frames

There are a lot of operations we can do on data frames, such as merging, combining, slicing, and transposing data frames. We will look at some of the important data frame operations in the following examples.

It is really easy to index and subset specific data inside data frames using simplex indexes and functions such as subset.

> df[2:4,]

Output:

> df[2:4, 1:2]

Output:

> subset(df, team==JLA, c(real.name, superhero.name, franchise))

Output:

> subset(df, team %in% c(Avengers,Suicide Squad), c(real.name, superhero.name, franchise))

Output:

We will now look at some more complex operations, such as combining and merging data frames.

> df1 <- data.frame( + id = c('emp001', 'emp003', 'emp007'), + name = c('Harvey Dent', 'Dick Grayson', 'James Bond'), + alias = c('TwoFace', 'Nightwing', 'Agent 007') + ) > > df2 <- data.frame( + id = c('emp001', 'emp003', 'emp007'), + location = c('Gotham City', 'Gotham City', 'London'), + speciality = c('Split Persona', 'Expert Acrobat', 'Gadget Master') + ) > df1

Output:

> df2

Output:

> rbind(df1, df2) # not possible since column names don't match Error in match.names(clabs, names(xi)) : names do not match previous names > cbind(df1, df2)

Output:

> merge(df1, df2, by=id)

Output:

From the preceding operations it is evident that rbind and cbind work in the same way as we saw previously with arrays and matrices. However, merge lets you merge the data frames in the same way as you join various tables in relational databases.

Working with functions

Next up, we will be looking at functions, which is a technique or methodology to easily structure and modularize your code, specifically lines of code which perform specific tasks, so that you can execute them whenever you need them without writing them again and again. In R, functions are basically treated as just another data type and you can assign functions, manipulate them as and when needed, and also pass them as arguments to other functions. We will be exploring all this in the following section.

Built-in functions

R consists of several functions which are available in the R-base package and, as you install more packages, you get more functionality, which is made available in the form of functions. We will look at a few built-in functions in the following examples:

> sqrt(5) [1] 2.236068 > sqrt(c(1,2,3,4,5,6,7,8,9,10)) [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 [8] 2.828427 3.000000 3.162278 > # aggregating functions > mean(c(1,2,3,4,5,6,7,8,9,10)) [1] 5.5 > median(c(1,2,3,4,5,6,7,8,9,10)) [1] 5.5

You can see from the preceding examples that functions such as mean, median, and sqrt are built-in and can be used anytime when you start R, without loading any other packages or defining the functions explicitly.

User-defined functions

The real power lies in the ability to define your own functions based on different operations and computations you want to perform on the data and making R execute those functions just in the way you intend them to work. Some illustrations are shown as follows:

square <- function(data){ return (data^2) } > square(5) [1] 25 > square(c(1,2,3,4,5)) [1] 1 4 9 16 25 point <- function(xval, yval){ return (c(x=xval,y=yval)) } > p1 <- point(5,6) > p2 <- point(2,3) > > p1 x y 5 6 > p2 x y 2 3

As we saw in the previous code snippet, we can define functions such as square which computes the square of a single number or even a vector of numbers using the same code. Functions such as point are useful to represent specific entities which represent points in the two-dimensional co-ordinate space. Now we will be looking at how to use the preceding functions together.

Passing functions as arguments

When you define any function, you can also pass other functions to it as arguments if you intend to use them inside your function to perform some complex computations. This reduces the complexity and redundancy of the code. The following example computes the Euclidean distance between two points using the square function defined earlier, which is passed as an argument:

> # defining the function euclidean.distance <- function(point1, point2, square.func){ distance <- sqrt( as.integer( square.func(point1['x'] - point2['x']) ) + as.integer( square.func(point1['y'] - point2['y']) ) ) return (c(distance=distance)) } > # executing the function, passing square as argument > euclidean.distance(point1 = p1, point2 = p2, square.func = square) distance 4.242641 > euclidean.distance(point1 = p2, point2 = p1, square.func = square) distance 4.242641 > euclidean.distance(point1 = point(10, 3), point2 = point(-4, 8), square.func = square) distance 14.86607

Thus, you can see that with functions you can define a specific function once and execute it as many times as you need.

Controlling code flow

This section covers areas related to controlling the execution of your code. Using specific constructs such as if-else and switch, you can execute code conditionally. Constructs like for, while, and repeat, and help in executing the same code multiple times which is also known as looping. We will be exploring all these constructs in the following section.

Working with if, if-else, and ifelse

There are several constructs which help us in executing code conditionally. This is especially useful when we don't want to execute a bunch of statements one after the other sequentially but execute the code only when it meets or does not meet specific conditions. The following examples illustrate the same:

> num = 5 > if (num == 5){ + cat('The number was 5') + } The number was 5 > > num = 7 > > if (num == 5){ + cat('The number was 5') + } else{ + cat('The number was not 5') + } The number was not 5 > > if (num == 5){ + cat('The number was 5') + } else if (num == 7){ + cat('The number was 7') + } else{ + cat('No match found') + } The number was 7 > ifelse(num == 5, Number was 5, Number was not 5) [1] Number was not 5

Working with switch

The switch function is especially useful when you have to match an expression or argument to several conditions and execute only if there is a specific match. This becomes extremely messy when implemented with the if-else constructs but is much more elegant with the switch function, as we will see next:

> switch( + first, + first = 1st, + second = 2nd, + third = 3rd, + No position + ) [1] 1st > > switch( + third, + first = 1st, + second = 2nd, + third = 3rd, + No position + ) [1] 3rd > # when no match, default statement executes > switch( + fifth, + first = 1st, + second = 2nd, + third = 3rd, + No position + ) [1] No position

Loops

Loops are an excellent way to execute code segments repeatedly when needed. Vectorization constructs are, however, more optimized than loops for working on larger data sets, but we will see that later in this chapter. For now, you should remember that there are three types of loops in R, namely, for, while, and repeat. We will look at all of them in the following examples:

> # for loop > for (i in 1:10){ + cat(paste(i, )) + } 1 2 3 4 5 6 7 8 9 10 > > sum = 0 > for (i in 1:10){ + sum <- sum + i + } > sum [1] 55 > > # while loop > count <- 1 > while (count <= 10){ + cat(paste(count, )) + count <- count + 1 + } 1 2 3 4 5 6 7 8 9 10 > > # repeat infinite loop > count = 1 > repeat{ + cat(paste(count, )) + if (count >= 10){ + break # break off from the infinite loop + } + count <- count + 1 + } 1 2 3 4 5 6 7 8 9 10

Advanced constructs

We heard the term vectorized earlier when we talked about operating on vectors without using loops. While looping is a great way to iterate through vectors and perform computations, it is not very efficient when we deal with what is known as Big Data. In this case, R provides some advanced constructs which we will be looking at in this section. We will be covering the following functions:

lapply: Loops over a list and evaluates a function on each element

sapply: A simplified version of lapply

apply: Evaluates a function on the boundaries or margins of an array

tapply: Evaluates a function over subsets of a vector

mapply: A multivariate version of lapply

lapply and sapply

Like we mentioned earlier, lapply takes a list and a function as input and evaluates that function over each element of the list. If the input list is not a list, it is converted into a list using the as.list function before the output is returned. It is much faster than a normal loop because the actual looping is done internally using C code. We look at its implementation and an example in the following code snippet:

> # lapply function definition > lapply function (X, FUN, ...) { FUN <- match.fun(FUN) if (!is.vector(X) || is.object(X)) X <- as.list(X) .Internal(lapply(X, FUN)) } > # example > nums <- list(l1=c(1,2,3,4,5,6,7,8,9,10), l2=1000:1020) > lapply(nums, mean)

Output:

Coming to sapply, it is similar to lapply except that it tries to simplify the results wherever possible. For example, if the final result is such that every element is of length 1, it returns a vector, if the length of every element in the result is the same but more than 1, a matrix is returned, and if it is not able to simplify the results, we get the same result as lapply. We illustrate the same with the following example:

> data <- list(l1=1:10, l2=runif(10), l3=rnorm(10,2)) > data

Output:

> > lapply(data, mean)

Output:

> sapply(data, mean)

Output:

apply

The apply function is used to evaluate a function over the margins or boundaries of an array; for instance, applying aggregate functions on the rows or columns of an array. The rowSums, rowMeans, colSums, and colMeans functions also use apply internally but are much more optimized and useful when operating on large arrays. We will see all the preceding constructs in the following example:

> mat <- matrix(rnorm(20), nrow=5, ncol=4) > mat

Output:

> # row sums > apply(mat, 1, sum) [1] 0.79786959 0.53900665 -2.36486927 -1.28221227 0.06701519 > rowSums(mat) [1] 0.79786959 0.53900665 -2.36486927 -1.28221227 0.06701519 > # row means > apply(mat, 1, mean) [1] 0.1994674 0.1347517 -0.5912173 -0.3205531 0.0167538 > rowMeans(mat) [1] 0.1994674 0.1347517 -0.5912173 -0.3205531 0.0167538 > > # col sums >

Enjoying the preview?

Page 1 of 1

R: Unleash Machine Learning Techniques

About this ebook

Brett Lantz

Read more from Brett Lantz

Related authors

Related to R

Related ebooks

Programming For You

Related podcast episodes

Related articles

Related categories

Reviews for R

What did you think?

Book preview

R - Brett Lantz

Table of Contents

R: Unleash Machine Learning Techniques

R: Unleash Machine Learning Techniques

R: Unleash Machine Learning Techniques

Credits

Preface

What this learning path covers

What you need for this learning path

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

Part I. Module 1

Chapter 1. Getting Started with R and Machine Learning

Note

Delving into the basics of R

Note

Using R as a scientific calculator

Operating on vectors

Tip

Special values

Data structures in R

Vectors

Creating vectors

Indexing and naming vectors

Arrays and matrices

Creating arrays and matrices

Names and dimensions

Matrix operations

Lists

Creating and indexing lists

Combining and converting lists

Data frames

Creating data frames

Operating on data frames

Working with functions

Built-in functions

User-defined functions

Passing functions as arguments

Controlling code flow

Working with if, if-else, and ifelse

Working with switch

Loops

Advanced constructs

lapply and sapply

apply