Ebook, 3,564 pages, 20 hours

R: Data Analysis and Visualization

Rating: 5.0 (1 review)

By Brett Lantz, Jaynal Abedin, Hrishi V. Mittal and

About this ebook

This course is for data scientists and quantitative analysts who want to learn R and take advantage of its powerful analytical design framework. It's a seamless journey to becoming a full-stack R developer.

Language: English

Publisher: Packt Publishing

Release date: Jun 24, 2016

ISBN: 9781786460486

Author

Brett Lantz

"Brett Lantz has spent the past 10 years using innovative data methods to understand human behavior. A sociologist by training, he was first enchanted by machine learning while studying a large database of teenagers' social networking website profiles. Since then, he has worked on interdisciplinary studies of cellular telephone calls, medical billing data, and philanthropic activity, among others. When he's not spending time with family, following college sports, or being entertained by his dachshunds, he maintains dataspelunking.com, a website dedicated to sharing knowledge about the search for insight in data."

Related to R

Related ebooks

Mastering Predictive Analytics with R
Ebook
by Rui Miguel Forte
Rating: 4 out of 5 stars
Learning R Programming
Ebook
by Kun Ren
Rating: 5 out of 5 stars
R: Recipes for Analysis, Visualization and Machine Learning
Ebook
by Atmajitsinh Gohil
0 ratings
R Data Visualization Cookbook
Ebook
by Atmajitsinh Gohil
0 ratings
R Graphs Cookbook Second Edition
Ebook
by Jaynal Abedin
Rating: 3 out of 5 stars
Learning Predictive Analytics with R
Ebook
by Mayor Eric
0 ratings
Python: Real-World Data Science
Ebook
by Robert Layton
0 ratings
Learning Predictive Analytics with Python
Ebook
by Kumar Ashish
0 ratings
Practical Data Analysis Cookbook
Ebook
by Tomasz Drabas
0 ratings
Learning Tableau 10 - Second Edition
Ebook
by Joshua N. Milligan
Rating: 4 out of 5 stars
Mastering Tableau
Ebook
by David Baldwin
Rating: 3 out of 5 stars
Mastering Scientific Computing with R
Ebook
by Paul Gerrard
Rating: 3 out of 5 stars
Simulation for Data Science with R
Ebook
by Matthias Templ
0 ratings
R For Dummies
Ebook
by Andrie de Vries
Rating: 4 out of 5 stars
Python Data Analysis Cookbook
Ebook
by Ivan Idris
Rating: 5 out of 5 stars
Learning pandas
Ebook
by Heydt Michael
Rating: 4 out of 5 stars
Practical Data Analysis - Second Edition
Ebook
by Hector Cuesta
0 ratings
Python Business Intelligence Cookbook
Ebook
by Dempsey Robert
0 ratings
Python Data Visualization Cookbook - Second Edition
Ebook
by Milovanović Igor
0 ratings
matplotlib Plotting Cookbook
Ebook
by Alexandre Devert
Rating: 5 out of 5 stars
Machine Learning with R, the tidyverse, and mlr
Ebook
by Hefin Rhys
0 ratings
Tableau 10 Business Intelligence Cookbook
Ebook
by Santos Donabel
0 ratings
Python Data Visualization Cookbook
Ebook
by Milovanović Igor
Rating: 4 out of 5 stars
Data Visualization: Representing Information on Modern Web
Ebook
by Andy Kirk
Rating: 5 out of 5 stars
Microsoft Tabular Modeling Cookbook
Ebook
by Paul te Braak
0 ratings
Mastering Data Analysis with R
Ebook
by Daróczi Gergely
Rating: 5 out of 5 stars
Data Analysis with R
Ebook
by Fischetti Tony
Rating: 5 out of 5 stars
R for Data Science
Ebook
by Dan Toomey
Rating: 5 out of 5 stars
ggplot2 Essentials
Ebook
by Donato Teutonico
0 ratings
Practical Data Science with R, Second Edition
Ebook
by John Mount
Rating: 4 out of 5 stars

Data Modeling & Design For You

WordPress For Beginners - How To Set Up A Self Hosted WordPress Blog
Ebook
by Cyrus Jackson
0 ratings
DAX Patterns: Second Edition
Ebook
by Marco Russo
Rating: 5 out of 5 stars
Mastering Agile User Stories
Ebook
by DeEtta Balthazar
Rating: 4 out of 5 stars
Tableau Cookbook – Recipes for Data Visualization
Ebook
by Shweta Sankhe-Savale
0 ratings
Learn T-SQL Querying: A guide to developing efficient and elegant T-SQL code
Ebook
by Pedro Lopes
0 ratings
Minding the Machines: Building and Leading Data Science and Analytics Teams
Ebook
by Jeremy Adamson
0 ratings
Hacks To Crush Plc Program Fast & Efficiently Everytime... : Coding, Simulating & Testing Programmable Logic Controller With Examples
Ebook
by Michael Blake
Rating: 5 out of 5 stars
Power Pivot and Power BI: The Excel User's Guide to DAX, Power Query, Power BI & Power Pivot in Excel 2010-2016
Ebook
by Rob Collie
Rating: 4 out of 5 stars
Data Analytics for Beginners: Introduction to Data Analytics
Ebook
by Anthony S. Williams
Rating: 4 out of 5 stars
The Secrets of ChatGPT Prompt Engineering for Non-Developers
Ebook
by Cea West
Rating: 5 out of 5 stars
Supercharge Power BI: Power BI is Better When You Learn To Write DAX
Ebook
by Matt Allington
Rating: 5 out of 5 stars
How To Make Money With 3D Printing: The New Digital Revolution
Ebook
by Adidas Wilson
Rating: 3 out of 5 stars
Bayesian Analysis with Python
Ebook
by Osvaldo Martin
Rating: 5 out of 5 stars
Data Analytics with Python: Data Analytics in Python Using Pandas
Ebook
by Frank Millstein
Rating: 3 out of 5 stars
Neural Networks: Neural Networks Tools and Techniques for Beginners
Ebook
by John Slavio
Rating: 5 out of 5 stars
Living in Data: A Citizen's Guide to a Better Information Future
Ebook
by Jer Thorp
Rating: 4 out of 5 stars
Data Visualization: a successful design process
Ebook
by Andy Kirk
Rating: 4 out of 5 stars
R All-in-One For Dummies
Ebook
by Joseph Schmuller
0 ratings
What Makes Us Smart: The Computational Logic of Human Cognition
Ebook
by Samuel J. Gershman
0 ratings
Python Data Analysis
Ebook
by Ivan Idris
Rating: 4 out of 5 stars
150 Most Poweful Excel Shortcuts: Secrets of Saving Time with MS Excel
Ebook
by Andrei Besedin
Rating: 3 out of 5 stars
Tableau Desktop Certified Associate: Exam Guide: Develop your Tableau skills and prepare for Tableau certification with tips from industry experts
Ebook
by Dmitry Anoshin
0 ratings
Python: Master the Art of Design Patterns
Ebook
by Dusty Phillips
Rating: 4 out of 5 stars
Microsoft 365 Excel: The Only App That Matters: Calculations, Analytics, Modeling, Data Analysis and Dashboard Reporting for the New Era of Dynamic Data Driven Decision Making & Insight
Ebook
by Mike Girvin
Rating: 3 out of 5 stars
Data Visualization with D3.js Cookbook
Ebook
by Nick Qi Zhu
0 ratings
Raspberry Pi :Raspberry Pi Guide On Python & Projects Programming In Easy Steps
Ebook
by Jason Scotts
Rating: 3 out of 5 stars
Secrets of MS Excel VBA Macros for Beginners !: Save Your Time With Visual Basic Macros!
Ebook
by Andrei Besedin
Rating: 4 out of 5 stars
Logic Design: A Review Of Theory And Practice
Ebook
by Glen G. Jr. Langdon
0 ratings
Advanced Deep Learning with Python: Design and implement advanced next-generation AI solutions using TensorFlow and PyTorch
Ebook
by Ivan Vasilev
0 ratings
Deep Learning: An Essential Guide to Deep Learning for Beginners Who Want to Understand How Deep Neural Networks Work and Relate to Machine Learning and Artificial Intelligence
Ebook
by Herbert Jones
Rating: 5 out of 5 stars

Related podcast episodes

78: Mindset of a Rockstar Data Analyst w/ Trevor Tapscott: Our focus for this inspiring episode of AOF is mindset, especially if you want to be a standout data analyst! I have brought one of my first ever followers and day ones! Trevor Tapscott is a VP and Analytics Consultant at Wells Fargo and has been in...
Podcast episode
by Analytics on Fire
0 ratings
#1 Data Science, Past, Present and Future: Hilary Mason talks about the past, present, and future of data science with Hugo. Hilary is the VP of Research at Cloudera Fast Forward, a machine intelligence research company, and the data scientist in residence at Accel. If you want to hear about wh...
Podcast episode
by DataFramed
100% found this document useful
The Art & Science of Finding You Top Performers: The Art & Science of Finding You Top Performers Advanced Insights into Data Analysis and Optimization with Dr. Ellis Welcome to this episode of Seller Sessions, where we dive deep into the nuanced world of data analysis and optimisation with the...
Podcast episode
by Seller Sessions Amazon FBA and Private Label
0 ratings
Distributing Geospatial Data: Distributing Geospatial Data - Every wondered why you might what to do this? Or maybe you understand the why but are unsure about the how? Perhaps you have heard people talk about partitioning data or sharding data, you might have heard some of thes...
Podcast episode
by The MapScaping Podcast - GIS, Geospatial, Remote Sensing, earth observation and digital geography
0 ratings
416: Multi-Dimensional Numbers: Joël discusses the challenges he encountered while optimizing slow SQL queries in a non-Rails application. Stephanie shares her experience with canary deploys in a Rails upgrade. Together, Stephanie and Joël address a listener's question about replacing the wkhtml2pdf tool, which is no longer maintained. The episode's main topic revolves around the concept of multidimensional numbers and their applications in software development. Joël introduces the idea of treating objects containing multiple numbers as single entities, using the example of 2D points in space to illustrate how custom classes can define mathematical operations like addition and subtraction for complex data types. They explore how this approach can simplify operations on data structures, such as inventories of T-shirt sizes, by treating them as mathematical objects.
Podcast episode
by The Bike Shed
0 ratings
Ep. 65 - Data Modeling
Podcast episode
by What's Your Baseline? Enterprise Architecture & Business Process Management Demystified
0 ratings
Data Types: One size does not fit all: Learn Programming and Electronics with Arduino
Podcast episode
by Learn Programming and Electronics with Arduino
0 ratings
Probe Data: The Good, The Bad, and The Ugly
Podcast episode
by SLP Nerdcast
0 ratings
66: A guide to data models and dynamic dashboards for marketers
Podcast episode
by Humans of Martech
0 ratings
Scaling Relationship on Learning Mathematical Reasoning with Large Language Models: Mathematical reasoning is a challenging task for large language models (LLMs), while the scaling relationship of it with respect to LLM capacity is under-explored. In this paper, we investigate how the pre-training loss, supervised data amount, and a...
Podcast episode
by Papers Read on AI
0 ratings
AWS S3 Storage Lens: The Best Service Not Announced at AWS Storage Day: Join Pete and Jesse as they talk about the coolest service not announced at AWS Storage Day: AWS S3 Storage Lens, which lets you track your S3 usage across accounts. They discuss how this new service solves a major problem, how you’d have to track S3 usag
Podcast episode
by AWS Morning Brief
0 ratings
Whiteboard Confessional: Scaling Databases in a Single Bound: Join me as I continue a new series called Whiteboard Confessional by examining an all-too-common problem: having to scale a database when it’s too late. In this episode, I touch upon the underlying reason many developers don’t think about their database u
Podcast episode
by AWS Morning Brief
0 ratings
3 Act Math Tasks Aren’t Working
Podcast episode
by Making Math Moments That Matter
0 ratings
Data Observability - Barr Moses
Podcast episode
by DataTalks.Club
0 ratings
MLOps Coffee Sessions #14 Conversation with the Creators of Dask // Hugo Bowne-Anderson and Matthew Rocklin
Podcast episode
by MLOps.community
0 ratings
Improving Upon a First-Draft Data Science Analysis: There are a lot of good resources out there for g…
Podcast episode
by Linear Digressions
0 ratings
MLOps Coffee Sessions #11: Analyzing “Continuous Delivery and Automation Pipelines in ML" // Part 3
Podcast episode
by MLOps.community
0 ratings
384: Not All Numbers Are Numbers
Podcast episode
by The Bike Shed
0 ratings
100 billion Points Every Day: 100 billion Points Every Day 100 billion is a very large number, let's say that I gave you a spreadsheet with 100 billion rows in it, each row consisted of five columns Latitude, Longitude, Device ID, A Timestamp, and a column telling the name of the...
Podcast episode
by The MapScaping Podcast - GIS, Geospatial, Remote Sensing, earth observation and digital geography
0 ratings
Lessons learned from doing data science, at scale, in industry: If you’ve taken a machine learning class, or read…
Podcast episode
by Linear Digressions
0 ratings
Episode 113 - AWS Certification Exam Prep - Part 5/6 with Anya Derbakova and Ted Trentler: Welcome to part five in the AWS Certification Exa…
Podcast episode
by AWS Developers Podcast
0 ratings
STaR: Bootstrapping Reasoning With Reasoning: Generating step-by-step "chain-of-thought" rationales improves language model performance on complex reasoning tasks like mathematics or commonsense question-answering. However, inducing language model rationale generation currently requires either con...
Podcast episode
by Papers Read on AI
0 ratings
#11: What Podcasters can learn from Spotify’s data
Podcast episode
by TOPP - The Open Podcast Podcast
0 ratings
Potluck - Courses for Kids × Sub-Components × Recursion × DB Hosting × Frameworks × Data Structures & Algorithms × More!: It’s another potluck! In this episode, Scott and Wes answer your questions about kids learning to code, React sub-components, why it’s so hard to scale, new frameworks, data structures, and more! LogRocket - Sponsor LogRocket lets you replay what...
Podcast episode
by Syntax - Tasty Web Development Treats
0 ratings
396: Build vs. Buy: Joël has been fighting a frustrating bug where he's integrating with a third-party database, and some queries just crash. Stephanie shares her own debugging story about a leaky stub that caused flaky tests. Additionally, they discuss the build vs. buy decision when integrating with third-party systems. They consider the time and cost implications of building their own integration versus using off-the-shelf components and conclude that the decision often depends on the specific needs and priorities of the project, including how quickly a solution is needed and whether the integration is core to the business's value proposition.
Podcast episode
by The Bike Shed
0 ratings
How to Build a Website — The Show For Beginners: In this episode of Syntax, Scott and Wes talk about the basics of building a website — how to get started for beginners! Freshbooks - Sponsor Get a 30 day free trial of Freshbooks at and put SYNTAX in the “How did you hear about us?”...
Podcast episode
by Syntax - Tasty Web Development Treats
0 ratings
Defining Success: Metrics and KPIs - Adam Sroka
Podcast episode
by DataTalks.Club
0 ratings
DIY Consensus: Crafting Your Own Distributed Code (with Benjamin Bengfort)
Podcast episode
by Developer Voices
0 ratings
A MultiCloud Rant: You know what grinds Corey’s gears? MultiCloud, more specifically about how companies talk about MultiCloud. Everything from workloads to getting behind one cloud provider to the future. How should we actually talk about MultiCloud? This week Corey offers
Podcast episode
by AWS Morning Brief
0 ratings
418: Mental Models For Reduce Functions: Joël talks about his difficulties optimizing queries in ActiveRecord, especially with complex scopes and unions, resulting in slow queries. He emphasizes the importance of optimizing subqueries in unions to boost performance despite challenges such as query duplication and difficulty reusing scopes. Stephanie discusses upgrading a client's app to Rails 7, highlighting the importance of patience, detailed attention, and the benefits of collaborative work with a fellow developer. The conversation shifts to Ruby's reduce method (inject), exploring its complexity and various mental models to understand it. They discuss when it's preferable to use reduce over other methods like each, map, or loops and the importance of understanding the underlying operation you wish to apply to two elements before scaling up with reduce. The episode also touches on monoids and how they relate to reduce, suggesting that a deep understanding of functional programming
Podcast episode
by The Bike Shed
0 ratings

Related articles

Manipulate Data Like A Pro With Pandas
Linux Format
Article
Jul 27, 2021
7 min read
Want A Job In Data Science? You Might Have To Take A Standardized Test When Applying
Chicago Tribune
Article
Jul 10, 2018
3 min read
Scikit-Learn: The Ultimate Python Library
APC
Article
Jul 15, 2019
4 min read
Comparing Time Series Data Like A Pro
Linux Format
Article
Jun 1, 2021
8 min read
FRACTALS Going beyond the Mandelbrot Set
Linux Format
Article
Jul 2, 2019
10 min read
Using Calc For Serious Mathematics Work
Linux Format
Article
Mar 10, 2020
10 min read
Tensor Flow 101
APC
Article
Jan 27, 2020
4 min read
Pandas And Data
Linux Format
Article
Jul 27, 2021
Pandas can perform a variety of tasks including data loading, preparation and manipulation as well as data modelling and analysis. You can join, merge and reshape data with the help of Pandas, using data from different sources. As mentioned elsewhere
1 min read
How Image Recognition Works
APC
Article
Nov 4, 2019
4 min read
CSV Handling
Linux Format
Article
Mar 10, 2020
3 min read
Transform Your Grid
Digital Camera World
Article
Aug 21, 2020
3 min read
A Ubiquitous Sierpinski Triangle
Linux Format
Article
Jul 2, 2019
Here’s an interesting experiment you might like to try. You could use a ruler, pencil and paper, and an unfeasibly large measure of patience, although we recommend knocking up a simple bit of software. You could even do it as a spreadsheet. Draw an e
1 min read
Visualise Complex Data In Style Using Timelion
Linux Format
Article
Oct 20, 2020
Simon Quain is a site reliability engineer who likes discovering open datasets online to play around with in the Elastic Stack. You’ve probably heard of Elasticsearch – the search engine that enables you to index and then quickly search through your
9 min read
“There’s No Single ‘Best’ Language To Learn. I Think The Real Key Is To Learn How To Write Code”
PC Pro Magazine
Article
Oct 8, 2022
9 min read
Data Analysis
Linux Format
Article
Mar 10, 2020
Sometimes you receive raw data that needs to be processed before plotting. In Veusz, look under the Data > Operations menu and find lots of options for manipulating data sets. Joining, merging, finding the average, filtering and many more are availab
1 min read
Mailserver
Linux Format
Article
Jun 27, 2023
4 min read
Machine Learning – With Zero Programming
APC
Article
Aug 12, 2019
6 min read
SIZE matters (PART V: SIGNAL AND NOISE)
BeanScene
Article
Aug 12, 2018
At this point in our series, we’d like to introduce a framework we think is very useful when it comes to brewing and tasting coffee. You might have heard the term “signal-to-noise ratio” (often abbreviated to SNR). With a SNR framework, the world can
3 min read
How Spooky Science Helps Us Peer Inside The Planets
All About Space
Article
Dec 3, 2020
An assistant professor of computational science at the EPFL research centre in Lausanne, Switzerland, involved in the current research on metallic hydrogen. Could you explain how the machine-learning techniques used in your research work? Why were th
1 min read
The Race To Exascale Supercomputers
Maximum PC
Article
Jun 21, 2022
9 min read
Google Answer Box Strategy
Techfastly
Article
Google Answer Box Strategy
Sep 21, 2020
Leveraging the Google PAA (People Also Ask) element on a Search Results Page for Targeted Content Creation with a Python Scraper All businesses that are online today are creating content at a furious pace. According to Technavio, a research firm, con
7 min read
Create Blueberries Procedurally In Substance Designer
3D World
Article
Create Blueberries Procedurally In Substance Designer
Jul 15, 2020
5 min read
Create A Cinematic Night Scene
3D World
Article
Create A Cinematic Night Scene
Oct 9, 2019
5 min read
How Do I Track Which Row I’m On?
Simply Crochet
Article
How Do I Track Which Row I’m On?
Sep 5, 2023
3 min read
Experiments In Photogrammetry
British Columbia History
Article
Experiments In Photogrammetry
Jun 15, 2023
Ever since the fire of June 30, 2021, destroyed the Lytton Museum and Archives, I have been trying to assemble preservation methods designed to reduce the effect of another catastrop loss. To this end, I have been studying ways of making digital thre
2 min read
Top Tips For A Smarter Archviz Workflow
3D World
Article
Top Tips For A Smarter Archviz Workflow
Aug 14, 2019
7 min read
Cropping and Resizing
Smart Photography
Article
Cropping and Resizing
Sep 3, 2021
10 min read
Working With Binary Tree Data Structures
Linux Format
Article
Working With Binary Tree Data Structures
Mar 8, 2022
Mihalis Tsoukalos is a systems engineer and a technical writer. You can reach him at www.mtsoukalos.eu and @mactsouk. The main benefit of using a binary tree or a tree in general is that you can quickly find out if an element is present or not, compa
10 min read
The Expert View Steve Cassidy
PC Pro Magazine
Article
The Expert View Steve Cassidy
Oct 8, 2020
The elephant in the room when it comes to training is the difference between prebuilt, standard courses and custom jobs. A prebuilt course steers a narrow semantic route between rapids and whirlpools, because knowing what Photoshop can do is very dif
1 min read
Working With Binary Tree Data Structures
Linux Format
Article
Working With Binary Tree Data Structures
Mar 8, 2022
Mihalis Tsoukalos is a systems engineer and a technical writer. You can reach him at www.mtsoukalos.eu and @mactsouk. The main benefit of using a binary tree or a tree in general is that you can quickly find out if an element is present or not, compa
10 min read

Related categories

Skip carousel

Book preview

R - Brett Lantz

R: Data Analysis and Visualization

Meet Your Course Guide

Course Structure

Course journey

The Course Roadmap and Timeline

I. Module 1: Data Analysis with R

1. RefresheR

Navigating the basics

Arithmetic and assignment

Logicals and characters

Flow of control

Getting help in R

Vectors

Subsetting

Vectorized functions

Advanced subsetting

Recycling

Functions

Matrices

Loading data into R

Working with packages

2. The Shape of Data

Univariate data

Frequency distributions

Central tendency

Spread

Populations, samples, and estimation

Probability distributions

Visualization methods

3. Describing Relationships

Multivariate data

Relationships between a categorical and a continuous variable

Relationships between two categorical variables

The relationship between two continuous variables

Covariance

Correlation coefficients

Comparing multiple correlations

Visualization methods

Categorical and continuous variables

Two categorical variables

Two continuous variables

More than two continuous variables

4. Probability

Basic probability

A tale of two interpretations

Sampling from distributions

Parameters

The binomial distribution

The normal distribution

The three-sigma rule and using z-tables

5. Using Data to Reason About the World

Estimating means

The sampling distribution

Interval estimation

How did we get 1.96?

Smaller samples

6. Testing Hypotheses

Null Hypothesis Significance Testing

One and two-tailed tests

When things go wrong

A warning about significance

A warning about p-values

Testing the mean of one sample

Assumptions of the one sample t-test

Testing two means

Don't be fooled!

Assumptions of the independent samples t-test

Testing more than two means

Assumptions of ANOVA

Testing independence of proportions

What if my assumptions are unfounded?

7. Bayesian Methods

The big idea behind Bayesian analysis

Choosing a prior

Who cares about coin flips

Enter MCMC – stage left

Using JAGS and runjags

Fitting distributions the Bayesian way

The Bayesian independent samples t-test

8. Predicting Continuous Variables

Linear models

Simple linear regression

Simple linear regression with a binary predictor

A word of warning

Multiple regression

Regression with a non-binary predictor

Kitchen sink regression

The bias-variance trade-off

Cross-validation

Striking a balance

Linear regression diagnostics

Second Anscombe relationship

Third Anscombe relationship

Fourth Anscombe relationship

Advanced topics

9. Predicting Categorical Variables

k-Nearest Neighbors

Using k-NN in R

Confusion matrices

Limitations of k-NN

Logistic regression

Using logistic regression in R

Decision trees

Random forests

Choosing a classifier

The vertical decision boundary

The diagonal decision boundary

The crescent decision boundary

The circular decision boundary

10. Sources of Data

Relational Databases

Why didn't we just do that in SQL?

Using JSON

XML

Other data formats

Online repositories

11. Dealing with Messy Data

Analysis with missing data

Visualizing missing data

Types of missing data

So which one is it?

Unsophisticated methods for dealing with missing data

Complete case analysis

Pairwise deletion

Mean substitution

Hot deck imputation

Regression imputation

Stochastic regression imputation

Multiple imputation

So how does mice come up with the imputed values?

Methods of imputation

Multiple imputation in practice

Analysis with unsanitized data

Checking for out-of-bounds data

Checking the data type of a column

Checking for unexpected categories

Checking for outliers, entry errors, or unlikely data points

Chaining assertions

Other messiness

OpenRefine

Regular expressions

tidyr

12. Dealing with Large Data

Wait to optimize

Using a bigger and faster machine

Be smart about your code

Allocation of memory

Vectorization

Using optimized packages

Using another R implementation

Use parallelization

Getting started with parallel R

An example of (some) substance

Using Rcpp

Be smarter about your code

13. Reproducibility and Best Practices

R Scripting

RStudio

Running R scripts

An example script

Scripting and reproducibility

R projects

Version control

Communicating results

II. Module 2: R Graphs

1. R Graphics

Base graphics using the default package

Trellis graphs using lattice

Graphs inspired by Grammar of Graphics

2. Basic Graph Functions

Introduction

Creating basic scatter plots

Getting ready

How to do it...

How it works...

There's more...

A note on R's built-in datasets

See also

Creating line graphs

Getting ready

How to do it...

How it works...

There's more...

See also

Creating bar charts

Getting ready

How to do it...

How it works...

There's more...

See also

Creating histograms and density plots

How to do it...

How it works...

There's more...

See also

Creating box plots

Getting ready

How to do it...

How it works...

There's more...

See also

Adjusting x and y axes' limits

How to do it...

How it works...

There's more...

See also

Creating heat maps

How to do it...

How it works...

There's more...

See also

Creating pairs plots

How to do it...

How it works...

There's more...

See also

Creating multiple plot matrix layouts

How to do it...

How it works...

There's more...

See also

Adding and formatting legends

Getting ready

How to do it...

How it works...

There's more...

See also

Creating graphs with maps

Getting ready

How to do it...

How it works...

There's more...

See also

Saving and exporting graphs

How to do it...

How it works...

There's more...

See also

3. Beyond the Basics – Adjusting Key Parameters

Introduction

Setting colors of points, lines, and bars

Getting ready

How to do it...

How it works...

There's more...

See also

Setting plot background colors

Getting ready

How to do it...

How it works...

There's more...

Setting colors for text elements – axis annotations, labels, plot titles, and legends

Getting ready

How to do it...

How it works...

There's more...

Choosing color combinations and palettes

Getting ready

How to do it...

How it works...

There's more...

See also

Setting fonts for annotations and titles

Getting ready

How to do it...

How it works...

There's more...

See also

Choosing plotting point symbol styles and sizes

Getting ready

How to do it...

How it works...

There's more...

See also

Choosing line styles and width

Getting ready

How to do it...

How it works...

See also

Choosing box styles

Getting ready

How to do it...

How it works...

There's more...

Adjusting axis annotations and tick marks

Getting ready

How to do it...

How it works...

There's more...

See also

Formatting log axes

Getting ready

How to do it...

How it works...

There's more...

Setting graph margins and dimensions

Getting ready

How to do it...

How it works...

See also

4. Creating Scatter Plots

Introduction

Grouping data points within a scatter plot

Getting ready

How to do it...

How it works...

There's more...

See also

Highlighting grouped data points by size and symbol type

Getting ready

How to do it...

How it works...

Labeling data points

Getting ready

How to do it...

How it works...

There's more...

Correlation matrix using pairs plots

Getting ready

How to do it...

How it works...

Adding error bars

Getting ready

How to do it...

How it works...

There's more...

Using jitter to distinguish closely packed data points

Getting ready

How to do it...

How it works...

Adding linear model lines

Getting ready

How to do it...

How it works...

Adding nonlinear model curves

Getting ready

How to do it...

How it works...

Adding nonparametric model curves with lowess

Getting ready

How to do it...

How it works...

Creating three-dimensional scatter plots

Getting ready

How to do it...

How it works...

There's more...

Creating Quantile-Quantile plots

Getting ready

How to do it...

How it works...

There's more...

Displaying the data density on axes

Getting ready

How to do it...

How it works...

There's more...

Creating scatter plots with a smoothed density representation

Getting ready

How to do it...

How it works...

There's more...

5. Creating Line Graphs and Time Series Charts

Introduction

Adding customized legends for multiple-line graphs

Getting ready

How to do it...

How it works...

There's more...

See also

Using margin labels instead of legends for multiple-line graphs

Getting ready

How to do it...

How it works...

There's more...

Adding horizontal and vertical grid lines

Getting ready

How to do it...

How it works...

There's more...

See also

Adding marker lines at specific x and y values using abline

Getting ready

How to do it...

How it works...

There's more...

Creating sparklines

Getting ready

How to do it...

How it works...

Plotting functions of a variable in a dataset

Getting ready

How to do it...

How it works...

There's more...

Formatting time series data for plotting

Getting ready

How to do it...

How it works...

There's more...

Plotting the date or time variable on the x axis

Getting ready

How to do it...

How it works...

There's more...

Annotating axis labels in different human-readable time formats

Getting ready

How to do it...

How it works...

There's more...

Adding vertical markers to indicate specific time events

Getting ready

How to do it...

How it works...

There's more...

Plotting data with varying time-averaging periods

Getting ready

How to do it...

How it works...

Creating stock charts

Getting ready

How to do it...

How it works...

There's more...

6. Creating Bar, Dot, and Pie Charts

Introduction

Creating bar charts with more than one factor variable

Getting ready

How to do it...

How it works...

See also

Creating stacked bar charts

Getting ready

How to do it...

How it works...

There's more...

Adjusting the orientation of bars – horizontal and vertical

Getting ready

How to do it...

How it works...

There's more...

Adjusting bar widths, spacing, colors, and borders

Getting ready

How to do it...

How it works...

There's more...

Displaying values on top of or next to the bars

Getting ready

How to do it...

How it works...

There's more...

See also

Placing labels inside bars

Getting ready

How to do it...

How it works...

There's more...

Creating bar charts with vertical error bars

Getting ready

How to do it...

How it works...

There's more...

Modifying dot charts by grouping variables

Getting ready

How to do it...

How it works...

Making better, readable pie charts with clockwise-ordered slices

Getting ready

How to do it...

How it works...

See also

Labeling a pie chart with percentage values for each slice

Getting ready

How it works...

There's more...

See also

Adding a legend to a pie chart

Getting ready

How to do it...

How it works...

There's more...

7. Creating Histograms

Introduction

Visualizing distributions as count frequencies or probability densities

Getting ready

How to do it...

How it works...

There's more

Setting the bin size and the number of breaks

Getting ready

How to do it...

How it works...

There's more

Adjusting histogram styles – bar colors, borders, and axes

Getting ready

How to do it...

How it works...

There's more

Overlaying a density line over a histogram

Getting ready

How to do it...

How it works...

Multiple histograms along the diagonal of a pairs plot

Getting ready

How to do it...

How it works...

Histograms in the margins of line and scatter plots

Getting ready

How to do it...

How it works...

8. Box and Whisker Plots

Introduction

Creating box plots with narrow boxes for a small number of variables

Getting ready

How to do it...

How it works...

There's more

See also

Grouping over a variable

Getting ready

How to do it...

How it works...

There's more

See also

Varying box widths by the number of observations

Getting ready

How to do it...

How it works...

Creating box plots with notches

Getting ready

How to do it...

How it works...

There's more

Including or excluding outliers

Getting ready

How to do it...

How it works...

See also

Creating horizontal box plots

Getting ready

How to do it...

How it works...

Changing the box styling

Getting ready

How to do it...

How it works...

There's more

Adjusting the extent of plot whiskers outside the box

Getting ready

How to do it...

How it works...

There's more

Showing the number of observations

Getting ready

How to do it...

How it works...

There's more

Splitting a variable at arbitrary values into subsets

Getting ready

How to do it...

How it works...

There's more

9. Creating Heat Maps and Contour Plots

Introduction

Creating heat maps of a single Z variable with a scale

Getting ready

How to do it...

How it works...

There's more

See also

Creating correlation heat maps

Getting ready

How to do it...

How it works...

There's more

Summarizing multivariate data in a single heat map

Getting ready

How to do it...

How it works...

There's more

Creating contour plots

Getting ready

How to do it...

How it works...

There's more

See also

Creating filled contour plots

Getting ready

How to do it...

How it works...

There's more

See also

Creating three-dimensional surface plots

Getting ready

How to do it...

How it works...

There's more

Visualizing time series as calendar heat maps

Getting ready

How to do it...

How it works...

There's more

10. Creating Maps

Introduction

Plotting global data by countries on a world map

Getting ready

How to do it...

How it works...

There's more

See also

Creating graphs with regional maps

Getting ready

How to do it...

How it works...

There's more

Plotting data on Google maps

Getting ready

How to do it...

How it works...

There's more

See also

Creating and reading KML data

Getting ready

How to do it...

How it works...

See also

Working with ESRI shapefiles

Getting ready

How to do it...

How it works...

There's more

11. Data Visualization Using Lattice

Introduction

Creating bar charts

Getting ready

How to do it…

How it works…

There's more…

See also

Creating stacked bar charts

Getting ready

How to do it…

How it works…

There's more…

See also

Creating bar charts to visualize cross-tabulation

Getting ready

How to do it…

How it works…

There's more…

Creating a conditional histogram

Getting ready

How to do it…

How it works…

There's more…

See also

Visualizing distributions through a kernel-density plot

Getting ready

How to do it…

How it works…

There's more…

Creating a normal Q-Q plot

Getting ready

How to do it…

How it works…

There's more…

Visualizing an empirical Cumulative Distribution Function

Getting ready

How to do it…

How it works…

There's more…

Creating a boxplot

Getting ready

How to do it…

How it works…

There's more…

Creating a conditional scatter plot

Getting ready

How to do it…

How it works…

There's more…

12. Data Visualization Using ggplot2

Introduction

Creating bar charts

Getting ready

How to do it…

How it works…

There's more…

See also

Creating multiple bar charts

Getting ready

How to do it…

How it works…

There's more…

See also

Creating a bar chart with error bars

Getting ready

How to do it…

How it works…

There's more…

Visualizing the density of a numeric variable

Getting ready

How to do it...

How it works…

There's more...

Creating a box plot

Getting ready

How to do it...

How it works…

Creating a layered plot with a scatter plot and fitted line

Getting ready

How to do it...

How it works…

There's more...

Creating a line chart

Getting ready

How to do it...

How it works…

There's more...

Graph annotation with ggplot

Getting ready

How to do it...

How it works...

13. Inspecting Large Datasets

Introduction

Multivariate continuous data visualization

Getting ready

How to do it…

How it works…

There's more…

See also

Multivariate categorical data visualization

Getting ready

How to do it…

How it works…

There's more…

Visualizing mixed data

Getting ready

How to do it…

Zooming and filtering

Getting ready

How to do it...

How it works…

There's more...

14. Three-dimensional Visualizations

Introduction

Three-dimensional scatter plots

Getting ready

How to do it…

How it works…

There's more…

See also...

Three-dimensional scatter plots with a regression plane

Getting ready

How to do it…

How it works…

There's more…

Three-dimensional bar charts

Getting ready

How to do it…

How it works…

Three-dimensional density plots

Getting ready

How to do it...

How it works…

15. Finalizing Graphs for Publications and Presentations

Introduction

Exporting graphs in high-resolution image formats – PNG, JPEG, BMP, and TIFF

Getting ready

How to do it...

How it works...

There's more

See also

Exporting graphs in vector formats – SVG, PDF, and PS

Getting ready

How to do it...

How it works...

There's more

Adding mathematical and scientific notations (typesetting)

Getting ready

How to do it...

How it works...

There's more

Adding text descriptions to graphs

Getting ready

How to do it...

How it works...

There's more

Using graph templates

Getting ready

How to do it...

How it works...

There's more

Choosing font families and styles under Windows, Mac OS X, and Linux

Getting ready

How to do it...

How it works...

There's more

See also

Choosing fonts for PostScripts and PDFs

Getting ready

How to do it...

How it works...

There's more

III. Module 3: Learning Data Mining with R

1. Warming Up

Big data

Scalability and efficiency

Data source

Data mining

Feature extraction

Summarization

The data mining process

CRISP-DM

SEMMA

Social network mining

Social network

Text mining

Information retrieval and text mining

Mining text for prediction

Web data mining

Why R?

What are the disadvantages of R?

Statistics

Statistics and data mining

Statistics and machine learning

Statistics and R

The limitations of statistics on data mining

Machine learning

Approaches to machine learning

Machine learning architecture

Data attributes and description

Numeric attributes

Categorical attributes

Data description

Data measuring

Data cleaning

Missing values

Junk, noisy data, or outlier

Data integration

Data dimension reduction

Eigenvalues and Eigenvectors

Principal-Component Analysis

Singular-value decomposition

CUR decomposition

Data transformation and discretization

Data transformation

Normalization data transformation methods

Data discretization

Visualization of results

Visualization with R

2. Mining Frequent Patterns, Associations, and Correlations

An overview of associations and patterns

Patterns and pattern discovery

The frequent itemset

The frequent subsequence

The frequent substructures

Relationship or rules discovery

Association rules

Correlation rules

Market basket analysis

The market basket model

A-Priori algorithms

Input data characteristics and data structure

The A-Priori algorithm

The R implementation

A-Priori algorithm variants

The Eclat algorithm

The R implementation

The FP-growth algorithm

Input data characteristics and data structure

The FP-growth algorithm

The R implementation

The GenMax algorithm with maximal frequent itemsets

The R implementation

The Charm algorithm with closed frequent itemsets

The R implementation

The algorithm to generate association rules

The R implementation

Hybrid association rules mining

Mining multilevel and multidimensional association rules

Constraint-based frequent pattern mining

Mining sequence dataset

Sequence dataset

The GSP algorithm

The R implementation

The SPADE algorithm

The R implementation

Rule generation from sequential patterns

High-performance algorithms

3. Classification

Classification

Generic decision tree induction

Attribute selection measures

Tree pruning

General algorithm for the decision tree generation

The R implementation

High-value credit card customers classification using ID3

The ID3 algorithm

The R implementation

Web attack detection

High-value credit card customers classification

Web spam detection using C4.5

The C4.5 algorithm

The R implementation

A parallel version with MapReduce

Web spam detection

Web key resource page judgment using CART

The CART algorithm

The R implementation

Web key resource page judgment

Trojan traffic identification method and Bayes classification

Estimating

Prior probability estimation

Likelihood estimation

The Bayes classification

The R implementation

Trojan traffic identification method

Identify spam e-mail and Naïve Bayes classification

The Naïve Bayes classification

The R implementation

Identify spam e-mail

Rule-based classification of player types in computer games and rule-based classification

Transformation from decision tree to decision rules

Rule-based classification

Sequential covering algorithm

The RIPPER algorithm

The R implementation

Rule-based classification of player types in computer games

4. Advanced Classification

Ensemble (EM) methods

The bagging algorithm

The boosting and AdaBoost algorithms

The Random forests algorithm

The R implementation

Parallel version with MapReduce

Biological traits and the Bayesian belief network

The Bayesian belief network (BBN) algorithm

The R implementation

Biological traits

Protein classification and the k-Nearest Neighbors algorithm

The kNN algorithm

The R implementation

Document retrieval and Support Vector Machine

The SVM algorithm

The R implementation

Parallel version with MapReduce

Document retrieval

Classification using frequent patterns

The associative classification

CBA

Discriminative frequent pattern-based classification

The R implementation

Text classification using sentential frequent itemsets

Classification using the backpropagation algorithm

The BP algorithm

The R implementation

Parallel version with MapReduce

5. Cluster Analysis

Search engines and the k-means algorithm

The k-means clustering algorithm

The kernel k-means algorithm

The k-modes algorithm

The R implementation

Parallel version with MapReduce

Search engine and web page clustering

Automatic abstraction of document texts and the k-medoids algorithm

The PAM algorithm

The R implementation

Automatic abstraction and summarization of document text

The CLARA algorithm

The R implementation

CLARANS

The CLARANS algorithm

The R implementation

Unsupervised image categorization and affinity propagation clustering

Affinity propagation clustering

The R implementation

Unsupervised image categorization

The spectral clustering algorithm

The R implementation

News categorization and hierarchical clustering

Agglomerative hierarchical clustering

The BIRCH algorithm

The chameleon algorithm

The Bayesian hierarchical clustering algorithm

The probabilistic hierarchical clustering algorithm

The R implementation

News categorization

6. Advanced Cluster Analysis

Customer categorization analysis of e-commerce and DBSCAN

The DBSCAN algorithm

Customer categorization analysis of e-commerce

Clustering web pages and OPTICS

The OPTICS algorithm

The R implementation

Clustering web pages

Visitor analysis in the browser cache and DENCLUE

The DENCLUE algorithm

The R implementation

Visitor analysis in the browser cache

Recommendation system and STING

The STING algorithm

The R implementation

Recommendation systems

Web sentiment analysis and CLIQUE

The CLIQUE algorithm

The R implementation

Web sentiment analysis

Opinion mining and WAVE clustering

The WAVE cluster algorithm

The R implementation

Opinion mining

User search intent and the EM algorithm

The EM algorithm

The R implementation

The user search intent

Customer purchase data analysis and clustering high-dimensional data

The MAFIA algorithm

The SURFING algorithm

The R implementation

Customer purchase data analysis

SNS and clustering graph and network data

The SCAN algorithm

The R implementation

Social networking service (SNS)

7. Outlier Detection

Credit card fraud detection and statistical methods

The likelihood-based outlier detection algorithm

The R implementation

Credit card fraud detection

Activity monitoring – the detection of fraud involving mobile phones and proximity-based methods

The NL algorithm

The FindAllOutsM algorithm

The FindAllOutsD algorithm

The distance-based algorithm

The Dolphin algorithm

The R implementation

Activity monitoring and the detection of mobile fraud

Intrusion detection and density-based methods

The OPTICS-OF algorithm

The High Contrast Subspace algorithm

The R implementation

Intrusion detection

Intrusion detection and clustering-based methods

Hierarchical clustering to detect outliers

The k-means-based algorithm

The ODIN algorithm

The R implementation

Monitoring the performance of the web server and classification-based methods

The OCSVM algorithm

The one-class nearest neighbor algorithm

The R implementation

Monitoring the performance of the web server

Detecting novelty in text, topic detection, and mining contextual outliers

The conditional anomaly detection (CAD) algorithm

The R implementation

Detecting novelty in text and topic detection

Collective outliers on spatial data

The route outlier detection (ROD) algorithm

The R implementation

Characteristics of collective outliers

Outlier detection in high-dimensional data

The brute-force algorithm

The HilOut algorithm

The R implementation

8. Mining Stream, Time-series, and Sequence Data

The credit card transaction flow and STREAM algorithm

The STREAM algorithm

The single-pass-any-time clustering algorithm

The R implementation

The credit card transaction flow

Predicting future prices and time-series analysis

The ARIMA algorithm

Predicting future prices

Stock market data and time-series clustering and classification

The hError algorithm

Time-series classification with the 1NN classifier

The R implementation

Stock market data

Web click streams and mining symbolic sequences

The TECNO-STREAMS algorithm

The R implementation

Web click streams

Mining sequence patterns in transactional databases

The PrefixSpan algorithm

The R implementation

9. Graph Mining and Network Analysis

Graph mining

Graph

Graph mining algorithms

Mining frequent subgraph patterns

The gPLS algorithm

The GraphSig algorithm

The gSpan algorithm

Rightmost path extensions and their supports

The subgraph isomorphism enumeration algorithm

The canonical checking algorithm

The R implementation

Social network mining

Community detection and the shingling algorithm

The node classification and iterative classification algorithms

The R implementation

10. Mining Text and Web Data

Text mining and TM packages

Text summarization

Topic representation

The multidocument summarization algorithm

The Maximal Marginal Relevance algorithm

The R implementation

The question answering system

Genre categorization of web pages

Categorizing newspaper articles and newswires into topics

The N-gram-based text categorization

The R implementation

Web usage mining with web logs

The FCA-based association rule mining algorithm

The R implementation

IV. Module 4: Mastering R for Quantitative Finance

1. Time Series Analysis

Multivariate time series analysis

Cointegration

Vector autoregressive models

VAR implementation example

Cointegrated VAR and VECM

Volatility modeling

GARCH modeling with the rugarch package

The standard GARCH model

The Exponential GARCH model (EGARCH)

The Threshold GARCH model (TGARCH)

Simulation and forecasting

References and reading list

2. Factor Models

Arbitrage pricing theory

Implementation of APT

Fama-French three-factor model

Modeling in R

Data selection

Estimation of APT with principal component analysis

Estimation of the Fama-French model

References

3. Forecasting Volume

Motivation

The intensity of trading

The volume forecasting model

Implementation in R

The data

Loading the data

The seasonal component

AR(1) estimation and forecasting

SETAR estimation and forecasting

Interpreting the results

References

4. Big Data – Advanced Analytics

Getting data from open sources

Introduction to big data analysis in R

K-means clustering on big data

Loading big matrices

Big data K-means clustering analysis

Big data linear regression analysis

Loading big data

Fitting a linear regression model on large datasets

References

5. FX Derivatives

Terminology and notations

Currency options

Exchange options

Two-dimensional Wiener processes

The Margrabe formula

Application in R

Quanto options

Pricing formula for a call quanto

Pricing a call quanto in R

References

6. Interest Rate Derivatives and Models

The Black model

Pricing a cap with Black's model

The Vasicek model

The Cox-Ingersoll-Ross model

Parameter estimation of interest rate models

Using the SMFI5 package

References

7. Exotic Options

A general pricing approach

The role of dynamic hedging

How R can help a lot

A glance beyond vanillas

Greeks – the link back to the vanilla world

Pricing the Double-no-touch option

Another way to price the Double-no-touch option

The life of a Double-no-touch option – a simulation

Exotic options embedded in structured products

References

8. Optimal Hedging

Hedging of derivatives

Market risk of derivatives

Static delta hedge

Dynamic delta hedge

Comparing the performance of delta hedging

Hedging in the presence of transaction costs

Optimization of the hedge

Optimal hedging in the case of absolute transaction costs

Optimal hedging in the case of relative transaction costs

Further extensions

References

9. Fundamental Analysis

The basics of fundamental analysis

Collecting data

Revealing connections

Including multiple variables

Separating investment targets

Setting classification rules

Backtesting

Industry-specific investment

References

10. Technical Analysis, Neural Networks, and Logoptimal Portfolios

Market efficiency

Technical analysis

The TA toolkit

Markets

Plotting charts - bitcoin

Built-in indicators

SMA and EMA

RSI

MACD

Candle patterns: key reversal

Evaluating the signals and managing the position

A word on money management

Wrapping up

Neural networks

Forecasting bitcoin prices

Evaluation of the strategy

Logoptimal portfolios

A universally consistent, non-parametric investment strategy

Evaluation of the strategy

References

11. Asset and Liability Management

Data preparation

Data source at first glance

Cash-flow generator functions

Preparing the cash-flow

Interest rate risk measurement

Liquidity risk measurement

Modeling non-maturity deposits

A model of deposit interest rate development

Static replication of non-maturity deposits

References

12. Capital Adequacy

Principles of the Basel Accords

Basel I

Basel II

Minimum capital requirements

Supervisory review

Transparency

Basel III

Risk measures

Analytical VaR

Historical VaR

Monte-Carlo simulation

Risk categories

Market risk

Credit risk

Operational risk

References

13. Systemic Risks

Systemic risk in a nutshell

The dataset used in our examples

Core-periphery decomposition

Implementation in R

Results

The simulation method

The simulation

Implementation in R

Results

Possible interpretations and suggestions

References

V. Module 5: Machine Learning with R module

1. Introducing Machine Learning

The origins of machine learning

Uses and abuses of machine learning

Machine learning successes

The limits of machine learning

Machine learning ethics

How machines learn

Data storage

Abstraction

Generalization

Evaluation

Machine learning in practice

Types of input data

Types of machine learning algorithms

Matching input data to algorithms

Machine learning with R

Installing R packages

Loading and unloading R packages

2. Managing and Understanding Data

R data structures

Vectors

Factors

Lists

Data frames

Matrixes and arrays

Managing data with R

Saving, loading, and removing R data structures

Importing and saving data from CSV files

Exploring and understanding data

Exploring the structure of data

Exploring numeric variables

Measuring the central tendency – mean and median

Measuring spread – quartiles and the five-number summary

Visualizing numeric variables – boxplots

Visualizing numeric variables – histograms

Understanding numeric data – uniform and normal distributions

Measuring spread – variance and standard deviation

Exploring categorical variables

Measuring the central tendency – the mode

Exploring relationships between variables

Visualizing relationships – scatterplots

Examining relationships – two-way cross-tabulations

3. Lazy Learning – Classification Using Nearest Neighbors

Understanding nearest neighbor classification

The k-NN algorithm

Measuring similarity with distance

Choosing an appropriate k

Preparing data for use with k-NN

Why is the k-NN algorithm lazy?

Example – diagnosing breast cancer with the k-NN algorithm

Step 1 – collecting data

Step 2 – exploring and preparing the data

Transformation – normalizing numeric data

Data preparation – creating training and test datasets

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Transformation – z-score standardization

Testing alternative values of k

4. Probabilistic Learning – Classification Using Naive Bayes

Understanding Naive Bayes

Basic concepts of Bayesian methods

Understanding probability

Understanding joint probability

Computing conditional probability with Bayes' theorem

The Naive Bayes algorithm

Classification with Naive Bayes

The Laplace estimator

Using numeric features with Naive Bayes

Example – filtering mobile phone spam with the Naive Bayes algorithm

Step 1 – collecting data

Step 2 – exploring and preparing the data

Data preparation – cleaning and standardizing text data

Data preparation – splitting text documents into words

Data preparation – creating training and test datasets

Visualizing text data – word clouds

Data preparation – creating indicator features for frequent words

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

5. Divide and Conquer – Classification Using Decision Trees and Rules

Understanding decision trees

Divide and conquer

The C5.0 decision tree algorithm

Choosing the best split

Pruning the decision tree

Example – identifying risky bank loans using C5.0 decision trees

Step 1 – collecting data

Step 2 – exploring and preparing the data

Data preparation – creating random training and test datasets

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Boosting the accuracy of decision trees

Making some mistakes more costly than others

Understanding classification rules

Separate and conquer

The 1R algorithm

The RIPPER algorithm

Rules from decision trees

What makes trees and rules greedy?

Example – identifying poisonous mushrooms with rule learners

Step 1 – collecting data

Step 2 – exploring and preparing the data

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

6. Forecasting Numeric Data – Regression Methods

Understanding regression

Simple linear regression

Ordinary least squares estimation

Correlations

Multiple linear regression

Example – predicting medical expenses using linear regression

Step 1 – collecting data

Step 2 – exploring and preparing the data

Exploring relationships among features – the correlation matrix

Visualizing relationships among features – the scatterplot matrix

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Model specification – adding non-linear relationships

Transformation – converting a numeric variable to a binary indicator

Model specification – adding interaction effects

Putting it all together – an improved regression model

Understanding regression trees and model trees

Adding regression to trees

Example – estimating the quality of wines with regression trees and model trees

Step 1 – collecting data

Step 2 – exploring and preparing the data

Step 3 – training a model on the data

Visualizing decision trees

Step 4 – evaluating model performance

Measuring performance with the mean absolute error

Step 5 – improving model performance

7. Black Box Methods – Neural Networks and Support Vector Machines

Understanding neural networks

From biological to artificial neurons

Activation functions

Network topology

The number of layers

The direction of information travel

The number of nodes in each layer

Training neural networks with backpropagation

Example – Modeling the strength of concrete with ANNs

Step 1 – collecting data

Step 2 – exploring and preparing the data

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Understanding Support Vector Machines

Classification with hyperplanes

The case of linearly separable data

The case of nonlinearly separable data

Using kernels for non-linear spaces

Example – performing OCR with SVMs

Step 1 – collecting data

Step 2 – exploring and preparing the data

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

8. Finding Patterns – Market Basket Analysis Using Association Rules

Understanding association rules

The Apriori algorithm for association rule learning

Measuring rule interest – support and confidence

Building a set of rules with the Apriori principle

Example – identifying frequently purchased groceries with association rules

Step 1 – collecting data

Step 2 – exploring and preparing the data

Data preparation – creating a sparse matrix for transaction data

Visualizing item support – item frequency plots

Visualizing the transaction data – plotting the sparse matrix

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Sorting the set of association rules

Taking subsets of association rules

Saving association rules to a file or data frame

9. Finding Groups of Data – Clustering with k-means

Understanding clustering

Clustering as a machine learning task

The k-means clustering algorithm

Using distance to assign and update clusters

Choosing the appropriate number of clusters

Example – finding teen market segments using k-means clustering

Step 1 – collecting data

Step 2 – exploring and preparing the data

Data preparation – dummy coding missing values

Data preparation – imputing the missing values

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

10. Evaluating Model Performance

Measuring performance for classification

Working with classification prediction data in R

A closer look at confusion matrices

Using confusion matrices to measure performance

Beyond accuracy – other measures of performance

The kappa statistic

Sensitivity and specificity

Precision and recall

The F-measure

Visualizing performance trade-offs

ROC curves

Estimating future performance

The holdout method

Cross-validation

Bootstrap sampling

11. Improving Model Performance

Tuning stock models for better performance

Using caret for automated parameter tuning

Creating a simple tuned model

Customizing the tuning process

Improving model performance with meta-learning

Understanding ensembles

Bagging

Boosting

Random forests

Training random forests

Evaluating random forest performance

12. Specialized Machine Learning Topics

Working with proprietary files and databases

Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files

Querying data in SQL databases

Working with online data and services

Downloading the complete text of web pages

Scraping data from web pages

Parsing XML documents

Parsing JSON from web APIs

Working with domain-specific data

Analyzing bioinformatics data

Analyzing and visualizing network data

Improving the performance of R

Managing very large datasets

Generalizing tabular data structures with dplyr

Making data frames faster with data.table

Creating disk-based data frames with ff

Using massive matrices with bigmemory

Learning faster with parallel computing

Measuring execution time

Working in parallel with multicore and snow

Taking advantage of parallel with foreach and doParallel

Parallel cloud computing with MapReduce and Hadoop

GPU computing

Deploying optimized learning algorithms

Building bigger regression models with biglm

Growing bigger and faster random forests with bigrf

Training and evaluating models in parallel with caret

A. Reflect and Test Yourself Answers

Module 1: Data Analysis with R

Chapter 1: RefresheR

Chapter 2: The Shape of Data

Chapter 3: Describing Relationships

Chapter 4: Probability

Chapter 5: Using Data to Reason About the World

Chapter 6: Testing Hypotheses

Chapter 7: Bayesian Methods

Chapter 8: Predicting Continuous Variables

Chapter 9: Predicting Categorical Variables

Chapter 10: Sources of Data

Chapter 11: Dealing with Messy Data

Chapter 12: Dealing with Large Data

Module 2: R Graphs

Chapter 1: R Graphics

Chapter 2: Basic Graph Functions

Chapter 3: Beyond the Basics – Adjusting Key Parameters

Chapter 4: Creating Scatter Plots

Chapter 5: Creating Line Graphs and Time Series Charts

Chapter 6: Creating Bar, Dot, and Pie Charts

Chapter 7: Creating Histograms

Chapter 8: Box and Whisker Plots

Chapter 9: Creating Heat Maps and Contour Plots

Module 4: Mastering R for Quantitative Finance

Chapter 1: Time Series Analysis

Chapter 3: Forecasting Volume

Chapter 4: Big Data – Advanced Analytics

Chapter 5: FX Derivatives

Chapter 6: Interest Rate Derivatives and Models

Chapter 7: Exotic Options

Chapter 8: Optimal Hedging

Chapter 9: Fundamental Analysis

Module 5: Machine Learning with R

Chapter 1: Introducing Machine Learning

Chapter 2: Managing and Understanding Data

Chapter 3: Lazy Learning – Classification Using Nearest Neighbors

Chapter 4: Probabilistic Learning – Classification Using Naive Bayes

Chapter 5: Divide and Conquer – Classification Using Decision Trees and Rules

Chapter 6: Forecasting Numeric Data – Regression Methods

Chapter 7: Black Box Methods – Neural Networks and Support Vector Machines

Chapter 8: Finding Patterns – Market Basket Analysis Using Association Rules

B. Bibliography

Index

R: Data Analysis and Visualization

A course in five modules

Master the art of building analytical models using R with your Course Guide Edwin Moses

Learn data analysis, data visualization, data mining, and machine learning with R, and learn to build quantitative finance models in the same powerful language

To contact your Course Guide

Email: <edwinm@packtpub.com>

BIRMINGHAM - MUMBAI

Meet Your Course Guide

Welcome to this course on R, the statistical programming language for data scientists and statisticians. With this course, you'll embark on a journey of learning R for data science.

If you have any questions along the way, you can reach out to me over email and I'll make sure you get everything from the course that we've planned – for you to become a working R developer. Details of how to contact me are included on the first page of this course.

Course Structure

The R learning path created for you has five connected modules. Each of these modules is a mini-course in its own right, and as you complete each one, you'll have gained key skills and be ready for the material in the next module!

Now, let’s look at the pathway these modules create and how they will take you from doing data analysis with R to creating analytical models based on machine learning.

Course journey

This course begins by looking at the Data Analysis with R module. This module will help you navigate the R environment. You'll gain a thorough understanding of statistical reasoning and sampling. Finally, you'll be able to put best practices into effect to make your job easier and facilitate reproducibility.

The second place to explore is R Graphs. This module will help you leverage powerful default R graphics and utilize advanced graphics systems such as lattice and ggplot2, the grammar of graphics. Through inspecting large datasets using tableplot and stunning three-dimensional visualizations, you will know how to produce, customize, and publish advanced visualizations using this popular, and powerful, framework.

With the third module, Learning Data Mining with R, you will learn how to manipulate data with R using code snippets and be introduced to mining frequent patterns, association, and correlations while working with R programs. Discover how to write code for various prediction models, stream data, and time-series data. You will also be introduced to solutions written in R based on RHadoop projects. You will finish this module feeling confident in your ability to know which data mining algorithm to apply in any situation.

The Mastering R for Quantitative Finance module pragmatically introduces both the quantitative finance concepts and their modeling in R, enabling you to build a tailor-made trading system on your own. By the end of the module, you will be well versed with various financial techniques using R and will be able to place good bets while making financial decisions.

Finally, we'll look at the Machine Learning with R module. With this module, you'll discover all the analytical tools you need to gain insights from complex data and learn how to choose the correct algorithm for your specific needs. Through full engagement with the sort of real-world problems data-wranglers face, you'll learn to apply machine learning methods to deal with common tasks, including classification, prediction, forecasting, market analysis, and clustering.

The Course Roadmap and Timeline

Here's a view of the entire course plan before we begin. This grid gives you a topic overview of the whole course and its modules, so you can see how we will move through particular phases of learning to use R, what skills you’ll be learning along the way, and what you can do with those skills at each point. I also offer you an estimate of the time you might want to take for each module, although a lot depends on your learning style and how much time you’re able to give the course each week!

Part I. Module 1: Data Analysis with R

Chapter 1. RefresheR

Before we dive into the (other) fun stuff (sampling multi-dimensional probability distributions, using convex optimization to fit data models, and so on), it would be helpful if we review those aspects of R that all subsequent chapters will assume knowledge of.

If you fancy yourself as an R guru, you should still, at least, skim through this chapter, because you'll almost certainly find the idioms, packages, and style introduced here to be beneficial in following along with the rest of the material.

If you don't care much about R (yet), and are just in this for the statistics, you can heave a heavy sigh of relief that, for the most part, you can run the code given in this book in the interactive R interpreter with very little modification, and just follow along with the ideas. However, it is my belief (read: delusion) that by the end of this book, you'll cultivate a newfound appreciation of R alongside a robust understanding of methods in data analysis.

Fire up your R interpreter, and let's get started!

Navigating the basics

In the interactive R interpreter, any line starting with a > character denotes R asking for input. (If you see a + prompt, it means that you didn't finish typing a statement at the prompt, and R is asking you to provide the rest of the expression.) Striking the return key will send your input to R to be evaluated. R's response is then spit back at you in the line immediately following your input, after which R asks for more input. This is called a REPL (Read-Evaluate-Print Loop). It is also possible for R to read a batch of commands saved in a file (unsurprisingly called batch mode), but we'll be using the interactive mode for most of the book.

As you might imagine, R supports the same familiar mathematical operators as most other languages:

Arithmetic and assignment

Check out the following example:

> 2 + 2

[1] 4

> 9 / 3

[1] 3

> 5 %% 2 # modulus operator (remainder of 5 divided by 2)

[1] 1

Anything that occurs after the octothorpe or pound sign, #, (or hash-tag for you young'uns), is ignored by the R interpreter. This is useful for documenting the code in natural language. These are called comments.

In a multi-operation arithmetic expression, R will follow the standard order of operations from math. In order to override this natural order, you have to use parentheses flanking the sub-expression that you'd like to be performed first.

> 3 + 2 - 10 ^ 2 # ^ is the exponent operator

[1] -95

> 3 + (2 - 10) ^ 2

[1] 67
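One precedence rule that often surprises people is worth a quick sketch: exponentiation is applied before a leading minus sign. The variable names below are made up purely for illustration.

```r
# Exponentiation binds tighter than unary minus
neg.square <- -2 ^ 2      # parsed as -(2 ^ 2), which is -4
pos.square <- (-2) ^ 2    # parentheses force the negation first: 4

# Multiplication comes before addition unless parentheses say otherwise
mixed   <- 1 + 2 * 3      # 7
grouped <- (1 + 2) * 3    # 9

print(c(neg.square, pos.square, mixed, grouped))
```

When in doubt, adding parentheses costs nothing and makes the intended order explicit.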

In practice, almost all compound expressions are split up, with intermediate values assigned to variables. Using a variable in a later expression is just like substituting the value that was assigned to it. The (primary) assignment operator is <-.

> # assignments follow the form VARIABLE <- VALUE

> var <- 10

> var

[1] 10

> var ^ 2

[1] 100

> VAR / 2 # variable names are case-sensitive

Error: object 'VAR' not found

Notice that the first and second lines in the preceding code snippet didn't have an output to be displayed, so R just immediately asked for more input. The first line is a comment, and the second is an assignment; an assignment returns its value invisibly, so nothing is printed. Its only job is to give a value to a variable, or to change the existing value of a variable. Generally, operations and functions on variables in R don't change the value of the variable. Instead, they return the result of the operation. If you want to change a variable to the result of an operation using that variable, you have to reassign that variable as follows:

> var # var is 10

[1] 10

> var ^ 2

[1] 100

> var # var is still 10

[1] 10

> var <- var ^ 2 # nothing is printed

> var # var is now 100

[1] 100

Be aware that variable names may contain numbers, underscores, and periods; this is something that trips up a lot of people who are familiar with other programming languages that disallow using periods in variable names. The only further restrictions on variable names are that they must start with a letter (or a period followed by a letter), and that they must not be one of the reserved words in R such as TRUE, Inf, and so on.
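Here is a small sketch of those naming rules in action. All of the names are hypothetical and chosen only to illustrate the point.

```r
# Hypothetical names, used only to illustrate the naming rules
total.sales   <- 100    # periods are allowed
total_sales_2 <- 200    # so are underscores and digits
.scratch      <- 300    # a leading period must be followed by a letter

# Names are case-sensitive, so these are two distinct variables
rate <- 0.5
Rate <- 0.75

grand.total <- total.sales + total_sales_2 + .scratch
print(grand.total)
```

By contrast, a name such as 2fast is a syntax error because it starts with a digit, and trying to assign to TRUE fails because it is a reserved word.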

Although the arithmetic operators that we've seen thus far are functions in their own right, most functions in R take the form: function_name (value(s) supplied to the function). The values supplied to the function are called arguments of that function.

> cos(3.14159) # cosine function

[1] -1

> cos(pi) # pi is a constant that R provides

[1] -1

> acos(-1) # arccosine function

[1] 3.141593

> acos(cos(pi)) + 10

[1] 13.14159

> # functions can be used as arguments to other functions

(If you paid attention in math class, you'll know that the cosine of π is -1, and that arccosine is the inverse function of cosine.)

There are hundreds of such useful functions defined in base R, only a handful of which we will see in this book. Two sections from now, we will be building our very own functions.

Before we move on from arithmetic, it will serve us well to visit some of the odd values that may result from certain operations:

> 1 / 0

[1] Inf

> 0 / 0

[1] NaN

It is common during practical usage of R to accidentally divide by zero. As you can see, this undefined operation yields an infinite value in R. Dividing zero by zero yields the value NaN, which stands for Not a Number.
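If you need to check for these odd values in your own code, base R provides predicate functions for the purpose. The following is a minimal sketch; the variable names are illustrative only.

```r
pos.inf <- 1 / 0     # Inf
neg.inf <- -1 / 0    # -Inf
not.num <- 0 / 0     # NaN

is.infinite(pos.inf)                     # TRUE
is.nan(not.num)                          # TRUE
finite.flags <- is.finite(c(1, pos.inf, not.num))
print(finite.flags)                      # TRUE FALSE FALSE

# Comparing NaN with anything, itself included, yields NA
# rather than TRUE or FALSE
nan.compare <- not.num == not.num
print(nan.compare)                       # NA
```

This is why is.nan() is the safer test for NaN than the == operator.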

Logicals and characters

So far, we've only been dealing with numerics, but there are other atomic data types in R. To wit:

> foo <- TRUE # foo is of the logical data type

> class(foo) # class() tells us the type

[1] "logical"

> bar <- "hi!" # bar is of the character data type

> class(bar)

[1] "character"

The logical data type (also called Booleans) can hold the values TRUE or FALSE or, equivalently, T or F. The familiar operators from Boolean algebra are defined for these types:

> foo

[1] TRUE

> foo && TRUE # boolean and

[1] TRUE

> foo && FALSE

[1] FALSE

> foo || FALSE # boolean or

[1] TRUE

> !foo # negation operator

[1] FALSE

In a Boolean expression with a logical value and a number, any number that is not 0 is interpreted as TRUE.

> foo && 1

[1] TRUE

> foo && 2

[1] TRUE

> foo && 0

[1] FALSE

Additionally, there are functions and operators that return logical values such as:

> 4 < 2 # less than operator

[1] FALSE

> 4 >= 4 # greater than or equal to

[1] TRUE

> 3 == 3 # equality operator

[1] TRUE

> 3 != 2 # inequality operator

[1] TRUE

Just as there are functions in R that are defined only for the numeric and logical data types, there are other functions that are designed to work only with the character data type, also known as strings:

> lang.domain <- "statistics"

> lang.domain <- toupper(lang.domain)

> print(lang.domain)

[1] "STATISTICS"

> # retrieves substring from first character to fourth character

> substr(lang.domain, 1, 4)

[1] "STAT"

> gsub("I", "1", lang.domain) # substitutes 1 for every I

[1] "STAT1ST1CS"

> # combines character strings

> paste("R does", lang.domain, "!!!")

[1] "R does STATISTICS !!!"
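A few more string helpers from base R are sketched below, in case you want to experiment. The variable names shout and tight are made up for this example.

```r
lang.domain <- "STATISTICS"

n.chars <- nchar(lang.domain)      # number of characters: 10
lowered <- tolower(lang.domain)    # "statistics"

# paste() separates its pieces with a space by default;
# the sep argument changes the separator
shout <- paste("R does", lang.domain, "!!!", sep = " -- ")

# paste0() is paste() with no separator at all
tight <- paste0("R does ", lang.domain, "!!!")

print(shout)
print(tight)
```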

Flow of control

The last topic in this section will be flow of control constructs.

The most basic flow of control construct is the if statement. The argument to an if statement (what goes between the parentheses), is an expression that returns a logical value. The block of code following the if statement gets executed only if the expression yields TRUE. For example:

> if(2 + 2 == 4)

+ print("very good")

[1] "very good"

> if(2 + 2 == 5)

+ print("all hail to the thief")

It is possible to execute more than one statement if an if condition is triggered; you just have to use curly brackets ({}) to contain the statements.

> if((4/2==2) && (2*2==4)){

+ print("four divided by two is two...")

+ print("and two times two is four")

+ }

[1] "four divided by two is two..."

[1] "and two times two is four"

It is also possible to specify a block of code that will get executed if the if conditional is FALSE.

> closing.time <- TRUE

> if(closing.time){

+ print("you don't have to go home")

+ print("but you can't stay here")

+ } else{

+ print("you can stay here!")

+ }

[1] "you don't have to go home"

[1] "but you can't stay here"

> if(!closing.time){

+ print("you don't have to go home")

+ print("but you can't stay here")

+ } else{

+ print("you can stay here!")

+ }

[1] "you can stay here!"

There are other flow of control constructs (like while and for), but we won't directly be using them much in this text.
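For completeness, here is a minimal sketch of what for and while loops look like, should you run into them elsewhere. The variable names and numbers are purely illustrative.

```r
# for: visit each element of a vector in turn
total <- 0
for (i in 1:5) {
  total <- total + i
}
print(total)       # 15

# while: repeat the block as long as the condition is TRUE
value <- 100
halvings <- 0
while (value > 1) {
  value <- value / 2
  halvings <- halvings + 1
}
print(halvings)    # 7
```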

Getting help in R

Before we go further, it would serve us well to have a brief section detailing how to get help in R. Most R tutorials leave this for one of the last sections—if it is even included at all! In my own personal experience, though, getting help is going to be one of the first things you will want to do as you add more bricks to your R knowledge castle. Learning R doesn't have to be difficult; just take it slowly, ask questions, and get help early. Go you!

It is easy to get help with R right at the console. Running the help.start() function at the prompt will start a manual browser. From here, you can do anything from going over the basics of R to reading the nitty-gritty details on how R works internally.

You can get help on a particular function in R if you know its name, by supplying that name as an argument to the help function. For example, let's say you want to know more about the gsub() function that I sprang on you before. Running the following code:

> help(gsub)

> # or simply

> ?gsub

will display a manual page documenting what the function is, how to use it, and examples of its usage.

This rapid accessibility to documentation means that I'm never hopelessly lost when I encounter a function which I haven't seen before. The downside to this extraordinarily convenient help mechanism is that I rarely bother to remember the order of arguments, since looking them up is just seconds away.

Occasionally, you won't quite remember the exact name of the function you're looking for, but you'll have an idea about what the name should be. For this, you can use the help.search() function.

> help.search("chisquare")

> # or simply

> ??chisquare

For tougher, more semantic queries, nothing beats a good old-fashioned web search engine. If you don't get relevant results the first time, try adding the term "programming" or "statistics" in there for good measure.
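Two other base R helpers can be handy alongside the ones above; this is just a sketch of how they might be used, not an exhaustive tour. The variable name matches is made up for this example.

```r
# apropos() lists the names of loaded objects that match a pattern,
# which helps when you only remember part of a function's name
matches <- apropos("chisq")
print(matches)

# args() shows a function's argument list without opening the manual page
args(gsub)

# example() runs the examples from a help page; it is commented out here
# only because it prints a lot of output
# example(gsub)
```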

Vectors

Vectors are the most basic data structures in R, and they are ubiquitous indeed. In fact, even the single values that we've been working with thus far were actually vectors of length 1. That's why the interactive R console has been printing [1] along with all of our output.

Vectors are essentially an ordered collection of values of the same atomic data type. Vectors can be arbitrarily large (with some limitations), or they can be just one single value.

The canonical way of building vectors manually is by using the c() function (which stands for combine).

> our.vect <- c(8, 6, 7, 5, 3, 0, 9)

> our.vect

[1] 8 6 7 5 3 0 9

In the preceding example, we created a numeric vector of length 7 (namely, Jenny's telephone number).

Note that if we tried to put character data types into this vector as follows:

> another.vect <- c(8, 6, 7, "-", 3, 0, 9)

> another.vect

[1] "8" "6" "7" "-" "3" "0" "9"

R would convert all the items in the vector (called elements) into character data types to satisfy the condition that all elements of a vector must be of the same type. A similar thing happens when you try to use logical values in a vector with numbers; the logical values would be converted into 1 and 0 (for TRUE and FALSE, respectively). These logicals will turn into "TRUE" and "FALSE" (note the quotation marks) when used in a vector that contains characters.
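The coercion rules just described can be checked directly with class(). A minimal sketch, with illustrative variable names:

```r
# Logicals mixed with numbers become 1 and 0
num.and.logical <- c(1, TRUE, FALSE)
class(num.and.logical)    # "numeric"
print(num.and.logical)    # 1 1 0

# Logicals mixed with characters become the strings "TRUE" and "FALSE"
chr.and.logical <- c("a", TRUE, FALSE)
class(chr.and.logical)    # "character"
print(chr.and.logical)    # "a" "TRUE" "FALSE"
```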

Subsetting

It is very common to want to extract one or more elements from a vector. For this, we use a technique called indexing or subsetting. After the vector, we put an integer in square brackets ([]); the brackets are called the subscript operator. This instructs R to return the element at that index. The indices (plural for index, in case you were wondering!) for vectors in R start at 1, and stop at the length of the vector.

> our.vect[1] # to get the first value

[1] 8

> # the function length() returns the length of a vector

> length(our.vect)

[1] 7

> our.vect[length(our.vect)] # get the last element of a vector

[1] 9

Note that in the preceding code, we used a function in the subscript operator. In cases like these, R evaluates the expression in the subscript operator, and uses the number it returns as the index to extract.

If we get greedy, and try to extract an element at an index that doesn't exist, R will respond with NA, meaning, not available. We will see this special value cropping up from time to time throughout this text.

> our.vect[10]

[1] NA

One of the most powerful ideas in R is that you can use vectors to subset other vectors:

> # extract the first, third, fifth, and

> # seventh element from our vector

> our.vect[c(1, 3, 5, 7)]

[1] 8 7 3 9

The ability to use vectors to index other vectors may not seem like much now, but its usefulness will become clear soon.
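As a hedged preview of where this leads, the index vector does not have to hold positive integers. The sketch below recreates our.vect so that it runs on its own; the other names are made up for illustration.

```r
our.vect <- c(8, 6, 7, 5, 3, 0, 9)

# A logical vector keeps the elements where the index is TRUE
big.digits <- our.vect[our.vect > 5]
print(big.digits)       # 8 6 7 9

# Negative indices drop elements instead of selecting them
without.first <- our.vect[-1]
print(without.first)    # 6 7 5 3 0 9

# An index vector may also repeat or reorder positions
shuffled <- our.vect[c(7, 7, 1)]
print(shuffled)         # 9 9 8
```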

Another way to create vectors is by using sequences.

> other.vector <- 1:10

> other.vector

[1] 1 2 3 4 5 6 7 8 9 10

> another.vector <- seq(50, 30, by=-2)

> another.vector

[1] 50 48 46 44 42 40 38 36 34 32 30

Above, the 1:10 statement creates a vector from 1 to 10. 10:1 would have created the same 10 element vector, but in reverse. The seq() function is more general in that it allows sequences to be made using steps (among many other things).
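To give a flavor of those "other things", here is a short sketch of a few sequence helpers from base R. The variable names are illustrative only.

```r
# length.out asks for a fixed number of evenly spaced values
quarters <- seq(0, 1, length.out = 5)
print(quarters)              # 0.00 0.25 0.50 0.75 1.00

# rev() reverses a vector, so this matches 10:1
countdown <- rev(1:10)
print(countdown)

# seq_along() produces the indices of an existing vector
positions <- seq_along(c("a", "b", "c"))
print(positions)             # 1 2 3
```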

Combining our knowledge of sequences and of using vectors to subset vectors, we can get the first 5 digits of Jenny's number thusly:

> our.vect[1:5]

[1] 8 6 7 5 3

Vectorized functions

Part of what makes R so powerful is that many of R's functions take vectors as arguments. These vectorized functions are usually extremely fast and efficient. We've already seen one such function, length(), but there are many, many others.

> # takes the mean of a vector

> mean(our.vect)

[1] 5.428571

> sd(our.vect) # standard deviation

[1] 3.101459

> min(our.vect)

[1] 0

> max(1:10)

[1] 10

> sum(c(1, 2, 3))

[1] 6

In practical settings, such as when reading data from files, it is common to have NA values in vectors:

> messy.vector <- c(8, 6, NA, 7, 5, NA, 3, 0, 9)

> messy.vector

[1] 8 6 NA 7 5 NA 3 0 9

> length(messy.vector)

[1] 9

Some vectorized functions will not allow NA values by default. In these cases, an extra keyword argument must be supplied along with the first argument to the function.

> mean(messy.vector)
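As a sketch of the extra argument mentioned above: for mean() and several other base R summary functions, that argument is na.rm. The example recreates messy.vector so that it runs on its own; clean.mean and n.missing are names made up for illustration.

```r
messy.vector <- c(8, 6, NA, 7, 5, NA, 3, 0, 9)

# By default, a single missing value makes the whole result NA
raw.mean <- mean(messy.vector)
print(raw.mean)

# na.rm = TRUE drops the missing values before computing the mean
clean.mean <- mean(messy.vector, na.rm = TRUE)
print(clean.mean)

# is.na() flags the missing elements, so summing it counts them
n.missing <- sum(is.na(messy.vector))
print(n.missing)
```

Note that length() still counts the NA elements, so dividing sum(messy.vector, na.rm = TRUE) by length(messy.vector) would not reproduce clean.mean.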
