Statistical Machine Learning

CS 4440
Anushka Gupta, Jian Hua, Ethan Jen
Outline
- Background:
  - What is Machine Learning?
  - Application in Industry
  - Timeline of Development
  - Types of Learning
  - Some General Machine Learning Principles
  - Challenges to Machine Learning
- Products:
  - Commercial: Microsoft Azure
Background Information
What is Machine Learning
Machine learning is the subfield of computer science that gives computers the
ability to learn without being explicitly programmed (Arthur Samuel, 1959)
What is Machine Learning
CMU computer science professor Tom M. Mitchell provided a widely quoted,
more formal definition:
A computer program is said to learn from experience E with respect to some
class of tasks T and performance measure P, if its performance at tasks in T,
as measured by P, improves with experience E.
Application in Industry
Manufacturing
Predictive maintenance or condition monitoring
Warranty reserve estimation
Propensity to buy
Demand forecasting
Process optimization
Telematics
Retail
Predictive inventory planning
Recommendation engines
Upsell and cross-channel marketing
Market segmentation and targeting

Healthcare and Life Science
Alerts and diagnostics from real-time patient data
Disease identification and risk stratification
Patient triage optimization
Proactive health management
Healthcare provider sentiment analysis

Financial services
Risk analytics and regulation
Customer segmentation
Sales and marketing campaign management
Creditworthiness evaluation
Travel and hospitality
Aircraft scheduling
Dynamic pricing
Social media: consumer feedback and interaction analysis
Customer complaint resolution

Products that Use Machine Learning
Google search
Amazon Recommendations
Siri
Self-driving cars

Companies involved
Google
Amazon
Microsoft
Uber
Apple
Tesla
Yahoo
Other major tech companies

Timeline of Development
1950: The Turing Test
1952: The first computer learning program
1957: The first neural network for computers
1967: Nearest neighbor algorithm
1981: Explanation-based learning
1997: Deep Blue
2010: Microsoft Kinect
2011: IBM Watson beats human champions at Jeopardy!
2014: Facebook DeepFace
2015: Amazon launches its own machine learning platform
2016: Google AlphaGo

Types of Learning
1) Supervised Learning
2) Unsupervised Learning
3) Reinforcement Learning
Supervised Learning
Goal:
Learn to predict the output from the input data
Data:
predictors and result (x and y)
Types of problems:
Classification, Regression
Types of algorithms:
Naive Bayes Classifier
Decision Trees
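A minimal sketch of supervised learning with a Naive Bayes classifier, assuming numeric predictors modeled with a Gaussian per class. The function names and the toy height/weight data are made up for illustration and do not come from the slides.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # The small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gaussian_nb(model, features):
    """Return the class with the highest log posterior."""
    def log_posterior(label):
        prior, means, variances = model[label]
        total = math.log(prior)
        for x, m, var in zip(features, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)

# Made-up training data: predictors x = [height_cm, weight_kg], result y = label.
X_train = [[150, 50], [155, 54], [160, 58], [180, 80], [185, 86], [190, 90]]
y_train = ["small", "small", "small", "large", "large", "large"]

model = fit_gaussian_nb(X_train, y_train)
prediction_small = predict_gaussian_nb(model, [152, 52])
prediction_large = predict_gaussian_nb(model, [188, 88])
```

The model only ever sees (x, y) pairs during fitting, which is what makes the setting supervised.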
Unsupervised Learning
Goal:
Discover an underlying structure/description of the data
Data:
only have input data (x)
Types of problems:
Clustering, Association
Types of algorithms:
K-means
Apriori algorithm
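A minimal sketch of K-means clustering on unlabeled input data (x only). It assumes a naive initialization that takes the first k points as starting centroids; the function name and the toy points are illustrative assumptions.

```python
def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans(points, k, iterations=20):
    """Alternate between assigning points and recomputing centroids."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    labels = [min(range(k), key=lambda i: squared_distance(p, centroids[i]))
              for p in points]
    return centroids, labels

# Made-up 2D points forming two visible groups; no labels are supplied.
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, labels = kmeans(points, k=2)
```

The algorithm recovers the two groups from the structure of the data alone, with no y values involved.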
Reinforcement Learning
Goal:
Make decisions: pick the best action in the current state
Data:
Actions, states, rewards
Types of problems:
Game AIs
Markov Decision Processes (MDPs)
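A minimal sketch of how states, actions, and rewards fit together in a Markov Decision Process, solved here with value iteration. The four-state corridor, the reward of 1 for reaching the last state, and the discount factor of 0.9 are all assumptions chosen for illustration.

```python
GAMMA = 0.9                      # assumed discount factor
STATES = [0, 1, 2, 3]            # a made-up corridor of four states
TERMINAL = 3
ACTIONS = {"left": -1, "right": +1}

def step(state, action):
    """Deterministic transition: move, stay inside the corridor, collect reward."""
    next_state = min(max(state + ACTIONS[action], 0), TERMINAL)
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward

def value_iteration(tolerance=1e-9):
    """Repeat the Bellman optimality update until the values stop changing."""
    values = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s == TERMINAL:
                continue
            outcomes = [step(s, a) for a in ACTIONS]
            best = max(reward + GAMMA * values[ns] for ns, reward in outcomes)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tolerance:
            return values

def greedy_policy(values):
    """Pick, in every state, the action with the best one-step lookahead."""
    def lookahead(state, action):
        next_state, reward = step(state, action)
        return reward + GAMMA * values[next_state]
    return {s: max(ACTIONS, key=lambda a: lookahead(s, a))
            for s in STATES if s != TERMINAL}

values = value_iteration()
policy = greedy_policy(values)
```

The resulting policy picks the best decision in each state by comparing discounted future rewards, which is the goal stated above.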
General Machine Learning Principles
- Goal: generate a model from a dataset that is capable of predicting new instances
- Training and Testing Sets
  - Rule of thumb: train on 70% of the data, test on 30% of the data
- Cross-validation
  - Used for model selection
  - Divide the training set into k folds; in each of k iterations, train on k-1 folds and evaluate on the remaining fold
  - Advantages: lets you both train and validate using only the training data; the performance estimate is averaged over the k folds
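A minimal sketch of the 70/30 split and of k-fold cross-validation, written with the standard library only. The helper names, the made-up (x, y) data, and the stand-in model that always predicts the mean target are assumptions for illustration.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def k_fold_splits(rows, k):
    """Yield (train, validation) pairs; each fold is held out exactly once."""
    folds = [rows[i::k] for i in range(k)]
    for held_out in range(k):
        train = [row for j, fold in enumerate(folds) if j != held_out
                 for row in fold]
        yield train, folds[held_out]

def fit_mean(train):
    """Hypothetical stand-in model: always predict the mean training target."""
    return sum(y for _, y in train) / len(train)

def mean_squared_error(model, rows):
    return sum((y - model) ** 2 for _, y in rows) / len(rows)

rows = [(x, 2.0 * x) for x in range(10)]            # made-up (x, y) pairs
train_rows, test_rows = train_test_split(rows)      # 70% train, 30% test

fold_scores = [mean_squared_error(fit_mean(train), validation)
               for train, validation in k_fold_splits(train_rows, k=3)]
cv_score = sum(fold_scores) / len(fold_scores)      # averaged over the folds
```

The test rows are never touched during cross-validation, so they remain available for a final, unbiased check of the selected model.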
General Machine Learning Principles
Eager Learners | Lazy Learners
Ex: Neural Network | Ex: k Nearest Neighbors
Learns as data comes in | Learns only when queried
Saves functions | Saves individual data points
Space efficient | Not space efficient
Slow learning, faster prediction | Fast learning, slow prediction
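A minimal sketch contrasting the two kinds of learners. A least-squares line stands in for the eager learner here (a simpler substitute for a neural network), and a k-nearest-neighbors regressor is the lazy learner; class names and data are made up for illustration.

```python
class EagerLinearRegressor:
    """Does its work at fit time and keeps only two numbers (a function)."""
    def fit(self, xs, ys):
        mean_x = sum(xs) / len(xs)
        mean_y = sum(ys) / len(ys)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        sxx = sum((x - mean_x) ** 2 for x in xs)
        self.slope = sxy / sxx
        self.intercept = mean_y - self.slope * mean_x
        return self

    def predict(self, x):
        return self.slope * x + self.intercept   # constant-time prediction

class LazyKNNRegressor:
    """Stores every data point and defers all work to query time."""
    def __init__(self, k=2):
        self.k = k

    def fit(self, xs, ys):
        self.points = list(zip(xs, ys))          # "learning" is just storing
        return self

    def predict(self, x):
        nearest = sorted(self.points, key=lambda p: abs(p[0] - x))[:self.k]
        return sum(y for _, y in nearest) / len(nearest)

xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]                             # made-up data on y = 2x + 1

eager = EagerLinearRegressor().fit(xs, ys)
lazy = LazyKNNRegressor(k=2).fit(xs, ys)
eager_prediction = eager.predict(10)             # uses the stored function
lazy_prediction = lazy.predict(10)               # averages the 2 nearest points
```

The eager model's memory use is fixed regardless of the dataset size, while the lazy model's storage and query cost both grow with the number of stored points.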
Challenges to Machine Learning
- Overfitting:
  - Trusting the data too much and developing a model that fits the training set but cannot predict new values in the test set
  - Usually identified by high training accuracy but low testing accuracy
  - Techniques such as regularization, pruning, and early stopping help prevent overfitting
- Curse of Dimensionality:
  - Searching in higher-dimensional spaces is much harder
  - Predictive power decreases in higher-dimensional spaces
  - More data is needed to train a model in higher dimensions
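One way to see why search gets harder in high dimensions is a small experiment on distance concentration: with uniformly random points, the nearest and farthest neighbors of a query end up almost equally far away as the dimension grows. The sketch below assumes uniform random data in the unit cube; the function name, point count, and seed are arbitrary choices.

```python
import math
import random

def distance_contrast(dimensions, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest for one random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dimensions)]
    distances = [
        math.dist(query, [rng.random() for _ in range(dimensions)])
        for _ in range(n_points)
    ]
    nearest, farthest = min(distances), max(distances)
    return (farthest - nearest) / nearest

contrast_2d = distance_contrast(2)        # nearest neighbor is clearly closer
contrast_1000d = distance_contrast(1000)  # all points are about equally far
```

A small contrast means "nearest" carries little information, which is one reason distance-based methods need far more data in high dimensions.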

Example of Overfitting
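A minimal sketch of overfitting on made-up one-dimensional data: the true rule is "label 1 when x >= 5", but two training labels are noisy. A 1-nearest-neighbor model memorizes the noise, while a simple threshold rule ignores it. All names and values are illustrative assumptions.

```python
def true_label(x):
    return 1 if x >= 5 else 0

# Training set with two deliberately flipped (noisy) labels at x = 2 and x = 7.
train = [(x, true_label(x)) for x in range(10)]
train[2] = (2, 1)
train[7] = (7, 0)

# Clean test set at nearby, previously unseen inputs.
test = [(x + 0.4, true_label(x + 0.4)) for x in range(10)]

def memorizer(x):
    """1-nearest neighbor: copies the label of the closest training point."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

def threshold_rule(x):
    """A simpler model that cannot fit the noisy points."""
    return 1 if x >= 5 else 0

def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)

memorizer_train = accuracy(memorizer, train)       # fits the noise perfectly
memorizer_test = accuracy(memorizer, test)         # and pays for it on new data
threshold_train = accuracy(threshold_rule, train)
threshold_test = accuracy(threshold_rule, test)
```

The memorizer shows the pattern described earlier, higher training accuracy than testing accuracy, while the simpler rule generalizes better despite fitting the training set less closely.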
Products
Microsoft Azure
- Microsoft's cloud computing platform with a built-in machine learning library as part of the Cortana Intelligence Suite
- Backed by the Microsoft Azure cloud databases
- Drag-and-drop application of machine learning algorithms
- Use the WorkBench UI for experiments, or use the API to build other applications
- Collaboration through a shared WorkBench or Jupyter Notebooks
- Integration of Python and R scripts for scalability and customization of the algorithms
WEKA
- Waikato Environment for Knowledge Analysis, developed at the University of Waikato in New Zealand
- Written in Java, so it runs easily on most devices
- Similar purpose to Azure on a smaller scale:
  - Facilitates ML in an easy-to-use app
  - Similar workflow: import, preprocess, and then build the model
- Free, open-source software with UI or command-line usage
- Preprocessing filters to resample or create discrete values

TensorFlow
- Open-source neural network library initially created by the Google Brain team
- Neural networks can generalize to model most classification problems
- Can be used from C, C++, and Python applications
- TensorFlow interface is in Python
- APIs range from high-level APIs for beginners to lower-level APIs for fine-tuned ML research
- Computation is a graph in which nodes are operations connected by tensors (n-dimensional arrays)
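The idea of a graph whose nodes are operations connected by tensors can be sketched in plain Python. This is not the TensorFlow API: the Node class and the constant, placeholder, matmul, and add helpers are hypothetical names written only to show the concept, with nested lists standing in for tensors.

```python
class Node:
    """One operation in the graph; its inputs are the nodes feeding it."""
    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = tuple(inputs)

    def evaluate(self, feed):
        if self in feed:                  # value supplied from outside
            return feed[self]
        if self.op is None:
            raise KeyError("placeholder was not fed a value")
        args = [node.evaluate(feed) for node in self.inputs]
        return self.op(*args)

def constant(value):
    return Node(lambda: value)

def placeholder():
    return Node(None)

def matmul(a, b):
    def op(x, y):
        return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
                for row in x]
    return Node(op, (a, b))

def add(a, b):
    def op(x, y):
        return [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(x, y)]
    return Node(op, (a, b))

# Build the graph first; nothing is computed yet.
x = placeholder()
weights = constant([[1, 2], [3, 4]])
bias = constant([[10, 20]])
output = add(matmul(x, weights), bias)

# Run the graph by feeding a 1x2 tensor into the placeholder.
result = output.evaluate({x: [[1, 1]]})
```

Separating graph construction from execution is what lets a real framework optimize the graph or place it on different devices before any data flows through it.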
Neural Network
Source: https://i.stack.imgur.com/1bCQl.png
Properties | Microsoft Azure | TensorFlow | WEKA
Types of Algorithms | Classification, Regression, Clustering, Computer Vision, Text Analytics | Neural Network | Data preprocessing, Clustering, Classification, Regression, Feature selection
Data Sources Supported | Azure Cloud Storage, Hadoop, Manual Entry | Desktop upload | SQL Databases, Desktop upload
Research
Where Machine Learning is headed
Machine learning is used in a great number of industries, in applications ranging from self-driving cars to communicating with humans
Chat bots
Security
Recommendations
How do we efficiently learn in settings where exploration is required?
How can we do effective offline evaluation of algorithms?
How can we be both efficient in sample complexity and computational complexity?
How can we learn from lots of data?
How can we learn to index efficiently?

Machine Learning Research
Pedestrian Detection for Autonomous Vehicles
k-Nearest Neighbours
Naive Bayes classifier
Support Vector Machine
The cloud of points generated by the sensor is processed to detect pedestrians by selecting
cubic shapes and applying machine vision and machine learning algorithms to the XY, XZ,
and YZ projections of the points contained in each cube
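The geometric step described above can be sketched as follows: select the points that fall inside one cube, then project them onto the three coordinate planes. This covers only the selection and projection, not the classifiers; the function names, cube size, and sample points are assumptions for illustration.

```python
def points_in_cube(points, corner, size):
    """Keep the points inside the axis-aligned cube [corner, corner + size)."""
    return [p for p in points
            if all(c <= v < c + size for v, c in zip(p, corner))]

def plane_projections(points):
    """Drop one coordinate at a time to get the three 2D views."""
    return {
        "XY": [(x, y) for x, y, z in points],
        "XZ": [(x, z) for x, y, z in points],
        "YZ": [(y, z) for x, y, z in points],
    }

# Made-up 3D sensor points as (x, y, z) tuples.
cloud = [(0.2, 0.3, 1.1), (0.8, 0.1, 1.7), (3.0, 3.0, 0.5)]

inside = points_in_cube(cloud, corner=(0.0, 0.0, 1.0), size=1.0)
views = plane_projections(inside)
```

Each 2D view could then be turned into features and passed to a classifier such as k-nearest neighbours, Naive Bayes, or a support vector machine.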
Human Perception of Images

Subliminal priming of perception of images
Cont.
Neuroimaging for Drug Discovery and Development
Machine learning enables predictions at the individual level based on the distributed effects across the whole brain
Disease detection
Cont.
Financial
Algorithmic trading
High Frequency Trading
Quantopian
Two Sigma
Citadel
Use technical indicators
Loan/Insurance underwriting
Machine Perception
WEKA Demo
Sources
Agost, 5 examples of predictive analytics in the travel industry, 2016, available at http://www.amadeus.com/blog/07/04/5-examples-predictive-analytics-travel-industry/
Buggey, T. (2007, Summer). Storyboard for Ivan's morning routine. Diagram. Journal of Positive Behavior Interventions, 9(3), 151. Retrieved December 14, 2007, from Academic Search Premier database.
Costa, L., Gago, M. F., Yelshyna, D., Ferreira, J., David Silva, H., Rocha, L., & ... Bicho, E. (2016). Application of Machine Learning in Postural Control Kinematics for the Diagnosis of Alzheimer's Disease. Computational Intelligence & Neuroscience, 1-15. doi:10.1155/2016/3891253
Columbus, 10 Ways Machine Learning Is Revolutionizing Manufacturing, 2016, available at http://www.forbes.com/sites/louiscolumbus/2016/06/26/10-ways-machine-learning-is-revolutionizing-manufacturing/#33f053862d7f
Columbus, Machine Learning Is Redefining the Enterprise in 2016, 2016, available at http://www.business2community.com/business-innovation/machine-learning-redefining-enterprise-2016-01569528#ZJmwLJcjKHZbLaXg.97
Doyle, O., Mehta, M., & Brammer, M. (2015). The role of machine learning in neuroimaging for drug discovery and development. Psychopharmacology, 232(21/22), 4179-4189. doi:10.1007/s00213-015-3968-0
Google, TensorFlow: Basic Usage, 2016, available at https://www.tensorflow.org/versions/r0.10/get_started/basic_usage

Google, TensorFlow: Tensor Rank, Shapes and Types, 2016, available at https://www.tensorflow.org/versions/r0.10/resources/dims_types
Google, TensorFlow: Reading Data, 2016, available at https://www.tensorflow.org/programmers_guide/reading_data
IBM, Deep Blue, 1997, available at http://www-03.ibm.com/ibm/history/ibm100/us/en/icons/deepblue/
Machine Learning and Algorithms; Agile Development. (2012). Communications of the ACM, 55(8), 10-11. doi:10.1145/2240236.2240239
Marr, How Machine Learning, Big Data And AI Are Changing Healthcare, 2016, available at http://www.forbes.com/sites/bernardmarr/2016/09/23/how-machine-learning-big-data-and-ai-are-changing-healthcare-forever/#16a3c8654f49
Marr, Short History of Machine Learning -- Every Manager Should Read, 2016, available at http://www.forbes.com/sites/bernardmarr/2016/02/19/a-short-history-of-machine-learning-every-manager-should-read/#3f2162be323f
Microsoft, Azure Machine Learning, 2017, available at https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-faq
Microsoft, Overview of Azure Machine Learning, 2016, available at https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-studio-overview-diagram
Microsoft, Introduction to Azure Machine Learning in the Cloud, 2017, available at https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-what-is-machine-learning
Mohan, D. M., Kumar, P., Mahmood, F., Wong, K. F., Agrawal, A., Elgendi, M., & ... Chan, A. D. (2016). Effect of Subliminal Lexical Priming on the Subjective Perception of Images: A Machine Learning Approach. Plos ONE, 11(2), 1-22. doi:10.1371/journal.pone.0148332
Navarro, P. J., Fernández, C., Borraz, R., & Alonso, D. (2017). A Machine Learning Approach to Pedestrian Detection for Autonomous Vehicles Using High-Definition 3D Range Data. Sensors (14248220), 17(1), 1-20. doi:10.3390/s17010018
Shish, Big Data & Machine Learning Scenarios for Retail, 2015, available at https://blogs.msdn.microsoft.com/shishirs/2015/01/26/big-data-machine-learning-scenarios-for-retail/
WEKA, The_WEKA_Workbench.pdf, 2016, available at http://www.cs.waikato.ac.nz/ml/weka/Witten_et_al_2016_appendix.pdf
Zhang, X., Mahoor, M., & Mavadati, S. (2015). Facial expression recognition using l_p-norm MKL multiclass-SVM. Machine Vision & Applications, 26(4), 467-483. doi:10.1007/s00138-015-0677-y
