Test-Driven Machine Learning
Test-Driven Machine Learning - Bozonier Justin
Table of Contents
Test-Driven Machine Learning
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Introducing Test-Driven Machine Learning
Test-driven development
The TDD cycle
Red
Green
Refactor
Behavior-driven development
Our first test
The anatomy of a test
Given
When
Then
TDD applied to machine learning
Dealing with randomness
Different approaches to validating the improved models
Classification overview
Regression
Clustering
Quantifying the classification models
Summary
2. Perceptively Testing a Perceptron
Getting started
Summary
3. Exploring the Unknown with Multi-armed Bandits
Understanding a bandit
Testing with simulation
Starting from scratch
Simulating real world situations
A randomized probability matching algorithm
A bootstrapping bandit
The problem with straight bootstrapping
Multi-armed bandit throw down
Summary
4. Predicting Values with Regression
Refresher on advanced regression
Regression assumptions
Quantifying model quality
Generating our own data
Building the foundations of our model
Cross-validating our model
Generating data
Summary
5. Making Decisions Black and White with Logistic Regression
Generating logistic data
Measuring model accuracy
Generating a more complex example
Test driving our model
Summary
6. You're So Naïve, Bayes
Gaussian classification by hand
Beginning the development
Summary
7. Optimizing by Choosing a New Algorithm
Upgrading the classifier
Applying our classifier
Upgrading to Random Forest
Summary
8. Exploring scikit-learn Test First
Test-driven design
Planning our journey
Creating a classifier chooser (it needs to run tests to evaluate classifier performance)
Getting choosey
Developing testable documentation
Decision trees
Summary
9. Bringing It All Together
Starting at the highest level
The real world
What we've accomplished
Summary
Index
Test-Driven Machine Learning
Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: November 2015
Production reference: 1231115
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78439-908-5
www.packtpub.com
Credits
Author
Justin Bozonier
Reviewers
Lars Marius Garshol
Alexey Grigorev
Commissioning Editor
Dipika Gaonkar
Acquisition Editors
Divya Poojari
Llewellyn Rozario
Content Development Editor
Nikhil Potdukhe
Technical Editors
Rupali R. Shrawane
Copy Editor
Yesha Gangani
Project Coordinator
Paushali Desai
Proofreader
Safis Editing
Indexer
Tejal Daruwale Soni
Graphics
Jason Monteiro
Production Coordinator
Melwyn Dsa
Cover Work
Melwyn Dsa
About the Author
Justin Bozonier is a data scientist living in Chicago. He is currently a Senior Data Scientist at GrubHub. He has led the development of their custom analytics platform and also led the development of their first real-time split test analysis platform, which utilized Bayesian statistics. In addition, he has developed machine learning models for data mining as well as for prototyping product enhancements. Justin's software development expertise has earned him acknowledgements in the books Parallel Programming with Microsoft® .NET and Flow-Based Programming, Second Edition. He has also taught a workshop at PyData titled Simplified Statistics through Simulation.
His previous work experience includes being an Actuarial Systems Developer at Milliman, Inc., contracting as a Software Development Engineer II at Microsoft, and working as a Sr. Data Analyst and Lead Developer at Cheezburger Network, among other roles.
Savannah Bozonier—the best partner I've ever had in life. Time and again she has made room in her life so I can push myself to do things that take an immense amount of time. Things like writing this book.
My friends and colleagues for their support and help which culminated in this book: Tom Hayden, Drew Fustin, and Andrew Slotnick.
My mentors across the years—Chad Boyer, Kelly Leahy, Robert Ream, James Thigpen, and Loren Bast.
My parents—I don't know what it's like to be told I can't do something. My life reflects that in every way.
About the Reviewers
Lars Marius Garshol has worked as a consultant, product developer, and open source developer for two decades. He added Unicode support to the Opera web browser, edited a number of ISO standards, and developed the query language tolog. Later, he worked as an enterprise architect and an R&D developer. He is the developer of Duke, an open source tool for identifying near-duplicate database records. He wrote Definitive XML Application Development, published in 2002. Currently he is a software engineer at Schibsted Products & Technology in Oslo, Norway. He's working on a book on Norwegian farmhouse ale.
Alexey Grigorev is an experienced software developer and data scientist with five years of professional experience. In his day-to-day job, he actively uses R and Python for data cleaning, data analysis, and modeling. He believes that testing is not only an integral part of software development, but it is also very useful for building machine learning models.
www.PacktPub.com
Support files, eBooks, discount offers, and more
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
Why subscribe?
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Free access for Packt account holders
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.
Preface
Consistent, steady improvement is the name of the game in Machine Learning. Sometimes you find yourself implementing an algorithm from scratch; sometimes you're pulling in libraries. You always need the option to try new algorithms and improve performance. Simultaneously, you need to know that performance doesn't degrade.
You could just ask an expert about every change because testing stochastic algorithms seems impossible. That's just as terribly slow as it sounds. What if you could automate checking that your updated algorithms outperform your previous ones? What if you could design your code so that you could swap in an algorithm from another library or pit one that you wrote yourself against what you have? These are all reasons for this book.
We'll cover what test-driven development is and what value it brings to machine learning. We'll use nosetests in Python 2.7 to develop our tests. For machine learning algorithms, we will use statsmodels and scikit-learn. statsmodels has some great implementations of regression, and scikit-learn is useful for its plethora of supported classification algorithms.
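To make the idea of automatically checking that an updated algorithm outperforms a previous one more concrete, here is a minimal sketch. It is not taken from the book's chapters: the data generator, both models, and every function name are hypothetical and exist only for illustration. It uses only the standard library, and it fixes the random seed so that the test gives the same answer on every run.

```python
import random


def make_data(n, seed):
    """Generate (x, label) pairs: label is 1 when x > 0.5, with about 10% label noise."""
    rng = random.Random(seed)  # a fixed seed keeps the test repeatable
    data = []
    for _ in range(n):
        x = rng.random()
        label = 1 if x > 0.5 else 0
        if rng.random() < 0.1:
            label = 1 - label  # flip the label to simulate noise
        data.append((x, label))
    return data


def baseline_model(x):
    """Hypothetical 'previous' model: always predicts class 1."""
    return 1


def candidate_model(x):
    """Hypothetical 'updated' model: thresholds the input at 0.5."""
    return 1 if x > 0.5 else 0


def accuracy(model, data):
    """Fraction of observations the model labels correctly."""
    correct = sum(1 for x, label in data if model(x) == label)
    return correct / float(len(data))


def given_noisy_threshold_data_when_models_are_compared_test():
    data = make_data(n=1000, seed=42)
    baseline = accuracy(baseline_model, data)
    candidate = accuracy(candidate_model, data)
    assert candidate > baseline, "Then the updated model should outperform the previous one."
```

Because the test compares the two models on the same seeded data, swapping in a different candidate only requires changing one function, and the test then reports whether the change is an improvement.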
What this book covers
Chapter 1, Introducing Test-Driven Machine Learning, explains what Test-Driven Development is, what it looks like, and how it is done in practice.
Chapter 2, Perceptively Testing a Perceptron, develops a perceptron from scratch and defines its behavior even though it behaves non-deterministically.
Chapter 3, Exploring the Unknown with Multi-armed Bandits, introduces multi-armed bandit problems, testing different algorithms, and iterating on their performance.
Chapter 4, Predicting Values with Regression, uses statsmodels to implement regression and report on key performance metrics. We will also explore tuning the model.
Chapter 5, Making Decisions Black and White with Logistic Regression, continues exploring regression and shows how to quantify the quality of this different type of model. We will use statsmodels again to create our regression models.
Chapter 6, You're So Naïve, Bayes, helps us develop a Gaussian Naïve Bayes algorithm from scratch using test-driven development.
Chapter 7, Optimizing by Choosing a New Algorithm, continues the work from Chapter 6, You're So Naïve, Bayes, and attempts to improve upon it using a new algorithm: Random Forests.
Chapter 8, Exploring scikit-learn Test First, teaches how to teach oneself. You probably already have a lot of experience with this. This chapter builds on that experience by teaching you to use the test framework to document scikit-learn.
Chapter 9, Bringing It All Together, takes a business problem that requires a couple of different algorithms. Again, we will develop everything we need from scratch and mix our code with third-party libraries, completely test-driven.
What you need for this book
We will be using Python 2.7 in this book along with nosetests to unit test our software. In addition, we will be using statsmodels as well as scikit-learn.
Who this book is for
This book is for machine learning professionals who want to be able to test the improvements to their algorithms in isolation and in an automated fashion. This book is for any data scientist who wants to get started in Test-Driven Development with minimal religion and maximum value. This book is not for someone who wants to learn state-of-the-art Test-Driven Development. It is written with the idea that the majority of what can be learned from Test-Driven Development is remarkably simple. We will provide a relatively simple approach to it, which the reader can choose to augment as they see fit.
Conventions
In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "Notice that in my test, I instantiate a NumberGuesser object."
A block of code is set as follows:
def given_no_information_when_asked_to_guess_test():
    number_guesser = NumberGuesser()
    result = number_guesser.guess()
    assert result is None, "Then it should provide no result."
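The test above refers to a NumberGuesser class that is not shown at this point. As a minimal sketch of what would make it pass, assuming nothing beyond the class name and the guess() method that the test itself uses, the class body below is hypothetical and is not the book's implementation:

```python
class NumberGuesser(object):
    """Hypothetical minimal class; only the name and guess() come from the test."""

    def guess(self):
        # With no information supplied, there is nothing to base a guess on.
        return None


def given_no_information_when_asked_to_guess_test():
    number_guesser = NumberGuesser()
    result = number_guesser.guess()
    assert result is None, "Then it should provide no result."
```

The function name ends in _test, which is the kind of name nosetests collects as a test by default.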
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
for the_class, trained_observations in self._classifications.items():
    if len(trained_observations) <= 1:
        return None
    probability_of_observation_given_class[the_class] = self._probability_given_class(trained_observations, observation)
Note
Warnings or important notes appear in a box like this.
Tip
Tips and tricks appear like this.
Reader feedback
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail <feedback@packtpub.com>, and mention the book's title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Customer support
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
Downloading the example code
You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
Downloading the color images of this book
We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from http://www.packtpub.com/sites/default/files/downloads/TestDrivenMachineLearning_ColorImages.pdf.
Errata
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us