IN THIS CHAPTER
Explaining how correct sampling is critical in machine learning
Highlighting errors dictated by bias and variance
Proposing different approaches to validation and testing
Warning against biased samples, overfitting, underfitting, and snooping
Having examples (in the form of datasets) and a machine learning algorithm at hand doesn't guarantee that solving a learning problem is possible or that the results will provide the desired solution. For example, if you want your computer to distinguish a photo of a dog from a photo of a cat, you can provide it with good examples of dogs and cats. You then train a dog versus cat classifier, based on some machine learning algorithm, that outputs the probability that a given photo is a dog or a cat. Of course, the output is a probability — not an absolute assurance that the photo shows a dog or a cat.
Based on the probability that the classifier reports, you can decide the class (dog or cat) of a photo. When the probability is higher for a dog, you minimize the risk of a wrong assessment by choosing dog. The greater the gap between the probability of a dog and that of a cat, the more confidence you can have in your choice. A close call likely occurs because of some ambiguity in the photo (the photo is not clear, or the dog is actually a bit cattish). For that ...
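The decision rule described above can be sketched in a few lines of Python. The probabilities and the margin threshold below are purely illustrative (not taken from any real classifier); the idea is simply to pick the more probable class and to treat a small probability gap as a sign of ambiguity.

```python
def decide(p_dog, p_cat, margin_threshold=0.2):
    """Pick the more probable class and flag low-confidence calls.

    p_dog and p_cat are the classifier's estimated probabilities;
    margin_threshold is an illustrative cutoff for 'confident enough'.
    """
    label = "dog" if p_dog >= p_cat else "cat"
    margin = abs(p_dog - p_cat)          # gap between the two probabilities
    confident = margin >= margin_threshold
    return label, margin, confident

# A clear-cut photo: large gap, confident decision.
print(decide(0.85, 0.15))

# An ambiguous ("cattish") photo: tiny gap, low confidence.
print(decide(0.55, 0.45))
```

With a large gap (0.85 versus 0.15) the rule confidently answers "dog"; with a near tie (0.55 versus 0.45) it still answers "dog" but flags the decision as uncertain, which mirrors the reasoning in the text.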