Thoughtful Machine Learning with Python

Name: Thoughtful Machine Learning with Python
Author: Matthew Kirk
ISBN: 9781491924136

by Matthew Kirk

January 2017

Intermediate to advanced

218 pages

5h 10m

English

O'Reilly Media, Inc.

Preface
  Conventions Used in This Book
  Using Code Examples
  O’Reilly Safari
  How to Contact Us
  Acknowledgments
1. Probably Approximately Correct Software
  Writing Software Right
  SOLID
  Testing or TDD
  Refactoring
  Writing the Right Software
  Writing the Right Software with Machine Learning
  What Exactly Is Machine Learning?
  The High Interest Credit Card Debt of Machine Learning
  SOLID Applied to Machine Learning
  Machine Learning Code Is Complex but Not Impossible
  TDD: Scientific Method 2.0
  Refactoring Our Way to Knowledge
  The Plan for the Book
2. A Quick Introduction to Machine Learning
  What Is Machine Learning?
  Supervised Learning
  Unsupervised Learning
  Reinforcement Learning
  What Can Machine Learning Accomplish?
  Mathematical Notation Used Throughout the Book
  Conclusion
3. K-Nearest Neighbors
  How Do You Determine Whether You Want to Buy a House?
  How Valuable Is That House?
  Hedonic Regression
  What Is a Neighborhood?
  K-Nearest Neighbors
  Mr. K’s Nearest Neighborhood
  Distances
  Triangle Inequality
  Geometrical Distance
  Computational Distances
  Statistical Distances
  Curse of Dimensionality
  How Do We Pick K?
  Guessing K
  Heuristics for Picking K
  Valuing Houses in Seattle
  About the Data
  General Strategy
  Coding and Testing Design
  KNN Regressor Construction
  KNN Testing
  Conclusion
4. Naive Bayesian Classification
  Using Bayes’ Theorem to Find Fraudulent Orders
  Conditional Probabilities
  Probability Symbols
  Inverse Conditional Probability (aka Bayes’ Theorem)
  Naive Bayesian Classifier
  The Chain Rule
  Naiveté in Bayesian Reasoning
  Pseudocount
  Spam Filter
  Setup Notes
  Coding and Testing Design
  Data Source
  EmailObject
  Tokenization and Context
  SpamTrainer
  Error Minimization Through Cross-Validation
  Conclusion
5. Decision Trees and Random Forests
  The Nuances of Mushrooms
  Classifying Mushrooms Using a Folk Theorem
  Finding an Optimal Switch Point
  Information Gain
  GINI Impurity
  Variance Reduction
  Pruning Trees
  Ensemble Learning
  Writing a Mushroom Classifier
  Conclusion
6. Hidden Markov Models
  Tracking User Behavior Using State Machines
  Emissions/Observations of Underlying States
  Simplification Through the Markov Assumption
  Using Markov Chains Instead of a Finite State Machine
  Hidden Markov Model
  Evaluation: Forward-Backward Algorithm
  Mathematical Representation of the Forward-Backward Algorithm
  Using User Behavior
  The Decoding Problem Through the Viterbi Algorithm
  The Learning Problem
  Part-of-Speech Tagging with the Brown Corpus
  Setup Notes
  Coding and Testing Design
  The Seam of Our Part-of-Speech Tagger: CorpusParser
  Writing the Part-of-Speech Tagger
  Cross-Validating to Get Confidence in the Model
  How to Make This Model Better
  Conclusion
7. Support Vector Machines
  Customer Happiness as a Function of What They Say
  Sentiment Classification Using SVMs
  The Theory Behind SVMs
  Decision Boundary
  Maximizing Boundaries
  Kernel Trick: Feature Transformation
  Optimizing with Slack
  Sentiment Analyzer
  Setup Notes
  Coding and Testing Design
  SVM Testing Strategies
  Corpus Class
  CorpusSet Class
  Model Validation and the Sentiment Classifier
  Aggregating Sentiment
  Exponentially Weighted Moving Average
  Mapping Sentiment to Bottom Line
  Conclusion
8. Neural Networks
  What Is a Neural Network?
  History of Neural Nets
  Boolean Logic
  Perceptrons
  How to Construct Feed-Forward Neural Nets
  Input Layer
  Hidden Layers
  Neurons
  Activation Functions
  Output Layer
  Training Algorithms
  The Delta Rule
  Back Propagation
  QuickProp
  RProp
  Building Neural Networks
  How Many Hidden Layers?
  How Many Neurons for Each Layer?
  Tolerance for Error and Max Epochs
  Using a Neural Network to Classify a Language
  Setup Notes
  Coding and Testing Design
  The Data
  Writing the Seam Test for Language
  Cross-Validating Our Way to a Network Class
  Tuning the Neural Network
  Precision and Recall for Neural Networks
  Wrap-Up of Example
  Conclusion
9. Clustering
  Studying Data Without Any Bias
  User Cohorts
  Testing Cluster Mappings
  Fitness of a Cluster
  Silhouette Coefficient
  Comparing Results to Ground Truth
  K-Means Clustering
  The K-Means Algorithm
  Downside of K-Means Clustering
  EM Clustering
  Algorithm
  The Impossibility Theorem
  Example: Categorizing Music
  Setup Notes
  Gathering the Data
  Coding Design
  Analyzing the Data with K-Means
  EM Clustering Our Data
  The Results from the EM Jazz Clustering
  Conclusion
10. Improving Models and Data Extraction
  Debate Club
  Picking Better Data
  Feature Selection
  Exhaustive Search
  Random Feature Selection
  A Better Feature Selection Algorithm
  Minimum Redundancy Maximum Relevance Feature Selection
  Feature Transformation and Matrix Factorization
  Principal Component Analysis
  Independent Component Analysis
  Ensemble Learning
  Bagging
  Boosting
  Conclusion
11. Putting It Together: Conclusion
  Machine Learning Algorithms Revisited
  How to Use This Information to Solve Problems
  What’s Next for You?
Index

Chapter 1. Probably Approximately Correct Software

If you’ve ever flown on an airplane, you have participated in one of the safest forms of travel in the world. The odds of being killed in a plane crash are 1 in 29.4 million, meaning that you could become an airline pilot and, over a 40-year career, never once be in a crash. Those odds are staggering considering how complex airplanes really are. But it wasn’t always that way.

The year 2014 was a bad one for aviation: there were 824 aviation-related deaths, including those aboard the Malaysia Airlines plane that went missing. In 1929 there were 257 casualties. This makes it seem as if we’ve become worse at aviation, until you realize that in the US alone there are over 10 million flights per year, whereas in 1929 there were substantially fewer, about 50,000 to 100,000. This means that the overall probability of being killed in a plane wreck plummeted from about 0.25% in 1929 to 0.00824% in 2014.
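The percentages in the previous paragraph are simple ratios of deaths to flights. The following is a minimal sketch that reproduces them using only the figures quoted in the text; the helper name `deaths_per_flight_pct` is hypothetical. Because the 1929 flight count is given only as a range, the 1929 rate is computed at both ends, which shows that the 0.25% figure corresponds roughly to the 100,000-flight estimate. These are per-flight ratios, so treat them as rough comparisons rather than precise per-passenger probabilities.

```python
# Rough deaths-per-flight ratios behind the figures quoted in the text.
# The 1929 flight count is only known as a range (about 50,000 to 100,000),
# so the 1929 rate is computed at both ends of that range.


def deaths_per_flight_pct(deaths: int, flights: int) -> float:
    """Hypothetical helper: deaths divided by flights, as a percentage."""
    return 100.0 * deaths / flights


rate_1929_low_traffic = deaths_per_flight_pct(257, 50_000)    # fewer flights, higher rate
rate_1929_high_traffic = deaths_per_flight_pct(257, 100_000)  # more flights, lower rate
rate_2014 = deaths_per_flight_pct(824, 10_000_000)            # US flight count, per the text

print(f"1929: {rate_1929_high_traffic:.3f}% to {rate_1929_low_traffic:.3f}%")
print(f"2014: {rate_2014:.5f}%")
```

Running it prints a 1929 range of 0.257% to 0.514% and a 2014 rate of 0.00824%, a drop of more than thirtyfold even at the most favorable end of the 1929 estimate.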

Plane travel has changed over the years, and so has software development. Although software development as we know it didn’t exist in 1929, over the 85 years since then we have built, and failed at, many software projects.

Recent examples include the launch of healthcare.gov, a fiscal disaster that cost around $634 million. Even worse are software projects with other disastrous bugs. In 2013 NASDAQ shut down due to a software glitch and was fined $10 million. The year 2014 saw the Heartbleed bug, which ...

Publisher Resources

ISBN: 9781491924129