Thoughtful Machine Learning with Python

Name: Thoughtful Machine Learning with Python
Author: Matthew Kirk
ISBN: 9781491924136

January 2017

Intermediate to advanced

218 pages

5h 10m

English

O'Reilly Media, Inc.

Preface
Conventions Used in This Book; Using Code Examples; O’Reilly Safari; How to Contact Us; Acknowledgments
1. Probably Approximately Correct Software
Writing Software Right; SOLID; Testing or TDD; Refactoring; Writing the Right Software; Writing the Right Software with Machine Learning; What Exactly Is Machine Learning?; The High Interest Credit Card Debt of Machine Learning; SOLID Applied to Machine Learning; Machine Learning Code Is Complex but Not Impossible; TDD: Scientific Method 2.0; Refactoring Our Way to Knowledge; The Plan for the Book
2. A Quick Introduction to Machine Learning
What Is Machine Learning?; Supervised Learning; Unsupervised Learning; Reinforcement Learning; What Can Machine Learning Accomplish?; Mathematical Notation Used Throughout the Book; Conclusion
3. K-Nearest Neighbors
How Do You Determine Whether You Want to Buy a House?; How Valuable Is That House?; Hedonic Regression; What Is a Neighborhood?; K-Nearest Neighbors; Mr. K’s Nearest Neighborhood; Distances; Triangle Inequality; Geometrical Distance; Computational Distances; Statistical Distances; Curse of Dimensionality; How Do We Pick K?; Guessing K; Heuristics for Picking K; Valuing Houses in Seattle; About the Data; General Strategy; Coding and Testing Design; KNN Regressor Construction; KNN Testing; Conclusion
4. Naive Bayesian Classification
Using Bayes’ Theorem to Find Fraudulent Orders; Conditional Probabilities; Probability Symbols; Inverse Conditional Probability (aka Bayes’ Theorem); Naive Bayesian Classifier; The Chain Rule; Naiveté in Bayesian Reasoning; Pseudocount; Spam Filter; Setup Notes; Coding and Testing Design; Data Source; EmailObject; Tokenization and Context; SpamTrainer; Error Minimization Through Cross-Validation; Conclusion
5. Decision Trees and Random Forests
The Nuances of Mushrooms; Classifying Mushrooms Using a Folk Theorem; Finding an Optimal Switch Point; Information Gain; GINI Impurity; Variance Reduction; Pruning Trees; Ensemble Learning; Writing a Mushroom Classifier; Conclusion
6. Hidden Markov Models
Tracking User Behavior Using State Machines; Emissions/Observations of Underlying States; Simplification Through the Markov Assumption; Using Markov Chains Instead of a Finite State Machine; Hidden Markov Model; Evaluation: Forward-Backward Algorithm; Mathematical Representation of the Forward-Backward Algorithm; Using User Behavior; The Decoding Problem Through the Viterbi Algorithm; The Learning Problem; Part-of-Speech Tagging with the Brown Corpus; Setup Notes; Coding and Testing Design; The Seam of Our Part-of-Speech Tagger: CorpusParser; Writing the Part-of-Speech Tagger; Cross-Validating to Get Confidence in the Model; How to Make This Model Better; Conclusion
7. Support Vector Machines
Customer Happiness as a Function of What They Say; Sentiment Classification Using SVMs; The Theory Behind SVMs; Decision Boundary; Maximizing Boundaries; Kernel Trick: Feature Transformation; Optimizing with Slack; Sentiment Analyzer; Setup Notes; Coding and Testing Design; SVM Testing Strategies; Corpus Class; CorpusSet Class; Model Validation and the Sentiment Classifier; Aggregating Sentiment; Exponentially Weighted Moving Average; Mapping Sentiment to Bottom Line; Conclusion
8. Neural Networks
What Is a Neural Network?; History of Neural Nets; Boolean Logic; Perceptrons; How to Construct Feed-Forward Neural Nets; Input Layer; Hidden Layers; Neurons; Activation Functions; Output Layer; Training Algorithms; The Delta Rule; Back Propagation; QuickProp; RProp; Building Neural Networks; How Many Hidden Layers?; How Many Neurons for Each Layer?; Tolerance for Error and Max Epochs; Using a Neural Network to Classify a Language; Setup Notes; Coding and Testing Design; The Data; Writing the Seam Test for Language; Cross-Validating Our Way to a Network Class; Tuning the Neural Network; Precision and Recall for Neural Networks; Wrap-Up of Example; Conclusion
9. Clustering
Studying Data Without Any Bias; User Cohorts; Testing Cluster Mappings; Fitness of a Cluster; Silhouette Coefficient; Comparing Results to Ground Truth; K-Means Clustering; The K-Means Algorithm; Downside of K-Means Clustering; EM Clustering; Algorithm; The Impossibility Theorem; Example: Categorizing Music; Setup Notes; Gathering the Data; Coding Design; Analyzing the Data with K-Means; EM Clustering Our Data; The Results from the EM Jazz Clustering; Conclusion

10. Improving Models and Data Extraction
Debate Club; Picking Better Data; Feature Selection; Exhaustive Search; Random Feature Selection; A Better Feature Selection Algorithm; Minimum Redundancy Maximum Relevance Feature Selection; Feature Transformation and Matrix Factorization; Principal Component Analysis; Independent Component Analysis; Ensemble Learning; Bagging; Boosting; Conclusion
11. Putting It Together: Conclusion
Machine Learning Algorithms Revisited; How to Use This Information to Solve Problems; What’s Next for You?
Index

Content preview from Thoughtful Machine Learning with Python

Chapter 10. Improving Models and Data Extraction

How do you improve on a simple machine learning algorithm such as a Naive Bayesian Classifier, an SVM, or really any method? That is what we will delve into in this chapter, by looking at four major ways of improving models:

Feature selection
Feature transformation
Ensemble learning
Bootstrapping

I’ll outline the benefits of each of these methods, but in general they reduce entanglement, overcome the curse of dimensionality, and reduce correction cascades and sensitivity to data changes.

Each has its pros and cons and should be used only when there is a purpose behind it. Sometimes a problem is complex enough that tweaking and improvement at this level are warranted; other times it is not. That is a judgment you must make depending on the business context.
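
As a rough illustration of two of the four approaches listed above, bootstrapping and ensemble learning, here is a minimal sketch that uses only the Python standard library. It is not taken from the book: the toy data, the one-feature threshold "stump" model, and the function names (`bootstrap_sample`, `fit_stump`, `bagged_predict`) are all assumptions made for this example. Each stump is trained on a resample drawn with replacement, and the ensemble predicts by majority vote.

```python
import random
from collections import Counter

# Toy data invented for illustration: (feature value, label) pairs.
ROWS = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0),
        (3.0, 1), (3.5, 1), (4.0, 1), (4.5, 1)]


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap resample)."""
    return [rng.choice(rows) for _ in rows]


def fit_stump(rows):
    """Fit a hypothetical one-feature classifier: predict 1 when x >= threshold.

    The threshold is picked from the observed x values so that the number
    of training errors is as small as possible (first such value wins).
    """
    best_threshold, best_errors = None, None
    for threshold in sorted({x for x, _ in rows}):
        errors = sum((x >= threshold) != bool(label) for x, label in rows)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagged_predict(thresholds, x):
    """Majority vote over all stumps (use an odd count to avoid ties)."""
    votes = Counter(int(x >= t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Train 25 stumps, each on its own bootstrap resample of the toy data.
rng = random.Random(0)  # fixed seed so the run is repeatable
thresholds = [fit_stump(bootstrap_sample(ROWS, rng)) for _ in range(25)]

print(bagged_predict(thresholds, 1.0))
print(bagged_predict(thresholds, 4.5))
```

Because every stump sees a slightly different resample, a single odd resample (for example, one that happens to contain only one class) is outvoted by the rest, which is one way such methods reduce sensitivity to data changes.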

Debate Club

I’m not sure if this is common throughout the world, but in the United States, debate club is a high school fixture. For those of you who haven’t heard of it, the idea is simple: high schoolers take polarizing issues and argue their side. This is a great way for students who want to become lawyers to try out their skills at arguing a case.

The fascinating thing about this is just how rigorous and disciplined these kids are. Usually they study all kinds of facts to put together a dossier of important points to make. Sometimes they argue for a side they don’t agree with, but they do so with conviction.

Why am ...


Publisher Resources

ISBN: 9781491924129
