Feature Engineering for Machine Learning

by Alice Zheng, Amanda Casari

April 2018

Beginner to intermediate

215 pages

5h 36m

English

O'Reilly Media, Inc.

Preface
Introduction; Conventions Used in This Book; Using Code Examples; O’Reilly Safari; How to Contact Us; Acknowledgments; Special Thanks from Alice; Special Thanks from Amanda
1. The Machine Learning Pipeline
Data; Tasks; Models; Features; Model Evaluation
2. Fancy Tricks with Simple Numbers
Scalars, Vectors, and Spaces; Dealing with Counts; Binarization; Quantization or Binning; Log Transformation; Log Transform in Action; Power Transforms: Generalization of the Log Transform; Feature Scaling or Normalization; Min-Max Scaling; Standardization (Variance Scaling); ℓ2 Normalization; Interaction Features; Feature Selection; Summary; Bibliography
3. Text Data: Flattening, Filtering, and Chunking
Bag-of-X: Turning Natural Text into Flat Vectors; Bag-of-Words; Bag-of-n-Grams; Filtering for Cleaner Features; Stopwords; Frequency-Based Filtering; Stemming; Atoms of Meaning: From Words to n-Grams to Phrases; Parsing and Tokenization; Collocation Extraction for Phrase Detection; Summary; Bibliography
4. The Effects of Feature Scaling: From Bag-of-Words to Tf-Idf
Tf-Idf: A Simple Twist on Bag-of-Words; Putting It to the Test; Creating a Classification Dataset; Scaling Bag-of-Words with Tf-Idf Transformation; Classification with Logistic Regression; Tuning Logistic Regression with Regularization; Deep Dive: What Is Happening?; Summary; Bibliography
5. Categorical Variables: Counting Eggs in the Age of Robotic Chickens
Encoding Categorical Variables; One-Hot Encoding; Dummy Coding; Effect Coding; Pros and Cons of Categorical Variable Encodings; Dealing with Large Categorical Variables; Feature Hashing; Bin Counting; Summary; Bibliography
6. Dimensionality Reduction: Squashing the Data Pancake with PCA
Intuition; Derivation; Linear Projection; Variance and Empirical Variance; Principal Components: First Formulation; Principal Components: Matrix-Vector Formulation; General Solution of the Principal Components; Transforming Features; Implementing PCA; PCA in Action; Whitening and ZCA; Considerations and Limitations of PCA; Use Cases; Summary; Bibliography
7. Nonlinear Featurization via K-Means Model Stacking
k-Means Clustering; Clustering as Surface Tiling; k-Means Featurization for Classification; Alternative Dense Featurization; Pros, Cons, and Gotchas; Summary; Bibliography
8. Automating the Featurizer: Image Feature Extraction and Deep Learning
The Simplest Image Features (and Why They Don’t Work); Manual Feature Extraction: SIFT and HOG; Image Gradients; Gradient Orientation Histograms; SIFT Architecture; Learning Image Features with Deep Neural Networks; Fully Connected Layers; Convolutional Layers; Rectified Linear Unit (ReLU) Transformation; Response Normalization Layers; Pooling Layers; Structure of AlexNet; Summary; Bibliography
9. Back to the Feature: Building an Academic Paper Recommender
Item-Based Collaborative Filtering; First Pass: Data Import, Cleaning, and Feature Parsing; Academic Paper Recommender: Naive Approach; Second Pass: More Engineering and a Smarter Model; Academic Paper Recommender: Take 2; Third Pass: More Features = More Information; Academic Paper Recommender: Take 3; Summary; Bibliography
A. Linear Modeling and Linear Algebra Basics
Overview of Linear Classification; The Anatomy of a Matrix; From Vectors to Subspaces; Singular Value Decomposition (SVD); The Four Fundamental Subspaces of the Data Matrix; Solving a Linear System; Bibliography
Index

Appendix A. Linear Modeling and Linear Algebra Basics

Overview of Linear Classification

When we have a labeled dataset, the feature space is strewn with data points from different classes. It is the job of the classifier to separate the data points from different classes. It can do so by producing an output that is very different for data points from one class than for those from another. For instance, when there are only two classes, a good classifier should produce large outputs for one class and small ones for the other. The points right on the cusp of being in one class versus the other form a decision surface (Figure A-1).
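
To make the idea of a decision surface concrete, here is a minimal Python sketch (not from the book) of a two-class linear classifier, assuming NumPy is available. The weight vector w and bias b are hypothetical values chosen only for illustration. The classifier's output is the score w·x + b; points with a positive score go to one class, points with a non-positive score go to the other, and the decision surface is the set of points where the score is exactly zero.

```python
import numpy as np

# Hypothetical weights and bias for a linear classifier in a 2-D feature space.
w = np.array([2.0, -1.0])
b = -0.5

def score(X):
    """Linear output w·x + b for each row of X."""
    return X @ w + b

def predict(X):
    """Class 1 where the score is positive, class 0 otherwise."""
    return (score(X) > 0).astype(int)

X = np.array([
    [1.0, 0.0],   # score = 2.0 - 0.0 - 0.5 = 1.5   -> class 1
    [0.0, 1.0],   # score = 0.0 - 1.0 - 0.5 = -1.5  -> class 0
    [0.25, 0.0],  # score = 0.5 - 0.0 - 0.5 = 0.0   -> exactly on the decision surface
])
print(predict(X))  # [1 0 0]
```

The third point lies exactly on the decision surface; sending it to class 0 is only the tie-breaking convention of this sketch.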

Many functions can be made into classifiers. It’s a good idea to look for the simplest function that cleanly separates the classes, for a few reasons. First of all, it’s easier to find the best simple separator than the best complex separator. Also, simple functions often generalize better to new data, because it’s harder to tailor them too specifically to the training data (a problem known as overfitting). A simple model might make mistakes (as in Figure A-1, where some points are on the wrong side of the divide), but we’re willing to sacrifice some training accuracy in order to have a simpler decision surface that can achieve better test accuracy. The principle of ...
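
The trade-off between training and test accuracy can be sketched in a few lines of Python with NumPy. This is an illustration under stated assumptions, not the book's experiment: the data are synthetic, overlapping Gaussian classes; the simple separator is a linear surface fit by least squares (a swapped-in fitting method, chosen only because it needs nothing beyond NumPy); and the complex separator is a 1-nearest-neighbor rule, which carves out a region around every single training point.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_blobs(n_per_class):
    """Two overlapping Gaussian classes in 2-D (synthetic data, for illustration only)."""
    X0 = rng.normal(loc=-1.0, scale=1.5, size=(n_per_class, 2))
    X1 = rng.normal(loc=1.0, scale=1.5, size=(n_per_class, 2))
    X = np.vstack([X0, X1])
    y = np.concatenate([np.zeros(n_per_class, dtype=int), np.ones(n_per_class, dtype=int)])
    return X, y

X_train, y_train = make_blobs(100)
X_test, y_test = make_blobs(100)

# Simple model: a linear decision surface fit by least squares to targets of -1/+1.
A = np.hstack([X_train, np.ones((len(X_train), 1))])  # append a bias column
t = 2.0 * y_train - 1.0
coef, *_ = np.linalg.lstsq(A, t, rcond=None)

def linear_predict(X):
    return (np.hstack([X, np.ones((len(X), 1))]) @ coef > 0).astype(int)

# Complex model: 1-nearest neighbor, which memorizes every training point.
def one_nn_predict(X):
    # Squared Euclidean distance from each query row to each training row.
    d2 = ((X[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[np.argmin(d2, axis=1)]

def accuracy(pred, y):
    return float(np.mean(pred == y))

lin_train = accuracy(linear_predict(X_train), y_train)
nn_train = accuracy(one_nn_predict(X_train), y_train)
lin_test = accuracy(linear_predict(X_test), y_test)
nn_test = accuracy(one_nn_predict(X_test), y_test)

print(f"linear: train={lin_train:.2f} test={lin_test:.2f}")
print(f"1-NN:   train={nn_train:.2f} test={nn_test:.2f}")
```

By construction the 1-nearest-neighbor rule classifies every training point correctly, since each point is its own nearest neighbor, while the linear surface cannot, because the classes overlap. The test numbers show whether that extra training accuracy carries over to new data; with overlap this heavy it usually does not, but the exact figures depend on the random seed.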

Publisher Resources

ISBN: 9781491953235

Figure A-1. Simple binary classification finds a surface that separates two classes of data points
