The Data Science Handbook

by Field Cady
February 2017
Beginner to intermediate
416 pages
English
Wiley

Chapter 7 Interlude: Feature Extraction Ideas

Before we jump into specific machine learning techniques, I want to come back to feature extraction. A machine learning analysis will be only as good as the features that you plug into it. The best features are the ones that carefully reflect the thing you are studying, so you're likely going to have to bring a lot of domain expertise to your problems. However, I can give some of the “usual suspects”: classical ways to extract features from data that apply in a wide range of contexts and are, at the very least, worth taking a look at. This interlude will go over several of them and lead to some discussion about applying them in real contexts.

7.1 Standard Features

Here are several types of feature extraction that are real classics, along with some of the real-world considerations for using them:

  • Is_null: One of the simplest, and surprisingly effective, features is just whether the original data entry is missing. This works because an entry is often null for an important reason. For example, maybe some data wasn't gathered for widgets produced by a particular factory. Or, with humans, maybe demographic data is missing because some demographic groups are less likely to report it.
  • Dummy variables: A categorical variable is one that can take on a finite number of values. A column for a US state, for example, has 50 possible values. A dummy variable is a binary variable that says whether the categorical column takes on a particular value. Then you ...
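The first two features above can be sketched in plain Python. This is a minimal illustration, not the book's code: it assumes missing entries arrive as `None` or float `NaN`, and the helper names `is_null_feature` and `dummy_variables` are hypothetical. In pandas, `Series.isna()` and `pandas.get_dummies()` cover the same ground.

```python
import math
from typing import Any, Dict, List, Optional, Sequence


def _is_missing(value: Any) -> bool:
    # Assumption: missingness is encoded as None or float NaN. Real data may
    # also use "", "N/A", -1, etc., which would need to be added here.
    return value is None or (isinstance(value, float) and math.isnan(value))


def is_null_feature(values: Sequence[Any]) -> List[int]:
    """Hypothetical helper: 1 where the original entry is missing, else 0."""
    return [1 if _is_missing(v) else 0 for v in values]


def dummy_variables(
    values: Sequence[Any], categories: Optional[Sequence[Any]] = None
) -> Dict[str, List[int]]:
    """Hypothetical helper: one binary column per category value.

    Each output column says whether the categorical column equals that
    particular value. Missing entries get 0 in every column.
    """
    if categories is None:
        # Infer the categories from the data, skipping missing entries,
        # and sort them so the column order is deterministic.
        categories = sorted({v for v in values if not _is_missing(v)})
    return {
        f"is_{c}": [1 if v == c else 0 for v in values]
        for c in categories
    }


if __name__ == "__main__":
    states = ["CA", "NY", None, "CA"]
    print(is_null_feature(states))  # [0, 0, 1, 0]
    print(dummy_variables(states))  # {'is_CA': [1, 0, 0, 1], 'is_NY': [0, 1, 0, 0]}
```

Passing an explicit `categories` list, such as the one inferred from training data, keeps the columns fixed when the helper is applied to new data: an unseen value simply gets 0 in every column instead of creating a new one.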
ISBN: 9781119092940