The Data Science Handbook, 2nd Edition

by Field Cady
December 2024
Beginner to intermediate
368 pages
11h 47m
English
Wiley

7 Interlude: Feature Extraction Ideas

Before we jump into specific machine‐learning techniques, I want to come back to feature extraction. A machine‐learning analysis will be only as good as the features that you plug into it. The best features are the ones that carefully reflect the thing you are studying, so you’re likely going to have to bring a lot of domain expertise to your problems. However, I can give some of the “usual suspects”: classical ways to extract features from data that apply in a wide range of contexts and are at the very least worth taking a look at. This interlude will go over several of them and lead to some discussion about applying them in real contexts.

7.1 Standard Features

Here are several types of feature extraction that are real classics, along with some of the real‐world considerations of using them:

  • Is_null. One of the simplest, and surprisingly effective, features is just whether the original data entry is missing. It works because an entry is often null for an important reason. For example, maybe some data wasn’t gathered for widgets produced by a particular factory. Or, with humans, maybe demographic data is missing because some demographic groups are less likely to report it.
  • Dummy variables. A categorical variable is one that can take on a finite number of values. A column for a US state, for example, has 50 possible values. A dummy variable is a binary variable that says whether the categorical column is a particular value. Then, you might ...
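
The two features above can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the book: the records, the column names `state` and `income`, and the helper functions `add_is_null` and `add_dummies` are all hypothetical, and `None` is assumed to mark a missing entry.

```python
# Hypothetical records; None marks a missing entry (an assumption for this sketch).
rows = [
    {"state": "WA", "income": 72000},
    {"state": "OR", "income": None},
    {"state": "WA", "income": None},
    {"state": "CA", "income": 58000},
]


def add_is_null(rows, column):
    """Return copies of rows with a 0/1 flag marking missing values in `column`."""
    flag = f"{column}_is_null"
    return [{**row, flag: int(row.get(column) is None)} for row in rows]


def add_dummies(rows, column):
    """Return copies of rows with one 0/1 dummy column per observed category."""
    # Only categories that actually appear in the data get a dummy column.
    categories = sorted({row[column] for row in rows if row[column] is not None})
    out = []
    for row in rows:
        new_row = dict(row)
        for cat in categories:
            new_row[f"{column}_{cat}"] = int(row[column] == cat)
        out.append(new_row)
    return out


features = add_dummies(add_is_null(rows, "income"), "state")
for row in features:
    print(row)
```

Each output row keeps its original fields and gains an `income_is_null` flag plus one `state_...` column per state seen in the data. In practice a library such as pandas covers the same ground with `Series.isna` and `pandas.get_dummies`; the hand-rolled version here is only meant to show what those features contain.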

Publisher Resources

ISBN: 9781394234493