Skip to Content
Introduction to Machine Learning with Python
book

Introduction to Machine Learning with Python

by Andreas C. Müller, Sarah Guido
October 2016
Beginner to intermediate content levelBeginner to intermediate
400 pages
10h 25m
English
O'Reilly Media, Inc.
Book available
Content preview from Introduction to Machine Learning with Python

Chapter 4. Representing Data and Engineering Features

So far, we’ve assumed that our data comes in as a two-dimensional array of floating-point numbers, where each column is a continuous feature that describes the data points. For many applications, this is not how the data is collected. A particularly common type of feature is the categorical features. Also known as discrete features, these are usually not numeric. The distinction between categorical features and continuous features is analogous to the distinction between classification and regression, only on the input side rather than the output side. Examples of continuous features that we have seen are pixel brightnesses and size measurements of plant flowers. Examples of categorical features are the brand of a product, the color of a product, or the department (books, clothing, hardware) it is sold in. These are all properties that can describe a product, but they don’t vary in a continuous way. A product belongs either in the clothing department or in the books department. There is no middle ground between books and clothing, and no natural order for the different categories (books is not greater or less than clothing, hardware is not between books and clothing, etc.).

Regardless of the types of features your data consists of, how you represent them can have an enormous effect on the performance of machine learning models. We saw in Chapters 2 and 3 that scaling of the data is important. In other words, if you don’t rescale ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Python Machine Learning - Third Edition

Python Machine Learning - Third Edition

Sebastian Raschka, Vahid Mirjalili

Publisher Resources

ISBN: 9781449369880Errata PageSupplemental Content