8

Encoding, Transforming, and Scaling Features

Our data cleaning efforts are often intended to prepare data for use with a machine learning algorithm. Our models also often perform better with some form of scaling, so that features with higher variability do not overwhelm the optimization. We show examples of that problem in this chapter, and of how standardizing addresses it.
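To make the scaling point concrete, here is a minimal sketch of standardization (z-scores) using only the Python standard library. The `standardize` helper and the `income` and `age` values are hypothetical and chosen for illustration; they are not taken from the chapter, which may well use a library scaler instead.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no variation; map it to zeros.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


# Hypothetical features measured on very different scales.
income = [30000, 45000, 60000, 75000, 90000]
age = [25, 35, 45, 55, 65]

income_z = standardize(income)
age_z = standardize(age)

# After standardizing, each feature has mean 0 and standard deviation 1,
# so neither dominates purely because of its units.
print([round(z, 3) for z in income_z])
print([round(z, 3) for z in age_z])
```

Before scaling, the spread of `income` is thousands of times larger than that of `age`. After scaling, both are expressed in standard deviations from their own mean, which is what keeps a high-variance feature from overwhelming the optimization.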

Machine learning algorithms typically require some form of encoding of variables. We almost always need to encode our features for algorithms to understand them correctly. For example, most algorithms cannot make sense of the values female or male, or know not to treat zip codes as ordinal. ...
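As a rough illustration of the encoding idea, the sketch below one-hot encodes a categorical feature in plain Python. The `one_hot` helper and the sample values are hypothetical and not from the chapter; in practice a library routine such as `pandas.get_dummies` would usually do this work.

```python
def one_hot(values):
    """One-hot encode a list of category labels into a list of dicts."""
    categories = sorted(set(values))
    return [
        {f"is_{category}": int(value == category) for category in categories}
        for value in values
    ]


# Hypothetical sample data.
gender = ["female", "male", "female"]
# Zip codes are kept as strings so they are treated as labels, not numbers.
zip_codes = ["01002", "90210", "01002"]

gender_encoded = one_hot(gender)
zip_encoded = one_hot(zip_codes)

print(gender_encoded[0])
print(zip_encoded[1])
```

Each category becomes its own 0/1 column, so the algorithm sees no implied order between `female` and `male`, or between one zip code and another.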
