Chapter 3. Feature Engineering and Feature Selection

Feature engineering and feature selection are at the heart of data preprocessing for ML, especially for model training. Feature engineering is also required when performing inference, and it’s critical that the preprocessing done during inference matches the preprocessing done during training.
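One common way to keep training and inference consistent is to fit the preprocessing once on the training data, persist the fitted transform together with the model, and reuse that same object when serving. The sketch below illustrates the idea with scikit-learn and joblib; the data, pipeline steps, and file name are illustrative assumptions, not a prescription from the text.

```python
# Minimal sketch: fit preprocessing once, persist it, reuse it at inference.
import joblib
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# --- Training time ---
X_train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0], [4.0, 500.0]])
y_train = np.array([0, 0, 1, 1])

pipeline = Pipeline([
    ("scale", StandardScaler()),      # learns mean/std from training data
    ("model", LogisticRegression()),
])
pipeline.fit(X_train, y_train)
joblib.dump(pipeline, "model_with_preprocessing.joblib")  # hypothetical path

# --- Inference time ---
# Loading the same fitted pipeline guarantees the scaler applies the
# training-time statistics rather than recomputing them on serving data.
serving_pipeline = joblib.load("model_with_preprocessing.joblib")
X_new = np.array([[2.5, 350.0]])
print(serving_pipeline.predict(X_new))
```

Bundling the preprocessing with the model in a single artifact is one simple way to avoid training/serving skew; larger systems often achieve the same goal with a shared feature pipeline or feature store.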

Some of the material in this chapter may seem like a review, especially if you’ve worked in ML in a nonproduction context such as in an academic or research setting. But we’ll be focusing on production issues in this chapter. One major issue we’ll discuss is how to perform feature engineering at scale in a reproducible and consistent way.

We’ll also discuss feature selection and why it’s important in a production context. Often you will have more features than you actually need, and your goal should be to include only those features that offer the most predictive information for the problem you’re trying to solve. Including more than that adds cost and complexity and can contribute to quality issues such as overfitting; a small selection sketch follows below.
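As a rough illustration of scoring candidate features and keeping only the most informative ones, the following sketch uses scikit-learn’s SelectKBest with a mutual-information score. The synthetic data and the choice of k are assumptions made for the example, not values from the text.

```python
# Sketch: rank candidate features by mutual information and keep the top k.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic data: 10 candidate features, only a few carry real signal.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=3, n_redundant=2, random_state=0
)

selector = SelectKBest(score_func=mutual_info_classif, k=3)  # k is illustrative
X_selected = selector.fit_transform(X, y)

print("Score per feature:", np.round(selector.scores_, 3))
print("Columns kept:", selector.get_support(indices=True))
print("Reduced shape:", X_selected.shape)  # e.g. (500, 3)
```

Dropping low-scoring features like this reduces storage, compute, and serving cost, and it removes inputs that add noise rather than predictive value.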

Introduction to Feature Engineering

Coming up with features is difficult, time-consuming, and requires expert knowledge. Applied machine learning often requires careful engineering of the features and dataset.

Andrew Ng

Feature engineering is a type of preprocessing that is intended to help your model learn. Feature engineering is critical for making maximum use of your data, and it’s ...
