Foreseeing Variable Problems When Building ML Models

A variable is a characteristic, number, or quantity that can be measured or counted. Most variables in a dataset are either numerical or categorical. Numerical variables take numbers as values and can be discrete or continuous, whereas for categorical variables, the values are selected from a group of categories, also called labels.

Variables in their original, raw format are not suitable to train machine learning algorithms. In fact, we need to consider many aspects of a variable to build powerful machine learning models. These aspects include variable type, missing data, cardinality and category frequency, variable distribution and its relationship with the target, outliers, and feature ...

Get Python Feature Engineering Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.