8 Learning with categorical features

This chapter covers

  • Introducing categorical features in machine learning
  • Preprocessing categorical features using supervised and unsupervised encoding
  • Understanding ordered boosting
  • Using CatBoost for categorical variables
  • Handling high-cardinality categorical features

Data sets for supervised machine learning consist of features that describe objects and labels that describe the targets we’re interested in modeling. At a high level, features, also known as attributes or variables, are usually classified into two types: continuous and categorical.

A categorical feature is one that takes a discrete value from a finite set of nonnumeric values, called categories. Categorical features are ubiquitous and appear ...
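To make the distinction between continuous and categorical features concrete, here is a minimal sketch in plain Python. The toy data set, the feature names (`height_cm`, `color`), and the helper functions `is_categorical` and `categories` are all hypothetical and exist only for illustration. The sketch assumes that any feature with nonnumeric values is categorical. That heuristic is a simplification: in practice, some categorical features are stored as numbers (ZIP codes, for example).

```python
# Toy data set (hypothetical values, for illustration only).
rows = [
    {"height_cm": 172.5, "color": "red"},
    {"height_cm": 160.0, "color": "blue"},
    {"height_cm": 181.2, "color": "red"},
    {"height_cm": 169.9, "color": "green"},
]


def is_categorical(values):
    """Heuristic (an assumption): a feature is categorical if any value is nonnumeric."""
    return not all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )


def categories(values):
    """Return the finite set of distinct categories, sorted for reproducibility."""
    return sorted(set(values))


# Classify each feature by inspecting its column of values.
feature_types = {}
for name in rows[0]:
    column = [row[name] for row in rows]
    feature_types[name] = "categorical" if is_categorical(column) else "continuous"

color_categories = categories([row["color"] for row in rows])

print(feature_types)     # {'height_cm': 'continuous', 'color': 'categorical'}
print(color_categories)  # ['blue', 'green', 'red']
```

Here `color` takes one of three categories, while `height_cm` can take any value in a numeric range.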
