Treating missing information

Most algorithms either fail when the data contains missing values or handle them automatically with a predetermined strategy. It is important to take control of what happens in these cases.
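Taking control starts with knowing where the gaps are before any model sees the data. The following is a minimal Python sketch. It assumes a hypothetical toy dataset stored as a list of dicts, where `None` marks a missing value. It counts the missing entries per variable so the treatment can be chosen explicitly rather than left to the algorithm's default.

```python
# Hypothetical toy dataset: each row is a dict, and None marks a missing value.
rows = [
    {"income": 2500.0, "age": 34.0, "default": 0},
    {"income": None,   "age": 51.0, "default": 1},
    {"income": 3100.0, "age": None, "default": 0},
    {"income": None,   "age": 29.0, "default": 1},
]

def count_missing(data):
    """Return the number of missing (None) values in each column."""
    counts = {}
    for row in data:
        for column, value in row.items():
            # True counts as 1 and False as 0 when added to an int.
            counts[column] = counts.get(column, 0) + (value is None)
    return counts

print(count_missing(rows))  # {'income': 2, 'age': 1, 'default': 0}
```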

The two most common ways to deal with missing information are to remove the observations that contain missing values or to replace the missing values with a concrete value, usually the median or the mean. When a value is imputed, you can lose important information. For example, missing values of a variable may appear only in one of the classes of the target variable, so the fact that a value is missing is itself informative. A typical case is a model that tries to predict good and bad applicants for a bank loan.
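The two options, and the information that imputation can hide, can be sketched in Python. All names and values here are hypothetical: `rows` holds `(feature, target)` pairs, where `target` 1 marks a bad applicant and `None` marks a missing feature value. In this toy data the value is missing only for good applicants (class 0). Deletion throws away most of that class. Median imputation replaces the gaps with an ordinary-looking number. The extra 0/1 "was missing" flag is a common complement to imputation that is not described in the text above. It is included here to show one way to preserve the signal.

```python
from statistics import median

# Hypothetical data: (feature value, target). None = missing; target 1 = bad applicant.
rows = [
    (None, 0), (None, 0), (None, 0), (300.0, 0),
    (120.0, 1), (45.0, 1), (15.0, 1),
]

def missing_rate_by_class(data):
    """Share of missing feature values within each target class."""
    rates = {}
    for cls in sorted({y for _, y in data}):
        values = [x for x, y in data if y == cls]
        rates[cls] = sum(x is None for x in values) / len(values)
    return rates

def drop_missing(data):
    """Option 1: remove every observation whose feature value is missing."""
    return [(x, y) for x, y in data if x is not None]

def impute_median(data):
    """Option 2: fill missing values with the median of the observed values.
    Each output row is (filled value, was_missing flag, target)."""
    fill = median(x for x, _ in data if x is not None)
    return [(fill if x is None else x, int(x is None), y) for x, y in data]

print(missing_rate_by_class(rows))  # {0: 0.75, 1: 0.0}
print(drop_missing(rows))           # only one class-0 row survives
print(impute_median(rows)[0])       # (82.5, 1, 0)
```

Here the missingness rate alone separates the classes perfectly. After median imputation, the value 82.5 looks like any other observation, and only the flag column still carries that information.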

It is common to have variables related to the number of days ...
