Data imputation
Sometimes omitting missing values is not reasonable, or not possible at all, for example because there are too few observations or because the data does not appear to be missing at random. Data imputation is a practical alternative in such situations: it replaces NA
with actual values based on various algorithms, such as filling empty cells with:
- A known scalar
- The previous value appearing in the column (hot-deck, here in its last-observation-carried-forward form)
- A random element from the same column
- The most frequent value in the column
- Different values from the same column with given probability
- Predicted values based on regression or machine learning models
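The strategies listed above can be sketched in base R. This is only a minimal illustration on made-up data: the vector `x`, the data frame `df`, the fill value `0`, and the probabilities in `probs` are all assumptions chosen for the example, not values from the text.

```r
# Toy vector with missing values (made-up data for illustration)
x <- c(3, NA, 5, 5, NA, 8)
miss <- is.na(x)
observed <- x[!miss]

# 1. A known scalar (0 is an arbitrary choice here)
x_scalar <- replace(x, miss, 0)

# 2. The previous value in the column (last observation carried forward)
idx <- cummax(seq_along(x) * !miss)  # index of the latest non-missing value
idx[idx == 0] <- NA                  # a leading NA has no previous value, so it stays NA
x_locf <- x[idx]

# 3. A random element from the observed values of the same column
set.seed(42)
x_random <- x
x_random[miss] <- observed[sample.int(length(observed), sum(miss), replace = TRUE)]

# 4. The most frequent observed value
counts <- table(observed)
x_mode <- replace(x, miss, as.numeric(names(counts)[which.max(counts)]))

# 5. Different values drawn with given probabilities (assumed weights)
vals  <- c(3, 5, 8)
probs <- c(0.2, 0.5, 0.3)
x_prob <- x
x_prob[miss] <- sample(vals, sum(miss), replace = TRUE, prob = probs)

# 6. Predicted values from a regression model
df <- data.frame(x = c(1, 2, 3, 4, 5), y = c(2, 4, NA, 8, 10))
na_rows <- is.na(df$y)
fit <- lm(y ~ x, data = df)          # lm() drops incomplete rows by default
df$y[na_rows] <- predict(fit, newdata = df[na_rows, , drop = FALSE])
```

The random and probability-based variants depend on the seed, so their results differ between runs unless `set.seed()` is called first. The regression variant assumes a linear relationship between `x` and `y`, which happens to hold exactly in this toy data.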
The hot-deck method is often used while joining multiple datasets together. In such a situation, the roll argument of data.table ...
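As a rough sketch of how a rolling join can carry the previous value forward while joining, consider the following example. It assumes the data.table package is installed; the `quotes` and `trades` tables and their columns are hypothetical and only serve as an illustration.

```r
library(data.table)

# Hypothetical data: sparse price quotes and the times at which trades happened
quotes <- data.table(time = c(1, 5, 10), price = c(100, 101, 103))
trades <- data.table(time = c(2, 5, 7, 12))

# For each trade time, look up the matching quote; with roll = TRUE, a trade
# without an exact match receives the most recent earlier quote instead of NA
res <- quotes[trades, on = "time", roll = TRUE]
res$price  # 100 101 101 103
```

Without `roll = TRUE`, the same join would return NA for the trade times 2, 7, and 12, because no quote exists at exactly those times.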