Data imputation

Sometimes your data will have missing values. This could be due to errors in the data collection process, data that was genuinely never recorded, or any other reason; the net result is that the information is not available. A real-world example is a survey in which the respondent did not answer a specific question.

You may have a dataset of, say, 1,000 records and 20 columns, in which one column has 100 missing values. You may choose to discard the 100 incomplete records altogether, but that means discarding 10 percent of your records, even though those records still have complete data in the 19 other columns. Another option is to simply exclude the column, but that throws away the 90 percent of its values that are present and means you cannot leverage the benefit afforded by ...
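The trade-off described above can be sketched in a few lines of Python with pandas. This is only an illustration under stated assumptions: the data is randomly generated, the column names (`col_0` to `col_19`) and the choice of `col_7` as the incomplete column are hypothetical, and mean imputation is shown as one simple way to fill the gaps, not necessarily the method the text goes on to recommend.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical dataset: 1,000 records and 20 numeric columns (col_0 .. col_19).
df = pd.DataFrame(
    rng.normal(size=(1000, 20)),
    columns=[f"col_{i}" for i in range(20)],
)

# Assumption for illustration: blank out 100 randomly chosen values in col_7.
missing_rows = rng.choice(len(df), size=100, replace=False)
df.loc[missing_rows, "col_7"] = np.nan

n_missing = int(df["col_7"].isna().sum())

# Option 1: drop every record that has a missing value.
# Loses 100 records, including their valid values in the other 19 columns.
rows_dropped = df.dropna()

# Option 2: drop the incomplete column.
# Keeps all 1,000 records but loses the 900 values that were present.
col_dropped = df.drop(columns="col_7")

# Option 3: impute, here by filling each gap with the column mean.
# Keeps every record and every column.
imputed = df.copy()
imputed["col_7"] = imputed["col_7"].fillna(imputed["col_7"].mean())

print(n_missing)           # 100
print(rows_dropped.shape)  # (900, 20)
print(col_dropped.shape)   # (1000, 19)
print(imputed.shape)       # (1000, 20)
```

Only the third option preserves the full 1,000 x 20 shape. The cost is that the filled-in values are estimates, so a mean fill will understate the column's variance.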
