February 2018
Intermediate to advanced
378 pages
10h 14m
English
Your data can be tough in a lot of ways: it can be sparse (in the features or in the target variable), it can contain outliers or missing values, and it can be high-dimensional or, for categorical features, high-cardinality. Numerical features can be (and usually are) of different magnitudes or suffer from multicollinearity. There is no bulletproof solution. Use force. Tidy your data up. The common techniques here are dimensionality reduction, missing-value imputation, outlier detection, and statistical data normalization. Textbooks on statistics and data science will help you learn more on this topic.
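To make three of these techniques concrete, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not a recommended pipeline: the function names, the toy `raw` feature, the choice of the median for imputation, the 1.5 × IQR rule for outliers, and z-score scaling for normalization are all choices made for this example. Dimensionality reduction is left out because it usually calls for a linear-algebra library.

```python
import math
import statistics


def impute_median(values):
    """Replace missing entries (None or NaN) with the median of the observed ones."""
    observed = [v for v in values if v is not None and not math.isnan(v)]
    median = statistics.median(observed)
    return [median if v is None or math.isnan(v) else v for v in values]


def iqr_outlier_mask(values, k=1.5):
    """Flag values lying more than k * IQR below the first or above the third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant feature carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical toy feature with one missing value and one extreme value.
raw = [1.0, 2.0, None, 3.0, 4.0, 100.0]

filled = impute_median(raw)                 # fill the gap first
mask = iqr_outlier_mask(filled)             # then flag extreme values
kept = [v for v, is_outlier in zip(filled, mask) if not is_outlier]
scaled = standardize(kept)                  # finally put the feature on a common scale
```

The order of the steps is itself a judgment call: here the outlier is dropped before scaling so that it does not distort the mean and standard deviation. Whether to drop, cap, or keep flagged values depends on the data and the model.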