The following table summarizes the issues commonly found in raw data, whether linear models are sensitive to them, and whether Amazon ML offers a way to deal with them:
Issue | Linear model sensitivity | Available on Amazon ML |
Missing values | Yes | Dealt with automatically |
Standardization | Yes | z-score standardization |
Outliers | Yes | Quantile binning |
Multicollinearity | Yes | No |
Imbalanced datasets | Yes | Uses an appropriate metric (F1 score); no sampling strategy is exposed (one may exist in the background) |
Nonlinearities | Yes | Quantile binning |
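
To make two of the transformations named in the table concrete, here is a minimal Python sketch of z-score standardization and quantile binning. It uses only the standard library and illustrates the general techniques; it is not Amazon ML's internal implementation, and the function names `z_score` and `quantile_bin` are hypothetical helpers introduced for this example.

```python
import bisect
import statistics


def z_score(values):
    """Center values on zero mean and scale them to unit standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        # A constant column carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def quantile_bin(values, n_bins=4):
    """Replace each value with the index (0 .. n_bins - 1) of its quantile bin."""
    # Interior cut points that split the data into n_bins groups of
    # roughly equal size.
    edges = statistics.quantiles(values, n=n_bins, method="inclusive")
    return [bisect.bisect_right(edges, v) for v in values]


# Toy data with one extreme outlier (1000.0).
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 1000.0]

scaled = z_score(data)
bins = quantile_bin(data, n_bins=4)
print(bins)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

In this toy example, the outlier ends up in the same bin as its nearest neighbor, which is why binning reduces the influence of extreme values on a linear model. Because each bin becomes its own categorical level, the model can also fit a different weight per bin, which is how binning helps with nonlinear relationships.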