Comparing models

LOF flags fewer outliers than the isolation forest, but the two models may not even agree on which points are anomalous. As we learned in Chapter 10, Making Better Predictions – Optimizing Models, we can use the cohen_kappa_score() function from sklearn.metrics to check their level of agreement:

>>> from sklearn.metrics import cohen_kappa_score
>>> # points scoring below the LOF offset are treated as outliers
>>> is_lof_outlier = np.where(
...     lof_preds < lof_pipeline.named_steps['lof'].offset_,
...     'outlier', 'inlier'
... )
>>> # the isolation forest returns -1 for outliers and 1 for inliers
>>> is_iso_outlier = np.where(
...     isolation_forest_preds == -1, 'outlier', 'inlier'
... )
>>> cohen_kappa_score(is_lof_outlier, is_iso_outlier)
0.012896350639585386

Cohen's kappa is roughly 0.013, meaning the models' agreement is barely better than what we would expect by chance, so it's not obvious which data points are anomalies. Without labeled data, we have no ground truth for deciding which model is right.
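Note that kappa discounts the agreement expected by chance: since both models label the vast majority of points as inliers, their raw agreement can look high even while kappa sits near zero. As a minimal sketch building on the is_lof_outlier and is_iso_outlier arrays from above (this tabulation is an illustration, not part of the book's workflow), we can compare raw agreement to kappa and see exactly where the labels overlap:

>>> import pandas as pd
>>> # observed (raw) agreement -- not corrected for chance
>>> raw_agreement = (is_lof_outlier == is_iso_outlier).mean()
>>> # contingency table of the two models' labels; the diagonal
>>> # cells count the points the models agree on
>>> agreement_table = pd.crosstab(
...     pd.Series(is_lof_outlier, name='LOF'),
...     pd.Series(is_iso_outlier, name='isolation forest')
... )

Because the shared inlier majority inflates raw_agreement, it will generally come out much larger than the kappa value; the off-diagonal cells of agreement_table show the points that one model flags and the other doesn't.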
