In the following steps, we demonstrate how to apply the Isolation Forest algorithm to detect anomalies:
- Import the required libraries and set a random seed:
import numpy as np
import pandas as pd
random_seed = np.random.RandomState(12)
- Generate a set of normal observations, to be used as training data:
X_train = 0.5 * random_seed.randn(500, 2)
X_train = np.r_[X_train + 3, X_train]
X_train = pd.DataFrame(X_train, columns=["x", "y"])
- Generate a testing set, also consisting of normal observations:
X_test = 0.5 * random_seed.randn(500, 2)
X_test = np.r_[X_test + 3, X_test]
X_test = pd.DataFrame(X_test, columns=["x", "y"])
- Generate a set of outlier observations. These are generated from a different distribution than the normal ...
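The training and testing steps above share the same recipe, so they can be collected into one small sketch. The helper name `make_two_cluster_frame` and the variable `rng` are hypothetical and chosen only for illustration. The sketch covers the normal observations only and makes the effect of `np.r_` visible: it stacks a copy of the points shifted by 3 on top of the unshifted points, giving 1,000 rows in two clusters.

```python
import numpy as np
import pandas as pd

# Seeded generator, as in the steps above (the name `rng` is an assumption).
rng = np.random.RandomState(12)


def make_two_cluster_frame(rng, n=500):
    """Hypothetical helper: build 2 * n normal observations in two clusters."""
    # n points drawn from a standard normal, scaled to a standard deviation of 0.5.
    base = 0.5 * rng.randn(n, 2)
    # np.r_ stacks row-wise: first the copy shifted by 3, then the unshifted points.
    stacked = np.r_[base + 3, base]
    return pd.DataFrame(stacked, columns=["x", "y"])


# Two successive calls consume different random draws, so the sets differ.
X_train = make_two_cluster_frame(rng)
X_test = make_two_cluster_frame(rng)

print(X_train.shape)
print(X_train.describe())
```

Because the second cluster is an exact shifted copy of the first, the two halves of each frame differ by exactly 3 in both columns. This is a property of the construction, not of the Isolation Forest algorithm itself.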