Dealing with unbalanced datasets
In the case of credit card transactions, we said that the distribution of data is both unbalanced and non-stationary.
A solution to the unbalanced data distribution problem consists of rebalancing the classes before proceeding with training the algorithm.
Strategies commonly used to rebalance the sample classes include undersampling and oversampling the dataset.
In essence, undersampling consists of randomly removing some of the observations that belong to a certain class, in order to reduce that class's relative share of the dataset.
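As a minimal sketch of random undersampling, the following example uses only the Python standard library. The function name `random_undersample`, its `keep_fraction` parameter, and the toy data (990 legitimate and 10 fraudulent transactions) are assumptions made for illustration; they do not come from the text.

```python
import random
from collections import Counter


def random_undersample(samples, labels, majority_label, keep_fraction, seed=0):
    """Randomly keep only `keep_fraction` of the majority-class observations.

    Hypothetical helper for illustration: minority observations are never removed.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    # Positions of all observations that belong to the majority class.
    majority_idx = [i for i, y in enumerate(labels) if y == majority_label]

    # Choose, at random and without replacement, which majority rows survive.
    n_keep = round(len(majority_idx) * keep_fraction)
    kept_majority = set(rng.sample(majority_idx, n_keep))

    # Keep every minority row plus the sampled majority rows, in original order.
    keep = [
        i for i, y in enumerate(labels)
        if y != majority_label or i in kept_majority
    ]
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Toy data (assumed): 990 legitimate (0) and 10 fraudulent (1) transactions.
labels = [0] * 990 + [1] * 10
samples = list(range(len(labels)))  # stand-ins for real feature vectors

X_res, y_res = random_undersample(
    samples, labels, majority_label=0, keep_fraction=0.1
)
counts = Counter(y_res)
print(counts)
```

With these toy numbers, the fraudulent class goes from 1% of the observations to roughly 9%, at the cost of discarding most of the legitimate ones. In practice, a library such as imbalanced-learn (`RandomUnderSampler`) may be preferable to a hand-written helper.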
In the case of unbalanced distributions, such as those relating to transactions with credit cards, if we exclude random samples from the main class (which is representative of legitimate ...