5
Working with Outliers
An outlier is a data point that diverges notably from other values within a variable. Outliers may stem from the inherent variability of the feature itself, manifesting as extreme values that occur infrequently within the distribution (typically found in the tails). They can be the result of experimental errors or inaccuracies in data collection processes, or they can signal important events. For instance, an unusually high expense in a card transaction may indicate fraudulent activity, warranting flagging and potentially blocking the card to safeguard customers. Similarly, unusually distinct tumor morphologies can suggest malignancy, prompting further examination.
Outliers can exert a disproportionately large impact ...
Get Python Feature Engineering Cookbook - Third Edition now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.