Handling noise

Noise in data can come from many sources, but it is often not a significant problem, because most machine learning techniques are resilient to noisy datasets. Noise can come from environmental factors (for instance, an air conditioner compressor turning on at random and inducing signal noise in a nearby sensor). It can come from transcription errors (somebody recorded the wrong data point or selected the wrong option in a survey, or an OCR algorithm read a 3 as an 8). Or it can be inherent to the data itself (such as temperature recordings, which follow a seasonal pattern but fluctuate noisily from day to day).
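To make the temperature example concrete, the following is a minimal sketch of one common way to damp day-to-day fluctuations so that a slower pattern is easier to see: a centered moving average. The technique, the movingAverage function, and the readings array are all illustrative assumptions rather than anything prescribed by the text above, and the sample values are made up.

```javascript
// Centered moving average. The window shrinks at the edges of the series,
// so the output has the same length as the input.
// NOTE: movingAverage is a hypothetical helper written for this sketch.
function movingAverage(values, windowSize) {
  const half = Math.floor(windowSize / 2);
  return values.map((_, i) => {
    const start = Math.max(0, i - half);
    const end = Math.min(values.length, i + half + 1);
    const window = values.slice(start, end);
    return window.reduce((sum, v) => sum + v, 0) / window.length;
  });
}

// Made-up daily temperature readings: a slow upward drift plus daily jitter.
const readings = [10, 14, 11, 15, 12, 16, 13, 17, 14];

// A window of 3 averages each reading with its immediate neighbors.
const smoothed = movingAverage(readings, 3);
console.log(smoothed);
```

A wider window removes more of the daily jitter but also blurs genuine short-term changes, so the window size is a judgment call that depends on the data.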

Noise in categorical data can also be caused by category labels that aren't normalized, such as images that are tagged ...
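As a rough illustration of label normalization, the sketch below folds labels that differ only in letter case or stray whitespace into a single category before counting them. The normalizeLabel and countLabels helpers and the sample tags are hypothetical, chosen only for this example; real datasets may also need handling for synonyms, plurals, or misspellings, which this sketch does not attempt.

```javascript
// Normalize a category label: trim the ends, lowercase, and collapse
// internal runs of whitespace to a single space.
// NOTE: normalizeLabel and countLabels are hypothetical helpers for this sketch.
function normalizeLabel(label) {
  return label.trim().toLowerCase().replace(/\s+/g, ' ');
}

// Count how often each normalized label occurs.
function countLabels(labels) {
  const counts = new Map();
  for (const label of labels) {
    const key = normalizeLabel(label);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Made-up tags that a human would read as two categories.
const rawTags = ['Dog', 'dog ', ' DOG', 'cat', 'Cat'];
const tagCounts = countLabels(rawTags);
console.log(tagCounts);
```

Without the normalization step, the five tags above would be treated as five distinct categories, which is exactly the kind of categorical noise described here.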
