Data preprocessing
The useful information in the data is usually referred to as signal. The pieces of data that represent errors of different kinds, or that are simply irrelevant, are known as noise. Errors can enter the data during measurement, during information transmission, or through human mistakes. The goal of data cleansing procedures is to increase the signal-to-noise ratio. During this stage, you will usually transform all data to one format, delete entries with missing values, and check suspicious outliers (they can be either noise or signal). It is widely believed among ML engineers that the data preprocessing stage usually consumes 90% of the time allocated for the ML project. Then, algorithm tweaking consumes another 90% of the time. ...
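The three cleansing steps named above (one format, no missing values, a check for suspicious outliers) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample readings, the helper names `to_float` and `preprocess`, and the choice of the modified z-score (median and median absolute deviation, with the commonly used 3.5 cutoff) for the outlier check are all hypothetical and not taken from the text. Outliers are flagged rather than deleted, since they may turn out to be signal.

```python
import statistics

# Hypothetical raw readings: mixed formats, missing values, one suspicious value.
raw = ["21.5", " 22.1 ", None, "20,9", "", "21.8", "95.0", "22.4"]


def to_float(value):
    """Convert a raw entry to float, or return None if it is missing."""
    if value is None:
        return None
    text = value.strip().replace(",", ".")  # accept a decimal comma
    return float(text) if text else None


def preprocess(entries, threshold=3.5):
    """Return (clean values, flagged outliers) for a list of raw entries."""
    # 1. Transform all data to one format (float).
    converted = [to_float(e) for e in entries]

    # 2. Delete entries with missing values.
    clean = [v for v in converted if v is not None]

    # 3. Flag suspicious outliers using the modified z-score,
    #    which is based on the median and the median absolute deviation (MAD).
    median = statistics.median(clean)
    mad = statistics.median(abs(v - median) for v in clean)
    if mad == 0:
        return clean, []  # no spread, so nothing can be flagged
    outliers = [v for v in clean if 0.6745 * abs(v - median) / mad > threshold]
    return clean, outliers


clean, outliers = preprocess(raw)
print("clean:", clean)
print("flagged for review:", outliers)
```

On this sample, only the reading of 95.0 is flagged. Whether it is a sensor fault (noise) or a real event (signal) is a decision the code deliberately leaves to a human or to a later, domain-specific rule.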