3.1 Data Cleaning and Handling Missing Data
Data cleaning is a crucial step in the data preprocessing pipeline: the systematic identification and correction of problems within datasets. It covers a wide range of activities, including:
Detecting corrupt data
This step examines the dataset for data points that have been compromised or altered at any stage of the data lifecycle. Corruption can occur during collection, where errors arise from faulty sensors or human input mistakes; during transmission, where network issues or interference can damage data; and during storage, where data might be corrupted ...
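The checks described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the field name temperature_c, the plausible range of -50 to 60, the sample readings, and the helper names find_corrupt_rows and sha256_of are all hypothetical. The first part flags values that look like collection errors (non-numeric text, non-finite numbers, out-of-range sentinels); the second compares a SHA-256 digest taken before transmission or storage with one computed afterwards.

```python
import hashlib
import math


def sha256_of(payload: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(payload).hexdigest()


def find_corrupt_rows(rows, field="temperature_c", valid_range=(-50.0, 60.0)):
    """Return (row_index, reason) pairs for rows whose value looks corrupt.

    The field name and the valid range are assumptions for this example.
    """
    low, high = valid_range
    problems = []
    for i, row in enumerate(rows):
        try:
            number = float(row.get(field))
        except (TypeError, ValueError):
            # Missing value (None) or text that is not a number.
            problems.append((i, "not numeric"))
            continue
        if math.isnan(number) or math.isinf(number):
            problems.append((i, "not finite"))
        elif not low <= number <= high:
            problems.append((i, "out of range"))
    return problems


# Collection-phase checks on hypothetical sensor readings.
readings = [
    {"sensor": "a1", "temperature_c": "21.5"},
    {"sensor": "a1", "temperature_c": "-999"},  # sentinel from a faulty sensor
    {"sensor": "a2", "temperature_c": "2l.7"},  # typo: letter l instead of 1
    {"sensor": "a2", "temperature_c": "nan"},
]
issues = find_corrupt_rows(readings)

# Transmission/storage check: the sender publishes a digest of the payload,
# and the receiver recomputes it on whatever bytes actually arrived.
original = b"sensor,temperature_c\na1,21.5\n"
expected_digest = sha256_of(original)
damaged = b"sensor,temperature_c\na1,2\x001.5\n"  # one stray byte inserted

intact_ok = sha256_of(original) == expected_digest
damaged_ok = sha256_of(damaged) == expected_digest

print(issues)
print(intact_ok, damaged_ok)
```

A digest comparison only reports that the bytes changed, not where or how, so it complements rather than replaces value-level checks such as the range test.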