3.4 What Could Go Wrong?
Pipelines and automated preprocessing offer clear advantages, but they also introduce their own challenges. This section covers common issues that can arise when using Pipeline and FeatureUnion, along with strategies for handling each one.
3.4.1 Data Leakage from Improper Pipeline Configuration
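One of the main reasons for using pipelines is to prevent data leakage, which occurs when information from the test set (or from validation folds) inadvertently influences how the model is trained. However, leakage can still happen when preprocessing is done outside the pipeline. For example, if a scaler or encoder is fitted on the full dataset before the train/test split, its learned statistics already reflect the test rows, and evaluation scores can look better than they should.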
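The contrast can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy are installed and uses a synthetic dataset from make_classification. The StandardScaler and LogisticRegression steps, the step names "scaler" and "clf", and all variable names are illustrative choices, not taken from the text above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used only for illustration (an assumption, not real data).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Leaky pattern: the scaler is fitted on ALL rows, so its mean and
# variance already include information from the test set.
leaky_scaler = StandardScaler().fit(X)

# Safer pattern: the scaler lives inside the pipeline, so calling
# fit() on the pipeline fits the scaler on the training rows only.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),
])
pipe.fit(X_train, y_train)
safe_scaler = pipe.named_steps["scaler"]

# The pipeline's scaler matches the training-set statistics exactly,
# while the leaky scaler's statistics are shifted by the test rows.
print("train-only mean:", X_train.mean(axis=0))
print("pipeline scaler:", safe_scaler.mean_)
print("leaky scaler:   ", leaky_scaler.mean_)

# Test rows are only ever transformed, never used for fitting.
test_accuracy = pipe.score(X_test, y_test)
```

On a small synthetic dataset the difference in accuracy may be negligible. The point of the sketch is only that the pipeline's scaler never sees the test rows, which is the property that matters once cross-validation or target-dependent transformers are involved.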
What could go wrong?