Wrapping everything in a pipeline
As a concluding topic, we will discuss how to wrap the transformation and selection operations we have seen so far into a single command: a pipeline that takes your data from its source all the way to your machine learning algorithm.
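To make the idea concrete, here is a minimal sketch using scikit-learn's `Pipeline` class. It assumes scikit-learn is installed, and the Iris dataset, `StandardScaler`, `SelectKBest` and `LogisticRegression` are illustrative stand-ins for a transformation step, a selection step and a learner; they are not necessarily the choices made elsewhere in this book.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Illustrative dataset (an assumption, not taken from the original text)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Each step is a (name, estimator) pair: transformation, selection, then the learner
pipe = Pipeline([
    ("scale", StandardScaler()),                          # transformation
    ("select", SelectKBest(score_func=f_classif, k=2)),   # feature selection
    ("model", LogisticRegression(max_iter=1000)),         # final estimator
])

# fit() learns the scaling statistics and the selected features from the training data only
pipe.fit(X_train, y_train)

# score() reapplies the same fitted steps to the test data, then evaluates the model
test_accuracy = pipe.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```

A single `fit` call runs every step in order, and a single `predict` or `score` call replays them on new data, which is what turns the separate operations into one command.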
Wrapping all your data operations into a single command offers some advantages:
- Your code becomes clearer and more logically structured, because pipelines force you to rely on functions for your operations (each step is a function)
- You treat the test data in exactly the same way as the training data, without repeating code and without the risk of mistakes in the process
- You can easily grid-search the best parameters across all the data pipelines you devised, not just the machine learning hyperparameters ...
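The advantages listed above, in particular grid-searching the data-preparation steps together with the model, can be sketched with scikit-learn's `GridSearchCV` applied to a `Pipeline`. This is a minimal illustration under stated assumptions: the Iris data, the step names `scale`, `select` and `model`, and the parameter values in the grid are all hypothetical choices made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Illustrative dataset and split (assumptions for this sketch)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Hypothetical step names; any names can be used
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif, k=2)),
    ("model", LogisticRegression(max_iter=1000)),
])

# Parameters are addressed as <step name>__<parameter name>, so the grid covers
# a data-preparation step (how many features to keep) as well as a model hyperparameter
param_grid = {
    "select__k": [1, 2, 3, 4],
    "model__C": [0.1, 1.0, 10.0],
}

# Each cross-validation fold refits the whole pipeline on its training part,
# so scaling and feature selection never see the validation part
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
# The refitted best pipeline is applied to the test data in one call
print(f"Test accuracy: {search.score(X_test, y_test):.3f}")
```

Because the selection step lives inside the pipeline, choosing `k` is treated like any other hyperparameter, and the test data passes through exactly the same fitted steps as the training data.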