Spark ML pipelines
MLlib's goal is to make practical machine learning (ML) scalable and easy. To that end, Spark introduced the pipeline API for creating and tuning practical ML pipelines. As discussed previously, building an ML pipeline that extracts meaningful knowledge through feature engineering involves a sequence of stages: data collection, preprocessing, feature extraction, feature selection, model fitting, validation, and model evaluation. For example, classifying text documents might involve text segmentation and cleaning, feature extraction, and training a classification model, with cross-validation used for tuning. Most ML libraries are not designed for distributed computation, or they do not provide native support for pipeline ...
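The text-classification example above can be sketched as a chain of stages. The following is a minimal, single-machine illustration in plain Python. It is not Spark's API: the class names (`Tokenizer`, `TermCounter`, `NaiveBayes`, `Pipeline`), the toy documents, and the choice of a naive Bayes classifier are all assumptions made for illustration, and cross-validation is left out to keep the sketch short.

```python
from collections import Counter
import math
import re


class Tokenizer:
    """Transformer (hypothetical): lowercases text and splits it into word tokens."""

    def transform(self, docs):
        return [re.findall(r"[a-z]+", d.lower()) for d in docs]


class TermCounter:
    """Transformer (hypothetical): turns each token list into bag-of-words counts."""

    def transform(self, token_lists):
        return [Counter(tokens) for tokens in token_lists]


class NaiveBayes:
    """Estimator (hypothetical): multinomial naive Bayes with add-one smoothing."""

    def fit(self, features, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for feats, label in zip(features, labels):
            self.word_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def _predict_one(self, feats):
        total_docs = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            # Log prior plus smoothed log likelihood of every observed word.
            score = math.log(self.class_counts[c] / total_docs)
            for word, n in feats.items():
                p = (self.word_counts[c][word] + 1) / (total_words + len(self.vocab))
                score += n * math.log(p)
            if score > best_score:
                best, best_score = c, score
        return best

    def predict(self, features):
        return [self._predict_one(f) for f in features]


class Pipeline:
    """Runs the transformers in order, then fits or applies the final estimator."""

    def __init__(self, transformers, estimator):
        self.transformers = transformers
        self.estimator = estimator

    def _featurize(self, docs):
        data = docs
        for stage in self.transformers:
            data = stage.transform(data)
        return data

    def fit(self, docs, labels):
        self.estimator.fit(self._featurize(docs), labels)
        return self

    def predict(self, docs):
        return self.estimator.predict(self._featurize(docs))


# Toy training data, invented for this sketch.
train_docs = [
    "Spark makes distributed computing fast",
    "Spark MLlib scales machine learning",
    "The recipe needs flour and sugar",
    "Bake the cake with butter and sugar",
]
train_labels = ["tech", "tech", "food", "food"]

model = Pipeline([Tokenizer(), TermCounter()], NaiveBayes()).fit(train_docs, train_labels)
predictions = model.predict(["Spark machine learning pipeline", "sugar and butter cake"])
print(predictions)  # ['tech', 'food']
```

The point of the design is that the same ordered stages run at training time and at prediction time, so the preprocessing cannot drift between the two. In PySpark, the analogous roles are played by `pyspark.ml.Pipeline` with stages such as `Tokenizer`, `HashingTF`, and `LogisticRegression`, and tuning by `pyspark.ml.tuning.CrossValidator`, all operating on distributed DataFrames rather than Python lists.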