How do we predict customer churn with Spark?
Predicting customer churn in Apache Spark is similar to predicting any other binary outcome. Spark provides several algorithms for this kind of prediction. We'll focus on Random Forest, but you can also try the other classification algorithms in the MLlib library. We'll follow the typical steps of building a machine learning pipeline that we discussed in the earlier MLlib chapter.
The typical stages include:
- Stage 1: Loading data/defining schema
- Stage 2: Exploring/visualizing the data set
- Stage 3: Performing necessary transformations
- Stage 4: Feature engineering
- Stage 5: Model training
- Stage 6: Model evaluation
- Stage 7: Model monitoring
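The stages above can be sketched with the spark.ml Pipeline API. This is only a minimal illustration under stated assumptions, not the chapter's actual code: the file name churn.csv, the column names (account_length, total_day_minutes, customer_service_calls, churned), and the hyperparameters are hypothetical placeholders. Model monitoring (Stage 7) is not shown.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

object ChurnPipelineSketch {

  // Stage 1: an explicit schema. All column names are hypothetical.
  val schema: StructType = StructType(Seq(
    StructField("account_length", DoubleType, nullable = true),
    StructField("total_day_minutes", DoubleType, nullable = true),
    StructField("customer_service_calls", DoubleType, nullable = true),
    StructField("churned", StringType, nullable = true)
  ))

  // Stages 4-5: feature engineering and the Random Forest estimator,
  // chained into a single Pipeline.
  def buildPipeline(): Pipeline = {
    // Turn the string label into a numeric "label" column.
    val labelIndexer = new StringIndexer()
      .setInputCol("churned")
      .setOutputCol("label")

    // Combine the numeric columns into one "features" vector.
    val assembler = new VectorAssembler()
      .setInputCols(Array("account_length", "total_day_minutes", "customer_service_calls"))
      .setOutputCol("features")

    // Illustrative hyperparameters, not tuned values.
    val randomForest = new RandomForestClassifier()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setNumTrees(50)
      .setSeed(42L)

    new Pipeline().setStages(Array(labelIndexer, assembler, randomForest))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ChurnPipelineSketch")
      .master("local[*]")
      .getOrCreate()

    // Stage 1: load the data with the schema above (hypothetical path).
    val raw = spark.read.option("header", "true").schema(schema).csv("churn.csv")

    // Stage 2: a quick look at the schema and the class balance.
    raw.printSchema()
    raw.groupBy("churned").count().show()

    // Stage 3: a simple transformation, dropping rows with missing values.
    val cleaned = raw.na.drop()

    // Stage 5: train on 80% of the rows, hold out 20% for evaluation.
    val Array(train, test) = cleaned.randomSplit(Array(0.8, 0.2), seed = 42L)
    val model = buildPipeline().fit(train)

    // Stage 6: evaluate on the held-out rows using area under the ROC curve.
    val predictions = model.transform(test)
    val auc = new BinaryClassificationEvaluator()
      .setLabelCol("label")
      .setMetricName("areaUnderROC")
      .evaluate(predictions)
    println(s"Area under ROC = $auc")

    spark.stop()
  }
}
```

Keeping the indexer, assembler, and classifier in one Pipeline means the same transformations are applied at training and prediction time, and swapping Random Forest for another MLlib classifier only changes the last stage.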
Data set description
Since we are going to target ...