How do we predict customer churn with Spark?

Predicting customer churn in Apache Spark is similar to predicting any other binary outcome: each customer either churns or does not. Spark provides a number of algorithms for this kind of prediction. While we'll focus on Random Forest, you can also look at the other classification algorithms in the MLlib library. We'll follow the typical steps of building a machine learning pipeline that we discussed in our earlier MLlib chapter.

The typical stages include:

  • Stage 1: Loading data/defining schema
  • Stage 2: Exploring/visualizing the data set
  • Stage 3: Performing necessary transformations
  • Stage 4: Feature engineering
  • Stage 5: Model training
  • Stage 6: Model evaluation
  • Stage 7: Model monitoring

Data set description

Since we are going to target ...
