How do we predict customer churn with Spark?

Predicting customer churn in Apache Spark is similar to predicting any other binary outcome. Spark provides several algorithms for this kind of prediction. We'll focus on Random Forest, but you can also try the other classification algorithms in the MLlib library. We'll follow the typical steps of building a machine learning pipeline, as discussed in our earlier MLlib chapter.

The typical stages include:

Stage 1: Loading data/defining schema
Stage 2: Exploring/visualizing the data set
Stage 3: Performing necessary transformations
Stage 4: Feature engineering
Stage 5: Model training
Stage 6: Model evaluation
Stage 7: Model monitoring
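The stages above can be sketched in PySpark roughly as follows. This is a minimal illustration, not the book's own code. The column names (`plan_type`, `monthly_minutes`, `support_calls`, `churned`), the helper function names, and the hyperparameter values are all assumptions made for the example, and the label column is assumed to already be numeric (0.0 or 1.0).

```python
# Hypothetical column names; the real data set's schema is not shown here.
CATEGORICAL_COLS = ["plan_type"]
NUMERIC_COLS = ["monthly_minutes", "support_calls"]
LABEL_COL = "churned"  # assumed to be numeric: 1.0 = churned, 0.0 = stayed


def indexed_name(col):
    """Name of the numeric column that StringIndexer produces for `col`."""
    return col + "_idx"


def feature_input_cols(categorical_cols, numeric_cols):
    """Columns fed into the feature vector: indexed categoricals, then numerics."""
    return [indexed_name(c) for c in categorical_cols] + list(numeric_cols)


def load_data(spark, path):
    """Stage 1: load a CSV file with a header row and let Spark infer the schema."""
    return spark.read.csv(path, header=True, inferSchema=True)


def build_churn_pipeline(categorical_cols, numeric_cols, label_col,
                         num_trees=20, seed=42):
    """Stages 3 to 5: index categoricals, assemble features, attach a Random Forest."""
    # Imported inside the function so this module loads without PySpark installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.feature import StringIndexer, VectorAssembler

    # Turn each string category into a numeric index column.
    indexers = [
        StringIndexer(inputCol=c, outputCol=indexed_name(c), handleInvalid="keep")
        for c in categorical_cols
    ]
    # Combine all inputs into the single vector column that MLlib expects.
    assembler = VectorAssembler(
        inputCols=feature_input_cols(categorical_cols, numeric_cols),
        outputCol="features",
    )
    forest = RandomForestClassifier(
        labelCol=label_col, featuresCol="features", numTrees=num_trees, seed=seed
    )
    return Pipeline(stages=indexers + [assembler, forest])


def train_and_evaluate(df, train_fraction=0.8, seed=42):
    """Stages 5 and 6: fit on a training split, report area under ROC on the rest."""
    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    train, test = df.randomSplit([train_fraction, 1.0 - train_fraction], seed=seed)
    pipeline = build_churn_pipeline(CATEGORICAL_COLS, NUMERIC_COLS, LABEL_COL)
    model = pipeline.fit(train)
    predictions = model.transform(test)
    evaluator = BinaryClassificationEvaluator(
        labelCol=LABEL_COL,
        rawPredictionCol="rawPrediction",
        metricName="areaUnderROC",
    )
    return model, evaluator.evaluate(predictions)
```

With an active `SparkSession` named `spark`, a call such as `train_and_evaluate(load_data(spark, path))` would return the fitted pipeline model and its area under the ROC curve. Re-running the same evaluator on newly arriving labeled data is one simple way to approach the monitoring stage.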

Data set description

Since we are going to target ...


Learning Apache Spark 2 by Muhammad Asif Abbasi
