Chapter 6. Introducing the ML Package
In the previous chapter, we worked with the MLlib package in Spark, which operates strictly on RDDs. In this chapter, we move to the ML part of Spark, which operates strictly on DataFrames. According to the Spark documentation, the primary machine learning API for Spark is now the DataFrame-based set of models contained in the ML package.
So, let's get to it!
In this chapter, we will reuse a portion of the dataset we worked with in the previous chapter. The data can be downloaded from http://www.tomdrabas.com/data/LearningPySpark/births_transformed.csv.gz.
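Before loading the file into Spark, it can help to confirm that the download is intact and to look at its header. The following is a minimal sketch using only the Python standard library. The helper name `peek_gzipped_csv` and its `n_rows` parameter are hypothetical, introduced here for illustration, and the sketch assumes the file is a comma-delimited, UTF-8 encoded CSV with a header row.

```python
import csv
import gzip
from itertools import islice


def peek_gzipped_csv(path, n_rows=5):
    """Return the header and the first n_rows data rows of a gzipped CSV.

    Hypothetical helper: assumes a comma-delimited, UTF-8 encoded file
    whose first line is a header row.
    """
    # gzip.open in text mode decompresses on the fly, so the archive
    # does not need to be unpacked to disk first.
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)                  # first line: column names
        rows = list(islice(reader, n_rows))    # next n_rows lines: data
    return header, rows
```

Assuming the archive was saved to the working directory under its original name, `peek_gzipped_csv("births_transformed.csv.gz")` returns the column names and the first five rows as lists of strings. Values are not type-converted here; that is left to the DataFrame schema.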
In this chapter, you will learn how to do the following:
- Prepare transformers, estimators, and pipelines
- Predict the chances of infant survival using ...