February 2017
Intermediate to advanced
274 pages
English
In this section, we will use a portion of the dataset from the previous chapter to introduce the main ideas behind PySpark ML.
If you did not download the data while reading the previous chapter, you can get it here: http://www.tomdrabas.com/data/LearningPySpark/births_transformed.csv.gz.
Once again, we will attempt to predict an infant's chances of survival.
First, we load the data with the following code:
import pyspark.sql.types as typ

labels = [
    ('INFANT_ALIVE_AT_REPORT', typ.IntegerType()),
    ('BIRTH_PLACE', typ.StringType()),
    ('MOTHER_AGE_YEARS', typ.IntegerType()),
    ('FATHER_COMBINED_AGE', typ.IntegerType()),
    ('CIG_BEFORE', typ.IntegerType()),
    ...