June 2018
Intermediate to advanced
436 pages
10h 33m
English
In the preceding chapter, we solved the Titanic survival prediction problem using a Spark-based MLP. We also saw that the Spark-based MLP gives the user very little transparency into the layer structure of the network. Moreover, it offers no explicit way to define hyperparameters and similar settings.
Therefore, I took the training dataset and performed some preprocessing and feature engineering on it. I then randomly split the preprocessed dataset into training and test sets (to be precise, 70% for training and 30% for testing). First, we create the Spark session as follows:
SparkSession spark = SparkSession.builder()
        .master("local[*]")
        .config("spark.sql.warehouse.dir", "temp/") // change accordingly
        .appName("TitanicSurvivalPrediction")
        ...
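To make the 70%/30% random split mentioned earlier concrete, here is a minimal plain-Java sketch that does not depend on Spark. The class name `RandomSplitSketch`, the method `randomSplit`, and the seed value are hypothetical names chosen for illustration; they are not taken from the chapter's code. The sketch shuffles a copy of the rows with a fixed seed and cuts the result at the requested fraction.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper (an assumption, not the book's code):
// a seeded random split of an in-memory list into two parts.
class RandomSplitSketch {

    // Shuffles a copy of the rows with a fixed seed, then cuts it at trainFraction.
    // Returns a list of two lists: index 0 is the training part, index 1 the test part.
    static <T> List<List<T>> randomSplit(List<T> rows, double trainFraction, long seed) {
        if (trainFraction < 0.0 || trainFraction > 1.0) {
            throw new IllegalArgumentException("trainFraction must be in [0, 1]");
        }
        List<T> shuffled = new ArrayList<>(rows);          // leave the input untouched
        Collections.shuffle(shuffled, new Random(seed));   // fixed seed => reproducible order
        int cut = (int) Math.round(shuffled.size() * trainFraction);

        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));
        return parts;
    }

    public static void main(String[] args) {
        // Stand-in data: ten integer "rows" instead of preprocessed Titanic records.
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(i);
        }
        List<List<Integer>> parts = randomSplit(rows, 0.7, 12345L);
        System.out.println("train size = " + parts.get(0).size());
        System.out.println("test size  = " + parts.get(1).size());
    }
}
```

In a Spark pipeline, the same idea is usually expressed with `Dataset.randomSplit(new double[]{0.7, 0.3}, seed)`. Note that Spark samples each row independently, so the resulting sizes are only approximately 70% and 30%, whereas the sketch above cuts at an exact index.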