How it works...

This section explains the techniques used in the exploratory data analysis and the insights gained from it.

  1. The date column in the dataframe is really a date-time column whose time component is always 00:00:00. The time carries no information needed for modeling, so it can be dropped from the dataset. PySpark's to_date function handles this conversion. Applying it through the withColumn() function transforms the dataframe, df, so that the date column shows only the date without the timestamp, as seen in the following screenshot:
  2. For analysis purposes, we want to extract the day, month, and year from ...
