Machine Learning with Spark - Second Edition by Nick Pentreath, Manpreet Singh Ghotra, Rajdeep Dua

Transforming timestamps into categorical features

Extract time of day

To illustrate how to derive categorical features from numerical data, we will use the times of the ratings given by users to movies. We first extract the date and time from each rating timestamp and, from that, the hour of the day.

We will need a function that converts the rating timestamp (in seconds) into a datetime representation, from which we can read the hour of the day. Applying this function to every rating results in an RDD of the hour of the day for each rating.

Scala

First, we define a function which extracts currentHour from a date string:

def getCurrentHour(dateStr: String) : Integer = {
  var currentHour = 0
  try {
    val date ...
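Because the listing above is cut off, the following is a minimal, self-contained sketch of the same idea rather than the book's own code. It assumes the timestamp is a Unix epoch value in seconds and that hours are read in UTC (the excerpt does not state a time zone). The names hourOfDay and hourOfDayFromString, and the sample input values, are hypothetical and chosen only for illustration.

```scala
import java.time.{Instant, ZoneOffset}
import scala.util.Try

// Hypothetical helper: hour of the day (0-23) for a Unix timestamp in seconds.
// Assumption: hours are interpreted in UTC; swap in another ZoneId if needed.
def hourOfDay(epochSeconds: Long): Int =
  Instant.ofEpochSecond(epochSeconds).atZone(ZoneOffset.UTC).getHour

// Hypothetical string variant: returns None instead of a default value
// when the input cannot be parsed as a valid timestamp.
def hourOfDayFromString(raw: String): Option[Int] =
  Try(hourOfDay(raw.trim.toLong)).toOption

// Illustrative inputs only: two timestamps in seconds and one malformed value.
val sampleTimestamps = Seq("881250949", "891717742", "not-a-timestamp")
val hours = sampleTimestamps.map(hourOfDayFromString)
```

Returning an Option avoids silently mapping bad input to hour 0, which would be indistinguishable from a genuine midnight rating. With Spark, the same function could be passed to map on an RDD of timestamp strings to obtain the hour of the day for each rating.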
