Supervised learning with Spark - an example

We will demonstrate supervised learning with an example that analyzes air-flight delays. We will use the dataset named On_Time_Performance_2016_1.csv from the United States Department of Transportation website at http://www.transtats.bts.gov/.

Air-flight delay analysis using Spark

We are using flight information for 2016. For each flight, we have the information presented in Table 1 (only a few fields are shown here; as of May 17, 2016, the full dataset has 444,827 rows and 110 columns):

| Data field | Description | Example value |
|---|---|---|
| DayofMonth | Day of month | 2 |
| DayOfWeek | Day of week | 5 |
| TailNum | Tail number for the plane | N505NK |
| FlightNum | Flight number | 48 |
| AirlineID | Airline ID | 19805 |
| OriginAirportID | Origin airport ID | JFK |
| DestAirportID ... | | |
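To make the record layout concrete, here is a minimal sketch in plain Python (not Spark) that reads rows by column name and converts them to typed records. It assumes the CSV header uses exactly the field names listed in Table 1. The two-line `SAMPLE_CSV`, the `Flight` class, and the `parse_flights` helper are hypothetical names introduced only for illustration. The sample row simply reuses the example values from the table.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical sample: the header uses the field names from Table 1 and the
# single data row reuses the example values shown there. The real file has
# many more columns, which a by-name lookup would simply ignore.
SAMPLE_CSV = (
    "DayofMonth,DayOfWeek,TailNum,FlightNum,AirlineID,OriginAirportID\n"
    "2,5,N505NK,48,19805,JFK\n"
)


@dataclass
class Flight:
    day_of_month: int
    day_of_week: int
    tail_num: str
    flight_num: str        # kept as text: it is an identifier, not a quantity
    airline_id: int
    origin_airport_id: str


def parse_flights(text_stream):
    """Yield one Flight per CSV row, looking columns up by header name."""
    for row in csv.DictReader(text_stream):
        yield Flight(
            day_of_month=int(row["DayofMonth"]),
            day_of_week=int(row["DayOfWeek"]),
            tail_num=row["TailNum"],
            flight_num=row["FlightNum"],
            airline_id=int(row["AirlineID"]),
            origin_airport_id=row["OriginAirportID"],
        )


flights = list(parse_flights(io.StringIO(SAMPLE_CSV)))
for flight in flights:
    print(flight)
```

Looking columns up by name rather than by position keeps the sketch independent of where these fields sit among the file's 110 columns.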
