Performing Olympics Athletes analytics using the Spark Shell

Spark provides an interactive Scala-based shell that processes data as soon as we enter commands. In this recipe, we are going to analyze a sample dataset that contains information about athletes who have participated in the Olympics.

Getting ready

To perform this recipe, you need to have Hadoop, Spark, and Scala installed. I am using Scala 2.11.0.

How to do it...

First of all, you need to download data from https://github.com/deshpandetanmay/hadoop-real-world-cookbook/blob/master/data/OlympicAthletes.csv, and store it in HDFS using the following commands:

$ hadoop fs -mkdir /athletes
$ hadoop fs -put OlympicAthletes.csv /athletes ...
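
Once the file is in HDFS, a quick sanity check from the Spark shell can confirm that it is readable. The following is only a minimal sketch, not necessarily the steps the recipe itself goes on to use. It assumes that you have started spark-shell (so the SparkContext `sc` is predefined) and that the default filesystem is HDFS, so the path resolves to the file uploaded above. The variable name `athletes` is purely illustrative.

```scala
// Run inside spark-shell, where the SparkContext `sc` is already defined.
// Assumption: the default filesystem is HDFS, so this path points at the
// file uploaded with the hadoop fs -put command.
val athletes = sc.textFile("/athletes/OlympicAthletes.csv")

// Print the first line to see how the columns are laid out
// (whether it is a header row depends on the file).
println(athletes.first())

// Print the total number of lines in the file.
println(athletes.count())
```

If the count is zero or the call fails with a path error, check the upload with hadoop fs -ls /athletes before going further.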

Hadoop Real-World Solutions Cookbook - Second Edition by Tanmay Deshpande
