Olympics Athletes analytics using the Spark Shell
Spark provides an interactive Scala-based shell that executes commands as you enter them, which makes it convenient for exploring data step by step. In this recipe, we are going to analyze a sample dataset that contains information about athletes who have participated in the Olympics.
To perform this recipe, you should have Hadoop and Spark installed. You also need to install Scala. I am using Scala 2.11.0.
How to do it...
First of all, you need to download data from https://github.com/deshpandetanmay/hadoop-real-world-cookbook/blob/master/data/OlympicAthletes.csv, and store it in HDFS using the following commands:
$ hadoop fs -mkdir /athletes
$ hadoop fs -put OlympicAthletes.csv /athletes
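Before moving on, it can be worth confirming that the file actually landed in HDFS. The following is a minimal sketch rather than part of the original recipe: check_hdfs_file is a hypothetical helper name, and it assumes the hadoop command is on your PATH. It relies on the exit status of hadoop fs -test -e, which is zero when the given path exists.

```shell
# Hypothetical helper (not part of the original recipe): reports whether
# a path exists in HDFS, based on the exit status of `hadoop fs -test -e`.
check_hdfs_file() {
  if hadoop fs -test -e "$1"; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}
```

Calling check_hdfs_file /athletes/OlympicAthletes.csv after the upload should print the "found:" line. Running hadoop fs -ls /athletes is an alternative if you would rather inspect the directory listing directly.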
The following ...