Performing Olympics Athletes analytics using the Spark Shell
Spark provides an interactive Scala-based shell that runs each command as you enter it, which makes it well suited to exploring data step by step. In this recipe, we are going to analyze a sample dataset that contains information about athletes who have participated in the Olympics.
Getting ready
To perform this recipe, you should have Hadoop and Spark installed. You also need to install Scala. I am using Scala 2.11.0.
How to do it...
First of all, you need to download data from https://github.com/deshpandetanmay/hadoop-real-world-cookbook/blob/master/data/OlympicAthletes.csv, and store it in HDFS using the following commands:
$ hadoop fs -mkdir /athletes
$ hadoop fs -put OlympicAthletes.csv /athletes
...
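A note on the download step: the dataset link above is a GitHub page view ("blob") URL, so fetching it directly with a command-line tool would likely save an HTML page rather than the CSV itself. The following is a minimal sketch, not part of the original recipe, that rewrites such a URL into its raw-content form. The helper name to_raw_url is hypothetical, and the download line is left commented out because it assumes curl and network access are available.

```shell
#!/usr/bin/env bash
# Convert a GitHub "blob" page URL into its raw-content equivalent.
# to_raw_url is a hypothetical helper name, not something the recipe defines.
to_raw_url() {
  printf '%s\n' "$1" \
    | sed -e 's#^https://github\.com/#https://raw.githubusercontent.com/#' \
          -e 's#/blob/#/#'
}

BLOB_URL="https://github.com/deshpandetanmay/hadoop-real-world-cookbook/blob/master/data/OlympicAthletes.csv"
RAW_URL="$(to_raw_url "$BLOB_URL")"
echo "$RAW_URL"

# Assumption: curl is installed and the machine has network access.
# curl -fL -o OlympicAthletes.csv "$RAW_URL"
```

Once the file is saved locally as OlympicAthletes.csv, the hadoop fs commands shown above can be run from the same directory.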