Fast Data Processing with Spark 2 - Third Edition by Krishna Sankar

Interactively loading data from S3

Now let's try another exercise with the Spark shell. As part of its EMR Spark support, Amazon provides sample Wikipedia traffic statistics in S3, in a format that Spark can read. To access the data, you first need to set your AWS access credentials as shell parameters. For instructions on signing up for EC2 and setting up the shell parameters, see the Running Spark on EC2 with the scripts section in Chapter 1, Installing Spark and Setting Up Your Cluster. (S3 access requires additional keys such as fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey, or the s3n://user:pw@ syntax.) You can also set the shell parameters AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. We will leave the AWS ...
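The relationship between the two shell parameters and the two s3n keys named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name s3n_credentials_from_env is hypothetical and not part of Spark, and the commented PySpark lines assume a shell where sc already exists and rely on the commonly used (but internal) sc._jsc.hadoopConfiguration() handle.

```python
import os

# Hadoop configuration keys named in the text above.
S3N_ACCESS_KEY = "fs.s3n.awsAccessKeyId"
S3N_SECRET_KEY = "fs.s3n.awsSecretAccessKey"


def s3n_credentials_from_env(env=None):
    """Hypothetical helper: map the AWS_* shell parameters to s3n keys.

    `env` defaults to the process environment; passing a plain dict
    makes the function easy to try without real credentials.
    """
    env = os.environ if env is None else env
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing shell parameters: " + ", ".join(missing))
    return {
        S3N_ACCESS_KEY: env["AWS_ACCESS_KEY_ID"],
        S3N_SECRET_KEY: env["AWS_SECRET_ACCESS_KEY"],
    }


# In a PySpark shell the resulting settings could be applied like this
# (assumption: `sc` is the shell's SparkContext):
#
#   hadoop_conf = sc._jsc.hadoopConfiguration()
#   for key, value in s3n_credentials_from_env().items():
#       hadoop_conf.set(key, value)
```

Reading the credentials from the environment keeps them out of the s3n:// URL and out of the shell history, which is one reason to prefer this route over the s3n://user:pw@ form.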
