Time for action – running UFO analysis on EMR

Let us explore the use of EMR with Hive by doing some UFO analysis on the platform.

  1. Log in to the AWS management console at http://aws.amazon.com/console.
  2. Every Hive job flow on EMR runs from an S3 bucket, so we need to select the bucket we wish to use for this purpose. Select S3 to see the list of buckets associated with your account, and then choose the bucket from which to run the example. In the example below, we select the bucket called garryt1use.
  3. Use the web interface to create three directories called ufodata, ufoout, and ufologs within that bucket. The resulting list of the bucket's contents should look like the following screenshot:
  4. Double-click on the ufodata directory to open it and within ...
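
The directory setup in step 3 can also be scripted instead of clicked through. The following is a minimal sketch, not the book's own method: it assumes the third-party boto3 library is installed and that AWS credentials are already configured. The function names folder_keys and create_folders are hypothetical helpers introduced here for illustration. S3 has no real directories; the console represents a folder as an empty object whose key ends in a slash, and the sketch does the same.

```python
def folder_keys(names):
    """Return the S3 keys that stand in for console-style 'directories'.

    S3 is a flat key store; a folder is an empty object whose key ends in '/'.
    """
    return [name.strip("/") + "/" for name in names]


def create_folders(bucket, names, client=None):
    """Create one empty placeholder object per directory name in the bucket.

    `client` is any object with a boto3-style put_object method. If omitted,
    a real boto3 S3 client is created (assumes boto3 and AWS credentials).
    """
    if client is None:
        import boto3  # third-party; imported lazily so folder_keys works without it
        client = boto3.client("s3")
    keys = folder_keys(names)
    for key in keys:
        # A put with no Body creates a zero-byte object, shown as a folder in the console.
        client.put_object(Bucket=bucket, Key=key)
    return keys
```

Calling create_folders("garryt1use", ["ufodata", "ufoout", "ufologs"]) would create the three placeholders in the example bucket. Substitute your own bucket name, since bucket names are unique across all of S3.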
