Best practices for using Amazon EMR

Amazon has made working with Hadoop a lot easier. You can launch an EMR cluster in minutes for big data processing, machine learning, and real-time stream processing with the Apache Hadoop ecosystem. You can use the Management Console or the command line to start a few nodes or a thousand nodes with ease.

Like EC2, EMR pricing is now changed from pay per hour to pay per second. If you stop your cluster—you stop paying immediately. This results in lower costs and you don’t have to worry about the hourly boundary, anymore.

EMR makes a whole bunch of the latest versions of open source software available to you. There are 19 open source projects, presently. New releases are made every 4 to 6 weeks so latest ...

Get Learning AWS - Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.