July 2017
Beginner to intermediate
378 pages
10h 26m
English
Amazon EMR (Elastic MapReduce) is a fully managed Hadoop framework that can be launched in minutes. It handles node provisioning, cluster setup, configuration, and cluster tuning for you. It runs on EC2 instances and can scale from one node to thousands.
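To make the launch step concrete, here is a minimal sketch in Python of the kind of request you might hand to the `run_job_flow` call in boto3 (the AWS SDK for Python). The helper name `build_cluster_request`, the cluster name, the `m4.large` instance type, the `emr-5.7.0` release label, and the use of the default EMR role names are all assumptions chosen for illustration; adjust them to your own account. The sketch only builds the request and does not contact AWS.

```python
def build_cluster_request(name, core_nodes, release_label="emr-5.7.0"):
    """Return keyword arguments shaped for boto3's EMR run_job_flow call.

    Hypothetical helper: instance type, release label, and role names
    are illustrative assumptions, not values taken from the book.
    """
    if core_nodes < 1:
        raise ValueError("core_nodes must be at least 1")
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Hadoop"}],
        "Instances": {
            "InstanceGroups": [
                # One master node coordinates the cluster.
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": "m4.large", "InstanceCount": 1},
                # Core nodes store data and run tasks.
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m4.large", "InstanceCount": core_nodes},
            ],
            # Keep the cluster running after its steps finish.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default EC2 instance profile
        "ServiceRole": "EMR_DefaultRole",      # assumed default service role
    }


request = build_cluster_request("demo-cluster", core_nodes=2)
# With AWS credentials configured, the request could be submitted as:
#   import boto3
#   boto3.client("emr").run_job_flow(**request)
```

Keeping the request as a plain dictionary means it can be inspected or checked before any billable resources are created.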
You can increase or decrease the number of instances manually, or use auto scaling to adjust it dynamically, even while the cluster is running. The EMR service monitors your cluster; it can retry failed tasks and will automatically replace poorly performing instances.
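A manual resize of a running cluster can be sketched in Python as follows. This is a rough illustration, assuming boto3 and its `modify_instance_groups` call; the helper name `build_resize_request` and the instance group ID `ig-EXAMPLE` are hypothetical placeholders. As written, the code only prepares the request and makes no AWS call.

```python
def build_resize_request(instance_group_id, new_count):
    """Return keyword arguments shaped for boto3's EMR modify_instance_groups call.

    Hypothetical helper for illustration only.
    """
    if new_count < 0:
        raise ValueError("new_count cannot be negative")
    return {
        "InstanceGroups": [
            {
                "InstanceGroupId": instance_group_id,
                "InstanceCount": new_count,  # target size of the group
            }
        ]
    }


# "ig-EXAMPLE" is a placeholder, not a real instance group ID.
resize = build_resize_request("ig-EXAMPLE", new_count=5)
# With AWS credentials configured, the resize could be requested as:
#   import boto3
#   boto3.client("emr").modify_instance_groups(**resize)
```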
Even though the service is managed, you keep complete control over the cluster, including root access, and you can install additional applications. EMR lets you choose from several Hadoop distributions ...