Before we get started with Amazon EMR, it is important to understand some of its key concepts and terminologies, starting out with clusters and nodes:
- Clusters: Clusters are the core functioning component in Amazon EMR. A cluster is a group of EC2 instances that together can be used to process your workloads. Each instance within a cluster is termed as a node and each node has a different role to perform within the cluster.
- Nodes: Amazon EMR distinguishes between clusters instances by providing them with one of these three roles:
- Master node: An instance that is responsible for the overall manageability, working and monitoring of your cluster. The master node takes care of all the data and task distributions ...