Chapter 9. Setting Up Flink for Streaming Applications

Today’s data infrastructures are diverse. Distributed data processing frameworks like Apache Flink need to be set up to interact with several components such as resource managers, filesystems, and services for distributed coordination.

In this chapter, we discuss the different ways to deploy Flink clusters and how to configure them securely and make them highly available. We explain Flink setups for different Hadoop versions and filesystems and discuss the most important configuration parameters of Flink’s master and worker processes. After reading this chapter, you will know how to set up and configure a Flink cluster.

Deployment Modes

Flink can be deployed in different environments, such as a local machine, a bare-metal cluster, a Hadoop YARN cluster, or a Kubernetes cluster. In “Components of a Flink Setup”, we introduced the different components of a Flink setup: the JobManager, TaskManager, ResourceManager, and Dispatcher. In this section, we explain how to configure and start Flink in different environments—including standalone clusters, Docker, Apache Hadoop YARN, and Kubernetes—and how Flink’s components are assembled in each setup.

Standalone Cluster

A standalone Flink cluster consists of at least one master process and at least one TaskManager process that run on one or more machines. All processes run as regular Java JVM processes. Figure 9-1 shows a standalone Flink setup.

Figure 9-1. Starting a standalone Flink ...

Get Stream Processing with Apache Flink now with the O’Reilly learning platform.

O’Reilly members experience live online training, plus books, videos, and digital content from nearly 200 publishers.