Visualizing Spark application execution

In this section, we will present the key details of the Spark web UI, which is indispensable for tuning tasks. There are several approaches to monitoring Spark applications, for example, web UIs, metrics, and external instrumentation. The web UI displays a list of scheduler stages and tasks, a summary of RDD sizes and memory usage, environmental information, and details about the running executors.
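To have something to inspect in the UI, the following sketch starts a local SparkSession and runs a small job so that stages and tasks appear; the application name, the sample range, and the sleep interval are arbitrary choices for illustration.

import org.apache.spark.sql.SparkSession

object UiDemo {
  def main(args: Array[String]): Unit = {
    // Start a local SparkSession; while the application is running,
    // its web UI is served on the driver (port 4040 by default).
    val spark = SparkSession.builder()
      .appName("ui-demo")       // hypothetical application name
      .master("local[*]")
      .getOrCreate()

    // Run a small job so that stages and tasks show up in the UI.
    val evens = spark.range(0, 1000000).filter("id % 2 = 0").count()
    println(s"Even numbers: $evens")

    // Keep the application (and hence the UI) alive for a while.
    Thread.sleep(60000)
    spark.stop()
  }
}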

This interface can be accessed by simply opening http://<driver-node>:4040 in a web browser (http://localhost:4040 when running locally). If additional SparkContexts are running on the same host, they bind to successive ports: 4041, 4042, and so on.
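If the default port is inconvenient, the UI port can be pinned through the spark.ui.port configuration property, and the URL the UI actually bound to can be read back from the SparkContext. The sketch below assumes a local session; the application name and port 4050 are arbitrary examples.

import org.apache.spark.sql.SparkSession

// Pin the web UI to a specific port instead of the default 4040,
// then print the URL that the UI actually bound to.
val spark = SparkSession.builder()
  .appName("ui-port-demo")          // hypothetical application name
  .master("local[*]")
  .config("spark.ui.port", "4050")  // example port; 4050 is arbitrary
  .getOrCreate()

// uiWebUrl is an Option[String]; it is empty when the UI is disabled.
spark.sparkContext.uiWebUrl.foreach(url => println(s"Spark UI at: $url"))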

For more detailed coverage of monitoring and instrumentation ...
