Spark provides various configuration parameters that, when tuned correctly, can significantly improve the overall performance of your Spark Streaming job. Let's look at a few of the features that can help us tune our Spark jobs.
Spark Streaming jobs collect and buffer data at regular intervals (batch intervals). The processing of each batch is divided into various stages of execution, which together form the execution pipeline. The dataset collected in each batch is represented by an RDD, and the execution pipeline is called a Directed Acyclic Graph (DAG).
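As a minimal sketch of how the batch interval shapes this pipeline (the 5-second interval, local master, and socket source here are illustrative assumptions, not values from the text), a streaming job could be set up as follows:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object BatchIntervalExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("BatchIntervalExample")
      .setMaster("local[2]")

    // Data is collected and buffered for 5 seconds at a time;
    // each 5-second batch is represented by one RDD.
    val ssc = new StreamingContext(conf, Seconds(5))

    // Each transformation below adds a stage to the DAG that is
    // executed once per batch.
    val lines = ssc.socketTextStream("localhost", 9999)
    val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Every batch produced by this job triggers one run of the same DAG, so the batch interval directly controls how much data each RDD holds and how often the pipeline executes.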
The dataset involved in each stage of the execution pipeline is stored in data blocks of roughly equal size, which are nothing more than the partitions of the RDD.
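For receiver-based sources, the number of blocks (and hence RDD partitions) per batch is roughly the batch interval divided by spark.streaming.blockInterval (200 ms by default). The sketch below shows two common ways to influence this partitioning; the specific interval, master, and partition count are illustrative assumptions:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object BlockIntervalExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("BlockIntervalExample")
      .setMaster("local[4]")
      // With a 5-second batch and a 250 ms block interval, each batch
      // arrives as roughly 5000 / 250 = 20 blocks, i.e. 20 partitions.
      .set("spark.streaming.blockInterval", "250ms")

    val ssc = new StreamingContext(conf, Seconds(5))
    val lines = ssc.socketTextStream("localhost", 9999)

    // Alternatively, redistribute the data explicitly when the
    // block-derived partition count does not match the available cores.
    val repartitioned = lines.repartition(20)
    repartitioned.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Because each partition is processed by one task, keeping the partition count in line with the total number of executor cores helps ensure that no cores sit idle during a batch.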
Lastly, for ...