Performance tuning and best practices

In this section, we discuss strategies for optimizing the performance of our Spark jobs, along with a few best practices for Spark and Spark SQL.

Performance tuning is a subjective and open-ended topic. The very first step is to answer the question, "Do we really need to tune the performance of our jobs?" Before we answer this question, we need to consider the following aspects:

  • Are our jobs meeting SLAs specified by the business?

    If yes, then there is no need for performance tuning.

  • What do we want to achieve and is it realistic?

    For example, expecting all Spark jobs (irrespective of data size or the computations performed) to complete in milliseconds is unrealistic. ...
