Performance tuning and best practices

In this section, we will discuss various strategies for optimizing the performance of our Spark jobs. We will also discuss a few best practices with respect to Spark and Spark SQL.

Performance tuning is a subjective and open-ended topic. The first step in performance tuning is to answer the question, "Do we really need to performance tune our jobs?" Before we answer this question, we need to consider the following aspects:

  • Are our jobs meeting SLAs specified by the business?

    If yes, then there is no need for performance tuning.

  • What do we want to achieve, and is it realistic?

    For example, expecting all Spark jobs (irrespective of data size or computations performed) to be completed in milliseconds is unrealistic. ...
