Sizing up your executors

When you deploy Spark, executors run on the nodes of the cluster. Put simply, executors are the processes where you:

  • Run your computations
  • Store your data

Each application has its own executor processes, and they stay up and running for the lifetime of the application. Executors are therefore central to application performance, and the three key settings when deploying a Spark application are:

  • --num-executors: How many executors do you need?
  • --executor-cores: How many CPU cores do you want to allocate to each executor?
  • --executor-memory: How much memory do you want to assign to each executor process?
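These flags are passed to spark-submit. As an illustration, here is a minimal sketch of a submission; the class name, JAR, and resource values are hypothetical placeholders, not tuning recommendations:

```bash
# Illustrative spark-submit invocation; the class name, JAR, and
# resource values are hypothetical placeholders, not recommendations.
spark-submit \
  --class com.example.MyApp \
  --master yarn \
  --num-executors 4 \
  --executor-cores 4 \
  --executor-memory 8g \
  my-app.jar
```

With these values the application would request 4 × 4 = 16 cores and 4 × 8 GB = 32 GB of executor heap in total, plus per-executor memory overhead.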

So how do you allocate physical resources to Spark? While this may generally depend on the nature ...
