Garbage collection tuning
Java Virtual Machine (JVM) garbage collection (GC) is rarely a major problem in Java or Scala programs that read an RDD once, sequentially or randomly, and then execute numerous operations on it. It can become problematic and complex, however, when your program stores a large number of data objects for its RDDs. To make space for newer objects, the JVM must first identify the old objects that are obsolete or no longer used and then remove them from memory. This is a costly operation in terms of processing time and storage. Note that the cost of GC is proportional to the number of Java objects held in main memory. Therefore, we strongly suggest ...
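The point that GC cost grows with the number of Java objects can be illustrated with a small, Spark-independent sketch. It stores the same integers twice: once as a list of boxed `Integer` objects (roughly one heap object per element) and once as a single primitive `int[]` array (one heap object in total). It then reads the JVM's collection counters through the standard `java.lang.management` API. The class name `GcObjectCountSketch` and its methods are hypothetical, chosen only for this illustration, and the counts it prints will vary with the JVM, heap size, and collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Spark or of the book's code.
public class GcObjectCountSketch {

    // Sums 0..n-1 via a boxed list: about one Integer object per element,
    // so the collector has many objects to trace.
    static long sumBoxed(int n) {
        List<Integer> values = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            values.add(i); // autoboxing allocates Integer objects outside the small-value cache
        }
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Sums 0..n-1 via a primitive array: a single heap object,
    // regardless of how many elements it holds.
    static long sumPrimitive(int n) {
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            values[i] = i;
        }
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Total number of collections reported so far by all collectors.
    // getCollectionCount() returns -1 when the value is undefined, so skip those.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = bean.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 5_000_000;

        long before = totalGcCount();
        sumBoxed(n);
        long afterBoxed = totalGcCount();
        sumPrimitive(n);
        long afterPrimitive = totalGcCount();

        System.out.println("Collections during boxed run:     " + (afterBoxed - before));
        System.out.println("Collections during primitive run: " + (afterPrimitive - afterBoxed));
    }
}
```

On a typical JVM with a modest heap, the boxed run tends to trigger more collections than the primitive run, but this is an informal observation under the stated assumptions and not a benchmark.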