July 2017
Intermediate to advanced
796 pages
18h 55m
English
Beyond the other memory-tuning techniques discussed already, when your objects are still too large to fit efficiently in main memory or on disk, a simpler and more effective way to reduce memory usage is to store them in serialized form.
If you specify the MEMORY_ONLY_SER storage level, Spark stores each RDD partition as one large byte array. The only downside of this approach is that it can slow down data access. This is expected, and in fairness there is no way to avoid it, since each ...
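The trade-off can be illustrated outside Spark with a small toy model. This is not Spark code. It assumes Python's `pickle` as a stand-in for Spark's serializer, and the `SerializedPartition` class and `deep_size` helper are hypothetical names used only for illustration. The partition is kept as a single bytes object, which is typically much smaller than the equivalent live objects, but every read must first deserialize the whole blob.

```python
import pickle
import sys


class SerializedPartition:
    """Hypothetical toy model of a partition cached as one byte array.

    Loosely mimics the idea behind MEMORY_ONLY_SER: the records are
    serialized together into a single bytes object instead of being
    kept as individual live objects. pickle stands in for Spark's serializer.
    """

    def __init__(self, records):
        # Serialize the whole partition into one large byte array.
        self.blob = pickle.dumps(list(records), protocol=pickle.HIGHEST_PROTOCOL)

    def nbytes(self):
        # Size of the serialized partition in bytes.
        return len(self.blob)

    def records(self):
        # Every access pays the cost of deserializing the whole partition.
        return pickle.loads(self.blob)


def deep_size(records):
    """Approximate footprint of a list of (int, str) tuples kept as live objects."""
    total = sys.getsizeof(records)
    for record in records:
        total += sys.getsizeof(record)
        total += sum(sys.getsizeof(field) for field in record)
    return total


partition = [(i, f"user-{i}") for i in range(10_000)]
cached = SerializedPartition(partition)

print("live objects (approx):", deep_size(partition), "bytes")
print("serialized:", cached.nbytes(), "bytes")

# Reading any single record requires deserializing the partition first.
assert cached.records()[42] == (42, "user-42")
```

In Spark itself (Scala), the corresponding request is `rdd.persist(StorageLevel.MEMORY_ONLY_SER)`, with `org.apache.spark.storage.StorageLevel` imported; the toy model above only mirrors the memory-versus-access-time trade-off, not Spark's actual storage layout.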