July 2017
Intermediate to advanced
796 pages
18h 55m
English
Data locality describes how close the data is to the code that processes it. It can have a nontrivial impact on the performance of a Spark job, whether the job runs locally or in cluster mode. If the data and the code that operates on it sit together, computation tends to be much faster. If they are apart, one has to move to the other, and shipping serialized code from the driver to an executor is usually much faster than moving the data, because the code is much smaller than the data.
Spark distinguishes several levels of locality, based on the current location of the data to be processed. In order from closest to farthest, they are:
| Data Locality | Meaning | Special Notes |
| PROCESS_LOCAL | Data and code are in the ... |
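
As a rough illustration of how these levels affect scheduling, the sketch below sets Spark's locality wait properties, which control how long the scheduler waits for a slot at a closer level before falling back to a farther one. The property names are standard Spark configuration keys and `3s` is the documented default for `spark.locality.wait`; the application name and the choice to set every key explicitly are assumptions made for illustration, not something the text above prescribes.

```scala
import org.apache.spark.SparkConf

// Sketch only: the app name is hypothetical and the wait values simply
// restate the default; tune them for your own workload.
val conf = new SparkConf()
  .setAppName("LocalityTuningSketch")
  // Base wait before the scheduler gives up on a closer locality level.
  .set("spark.locality.wait", "3s")
  // Optional per-level overrides; each falls back to spark.locality.wait
  // when it is not set.
  .set("spark.locality.wait.process", "3s")
  .set("spark.locality.wait.node", "3s")
  .set("spark.locality.wait.rack", "3s")
```

Raising these values can help when tasks are long and remote reads are costly; lowering them can help when tasks are short and waiting for a local slot costs more than the data transfer.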