
9.4. AN ALTERNATIVE: “SNOWDOOP”
So again, we are using chunked files as in Hadoop, but writing ordinary R
code, e.g. tapply() and Reduce(). Most important, the data at each
worker persist in that worker’s memory across iterations. In Hadoop, the
data would be reread from disk at each iteration, and in Spark we’d need to
request caching, but here persistence comes for free, with no special
effort needed.
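
To make the persistence point concrete, here is a minimal sketch using R's parallel package. The two-worker cluster, the chunk file names x.1 and x.2, the toy data, and the helper names readchunk() and groupsums() are all assumptions made up for this illustration, not code from the text. Each worker reads its chunk once, and every later iteration works on the copy already sitting in that worker's memory.

```r
library(parallel)

# Hypothetical setup: two workers and two chunk files, x.1 and x.2,
# written to a temporary directory (all names are illustrative only).
cls <- makeCluster(2)
dir <- tempdir()
d <- data.frame(g = rep(c("a", "b"), 10), v = 1:20)
write.table(d[1:10, ], file.path(dir, "x.1"), row.names = FALSE)
write.table(d[11:20, ], file.path(dir, "x.2"), row.names = FALSE)

# Each worker reads its own chunk ONCE and stores it in its global
# environment, where it stays for the life of the worker process.
readchunk <- function(i, dir) {
  f <- file.path(dir, paste0("x.", i))
  assign("chunk", read.table(f, header = TRUE), envir = .GlobalEnv)
  invisible(NULL)
}
invisible(clusterApply(cls, 1:2, readchunk, dir = dir))

# Ordinary R code run at each worker: weighted group sums of the
# chunk that is already in that worker's memory.
groupsums <- function(w) {
  ch <- get("chunk", envir = .GlobalEnv)
  tapply(w * ch$v, ch$g, sum)
}

# Several iterations; no chunk is reread from disk at any of them.
for (w in 1:3) {
  parts <- clusterCall(cls, groupsums, w)  # one tapply() result per worker
  total <- Reduce(`+`, parts)              # combine at the manager
}

stopCluster(cls)
```

Note that the only disk access is in readchunk(); the loop merely sends the small value w to the workers and receives the small tapply() results back. The Reduce() step here relies on every chunk containing the same groups, which holds for this toy data but would need checking in general.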