Summary
We began this chapter by explaining why large datasets sometimes present a problem for unoptimized R code: for example, R does not automatically parallelize computations, and it has no native support for out-of-memory data. For the rest of the chapter, we discussed specific strategies for optimizing R code to handle large data.
First, you learned of the dangers of optimizing code too early. Next, we saw (much to the relief of slackers everywhere) that taking the lazy way out, and buying or renting a more powerful machine, is often the more cost-effective solution.
After that, we saw that a little knowledge about the dynamics of memory allocation and vectorization in R can often go a long way toward performance gains.
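As a minimal sketch of those two ideas (not taken from the chapter's own examples), compare growing a vector inside a loop, pre-allocating it up front, and replacing the loop with a single vectorized expression. The function names here are illustrative:

```r
n <- 1000

# Slow: growing a vector with c() forces R to reallocate
# and copy the whole vector on every iteration
grow <- function(n) {
  out <- numeric(0)
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}

# Better: pre-allocate the full length once, then fill in place
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

# Best: a single vectorized expression, no explicit R-level loop
vectorized <- function(n) seq_len(n)^2

# All three produce identical results; only their speed differs
identical(grow(n), vectorized(n))
identical(prealloc(n), vectorized(n))
```

Timing these (for example with `system.time()` or the `microbenchmark` package) on a larger `n` shows the growing version scaling much worse than the other two, since its repeated copying makes the loop effectively quadratic.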
The next two sections focused less ...