Now that we understand the Spark architecture, let's prepare for writing scalable analytics by introducing some of the challenges, or gotchas, that you might face if you're not careful. Without knowing about these up front, you could lose time working them out on your own!

Algorithmic complexity

Beyond the obvious effect of the size of your data, the performance of an analytic depends heavily on the nature of the problem you're trying to solve. Even some seemingly simple problems, such as a depth-first search of a graph, do not have well-defined algorithms that perform efficiently in distributed environments. This being the case, great care should be taken when designing analytics to ensure that they ...
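
To make the depth-first search example concrete, here is a minimal single-machine sketch in plain Python (not Spark code, and not taken from the book). The function names `dfs_order` and `bfs_levels` and the toy `GRAPH` are hypothetical, chosen only for illustration. It contrasts a depth-first traversal, where each step depends on the state left by the previous one, with a level-by-level breadth-first traversal, whose frontier expansions are largely independent and therefore tend to map more naturally onto distributed processing.

```python
def dfs_order(graph, start):
    """Depth-first traversal using an explicit stack.

    The next node to visit depends on the stack and the 'seen' set
    left behind by every earlier step, so the work is inherently serial.
    """
    visited = []
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        visited.append(node)
        # Push neighbours in reverse so the first-listed one is explored first.
        for nxt in reversed(graph.get(node, [])):
            if nxt not in seen:
                stack.append(nxt)
    return visited


def bfs_levels(graph, start):
    """Breadth-first traversal grouped into levels (frontiers).

    Within one level, each node's neighbours can be looked up independently;
    in a distributed setting those lookups could run in parallel, with
    duplicates removed before the next level starts.
    """
    seen = {start}
    frontier = [start]
    levels = []
    while frontier:
        levels.append(frontier)
        next_frontier = []
        for node in frontier:
            for nxt in graph.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    next_frontier.append(nxt)
        frontier = next_frontier
    return levels


# Hypothetical toy graph as an adjacency list.
GRAPH = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

print(dfs_order(GRAPH, "a"))   # ['a', 'b', 'd', 'c']
print(bfs_levels(GRAPH, "a"))  # [['a'], ['b', 'c'], ['d']]
```

The two functions visit the same nodes, but only the level-based version exposes batches of independent work. This is one reason an analytic may need to be reformulated, rather than simply ported, before it can scale.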
