Spark in cluster mode

So far in this chapter we have focused on running Spark in local mode. As we mentioned, horizontal scaling is what makes Spark so appealing and powerful. You don't need hardware-software integration gurus to run an Apache Spark cluster, and you don't need to halt the organization's entire production to scale out and add more machines to your cluster.

The good news is that the same scripts you build on your laptop against samples of a few kilobytes can run on business clusters that handle terabytes of information. There is no need to change the code and no need to invoke another API. All you have to do is test repeatedly to make sure your program runs correctly, and then deploy it to the cluster.
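To make this concrete, here is a minimal sketch of that idea (the class name, input path, and word-count logic are illustrative assumptions, not code from this book): the application never hardcodes a master URL, so the exact same jar runs in local mode and on a cluster.

import org.apache.spark.sql.SparkSession

// Hypothetical word-count job: the master URL is intentionally left out of
// the code, so it is supplied at submit time and the jar never changes.
object WordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("WordCount")
      .getOrCreate()
    import spark.implicits._

    val counts = spark.read.textFile(args(0))   // input path passed as an argument
      .flatMap(_.split("\\s+"))
      .groupByKey(identity)
      .count()

    counts.show(20)
    spark.stop()
  }
}

The only thing that changes between your laptop and the cluster is the --master argument you pass to spark-submit: local[*] while testing on samples, and the cluster's master URL (for example spark://host:7077 for a standalone deployment, or yarn) when you move to production.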

In this section, we'll describe ...
