Apache Spark 2.x Machine Learning Cookbook by Shuen Mei, Broderick Hall, Meenakshi Rajendran, Siamak Amirghodsi

Introduction

Spark Streaming is an evolving effort to unify and structure the APIs so that batch and stream processing can be handled in the same way. Spark Streaming has been available since Spark 1.3 with the Discretized Stream (DStream) abstraction. The new direction is to abstract the underlying framework behind an unbounded table model, in which users query the table using SQL or functional programming and write the results to an output table in one of several modes (complete, delta, or append). The Spark SQL Catalyst optimizer and Tungsten (the off-heap memory manager) are now an intrinsic part of Spark streaming, which leads to much more efficient execution.
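
To make the unbounded table model concrete, here is a minimal sketch in plain Python. It is a toy illustration and not the Spark API: the class `UnboundedTable`, the method `process_batch`, and the sample words are all hypothetical names chosen for this example. Each micro-batch is appended to an ever-growing input table, a running word count plays the role of the result table, and the `mode` argument decides how much of that result is written out. The mode the text calls delta appears in the released Structured Streaming API under the name `update`. Append mode is left out of the sketch because it suits queries whose result rows never change once written, which is not the case for a running count.

```python
from collections import Counter


class UnboundedTable:
    """Toy model of a stream treated as an ever-growing table (not the Spark API)."""

    def __init__(self):
        self.rows = []           # input table: every row received so far
        self.counts = Counter()  # result table: running count per word

    def process_batch(self, new_rows, mode="complete"):
        """Append a micro-batch to the input table and return the output for `mode`."""
        self.rows.extend(new_rows)
        changed = Counter(new_rows)
        self.counts.update(changed)  # incrementally refresh the result table

        if mode == "complete":
            # Emit the entire result table after every batch.
            return dict(self.counts)
        if mode == "delta":
            # Emit only the result rows whose value changed in this batch.
            return {word: self.counts[word] for word in changed}
        raise ValueError(f"unsupported mode in this sketch: {mode}")


table = UnboundedTable()
print(table.process_batch(["spark", "stream"], mode="complete"))
print(table.process_batch(["spark"], mode="delta"))
```

The first call prints the whole result table, while the second prints only the row for `spark`, since that is the only count the second batch changed. In both cases the query logic stays the same. Only the amount of output written per batch differs, which is the separation the unbounded table model is meant to provide.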

In this chapter, we not only cover the streaming facilities available in Spark's ...
