Butch Quinto, Next-Generation Big Data, https://doi.org/10.1007/978-1-4842-3147-0_6

6. High Performance Data Processing with Spark and Kudu

Butch Quinto¹

(1) Plumpton, Victoria, Australia

Kudu is just a storage engine, so you need a way to get data into and out of it. As Cloudera’s default big data processing framework, Spark is the ideal data processing and ingestion tool for Kudu. Not only does Spark provide excellent scalability and performance, but Spark SQL and the DataFrame API also make it easy to interact with Kudu.

If you come from a data warehousing background or are familiar with a relational database such as Oracle or SQL Server, you can consider Spark a more powerful and versatile equivalent to procedural extensions to ...
