Getting Started with Kudu

by Jean-Marc Spaggiari, Mladen Kovacevic, Brock Noland, Ryan Bosshart
July 2018
Content level: Beginner to intermediate
156 pages
4h 2m
English
O'Reilly Media, Inc.
Chapter 5. Common Developer Tasks for Kudu

At its core, Apache Kudu is a resilient, distributed, fault-tolerant storage engine for structured data. Its APIs are meant to be simple to understand, so that moving data into Kudu and getting it back out is easy and efficient.

As a developer, you have several choices for how to interact with the data you store in Kudu. Client-side APIs are provided for the following programming languages:

  • C++

  • Java

  • Python

Compute frameworks such as MapReduce and Spark can also interact with Kudu. MapReduce, through the Java client, has a native Kudu input format, whereas Spark’s API provides a specialized Kudu Context together with deep integration with Spark SQL.

Providing SQL access to Kudu is a natural fit given that Kudu stores data in a structured, strongly typed fashion. Thus, as of today, you can use not only Spark SQL but also Apache Impala to access and manipulate your data. Impala is an open source, native analytic database for Hadoop and is shipped by multiple Hadoop distributions. It, too, provides a clean abstraction of tables that can exist in Kudu, the Hadoop Distributed File System (HDFS), HBase, or cloud-based object stores like Amazon Simple Storage Service (Amazon S3).
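To make the "structured, strongly typed" point concrete, here is a minimal sketch of what a Kudu-backed table definition looks like when it is expressed as Impala SQL. The helper `kudu_create_table_ddl`, the table name `sensor_readings`, and its columns are hypothetical and chosen only for illustration. The generated statement uses Impala's `PRIMARY KEY`, `PARTITION BY HASH ... PARTITIONS n`, and `STORED AS KUDU` clauses. The sketch only builds the statement text; it does not connect to a cluster, and a real schema would likely need more careful partitioning choices.

```python
from typing import Dict, List


def kudu_create_table_ddl(
    table: str,
    columns: Dict[str, str],
    primary_key: List[str],
    hash_partitions: int = 4,
) -> str:
    """Build an Impala CREATE TABLE statement for a Kudu-backed table.

    Hypothetical helper for illustration: it only assembles SQL text and
    performs a few basic sanity checks on its arguments.
    """
    if not primary_key:
        raise ValueError("a Kudu table needs at least one primary key column")
    missing = [c for c in primary_key if c not in columns]
    if missing:
        raise ValueError(f"primary key columns not defined: {missing}")
    if hash_partitions < 2:
        # Assumption: hash partitioning is only meaningful with 2+ buckets.
        raise ValueError("hash partitioning needs at least 2 partitions")

    # Emit the primary key columns first, in key order, then the rest.
    ordered = primary_key + [c for c in columns if c not in primary_key]
    column_defs = ",\n".join(f"  {name} {columns[name]}" for name in ordered)
    key_list = ", ".join(primary_key)
    return (
        f"CREATE TABLE {table} (\n"
        f"{column_defs},\n"
        f"  PRIMARY KEY ({key_list})\n"
        f")\n"
        f"PARTITION BY HASH ({key_list}) PARTITIONS {hash_partitions}\n"
        f"STORED AS KUDU"
    )


if __name__ == "__main__":
    # Hypothetical table: every column carries an explicit type.
    print(
        kudu_create_table_ddl(
            "sensor_readings",
            {"sensor_id": "BIGINT", "ts": "BIGINT", "reading": "DOUBLE"},
            primary_key=["sensor_id", "ts"],
        )
    )
```

The resulting text could be submitted through any Impala client, such as the shell or a JDBC/ODBC connection. Once the table exists, the same data is reachable through SQL and through the client-side APIs listed earlier.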

In this chapter, we dive into the various client-side APIs, including Spark, and then round out the chapter by discussing how Impala’s integration with Kudu can serve many types of use cases. All ...

ISBN: 9781491980248