Getting Started with Kudu

by Jean-Marc Spaggiari, Mladen Kovacevic, Brock Noland, Ryan Bosshart
July 2018
Content level: Beginner to intermediate
156 pages
4h 2m
English
O'Reilly Media, Inc.
Content preview from Getting Started with Kudu

Chapter 2. About Kudu

Apache Kudu is often summarized in a single phrase: a Hadoop storage layer to enable fast analytics on fast data. Although the statement is short and simple to understand, achieving those goals has, until now, not been easy.

Today’s big data technology already makes fast analytics achievable. Storing data in highly efficient columnar formats such as Parquet and ORC allows compute engines to read data sequentially across the distributed filesystem, HDFS, at a very high rate. Analytical queries typically perform large aggregations over a subset of columns. In effect, such a query projects the columns it requests and then performs a mathematical operation on a large number of values in those columns. Columnar storage formats suit this workload for two reasons: a) projecting a column limits I/O to the pages on disk that contain data for that column, which is possible because the format is already split by column, and b) numbers in particular can use various encoding and packing mechanisms to fit the values of many rows onto a single page on disk. As a result, I/O can be extremely efficient, and compute operations on the values in a column can be performed quickly. In short, HDFS coupled with columnar file formats yields highly performant I/O and compute, so analytics queries are processed quickly.
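
The two properties described above, column projection and compact encoding of repeated values, can be illustrated with a small sketch. This is a toy in-memory model only, not how Parquet, ORC, or Kudu lay out data on disk; the sample rows, the column names, and the helper functions are all hypothetical, and run-length encoding stands in here as one simple example of the "encoding and packing mechanisms" mentioned.

```python
from array import array

# Hypothetical sample data: (id, name, age) for four rows.
rows = [
    (1, "alice", 30),
    (2, "bob", 30),
    (3, "carol", 30),
    (4, "dave", 41),
]

# Column-oriented layout: all values of one column are stored together.
columns = {
    "id": array("q", [r[0] for r in rows]),
    "name": [r[1] for r in rows],
    "age": array("q", [r[2] for r in rows]),
}


def sum_row_layout(rows, col_index):
    """Aggregate one column in a row layout: every full row is visited."""
    return sum(row[col_index] for row in rows)


def sum_column_layout(columns, name):
    """Aggregate one column in a column layout: only that column is read."""
    return sum(columns[name])


def run_length_encode(values):
    """Collapse runs of equal adjacent values into [value, count] pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return encoded


total_rows = sum_row_layout(rows, 2)
total_cols = sum_column_layout(columns, "age")
encoded_age = run_length_encode(columns["age"])

print(total_rows, total_cols)  # both layouts give the same answer
print(encoded_age)             # repeated ages collapse into short runs
```

Both functions return the same sum, but the column version never touches the `id` or `name` values, which is the I/O saving that projection provides. The encoded `age` column shows why a column of similar numbers can occupy far less space than the same values scattered across rows.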

On the other hand, we also ...


Publisher Resources

ISBN: 9781491980248