Chapter 6. Table and Schema Design

In this chapter, we cover schema design in Kudu, explaining the basic concepts and primitives you need to make your project successful. An ideal schema spreads read and write operations evenly across the cluster and minimizes the amount of data processed during query evaluation. We believe that understanding the basics described in this chapter will bring you closer to building such a schema.

The Kudu project itself has fantastic schema design documentation. Although there is some overlap with it, we focus here on topics of particular importance and provide additional background.

In any data storage system, schema design is extremely important and the cause of many headaches and showstoppers. Poor schema design in relational databases can cause issues ranging from intensive resource consumption to data corruption. HBase and Cassandra require extensive knowledge of how the data will be accessed before a schema is designed. A deficiency here is the most common cause of project blockers: slow query performance, almost always due to intensive resource consumption. In Kudu, schema design is just as important, but Kudu provides some features that these other systems lack, which makes a larger range of use cases possible.

Schema Design Basics

This section provides basics of Kudu schema design for readers who have not read the official schema design ...
