Chapter 1. Why Kudu?
Why Does Kudu Matter?
As big data platforms continue to innovate and evolve, whether on-premises or in the cloud, it’s no surprise that many are feeling some fatigue at the pace of new open source big data project releases. After working with Kudu for the past year with large companies and real-world use cases, we’re more convinced than ever that Kudu matters and that it’s very much worthwhile to add yet another project to the open source big data world.
Our reasoning boils down to three essential points:
- Big data is still too difficult: as the audience and appetite for data grow, Hadoop and other big data platforms remain hard to use, and much of this complexity is driven by limitations in storage. At our office, long-winded architecture discussions are now being cut short with the common refrain, “Just use Kudu and be done with it.”
- New use cases need Kudu: the use cases Hadoop is being called upon to serve are changing, with an increasing focus on machine-generated data and real-time analytics. To demonstrate the complexity this creates, we walk through some architectures for real-time analytics using existing big data storage technologies and discuss how Kudu simplifies these architectures.
- The hardware landscape is changing: many of the fundamental hardware assumptions upon which Hadoop was built are shifting. There are fresh opportunities to create a storage manager with improved performance and workload flexibility.
In this chapter we discuss ...