Kudu is just a storage engine; you need a separate tool to get data into it and out of it. As Cloudera’s default big data processing framework, Spark is the ideal data processing and ingestion tool for Kudu. Not only does Spark provide excellent scalability and performance, but Spark SQL and the DataFrame API also make it easy to interact with Kudu.
If you are coming from a data warehousing background, or if you are familiar with a relational database such as Oracle or SQL Server, you can consider Spark a more powerful and versatile equivalent to procedural extensions to ...