13

Playing with the TPC-DS Dataset

In this chapter, we will get acquainted with the TPC-DS dataset. Lakehouse platforms, including Databricks, use TPC-DS benchmarks to demonstrate their capabilities, so it is worth understanding what the benchmark involves. We will learn about the TPC-DS dataset, the TPC-DS benchmark, and how to use the dataset to validate some of the concepts covered in the previous chapters.

This chapter is intended for advanced users who want to build a larger dataset for testing Databricks SQL features. If you already have access to such a dataset, or you do not need to test with bigger datasets, you can skip this chapter.

In this chapter, we will cover the following topics:

  • Understanding the TPC-DS ...

