10 Performance considerations for large datasets

This chapter covers

  • Preparing large volumes of data to be imported into DuckDB
  • Querying metadata and running exploratory data analysis (EDA) queries on large datasets
  • Exporting full databases concurrently to Parquet
  • Using aggregations on multiple columns to speed up statistical analysis
  • Using EXPLAIN and EXPLAIN ANALYZE to understand query plans (see the short sketch after this list)
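
As a quick, hedged preview of two of these topics, the following DuckDB SQL sketch shows what exporting a database to Parquet and inspecting a query plan look like. The table name measurements, the filter, and the export directory are hypothetical stand-ins for illustration, not names from this book's datasets.

    -- Write every table in the current database out as Parquet files;
    -- per the topic list above, DuckDB performs this export concurrently.
    EXPORT DATABASE 'export_dir' (FORMAT PARQUET);

    -- EXPLAIN prints the query plan without executing the query;
    -- EXPLAIN ANALYZE executes it and reports per-operator timings.
    EXPLAIN ANALYZE
    SELECT count(*)
    FROM measurements    -- hypothetical table
    WHERE value > 100;   -- hypothetical filter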

So far in this book, we’ve used DuckDB with a variety of datasets, but most of them have been small or medium in size. That isn’t unusual: such datasets are representative of many we come across in our daily work.

However, huge datasets do exist, and we wouldn’t want you to think that you need to reach for another data processing engine to handle them. This chapter looks at how to keep DuckDB performing well when working with data at that scale.
