10 Performance considerations for large datasets

This chapter covers

  • Preparing large volumes of data to be imported into DuckDB
  • Querying metadata and running exploratory data analysis (EDA) queries on large datasets
  • Exporting full databases concurrently to Parquet
  • Using aggregations on multiple columns to speed up statistical analysis
  • Using EXPLAIN and EXPLAIN ANALYZE to understand query plans (see the short sketch after this list)
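
As a quick, hedged preview of two of these topics, the following DuckDB SQL sketch shows what exporting a database to Parquet and inspecting a query plan look like. The table name measurements, the filter, and the export directory are hypothetical stand-ins for illustration, not names from this book's datasets.

    -- Write every table in the current database out as Parquet files;
    -- per the topic list above, DuckDB performs this export concurrently.
    EXPORT DATABASE 'export_dir' (FORMAT PARQUET);

    -- EXPLAIN prints the query plan without executing the query;
    -- EXPLAIN ANALYZE executes it and reports per-operator timings.
    EXPLAIN ANALYZE
    SELECT count(*)
    FROM measurements    -- hypothetical table
    WHERE value > 100;   -- hypothetical filter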

So far in this book, we’ve used DuckDB with a variety of datasets, but most of them have been small or medium in size. That isn’t unusual: such datasets are representative of many we come across in our daily work.

However, huge datasets do exist, and we wouldn’t want you to think that you need to reach for another data processing engine to handle them. This chapter looks at how to keep DuckDB performing well when working with data at that scale.
