10 Performance considerations for large datasets
This chapter covers
- Preparing large volumes of data to be imported into DuckDB
- Querying metadata and running exploratory data analysis (EDA) queries on large datasets
- Exporting full databases concurrently to Parquet
- Using aggregations on multiple columns to speed up statistical analysis
- Using EXPLAIN and EXPLAIN ANALYZE to understand query plans (sketched briefly after this list)
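As a small taste of the last item, here is a minimal sketch of both statements. The measurements table, its columns, and the query are ours, invented purely so the examples run as-is; the chapter itself covers how to read the resulting plans in detail.

```sql
-- Hypothetical table, created only so the examples below run as-is.
CREATE TABLE measurements (station VARCHAR, reading DOUBLE);

-- EXPLAIN prints the query plan without executing the query.
EXPLAIN
SELECT station, avg(reading)
FROM measurements
GROUP BY station;

-- EXPLAIN ANALYZE executes the query and annotates each operator
-- in the plan with actual timings and row counts.
EXPLAIN ANALYZE
SELECT station, avg(reading)
FROM measurements
GROUP BY station;
```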
So far in this book, we've seen how to use DuckDB with a variety of datasets, but most of them have been small or medium in size. This isn't unusual, as datasets of that size are representative of much of the data we come across in our daily work.
However, huge datasets do exist, and we wouldn't want you to think that you need to use another data processing engine to handle them.