8 Building data pipelines with DuckDB

This chapter covers

  • The meaning and relevance of data pipelines
  • The roles DuckDB can play as part of a pipeline
  • How DuckDB integrates with tools such as the Python-based data load tool (dlt) for ingestion and the data build tool (dbt) from dbt Labs for transformation
  • Orchestrating pipelines with Dagster

Having explored DuckDB’s seamless integration with prominent data processing languages such as Python, and with libraries such as pandas, Apache Arrow, and Polars, in chapter 6, we know that DuckDB and its ecosystem can handle many of the tasks that make up a data pipeline. The combination of a powerful SQL engine, well-integrated tooling, and the potential of a cloud ...
