6 Integrating with the Python ecosystem

This chapter covers

  • The differences between DuckDB’s implementation of Python DB-API 2.0 and the DuckDB relational API
  • Ingesting data from pandas DataFrames, Apache Arrow tables, and more via the Python API
  • Querying pandas DataFrames with DuckDB methods
  • Exporting data to various DataFrame formats and Apache Arrow tables
  • Using DuckDB’s relational API to compose queries

Up until now, we’ve used the DuckDB CLI to manage and execute all of our queries. This tool works well for on-the-spot analysis and for CLI-based pipelines. Many data workflows, however, rely heavily on Python and its ecosystem; pandas DataFrames, for example, are too widely used to ignore. In this chapter, we will learn ...
