7 High-performance pandas and Apache Arrow

This chapter covers

  • Optimizing memory usage when creating pandas data frames
  • Decreasing computational cost of pandas operations
  • Using Cython, NumExpr, and NumPy to accelerate pandas operations
  • Optimizing pandas with Apache Arrow

In the Python world, data analytics is almost synonymous with using pandas. pandas is a data frame library, that is, a library for processing tabular data, and it is the de facto standard in Python for working with in-memory tabular data. In this chapter, we will discuss approaches to optimizing pandas usage. We will take a two-pronged approach: we will optimize pandas usage directly, and we will also optimize it using Apache Arrow.
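As a small taste of the first prong, the sketch below compares the memory footprint of the same data created with default column types and with narrower types chosen at creation. The column names, values, and row count are hypothetical, picked only for illustration; the actual savings depend on your data and pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical data: small integers and a label with only two distinct values.
n = 100_000
counts = np.arange(n) % 100            # every value fits in 8 bits
cities = ["Lisbon", "Porto"] * (n // 2)

# Default creation: pandas picks a wide integer type and a generic string type.
df_default = pd.DataFrame({"count": counts, "city": cities})

# Same data, but with narrower types requested when the columns are built.
df_compact = pd.DataFrame(
    {
        "count": pd.Series(counts, dtype="int8"),
        "city": pd.Series(cities, dtype="category"),
    }
)

# deep=True also counts the memory held by the string values themselves.
default_bytes = int(df_default.memory_usage(deep=True).sum())
compact_bytes = int(df_compact.memory_usage(deep=True).sum())
print(f"default: {default_bytes} bytes, compact: {compact_bytes} bytes")
```

The compact frame holds the same values, so the difference comes purely from the type choices: an 8-bit integer instead of the platform default, and a categorical column that stores each distinct string once.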

Apache Arrow provides language-agnostic functionality to efficiently ...
