8 Storing big data

This chapter covers

Getting to know fsspec, an abstraction library over filesystems
Storing heterogeneous columnar data efficiently with Parquet
Processing data files with in-memory libraries like pandas or Parquet
Processing homogeneous multi-dimensional array data with Zarr

When dealing with big data, persistence is of paramount importance. We want to be able to access—to read and write—data as fast as possible, preferably from many parallel processes. We also want persistent representations that are compact because storing large amounts of data can be expensive.

In this chapter, we will consider several approaches to make persistent storage of data more efficient. We will start with a short discussion of fsspec, a library ...
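To illustrate what "an abstraction library over filesystems" buys us, here is a minimal sketch. It assumes the fsspec package is installed. The helper round_trip is hypothetical and exists only for this illustration; the only fsspec API it relies on is fsspec.open, which accepts a URL or path and returns a file-like object. The same helper writes to fsspec's built-in in-memory filesystem and to a local temporary directory without any change to its code.

```python
import os
import tempfile

import fsspec  # assumption: installed separately, e.g. via pip


def round_trip(urlpath: str, payload: bytes) -> bytes:
    """Hypothetical helper: write payload to urlpath, then read it back.

    Only fsspec.open is used, so nothing here depends on which
    storage backend urlpath points to.
    """
    with fsspec.open(urlpath, "wb") as f:
        f.write(payload)
    with fsspec.open(urlpath, "rb") as f:
        return f.read()


payload = b"x,y\n1,2\n3,4\n"

# In-memory filesystem that ships with fsspec: nothing touches the disk.
mem_result = round_trip("memory://example/data.csv", payload)

# Local filesystem: a plain path with no protocol prefix is treated as local.
with tempfile.TemporaryDirectory() as tmpdir:
    local_result = round_trip(os.path.join(tmpdir, "data.csv"), payload)

print(mem_result == payload, local_result == payload)
```

Pointing round_trip at a cloud URL such as an s3:// path would, with the separate s3fs package installed, require no change to the function itself. That separation of processing code from storage location is the point of an abstraction layer over filesystems.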

Fast Python by Tiago Antao
