Exploring data using Blaze

Blaze is an open source Python library, primarily developed by Continuum.io, that builds on NumPy arrays and Pandas DataFrames. Blaze extends to out-of-core computing, which means processing datasets larger than available memory. Pandas and NumPy, by contrast, operate on in-memory data on a single core.
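To make "out-of-core" concrete, here is a minimal stdlib-only sketch of the general idea. It is not Blaze's API. It reads a CSV in fixed-size chunks and keeps only a running total, so memory use depends on the chunk size rather than the file size. The function name `chunked_sum` and the `chunk_size` parameter are hypothetical, chosen for illustration.

```python
import csv
import io
from itertools import islice


def chunked_sum(lines, column, chunk_size=2):
    """Sum one numeric CSV column, holding at most chunk_size rows in memory.

    Hypothetical helper illustrating out-of-core aggregation; not part of Blaze.
    """
    reader = csv.DictReader(lines)
    total = 0.0
    while True:
        chunk = list(islice(reader, chunk_size))  # pull the next block of rows
        if not chunk:
            break
        total += sum(float(row[column]) for row in chunk)  # fold block into the total
    return total


# io.StringIO stands in for a large file on disk.
data = io.StringIO("name,amount\nalice,100\nbob,-200\ncarol,300\n")
print(chunked_sum(data, "amount"))  # 200.0
```

A real out-of-core engine would also parallelize and spill intermediate results to disk. The sketch shows only the streaming pattern that makes larger-than-memory data tractable.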

Blaze offers an adaptable, unified, and consistent user interface across various backends. Blaze orchestrates the following:

  • Data: Seamless exchange of data across storage formats and systems such as CSV, JSON, HDF5, HDFS, and Bcolz files.
  • Computation: Running the same query against different computational backends such as Spark, MongoDB, Pandas, or SQLAlchemy.
  • Symbolic expressions: Abstract expressions such as join, group-by, filter, selection, and projection, with a syntax similar to Pandas but more limited in scope. Implements the split-apply-combine ...
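The separation between a symbolic expression and the backend that computes it can be illustrated with a small stdlib-only sketch. This is not Blaze's implementation or API. The `GroupedSum` expression type and the `compute_python` and `compute_sql` functions are hypothetical. A single backend-independent description (filter, then group-by, then sum, i.e. split-apply-combine) is evaluated once against a plain Python list of dicts and once by translating it into SQL for an in-memory SQLite database.

```python
import sqlite3
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupedSum:
    """Backend-independent query: filter rows, group them, sum a column.

    Hypothetical expression type for illustration; not part of Blaze.
    """
    table: str
    where_column: str
    where_min: float  # keep rows where where_column > where_min
    group_by: str
    sum_column: str


def compute_python(expr, rows):
    """Evaluate the expression against an in-memory list of dicts."""
    totals = defaultdict(float)
    for row in rows:
        if row[expr.where_column] > expr.where_min:  # filter
            totals[row[expr.group_by]] += row[expr.sum_column]  # split by key, apply sum
    return dict(totals)  # combine the per-group results


def compute_sql(expr, conn):
    """Translate the same expression into SQL and run it on a sqlite3 connection.

    Identifiers are interpolated directly, so they must come from trusted code;
    only the filter value is passed as a bound parameter.
    """
    query = (
        f"SELECT {expr.group_by}, SUM({expr.sum_column}) FROM {expr.table} "
        f"WHERE {expr.where_column} > ? GROUP BY {expr.group_by}"
    )
    return {key: float(total) for key, total in conn.execute(query, (expr.where_min,))}


rows = [
    {"name": "alice", "amount": 100.0},
    {"name": "bob", "amount": -200.0},
    {"name": "alice", "amount": 300.0},
]
expr = GroupedSum("accounts", "amount", 0.0, "name", "amount")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, amount REAL)")
conn.executemany("INSERT INTO accounts VALUES (:name, :amount)", rows)

print(compute_python(expr, rows))  # {'alice': 400.0}
print(compute_sql(expr, conn))     # {'alice': 400.0}
```

Because the query is described once as data, adding another backend means writing one more `compute_*` function rather than rewriting the query. This is the design idea the bullets above attribute to Blaze, shown here in miniature.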
