Abstracting data

Blaze can abstract many different data structures and expose them through a single, easy-to-use API. This gives you consistent behavior and reduces the need to learn multiple interfaces for handling data. If you know pandas, there is little new to learn, as the differences in syntax are subtle. We will go through some examples to illustrate this.

Working with NumPy arrays

Getting data from a NumPy array into the DataShape object of Blaze is easy. First, we load NumPy and create a simple array, a matrix with two rows and three columns:

import numpy as np

simpleArray = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
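Before handing the array to Blaze, it can be worth confirming that it has the layout we expect. The following is a minimal sketch that uses only NumPy's standard shape and ndim attributes; the array definition is repeated so that the snippet runs on its own:

```python
import numpy as np

# Same array as above, repeated so this snippet is self-contained
simpleArray = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

# shape is a (rows, columns) tuple for a two-dimensional array
print(simpleArray.shape)  # (2, 3)

# ndim is the number of dimensions (axes)
print(simpleArray.ndim)   # 2
```

If the shape is not (2, 3), the nested lists passed to np.array probably have unequal lengths.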

Now that we have an array, we can abstract it with Blaze's DataShape structure:

simpleData_np ...
