Learning PySpark by Denny Lee, Tomasz Drabas

Abstracting data

Blaze can abstract many different data structures and expose a single, easy-to-use API over them. This gives consistent behavior across data sources and reduces the need to learn a separate interface for each one. If you know pandas, there is little new to learn, as the differences in syntax are subtle. We will go through some examples to illustrate this.
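To make the idea of one API over several data structures concrete, here is a minimal sketch that uses only the Python standard library and NumPy. It is not how Blaze is implemented: the `column_sums` helper is a hypothetical name introduced purely for illustration, and `functools.singledispatch` stands in for whatever dispatch mechanism a real abstraction layer would use.

```python
from functools import singledispatch

import numpy as np


@singledispatch
def column_sums(data):
    """Hypothetical helper: sum each column of a tabular data source."""
    raise TypeError(f"unsupported data source: {type(data).__name__}")


@column_sums.register(np.ndarray)
def _(data):
    # NumPy backend: reduce along the row axis.
    return [int(v) for v in data.sum(axis=0)]


@column_sums.register(list)
def _(data):
    # List-of-rows backend: transpose with zip, then sum each column.
    return [sum(col) for col in zip(*data)]


rows = [[1, 2, 3], [4, 5, 6]]
from_list = column_sums(rows)            # dispatches to the list version
from_array = column_sums(np.array(rows))  # dispatches to the NumPy version
```

The caller writes `column_sums(...)` in both cases and gets the same answer; only the backend-specific code differs. This is the kind of consistency the section describes.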

Working with NumPy arrays

Getting data from a NumPy array into Blaze's DataShape object is straightforward. First, load NumPy and create a matrix with two rows and three columns:

import numpy as np

simpleArray = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
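Before handing the array to Blaze, it can help to confirm that it has the expected layout. The following sketch uses only NumPy; the variable names `n_rows` and `n_cols` are introduced here for illustration and do not come from the book.

```python
import numpy as np

simpleArray = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

# shape is a (rows, columns) tuple for a 2-D array
n_rows, n_cols = simpleArray.shape

# dtype reports the element type NumPy inferred from the integer literals
print(simpleArray.shape, simpleArray.dtype)
```

The shape and element type are the same pieces of information that a description of the data's structure needs to capture.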

Now that we have an array, we can abstract it with Blaze's DataShape structure:

simpleData_np ...
