Chapter 12. Advanced NumPy

ndarray Object Internals

The NumPy ndarray provides a means to interpret a block of homogeneous data (either contiguous or strided, more on this later) as a multidimensional array object. As you’ve seen, the data type, or dtype, determines how the data is interpreted as being floating point, integer, boolean, or any of the other types we’ve been looking at.

Part of what makes ndarray powerful is that every array object is a strided view on a block of data. You might wonder, for example, how the array view arr[::2, ::-1] does not copy any data. Simply put, the ndarray is more than just a chunk of memory and a dtype; it also has striding information which enables the array to move through memory with varying step sizes. More precisely, the ndarray internally consists of the following:

  • A pointer to data, that is a block of system memory

  • The data type or dtype

  • A tuple indicating the array’s shape; For example, a 10 by 5 array would have shape (10, 5)

    In [8]: np.ones((10, 5)).shape
    Out[8]: (10, 5)
  • A tuple of strides, integers indicating the number of bytes to “step” in order to advance one element along a dimension; For example, a typical (C order, more on this later) 3 x 4 x 5 array of float64 (8-byte) values has strides (160, 40, 8)

    In [9]: np.ones((3, 4, 5), dtype=np.float64).strides
    Out[9]: (160, 40, 8)

    While it is rare that a typical NumPy user would be interested in the array strides, they are the critical ingredient in constructing copyless array views. Strides ...

Get Python for Data Analysis now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.