CHAPTER 2Extending Python Using NumPy

What Is NumPy?

In Python, you usually use the list data type to store a collection of items. The Python list is similar to the concept of arrays in languages like Java, C#, and JavaScript. The following code snippet shows a Python list:

list1 = [1,2,3,4,5] 

Unlike arrays, a Python list does not need to contain elements of the same type. The following example is a perfectly legal list in Python:

list2 = [1,"Hello",3.14,True,5] 

While this unique feature in Python provides flexibility when handling multiple types in a list, it has its disadvantages when processing large amounts of data (as is typical in machine learning and data science projects). The key problem with Python's list data type is its efficiency. To allow a list to have non‐uniform type items, each item in the list is stored in a memory location, with the list containing an “array” of pointers to each of these locations. A Python list requires the following:

  • At least 4 bytes per pointer.
  • At least 16 bytes for the smallest Python object—4 bytes for a pointer, 4 bytes for the reference count, 4 bytes for the value. All of these together round up to 16 bytes.

Due to the way that a Python list is implemented, accessing items in a large list is computationally expensive. To solve this limitation with Python's list feature, Python programmers turn to NumPy, an extension to the Python programming language that adds support for large, multidimensional arrays and matrices, along with ...

Get Python Machine Learning now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.