Chapter 7. Introduction to NumPy

This chapter introduces the Numeric Python library (NumPy) to those unacquainted with it. NumPy is the key building block of Pandas, the powerhouse data analysis library that we will use in the upcoming chapters to clean and explore our recently scraped Nobel Prize dataset (see Chapter 6). A basic understanding of NumPy’s core elements and principles is important if you are to get the most out of Pandas, so the emphasis of this chapter is on providing a foundation for the upcoming introduction to Pandas.

NumPy is a Python module that provides very fast, multidimensional array manipulation, implemented by low-level libraries written in C and Fortran.[1] Pure Python is relatively slow when handling large quantities of data, but NumPy lets you apply an operation to an entire array at once (a so-called vectorized operation), with the element-by-element looping done in compiled code rather than by the Python interpreter. This is what makes it so fast. Given that NumPy is the chief building block of most of the heavyweight Python data-processing libraries, Pandas included, it’s hard to argue with its status as the linchpin of the Python data-processing world.
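To make the difference concrete, here is a minimal sketch contrasting a pure-Python loop with a single NumPy vectorized operation, followed by a small multidimensional array. The array sizes and variable names are arbitrary choices for illustration, and because timings depend heavily on the machine, the script prints its measurements rather than claiming particular numbers.

```python
import timeit

import numpy as np

N = 1_000_000

# Pure Python: the interpreter visits each element in turn
values = list(range(N))
doubled_list = [v * 2 for v in values]

# NumPy: one vectorized expression; the loop runs in compiled code
arr = np.arange(N)
doubled_arr = arr * 2

# Both approaches produce the same numbers
assert doubled_arr.tolist() == doubled_list

# Rough timing comparison (results vary by machine, so none are assumed here)
py_time = timeit.timeit(lambda: [v * 2 for v in values], number=5)
np_time = timeit.timeit(lambda: arr * 2, number=5)
print(f"pure Python: {py_time:.4f}s, NumPy: {np_time:.4f}s")

# Arrays can be multidimensional; operations still apply element-wise
grid = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
scaled = grid * 10                 # [[0, 10, 20], [30, 40, 50]]
col_sums = grid.sum(axis=0)        # sum down each column: [3, 5, 7]
```

Note that `arr * 2` contains no explicit loop at all: NumPy applies the multiplication to every element, and the same syntax works unchanged on the two-dimensional `grid`.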

In addition to Pandas, NumPy’s huge ecosystem includes Scientific Python (SciPy), which supplements NumPy with hardcore science and engineering modules; Scikit-learn, which adds a host of modern machine-learning algorithms in domains such as classification and feature extraction; and many other specialized libraries that use NumPy’s multidimensional arrays as their primary data objects. In this sense, basic ...
