Skip to Content
Python Data Science Handbook
book

Python Data Science Handbook

by Jake VanderPlas
November 2016
Beginner to intermediate
548 pages
13h 58m
English
O'Reilly Media, Inc.
Content preview from Python Data Science Handbook

Chapter 3. Data Manipulation with Pandas

In the previous chapter, we dove into detail on NumPy and its ndarray object, which provides efficient storage and manipulation of dense typed arrays in Python. Here we’ll build on this knowledge by looking in detail at the data structures provided by the Pandas library. Pandas is a newer package built on top of NumPy, and provides an efficient implementation of a DataFrame. DataFrames are essentially multidimensional arrays with attached row and column labels, and often with heterogeneous types and/or missing data. As well as offering a convenient storage interface for labeled data, Pandas implements a number of powerful data operations familiar to users of both database frameworks and spreadsheet programs.

As we saw, NumPy’s ndarray data structure provides essential features for the type of clean, well-organized data typically seen in numerical computing tasks. While it serves this purpose very well, its limitations become clear when we need more flexibility (attaching labels to data, working with missing data, etc.) and when attempting operations that do not map well to element-wise broadcasting (groupings, pivots, etc.), each of which is an important piece of analyzing the less structured data available in many forms in the world around us. Pandas, and in particular its Series and DataFrame objects, builds on the NumPy array structure and provides efficient access to these sorts of “data munging” tasks that occupy much of a data scientist’s ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Python Data Science Handbook, 2nd Edition

Python Data Science Handbook, 2nd Edition

Jake VanderPlas

Publisher Resources

ISBN: 9781491912126Errata Page