Chapter 12

Stretching Python’s Capabilities

IN THIS CHAPTER

Bullet Understanding how Scikit-learn works with classes

Bullet Using sparse matrices and the hashing trick

Bullet Testing performances and memory consumption

Bullet Saving time with multicore algorithms

If you’ve gone through the previous chapters, by this point you’ve dealt with all the basic data loading and manipulation methods offered by Python. Now it’s time to start using some more complex instruments for data wrangling (or munging) and for machine learning. The final step of most data science projects is to build a data tool able to automatically summarize, predict, and recommend directly from your data.

Before taking that final step, you still have to process your data by enforcing transformations that are even more radical. That’s the data wrangling or data munging part, where sophisticated transformations are followed by visual and statistical explorations, and then again by further transformations. In the following sections, you learn how to handle huge streams of text, explore the basic characteristics of a dataset, optimize the speed ...

Get Python for Data Science For Dummies, 2nd Edition now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.