Chapter 2

Shaping Data

IN THIS CHAPTER

Manipulating HTML data

Manipulating raw text

Discovering the bag of words model and other techniques

Manipulating graph data

“It is a capital mistake to theorize before one has data.”

— SHERLOCK HOLMES

Book 7, Chapter 1 demonstrates techniques for working with data as an entity, as something you work with in Python. However, data doesn’t exist in a vacuum, and it doesn’t simply appear within Python on its own. As demonstrated in Book 6, Chapter 3, you load the data. Loading may not be enough, though: you may have to shape the data as part of loading it. That’s the purpose of this chapter. You discover how to work with a variety of container types so that you can load data from complex sources, such as HTML pages. In fact, you even work with graphics, images, and sounds.
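To make the idea of shaping data while loading it a little more concrete, here is a minimal sketch that uses only Python’s standard-library `html.parser` module. It is not the approach this chapter goes on to use; the sample page, the `ListItemParser` class, and the `load_items()` helper are all hypothetical names invented for illustration. The sketch pulls the text out of each list item in an HTML string and tidies it up (trimming whitespace and lowercasing) as part of the load.

```python
from html.parser import HTMLParser


class ListItemParser(HTMLParser):
    """Collect the text found inside every <li> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_item = False   # True while the parser is inside an <li>
        self.items = []         # Raw text of each list item, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_item = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_item = False

    def handle_data(self, data):
        # Keep text only when it belongs to a list item.
        if self._in_item:
            self.items[-1] += data


# Hypothetical sample page; a real page would come from a file or a web request.
SAMPLE_HTML = """
<html><body>
  <ul>
    <li> Python </li>
    <li>Data  Science</li>
  </ul>
</body></html>
"""


def load_items(html_text):
    """Load list items from HTML and shape them: collapse whitespace, lowercase."""
    parser = ListItemParser()
    parser.feed(html_text)
    parser.close()
    return [" ".join(item.split()).lower() for item in parser.items]


items = load_items(SAMPLE_HTML)
print(items)  # ['python', 'data science']
```

The point of the sketch is that the loading step and the shaping step happen together: by the time the list reaches the rest of your code, the HTML container is gone and the values are already in a consistent form.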

Remember: As you progress through the book, you discover that data takes all kinds of forms and shapes. As far as the ...
