IN THIS CHAPTER
Manipulating HTML data
Manipulating raw text
Discovering the bag of words model and other techniques
Manipulating graph data
“It is a capital mistake to theorize before one has data.”
— SHERLOCK HOLMES
Book 7, Chapter 1 demonstrates techniques for working with data as an entity, as something you manipulate in Python. However, data doesn’t exist in a vacuum, and it doesn’t simply appear within Python on its own. As demonstrated in Book 6, Chapter 3, you load the data. Loading may not be enough, though: you may have to shape the data as part of loading it. That’s the purpose of this chapter. You discover how to load data from a number of complex container types, such as HTML pages, and shape it as you go. In fact, you even work with graphics, images, and sounds.