Chapter 6. Words and Pixels – Working with Unstructured Data

Most of the data we have looked at thus far is composed of rows and columns with numerical or categorical values. This sort of information fits in both traditional spreadsheet software and the interactive Python notebooks used in the previous exercises. However, data is increasingly available in both this form, usually called structured data, and more complex formats such as images and free text. These other data types, also known as unstructured data, are more challenging than tabular information to parse and transform into features that can be used in machine learning algorithms.

What makes unstructured data challenging to use? It is challenging largely because images and text are extremely ...

Get Mastering Predictive Analytics with Python now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.