Chapter 8. Vectorization

New York City, Old St. Joe, Albuquerque, New Mexico
This old rig is humming and rolling and she’s doing fine
If somebody wants to know what’s become of this so and so
Tell ’em I’m somewhere looking for the end of that long white line

Sturgill Simpson, Long White Line

Introduction to Vectorization in Machine Learning

This chapter offers a set of guidelines for vectorizing the different kinds of data used in machine learning. You might be wondering why we’ve taken a detour into the land of vectorization in a book about deep learning. The main reason is that most machine learning books focus on the algorithms themselves and less on the complete lifecycle of data mining. We want to experiment with data as quickly as possible in our machine learning tools, yet we end up spending far too much time on topics like custom vectorization of text data.

In our professional experience working with enterprise customers, we’ve seen discussions about implementing text classification techniques derailed by the need for a long digression into the basics of converting text to vectors. Companies have a lot of simple data sources, like spreadsheets, that can be exported into comma-separated values (CSV) format yet still need to be transformed into vectors. We also find ourselves trying to explain the myriad ways textual data can be vectorized. Depending on the tools involved and the desired ...
