Practical Predictive Analytics by Ralph Winters

Converting unstructured to structured data

Notice that the complaint data is in an unstructured format. Certain text mining algorithms treat unstructured text as a bag of words. When they analyze documents, they disregard grammar, semantics, and word order, and treat each distinct word as its own feature or variable.
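A minimal Python sketch of the bag-of-words idea is shown below. It is not the book's R code. The `bag_of_words` helper and the sample complaint sentence are hypothetical and exist only for illustration. Each document becomes a multiset of word counts, so two sentences that contain the same words in a different order end up with identical representations.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Return a Counter of lowercase words; grammar and word order are discarded."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

# Hypothetical complaint text, for illustration only
complaint = "The card was charged twice and the bank never refunded the charge."
bow = bag_of_words(complaint)
print(bow["the"])      # 3
print(bow["charged"])  # 1 ("charge" is counted as a separate feature)

# Word order is lost: these two sentences produce the same bag of words
print(bag_of_words("dog bites man") == bag_of_words("man bites dog"))  # True
```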

An important data structure in text mining is the term document matrix (TDM), which records which words appear in each document, typically as the number of times each term occurs in it.
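The structure of a term document matrix can be sketched in plain Python as follows. This is only an illustration, assuming simple lowercase tokenization. It is not how the R function create_matrix() is implemented, and the `term_document_matrix` helper and sample documents are hypothetical. Here each row is a term and each column is a document. Some libraries store the transpose instead (documents as rows, a document-term matrix).

```python
import re
from collections import Counter

def term_document_matrix(docs):
    """Build a term-document matrix: one row per term, one column per document.
    Cell [i][j] holds how many times term i occurs in document j."""
    counts = [Counter(re.findall(r"[a-z']+", d.lower())) for d in docs]
    terms = sorted(set().union(*counts))          # vocabulary across all documents
    matrix = [[c[t] for c in counts] for t in terms]  # Counter returns 0 for absent terms
    return terms, matrix

# Hypothetical complaint snippets
docs = [
    "Late fee charged again",
    "Fee refunded after call",
    "Call center never answered",
]
terms, tdm = term_document_matrix(docs)
for term, row in zip(terms, tdm):
    print(f"{term:10s} {row}")
```

Most cells are zero because each document uses only a few of the vocabulary's terms, which is why real implementations store the TDM as a sparse matrix.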

The create_matrix() function will build this matrix for us. Before we call it, however, we will want to clean the data and impose some restrictions on how the term document matrix is created.
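The kind of cleaning and restrictions described above can be sketched in Python. This sketch assumes three typical steps: lowercasing, stripping digits and punctuation, and dropping very short words. It also keeps only terms that appear in a minimum number of documents. The parameter names `min_word_length` and `min_doc_freq` are hypothetical. They only loosely mirror the sort of options an R function such as create_matrix() exposes and are not its actual signature.

```python
import re
from collections import Counter

def clean_tokens(text, min_word_length=3):
    """Lowercase, replace digits and punctuation with spaces, and keep words
    with at least min_word_length characters."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # drop digits, symbols, punctuation
    return [w for w in text.split() if len(w) >= min_word_length]

def restricted_vocabulary(docs, min_doc_freq=2, min_word_length=3):
    """Return the sorted terms that appear in at least min_doc_freq documents."""
    doc_freq = Counter()
    for d in docs:
        # count each term once per document (document frequency, not term frequency)
        doc_freq.update(set(clean_tokens(d, min_word_length)))
    return sorted(t for t, n in doc_freq.items() if n >= min_doc_freq)

# Hypothetical complaint snippets
docs = [
    "Charged a $35 fee on 03/12!",
    "Fee was charged twice.",
    "No reply to my email.",
]
print(restricted_vocabulary(docs))  # ['charged', 'fee']
```

Restricting the vocabulary this way before building the matrix keeps rare or meaningless tokens (amounts, dates, one-off words) from becoming columns or rows of the TDM.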

First, we do not want to include words such as the, an, or it, which add no value to the TDM and would ...
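Stop-word filtering of this kind can be sketched in a few lines of Python. The `STOP_WORDS` set below is a tiny, hand-picked, hypothetical list used only for illustration. Real stop-word lists bundled with text mining packages are much longer. The filter runs on already-tokenized text, before the term document matrix is built.

```python
# Tiny illustrative stop-word list (hypothetical; real lists are far longer)
STOP_WORDS = {"the", "an", "a", "it", "and", "was", "to", "of"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that carry little meaning on their own."""
    return [t for t in tokens if t not in stop_words]

tokens = "the bank closed it and the fee was an error".split()
print(remove_stop_words(tokens))  # ['bank', 'closed', 'fee', 'error']
```

Using a set for the stop list makes each membership check constant time on average, which matters when filtering many documents.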