R: Mining Spatial, Text, Web, and Social Media Data by Richard Heimann, Nathan Danneman, Pradeepta Mishra, Bater Makhabel


Preliminary analyses

Text data, such as tweets, comes with little structure compared to spreadsheets and other typical types of data. One very useful way to impose some structure on text data is to turn it into a document-term matrix: a matrix in which each row represents a document and each column represents a term. Each element records the number of times a particular term (column) appears in a particular document (row). Put differently, the (i, j)th element counts the number of times term j appears in document i. A document-term matrix therefore has as many rows as there are input documents and as many columns as there are unique words in the collection of documents, which is often called a corpus. Throughout ...
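To make the row and column layout concrete, here is a minimal sketch of a document-term matrix built with base R only. The three-document toy corpus and the names `docs`, `tokens`, `vocab`, and `dtm` are assumptions made for illustration, not taken from the book. Tokenization here is a plain whitespace split; a dedicated text-mining package would normally handle punctuation, stop words, and sparse storage.

```r
# Toy corpus: three short made-up documents (illustrative data only)
docs <- c(
  doc1 = "the cat sat on the mat",
  doc2 = "the dog sat",
  doc3 = "cat and dog"
)

# Lowercase, then split on whitespace: one token vector per document
tokens <- strsplit(tolower(docs), "\\s+")

# The vocabulary is the set of unique terms across the whole corpus
vocab <- sort(unique(unlist(tokens)))

# Count every vocabulary term in every document.
# vapply returns terms in rows, so transpose to get documents in rows.
dtm <- t(vapply(
  tokens,
  function(tok) as.integer(table(factor(tok, levels = vocab))),
  integer(length(vocab))
))
colnames(dtm) <- vocab

dim(dtm)            # 3 documents (rows) by 7 unique terms (columns)
dtm["doc1", "the"]  # 2: "the" appears twice in doc1
print(dtm)
```

Each row sum equals the number of tokens in that document, and most cells are zero even in this tiny example, which is typical of document-term matrices.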
