This chapter focuses on textual content in social media and answers the question of how you can classify the subjects that users talk about. To see emerging, large-scale trends across the individual pieces of text you can find in online databases, you need a way to capture the meaning of documents computationally. Becoming familiar with these methods makes up a large part of this chapter.
You'll also see that after encoding text, you can find topics among the posts or documents. How popular are these topics, and how do they relate to each other? These are further important questions we'll address in this chapter. Finally, you'll see how to use these models to make predictions about users and how to further improve your understanding of text using the network connections among your users.
Defining Content: Focus on Text and Unstructured Data
In the following sections you'll work with datasets that highlight how individuals create and consume content, focusing on text analysis and the essential notions that you can use to understand what people are writing about. At first sight, it may not be obvious how to make sense of text, even when all users write in the same language, or how to map it to concepts or structures that you can gain insights from.
To make our investigation more concrete, we'll work with a publicly available dataset of user-generated content ...