CHAPTER 5

Data Collection, Annotation, and Evaluation

5.1    INTRODUCTION

In this chapter, we discuss several complementary aspects of social media text analysis. The results of the analysis are influenced by the quality of the collected input data. To use empirical methods of natural language processing or statistical machine learning algorithms, we need to build or acquire data sets for training and development, and for testing. These data sets need to be annotated. At a minimum, the test data must be annotated so that we can evaluate the algorithms. The training data must also be labeled when the algorithms are supervised learning algorithms, while unsupervised learning algorithms can use the data as it is, without additional ...
