Chapter 9

Opinion Detection as a Topic Classification Problem

9.1. Introduction

In recent years, the classification of documents according to the opinion they express, considered as a sub-task of document classification, has attracted steadily growing interest from the Natural Language Processing (NLP) community. Document classification covers various problems, including determining the topic of a document from a finite set of possible topics. For example, in a corpus of journalistic documents, the task consists of labeling each text with a topic such as politics, society, sports, or arts. The objective of opinion detection, by contrast, is to determine, for example, whether a text expresses a positive or a negative opinion on a given subject. From this perspective, positive and negative opinions can be considered as two classes to be assigned within the framework of the classical classification task. A priori, detecting and classifying opinions might appear to be a simple task. For numerous reasons, however, the problem turns out to be rather complex and difficult to solve. An aggravating factor is that the available corpora are often small and have an asymmetrical class distribution.
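To make the two-class framing concrete, here is a minimal sketch, assuming a bag-of-words representation and a multinomial Naive Bayes classifier. Naive Bayes is a standard text-classification technique chosen here purely for illustration; it is not necessarily the method used in this chapter. The class `NaiveBayesClassifier`, the `tokenize` helper, and the toy training sentences are hypothetical. Because the labels are plain strings, the same code would assign topic labels (`"sports"`, `"politics"`) or opinion labels (`"positive"`, `"negative"`) without modification.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical helper: lowercase whitespace tokenization (an assumption;
    # a real system would use a proper tokenizer).
    return text.lower().split()


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing.

    Labels are arbitrary strings, so topic classes and opinion classes
    are handled identically.
    """

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)      # word frequencies per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # Log prior: this is where an asymmetrical class distribution enters.
            score = math.log(n / n_docs)
            for w in tokenize(doc):
                if w not in self.vocab:
                    continue  # skip words never seen during training
                # Smoothed log likelihood of the word given the class.
                score += math.log((self.word_counts[label][w] + 1)
                                  / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, invented training data (not from the chapter): opinion as two classes.
train_docs = [
    "a wonderful moving film",
    "great acting and a great story",
    "boring plot and terrible acting",
    "a dull and terrible film",
]
train_labels = ["positive", "positive", "negative", "negative"]

clf = NaiveBayesClassifier().fit(train_docs, train_labels)
print(clf.predict("great story"))           # positive
print(clf.predict("terrible boring film"))  # negative
```

The log prior `math.log(n / n_docs)` is the term affected by an asymmetrical class distribution: when one class dominates a small training corpus, the prior pushes borderline documents toward the majority label, which is one reason small, imbalanced corpora make the task harder.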

Moreover, the highly subjective nature of the documents (which may be, among other things, product-related texts, music or film reviews, political statements, blogs, or discussion forums) adds to the difficulty of the task. This ...
