Chapter 14

Transversal Text Mining Techniques

14.1. Mixed and interdisciplinary text mining techniques

The different levels at which relations are analyzed can benefit from generic methods, particularly those based on matrix analysis or statistical learning [BEN 73; HAR 75; TUK 77; MUR 87; JAI 88; GOV 09]. Latent semantic indexing and approaches to named entity extraction are two such cases.

14.1.1. Supervised, unsupervised and semi-supervised techniques

Data mining methods can be used to build a model that is then applied according to given parameters or to new data. This model is usually computed from the data input into a system. On this basis, we can distinguish two main categories of algorithms for computing the model. Borrowing from the domain of machine learning, the established terminology calls them “supervised” and “unsupervised” models. Supervised models assume that knowledge about the data exists. This knowledge may take the form of metadata within the data or of an external knowledge base connected to the data. Unsupervised models, for their part, assume that only the data input into the system are available; this is more economical and sometimes more robust, but it comes at the cost of a shallower level of interpretation. The two categories are also referred to as “knowledge rich” and “knowledge poor” methods. There is an intermediate family: “semi-supervised” models, which assume that little information/knowledge is available to us, on a reduced sample ...
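The distinction between the three families can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration and does not come from the text: the six-document toy corpus, the bag-of-words representation, the nearest-centroid classifier standing in for a supervised model, the k-means-style loop standing in for an unsupervised one, and the self-training loop standing in for a semi-supervised one. All function and variable names are hypothetical.

```python
from collections import Counter

def vectorize(doc, vocab):
    """Turn a document into a word-count vector over a fixed vocabulary."""
    counts = Counter(doc.lower().split())
    return [counts[w] for w in vocab]

def centroid(vectors):
    """Component-wise mean of a group of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, centroids):
    """Key of the centroid closest to vec."""
    return min(centroids, key=lambda k: sq_dist(vec, centroids[k]))

def fit_supervised(vectors, labels):
    """Supervised: every document has a label; one centroid per label."""
    groups = {}
    for v, y in zip(vectors, labels):
        groups.setdefault(y, []).append(v)
    return {y: centroid(vs) for y, vs in groups.items()}

def fit_unsupervised(vectors, seeds, iterations=10):
    """Unsupervised: only the data; clusters grow from seed documents."""
    centroids = {i: vectors[s] for i, s in enumerate(seeds)}
    for _ in range(iterations):
        assign = [nearest(v, centroids) for v in vectors]
        centroids = fit_supervised(vectors, assign)
    return assign

def fit_semi_supervised(vectors, partial_labels, iterations=10):
    """Semi-supervised: a few labels (None = unknown) guide the rest."""
    labeled = [(v, y) for v, y in zip(vectors, partial_labels) if y is not None]
    centroids = fit_supervised(*zip(*labeled))
    for _ in range(iterations):
        guessed = [y if y is not None else nearest(v, centroids)
                   for v, y in zip(vectors, partial_labels)]
        centroids = fit_supervised(vectors, guessed)
    return guessed

# Hypothetical toy corpus: three "bio" and three "fin" documents.
docs = [
    "gene protein cell", "protein cell enzyme", "gene enzyme protein",
    "market stock price", "stock price trade", "market trade stock",
]
vocab = sorted({w for d in docs for w in d.split()})
vectors = [vectorize(d, vocab) for d in docs]

# Supervised: full labels are assumed to be available.
full_labels = ["bio", "bio", "bio", "fin", "fin", "fin"]
model = fit_supervised(vectors, full_labels)
prediction = nearest(vectorize("protein gene", vocab), model)

# Unsupervised: no labels; the output is anonymous cluster ids.
clusters = fit_unsupervised(vectors, seeds=(0, 3))

# Semi-supervised: one labeled document per class on a reduced sample.
partial = ["bio", None, None, "fin", None, None]
completed = fit_semi_supervised(vectors, partial)

print(prediction, clusters, completed)
```

Note that the unsupervised run returns cluster identifiers with no meaning attached, whereas the supervised and semi-supervised runs return named classes; this is one way to read the remark that unsupervised models support a less detailed interpretation.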
