CHAPTER 7 Boolean Methods of Classification and Prediction

RULE-BASED TEXT CLASSIFICATION AND PREDICTION

The Boolean rule process named Boollear, introduced in a Cox and Zhao (2014) patentⁱ that describes the origin of the approach, was a significant step toward making manually produced linguistic rules interoperable with numerically derived, statistically computed rules. Boollear works like a traditional decision tree process: the data is run against a target category, and a set of predictive or classification rules is extracted. Because each extracted expression consists of word-terms linked by Boolean operators such as “and,” “or,” and “not,” it can be converted directly into a linguistic rule and used by a linguistic rules engine.
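To make the idea of word-terms linked by Boolean operators concrete, here is a minimal Python sketch. It does not reproduce the Boollear algorithm or any rules-engine syntax. The nested-tuple encoding, the function names `tokens`, `matches`, and `render`, and the sample rule are all assumptions made for illustration.

```python
import re

# Hypothetical rule encoding (an assumption, not Boollear's format):
# a bare string is a word-term; a tuple combines sub-rules with
# "and", "or", or "not".

def tokens(text):
    """Lowercase a text and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))

def matches(rule, words):
    """Evaluate a nested Boolean rule against a set of words."""
    if isinstance(rule, str):
        return rule in words
    op, *args = rule
    if op == "and":
        return all(matches(a, words) for a in args)
    if op == "or":
        return any(matches(a, words) for a in args)
    if op == "not":
        return not matches(args[0], words)
    raise ValueError(f"unknown operator: {op}")

def render(rule):
    """Render a rule as a readable Boolean expression string."""
    if isinstance(rule, str):
        return rule
    op, *args = rule
    if op == "not":
        return f"NOT {render(args[0])}"
    return "(" + f" {op.upper()} ".join(render(a) for a in args) + ")"

# Illustrative rule only; it is not taken from the patent or the book.
rule = ("and", ("or", "refund", "return"), ("not", "policy"))

print(render(rule))
print(matches(rule, tokens("I want a refund for this order")))
print(matches(rule, tokens("What is your return policy?")))
```

Because the rule is plain data, the same structure can be evaluated against a document and rendered as text, which mirrors the point that a statistically derived expression and a hand-written linguistic rule can share one representation.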

Whereas common predictive engines such as traditional decision trees are limited to Boolean expressions built from the “and” and “or” operators, Boollear also uses the “not” operator. The “not” operator is a necessary tool in linguistic disambiguation, that is, the process of determining the meaning of a word from its specific context in a sentence. For example, the word “bass” in a sentence could refer to a bass instrument or bass line in music, or it could refer to a type of fish. A simple disambiguation rule would look for the presence of “bass” and the absence of “fish” in a sentence as a potential indicator of the musical sense of the term. This disambiguation has value in predictive ...
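The “bass” rule described above can be sketched in a few lines of Python. This is a hedged illustration under simple assumptions rather than the book's implementation: the function name `is_musical_bass` is hypothetical, the input is assumed to be a single sentence, and matching is on exact lowercase tokens, so a word such as “fishing” would not count as “fish.”

```python
import re

def is_musical_bass(sentence):
    """Apply the rule: "bass" AND NOT "fish" within one sentence.

    Hypothetical helper for illustration; exact-token matching only.
    """
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return "bass" in words and "fish" not in words

print(is_musical_bass("She plays bass in a jazz trio."))
print(is_musical_bass("He caught a bass while out to fish."))
```

A sentence that never mentions “bass” returns `False` as well, so the rule only signals a potential musical sense; it does not confirm one.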

Text as Data by Barry DeVille, Gurpreet Singh Bawa
