Chapter 4 Parsing and Extracting Features

Introduction

Tokens and Words

Lemmatization

POS Tags

Parsing Tree

Text Parsing Node in SAS Text Miner

Stemming and Synonyms

Identifying Parts of Speech

Using Start and Stop Lists

Spell Checking

Entities

Building Custom Entities Using SAS Contextual Extraction Studio

Summary

References

Introduction

In this chapter, we discuss the next step, and perhaps the most important one, in the text mining process flow: text parsing. In Chapters 2 and 3, we saw how various methods collect and process textual documents. The next task is to convert the collected text documents (in unstructured form) to a vector representation (a structured form). Fundamentally, parsing is the first step in converting unstructured ...
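To make the idea of converting unstructured text into a vector representation concrete, here is a minimal sketch in Python. It is not how the Text Parsing node in SAS Text Miner works internally; it only assumes a simple regular-expression tokenizer and raw term counts. The function names `tokenize` and `to_term_vectors` and the two sample sentences are hypothetical, chosen for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def to_term_vectors(documents):
    """Return (vocabulary, vectors), with one term-count vector per document.

    Assumption: raw term counts over a sorted vocabulary; no stemming,
    stop lists, or weighting are applied in this sketch.
    """
    token_lists = [tokenize(doc) for doc in documents]
    # The vocabulary is every distinct token seen in any document, in sorted order.
    vocabulary = sorted({tok for toks in token_lists for tok in toks})
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        # Counter returns 0 for terms that do not occur in this document.
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


# Hypothetical sample documents.
docs = ["The cat sat on the mat.", "The dog chased the cat."]
vocabulary, vectors = to_term_vectors(docs)
print(vocabulary)
for vec in vectors:
    print(vec)
```

Each document becomes a row of counts aligned to the same vocabulary, which is the structured form that later analysis steps can consume. Refinements such as lemmatization, part-of-speech tagging, and start or stop lists would change which terms end up in the vocabulary.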
