CHAPTER 2 Text Analytics Process Overview

TEXT ANALYTICS PROCESSING

In this chapter, we identify a number of best practices in the areas of machine learning, data and text mining, and analytics processing. A few processing templates have evolved for data mining and machine learning.[i] The cloud-enabled approach adopted by SAS is summarized in SAS Institute Inc.[ii] This is a fast-moving area where new practices evolve constantly.

PROCESS BUILDING BLOCKS

A high-level view of processing for text analytics resembles many solution approaches in information technology. This section looks at the primary building blocks often used in text analytics:

  • Preparation. Getting the text ready for analysis (data capture, text decomposition, mapping to a data representation).
  • Utilization. Interpretation and deployment.

Figure 2.1 describes the life cycle of text analytics, from capture to deployment, in six major processes. Document capture, text-to-data transfer, and characterization map to the preparation phase. Latent structure development, composite document assembly, and prediction/classification map to the utilization phase.


Figure 2.1 Main stages of the text-mining process.

Source: B. deVille.
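
The preparation building blocks listed above (data capture, text decomposition, mapping to a data representation) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the process used by SAS or shown in Figure 2.1: the two sample documents, the function names (decompose, to_term_counts, to_matrix), and the simple regular-expression tokenizer are all hypothetical choices made for the example.

```python
import re
from collections import Counter

# Capture: a hypothetical in-memory corpus standing in for assembled documents.
documents = {
    "doc1": "Text analytics turns text into data.",
    "doc2": "Data mining and text mining share many practices.",
}


def decompose(text):
    """Text decomposition: split text into lowercase alphabetic tokens.

    This tokenizer is deliberately simple (an assumption for the sketch);
    it drops digits and punctuation.
    """
    return re.findall(r"[a-z]+", text.lower())


def to_term_counts(docs):
    """Count how often each term occurs in each document (bag of words)."""
    return {doc_id: Counter(decompose(text)) for doc_id, text in docs.items()}


def to_matrix(term_counts):
    """Data representation: lay the counts out as a document-by-term matrix.

    Returns the shared, sorted vocabulary and one row of counts per document.
    """
    vocabulary = sorted({term for counts in term_counts.values() for term in counts})
    rows = {
        doc_id: [counts[term] for term in vocabulary]
        for doc_id, counts in term_counts.items()
    }
    return vocabulary, rows


term_counts = to_term_counts(documents)
vocabulary, matrix = to_matrix(term_counts)

for doc_id, row in matrix.items():
    print(doc_id, dict(zip(vocabulary, row)))
```

Each row of the resulting matrix describes one document in terms of a shared vocabulary. That is the kind of data representation that the utilization phase can then work on.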

Preparation

  • Capture documents. First, assemble the documents. Usually, text documents require some kind of preprocessing to bring them into the analysis environment. For example, articles ...
