Chapter 17.1

Managing Text

Abstract

In most organizations, text makes up the majority of corporate data, yet many corporations do little or nothing with it. For many years, technological limitations made text difficult to handle, but today text is readily manageable. Organizations are finding a wealth of value in capturing and using the text that lies within the corporate walls.

Keywords

Text; DBMS; NLP; Stemming; Soundex; Taxonomy; Blob; Stop word; Context; Textual ETL; Inline contextualization; Postprocessing; Preprocessing

Text is the Wednesday's child of technology. It has been forgotten and abandoned, to the point that organizations act as if they don’t have ...

From Data Architecture: A Primer for the Data Scientist, 2nd Edition.
