2.6

Textual Disambiguation

Abstract

Unstructured nonrepetitive data is contextualized through a process called textual disambiguation, sometimes called textual ETL. Only after textual disambiguation can unstructured nonrepetitive data be analyzed. Textual disambiguation consists of many different algorithms, the two most prominent of which are document fracturing and the named value process (sometimes called inline contextualization). The process of identifying documents that need to be processed through textual disambiguation is preceded by the mapping process. Documents are normally processed iteratively. Another form of disambiguation is report decompilation. ...
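As a rough illustration of the two algorithms named in the abstract, the following Python sketch shows one plausible reading of each. It is not the author's implementation. It assumes that document fracturing amounts to breaking text into words and dropping stop words, and that the named value process infers context from text found between a beginning and an ending delimiter. The stop-word list, the rule table, the function names `fracture` and `named_values`, and the sample text are all hypothetical.

```python
import re

# Hypothetical stop-word list, chosen only for illustration.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "was", "on"}


def fracture(text):
    """Simplified document fracturing: split into lowercase words, drop stop words."""
    words = re.findall(r"[A-Za-z0-9']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


# Hypothetical rules: context name -> (beginning delimiter, ending delimiter).
NAMED_VALUE_RULES = {
    "contract_number": ("Contract No. ", ";"),
    "customer": ("Customer: ", ";"),
}


def named_values(text, rules=NAMED_VALUE_RULES):
    """Simplified named value process: capture the text between two delimiters
    and attach the rule's name to it as context."""
    found = []
    for name, (begin, end) in rules.items():
        pattern = re.escape(begin) + r"(.*?)" + re.escape(end)
        for match in re.finditer(pattern, text):
            found.append((name, match.group(1).strip()))
    return found


sample = "Contract No. A-1047; Customer: Acme Corp; the shipment was delayed."
fractured = fracture(sample)
contextualized = named_values(sample)
```

Here `fractured` holds the words of the sample with stop words removed, and `contextualized` holds (context, value) pairs such as `("contract_number", "A-1047")`. A real textual ETL process would apply many more algorithms than these two.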

Get Data Architecture: A Primer for the Data Scientist now with the O’Reilly learning platform.
