Chapter 2

Structuring Text

Abstract

Most of the data on the Web today is unstructured text produced by individuals trying their best to communicate with one another. Data simplification often begins with textual data. This chapter provides readers with tools and strategies for imposing some basic structure on free text.

Keywords

Free-text; ASCII; Asciibetical order; Alphabetization; Sentence parsing; Abbreviations; Acronyms; Metadata; XML; HTML; Markup languages

2.1 The Meaninglessness of Free Text

I've had a perfectly wonderful evening. But this wasn't it.

Groucho Marx

English is such a ridiculous language that an objective observer might guess that it was designed for the purpose of impeding communication. As someone who has dabbled ...
