Chapter 2

Structuring Text

Abstract

Most of the data on the Web today is unstructured text, produced by individuals trying their best to communicate with one another. Data simplification often begins with textual data. This chapter provides readers with tools and strategies for imposing some basic structure on free text.

Keywords

Free-text; ASCII; Asciibetical order; Alphabetization; Sentence parsing; Abbreviations; Acronyms; Metadata; XML; HTML; Markup languages

2.1 The Meaninglessness of Free Text

I've had a perfectly wonderful evening. But this wasn't it.

Groucho Marx

English is such a ridiculous language that an objective observer might guess that it was designed for the purpose of impeding communication. As someone who has dabbled ...
