Chapter 2. Finding the Structure of Documents

Dilek Hakkani-Tür, Gokhan Tur, Benoit Favre, and Elizabeth Shriberg

2.1. Introduction

In human language, words and sentences do not appear randomly; they usually follow a structure. For example, combinations of words form sentences: meaningful grammatical units such as statements, requests, and commands. Likewise, in written text, sentences form paragraphs: self-contained units of discourse about a particular point or idea. Sentences may also be related to each other by explicit discourse connectives such as "therefore".
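To make the paragraph and sentence hierarchy concrete, here is a minimal sketch that recovers it from raw text with two naive rules. It is an illustration only, not a method from this chapter: the function name `split_structure` and the sample text are hypothetical, and the punctuation rule is assumed to be good enough for the example. It will break on abbreviations such as "Dr." and on text that lacks punctuation.

```python
import re

def split_structure(text):
    """Split raw text into paragraphs, each a list of sentences.

    Hypothetical helper for illustration; both rules are naive assumptions.
    """
    # Assumed rule 1: a blank line separates paragraphs.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Assumed rule 2: a sentence ends at '.', '!' or '?' followed by whitespace.
    return [re.split(r"(?<=[.!?])\s+", p) for p in paragraphs]

sample = (
    "Close the door. Is it raining? Therefore, stay inside.\n\n"
    "This is a new paragraph."
)
structure = split_structure(sample)

for number, sentences in enumerate(structure, start=1):
    print(f"Paragraph {number}: {sentences}")
```

On the sample, the first paragraph splits into a command, a question, and a statement introduced by the connective "Therefore", and the second paragraph holds a single sentence.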

Automatically extracting the structure of documents helps subsequent natural language processing (NLP) tasks; for example, parsing, machine translation, and semantic role labeling use sentences ...
