Chapter 13 Text Processing

Despite the wealth of multimedia information, text processing remains one of the dominant functions of computers. Computers are used to edit, store, and display documents, and to transport documents over the Internet. Furthermore, digital systems are used to archive a wide range of textual information, and new data is being generated at a rapidly increasing pace. A large corpus can readily surpass a petabyte of data (which is equivalent to a thousand terabytes, or a million gigabytes). Common examples of digital collections that include textual information are:

Snapshots of the World Wide Web, since the Internet document formats HTML and XML are primarily text formats with added tags for multimedia content
All documents stored locally on a user's computer
Email archives
Customer reviews
Compilations ...

Data Structures and Algorithms in Python by

13.1 Abundance of Digitized Text
