Data Structures and Algorithms in Python

by Michael T. Goodrich, Roberto Tamassia, Michael H. Goldwasser
March 2013
Intermediate to advanced
748 pages
21h 42m
English
Wiley

Chapter 13

Text Processing

13.1 Abundance of Digitized Text

Despite the wealth of multimedia information, text processing remains one of the dominant functions of computers. Computers are used to edit, store, and display documents, and to transport documents over the Internet. Furthermore, digital systems are used to archive a wide range of textual information, and new data is being generated at a rapidly increasing pace. A large corpus can readily surpass a petabyte of data (which is equivalent to a thousand terabytes, or a million gigabytes). Common examples of digital collections that include textual information are:

  • Snapshots of the World Wide Web, as Internet document formats HTML and XML are primarily text formats, with added tags for multimedia content
  • All documents stored locally on a user's computer
  • Email archives
  • Customer reviews
  • Compilations ...

ISBN: 9781118290279