Example: Parsing RTF Files

Rich Text Format (RTF) files have a long and storied history. First introduced in 1987 with Microsoft Word 3, RTF became the default file format of many word processors in the years that followed. It’s still the default format of TextEdit in Mac OS X, for example, almost thirty years after its introduction. Because it supports a wide variety of formatting and is a standard, well-understood, and lightweight file format, it’s still common to find data stored as RTF.

While these files are admittedly less common in the wild than, say, .doc files, their structure is also much simpler. RTF is fundamentally a plain-text format, peppered with the occasional instruction to define some element of the text’s formatting: its color, ...
