Skip to Main Content
Regular Expressions Cookbook, 2nd Edition
book

Regular Expressions Cookbook, 2nd Edition

by Jan Goyvaerts, Steven Levithan
August 2012
Intermediate to advanced content levelIntermediate to advanced
609 pages
19h 16m
English
O'Reilly Media, Inc.
Content preview from Regular Expressions Cookbook, 2nd Edition

9.5. Convert Plain Text to HTML by Adding <p> and <br> Tags

Problem

Given a plain text string, such as a multiline value submitted via a form, you want to convert it to an HTML fragment to display within a web page. Paragraphs, separated by two line breaks in a row, should be surrounded with <p></p>. Additional line breaks should be replaced with <br> tags.

Solution

This problem can be solved in four simple steps. In most programming languages, only the middle two steps benefit from regular expressions.

Step 1: Replace HTML special characters with named character references

As we’re converting plain text to HTML, the first step is to convert the three special HTML characters &, <, and > to named character references (see Table 9-3). Otherwise, the resulting markup could lead to unintended results when displayed in a web browser.

Table 9-3. HTML special character substitutions

Search for

Replace with

&

«&amp;»

<

«&lt;»

>

«&gt;»

Ampersands (&) must be replaced first, since you’ll be adding more ampersands to the subject string as part of the named character references.

Step 2: Replace all line breaks with <br>

Search for:

\r\n?|\n
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
\R
Regex options: None
Regex flavors: PCRE 7, Perl 5.10

Replace with:

<br>
Replacement text flavors: .NET, Java, JavaScript, Perl, PHP, Python, Ruby

Step 3: Replace double <br> tags with </p><p>

Search for:

<br>\s*<br>
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Mastering Regular Expressions, 3rd Edition

Mastering Regular Expressions, 3rd Edition

Jeffrey E.F. Friedl
Regular Expressions Cookbook

Regular Expressions Cookbook

Jan Goyvaerts, Steven Levithan

Publisher Resources

ISBN: 9781449327453Supplemental ContentErrata Page