You need to convert characters in text stored in or retrieved from a database to their proper HTML entities.
Use PHP's built-in string formatting functions, such as htmlentities( ) and str_replace( ), to build your own on-the-fly reformatting function:
function processText( $text ) {
    // Restore the angle brackets that htmlentities( ) encoded
    $text = str_replace("&gt;",">",$text);
    $text = str_replace("&lt;","<",$text);
    // Turn paragraph breaks into paragraph tags, longest pattern first
    $text = str_replace("\r\n\r\n"," </p>\n<p> ",$text);
    $text = str_replace("\r\n "," </p>\n<p> ",$text);
    $text = str_replace("\n\n"," </p>\n<p> ",$text);
    $text = str_replace("\n "," </p>\n<p> ",$text);
    return $text;
}
The articles and other content displayed on a dynamic or template-driven web site are stored in database tables on the web server in which each bit of a page—headline, subhead, byline, and main text—likely has its own field, or slot, in an individual article record. The logic of the template file, written in PHP or another server-side scripting language, then retrieves a specific article based on a browser request and formats the contents of the article record as an HTML web page. A template may be designed to display just a handful of different articles, or thousands of different entries.
PHP has some built-in tools for handling the special requirements of text that moves from a database to a web page and back again. Combining these tools into one master function that meets the specific needs of your database-driven site ensures that all the content on your site gets formatted the same way. When you need to make a change, editing this single function does the trick.
addslashes( ) and stripslashes( ) are two built-in PHP functions that escape and unescape single-quote, double-quote, and backslash characters in text strings inserted into and retrieved from a database. addslashes( ) prevents those characters from being misinterpreted as string delimiters in a query by adding a backslash before each one. For example, addslashes( ) changes "St. Patrick's Day" into "St. Patrick\'s Day", while stripslashes( ) reverses the process.
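The round trip described above can be sketched in a few lines. This is a minimal illustration only; the variable names are made up for the example and no database is involved.

```php
<?php
// Hypothetical sample value; any string containing quotes would do.
$title = "St. Patrick's Day";

// addslashes() puts a backslash before the single quote.
$escaped = addslashes($title);

// stripslashes() removes it again, giving back the original string.
$restored = stripslashes($escaped);

echo $escaped, "\n";   // St. Patrick\'s Day
echo $restored, "\n";  // St. Patrick's Day
```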
Tip
If your PHP installation has magic_quotes_gpc enabled, then you should not use either of these functions, because PHP will do the escaping for you. (This setting exists only in older installations: it was deprecated in PHP 5.3 and removed in PHP 5.4.) You can easily check this and other PHP configuration settings by uploading a file to your web site called test.php containing this one line:
<?php phpinfo( ) ?>
Then request the file through your web browser (http://domain.com/test.php); the status of magic_quotes_gpc should be listed as "On" or "Off."
Beyond that convenience, PHP makes no assumptions about how the text coming and going from your database should be formatted. But it provides a handful of built-in functions that you can use to do it yourself, such as nl2br( ), which inserts an HTML line break tag (<br>) before each new line character (\n).
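As a quick illustration of nl2br( ), the sketch below runs it on a made-up two-line string. Note that, by default, the function emits the XHTML-style tag and leaves the original new line character in place after it.

```php
<?php
// Hypothetical two-line input.
$text = "First line\nSecond line";

// nl2br() inserts a line break tag before each new line character.
$html = nl2br($text);

echo $html;
// First line<br />
// Second line
```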
But using <br> tags to separate blocks of text is an outdated technique. The <br> tag is still valid in the W3C's HTML DTDs, but it is intended for line breaks within a block of text, not for marking where paragraphs begin and end (see Recipe 4.1). Instead, you should use block element markup (typically the paragraph tags <p> and </p>) along with a stylesheet to define styles for the text blocks between them. If you've got the rest of your site formatted this way (and you should), then text converted with nl2br( ) may not get formatted correctly.
PHP's all-purpose find-and-replace function, str_replace( ), offers a way to wrap text blocks in paragraph tags:
$text = str_replace("\n\n"," </p>\n <p> ",$text);
Here, the str_replace( ) function replaces every pair of consecutive new line characters with closing and opening paragraph tags, and retains one new line character to make the resulting code more readable. So this:
The quick brown fox jumped over the lazy dogs\n\n
Now is the time for all good men and women to come to the aid of their country.
Becomes this:
The quick brown fox jumped over the lazy dogs</p>
<p>
Now is the time for all good men and women to come to the aid of their country.
When you print the text block in the PHP template, enclose it in paragraph tags, since the first and last paragraphs of the text block likely were not preceded or followed by new lines:
echo "<p>".$text."</p>";
So, the result would be:
<p>
The quick brown fox jumped over the lazy dogs</p>
<p>
Now is the time for all good men and women to come to the aid of their country.</p>
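Putting the replacement and the wrapping echo together, the whole sequence might look like the following sketch. The sample text is the same two-paragraph example used above; the $html variable is introduced here only for illustration.

```php
<?php
// Two paragraphs separated by a blank line (two new line characters).
$text = "The quick brown fox jumped over the lazy dogs\n\n"
      . "Now is the time for all good men and women to come to the aid of their country.";

// Replace each paragraph break with closing and opening paragraph tags.
$text = str_replace("\n\n", " </p>\n <p> ", $text);

// Wrap the whole block so the first and last paragraphs are tagged too.
$html = "<p>" . $text . "</p>";

echo $html;
```

The output contains two complete paragraph elements, with one new line character left between them for readability.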
The str_replace( ) function looks for an exact match, so you may need to include some alternate searches, depending on how text is stored in your database. Paragraphs may be separated by just one new line character, or by as many as two new line characters and two return characters (\r), or more. Here's a function that handles a few likely scenarios. When combining searches, always start with the most complex pattern and work toward the simplest to avoid double replacements. First, add this function to your PHP template:
function processText( $text ) {
    $text = str_replace("\r\n\r\n"," </p>\n<p> ",$text);
    $text = str_replace("\r\n "," </p>\n<p> ",$text);
    $text = str_replace("\n\n"," </p>\n<p> ",$text);
    $text = str_replace("\n "," </p>\n<p> ",$text);
    return $text;
}
Then apply the function to your text:
$text = processText($text);
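If the list of exact-match searches grows unwieldy, one alternative is to normalize the line endings first and then split on blank lines in a single pass. The sketch below is not the recipe's function: paragraphsFromText( ) is a hypothetical name, and it assumes that paragraphs are separated by at least one blank line, so a single new line inside a paragraph is left alone.

```php
<?php
// Hypothetical alternative to processText(): normalize, then split.
function paragraphsFromText($text) {
    // Convert Windows (\r\n) and lone return (\r) line endings to \n.
    $text = str_replace(array("\r\n", "\r"), "\n", $text);

    // Split wherever two or more new line characters appear in a row,
    // discarding any empty pieces.
    $blocks = preg_split('/\n{2,}/', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    // Wrap each block in its own paragraph tags.
    $html = array();
    foreach ($blocks as $block) {
        $html[] = "<p>" . trim($block) . "</p>";
    }
    return implode("\n", $html);
}

$result = paragraphsFromText("First paragraph.\r\n\r\nSecond paragraph.\n\n\nThird.");
echo $result;
```

Because every paragraph is wrapped inside the function, the output of this variant should be printed as-is, without the extra enclosing paragraph tags.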
PHP also has a couple of functions for converting special characters to their HTML entities. For more about entities, see Recipe 4.2.
One, htmlspecialchars( ), converts only ampersands, greater than (>) signs, less than (<) signs, and double quotes by default; it also converts single quotes when you pass the ENT_QUOTES flag (which is the default as of PHP 8.1). The other, htmlentities( ), converts any character for which there is an HTML entity, including the ones that htmlspecialchars( ) converts.
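The difference between the two functions shows up as soon as the text contains an accented character. The following sketch uses a made-up sample string and passes the flags and character set explicitly, so the result does not depend on the defaults of a particular PHP version.

```php
<?php
// Hypothetical sample: "Café & Bar <open>", with the é written as
// explicit UTF-8 bytes so the file's own encoding does not matter.
$text = "Caf\xC3\xA9 & Bar <open>";

// Converts only the characters that are special in HTML markup.
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// Also converts the accented letter, because it has a named entity.
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $special, "\n";   // Café &amp; Bar &lt;open&gt;
echo $entities, "\n";  // Caf&eacute; &amp; Bar &lt;open&gt;
```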
Tip
Support for characters outside the Latin 1 (ISO-8859-1) repertoire varies depending on the character set installed on your web server and your installed version of PHP.
If you or your web site's visitors submit content to a database that is then displayed on the site, you can create a function to format and encode that text before it is saved in the database.
This function includes the addslashes( ) function to demonstrate how two built-in functions can be combined into a custom function:
function processInsert( $text ) {
    $text = addslashes($text);
    $text = htmlentities($text);
    return $text;
}
If the content submitted to the database includes inline HTML tags, such as <em>Important</em> for italics, htmlentities( ) will change it to &lt;em&gt;Important&lt;/em&gt;. And that will be rendered on the page as:
<em>Important</em>
The tags are showing, but without the emphasis the author intended. You can modify the text-processing function I described above to undo a little of what htmlentities( ) has done so the tags work as markup again. Two calls to str_replace( ) restore the greater than and less than signs to the tags:
$text = str_replace("&gt;",">",$text);
$text = str_replace("&lt;","<",$text);
The complete function now looks like this:
function processText( $text ) {
    // Restore the angle brackets that htmlentities( ) encoded
    $text = str_replace("&gt;",">",$text);
    $text = str_replace("&lt;","<",$text);
    // Turn paragraph breaks into paragraph tags, longest pattern first
    $text = str_replace("\r\n\r\n"," </p>\n<p> ",$text);
    $text = str_replace("\r\n "," </p>\n<p> ",$text);
    $text = str_replace("\n\n"," </p>\n<p> ",$text);
    $text = str_replace("\n "," </p>\n<p> ",$text);
    return $text;
}
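Restoring every encoded angle bracket brings back all submitted tags, not just the harmless inline ones. If visitors can submit content, a narrower approach is worth considering: restore only a short list of tags and leave everything else encoded. The sketch below is one possible way to do that; restoreInlineTags( ) and its default tag list are hypothetical and not part of the recipe's function.

```php
<?php
// Hypothetical variant: restore only selected inline tags
// instead of every encoded angle bracket.
function restoreInlineTags($text, $allowed = array('em', 'strong')) {
    foreach ($allowed as $tag) {
        // Restore the opening and closing forms of each allowed tag.
        $text = str_replace("&lt;$tag&gt;", "<$tag>", $text);
        $text = str_replace("&lt;/$tag&gt;", "</$tag>", $text);
    }
    return $text;
}

// Simulate stored content that was encoded with htmlentities().
$stored = htmlentities('<em>Important</em> <script>alert(1)</script>', ENT_QUOTES, 'UTF-8');

$output = restoreInlineTags($stored);
echo $output;
// <em>Important</em> &lt;script&gt;alert(1)&lt;/script&gt;
```

Tags with attributes are not matched by this simple search, so they stay encoded as well.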
The online PHP Manual has detailed information on all the built-in functions described in this Recipe:
addslashes (http://us2.php.net/manual/en/function.addslashes.php)
stripslashes (http://us2.php.net/manual/en/function.stripslashes.php)
htmlspecialchars (http://us2.php.net/manual/en/function.htmlspecialchars.php)
htmlentities (http://us2.php.net/manual/en/function.htmlentities.php)