Because PHP programs often interact with HTML pages, web addresses (URLs), and databases, there are functions to help you work with those types of data. HTML, web page addresses, and database commands are all strings, but they each require different characters to be escaped in different ways. For instance, a space in a web address must be written as %20, while a literal less-than sign (<) in an HTML document must be written as &lt;. PHP has a number of built-in functions to convert to and from these encodings.
Special characters in HTML are represented by entities such as &amp; and &lt;. There are two PHP functions for turning special characters in a string into their entities, one for removing HTML tags, and one for extracting only meta tags.
The htmlentities( ) function changes all characters with HTML entity equivalents into those equivalents (with the exception of the space character). This includes the less-than sign (<), the greater-than sign (>), the ampersand (&), and accented characters.
For example:
$string = htmlentities("Einstürzende Neubauten");
echo $string;
Einst&uuml;rzende Neubauten
The entity-escaped version (&uuml;) correctly displays as ü in the web page. As you can see, the space has not been turned into &nbsp;.
The htmlentities( ) function actually takes up to three arguments:
$output = htmlentities(input [, quote_style [, charset]]);
The charset parameter, if given, identifies the character set. Older versions of PHP default to “ISO-8859-1”; PHP 5.4 and later default to UTF-8. The quote_style parameter controls whether single and double quotes are turned into their entity forms. ENT_COMPAT (the default before PHP 8.1) converts only double quotes, ENT_QUOTES converts both types of quotes, and ENT_NOQUOTES converts neither. Since PHP 8.1 the default is ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401. There is no option to convert only single quotes. For example:
$input = <<<End
"Stop pulling my hair!" Jane's eyes flashed.<p>
End;
$double = htmlentities($input, ENT_COMPAT); // the default before PHP 8.1
// &quot;Stop pulling my hair!&quot; Jane's eyes flashed.&lt;p&gt;
$both = htmlentities($input, ENT_QUOTES);
// &quot;Stop pulling my hair!&quot; Jane&#039;s eyes flashed.&lt;p&gt;
$neither = htmlentities($input, ENT_NOQUOTES);
// "Stop pulling my hair!" Jane's eyes flashed.&lt;p&gt;
The htmlspecialchars( ) function converts the smallest set of entities possible to generate valid HTML. The following entities are converted:
- Ampersands (&) are converted to &amp;
- Double quotes (") are converted to &quot;
- Single quotes (') are converted to &#039; (if ENT_QUOTES is on, as described for htmlentities( ))
- Less-than signs (<) are converted to &lt;
- Greater-than signs (>) are converted to &gt;
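Here is a minimal sketch that runs the same string through both functions to show the difference. It assumes the source file is saved as UTF-8. It passes ENT_QUOTES and 'UTF-8' explicitly so the result does not depend on a particular PHP version's defaults.

```php
<?php
// Assumes this file is saved as UTF-8; flags and charset are explicit so
// the output is the same across PHP versions.
$text = 'Einstürzende & <Neubauten>';

// htmlspecialchars( ) converts only &, ", ', <, and >; the ü is left alone.
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// htmlentities( ) also converts the ü to its named entity.
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $special, "\n";   // Einstürzende &amp; &lt;Neubauten&gt;
echo $entities, "\n";  // Einst&uuml;rzende &amp; &lt;Neubauten&gt;
```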
If you have an application that displays data that a user has entered in a form, you need to run that data through htmlspecialchars( ) before displaying or saving it. If you don’t, and the user enters a string like "angle < 30" or "sturm & drang", the browser will think the special characters are HTML, and you’ll have a garbled page.
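One common way to follow this advice is to route every user-supplied value through a small wrapper before echoing it. The wrapper name h( ) is hypothetical and is not a PHP built-in. The sketch also assumes UTF-8 as the page's character set. Treat it as an illustration, not a complete output-escaping strategy.

```php
<?php
// h() is a hypothetical helper name, not a built-in PHP function.
function h(string $value): string
{
    // ENT_QUOTES also escapes single quotes, which matters inside
    // single-quoted HTML attributes; UTF-8 is assumed as the page charset.
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Stand-ins for values a user might type into a form field.
$inputs = ['angle < 30', 'sturm & drang', "it's \"quoted\""];

// Build list items, escaping each value just before it goes into the HTML.
$html = '';
foreach ($inputs as $input) {
    $html .= '<li>' . h($input) . "</li>\n";
}
echo $html;
```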
Like htmlentities( ), htmlspecialchars( ) can take up to three arguments:
$output = htmlspecialchars(input [, quote_style [, charset]]);
The quote_style and charset arguments have the same meaning that they do for htmlentities( ).
Older versions of PHP have no function specifically for converting back from the entities to the original text, because this is rarely needed. PHP 4.3 added html_entity_decode( ) and PHP 5.1 added htmlspecialchars_decode( ) for the purpose. There is also a relatively simple way to do it yourself: use the get_html_translation_table( ) function to fetch the translation table used by htmlentities( ) or htmlspecialchars( ) in a given quote style. For example, to get the translation table that htmlentities( ) uses, do this:
$table = get_html_translation_table(HTML_ENTITIES);
To get the table for htmlspecialchars( ) in ENT_NOQUOTES mode, use:
$table = get_html_translation_table(HTML_SPECIALCHARS, ENT_NOQUOTES);
A nice trick is to use this translation table, flip it using array_flip( ), and feed it to strtr( ) to apply it to a string, thereby effectively doing the reverse of htmlentities( ):
$str = htmlentities("Einstürzende Neubauten"); // now it is encoded
$table = get_html_translation_table(HTML_ENTITIES);
$rev_trans = array_flip($table);
echo strtr($str,$rev_trans); // back to normal
Einstürzende Neubauten
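The flip-and-strtr trick can be checked against html_entity_decode( ), a built-in that performs the same reversal. The sketch below decodes one string both ways. It assumes UTF-8 source text and passes the same flags and charset to every call so that the translation tables line up.

```php
<?php
// Use the same flags and charset everywhere so the tables match.
$flags = ENT_QUOTES;
$charset = 'UTF-8';

$original = 'Einstürzende Neubauten & "friends" aren\'t <here>';
$encoded = htmlentities($original, $flags, $charset);

// Reverse the encoding with the flipped translation table.
$table = get_html_translation_table(HTML_ENTITIES, $flags, $charset);
$viaTable = strtr($encoded, array_flip($table));

// html_entity_decode( ) does the same job directly.
$viaDecode = html_entity_decode($encoded, $flags, $charset);

echo $viaTable, "\n";
echo $viaDecode, "\n";
```

The two approaches differ in one respect. The flipped table reverses only the entities that appear in it, so a numeric reference such as &#65; passes through strtr( ) untouched. html_entity_decode( ) turns it into A.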
You can, of course, also fetch the translation table, add whatever other translations you want to it, and then do the strtr( ). For example, if you wanted htmlentities( ) to also encode spaces as &nbsp;, you would do:
$original = "Einstürzende Neubauten"; // any string to encode
$table = get_html_translation_table(HTML_ENTITIES);
$table[' '] = '&nbsp;';
$encoded = strtr($original, $table);
The strip_tags( ) function removes HTML tags from a string:
$input = '<p>Howdy, "Cowboy"</p>';
$output = strip_tags($input);
// $output is 'Howdy, "Cowboy"'
The function may take a second argument that specifies a string of tags to leave in the string. List only the opening forms of the tags; the closing forms of the tags listed in the second parameter are also preserved:
$input = 'The <b>bold</b> tags will <i>stay</i><p>';
$output = strip_tags($input, '<b>');
// $output is 'The <b>bold</b> tags will stay'
Attributes in preserved tags are not changed by strip_tags( ). Because attributes such as style and onmouseover can affect the look and behavior of web pages, preserving some tags with strip_tags( ) won’t necessarily remove the potential for abuse.
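You can confirm that attributes survive by allowing one tag and inspecting the result. In this sketch, the onmouseover value doSomething is only a placeholder for script code and does not name a real function.

```php
<?php
// The attribute values are placeholders; doSomething is not a real function.
$input = '<b style="color:red" onmouseover="doSomething">bold</b> and <i>italic</i>';

// Only <b> is allowed: <i> and </i> are removed, but <b> keeps its attributes.
$output = strip_tags($input, '<b>');

echo $output, "\n";
```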
Given the filename or URL of an HTML page, the get_meta_tags( ) function returns an array of the meta tags in that page. The name of the meta tag (keywords, author, description, etc.), converted to lowercase, becomes the key in the array, and the content of the meta tag becomes the corresponding value:
$meta_tags = get_meta_tags('http://www.example.com/');
echo "Web page made by {$meta_tags['author']}";
Web page made by John Doe
The general form of the function is:
$array = get_meta_tags(filename [, use_include_path]);
Pass a true value for use_include_path to let PHP attempt to open the file using the standard include path.
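get_meta_tags( ) reads from a file or URL, not from a string. A self-contained way to try it is to write a small page to a temporary file first. The page content and the temporary-file prefix meta are invented for this sketch. Note that the keys come back lowercased, so name="Author" is read as author.

```php
<?php
// A made-up page, written to a temporary file so no network access is needed.
$html = <<<HTML
<html>
<head>
<meta name="Author" content="John Doe">
<meta name="keywords" content="php, encoding">
<title>Example</title>
</head>
<body>Hello</body>
</html>
HTML;

$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, $html);

// Keys come from the name attribute (lowercased); values from content.
$metaTags = get_meta_tags($file);
unlink($file);

echo "Web page made by {$metaTags['author']}\n";
```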
PHP provides functions to convert to and from URL encoding, which allows you to build and decode URLs. There are actually two types of URL encoding, which differ in how they treat spaces. The first (specified by RFC 1738) treats a space as just another illegal character in a URL and encodes it as %20. The second (implementing the application/x-www-form-urlencoded system) encodes a space as a + and is used in building query strings.
Note that you don’t want to use these functions on a complete URL, like http://www.example.com/hello, as they will escape the colons and slashes to produce http%3A%2F%2Fwww.example.com%2Fhello. Only encode partial URLs (the bit after http://www.example.com/), and add the protocol and domain name later.
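One way to apply this advice is to encode each path segment on its own and then join the segments with literal slashes. The segment values and the www.example.com host in this sketch are placeholders.

```php
<?php
// Placeholder path pieces; only these get encoded, not the scheme or host.
$segments = ['books', 'Programming PHP', 'ch4 & notes'];

// Encode each segment separately so the "/" separators stay literal.
$path = implode('/', array_map('rawurlencode', $segments));

// Add the protocol and domain name afterward.
$url = 'http://www.example.com/' . $path;
echo $url, "\n";

// Encoding the whole URL instead escapes the colon and slashes.
$wrong = rawurlencode('http://www.example.com/hello');
```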
To encode a string according to the URL conventions, use rawurlencode( ):
$output = rawurlencode(input);
This function takes a string and returns a copy with illegal URL characters encoded in the %dd convention (a percent sign followed by two hexadecimal digits).
If you are dynamically generating hypertext references for links in a page, you need to convert them with rawurlencode( ):
$name = "Programming PHP";
$output = rawurlencode($name);
echo "http://localhost/$output";
http://localhost/Programming%20PHP
The rawurldecode( ) function decodes URL-encoded strings:
$encoded = 'Programming%20PHP';
echo rawurldecode($encoded);
Programming PHP
The urlencode( ) and urldecode( ) functions differ from their raw counterparts only in that they encode spaces as plus signs (+) instead of as the sequence %20. This is the format for building query strings and cookie values, but because these values are automatically decoded when they are passed through a form or cookie, you don’t need to use these functions to process the current page’s query string or cookies. The functions are useful for generating query strings:
$base_url = 'http://www.google.com/search?q=';
$query = 'PHP sessions -cookies';
$url = $base_url . urlencode($query);
echo $url;
http://www.google.com/search?q=PHP+sessions+-cookies
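The sketch below applies both encodings to the same query so you can compare them. It also uses http_build_query( ), a built-in that assembles a whole query string from an array and encodes spaces as +, as urlencode( ) does. The search URL is only an example.

```php
<?php
$query = 'PHP sessions -cookies';

$raw  = rawurlencode($query);  // spaces become %20
$form = urlencode($query);     // spaces become +

// http_build_query( ) encodes each key and value, turning spaces into +.
$qs = http_build_query(['q' => $query]);
$url = 'http://www.google.com/search?' . $qs;
echo $url, "\n";

// Decode with the matching function: rawurldecode( ) leaves + alone.
$back = urldecode($form);
$rawBack = rawurldecode($form);
```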
Most database systems require that string literals in your SQL queries be escaped. SQL’s encoding scheme is pretty simple: single quotes, double quotes, NUL-bytes, and backslashes need to be preceded by a backslash. The addslashes( ) function adds these slashes, and the stripslashes( ) function removes them:
$string = <<<The_End
"It's never going to work," she cried, as she hit the backslash (\\) key.
The_End;
$escaped = addslashes($string);
echo $escaped;
\"It\'s never going to work,\" she cried, as she hit the backslash (\\) key.
echo stripslashes($escaped);
"It's never going to work," she cried, as she hit the backslash (\) key.
Some databases escape single quotes with another single quote instead of a backslash. For those databases, older versions of PHP let you enable magic_quotes_sybase in your php.ini file; that setting was removed in PHP 5.4.
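PHP 5.4 removed the magic_quotes_sybase setting, so on current versions doubling-style escaping has to be done some other way. The helper quote_doubled( ) and the authors table below are hypothetical. For real queries, the database driver's own quoting function or prepared statements are the safer choice.

```php
<?php
// quote_doubled() is a hypothetical helper, not a PHP built-in. It mimics
// the old sybase style: each ' becomes '' and the value is wrapped in quotes.
function quote_doubled(string $value): string
{
    return "'" . str_replace("'", "''", $value) . "'";
}

$name = "O'Reilly";

// The authors table is hypothetical; only the string building is shown.
$sql = 'SELECT * FROM authors WHERE publisher = ' . quote_doubled($name);
echo $sql, "\n";

// For comparison, addslashes( ) uses a backslash instead.
$backslashed = addslashes($name);
```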
The addcslashes( ) function escapes arbitrary characters by placing backslashes before them. With the exception of the characters in Table 4-4, characters with ASCII values less than 32 or above 126 are encoded with their octal values (e.g., "\002"). The addcslashes( ) and stripcslashes( ) functions are used with nonstandard database systems that have their own ideas of which characters need to be escaped.
Call addcslashes( ) with two arguments, the string to encode and the characters to escape:
$escaped = addcslashes(string, charset);
Specify a range of characters to escape with the ".." construct:
echo addcslashes("hello\tworld\n", "\x00..\x1fz..\xff");
hello\tworld\n
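A few more cases make the octal rule concrete. This sketch escapes a lone control character that has no C-style name, repeats the tab/newline example, and escapes a pair of ordinary punctuation characters. The input strings are arbitrary.

```php
<?php
// "\x02" has no C-style escape, so it is written as a 3-digit octal value.
$octal = addcslashes("\x02", "\x00..\x1f");

// Tab and newline fall in the range but use their C-style forms.
$cStyle = addcslashes("hello\tworld\n", "\x00..\x1f");

// Any listed character gets a backslash; unlisted ones are left alone.
$bracketed = addcslashes('foo[bar]', '[]');

echo $octal, "\n";      // \002
echo $cStyle, "\n";     // hello\tworld\n
echo $bracketed, "\n";  // foo\[bar\]
```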
Beware of specifying '0', 'a', 'b', 'f', 'n', 'r', 't', or 'v' in the character set, as they will be turned into '\0', '\a', etc. These escapes are recognized by C and PHP and may cause confusion.
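The confusion is easy to reproduce: escape the letter a, then expand the result again. This minimal sketch uses an arbitrary input word.

```php
<?php
// Escaping "a" yields "\a", which stripcslashes( ) later reads as the C
// escape for the BEL control character (ASCII 7), not as the letter a.
$escaped = addcslashes('banana', 'a');
$restored = stripcslashes($escaped);

var_dump($escaped);               // string(9) "b\an\an\a"
var_dump($restored === 'banana'); // bool(false)
```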
stripcslashes( ) takes a string and returns a copy with the escapes expanded:
$string = stripcslashes(escaped);
For example:
$string = stripcslashes('hello\tworld\n'); // $string is "hello\tworld\n"
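Besides the single-letter C escapes, stripcslashes( ) also understands octal (\101) and hexadecimal (\x42) sequences, which is what lets it undo addcslashes( ) output. Here is a small round-trip sketch using an arbitrary input string:

```php
<?php
// Octal \101 is "A", hex \x42 is "B", and \t is a tab.
$decoded = stripcslashes('\101\x42\tD');

// Round trip: escape every control character, then expand the escapes again.
$original = "line one\nline two\x01";
$escaped  = addcslashes($original, "\x00..\x1f");
$restored = stripcslashes($escaped);

echo $escaped, "\n"; // line one\nline two\001
```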