11.11. Removing HTML and PHP Tags
Problem
You want to remove HTML and PHP tags from a string or file.
Solution
Use strip_tags( ) to remove HTML and PHP tags from a string:
$html = '<a href="http://www.oreilly.com">I <b>love computer books.</b></a>';
print strip_tags($html);
I love computer books.
Use fgetss( ) to remove them from a file as you read in lines (note that fgetss( ) is deprecated as of PHP 7.3 and was removed in PHP 8.0):
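// $php_errormsg is only set when track_errors is enabled
$fh = fopen('test.html','r') or die($php_errormsg);
// Compare against false so that a line such as "0" does not end the loop early
while (false !== ($s = fgetss($fh,1024))) {
    print $s;
}
fclose($fh) or die($php_errormsg);
Discussion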
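Where fgetss( ) is unavailable (it was removed in PHP 8.0), the loop in the Solution can be approximated by reading each line with fgets( ) and passing it to strip_tags( ). The following is a minimal sketch under that assumption, not a drop-in replacement: strip_tags_by_line( ) is a hypothetical helper name, and the temporary file merely stands in for test.html. Because each line is stripped on its own, this approach can still mishandle tags that span lines.

```php
<?php
// strip_tags_by_line() is a hypothetical helper, not a PHP built-in.
// It reads a file line by line and returns the text with tags removed.
function strip_tags_by_line(string $path): string
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $out = '';
    // Compare against false so a line such as "0" does not end the loop early.
    while (($line = fgets($fh)) !== false) {
        $out .= strip_tags($line);
    }
    fclose($fh);
    return $out;
}

// Demonstration: a temporary file stands in for test.html.
$tmp = tempnam(sys_get_temp_dir(), 'tags');
file_put_contents($tmp, "<p>I <b>love</b>\ncomputer books.</p>\n");
print strip_tags_by_line($tmp);
unlink($tmp);
```

Returning the stripped text instead of printing it inside the loop keeps the helper easy to reuse; the caller decides what to do with the result.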
fgetss( ) is convenient when you need to strip tags from a file as you read it in, but it can be confused by tags that span lines or that straddle the boundary of the buffer that fgetss( ) reads from the file. At the price of increased memory usage, reading the entire file into a string provides better results:
$no_tags = strip_tags(join('',file('test.html')));
Both strip_tags( ) and fgetss( ) can be told not to remove certain tags by specifying those tags as the last argument. The tag specification is case-insensitive, and for pairs of tags, you only have to specify the opening tag. For example, this removes all but <b></b> tags from $html:
$html = '<a href="http://www.oreilly.com">I <b>love</b> computer books.</a>';
print strip_tags($html,'<b>');
I <b>love</b> computer books.
See Also
Documentation on strip_tags( ) at http://www.php.net/strip-tags and fgetss( ) at http://www.php.net/fgetss.
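As a small illustration of two points from the Discussion, the sketch below applies strip_tags( ) to a whole string in which a tag spans two lines, and then shows that the allowed-tag list is matched case-insensitively. The sample markup is made up for the example.

```php
<?php
// A tag that spans two lines, as it might appear in a file such as test.html.
$html = "<a\n  href=\"http://www.oreilly.com\">I <B>love</B> computer books.</a>\n";

// Stripping the whole string at once lets strip_tags() see the complete tag.
print strip_tags($html);

// '<b>' in the allowed list also preserves the uppercase <B>...</B> pair.
print strip_tags($html, '<b>');
```

The first call should print the plain sentence, and the second should print it with the <B></B> pair left in place.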