Tidy

The Tidy extension “cleans up” messy HTML and XML files into valid and pretty-looking documents. This feature is particularly useful when you’re serving lots of externally generated content.

For example, suppose you want to allow visitors to post HTML-enabled messages, but you don’t want those messages to produce an invalid page. Checking each post by hand is laborious, but with Tidy you can automate the process.

Alternatively, Tidy can be used to reformat documents, either to reduce their file size or to make them easily understandable by humans. The first option saves you bandwidth, making your pages arrive more quickly and reducing your overall hosting costs. The second option simplifies your debugging process, as you’re not tracking down stray closing tags.
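As a rough illustration of these two goals, the sketch below passes a configuration array as the second argument to tidy_parse_string( ). The option names (indent, wrap, hide-comments) come from the underlying Tidy library. The sample markup, the variable names, and the specific values are assumptions chosen for illustration, not recommendations.

```php
<?php
// Assumes the Tidy extension is enabled. The markup is illustrative only.
$html = '<html><head><title>Demo</title></head><body>'
      . '<!-- note --><ul><li>One<li>Two</ul></body></html>';

// Human-readable output: indent nested elements and wrap long lines.
$readable = tidy_parse_string($html, array('indent' => true, 'wrap' => 80));
tidy_clean_repair($readable);
$readableOut = tidy_get_output($readable);

// Smaller output: no indentation, no line wrapping, comments stripped.
$compact = tidy_parse_string($html, array(
    'indent'        => false,
    'wrap'          => 0,
    'hide-comments' => true,
));
tidy_clean_repair($compact);
$compactOut = tidy_get_output($compact);

echo $readableOut, "\n";
echo $compactOut, "\n";
```

How much the compact form actually saves depends on the input, so it is worth measuring on your own pages before relying on it.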

The Tidy extension is bundled with PHP but is not enabled by default, because it depends on the external Tidy library. Download the library from http://tidy.sourceforge.net/ and add --with-tidy=DIR to your PHP configure line to turn on Tidy support, where DIR is the directory in which the library is installed.

Basics

Interacting with Tidy is a simple three-step process: parse the file, clean its contents, and then print or save the repaired file.
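Taken together, the three steps might look like the following. This is a minimal sketch that assumes the Tidy extension is loaded. The input string and the variable names are made up for illustration, and tidy_get_output( ) is used here as one way to retrieve the repaired markup as a string.

```php
<?php
// Assumes PHP was built with Tidy support (see --with-tidy above).

// Step 1: parse. This illustrative markup lacks a closing </i> tag.
$tidy = tidy_parse_string('I am <b>bold and I am <i>bold and italic</b>');

// Step 2: clean and repair the parsed document in place.
tidy_clean_repair($tidy);

// Step 3: print the repaired document, or save the string to a file instead.
$html = tidy_get_output($tidy);
echo $html;
```

With no configuration supplied, Tidy wraps a fragment like this in a complete HTML document, so expect more in the output than the repaired tags alone.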

Use tidy_parse_file( ) to read in a file for tidying:

$tidy = tidy_parse_file('index.html');

When your data is in a string, use tidy_parse_string( ) instead:

// This string is missing a closing </i> tag
$tidy = tidy_parse_string('I am <b>bold and I am <i>bold and italic</b>');

Transform the document using the tidy_clean_repair( ) function:

$tidy = ...
