HTML Tidy

Dave Raggett's HTML Tidy (http://tidy.sourceforge.net) is a wonderful open source tool for cleaning up HTML pages, including converting them to XHTML. Use it. HTML Tidy is a command line tool written in reasonably portable ANSI C, and binaries are available for most major platforms. To run it, just put the binary somewhere in your path, and set the output-xhtml option to yes to indicate that you want XHTML output instead of HTML. (Tidy's configuration options take a value on the command line, so the option name alone is not enough.) For example, the command below converts the file shows.html to XHTML.

C:\>tidy --output-xhtml yes shows.html

This dumps the converted document onto stdout, from where it can be redirected into a file in the usual way. If you prefer to convert the file in place, use the -m option.

C:\>tidy --output-xhtml ...
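
The two ways of running Tidy described above (redirecting stdout to a new file, or rewriting the file in place with -m) can be sketched as a pair of small shell functions. This is only an illustration under stated assumptions: it uses a POSIX shell rather than the Windows prompt shown above, the function names tidy_to_xhtml and tidy_in_place and the .xhtml naming scheme are hypothetical, and it treats Tidy's "warnings only" exit status as success.

```shell
#!/bin/sh
# Sketch only: the function names and the .xhtml naming scheme are
# assumptions made for illustration; they are not part of HTML Tidy.

# Write an XHTML copy next to the original (shows.html -> shows.xhtml).
# Tidy prints the converted document on stdout, so plain redirection works.
tidy_to_xhtml() {
    src=$1
    dest=${src%.html}.xhtml
    tidy --output-xhtml yes "$src" > "$dest"
    status=$?
    # Tidy exits with 1 when it only issued warnings and 2 on errors,
    # so anything up to 1 is treated as a successful conversion here.
    [ "$status" -le 1 ]
}

# Convert the file in place: -m tells Tidy to modify the input file
# instead of writing the result to stdout.
tidy_in_place() {
    tidy -m --output-xhtml yes "$1"
    [ "$?" -le 1 ]
}
```

With these definitions loaded, tidy_to_xhtml shows.html would leave shows.html untouched and write shows.xhtml beside it, while tidy_in_place shows.html would overwrite the original. Keeping a backup before converting in place is a sensible precaution, since -m discards the old markup.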
