HTML Tidy

Dave Raggett's HTML Tidy (http://tidy.sourceforge.net) is a wonderful open source tool for cleaning up HTML pages, including converting them to XHTML. Use it. HTML Tidy is a command line tool written in reasonably portable ANSI C, and binaries are available for most major platforms. To run it, put the binary somewhere in your path and use the --output-xhtml option to indicate that you want XHTML output instead of HTML. For example, the following command converts the file shows.html to XHTML:

C:/>tidy --output-xhtml shows.html

This writes the converted document to stdout, from which it can be redirected into a file in the usual way. If you prefer to convert the file in place, use the -m option.

C:/>tidy --output-xhtml ...
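
When many files need converting, the stdout redirection can be scripted instead of typed by hand. The following is a minimal sketch in Python, not part of HTML Tidy itself. It assumes a tidy binary is on the path and passes the --output-xhtml option exactly as shown in the text. The function names tidy_command and convert_to_xhtml and the file names are hypothetical, chosen only for illustration.

```python
import subprocess


def tidy_command(source):
    """Build the command line from the text: tidy --output-xhtml <file>."""
    return ["tidy", "--output-xhtml", source]


def convert_to_xhtml(source, target):
    """Run tidy on `source` and redirect its stdout into `target`.

    Both arguments are hypothetical file paths supplied by the caller.
    """
    with open(target, "wb") as out:
        # stdout=out is the scripted equivalent of "> target" in a shell.
        result = subprocess.run(tidy_command(source), stdout=out)
    # Tidy exits with 0 on success, 1 if there were warnings, 2 on errors.
    if result.returncode >= 2:
        raise RuntimeError(f"tidy could not convert {source}")
    return result.returncode
```

A call such as convert_to_xhtml("shows.html", "shows.xhtml") would leave the original file untouched and write the XHTML to the second file. Treating exit status 1 as acceptable is a design choice: Tidy reports warnings for most real-world HTML it manages to repair.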
