Chapter 4. Converting Flat Files to XML

Relatively little of the world's data is currently stored in XML. Much of it is stored in flat files as tab-delimited text, comma-separated values, or some similar format. More is locked up in databases of one kind or another, whether relational, hierarchical, or object-based. Even more is hidden inside unstructured documents, including Microsoft Word files, HTML documents, and plain text. XML tools cannot work with any of this data directly; it must first be converted to XML.

There are no magic bullets that will convert all of your data to semantically tagged XML. There are a few specialized programs that convert certain formats such as Word documents to particular XML applications such as XHTML. However, the output from even the best ...

