XML Hacks by Michael Fitzgerald

Convert Microsoft Office Files, Old or New, to XML

Use OpenOffice as a tool to convert Microsoft Office files to XML.

OpenOffice (http://www.openoffice.org/), the free, open source, multiplatform office application suite that provides an alternative to Microsoft Office, uses a documented XML format as its native file format. Put this together with OpenOffice 1.1’s ability to read Word, Excel, and PowerPoint files from Office 97, 2000, and XP, plus Word 6.0 files, Word 95 files, and Excel 4.0, 5.0, and 95 files, and you’ve got a simple way to convert these files to XML.

When you store a document in OpenOffice’s own file format [Hack #65], you’ll create a ZIP file with the extension .sxw if you saved it with the OpenOffice Writer word processing program, .sxc if you saved it with the OpenOffice Calc spreadsheet program, or .sxi if you used the OpenOffice Impress slideshow program. The six files that you’ll find in these ZIP files have self-explanatory names: mimetype, content.xml, styles.xml, meta.xml, settings.xml, and manifest.xml.
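Because the saved document is an ordinary ZIP file, any ZIP-aware tool can open it. The following is a minimal sketch, not part of the original hack, that uses Python's standard zipfile module to list what is inside a saved document and to pull out content.xml. The function names list_package_members and read_content_xml are made up for illustration, and the sketch assumes content.xml sits at the top level of the archive.

```python
import zipfile


def list_package_members(path):
    """Return the names of the files stored inside an OpenOffice document.

    `path` is assumed to point at a .sxw, .sxc, or .sxi file, which is
    treated here as a plain ZIP archive.
    """
    with zipfile.ZipFile(path) as package:
        return package.namelist()


def read_content_xml(path):
    """Return the raw bytes of content.xml from an OpenOffice document.

    Assumption: content.xml is stored at the top level of the archive.
    zipfile raises KeyError if no member with that name exists.
    """
    with zipfile.ZipFile(path) as package:
        return package.read("content.xml")
```

Calling list_package_members("report.sxw"), where report.sxw is a hypothetical file saved from Writer, should return the member names described above, and read_content_xml("report.sxw") returns the XML as bytes, ready to hand to a parser.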

Unless you’re strongly interested in the inner workings of OpenOffice, the file content.xml should hold the most interest. Along with file content, it stores information about the use of built-in styles, styles you defined yourself, and even on-the-fly styling information not tied to defined styles, such as bolding of text with Ctrl-B. For word-processing files, the XML also identifies bulleted and numbered lists and footnotes. XML versions of spreadsheets ...
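To get a feel for what content.xml holds, it can help to strip away the styling and look only at the text. The sketch below is an illustration under stated assumptions rather than a description of the file format: it assumes paragraphs are stored in elements whose local name is p, and it matches on local names only, so it does not depend on any particular namespace URI. The helper names local_name and paragraph_texts are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on qualified names."""
    return tag.rsplit("}", 1)[-1]


def paragraph_texts(content_xml):
    """Return the text of every element whose local name is 'p'.

    Assumption: paragraph content lives in such elements. Inline children
    (for example, spans carrying on-the-fly styling) are flattened into
    the paragraph's text in document order.
    """
    root = ET.fromstring(content_xml)
    return [
        "".join(element.itertext())
        for element in root.iter()
        if local_name(element.tag) == "p"
    ]
```

Passing the bytes of content.xml to paragraph_texts returns one string per paragraph. A paragraph nested inside another one, as might happen with a footnote body, would show up both within its parent's text and as its own entry, so treat the result as a rough view of the text, not a faithful conversion.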
