Different XML Faces of Office

Microsoft Office has always bundled a set of tools specialized for working with information of particular kinds. The new XML functionality continues that tradition, with each application in the bundle using XML in ways that fit its particular task. Microsoft has also added a new application, InfoPath, to the Enterprise Edition of Microsoft Office, filling a common business need for flexible forms-based interfaces to structured information.

Word: Editing Documents

Word began as a program that let people express their thoughts on paper, and most users tend to think of it as a conveniently editable typewriter. Although Word has added more features over time, like mail merge capabilities and web page editing, it is still squarely focused on documents. While it’s possible to use Word as a calculator or a database, its primary strength has always been the creation of documents.

Microsoft has taken Word’s traditional document-orientation and extended it into the world of document-oriented XML. Word already deals with structured documents through features like styles, footnotes, forms, and comments, and is quite capable of supporting complex layers of variable structure. When asked what they want in an XML document editor, many people cite their experience using Word—and Microsoft has pretty much given that to them.

Word embraces XML on two levels. Without much effort, users can save any Word document as XML, using a vocabulary that reflects Word’s native understanding of the document. Styles, formatting, comments, revision marks, metadata, and everything else that normally goes into a .doc file are preserved. Better still, all this information (except for embedded objects, stored as Base64-encoded strings) is readily accessible, and developers can use any XML tools or even a text editor to explore and process it. Word can open these files as if they were .doc files as well, making it possible for other applications to create XML documents explicitly for consumption by Microsoft Word.
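As a rough illustration of how accessible this information is to ordinary XML tools, the following Python sketch pulls the plain text of each paragraph out of a Word 2003 XML document. The sample document is hand-written and heavily trimmed (a real saved file carries far more markup), and the `paragraphs` helper is a hypothetical name introduced here, not part of any Microsoft tool.

```python
import xml.etree.ElementTree as ET

# Namespace of Word 2003's own XML vocabulary.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"

# Hand-written stand-in for a document saved as XML from Word.
# Assumption: a real file has many more elements (styles, properties, etc.).
SAMPLE = f"""<w:wordDocument xmlns:w="{W_NS}">
  <w:body>
    <w:p>
      <w:r><w:t>Hello, </w:t></w:r>
      <w:r><w:rPr><w:b/></w:rPr><w:t>Word</w:t></w:r>
    </w:p>
    <w:p><w:r><w:t>Second paragraph.</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""


def paragraphs(xml_text):
    """Return the text of each w:p paragraph, ignoring all formatting."""
    ns = {"w": W_NS}
    root = ET.fromstring(xml_text)
    result = []
    for p in root.iterfind(".//w:p", ns):
        # A paragraph's text is spread across runs (w:r), each holding w:t.
        result.append("".join(t.text or "" for t in p.iterfind(".//w:t", ns)))
    return result


print(paragraphs(SAMPLE))  # ['Hello, Word', 'Second paragraph.']
```

The same approach works in the other direction: a program that writes out elements in this vocabulary can produce a file that Word opens directly.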

Word takes these features to the next level by allowing developers to create their own XML vocabularies and edit those documents using Word, as shown in Figure 1-1. This takes more effort as well as an understanding of XML, XSLT, and XSD, but that understanding is only necessary to create the templates, not to use them. Once the templates are created, users can simply edit XML within the ordinary confines of Word. They can even tell Word to show them the same information with a different set of presentation choices, making it easy to reuse information, or to edit documents in a form convenient for editing and present them more formally later.

Figure 1-1. Editing an XML document in Microsoft Word 2003

Although Word is a newcomer to XML, Microsoft has driven XML foundations deep into the program. Simply exposing Word information as XML is a sizable step, but Word has aimed higher with its approach to letting users edit the XML of their choice in Word rather than the XML of Microsoft’s choice. This should make it much easier to use Word as an interface to a much wider variety of XML-based systems, from Web Services to content management and workflow.

Excel: Analyzing Information

The spreadsheet was a wild new concept when VisiCalc first appeared back in 1979, and spreadsheets are still a fascinating hybrid of data storage and data processing. Excel has grown over the years from a basic calculating tool to a powerful set of features for analyzing and presenting largely numerical data. While many Excel spreadsheets quietly process data on their creators’ computers, others have evolved into programs in their own right, providing an interface to problem-solving tools that people beyond their creators can use.

Excel has had its own XML format since Excel XP. While this format doesn’t include quite everything (Visual Basic for Applications code and charts are both left out), it includes enough information that applications can mine Excel spreadsheets and extract their contents. A common complaint about spreadsheets (especially among database purists) is that information goes in but doesn’t come out. Microsoft’s XML Spreadsheet format is relatively easy to interpret and provides a foundation for exchanging information between Excel and other applications.
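To give a sense of how easy the XML Spreadsheet format is to interpret, here is a minimal Python sketch that reads cell values out of a small, hand-written workbook. The `read_rows` helper is a hypothetical name, and the sketch makes simplifying assumptions: it ignores sparse cells positioned with `ss:Index`, formulas, styles, and every data type other than numbers and strings.

```python
import xml.etree.ElementTree as ET

# Namespace of the XML Spreadsheet vocabulary.
SS_NS = "urn:schemas-microsoft-com:office:spreadsheet"

# Hand-written, trimmed example; a file saved by Excel contains much more.
SAMPLE = f"""<Workbook xmlns="{SS_NS}" xmlns:ss="{SS_NS}">
  <Worksheet ss:Name="Sales">
    <Table>
      <Row>
        <Cell><Data ss:Type="String">Region</Data></Cell>
        <Cell><Data ss:Type="String">Total</Data></Cell>
      </Row>
      <Row>
        <Cell><Data ss:Type="String">East</Data></Cell>
        <Cell><Data ss:Type="Number">1200.5</Data></Cell>
      </Row>
    </Table>
  </Worksheet>
</Workbook>"""


def read_rows(xml_text):
    """Return each Row as a list of values, converting Number cells to float."""
    ns = {"ss": SS_NS}
    type_attr = f"{{{SS_NS}}}Type"  # the ss:Type attribute in Clark notation
    rows = []
    for row in ET.fromstring(xml_text).iterfind(".//ss:Row", ns):
        values = []
        for data in row.iterfind("ss:Cell/ss:Data", ns):
            text = data.text or ""
            values.append(float(text) if data.get(type_attr) == "Number" else text)
        rows.append(values)
    return rows


print(read_rows(SAMPLE))  # [['Region', 'Total'], ['East', 1200.5]]
```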

Excel 2003 goes beyond having an XML format. While it’s certainly possible for other applications to create XML Spreadsheet files containing their information, it’s generally more convenient to be able to open whatever XML files are already available (even without a schema) and analyze them within Excel, as shown in Figure 1-2. This makes it possible to create a spreadsheet that can analyze any given XML document—say, monthly sales data—and keep using that same spreadsheet on new data when it appears.

Figure 1-2. Working with XML data mapped into Microsoft Excel 2003

The mapping features included in Excel make it much easier to create reusable spreadsheets, and simplify the task of creating Excel-based applications for analyzing data. They also make it much easier to separate the raw data from the Excel spreadsheet, letting the spreadsheet stay up to date even when the data it first analyzed isn’t. To some extent this is like connecting Excel to a database, but it’s a good deal more flexible. If your document structures are simple enough, you can also use Excel as a simple XML editor.
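The benefit of keeping the analysis separate from the data can be sketched outside of Excel as well. In the Python example below, one small function plays the role of the reusable spreadsheet, and each month’s XML file is simply fed through it. The `sales`, `sale`, `region`, and `amount` element names are invented for illustration; they do not come from Excel or from any real schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def totals_by_region(xml_text):
    """Sum the amount of every <sale>, grouped by its <region>."""
    totals = defaultdict(float)
    for sale in ET.fromstring(xml_text).iterfind("sale"):
        totals[sale.findtext("region")] += float(sale.findtext("amount"))
    return dict(totals)


# Two hypothetical monthly data files with the same structure.
JANUARY = """<sales month="2003-01">
  <sale><region>East</region><amount>100.0</amount></sale>
  <sale><region>West</region><amount>250.0</amount></sale>
  <sale><region>East</region><amount>50.0</amount></sale>
</sales>"""

FEBRUARY = """<sales month="2003-02">
  <sale><region>West</region><amount>75.0</amount></sale>
</sales>"""

# The same analysis runs unchanged on each new file.
print(totals_by_region(JANUARY))   # {'East': 150.0, 'West': 250.0}
print(totals_by_region(FEBRUARY))  # {'West': 75.0}
```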

Access: Sharing Data

Access remains a relational database for the desktop, providing convenient local storage of structured information as well as an interface for information on both local and remote databases. Of all the products in the Office suite, Access is the strictest in demanding that information conform to predefined rules, using those structures as a foundation for all the other work it performs.

Like Excel, Access has had some XML support in earlier versions, supporting an XML vocabulary for importing and exporting information. Access 2003 substantially upgrades that XML support, however. New features include support for XML data that is stored across multiple tables, integrated XSLT transformations when importing or exporting information, and greater standards-compliance for both XSLT and XSD. You can see Access’ XML export functionality in Figure 1-3. These features are also now more accessible from applications built using Access.

Figure 1-3. Exporting XML in Microsoft Access 2003

Because Access is built on a relational database foundation, it doesn’t really make sense to drive XML into its core. It’s possible to recreate tables in XML, but that loses the random access and indexing features that make relational databases so good at quickly processing structured information. Storing XML documents inside of relational databases is also possible, but again, the costs are high. Communicating with the outside world using XML seems to provide the best balance between connecting Access to other programs and letting Access do what it does best.
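The balance described here, relational storage inside and XML at the boundary, can be sketched in a few lines of Python. This example uses SQLite (from Python’s standard library) purely as a stand-in for a desktop relational database; it does not use Access, and the `export` wrapper element and the `export_table` helper are hypothetical rather than a reproduction of the vocabulary Access itself writes.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_table(conn, table):
    """Serialize every row of a table as XML: one element per row and column.

    Assumes the table and column names are trusted and are valid XML names.
    """
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    root = ET.Element("export")  # hypothetical wrapper element
    for row in cur:
        record = ET.SubElement(root, table)
        for name, value in zip(columns, row):
            ET.SubElement(record, name).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


# The data lives in ordinary indexed tables...
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (ID INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO Customers (Name) VALUES (?)", [("Ada",), ("Grace",)])

# ...and becomes XML only when it leaves the database.
xml_out = export_table(conn, "Customers")
print(xml_out)
```

Queries, joins, and indexes continue to operate on the tables; XML is produced only for exchange with other programs.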

InfoPath: Editing Structured Information

InfoPath is a new addition to Microsoft Office. It comes only in the Enterprise Edition of Office, though it is also available for purchase as a standalone product. Unlike the other Office applications, which are largely self-sufficient, InfoPath is designed to connect users to other services and other users, and was built for the explicit purpose of working with XML. InfoPath provides both an environment for creating forms-based interfaces to structured information (stored in XML, naturally) and a framework for connecting that information to web, web service, and email applications. InfoPath can serve as a frontend to Microsoft’s SharePoint Server, but it can also connect to other applications that can process XML.

InfoPath fills a gap between the document-oriented vision of Word and the data-oriented approaches of Excel and Access. A lot of information is too loosely structured to fit easily in a spreadsheet grid or a database table, but not nearly as open-ended as the documents Word makes possible. At the same time, InfoPath provides a more capable set of tools than traditional browser-based HTML forms have provided, and ties that information more tightly to workflow processes.

InfoPath builds on the same core of XML specifications as the other members of the Office suite: XML, XSLT, and XSD. InfoPath provides a set of tools for creating forms based on the possibilities defined in an XSD schema, letting you drag and drop components and customize them to meet your form-creation needs. An example of form-creation is shown in Figure 1-4. The same information can be presented in multiple views, making it possible, for example, for a customer to fill out a form with the information they know, and have other steps in the process add more information. There’s no need for retyping or for mysterious “Office Use Only” sections on forms in this model.

Figure 1-4. Designing a form in Microsoft InfoPath

InfoPath also takes advantage of XML to add some features that reflect how people typically work. Forms that collect a lot of information can take a while to fill out, and people frequently start and stop to rest, collect information, or switch to other tasks completely. Because InfoPath stores its information as XML, it’s easy to stop the process, save the results, and come back to them later. This also makes it possible, for instance, to send a partially filled-out form to someone else and ask for help. Even if that other person doesn’t have InfoPath, they may be able to open the file or apply an XSLT transformation to view the information inside of it.
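Because a saved form is just XML, someone without InfoPath can inspect it with general-purpose tools. The Python sketch below lists the fields that are still empty in a partially filled-out form. The `expenseReport` vocabulary and the `missing_fields` helper are invented for illustration; a real InfoPath form would use the element names and namespaces defined by its own schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free stand-in for a half-completed form.
FORM = """<expenseReport>
  <employee>Pat Smith</employee>
  <department></department>
  <amount>42.50</amount>
  <approver/>
</expenseReport>"""


def missing_fields(xml_text):
    """Return the names of top-level fields that contain no text yet."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root if not (child.text or "").strip()]


print(missing_fields(FORM))  # ['department', 'approver']
```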

Other Members of the Office Family

While the XML features in Word, Excel, Access, and InfoPath are especially interesting (and receive the bulk of coverage in this book), most of the other members of Microsoft’s Office family of products have an XML story of some sort.

Two members of the Microsoft Office family, PowerPoint and Outlook, are notable for not having an XML story. PowerPoint’s developers have continued work on its HTML features, but XML support has been left for later versions. Some developers use their own XML and XSLT to create HTML presentations, but this isn’t exactly common practice. Outlook is in a similar position, with new features but none of them XML-related. Future editions of this book may get to explore PowerPoint and Outlook XML, but for now there is no such thing.

Microsoft FrontPage, traditionally a GUI editor for web pages, is growing into a slightly more general tool for creating XSLT stylesheets that can then be easily used to create templates. The XSLT tools in FrontPage remain oriented toward web development and not to general XSLT work, but they may prove very useful for developers who want to create XML documents in Word and present them differently on the Web without users having to lift a finger.

Microsoft Visio has had its own XML format since Visio 2002, but the latest release adds support for Scalable Vector Graphics (SVG), a W3C standard for describing graphics in XML. Visio can import SVG documents and work with them much like regular Visio documents, adding its own markup where it needs to go beyond the capabilities of SVG but preserving the original SVG. Developers who need to exchange diagrams or put them on the Web for readers who don’t themselves have Visio should find these features very useful.

Tip

For an example of working with Visio’s XML format, see Recipe 11.1 of Sal Mangano’s XSLT Cookbook (O’Reilly). For more on SVG generally, see J. David Eisenberg’s SVG Essentials (O’Reilly).
