Opening Office to the World

While the .doc and .xls file formats have served as de facto standard file formats for years, and developers have created a variety of tools for getting information into and out of these formats, writing code that could produce or consume them has never been much fun. Technologies like mail merges and ODBC connections have made it possible to connect the Office applications to other tools, but this is the first time that Microsoft has taken large steps to make Office data accessible through means other than the Office products themselves, and simultaneously has made the applications much more agnostic about where their information comes from.

By freeing users from their applications’ traditional perspectives on information sources, Microsoft has created a whole new range of possibilities for using its applications as interactive browsers. Users who have been frustrated by the limited interaction capabilities of web browsers can now access their data, and edit it, in familiar applications supporting many different styles of information manipulation. For the most part, the applications continue to prefer working with local documents and can read documents from the Web, but they have taken a big step toward integration with Web- and XML-based infrastructure.

While the details of each application make a big difference in how the integration works, details which will be covered in later chapters, it’s worth examining some potential use cases for the new technology before proceeding into those details.

Generating Word and Excel Documents from Databases

While much of the information that is currently managed by Microsoft Office users is created in Office and manipulated primarily through Office, there is plenty of other information out there. There are also a lot of reasons why organizations may want to keep even their document-like information in more conveniently managed and reused database management systems. While Office has long had pieces for connecting to these systems to extract information, dumping a relational database table into a Word or Excel file has required non-trivial programming. The new XML capabilities open up new possibilities for this kind of work.

The key to this project lies in Microsoft’s creation of application-specific XML formats for Word and Excel. Word’s WordprocessingML and Excel’s SpreadsheetML are formats that these applications can open and interact with just as if they were .doc or .xls files. (Some restrictions apply, especially for Excel, but enough is available to make this technique useful.)

Developers can create XML documents from databases much the same way that they have created HTML documents from databases for the past decade. Technologies like ASP, PHP, CGI, and all of their siblings are still up to the task. Alternatively, if a database can provide an XML representation of information in response to a query, the server could use XSLT to transform that representation, as shown in Figure 1-5. To create documents for Word, the developer would generate WordprocessingML, while creating documents for Excel would involve generating SpreadsheetML.

Using XSLT to generate WordprocessingML or SpreadsheetML from a custom XML vocabulary

Figure 1-5. Using XSLT to generate WordprocessingML or SpreadsheetML from a custom XML vocabulary

Users of Office 2003 can then open these documents directly, as if they were ordinary Word or Excel files. This works even if the documents are stored on the Web, thanks to Word and Excel’s long-time support for opening Web documents. If they need to exchange the information with people using older versions of Office, they can just use Save As . . . and the .doc or .xls format for backward-compatibility. Nothing is lost in the transition from XML to the traditional binary formats.

Separating Content from Presentation in Word

Most users treat Word as a tool for creating content that looks the way they want it to look. The gold standard for Word results has generally been the document’s appearance on a piece of paper, not the elegance of how that appearance was achieved. While the focus on presentation works well for a lot of applications, it breaks down when developers are trying to use Word’s familiar interface to create information that needs to be reused in other ways.

This book, for instance, was written in Word and the .doc files converted to FrameMaker using custom tools—tools that only focus on a subset of Word’s capabilities, its styles. Users who take advantage of Word’s other style features create problems for this converter, and the usual result is that some of the author’s intentions are lost in translation.

Word’s support for custom XML schemas offers a huge first step toward resolving this problem. Developers can create templates that emphasize structured content rather than presentation, while still using an interface that looks familiar. These templates can even offer users a choice of how to present the content, letting them work on the structures using a view that makes them comfortable. For small projects, this can be a quick and effective way to build forms. For larger, more complicated projects, a more sophisticated set of programming skills is necessary to make this work.

Separating Content from Analysis in Excel

Spreadsheets are wonderful tools for analyzing information. Within the basic confines of the grid system, developers can store both data and tools for processing that data. This paradigm has worked well for twenty years, but it also comes with some costs. Incredible amounts of information are stored in spreadsheets, much of it only in those spreadsheets. Users often use old spreadsheets as the foundations for new ones, often cutting and pasting data in from other sources.

With Excel 2003, it’s pretty easy to create a spreadsheet that includes a list area (or areas) designed to hold information retrieved from XML documents, as shown in Figure 1-6. Once the list is defined, the spreadsheet can add information to the area or replace the content with new data. The list can be extended easily to include formulas as well, if desired, and the rest of the spreadsheet can reference the list.

Using lists representing XML maps to create reusable Excel spreadsheets that can be applied to different XML data sets

Figure 1-6. Using lists representing XML maps to create reusable Excel spreadsheets that can be applied to different XML data sets

Thanks to these lists, users can keep a standard spreadsheet that they use to analyze information that appears on a regular basis. When new data arrives in XML format—say, a quarterly sales report—those users can just tell Excel to import the new data, and their spreadsheets will reflect the new data. The spreadsheets become small applications themselves, complete with their own XML-based data formats. It’s hard to imagine an easier way to write programs that analyze business data.

Creating and Editing XML in Excel

Excel’s forte is analysis, but it also provides an easily understood user interface for working with simple structured data. If you need to work with data that fits easily on a grid, Excel offers a convenient tool for working with that data. You can create a list, type data directly into it, and save it as XML without ever seeing a tag. The XML itself can have a slightly more complex structure than the simple grid, though that structure isn’t presented to the person working in Excel at all. It’s easy to use Excel as a quick interface for creating or editing simple XML documents.

Annotating Word Documents with Additional Information

While Microsoft has described Word’s support for custom XML as a feature that makes Word into an XML editor of sorts, the custom markup support has a side effect that gives Word new functionality, whether or not users ever save their files as XML. For many documents, presentation style is a good analog for structure, but there are times when you need to be able to annotate documents in a finer-grained or more complex way than Word’s existing styles and comments interfaces provide.

By associating an XML Schema with a Word document, developers can create templates that look like ordinary Word documents but have a hidden layer of additional information, which only surfaces when the document is saved as XML or viewed with XML tags visible. It’s more typical for documents to have a single structure, made visible through the traditional WYSIWYG interface, but if you need the document to have two sets of structure, this is definitely an option.

Exchanging Information Between Access and the World

Microsoft Access has traditionally been a desktop application, sharing information among a small group of people. Access now supports XML import and export to and from its tables, meaning that it’s rapidly becoming easier to use an Access database as a local host for information that may well come from or go to other systems.

Instead of treating Access databases as islands (or Access as a mere interface to more sophisticated database systems) this new openness makes it easier to treat Access databases as the outer nodes in a hub-spoke system, as shown in Figure 1-7. By picking up information from XML documents and storing it in the database, Access can act as a convenient local container that provides a lot of analytical and interface tools. Access might make an excellent temporary store for users analyzing complex data on disconnected laptops, or as a point of contact for users in remote offices who periodically send and receive updated information. Access can also function as an intermediary between XML and more complex, possibly legacy database systems that don’t necessarily support XML but do support import and export to and from Access.

A hub-spoke system of Access databases connected with XML

Figure 1-7. A hub-spoke system of Access databases connected with XML

Interacting with Web Services Using InfoPath

Web Services have remained stubbornly buried behind layers of code. Although it’s always been possible to write user interfaces for them, it required a lot of interface-building programming. Expanding Web Services (and XML) communications to include people as well as computers hasn’t been particularly easy. InfoPath takes direct aim at this project, drastically simplifying the task of designing and deploying interfaces to these services.

Microsoft has made a lot of complete functionality for managing projects using these tools available through the combination of SharePoint Server and InfoPath, but InfoPath can provide a friendly frontend interface to whatever services you’d like. If you need to collect information from users, give them a testing interface to explore a Web Service, or present information to users that they can use or change, InfoPath offers easy access to a wide variety of information types.

Interacting with Web Services Using Excel, Access, or Word

Excel can also be used as an interface to SOAP-based Web Services, not just XML. Doing so requires installing a toolkit and writing some Visual Basic for Applications code, but once you’ve done that, your Excel spreadsheet can serve as an interface to whatever Web Service you choose. Excel XP supported similar functionality, so this isn’t an major change, but it’s an important ingredient of the overall Office story.

The same toolkit used to integrate Web Services with Excel can be used with Access and Word. An Access database might use an external web service to support complex calculations or as a source of regularly changing data, while Word users might find Web Services a convenient source of information for documents that need autocompletion of regularly changing or even calculated boilerplate text.

Get Office 2003 XML now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.