Chapter 9. The Building Blocks of Services

Throughout this book I’ve said that web services are based on three fundamental technologies: HTTP, URIs, and XML. But there are also lots of technologies that build on top of these. You can usually save yourself some work and broaden your audience by adopting these extra technologies: perhaps a domain-specific XML vocabulary, or a standard set of rules for exposing resources through HTTP’s uniform interface. In this chapter I’ll show you several technologies that can improve your web services. Some you’re already familiar with and some will probably be new to you, but they’re all interesting and powerful.

Representation Formats

What representation formats should your service actually send and receive? This is the question of how data should be represented, and it’s an epic question. I have a few suggestions, which I present here in a rough order of precedence. My goal is to help you pick a format that says something about the semantics of your data, so you don’t find yourself devising yet another one-off XML vocabulary that no one else will use.

I assume your clients can accept whatever representation format you serve. The known needs of your clients take priority over anything I can say here. If you know your data is being fed directly into Microsoft Excel, you ought to serve representations in Excel format or a compatible CSV format. My advice also does not extend to document formats that can only be understood by humans. If you’re serving audio files, I’ve got nothing to say about which audio format you should choose. To a first approximation, a programmed client finds all audio files equally unintelligible.

XHTML

Media type: application/xhtml+xml

The common text/html media type is deprecated for XHTML. It’s also the only media type that Internet Explorer handles as HTML. If your service might be serving XHTML data directly to web browsers, you might want to serve it as text/html.

My number-one representation recommendation is the format I’ve been using in my own services throughout this book, and the one you’re probably most familiar with. HTML drives the human web, and XHTML can drive the programmable web. The XHTML standard (http://www.w3.org/TR/xhtml1/) relies on the HTML standard to do most of the heavy lifting (http://www.w3.org/TR/html401/).

XHTML is HTML under a few restrictions that make every XHTML document also valid XML. If you know HTML, you know most of what there is to know about XHTML, but there are some syntactic differences, like how to present self-closing tags. The tag names and attributes are the same: XHTML is expressive in the same ways as HTML. Since the XHTML standard just points to the HTML standard and then adds some restrictions to it, I tend to refer to “HTML tags” and the like except where there really is a difference between XHTML and HTML.

I don’t actually recommend HTML as a representation format, because it can’t be reliably parsed with an XML parser. There are many excellent and liberal HTML parsers, though (I mentioned a few in Chapter 2), so your clients have options if you can’t or don’t want to serve XHTML. Right now, XHTML is a better choice if you expect a wide variety of clients to handle your data.

HTML can represent many common types of data: nested lists (tags like ul and li), key-value pairs (the dl tag and its children), and tabular data (the table tag and its children). It supports many different kinds of hypermedia. HTML does have its shortcomings: its hypermedia forms are limited, and won’t fully support HTTP’s uniform interface until HTML 5 is released.

HTML is also poor in semantic content. Its tag vocabulary is very computer-centric. It has special tags for representing computer code and output, but nothing for the other structured fruits of human endeavor, like poetry. One resource can link to another resource, and there are standard HTML attributes (rel and rev) for expressing the relationship between the linker and the linkee. But the HTML standard defines only 15 possible relationships between resources, including “alternate,” “stylesheet,” “next,” “prev,” and “glossary.” See http://www.w3.org/TR/html401/types.html#type-links for a complete list.

Since HTML pages are representations of resources, and resources can be anything, these 15 relationships barely scratch the surface. HTML might be called upon to represent the relationship between any two things. Of course, I can come up with my own values for rel and rev to supplement the official 15, but if everyone does that confusion will reign: we’ll all pick different values to represent the same relationships. If I link my web page to my wife’s web page, should I specify my relationship to her as husband, spouse, or sweetheart? To a human it doesn’t matter much, but to a computer program (the real client on the programmable web) it matters a lot. Similarly, HTML can easily represent a list, and there’s a standard HTML attribute (class) for expressing what kind of list it is. But HTML doesn’t say what kinds of lists there are.

This isn’t HTML’s fault, of course. HTML is supposed to be used by people who work in any field. But once you’ve chosen a field, everyone who works in that field should be able to agree on what kinds of lists there are, or what kinds of relationships can exist between resources. This is why people have started getting together and adding standard semantics to XHTML with microformats.

XHTML with Microformats

Media type: application/xhtml+xml

Microformats are lightweight standards that extend XHTML to give domain-specific semantics to HTML tags. Instead of reinventing data storage techniques like lists, microformats use existing HTML tags like ol, span, and abbr. The semantic content usually lives in custom values for the attributes of the tags, such as class, rel, and rev. Example 9-1 shows an example: someone’s home telephone number represented in the microformat known as hCard.

Example 9-1. A telephone number represented in the hCard microformat
<span class="tel">
 <span class="type">home</span>:
 <span class="value">+1.415.555.1212</span>
</span>

Microformat adoption is growing, especially as more special-purpose devices get on the web. Any microformat document can be embedded in an XHTML page, because it is XHTML. A web service can serve an XHTML representation that contains microformat documents, along with links to other resources and forms for creating new ones. This document can be automatically parsed for its microformat data, or rendered for human consumption with a standard web browser.
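To make "automatically parsed" concrete, here is a minimal sketch of pulling the telephone number out of the hCard fragment in Example 9-1 with nothing but a generic XML parser. The wrapping div and the helper names (has_class, child_text, telephone_numbers) are assumptions made for illustration; a real hCard processor handles many more properties and edge cases.

```python
import xml.etree.ElementTree as ET

# The hCard fragment from Example 9-1, wrapped in a div (an assumption) so
# that it parses as a standalone XML document.
DOCUMENT = """
<div>
 <span class="tel">
  <span class="type">home</span>:
  <span class="value">+1.415.555.1212</span>
 </span>
</div>
"""

def has_class(element, name):
    """True if `name` is one of the space-separated values of class."""
    return name in element.get("class", "").split()

def child_text(parent, class_name):
    """Text of the first descendant carrying the given class, or None."""
    for element in parent.iter():
        if element is not parent and has_class(element, class_name):
            return (element.text or "").strip()
    return None

def telephone_numbers(root):
    """Collect (type, value) pairs for every element marked class="tel"."""
    return [(child_text(el, "type"), child_text(el, "value"))
            for el in root.iter() if has_class(el, "tel")]

numbers = telephone_numbers(ET.fromstring(DOCUMENT))
print(numbers)
```

The same document would still render as ordinary text in a browser; the class values are the only thing the parser relies on.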

As of the time of writing there were nine microformat specifications. The best-known is probably rel-nofollow, a standard value for the rel attribute invented by engineers at Google as a way of fighting comment spam on weblogs. Here’s a complete list of official microformats:

hCalendar

A way of representing events on a calendar or planner. Based on the IETF iCalendar format.

hCard

A way of representing contact information for people and organizations. Based on the vCard standard defined in RFC 2426.

rel-license

A new value for the rel attribute, used when linking to the license terms for an XHTML document. For example:

<a href="http://creativecommons.org/licenses/by-nd/" rel="license">
 Made available under a Creative Commons Attribution-NoDerivs license.
</a>

That’s standard XHTML. The only thing the microformat does is define a meaning for the string license when it shows up in the rel attribute.

rel-nofollow

A new value for the rel attribute, used when linking to URIs without necessarily endorsing them.

rel-tag

A new value for the rel attribute, used to label a web page according to some external classification system.

VoteLinks

A new value for the rev attribute, an extension of the idea behind rel-nofollow. VoteLinks lets you say how you feel about the resource you’re linking to by casting a “vote.” For instance:

<a rev="vote-for" href="http://www.example.com">The best webpage ever.</a>
<a rev="vote-against" href="http://example.com/">
A shameless ripoff of www.example.com</a>

XFN

Stands for XHTML Friends Network. A new set of values for the rel attribute, for capturing the relationships between people. An XFN value for the rel attribute captures the relationship between this “person” resource and another such resource. To bring back the “Alice” and “Bob” resources from “Relationships Between Resources” in Chapter 8, an XHTML representation of Alice might include this link:

<a rel="spouse" href="Bob">Bob</a>

XMDP

Stands for XHTML Meta Data Profiles. A way of describing your custom values for XHTML attributes, using the XHTML tags for definition lists: dl, dd, and dt. This is a kind of meta-microformat: a microformat like rel-tag could itself be described with an XMDP document.

XOXO

Stands (sort of) for Extensible Open XHTML Outlines. Uses XHTML’s list tags to represent outlines. There’s nothing in XOXO that’s not already in the XHTML standard, but declaring a document (or a list in a document) to be XOXO signals that a list is an outline, not just a random list.

Those are the official microformat standards; they should give you an idea of what microformats are for. As of the time of writing there were also about 10 microformat drafts and more than 50 discussions about possible new microformats. Here are some of the more interesting drafts:

geo

A way of marking up latitude and longitude on Earth. This would be useful in the mapping application I designed in Chapter 5. I didn’t use it there because there’s still a debate about how to represent latitude and longitude on other planetary bodies: extend geo or define different microformats for each body?

hAtom

A way of representing in XHTML the data Atom represents in XML.

hResume

A way of representing résumés.

hReview

A way of representing reviews, such as product reviews or restaurant reviews.

xFolk

A way of representing bookmarks. This would make an excellent representation format for the social bookmarking application in Chapter 7. I chose to use Atom instead because it was less code to show you.

You get the idea. The power of microformats is that they’re based on HTML, the most widely-deployed markup format in existence. Because they’re HTML, they can be embedded in web pages. Because they’re also XML, they can be embedded in XML documents. They can be understood at various levels by human beings, specialized microformat processors, dumb HTML processors, and even dumber XML processors.
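As a rough illustration of the "even dumber XML processor" end of that spectrum, the sketch below walks an XHTML fragment and reports every rel and rev value it finds, without knowing what any of them mean. The fragment reuses links shown earlier in this section; the wrapping div, the shortened link text, and the function name typed_links are assumptions.

```python
import xml.etree.ElementTree as ET

# Links borrowed from the rel-license, VoteLinks, and XFN examples, plus one
# plain link that carries no microformat data.
PAGE = """
<div>
 <a href="http://creativecommons.org/licenses/by-nd/" rel="license">License</a>
 <a rev="vote-for" href="http://www.example.com">The best webpage ever.</a>
 <a rel="spouse" href="Bob">Bob</a>
 <a href="http://example.com/plain">No microformat here.</a>
</div>
"""

def typed_links(root):
    """Return (attribute, value, href) for each rel/rev value on each link."""
    links = []
    for a in root.iter("a"):
        for attribute in ("rel", "rev"):
            # rel and rev may hold several space-separated values.
            for value in a.get(attribute, "").split():
                links.append((attribute, value, a.get("href")))
    return links

links = typed_links(ET.fromstring(PAGE))
for link in links:
    print(link)
```

A specialized microformat processor would go one step further and attach meaning to values like license or vote-for; this code only proves the data is reachable.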

Even if the microformats wiki shows no microformat standard or draft for your problem space, you might find an open discussion on the topic that helps you clarify your data structures. You can also create your own microformat (see “Ad Hoc XHTML” later in this chapter).

Atom

Media type: application/atom+xml

Atom is an XML vocabulary for describing lists of timestamped entries. The entries can be anything, but they usually contain pieces of human-authored text like you’d see on a weblog or a news site. Why should you use an Atom list instead of a regular XHTML list? Because Atom provides special tags for conveying the semantics of publishing: authors, contributors, languages, copyright information, titles, categories, and so on. (Of course, as I mentioned earlier, there’s a microformat called hAtom that brings all of these semantics into XHTML.) Atom is a useful XML vocabulary because so many web services are, in the broad sense, ways of publishing information. What’s more, there are a lot of web service clients that understand the semantics of Atom documents. If your web service is addressable and your resources expose Atom representations, you’ve immediately got a huge audience.

Atom lists are called feeds, and the items in the lists are called entries.

Tip

Some feeds are written in some version of RSS, a different XML vocabulary with similar semantics. All versions of RSS have the same basic structure as Atom: a feed that contains a number of entries. There are a number of variants of RSS, but you shouldn’t have to worry about them at all. Today, every major tool for consuming feeds understands Atom.

These days, most weblogs and news sites expose a special resource whose representation is an Atom feed. The entries in the feed describe and link to other resources: weblog entries or news stories published on the site. You, the client, can consume these resources with a feed reader or some other external program. In Chapter 7, I represented lists of bookmarks as Atom feeds. Example 9-2 shows a simple Atom feed document.

Example 9-2. A simple Atom feed containing one entry
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>RESTful News</title>
  <link rel="alternate" href="http://example.com/RestfulNews" />
  <updated>2007-04-14T20:00:39Z</updated>
  <author><name>Leonard Richardson</name></author>
  <contributor><name>Sam Ruby</name></contributor>
  <id>urn:1c6627a0-8e3f-0129-b1a6-003065546f18</id>

  <entry>
    <title>New Resource Will Respond to PUT, City Says</title>
    <link rel="edit" href="http://example.com/RestfulNews/104" />
    <id>urn:239b2f40-8e3f-0129-b1a6-003065546f18</id>
    <updated>2007-04-14T20:00:39Z</updated>

    <summary>
     After long negotiations, city officials say the new resource
     being built in the town square will respond to PUT. Earlier
     criticism of the proposal focused on the city's plan to modify
     the resource through overloaded POST.
    </summary>
    <category scheme="http://www.example.com/categories/RestfulNews"
              term="local" label="Local news" />
  </entry>
</feed>

In that example you can see some of the tags that convey the semantics of publishing: author, title, link, summary, updated, and so on. The feed as a whole is a joint project: it has an author tag and a contributor tag. It’s also got a link tag that points to an alternate URI for the underlying “feed” resource: the news site. The single entry has no author tag, so it inherits author information from the feed. The entry does have its own link tag, which points to http://example.com/RestfulNews/104. That URI identifies the entry as a resource in its own right. The entry also has a textual summary of the story. To get the remainder, the client must presumably GET the entry’s URI.
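Here is a minimal sketch of how a client might read such a feed with a namespace-aware XML parser, including the rule that an entry with no author tag falls back on the feed's author. The feed is a trimmed copy of Example 9-2, and the helper name entry_authors is an assumption; a real feed library covers far more of the Atom vocabulary.

```python
import xml.etree.ElementTree as ET

# Every Atom tag lives in this namespace; ElementTree spells it {uri}tag.
ATOM = "{http://www.w3.org/2005/Atom}"

# A trimmed copy of the feed in Example 9-2.
FEED = b"""<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>RESTful News</title>
  <author><name>Leonard Richardson</name></author>
  <entry>
    <title>New Resource Will Respond to PUT, City Says</title>
    <link rel="edit" href="http://example.com/RestfulNews/104" />
  </entry>
</feed>
"""

def entry_authors(feed, entry):
    """Author names for an entry, inherited from the feed if it has none."""
    names = [n.text for n in entry.findall(f"{ATOM}author/{ATOM}name")]
    if not names:
        names = [n.text for n in feed.findall(f"{ATOM}author/{ATOM}name")]
    return names

feed = ET.fromstring(FEED)
summary = [
    (entry.findtext(f"{ATOM}title"),
     entry.find(f"{ATOM}link").get("href"),
     entry_authors(feed, entry))
    for entry in feed.findall(f"{ATOM}entry")
]
print(summary)
```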

An Atom document is basically a directory of published resources. You can use Atom to represent photo galleries, albums of music (maybe a link to the cover art plus one to each track on the album), or lists of search results. Or you can omit the link tags and use Atom as a container for original content like status reports or incoming emails. Remember: the two reasons to use Atom are that it represents the semantics of publishing, and that a lot of existing clients can consume it.

If your application almost fits in with the Atom schema, but needs an extra tag or two, there’s no problem. You can embed XML tags from other namespaces in an Atom feed. You can even define a custom namespace and embed its tags in your Atom feeds. This is the Atom equivalent of XHTML microformats: your Atom feeds can use conventions not defined in Atom, without becoming invalid. Clients that don’t understand your tag will see a normal Atom feed with some extra mysterious data in it.

OpenSearch

OpenSearch is one XML vocabulary that’s commonly embedded in Atom documents. It’s designed for representing lists of search results. The idea is that a service returns the results of a query as an Atom feed, with the individual results represented as Atom entries. But some aspects of a list of search results can’t be represented in a stock Atom feed: the total number of results, for instance. So OpenSearch defines three new elements, in the opensearch namespace:[28]

totalResults

The total number of results that matched the query.

itemsPerPage

How many items are returned in a single “page” of search results.

startIndex

If all the search results are numbered from zero to totalResults, then the first result in this feed document is entry number startIndex. When combined with itemsPerPage you can use this to figure out what “page” of results you’re on.
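The page arithmetic can be sketched as follows. The feed contents and the function name paging are made up for illustration, the namespace URI is the one used by OpenSearch 1.1, and the code follows the zero-based numbering described above (check the version of the OpenSearch specification your service targets before relying on that).

```python
import xml.etree.ElementTree as ET

# Namespace of the OpenSearch 1.1 response elements.
OPENSEARCH = "{http://a9.com/-/spec/opensearch/1.1/}"

# A hypothetical search-result feed with the entries left out.
FEED = b"""<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <title>Search results</title>
  <opensearch:totalResults>95</opensearch:totalResults>
  <opensearch:startIndex>40</opensearch:startIndex>
  <opensearch:itemsPerPage>20</opensearch:itemsPerPage>
</feed>
"""

def paging(feed):
    """Return (zero-based page number, total number of pages)."""
    total = int(feed.findtext(f"{OPENSEARCH}totalResults"))
    start = int(feed.findtext(f"{OPENSEARCH}startIndex"))
    per_page = int(feed.findtext(f"{OPENSEARCH}itemsPerPage"))
    page = start // per_page        # which page the first entry falls on
    pages = -(-total // per_page)   # ceiling division: a partial page counts
    return page, pages

page, pages = paging(ET.fromstring(FEED))
print(page, pages)
```

A client that knows nothing about OpenSearch would skip the three extra elements and still see a valid Atom feed.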

SVG

Media type: image/svg+xml

Most graphic formats are just ways of laying pixels out on the screen. The underlying content is opaque to a computer: it takes a skilled human to modify a graphic or reuse part of one in another. Scalable Vector Graphics is an XML vocabulary that makes it possible for programs to understand and manipulate graphics. It describes graphics in terms of primitives like shapes, text, colors, and effects.

It would be a waste of time to represent a photograph in SVG, but using it to represent a graph, a diagram, or a set of relationships gives a lot of power to the client. SVG images can be scaled to arbitrary size without losing any detail. SVG diagrams can be edited or rearranged, and bits of them can be seamlessly snipped out and incorporated into other graphics. In short, SVG makes graphic documents work like other sorts of documents. Web browsers are starting to get support for SVG: newer versions of Firefox support it natively.
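Because an SVG image is an XML document, a client can inspect and edit it with ordinary XML tools. The sketch below uses a made-up two-shape diagram; the ids, colors, and the function names shape_names and recolor are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize SVG elements without a namespace prefix.
ET.register_namespace("", SVG_NS)

# A hypothetical diagram: one rectangle and one circle.
DIAGRAM = """
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="50">
  <rect id="server" x="10" y="10" width="30" height="20" fill="blue" />
  <circle id="client" cx="70" cy="20" r="10" fill="red" />
</svg>
"""

def shape_names(svg_text):
    """List the tag names of the top-level shapes that carry an id."""
    root = ET.fromstring(svg_text)
    return [el.tag.split("}")[1] for el in root if el.get("id")]

def recolor(svg_text, element_id, color):
    """Return a copy of the image with one element's fill changed."""
    root = ET.fromstring(svg_text)
    for el in root.iter():
        if el.get("id") == element_id:
            el.set("fill", color)
    return ET.tostring(root, encoding="unicode")

shapes = shape_names(DIAGRAM)
updated = recolor(DIAGRAM, "client", "green")
print(shapes)
print(updated)
```

Doing the same thing to a bitmap would mean working out which pixels belong to the circle, which is the kind of job that takes a skilled human.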

Form-Encoded Key-Value Pairs

Media type: application/x-www-form-urlencoded

I covered this simple format in Chapter 6. This format is mainly used in representations the client sends to the server. A filled-out HTML form is represented in this format by default, and it’s an easy format for an Ajax application to construct. But a service can also use this format in the representations it sends. If you’re thinking of serving comma-separated values or RFC 822-style key-value pairs, try form-encoded values instead. Form-encoding takes care of the tricky cases, and your clients are more likely to have a library that can decode the document.
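As an illustration of the tricky cases, here is a minimal sketch using Python's standard library. The record is invented; its values contain an ampersand, an equals sign, a comma, a non-ASCII character, and a repeated key, all of which would need hand-written escaping rules in a homemade comma-separated or key-value format.

```python
from urllib.parse import urlencode, parse_qs

# A hypothetical record with characters that break naive delimiters.
record = {
    "name": "Café & Bar",
    "tags": ["food", "drink"],   # one key with two values
    "note": "a=b, c",
}

# doseq=True writes a list as repeated key=value pairs.
encoded = urlencode(record, doseq=True)

# parse_qs always returns a list of values for each key.
decoded = parse_qs(encoded)

print(encoded)
print(decoded)
```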

JSON

Media type: application/json

JavaScript Object Notation is a serialization format for general data structures. It’s much more lightweight and readable than an equivalent XML document, so I recommend it for most cases when you’re transporting a serialized data structure rather than a hypermedia document.

I introduced JSON in “JSON Parsers: Handling Serialized Data” in Chapter 2, and showed a simple JSON document in Example 2-11. Example 9-3 shows a more complex JSON document: a hash of lists.

Example 9-3. A complex data type in JSON format
{"a":["b","c"], "1":[2,3]}

As I show in Chapter 11, JSON has special advantages when it comes to Ajax applications. It’s useful for any kind of application, though. If your data structures are more complex than key-value pairs, or you’re thinking of defining an ad hoc XML format, you might find it easier to define a JSON structure of nested hashes and arrays.
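For a sense of how little work a client has to do, here is a short sketch that parses the document from Example 9-3 with Python's standard json module and serializes it again. The variable names are arbitrary.

```python
import json

# The document from Example 9-3.
document = '{"a":["b","c"], "1":[2,3]}'

data = json.loads(document)

# JSON object keys are always strings, so "1" stays a string, while the
# numbers inside the list come back as integers.
keys = list(data)
second_list = data["1"]

# Serializing the structure again gives an equivalent compact document.
round_trip = json.dumps(data, separators=(",", ":"))

print(keys, second_list, round_trip)
```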

RDF and RDFa

The Resource Description Framework is a way of representing knowledge about resources. Resource here means the same thing as in the Resource-Oriented Architecture: a resource is anything important enough to have a URI. In RDF, though, the URIs might not be http: URIs. Abstract URI schemes like isbn: (for books) and urn: (for just about anything) are common. Example 9-4 is a simple RDF assertion, which claims that the title of this book is RESTful Web Services.

Example 9-4. An RDF assertion
<span about="isbn:9780596529260" property="dc:title">
 RESTful Web Services
</span>

There are three parts to an RDF assertion, or triple, as they’re called. There’s the subject, a resource identifier: in this case, isbn:9780596529260. There’s the predicate, which identifies a property of the resource: in this case, dc:title. Finally there’s the object, which is the value of the property: in this case, “RESTful Web Services.” The assertion as a whole reads: “The book with ISBN 9780596529260 has a title of ‘RESTful Web Services.’”
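The subject-predicate-object structure can be read straight off the markup in Example 9-4. The following is only a sketch: it handles the one pattern shown in the example (an element carrying both an about and a property attribute), and the function name triples is an assumption. It is not a full implementation of the processing rules for RDF embedded in XHTML.

```python
import xml.etree.ElementTree as ET

# The assertion from Example 9-4.
SNIPPET = """
<span about="isbn:9780596529260" property="dc:title">
 RESTful Web Services
</span>
"""

def triples(root):
    """Return (subject, predicate, object) for each annotated element."""
    found = []
    for el in root.iter():
        subject, predicate = el.get("about"), el.get("property")
        if subject and predicate:
            # The object is the element's text content, whitespace trimmed.
            obj = "".join(el.itertext()).strip()
            found.append((subject, predicate, obj))
    return found

assertions = triples(ET.fromstring(SNIPPET))
print(assertions)
```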

I didn’t make up the isbn: URI space: it’s a standard way of addressing books as resources. I didn’t make up the dc:title predicate, either. That comes from the Dublin Core Metadata Initiative. DCMI defines a set of useful predicates that apply to published works like books and weblogs. An automated client that understands the Dublin Core can scan RDF documents that use those terms, evaluate the assertions they contain, and even make logical deductions about the data.

Example 9-4 looks a lot like an XHTML snippet, because that’s what it is. There are a couple of ways of representing RDF assertions, and I’ve chosen to show you RDFa, a microformat-like standard for embedding RDF in XHTML. RDF/XML is a more popular RDF representation format, but I think it makes RDF look more complicated than it is, and it’s difficult to integrate RDF/XML documents into the web. RDFa documents can go into XHTML files, just like microformat documents. However, since RDFa takes some ideas from the unreleased XHTML 2 standard, a document that includes it won’t be valid XHTML for a while. A third way of representing RDF assertions is eRDF, which results in valid XHTML.

RDF in its generic form is the basis for the W3C’s Semantic Web project. On the human web, there are no standards for how we talk about the resources we link to. We describe resources in human language that’s difficult or impossible for machines to understand. RDF is a way of constraining human speech so that we talk about resources using a standard vocabulary—not one that machines “understand” natively, but one they can be programmed to understand. A computer program doesn’t understand the Dublin Core’s “dc:title” any more than it understands “title.” But if everyone agrees to use “dc:title,” we can program standard clients to reason about the Dublin Core in consistent ways.

Here’s the thing: I think microformats do a good job of adding semantics to the web we already have, and they add less complexity than RDF’s general subject-predicate-object form. I recommend using RDF only when you want interoperability with existing RDF processors, or are treating RDF as a general-purpose microformat for representing assertions about resources.

One very popular use of RDF is FOAF, a way of representing information about human beings and the relationships between them.

Framework-Specific Serialization Formats

Media type: application/xml

I’m talking here about informal XML vocabularies used by frameworks like Ruby’s ActiveRecord and Python’s Django to serialize database objects as XML. I gave an example back in Example 7-4. It’s a simple data structure: a hash or a list of hashes.

These representation formats are very convenient if you happen to be writing a service with a framework that provides one. In Rails, you can just call to_xml on an ActiveRecord object or a list of such objects. The Rails serialization format is also useful if you’re not using Rails, but you want your service to be usable by ActiveResource clients. Otherwise, I don’t really recommend these formats, unless you’re just trying to get something up and running quickly (as I am in Chapters 7 and 12). The major downside of these formats is that they look like documents, but they’re really just serialized data structures. They never contain hypermedia links or forms.

Ad Hoc XHTML

Media type: application/xhtml+xml

If none of the work that’s already been done fits your problem space... well, first, think again. Just as you should think again before deciding you can’t fit your resources into HTTP’s uniform interface. If you think your resources can’t be represented by stock HTML or Atom or RDF or JSON, there’s a good chance you haven’t looked at the problem in the right way.

But it’s quite possible that your resources won’t fit any of the representation formats I’ve mentioned so far. Or maybe you can represent most of your resource state with XHTML plus some well-chosen microformats, but there’s still something missing. The next step is to consider creating your own microformat.

The high-impact way of creating a microformat is to go through the microformat process, hammer it out with other microformat enthusiasts, and get it published as an official microformat. This is most appropriate when lots of people are trying to represent the same kind of data. Ideally, you’re in a situation where the human web is littered with ad hoc HTML representations of the data, and where there are already a couple of big standards that can serve as a model for a more agile microformat. This is how the hCard and hCalendar microformats were developed. There were many people trying to put contact information and upcoming events on the human web, and preexisting standards (vCard and iCalendar) to steal ideas from. The representation of “places on a map” that I devised in Chapter 5 might be a starting point for an official microformat. There are lots of mapping sites on the human web, and lots of heavyweight standards for representing GIS data. If I wanted to build a microformat, I’d have a lot of ideas to work from.

The low-impact way of creating a microformat is to add semantic content to the XHTML you were going to write anyway. This is suitable for representation formats that no one else is likely to use, or as a starting point so you can get a real web service running while you’re going through the microformat process. The representation of the list of planets from Chapter 5 works better as an ad hoc set of semantics than as an official microformat. All it’s doing is saying that one particular list is a list of planets.

The microformat design patterns and naming principles give a set of sensible general rules for adding semantics to HTML. Their advice is useful even if you’re not trying to create an official microformat. The semantics you choose for your “micromicroformat” won’t be standardized, but you can present them in a standard way: the way microformats do it. Here are some of the more useful patterns.

  • If there’s an HTML tag that conveys the semantics you want, use it. To represent a set of key-value pairs, use the dl tag. To represent a list, use one of the list tags. If nothing fits, use the span or div tag.

  • Give a tag additional semantics by specifying its class attribute. This is especially important for span and div, which have no real meaning on their own.

  • Use the rel attribute in a link to specify another resource’s relationship to this one. Use the rev attribute to specify this page’s relationship to another one. If the relationship is symmetric, use rel. See “Hypermedia Technologies” later in this chapter for more on this.

  • Consider providing an XMDP file that describes your custom values for class, rel, and rev.
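The first two patterns can be sketched in a few lines. The markup below is hypothetical: the class value planets and the URIs are invented for illustration and are not the exact representation from Chapter 5. The point is that a client only needs to know one class value to tell the interesting list from any other list on the page.

```python
import xml.etree.ElementTree as ET

# Hypothetical ad hoc XHTML: two lists, only one tagged as a list of planets.
PAGE = """
<div>
 <ul class="planets">
  <li><a href="/Earth">Earth</a></li>
  <li><a href="/Mars">Mars</a></li>
 </ul>
 <ul>
  <li>Some unrelated navigation link</li>
 </ul>
</div>
"""

def lists_of_kind(root, kind):
    """Return the item texts of every ul/ol whose class includes `kind`."""
    result = []
    for el in root.iter():
        if el.tag in ("ul", "ol") and kind in el.get("class", "").split():
            result.append(["".join(li.itertext()).strip()
                           for li in el.findall("li")])
    return result

planet_lists = lists_of_kind(ET.fromstring(PAGE), "planets")
print(planet_lists)
```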

Other XML Standards and Ad Hoc Vocabularies

Media type: application/xml

In addition to XHTML, Atom, and SVG, there are a lot of specialized XML vocabularies I haven’t covered: MathML, OpenDocument, Chemical Markup Language, and so on. There are also specialized vocabularies you can use in RDF assertions, like Dublin Core and FOAF. A web service might serve any of these vocabularies as standalone representations, embed them into Atom feeds, or even wrap them in SOAP envelopes. If none of these work for you, you can define a custom XML vocabulary to represent your resource state, or maybe the parts that Atom doesn’t cover.

Although I’ve presented this as the last resort, that’s certainly not the common view. People come up with custom XML vocabularies all the time: that’s how there got to be so many of them. Almost every real web service mentioned in this book exposes its representations in a custom XML vocabulary. Amazon S3, Yahoo!’s search APIs, and the del.icio.us API all serve representations that use custom XML vocabularies, even though they could easily serve Atom or XHTML and reuse an existing vocabulary.

Part of this is tech culture. The microformats idea is fairly new, and a custom XML vocabulary still looks more “official.” But this is an illusion. Unless you provide a schema definition for your vocabulary, your custom tags have exactly the same status as a custom value for the HTML “class” attribute. Even a definition does nothing but codify the vocabulary you made up: it doesn’t confer any legitimacy. Legitimacy can only come “from the consent of the governed”: from other people adopting your vocabulary.

That said, there is a space for custom XML vocabularies. It’s usually easy to use XHTML instead of creating your own XML tags, but it’s not so easy when you need tags with a lot of custom attributes. In that situation, a custom XML vocabulary makes sense. All I ask is that you seriously think about whether you really need to define a new XML vocabulary for a given problem. It’s possible that in the future, people will err in the opposite direction, and create ad hoc microformats when they shouldn’t. Then I’ll urge caution before creating a microformat. But right now, the problem is too many ad hoc XML vocabularies.

Encoding Issues

It’s a global world (I actually heard someone say that once), and any service you expose must deal with the products of people who speak different languages from you and use different writing systems. You don’t have to understand all of these languages, but to handle multilingual data without mangling it, you do need to know something about character encodings: the conventions that let us represent human-readable text as strings of bytes.

Every text file you’ve ever created has some character encoding, even though you probably never made a decision about which encoding to use (it’s usually a system property). In the United States the encoding is usually UTF-8, US-ASCII, or Windows-1252. In western Europe it might also be ISO 8859-1. The default for HTML on the web is ISO 8859-1, which is almost but not quite the same as Windows-1252. Japanese documents are commonly encoded with EUC-JP, Shift_JIS, or UTF-8. If you’re curious about what character encodings are used in different places, most web browsers list the encodings they understand. My web browser supports five different encodings for simplified Chinese, five for Hebrew, nine for the Cyrillic alphabet, and so on. Most of these encodings are mutually incompatible, even when they encode the same language. It’s insane!

Fortunately there is a way out of this confusion. We as a species have come up with Unicode, a way of representing every human writing system. Unicode isn’t a character encoding, but there are two good encodings for it: UTF-8 (more efficient for alphabetic languages like English) and UTF-16 (more efficient for logographic languages like Japanese). Either of these encodings can handle text written in any combination of human languages. The best single decision you can make when handling multilingual data is to keep all of your data in one of these encodings: probably UTF-8, unless you live or do a lot of business in east Asia, in which case maybe UTF-16 with a byte-order mark.
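The efficiency trade-off is easy to measure. This sketch encodes two arbitrary sample strings both ways and counts the bytes; the samples and the function name byte_counts are chosen only for illustration.

```python
# Two arbitrary sample strings: one alphabetic, one written in kanji.
samples = {"English": "web services", "Japanese": "日本語"}

def byte_counts(text):
    """Size in bytes of `text` under each Unicode encoding."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" omits the byte-order mark, so this is the bare size.
        "utf-16": len(text.encode("utf-16-le")),
        # Python's "utf-16" codec prepends a two-byte byte-order mark.
        "utf-16 with BOM": len(text.encode("utf-16")),
    }

for language, text in samples.items():
    print(language, byte_counts(text))

# Either encoding round-trips text that mixes writing systems.
mixed = "web services / 日本語"
assert mixed.encode("utf-8").decode("utf-8") == mixed
assert mixed.encode("utf-16").decode("utf-16") == mixed
```

For these samples, UTF-8 needs one byte per English letter and three per kanji, while UTF-16 needs two bytes for every character.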

This might be as simple as making a decision when you start the project, or you may have to convert an existing database. You might have to install an encoding converter to work on incoming data, or write encoding detection code. (The Universal Encoding Detector is an excellent autodetection library for Python. It’s got a Ruby port, available as the chardet gem.) The work might be easy or difficult. But once you’re keeping all of this data in one of the Unicode encodings, most of your problems will be over. When your clients send you data in a weird encoding, you’ll be able to convert it to your chosen UTF-* encoding. If they send data that specifies no encoding at all, you’ll be able to guess its encoding and convert it, or reject it as unintelligible.
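The convert-guess-or-reject policy might look something like the sketch below. The function name to_utf8 and the fallback list are assumptions, and trying a fixed list of encodings in order is a much cruder form of guessing than the statistical detection a real autodetection library performs.

```python
def to_utf8(raw, declared_encoding=None,
            fallbacks=("utf-8", "windows-1252")):
    """Convert incoming bytes to UTF-8, or raise ValueError.

    Tries the encoding the client declared (if any), then each fallback
    in order. The fallback list is a hypothetical, crude "guess".
    """
    candidates = []
    if declared_encoding:
        candidates.append(declared_encoding)
    candidates.extend(fallbacks)
    for encoding in candidates:
        try:
            return raw.decode(encoding).encode("utf-8")
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding: try the next candidate.
            continue
    raise ValueError("unintelligible data: no candidate encoding fits")

# Data in a declared encoding is converted.
converted = to_utf8("café".encode("iso-8859-1"), "iso-8859-1")

# Data with no declared encoding is guessed: it is not valid UTF-8,
# so the Windows-1252 fallback is used.
guessed = to_utf8(b"caf\xe9")

print(converted, guessed)
```

Once the data has passed through a function like this, everything downstream can assume a single encoding.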

The other half of the equation is communicating with your clients: how do you tell them which encoding you’re using in your outgoing representations? Well, XML lets you specify a character encoding on the very first line:

<?xml version="1.0" encoding="UTF-8"?>

All but one of my recommended representation formats is based on XML, so that solves most of the problem. But there is an encoding problem with that one outlier, and there’s a further problem in the relationship between XML and HTTP.

XML and HTTP: Battle of the encodings

An XML document can and should define a character encoding in its first line, so that the client will know how to interpret the document. An HTTP response can and should specify a value for the Content-Type response header, so that the client knows it’s being given an XML document and not some other kind. But the Content-Type header can also specify a document character encoding with “charset,” and this encoding might conflict with the one the document itself declares.

Content-Type: application/xml; charset="ebcdic-fr-297+euro"

<?xml version="1.0" encoding="UTF-8"?>

Who wins? Surprisingly, HTTP’s character encoding takes precedence over the encoding in the document itself.[29] If the document says “UTF-8” and Content-Type says “ebcdic-fr-297+euro,” then extended French EBCDIC it is. Almost no one expects this kind of surprise, and most programmers write code first and check the RFCs later. The result is that the character encoding, as specified in Content-Type, tends to be unreliable. Some servers claim everything they serve is UTF-8, even though the actual documents say otherwise.

When serving XML documents, I don’t recommend going out of your way to send a character encoding as part of Content-Type. You can do it if you’re absolutely sure you’ve got the right encoding, but it won’t do much good. What’s really important is that you specify a document encoding. (Technically you can do without a document encoding if you’re using UTF-8, or UTF-16 with a byte-order mark. But if you have that much control over the data, you should be able to specify a document encoding.) If you’re writing a web service client, be aware that any character encoding specified in Content-Type may be incorrect. Use common sense to decide which encoding declaration to believe, rather than relying on a counterintuitive rule from an RFC a lot of people haven’t read.

Another note: when you serve XML documents, you should serve them with a media type of application/xml, not text/xml. If you serve a document as text/xml with no charset, the correct client behavior is to totally ignore the encoding specified in the XML document and interpret the XML document as US-ASCII.[30] Avoid these complications altogether by always serving XML as application/xml, and always specifying an encoding in the first line of the XML documents you generate.
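A client that wants to apply “common sense” first has to notice that the two declarations disagree. Here’s a rough sketch of that check. The function names are hypothetical, the XML sniffing only works for ASCII-compatible encodings, and the comparison is a plain case-insensitive string match, so aliases like “utf8” and “utf-8” would be reported as a conflict.

```python
import re
from typing import Optional


def http_charset(content_type: str) -> Optional[str]:
    """Pull the charset parameter, if any, out of a Content-Type value."""
    match = re.search(r'charset\s*=\s*"?([^";\s]+)"?', content_type, re.IGNORECASE)
    return match.group(1).lower() if match else None


def xml_declared_encoding(body: bytes) -> Optional[str]:
    """Read the encoding from the XML declaration on the first line."""
    match = re.match(rb'<\?xml[^>]*encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']', body)
    return match.group(1).decode("ascii").lower() if match else None


def encodings_conflict(content_type: str, body: bytes) -> bool:
    """True when both declarations are present and they name different encodings."""
    http = http_charset(content_type)
    xml = xml_declared_encoding(body)
    return http is not None and xml is not None and http != xml
```

What the client does once it detects a conflict is a judgment call; this sketch only detects the disagreement.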

The character encoding of a JSON document

I didn’t mention plain text in my list of recommended representation formats, mostly because plain text is not a structured format, but also because the lack of structure means there’s no way to specify the character encoding of “plain text.” JSON is a way of structuring plain text, but it doesn’t solve the character encoding problem. Fortunately, you don’t have to solve it yourself: just follow the standard convention.

RFC 4627 states that a JSON file must contain Unicode characters, encoded in one of the UTF-* encodings. Practically, this means either UTF-8, or UTF-16 with a byte-order mark. Plain US-ASCII will also work, since ASCII text happens to be valid UTF-8. Given this restriction, a client can determine the character encoding of a JSON document by looking at the first four bytes (the details are in RFC 4627), and there’s no need to specify an explicit encoding. You should follow this convention whenever you serve plain text, not just JSON.
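The first-four-bytes rule is short enough to sketch. RFC 4627 relies on the fact that a JSON text starts with two ASCII characters, so the pattern of zero bytes gives the encoding away. The function name below is my own, and this sketch assumes the document does not begin with a byte-order mark.

```python
import json


def detect_json_encoding(data: bytes) -> str:
    """Guess the UTF-* encoding of a JSON text from its first four bytes."""
    if len(data) < 4:
        return "utf-8"
    b = data[:4]
    if b[0] == 0 and b[1] == 0 and b[2] == 0:
        return "utf-32-be"   # 00 00 00 xx
    if b[0] == 0 and b[2] == 0:
        return "utf-16-be"   # 00 xx 00 xx
    if b[1] == 0 and b[2] == 0 and b[3] == 0:
        return "utf-32-le"   # xx 00 00 00
    if b[1] == 0 and b[3] == 0:
        return "utf-16-le"   # xx 00 xx 00
    return "utf-8"           # xx xx xx xx


def load_json(data: bytes):
    """Decode a JSON document using the detected encoding, then parse it."""
    return json.loads(data.decode(detect_json_encoding(data)))
```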

Prepackaged Control Flows

Not only does HTTP have a uniform interface, it has a standard set of response codes—possible ways a request can turn out. Though resources can be anything at all, they usually fall into a few broad categories: database tables and their rows, publications and the articles they publish, and so on. When you know what sort of resource a service exposes, you can often anticipate the possible responses to an HTTP request without knowing too much about the resource.

In one sense the standard HTTP response codes (see Appendix B) are just a suggested control flow: a set of instructions about what to do when you get certain kinds of requests. But that’s pretty vague advice, and we can do better. Here I present several prepackaged control flows: patterns that bring together advice about resource design, representation formats, and response codes to help you design real-world services.

General Rules

These snippets of control flow can be applied to almost any service. I can make very general statements about them because they have nothing to do with the actual nature of your resources. All I’m doing here is picking out a few important HTTP status codes and telling you when to use them.

You should be able to implement these rules as common code that runs before your normal request handling. In Example 7-11 I implemented most of them as Rails filters that run before certain actions, or as Ruby methods that short-circuit a request unless a certain condition is met.

If the client tries to do something without providing the correct authorization, send a response code of 401 (“Unauthorized”) along with instructions for correctly formatting the Authorization header.

If the client tries to access a URI that doesn’t correspond to any existing resource, send a response code of 404 (“Not Found”). The only possible exception is when the client is trying to PUT a new resource to that URI.

If the client tries to use a part of the uniform interface that a resource doesn’t support, send a response code of 405 (“Method Not Allowed”). This is the proper response when the client tries to DELETE a read-only resource.
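The three rules above can be sketched as one check that runs before the normal request handling. This isn’t the Rails code from Chapter 7; it’s a framework-free illustration in which the RESOURCES table, the precheck function, and its parameters are all hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical table mapping each existing resource to the methods it supports.
RESOURCES: Dict[str, Set[str]] = {
    "/reports/2007": {"GET", "HEAD"},                 # read-only resource
    "/bookmarks/1": {"GET", "HEAD", "PUT", "DELETE"},
}


def precheck(method: str, path: str, authorized: bool,
             creatable_by_put: bool = False) -> Optional[int]:
    """Return a status code that short-circuits the request, or None to continue."""
    if not authorized:
        return 401   # also send instructions for formatting the Authorization header
    if path not in RESOURCES:
        if method == "PUT" and creatable_by_put:
            return None   # the one exception: PUT may create a new resource here
        return 404
    if method not in RESOURCES[path]:
        return 405
    return None
```

A DELETE request to the read-only /reports/2007 resource gets a 405, while a GET request to the same URI falls through to the normal handler.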

Database-Backed Control Flow

In many web services there’s a strong connection between a resource and something in a SQL database: a row in the database, a table, or the database as a whole. These services are so common that entire frameworks like Rails are oriented to making them easy to write. Since these services are similar in design, it makes sense that their control flows should also be similar.

For instance, if an incoming request contains a nonsensical representation, the proper response is almost certainly 415 (“Unsupported Media Type”) or 400 (“Bad Request”). It’s up to the application to decide which representations make sense, but the HTTP standard is pretty strict about the possible responses to “nonsensical representation.”

With this in mind, I’ve devised a standard control flow for the uniform interface in a database-backed application. It runs on top of the general rules I mentioned in the previous section. I used this control flow in the controller code throughout Chapter 7. Indeed, if you look at the code in that chapter you’ll see that I implemented the same ideas multiple times. There’s space in the REST ecosystem for a higher-level framework that implements this control flow, or some improved version of it.

GET

If the resource can be identified, send a representation along with a response code of 200 (“OK”). Be sure to support conditional GET!

PUT

If the resource already exists, parse the representation and turn it into a series of changes to the state of this resource. If the changes would leave the resource in an incomplete or inconsistent state, send a response code of 400 (“Bad Request”).

If the changes would cause the resource state to conflict with some other resource, send a response code of 409 (“Conflict”). My social bookmarking service sends a response code of 409 if you try to change your username to a name that’s already taken.

If there are no problems with the proposed changes, apply them to the existing resource. If the changes in resource state mean that the resource is now available at a different URI, send a response code of 301 (“Moved Permanently”) and include the new URI in the Location header. Otherwise, send a response code of 200 (“OK”). Requests to the old URI should now result in a response code of 301 (“Moved Permanently”), 404 (“Not Found”), or 410 (“Gone”).

There are two ways to handle a PUT request to a URI that doesn’t correspond to any resource. You can return a status code of 404 (“Not Found”), or you can create a resource at that URI. If you want to create a new resource, parse the representation and use it to form the initial resource state. Send a response code of 201 (“Created”). If there’s not enough information to create a new resource, send a response code of 400 (“Bad Request”).

POST for creating a new resource

Parse the representation, pick an appropriate URI, and create a new resource there. Send a response code of 201 (“Created”) and include the URI of the new resource in the Location header. If there’s not enough information provided to create the resource, send a response code of 400 (“Bad Request”). If the provided resource state would conflict with some existing resource, send a response code of 409 (“Conflict”), and include a Location header that points to the problematic resource.

POST for appending to a resource

Parse the representation. If it doesn’t make sense, send a response code of 400 (“Bad Request”). Otherwise, modify the resource state so that it incorporates the information in the representation. Send a response code of 200 (“OK”).

DELETE

Send a response code of 200 (“OK”).
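To make the PUT rules concrete, here’s a small sketch that uses an in-memory dictionary in place of a database. The “user” resource, its URI layout, and the put_user function are assumptions made for illustration; the point is the order of the checks and the response codes they produce.

```python
from typing import Dict, Tuple

# Hypothetical in-memory "table" of users, keyed by URI.
users: Dict[str, Dict[str, str]] = {
    "/users/alice": {"name": "alice", "email": "alice@example.com"},
}


def put_user(uri: str, rep: Dict[str, str]) -> Tuple[int, Dict[str, str]]:
    """Apply a PUT request and return (status code, response headers)."""
    name, email = rep.get("name"), rep.get("email")
    if not name or not email:
        return 400, {}                       # incomplete resource state
    new_uri = "/users/" + name
    if new_uri != uri and new_uri in users:
        return 409, {}                       # that username is already taken
    if uri not in users:
        if new_uri != uri:
            return 400, {}                   # representation disagrees with the URI
        users[uri] = {"name": name, "email": email}
        return 201, {}                       # created a new resource
    del users[uri]
    users[new_uri] = {"name": name, "email": email}
    if new_uri != uri:
        return 301, {"Location": new_uri}    # the resource has moved
    return 200, {}
```

This sketch chooses to create a resource when the URI is unknown. A service that prefers the other option would return 404 at that point instead.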

The Atom Publishing Protocol

Earlier I described Atom as an XML vocabulary that describes the semantics of publishing: authors, summaries, categories, and so on. The Atom Publishing Protocol (APP) defines a set of resources that capture the process of publishing: posting a story to a site, editing it, assigning it to a category, deleting it, and so on.

The obvious applications for the APP are those for Atom and online publishing in general: weblogs, photo albums, content management systems, and the like. The APP defines four kinds of resources, specifies some of their behavior under the uniform interface, and defines the representation documents they should accept and serve. It says nothing about URI design or what data should go into the documents: that’s up to the individual application.

The APP takes HTTP’s uniform interface and puts a higher-level uniform interface on top of it. Many kinds of applications can conform to the APP, and a generic APP client should be able to access all of them. Specific applications can extend the APP by exposing additional resources, or making the APP resources expose more of HTTP’s uniform interface, but they should all support the minimal features mentioned in the APP standard.

The ultimate end of the APP is to serve Atom documents to the end user. Of course, the Atom documents are just the representations of underlying resources. The APP defines what those resources are. It defines two resources that correspond to Atom documents, and two that help the client find and modify APP resources.

Collections

An APP collection is a resource whose representation is an Atom feed. The document in Example 9-2 has everything it takes to be a representation of an APP collection. There’s no necessary difference between an Atom feed you subscribe to in your feed reader, and an Atom feed that you manipulate with an APP client. A collection is just a list or grouping of pieces of data: what the APP calls members. The APP is heavily oriented toward manipulating “collection” type resources.

The APP defines a collection’s response to GET and POST requests. GET returns a representation: the Atom feed. POST adds a new member to the collection, which (usually) shows up as a new entry in the feed. Maybe you can also DELETE a collection, or modify its settings with a PUT request. The APP doesn’t cover that part: it’s up to your application.

Members

An APP collection is a collection of members. A member corresponds roughly to an entry in an Atom feed: a weblog entry, a news article, or a bookmark. But a member can also be a picture, song, movie, or Word document: a binary format that can’t be represented in XML as part of an Atom document.

A client creates a member inside a collection by POSTing a representation of the member to the collection URI. This pattern should be familiar to you by now: the member is created as a subordinate resource of the collection. The server assigns the new member a URI. The response to the POST request has a response code of 201 (“Created”), and a Location header that lets the client know where to find the new resource.

Example 9-5 shows an Atom entry document: a representation of a member. This is the same sort of entry tag I showed you in Example 9-2, presented as a standalone XML document. POSTing this document to a collection creates a new member, which starts showing up as a child of the collection’s feed tag. A document like this one might be how the entry tag in Example 9-2 got where it is today.

Example 9-5. A sample Atom entry document, suitable for POSTing to a collection
<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom">
 <title>New Resource Will Respond to PUT, City Says</title>
 <summary>
   After long negotiations, city officials say the new resource
   being built in the town square will respond to PUT. Earlier
   criticism of the proposal focused on the city's plan to modify the 
   resource through overloaded POST.
 </summary>
 <category scheme="http://www.example.com/categories/RestfulNews" 
           term="local" label="Local news" />
</entry>
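From the client side, creating a member is an ordinary POST request. The sketch below builds that request with Python’s standard library but doesn’t send it, since the collection at www.example.com is only an example. The function name is hypothetical, and I’m assuming the Atom media type application/atom+xml for the Content-Type header.

```python
import urllib.request

ENTRY = b"""<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom">
 <title>New Resource Will Respond to PUT, City Says</title>
 <category scheme="http://www.example.com/categories/RestfulNews"
           term="local" label="Local news" />
</entry>"""


def build_member_post(collection_uri: str, entry_xml: bytes) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a new member."""
    return urllib.request.Request(
        collection_uri,
        data=entry_xml,
        method="POST",
        headers={"Content-Type": "application/atom+xml"},
    )


request = build_member_post("http://www.example.com/RestfulNews", ENTRY)
# Against a real APP server, urllib.request.urlopen(request) would send it;
# a successful response has code 201 and a Location header for the new member.
```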

Service document

This vaguely-named type of resource is just a grouping of collections. A typical move is to serve a single service document, listing all of your collections, as your service’s “home page.” A service document is an XML document written using a particular vocabulary, and its media type is application/atomserv+xml (see Example 9-6).

Example 9-6 shows a representation of a typical service document. It describes three collections. One of them is a weblog called “RESTful News,” which accepts a POST request if the representation is an Atom entry document like the one in Example 9-5. The other two are personal photo albums, which accept a POST request if the representation is an image file.

Example 9-6. A representation of a service document that describes three collections
<?xml version="1.0" encoding='utf-8'?>
<service xmlns="http://purl.org/atom/app#"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>Weblogs</atom:title>
    <collection href="http://www.example.com/RestfulNews">
      <atom:title>RESTful News</atom:title>
      <categories href="http://www.example.com/categories/RestfulNews" />
    </collection>
  </workspace>

  <workspace>
    <atom:title>Photo galleries</atom:title>
    <collection
        href="http://www.example.com/samruby/photos" >
      <atom:title>Sam's photos</atom:title>
      <accept>image/*</accept>
      <categories href="http://www.example.com/categories/samruby-photo" />
    </collection>

    <collection
        href="http://www.example.com/leonardr/photos" >
      <atom:title>Leonard's photos</atom:title>
      <accept>image/*</accept>
      <categories href="http://www.example.com/categories/leonardr-photo" />
    </collection>
  </workspace>
</service>

How do I know what kind of POST requests a collection will accept? From the accept tags. The accept tag works something like the HTTP Accept header, only in reverse. The Accept header is usually sent by the client with a GET request, to tell the server which representation formats the client understands. The accept tag is the APP server’s way of telling the client which incoming representations a collection will accept as part of a POST request that creates a new member.

My two photo gallery collections specify an accept of image/*. Those collections will only accept POST requests where the representation is an image. On the other hand, the RESTful News weblog doesn’t specify an accept tag at all. The APP default is to assume that a collection only accepts POST requests when the representation is an Atom entry document (like the one in Example 9-5). The accept tag defines what the collections are for: the weblog is for textual data, and the photo collections are for images.
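A generic client can work this out mechanically. Here’s a sketch that reads a trimmed-down version of the service document in Example 9-6 and lists each collection with the media types it accepts. The function name is mine, and the string I use to stand for the “Atom entry” default is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

NS = {"app": "http://purl.org/atom/app#", "atom": "http://www.w3.org/2005/Atom"}
DEFAULT_ACCEPT = "application/atom+xml"  # assumed label for "an Atom entry document"

SERVICE_DOC = """<service xmlns="http://purl.org/atom/app#"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>Weblogs</atom:title>
    <collection href="http://www.example.com/RestfulNews">
      <atom:title>RESTful News</atom:title>
    </collection>
  </workspace>
  <workspace>
    <atom:title>Photo galleries</atom:title>
    <collection href="http://www.example.com/leonardr/photos">
      <atom:title>Leonard's photos</atom:title>
      <accept>image/*</accept>
    </collection>
  </workspace>
</service>"""


def list_collections(service_xml: str) -> List[Tuple[str, str, List[str]]]:
    """Return (href, title, accepted media types) for every collection."""
    root = ET.fromstring(service_xml)
    found = []
    for coll in root.findall("app:workspace/app:collection", NS):
        title = coll.findtext("atom:title", default="", namespaces=NS)
        accepts = [a.text.strip() for a in coll.findall("app:accept", NS) if a.text]
        # No accept tag at all means the collection takes Atom entries only.
        found.append((coll.get("href"), title, accepts or [DEFAULT_ACCEPT]))
    return found
```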

The other important thing about a service document is the categories tag, which links to a “category document” resource. The category document says what categories are allowed.

The APP doesn’t say much about service documents. It specifies their representation format, and says that they must serve a representation in response to GET. It doesn’t specify how service documents get on the server in the first place. If you write an APP application you can hardcode your service documents in advance, or you can make it possible to create new ones by POSTing to some new resource not covered by the APP. You can expose them as static files, or you can make them respond to PUT and DELETE. It’s up to you.

Tip

As you can see from Example 9-6, a service document’s representation doesn’t just describe collections: it groups collections into workspaces. When I wrote that representation I put the weblog in a workspace of its own, and grouped the photo galleries into a second workspace. The APP standard devotes some time to workspaces, but I’m going to pass over them, because the APP doesn’t define workspaces as resources. They don’t have their own URIs, and they only exist as elements in the representation of a service document. You can expose workspaces as resources if you want. The APP doesn’t prohibit it, but it doesn’t tell you how to do it, either.

Category documents

APP members (which correspond to Atom entries) can be put into categories. In Chapter 7, I represented a bookmark’s tags with Atom categories. The Atom entry described in Example 9-5 put the entry into a category called “local.” Where did that category come from? Who says which categories exist for a given collection? This is the last big question the APP answers.

The Atom entry document in Example 9-5 gave its category a “scheme” of http://www.example.com/categories/RestfulNews. The representation of the RESTful News collection, in the service document, gave that same URI in its categories tag. That URI points to the final APP resource: a category document (see Example 9-7). A category document lists the category vocabulary for a particular APP collection. Its media type is application/atomcat+xml.

Example 9-7 shows a representation of the category document for the collection “RESTful News.” This category document defines three categories: “local,” “international,” and “lighterside,” which can be referenced in Atom entry documents like the one in Example 9-5.

Example 9-7. A representation of a category document
<?xml version="1.0" ?>
<app:categories
     xmlns:app="http://purl.org/atom/app#"
     xmlns="http://www.w3.org/2005/Atom"
     scheme="http://www.example.com/categories/RestfulNews"
     fixed="no">
 <category term="local" label="Local news"/>
 <category term="international" label="International news"/>
 <category term="lighterside" label="The lighter side of REST"/>
</app:categories>

The scheme is not fixed, meaning that it’s OK to publish members to the collection even if they belong to categories not listed in this document. This document might be used in an end-user application to show a selectable list of categories for a new “RESTful News” story.

As with service documents, the APP defines the representation format for a category document, but says nothing about how category documents are created, modified, or destroyed. It only defines GET on the category document resource. Any other operations (like automatically modifying the category document when someone files an entry under a new category) are up to you to define.
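Here’s a sketch of how a client might use a category document to decide whether a term is acceptable. It parses a copy of Example 9-7 and applies the rule described above: any term is fine unless the scheme is fixed. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"app": "http://purl.org/atom/app#", "atom": "http://www.w3.org/2005/Atom"}

CATEGORY_DOC = """<app:categories
     xmlns:app="http://purl.org/atom/app#"
     xmlns="http://www.w3.org/2005/Atom"
     scheme="http://www.example.com/categories/RestfulNews"
     fixed="no">
 <category term="local" label="Local news"/>
 <category term="international" label="International news"/>
 <category term="lighterside" label="The lighter side of REST"/>
</app:categories>"""


def category_allowed(category_xml: str, term: str) -> bool:
    """True if a member may be filed under the given category term."""
    root = ET.fromstring(category_xml)
    listed = {c.get("term") for c in root.findall("atom:category", NS)}
    is_fixed = root.get("fixed", "no") == "yes"
    # A listed term is always fine; an unlisted one only if the scheme isn't fixed.
    return term in listed or not is_fixed
```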

Binary documents as APP members

There’s one important wrinkle I’ve glossed over. It has to do with the “photo gallery” collections I described in Example 9-6. I said earlier that a client can create a new member in a photo gallery by POSTing an image file to the collection. But an image file can’t go into an Atom feed: it’s a binary document. What exactly happens when a client POSTs a binary document to an APP collection? What’s in those photo galleries, really?

Remember that a resource can have more than one representation. Each photo I upload to a photo collection has two representations. One representation is the binary photo, and the other is an XML document containing metadata. The XML document is an Atom entry, the same as the news item in Example 9-5, and that’s the data that shows up in the Atom feed.

Here’s an example. I POST a JPEG file to my “photo gallery” collection, like so:

POST /leonardr/photos HTTP/1.1
Host: www.example.com
Content-type: image/jpeg
Content-length: 62811
Slug: A picture of my guinea pig

[JPEG file goes here]

The Slug is a custom HTTP header defined by the APP, which lets me specify a title for the picture while uploading it. The slug can show up in several pieces of resource state, as you’ll see in a bit.

The HTTP response comes back as I described it in “Members” earlier in this chapter. The response code is 201 and the Location header gives me the URI of the newly created APP member.

201 Created
Location: http://www.example.com/leonardr/photos/my-guinea-pig.atom

But what’s at the other end of the URI? Not the JPEG file I uploaded, but an Atom entry document describing and linking to that file:

<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom">
 <title>A picture of my guinea pig</title>
 <updated>2007-01-24T11:52:29Z</updated>
 <id>urn:f1ef2e50-8ec8-0129-b1a7-003065546f18</id>
 <summary></summary>
 <link rel="edit-media" type="image/jpeg"
       href="http://www.example.com/leonardr/photos/my-guinea-pig.jpg" />
</entry>

The actual JPEG I uploaded is at the other end of that link. I can GET it, of course, and I can PUT to it to overwrite it with another image. My POST created a new “member” resource, and my JPEG is a representation of some of its resource state. But there’s also this other representation of resource state: the metadata. These other elements of resource state include:

  • The title, which I chose (the server decided to use my Slug as the title) and can change later.

  • The summary, which starts out blank but I can change.

  • The “last update” time, which I sort of chose but can’t change arbitrarily.

  • The URI to the image representation, which the server chose for me based on my Slug.

  • The unique ID, which the server chose without consulting me at all.

This metadata document can be included in an Atom feed: I’ll see it in the representation of the “photo gallery” collection. I can also modify this document and PUT it back to http://www.example.com/leonardr/photos/my-guinea-pig.atom to change the resource state. I can specify myself as the author, add categories, change the title, and so on. If I get tired of having this member in the collection, I can delete it by sending a DELETE request to either of its URIs.

That’s how the APP handles photos and other binary data as collection members. It splits the representation of the resource into two parts: the binary part that can’t go into an Atom feed and the metadata part that can. This works because the metadata of publishing (categories, summary, and so on) applies to photos and movies just as easily as to news articles and weblog entries.
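The client’s half of this exchange can be sketched in two steps: build the POST request that carries the Slug header, then dig the “edit-media” link out of the Atom entry that comes back. Nothing here is sent over the network. The function names are hypothetical, and the sample entry is a shortened copy of the one shown earlier.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Optional

ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
 <title>A picture of my guinea pig</title>
 <link rel="edit-media" type="image/jpeg"
       href="http://www.example.com/leonardr/photos/my-guinea-pig.jpg" />
</entry>"""


def build_photo_post(collection_uri: str, jpeg: bytes, slug: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that uploads a JPEG with a Slug."""
    return urllib.request.Request(
        collection_uri, data=jpeg, method="POST",
        headers={"Content-Type": "image/jpeg", "Slug": slug},
    )


def media_link(entry_xml: str) -> Optional[str]:
    """Return the URI of the binary representation, or None if there isn't one."""
    for element in ET.fromstring(entry_xml).iter():
        # Compare local names so the lookup works with or without a namespace.
        if element.tag.rsplit("}", 1)[-1] == "link" and element.get("rel") == "edit-media":
            return element.get("href")
    return None
```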

Tip

If you read the APP standard (which you should, since this section doesn’t cover everything), you’ll see that it describes this behavior in terms of two different resources: a “Media Link Entry,” whose representation is an Atom document, and a “Media Resource,” whose representation is a binary file. I’ve described one resource that has two representations. The difference is purely philosophical and has no effect on the actual HTTP requests and responses.

Summary

That’s a fairly involved workflow, and I haven’t even covered everything that the APP specifies, but the APP is just a well-thought-out way of handling a common web service problem: the list/feed/collection that keeps having items/elements/members added to it. If your problem fits this domain, it’s easier to use the APP design—and get the benefits of existing client support—than to reinvent something similar (see Table 9-1).

Table 9-1. APP resources and their methods
Resource | GET | POST | PUT | DELETE
Service document | Return a representation (XML) | Undefined | Undefined | Undefined
Category document | Return a representation (XML) | Undefined | Undefined | Undefined
Collection | Return a representation (Atom feed) | Create a new member | Undefined | Undefined
Member | Return the representation identified by this URI. (This is usually an Atom entry document, but it might be a binary file.) | Undefined | Update the representation identified by this URI | Delete the member

GData

I said earlier that the Atom Publishing Protocol defines only a few resources and only a few operations on those resources. It leaves a lot of space open for extension. One extension is Google’s GData, which adds a new kind of resource and some extras like an authorization mechanism. As of the time of writing, the Google properties Blogger, Google Calendar, Google Code Search, and Google Spreadsheets all expose RESTful web service interfaces. In fact, all four expose the same interface: the Atom Publishing Protocol with the GData extensions.

Unless you work for Google, you probably won’t create any services that expose the precise GData interface, but you may encounter GData from the client side. It’s also useful to see how the APP can be extended to handle common cases. See how Google used the APP as a building block, and you’ll see how you can do the same thing.

Querying collections

The biggest change GData makes is to expose a new kind of resource: the list of search results. The APP says what happens when you send a GET request to a collection’s URI. You get a representation of some of the members in the collection. The APP doesn’t say anything about finding specific subsets of the collection: finding members older than a certain date, written by a certain author, or filed under a certain category. It doesn’t specify how to do full-text search of a member’s text fields. GData fills in these blanks.

GData takes every APP collection and exposes an infinite number of additional resources that slice it in various ways. Think back to the “RESTful News” APP collection I showed in Example 9-2. The URI to that collection was http://www.example.com/RestfulNews. If that collection were exposed through a GData interface, rather than just an APP interface, the following URIs would also work:

  • http://www.example.com/RestfulNews?q=stadium: A subcollection of the members where the content contains the word “stadium.”

  • http://www.example.com/RestfulNews/-/local: A subcollection of the members categorized as “local.”

  • http://www.example.com/RestfulNews?author=Tom%20Servo&max-results=50: At most 50 of the members where the author is “Tom Servo.”

Those are just three of the search possibilities GData exposes. (For a complete list, see the GData developer’s guide. Note that not all GData applications implement all query mechanisms.) Search results are usually represented as Atom feeds. The feed contains an entry element for every member of the collection that matched the query. It also contains OpenSearch elements (q.v.) that specify how many members matched the query, and how many members fit on a page of search results.
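A client can construct these URIs with a few lines of code. This sketch reproduces the three example URIs above; the query_uri function and its parameters are my own, and it covers only the category path and query-string mechanisms shown here.

```python
from typing import Dict, Optional
from urllib.parse import quote, urlencode


def query_uri(collection_uri: str, category: Optional[str] = None,
              params: Optional[Dict[str, object]] = None) -> str:
    """Build the URI of a subcollection from a category and query parameters."""
    uri = collection_uri
    if category:
        uri += "/-/" + quote(category, safe="")
    if params:
        # quote_via=quote encodes a space as %20 rather than "+".
        uri += "?" + urlencode(params, quote_via=quote)
    return uri
```

For instance, query_uri("http://www.example.com/RestfulNews", params={"author": "Tom Servo", "max-results": 50}) yields the third URI in the list.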

Data extensions

I mentioned earlier that an Atom feed can contain markup from arbitrary other XML namespaces. In fact, I just said that GData search results include elements from the OpenSearch namespace. GData also defines a number of new XML elements in its own “gd” namespace, for representing domain-specific data from the Google web services.

Consider an event in the Google Calendar service. The collection is someone’s calendar and the member is the event itself. This member probably has the typical Atom fields: an author, a summary, a “last updated” date. But it’s also going to have calendar-specific data. When does the event take place? Where will it happen? Is it a one-time event or does it recur?

Google Calendar’s GData API puts this data in its Atom feeds, using tags like gd:when, gd:who, and gd:recurrence. If the client understands Google Calendar’s extensions it can act as a calendar client. If it only understands the APP, it can act as a general APP client. If it only understands the basic Atom feed format, it can treat the list of events as an Atom feed.

POST Once Exactly

POST requests are the fly in the ointment that is reliable HTTP. GET, PUT, and DELETE requests can be resent if they didn’t go through the first time, because of the restrictions HTTP places on those methods. GET requests have no serious side effects, and PUT and DELETE have the same effect on resource state whether they’re sent once or many times. But a POST request can do anything at all, and sending a POST request twice will probably have a different effect from sending it once. Of course, if a service committed to accepting only POST requests whose actions were safe or idempotent, it would be easy to make reliable HTTP requests to that service.

POST Once Exactly (POE) is a way of making HTTP POST idempotent, like PUT and DELETE. If a resource supports POST Once Exactly, then it will only respond successfully to POST once over its entire lifetime. All subsequent POST requests will give a response code of 405 (“Method Not Allowed”). A POE resource is a one-off resource exposed for the purpose of handling a single POST request.

Tip

POE was defined by Mark Nottingham in an IETF draft that expired in 2005. I think POE was a little ahead of its time, and if real services start implementing it, there could be another draft.

You can see the original standard at http://www.mnot.net/drafts/draft-nottingham-http-poe-00.txt.

Think of a “weblog” resource that responds to POST by creating a new weblog entry. How would we change this design so that no resource responds to POST more than once? Clearly the weblog can’t expose POST anymore, or there could only ever be one weblog entry. Here’s how POE does it. The client sends a GET or HEAD request to the “weblog” resource, and the request includes the special POE header:

HEAD /weblogs/myweblog HTTP/1.1
Host: www.example.com
POE: 1

The response contains the URI to a POE resource that hasn’t yet been POSTed to. This URI is nothing more than a unique ID for a future POST request. It probably doesn’t even exist on the server. Remember that GET and HEAD are safe operations, so the original request couldn’t have changed any server state.

200 OK
POE-Links: /weblogs/myweblog/entry-factory-104a4ed

POE and POE-Links are custom HTTP headers defined by the POE draft. POE just tells the server that the client is expecting a link to a POE resource. POE-Links gives one or more links to POE resources. At this point the client can POST a representation of its new weblog entry to /weblogs/myweblog/entry-factory-104a4ed. After the POST goes through, that URI will start responding to POST with a response code of 405 (“Method Not Allowed”). If the client isn’t sure whether or not the POST request went through, it can safely resend. There’s no possibility that the second POST will create a second weblog entry. POST has been rendered idempotent.
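Here’s a toy server-side model of that exchange, to show how little state POE needs. The Weblog class and its methods are hypothetical, and two details are my own assumptions rather than anything from the draft: the first POST answers with 201 (“Created”), and the server accepts any POE URI without checking that it actually issued it.

```python
import uuid
from typing import Dict, List, Set


class Weblog:
    """Toy model of a weblog that only creates entries through POE resources."""

    def __init__(self, uri: str) -> None:
        self.uri = uri
        self.entries: List[str] = []
        self.used_poe_uris: Set[str] = set()

    def head(self, poe_requested: bool) -> Dict[str, str]:
        """Safe operation: hands out a fresh URI but stores nothing."""
        if not poe_requested:
            return {}
        return {"POE-Links": self.uri + "/entry-factory-" + uuid.uuid4().hex}

    def post(self, poe_uri: str, entry: str) -> int:
        """Create an entry the first time; refuse every later POST to the same URI."""
        if poe_uri in self.used_poe_uris:
            return 405
        self.used_poe_uris.add(poe_uri)
        self.entries.append(entry)
        return 201
```

A client that never saw the first response can resend the same POST; it gets a 405 back, and the weblog still contains exactly one copy of the entry.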

The nice thing about POST Once Exactly is that it works with overloaded POST. Even if you’re using POST in a way that totally violates the Resource-Oriented Architecture, your clients can use HTTP as a reliable protocol if you expose the overloaded POST operations through POE.

An alternative to making POST idempotent is to get rid of POST altogether. Remember, POST is only necessary when the client doesn’t know which URI it should PUT to. POE works by generating a unique ID for each of the client’s POST operations. If you allow clients to generate their own unique IDs, they can use PUT instead. You can get the benefits of POE without exposing POST at all. You just need to make sure that two clients will never generate the same ID.
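The PUT-based alternative is even simpler to sketch. In this illustration the client picks a random UUID for its new entry, which makes collisions between clients extremely unlikely. The URI layout and both function names are assumptions of mine.

```python
import uuid
from typing import Dict


def new_entry_uri(weblog_uri: str) -> str:
    """Client side: choose a URI for the new entry from a random UUID."""
    return weblog_uri + "/entries/" + str(uuid.uuid4())


def put_entry(store: Dict[str, str], uri: str, entry: str) -> int:
    """Server side: create or overwrite the entry at this URI."""
    created = uri not in store
    store[uri] = entry
    return 201 if created else 200
```

If the client resends the PUT because it never saw the first response, the second request overwrites the entry with identical data and no duplicate is created.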

Hypermedia Technologies

There are two kinds of hypermedia: links and forms. A link is a connection between the current resource and some target resource, identified by its URI. Less formally, a link is any URI found in the body of a representation. Even JSON and plain text are hypermedia formats of a sort, since they can contain URIs in their text. But throughout this book when I say “hypermedia format,” I mean a format with some kind of structured support for links and forms.

There are two kinds of forms. The simplest kind I’ll call application forms, because they show the client how to manipulate application state. An application form is a way of handling resources whose names follow a pattern: it basically acts as a link with more than one destination. A search engine doesn’t link to every search you might possibly make: it gives you a form with a space for you to type in your search query. When you submit the form, your browser constructs a URI from what you typed into the form (say, http://www.google.com/search?q=jellyfish), and makes a GET request to that URI. The application form lets one resource link to an infinite number of others, without requiring an infinitely large representation.

The second kind of form I’ll call resource forms, because they show the client how to format a representation that modifies the state of a resource. GET and DELETE requests don’t need representations, of course, but POST and PUT requests often do. Resource forms say what the client’s POST and PUT representations should look like.

Links and application forms implement what I call connectedness, and what the Fielding thesis calls “hypermedia as the engine of application state.” The client is in charge of the application state, but the server can send links and forms that suggest possible next states. By contrast, a resource form is a guide to changing the resource state, which is ultimately kept on the server.

I cover four hypermedia technologies in this section. As of the time of writing, XHTML 4 is the only hypermedia technology in active use. But this is a time of rapid change, thanks in part to growing awareness of RESTful web services. XHTML 5 is certain to be widely used once it’s finally released. My guess is that URI Templates will also catch on, whether or not they’re incorporated into XHTML 5. WADL may catch on, or it may be supplanted by a combination of XHTML 5 and microformats.

URI Templates

URI Templates (currently an Internet Draft) are a technology that makes simple application forms look like links. I’ve used URI Template syntax whenever I want to show you an infinite variety of similar URIs. There was this example from Chapter 3, when I was showing you the resources exposed by Amazon’s S3 service:

https://s3.amazonaws.com/{name-of-bucket}/{name-of-object}

That string is not a valid URI, because curly brackets aren’t valid in URIs, but it is a valid URI Template. The substring {name-of-bucket} is a blank to be filled in, a placeholder to be replaced with the value of the variable name-of-bucket. There are an infinite number of URIs lurking in that one template, including https://s3.amazonaws.com/bucket1/object1, https://s3.amazonaws.com/my-other-bucket/subdir/SomeObject.avi, and so on.

URI templating gives us a precise way to play fill-in-the-blanks with URIs. Without URI Templates, a client must rely on preprogrammed URI construction rules based on English descriptions like “https://s3.amazonaws.com/, and then the bucket name.”
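To show how mechanical the fill-in-the-blanks game is, here’s a minimal sketch of template expansion. It handles only the plain {variable} substitution shown above, not the whole URI Templates draft, and expand_template is a name I made up. I’ve assumed every value should be percent-encoded, which means a slash inside a value comes out as %2F.

```ruby
require 'erb'

# Minimal fill-in-the-blanks expansion: replaces each {name} with the
# percent-encoded value of that variable. Raises KeyError if a variable
# is missing. This is a sketch, not a full URI Template processor.
def expand_template(template, variables)
  template.gsub(/\{([^{}]+)\}/) do
    name = Regexp.last_match(1)
    ERB::Util.url_encode(variables.fetch(name).to_s)
  end
end

puts expand_template(
  'https://s3.amazonaws.com/{name-of-bucket}/{name-of-object}',
  'name-of-bucket' => 'bucket1',
  'name-of-object' => 'object1'
)
```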

URI Templates are not a data format, but any data format can improve its hypermedia capabilities by allowing them. There is currently a proposal to support URI Templates in XHTML 5, and WADL supports them already.

XHTML 4

HTML is the most successful hypermedia format of all time, but its success on the human web has typecast it as sloppy, and sent practitioners running for the more structured XML. The compromise standard is XHTML, an XML vocabulary for describing documents which uses the same tags and attributes found in HTML. Since it’s basically the same as HTML, XHTML has a powerful set of hypermedia features, though its forms are somewhat anemic.

XHTML 4 links

A number of HTML tags can be used to make hypertext links (consider img, for example), but the two main ones are link and a. A link tag shows up in the document’s head, and connects the document to some resource. The link tag contains no text or other tags: it applies to the entire document. An a tag shows up in the document’s body. It can contain text and other tags, and it links its contents (not the document as a whole) to another resource (see Example 9-8).

Example 9-8. An XHTML 4 document with some links
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
 <head>
  <link rel="alternate" type="application/atom+xml" href="atom.xml" />
  <link rel="stylesheet" href="display.css" />
 </head>

 <body>
  <p>
   Have you read 
   <a href="Great-Expectations.html"><i>Great Expectations</i></a>?
  </p>
 </body>
</html>

Example 9-8 shows a simple HTML document that contains both sorts of hyperlinks. There are two links that use link to relate the document as a whole to other URIs, and there’s one link that uses a to relate part of the document (the italicized phrase “Great Expectations”) to another URI.

The three important attributes of link and a tags are href, rel, and rev. The href attribute is the most important: it gives the URI of the resource that’s being linked to. If you don’t have an href attribute, you don’t have a hyperlink.

The rel attribute adds semantics that explain the foreign URI’s relationship to this document. I mentioned this attribute earlier when I was talking about microformats. In Example 9-8, the relationship of the URI atom.xml to this document is “alternate”. The relationship of the URI display.css to this document is “stylesheet”. These particular values for rel are among the 15 defined in the HTML 4 standard. The value “alternate” means that the linked URI is an alternate representation of the resource this document represents. The value “stylesheet” means that the linked URI contains instructions on how to format this document for display. Microformats often define additional values for rel. The rel-nofollow microformat defines the relationship “nofollow”, to show that a document doesn’t trust the resource it’s linking to.

The rev attribute is the exact opposite of rel: it explains the relationship of this document to the foreign URI. The VoteLinks microformat lets you express your opinion of a URI by setting rev to “vote-for” or “vote-against”. In this case, the foreign URI probably has no relationship to you, but you have a relationship to it.

A simple example illustrates the difference between rel and rev. Here’s an HTML snippet of a user’s home page, which contains two links to his father’s home page.

<a rel="parent" href="/Dad">My father</a>
<a rev="child" href="/Dad">My father</a>
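Both links make the same statement, seen from opposite ends. A short sketch makes this visible by turning each link into a sentence. The describe_links method and the /Me URI for the user’s own page are assumptions for the example, and I’ve wrapped the two links in a p tag so that the snippet parses as XML.

```ruby
require 'rexml/document'

HOME_PAGE_SNIPPET = <<~XHTML
  <p>
    <a rel="parent" href="/Dad">My father</a>
    <a rev="child" href="/Dad">My father</a>
  </p>
XHTML

# Turns each link into a sentence about the two documents it relates.
# this_uri is a hypothetical URI for the page that contains the links.
def describe_links(xhtml, this_uri)
  doc = REXML::Document.new(xhtml)
  sentences = []
  REXML::XPath.match(doc, '//a').each do |a|
    href = a.attributes['href']
    rel = a.attributes['rel']
    rev = a.attributes['rev']
    # rel describes the foreign URI's relationship to this document.
    sentences << "#{href} is the #{rel} of #{this_uri}" if rel
    # rev describes this document's relationship to the foreign URI.
    sentences << "#{this_uri} is the #{rev} of #{href}" if rev
  end
  sentences
end

puts describe_links(HOME_PAGE_SNIPPET, '/Me')
```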

XHTML 4 forms

These are the forms that drive the human web. You might not have known about the rel and rev attributes, but if you’ve done any web programming, you should be familiar with the hypermedia capabilities of XHTML forms.

To recap what you might already know: HTML forms are described with the form tag. A form tag has a method attribute, which names the HTTP method the client should use when submitting the form. It has an action attribute, which gives the (base) URI of the resource the form is accessing. It also has an enctype attribute, which gives the media type of any representation the client is supposed to send along with the request.

A form tag can contain form elements: children like input and select tags. These show up in web browsers as GUI elements: text inputs, checkboxes, buttons, and the like. In application forms, the values entered into the form elements are used to construct the ultimate destination of a GET request. Here’s an application form I just made up: an interface to a search engine.

<form method="GET" action="http://search.example.com/search">
 <input name="q" type="text" />
 <input type="submit" />
</form>

Since this is an application form, it’s not designed to operate on any particular resource. The point of the form is to use the URI in the action as a jumping-off point to an infinity of resources with user-generated URIs: http://search.example.com/search?q=jellyfish, http://search.example.com/search?q=chocolate, and so on.
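A programmed client that understands this form does the same thing a browser does on submit. Here’s a small sketch of that step, where submit_get_form is a hypothetical helper name and Ruby’s standard URI library does the form-encoding.

```ruby
require 'uri'

# What a client does when it "submits" a GET form: form-encode the
# key-value pairs and attach them to the action URI as its query string.
def submit_get_form(action, fields)
  uri = URI.parse(action)
  uri.query = URI.encode_www_form(fields)
  uri.to_s
end

puts submit_get_form('http://search.example.com/search', 'q' => 'jellyfish')
puts submit_get_form('http://search.example.com/search', 'q' => 'chocolate')
```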

A resource form in HTML 4 identifies one particular resource, and it specifies a method of POST. The form elements are used to build up a representation to be sent along with the POST request. Here’s a resource form I just made up: an interface to a file upload script.

<form method="POST" action="http://files.example.com/dir/subdir/" 
enctype="multipart/form-data">
 <input type="text" name="description" />
 <input type="file" name="newfile" />
</form>

This form is designed to manipulate resource state, to create a new “file” resource as a subordinate resource of the “directory” resource at http://files.example.com/dir/subdir/. The representation format is a “multipart/form-data” document that contains a textual description and a (possibly binary) file.
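Here’s a sketch of how a client might turn that form into an HTTP request using Ruby’s Net::HTTP library. The build_upload_request helper, the description text, and the filename are all made up for the example. The request is built but never sent. Net::HTTP generates the multipart body and its boundary only when the request actually goes out.

```ruby
require 'net/http'
require 'stringio'
require 'uri'

# Builds (but does not send) the POST request the upload form describes:
# a multipart/form-data document with a text field and a file field.
def build_upload_request(action, description, file_io, filename)
  request = Net::HTTP::Post.new(URI.parse(action))
  request.set_form(
    [['description', description],
     ['newfile', file_io, { filename: filename }]],
    'multipart/form-data'
  )
  request
end

request = build_upload_request(
  'http://files.example.com/dir/subdir/',
  'A test file', StringIO.new('file contents'), 'test.txt'
)
puts "#{request.method} #{request.path}"
puts request.content_type
```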

Shortcomings of XHTML 4

HTML 4’s hypermedia features are obviously good enough to give us the human web we enjoy today, but they’re not good enough for web services. I have five major problems with HTML’s forms.

  1. Application forms are limited in the URIs they can express. You’re limited to URIs that take a base URI and then tack on some key-value pairs. With an HTML application form you can “link” to http://search.example.com/search?q=jellyfish, but not http://search.example.com/search/jellyfish. The variables must go into the URI’s query string as key-value pairs.

  2. Resource forms in HTML 4 are limited to using HTTP POST. There’s no way to use a form to tell a client to send a DELETE request, or to show a client what the representation of a PUT request should look like. The human web, which runs on HTML forms, has a different uniform interface from web services as a whole. It uses GET for safe operations, and overloaded POST for everything else. If you want to get HTTP’s uniform interface with HTML 4 forms, you’ll need to simulate PUT and DELETE with overloaded POST (see “Faking PUT and DELETE” in Chapter 8 for the standard way).

  3. There’s no way to use an HTML form to describe the HTTP headers a client should send along with its request. You can’t define a form entity and say “the value of this entity goes into the HTTP request header X-My-Header.” I generally don’t think services should require this of their clients, but sometimes it’s necessary. The Atom Publishing Protocol defines a special request header (Slug, mentioned above) for POST requests that create a new member in a collection. The APP designers defined a new header, instead of requiring that this data go into the entity-body, because the entity-body might be a binary file.

  4. You can’t use an HTML form to specify a representation more complicated than a set of key-value pairs. All the form elements are designed to be turned into key-value pairs, except for the “file” element, which doesn’t help much. The HTML standard defines two content types for form representations: application/x-www-form-urlencoded, which is for key-value pairs (I covered it in “Form-encoding” in Chapter 6); and multipart/form-data, which is for a combination of key-value pairs and uploaded files.

    You can specify any content type you want in enctype, just as you can put anything you want in a tag’s class and rel attributes. So you can tell the client it should POST an XML file by setting a form’s enctype to application/xml. But there’s no way of conveying what should go into that XML file, unless it happens to be an XML representation of a bunch of key-value pairs. You can’t nest form elements, or define new ones that represent data structures more complex than key-value pairs. (You can do a little better if the XML vocabulary you’re using has its own media type, like application/atom+xml or application/rdf+xml.)

  5. As I mentioned in “Link the Resources to Each Other” in Chapter 5, you can’t define a repeating field in an HTML form. You can define the same field twice, or ten times, but eventually you’ll have to stop. There’s no way to tell the client: “you can specify as many values as you want for this key-value pair.”

XHTML 5

HTML 5 solves many of the problems that turn up when you try to use HTML on the programmable web. The main problem with HTML 5 is the timetable. The official estimate has HTML 5 being adopted as a W3C Proposed Recommendation in late 2008. More conservative estimates push that date all the way to 2022. Either way, HTML 5 won’t be a standard by the time this book is published. That’s not really the issue, though. The issue is when real clients will start supporting the HTML 5 features I describe below. Until they do, if you use the features of HTML 5, your clients will have to write custom code to interpret them.

HTML 5 forms support all four basic methods of HTTP’s uniform interface: GET, POST, PUT, and DELETE. I took advantage of this when designing my map application, if you’ll recall Example 6-3. This is the easiest HTML 5 feature to support today, especially since (as I’ll show in Chapter 11) most web browsers can already make PUT and DELETE requests.

There’s a proposal (not yet incorporated into HTML 5; see http://blog.welldesignedurls.org/2007/01/11/proposing-uri-templates-for-webforms-2/) that would allow forms to use URI Templates. Under this proposal, an application form can have its template attribute (not its action attribute) be a URI Template like http://search.example.com/search/{q}. It could then define q as a text field within the form. This would let you use an application form to “link” to http://search.example.com/search/jellyfish.

HTML 4 forms can specify more than one form element with the same name. This lets clients know they can submit the same key with 2 or 10 values: as many values as there are form elements. HTML 5 forms support the “repetition model,” a way of telling the client it’s allowed to submit the same key as many times as it wants. I used a simple repetition block in Example 5-11.

Finally, HTML 5 defines two new ways of serializing key-value pairs into representations: as plain text, or using a newly defined XML vocabulary. The content type for the latter is application/x-www-form+xml. This is not as big an advance as you might think. Form entities like input are still ways of getting data in the form of key-value pairs. These new serialization formats are just new ways of representing those key-value pairs. There’s still no way to show the client how to format a more complicated representation, unless the client can figure out the format from just the content type.

WADL

The Web Application Description Language is an XML vocabulary for expressing the behavior of HTTP resources (see the development site for the Java client). It was named by analogy with the Web Service Description Language, a different XML vocabulary used to describe the SOAP-based RPC-style services that characterize Big Web Services.

Look back to “Service document” earlier in this chapter where I describe the Atom Publishing Protocol’s service documents. The representation of a service document is an XML document, written in a certain vocabulary, which describes a set of resources (APP collections) and the operations you’re allowed to perform on those resources. WADL is a standard vocabulary that can do for any resource at all what APP service documents do for APP collection resources.

You can provide a WADL file that describes every resource exposed by your service. This corresponds roughly to a WSDL file in a SOAP/WSDL service, and to the “site map” pages you see on the human web. Alternatively, you can embed a snippet of WADL in an XML representation of a particular resource, the way you might embed an HTML form in an HTML representation. The WADL snippet tells you how to manipulate the state of the resource.

As I said way back in Chapter 2, WADL makes it easy to write clients for web services. A WADL description of a resource can stand in for any number of programming-language interfaces to that resource: all you need is a WADL client written in the appropriate language. WADL abstracts away the details of HTTP requests, and the building and parsing of representations, without hiding HTTP’s uniform interface.

As of the time of writing, WADL is more talked about than used. There’s a Java client implementation, a rudimentary Ruby client, and that’s about it. Most existing WADL files are bootleg descriptions of other people’s RESTful and REST-RPC services.

WADL does better than HTML 5 as a hypermedia format. It supports URI Templates and every HTTP method there is. A WADL file can also tell the client to populate certain HTTP headers when it makes a request. More importantly, WADL can describe representation formats that aren’t just key-value pairs. You can specify the format of an XML representation by pointing to a schema definition. Then you can point out which parts of the document are most important by specifying key-value pairs where the “keys” are XPath statements. This is a small step, but an important one. With HTML you can only specify the format of an XML representation by giving it a different content type.

Of course, the “small step” only applies to XML. You can use WADL to say that a certain resource serves or accepts a JSON document, but unless that JSON document happens to be a hash (key-value pairs again!), there’s no way to specify what the JSON document ought to look like. This is a general problem which was solved in the XML world with schema definitions. It hasn’t been solved for other formats.

Describing a del.icio.us resource

Example 9-9 shows a Ruby client for the del.icio.us web service based on Ruby’s WADL library. It’s a reprint of the code from “Clients Made Easy with WADL” in Chapter 2.

Example 9-9. A Ruby/WADL client for del.icio.us
#!/usr/bin/ruby
# delicious-wadl-ruby.rb
require 'wadl'

if ARGV.size != 2
  puts "Usage: #{$0} [username] [password]"
  exit
end
username, password = ARGV

# Load an application from the WADL file
delicious = WADL::Application.from_wadl(open("delicious.wadl"))

# Give authentication information to the application
service = delicious.v1.with_basic_auth(username, password)

begin
  # Find the "recent posts" functionality
  recent_posts = service.posts.recent

  # For every recent post...
  recent_posts.get.representation.each_by_param('post') do |post|
    # Print its description and URI.
    puts "#{post.attributes['description']}: #{post.attributes['href']}"
  end
rescue WADL::Faults::AuthorizationRequired
  puts "Invalid authentication information!"
end

The code’s very short but you can see what’s happening, especially now that we’re past Chapter 2 and I’ve shown you how resource-oriented services work. The del.icio.us web service exposes a resource that the WADL library identifies with v1. That resource has a subresource identified by posts.recent. If you recall the inner workings of del.icio.us from Chapter 2, you’ll recognize this as corresponding to the URI https://api.del.icio.us/v1/posts/recent. When you tell the WADL library to make a GET request to that resource, you get back some kind of response object which includes an XML representation. Certain parts of this representation, the posts, are especially interesting, and I process them as XML elements, extracting their descriptions and hrefs.

Let’s look at the WADL file that makes this code possible. I’ve split it into three sections: resource definition, method definition, and representation definition. Example 9-10 shows the resource definition. I’ve defined a nested set of WADL resources: recent inside posts inside v1. The recent WADL resource corresponds to the HTTP resource the del.icio.us API exposes at https://api.del.icio.us/v1/posts/recent.

Example 9-10. WADL file for del.icio.us: the resource
<?xml version="1.0"?>
<!-- This is a partial bootleg WADL file for the del.icio.us API. -->

<application xmlns="http://research.sun.com/wadl/2006/07">
 
  <!-- The resource -->
  <resources base="https://api.del.icio.us/">
    <doc xml:lang="en" title="The del.icio.us API v1">
      Post or retrieve your bookmarks from the social networking website.
      Limit requests to one per second.
    </doc>
    
    <resource path="v1">
      <param name="Authorization" style="header" required="true">
	<doc xml:lang="en">All del.icio.us API calls must be authenticated
	using Basic HTTP auth.</doc>
      </param>

      <resource path="posts">
	<resource path="recent">
	  <method href="#getRecentPosts" />
	</resource>
      </resource>     
    </resource>
  </resources>

That HTTP resource exposes a single method of the uniform interface (GET), so I define a single WADL method inside the WADL resource. Rather than define the method inside the resource tag and clutter up Example 9-10, I’ve defined it by reference. I’ll get to it next.

Every del.icio.us API request must include an Authorization header that encodes your del.icio.us username and password using HTTP Basic Auth. I’ve represented this with a param tag that tells the client it must provide an Authorization header. The param tag is the equivalent of an HTML form element: it tells the client about a blank to be filled in.[31]
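To give a feel for what a WADL client does with the resource definition, here’s a sketch that walks the nested resource tags and works out the full URI of every resource that declares a method. It’s not how the WADL library used in Example 9-9 is implemented. The method names are my own, and I’ve trimmed the WADL down to the resource tags.

```ruby
require 'rexml/document'

WADL_RESOURCES = <<~XML
  <application xmlns="http://research.sun.com/wadl/2006/07">
    <resources base="https://api.del.icio.us/">
      <resource path="v1">
        <resource path="posts">
          <resource path="recent">
            <method href="#getRecentPosts" />
          </resource>
        </resource>
      </resource>
    </resources>
  </application>
XML

# Recursively collects [uri, method references] for every nested resource
# that declares at least one method.
def collect_resources(element, base, found = [])
  element.elements.each do |child|
    next unless child.name == 'resource'
    uri = base.chomp('/') + '/' + child.attributes['path']
    hrefs = child.elements.select { |e| e.name == 'method' }
                          .map { |m| m.attributes['href'] }
    found << [uri, hrefs] unless hrefs.empty?
    collect_resources(child, uri, found)
  end
  found
end

# Parses a WADL document and lists the resources under each resources tag.
def wadl_resource_uris(wadl)
  doc = REXML::Document.new(wadl)
  results = []
  doc.root.elements.each do |resources|
    next unless resources.name == 'resources'
    collect_resources(resources, resources.attributes['base'], results)
  end
  results
end

wadl_resource_uris(WADL_RESOURCES).each do |uri, hrefs|
  puts "#{uri} supports #{hrefs.join(', ')}"
end
```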

Example 9-11 shows the definition of the method getRecentPosts. A WADL method corresponds to a request you might make using HTTP’s uniform interface. The id of the method can be anything, but its name is always the name of an HTTP method: here, “GET”. The method definition models both the HTTP request and response.

Example 9-11. WADL file for del.icio.us: the method
  <!-- The method -->
  <method id="getRecentPosts" name="GET">

    <doc xml:lang="en" title="Returns a list of the most recent posts." />
    
    <request>
      <param name="tag" style="form">
	<doc xml:lang="en" title="Filter by this tag." />
      </param>
      
      <param name="count" style="form" default="15">
	<doc xml:lang="en" title="Number of items to retrieve.">
	  Maximum: 100
	</doc>
      </param>
    </request>
    
    <response>
      <representation href="#postList" />
      <fault id="AuthorizationRequired" status="401" />
    </response>
  </method>

This particular request defines two more params: two more blanks to be filled in by the client. Their style is “form”, which in a GET request means they act as query params: they’ll be tacked onto the query string, just like elements in an HTML form would be. These param definitions make it possible for the WADL client to access URIs like https://api.del.icio.us/v1/posts/recent?count=100 and https://api.del.icio.us/v1/posts/recent?tag=rest&count=20.

This WADL method defines an application form: not a way of manipulating resource state, but a pointer to possible new application states. This method tag tells the client about an infinite number of GET requests they can make to a set of related resources, without having to list infinitely many URIs. If this method corresponded to a PUT or POST request, its request might be a resource form, a way of manipulating resource state. Then it might describe a representation for you to send along with your request.

The response does describe a representation: the response document you get back from del.icio.us when you make one of these GET requests. It also describes a possible fault condition: if you submit a bad Authorization header, you’ll get a response code of 401 (“Unauthorized”) instead of a representation.

Take a look at Example 9-12, which defines the representation. This is WADL’s description of the XML document you receive when you GET https://api.del.icio.us/v1/posts/recent: a document like the one in Example 2-3.

Example 9-12. WADL file for del.icio.us: the representation
  <!-- The representation -->
  <representation id="postList" mediaType="text/xml" element="posts">
    <param name="post" path="/posts/post" repeating="true" />
  </representation>
  
</application>

The WADL description gives the most important points about this document: its content type is text/xml, and it’s rooted at the posts tag. The param tag points out that the posts tag has a number of interesting children: the post tags. The param’s path attribute gives an XPath expression which the client can use on the XML document to fetch all the del.icio.us posts. My client’s call to each_by_param('post') runs that XPath expression against the document, and lets me operate on each matching element without having to know anything about XPath or the structure of the representation.
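Here’s roughly what running that XPath expression amounts to, sketched with Ruby’s REXML library rather than the WADL client. The sample document is made up, though it follows the general shape of a del.icio.us response, and posts_by_path is a hypothetical helper name.

```ruby
require 'rexml/document'

# A made-up response in the general shape of a del.icio.us post list.
SAMPLE_POSTS = <<~XML
  <posts user="username" tag="">
    <post href="http://www.example.com/" description="An example site" />
    <post href="http://www.example.org/" description="Another example" />
  </posts>
XML

# Runs the XPath expression from the param tag's path attribute and
# returns a [description, href] pair for each matching element.
def posts_by_path(xml, path = '/posts/post')
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, path).map do |post|
    [post.attributes['description'], post.attributes['href']]
  end
end

posts_by_path(SAMPLE_POSTS).each do |description, href|
  puts "#{description}: #{href}"
end
```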

There’s no schema definition for this kind of XML representation: it’s a very simple document and del.icio.us just assumes you can figure out the format. But for the sake of demonstration, let’s pretend this representation has an XML Schema Definition (XSD) file. The URI of this imaginary definition is https://api.del.icio.us/v1/posts.xsd, and it defines the schema for the posts and post tags. In that fantasy situation, Example 9-13 shows how I might define the representation in terms of the schema file.

Example 9-13. WADL file for del.icio.us: the representation, defined by a schema
<?xml version="1.0"?>
<!-- This is a partial bootleg WADL file for the del.icio.us API. -->

<application xmlns="http://research.sun.com/wadl/2006/07"
             xmlns:delicious="https://api.del.icio.us/v1/posts.xsd">

 <grammars>
  <include href="https://api.del.icio.us/v1/posts.xsd" />
 </grammars>

 ...

  <representation id="postList" mediaType="text/xml" element="delicious:posts" />
 ...

</application>

I no longer need a param to say that this document is full of post tags. That information’s in the XSD file. I just have to define the representation in terms of that file. I do this by referencing the XSD file in this WADL file’s grammars, assigning it to the delicious: namespace, and scoping the representation’s element attribute to that namespace. If the client is curious about what a delicious:posts tag might contain, it can check the XSD. Even though the XSD completely describes the representation format, I might define some param tags anyway to point out especially important parts of the document.

Describing an APP collection

That was a pretty simple example. I used an application form to describe an infinite set of related resources, each of which responds to GET by sending a simple XML document. But I can use WADL to describe the behavior of any resource that responds to the uniform interface. If a resource serves an XML representation, I can reach into that representation with param tags: show where the interesting bits of data are, and where the links to other resources can be found.

Earlier I compared WADL files to the Atom Publishing Protocol’s service documents. Both are XML vocabularies for describing resources. Service documents describe APP collections, and WADL documents describe any resource at all. You’ve seen how a service document describes a collection (Example 9-6). What would a WADL description of the same resources look like?

As it happens, the WADL standard gives just this example. Section A.2 of the standard shows an APP service document and then a WADL description of the same resources. I’ll present a simplified version of this idea here.

The service document in Example 9-6 describes three Atom collections. One accepts new Atom entries via POST, and the other two accept image files. These collections are pretty similar. In an object-oriented system I might factor out the differences by defining a class hierarchy. I can do something similar in WADL. Instead of defining all three resources from scratch, I’m going to define two resource types. Then it’ll be simple to define individual resources in terms of the types (see Example 9-14).

Example 9-14. A WADL file for APP: resource types
<?xml version="1.0"?>
<!-- This is a description of two common types of resources that respond
     to the Atom Publishing Protocol. -->

<application xmlns="http://research.sun.com/wadl/2006/07"
             xmlns:app="http://purl.org/atom/app"
             xmlns:atom="http://www.w3.org/2005/Atom">

  <!-- An Atom collection accepts Atom entries via POST. -->
  <resource_type id="atom_collection">
    <method href="#getCollection" />
    <method href="#postNewAtomMember" />
  </resource_type>

  <!-- An image collection accepts image files via POST. -->
  <resource_type id="image_collection">
    <method href="#getCollection" />
    <method href="#postNewImageMember" />
  </resource_type>

There are my two resource types: the Atom collection and the image collection. These don’t correspond to any specific resources: they’re equivalent to classes in an object-oriented design. Both “classes” support a method identified as getCollection, but the Atom collection supports a method postNewAtomMember where the image collection supports postNewImageMember. Example 9-15 shows those three methods:

Example 9-15. A WADL file for APP: methods
  <!-- Three possible operations on resources. -->
  <method name="GET" id="getCollection">
    <response>
      <representation href="#feed" />
    </response>
  </method>

  <method name="POST" id="postNewAtomMember">
    <request>
      <representation href="#entry" />
    </request>
  </method>

  <method name="POST" id="postNewImageMember">
    <request>
      <representation id="image"  mediaType="image/*" />
      <param name="Slug" style="header" />
    </request>
  </method>

The getCollection WADL method is revealed as a GET operation that expects an Atom feed (to be described) as its representation. The postNewAtomMember method is a POST operation that sends an Atom entry (again, to be described) as its representation. The postNewImageMember method is also a POST operation, but the representation it sends is an image file, and it knows how to specify a value for the HTTP header Slug.

Finally, Example 9-16 describes the two representations: Atom feeds and Atom entries. I don’t need to describe these representations in great detail because they’re already described in the XML Schema Document for Atom: I can just reference the XSD file. But I’m free to annotate the XSD by defining param elements that tell a WADL client about the links between resources.

Example 9-16. A WADL file for APP: the representations
  <!-- Two possible XML representations. -->
  <representation id="feed" mediaType="application/atom+xml"
		  element="atom:feed" />

  <representation id="entry" mediaType="application/atom+xml"
		  element="atom:entry" />

</application>

I can make the file I just defined available on the Web: say, at http://www.example.com/app-resource-types.wadl. Now it’s a resource. I can use it in my services by referencing its URI. So can anyone else. It’s now possible to define certain APP collections in terms of these resource types. My three collections are defined in just a few lines in Example 9-17.

Example 9-17. A WADL file for a set of APP collections
<?xml version="1.0"?>
<!-- This is a description of three "collection" resources that respond
     to the Atom Publishing Protocol. -->

<application xmlns="http://research.sun.com/wadl/2006/07"
             xmlns:app="http://purl.org/atom/app">
  <resources base="http://www.example.com/">
    <resource path="RESTfulNews" 
     type="http://www.example.com/app-resource-types.wadl#atom_collection" />
    <resource path="samruby/photos" 
     type="http://www.example.com/app-resource-types.wadl#image_collection" />
    <resource path="leonardr/photos" 
     type="http://www.example.com/app-resource-types.wadl#image_collection"/>
  </resources>
</application>
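A client has to follow each type reference to learn what a collection can do. Here’s a sketch of that lookup. Rather than fetch and parse the resource-types file, I’ve copied its contents by hand into a Ruby hash, so treat RESOURCE_TYPES and methods_by_collection as stand-ins of my own and not as part of any WADL library.

```ruby
require 'rexml/document'
require 'uri'

# Hand-copied summary of the resource types in Examples 9-14 and 9-15:
# type id => { HTTP method => WADL method id }.
RESOURCE_TYPES = {
  'atom_collection'  => { 'GET' => 'getCollection', 'POST' => 'postNewAtomMember' },
  'image_collection' => { 'GET' => 'getCollection', 'POST' => 'postNewImageMember' }
}.freeze

COLLECTIONS_WADL = <<~XML
  <application xmlns="http://research.sun.com/wadl/2006/07">
    <resources base="http://www.example.com/">
      <resource path="RESTfulNews"
       type="http://www.example.com/app-resource-types.wadl#atom_collection" />
      <resource path="samruby/photos"
       type="http://www.example.com/app-resource-types.wadl#image_collection" />
      <resource path="leonardr/photos"
       type="http://www.example.com/app-resource-types.wadl#image_collection" />
    </resources>
  </application>
XML

# Maps each collection URI to the methods its resource type supports.
# The type is identified by the fragment of the type attribute's URI.
def methods_by_collection(wadl, types = RESOURCE_TYPES)
  doc = REXML::Document.new(wadl)
  result = {}
  doc.root.elements.each do |resources|
    next unless resources.name == 'resources'
    base = resources.attributes['base']
    resources.elements.each do |resource|
      next unless resource.name == 'resource'
      type_id = URI.parse(resource.attributes['type']).fragment
      result[base + resource.attributes['path']] = types.fetch(type_id)
    end
  end
  result
end

methods_by_collection(COLLECTIONS_WADL).each do |uri, methods|
  puts "#{uri}: #{methods.values.join(', ')}"
end
```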

The Atom Publishing Protocol is popular because it’s such a general interface. The major differences between two APP services are described in the respective service documents. A generic APP client can read these documents and reprogram itself to act as a client for many different services. But there’s an even more general interface: the uniform interface of HTTP. An APP service document uses a domain-specific XML vocabulary, but hypermedia formats like HTML and WADL can be used to describe any web service at all. Their clients can be even more general than APP clients.

Hypermedia is how one service communicates the ways it differs from other services. If that intelligence is embedded in hypermedia, the programmer needs to hardwire less of it in code. More importantly, hypermedia gives you access to the link: the second most important web technology after the URI. The potential of REST will not be fully exploited until web services start serving their representations as link-rich hypermedia instead of plain media.

Is WADL evil?

In Chapter 10 I’ll talk about how WSDL turned SOAP from a simple XML envelope format to a name synonymous with the RPC style of web services. WSDL abstracts away the details of HTTP requests and responses, and replaces them with a model based on method calls in a programming language. Doesn’t WADL do the exact same thing? Should we worry that WADL will do to plain-HTTP web services what WSDL did to SOAP web services: tie them to the RPC style in the name of client convenience?

I think we’re safe. WADL abstracts away the details of HTTP requests and responses, but—this is the key point—it doesn’t add any new abstraction on top. Remember, REST isn’t tied to HTTP. When you abstract HTTP away from a RESTful service, you’ve still got REST. A resource-oriented web service exposes resources that respond to a uniform interface: that’s REST. A WADL document describes resources that respond to a uniform interface: that’s REST. A program that uses WADL creates objects that correspond to resources, and accesses them with method calls that embody a uniform interface: that’s REST. RESTfulness doesn’t live in the protocol. It lives in the interface.
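
Here’s a rough sketch, in Python, of what I mean. The Resource class is the kind of object a WADL-aware client might create: it corresponds to one resource and exposes only the methods of the uniform interface. Both class names are hypothetical, and the toy InMemoryTransport is an assumption made so the example runs without a network. It keeps representations in a dictionary instead of speaking HTTP, which is the point: take the protocol away and the interface is unchanged. I’ve left out POST and HEAD for brevity.

```python
class Resource:
    """A client-side object that corresponds to one resource.

    Hypothetical illustration: the only operations offered are
    those of the uniform interface, whatever the resource is.
    """

    def __init__(self, uri, transport):
        self.uri = uri
        self._transport = transport

    def get(self):
        return self._transport("GET", self.uri, None)

    def put(self, representation):
        return self._transport("PUT", self.uri, representation)

    def delete(self):
        return self._transport("DELETE", self.uri, None)


class InMemoryTransport:
    """A toy stand-in for HTTP that stores representations in a dict.

    Returns (status code, representation) pairs that mimic HTTP
    response codes, but no network traffic is involved.
    """

    def __init__(self):
        self.store = {}

    def __call__(self, method, uri, body):
        if method == "GET":
            if uri in self.store:
                return (200, self.store[uri])
            return (404, None)
        if method == "PUT":
            created = uri not in self.store
            self.store[uri] = body
            return (201 if created else 200, body)
        if method == "DELETE":
            if uri in self.store:
                del self.store[uri]
                return (200, None)
            return (404, None)
        # Anything else is outside this toy's uniform interface.
        return (405, None)


if __name__ == "__main__":
    transport = InMemoryTransport()
    entry = Resource("http://www.example.com/RESTfulNews/1", transport)
    print(entry.put("<entry>Hello</entry>"))
    print(entry.get())
    print(entry.delete())
```

Swapping InMemoryTransport for a callable that sends real HTTP requests would change nothing about how Resource is used. The calling code would still deal in resources and a fixed set of methods, not in service-specific procedure names.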

About the worst you can do with WADL is hide the fact that a service responds to the uniform interface. I’ve deliberately not shown you how to do this, but you should be able to figure it out. You may need to do this if you’re writing a WADL file for a web application or REST-RPC hybrid service that doesn’t respect the uniform interface.

I’m fairly sure that WADL itself won’t tie HTTP to an RPC model, the way WSDL did to SOAP. But what about those push-button code generators, the ones that take your procedure-call-oriented code and turn it into a “web service” that only exposes one URI? WADL makes you define your resources, but what if tomorrow’s generator creates a WADL file that only exposes a single “resource”, the way an autogenerated WSDL file exposes a single “endpoint”?

This is a real worry. Fortunately, WADL’s history is different from WSDL’s. WSDL was introduced at a time when SOAP was still officially associated with the RPC style. But WADL is being introduced as people are becoming aware of the advantages of REST, and it’s marketed as a way to hide the details while keeping the RESTful interface. Hopefully, any tool developers who want to make their tools support WADL will also be interested in making their tools support RESTful design.



[28] OpenSearch also defines a simple control flow: a special kind of resource called a “description document.” I’m not covering OpenSearch description documents in this book, mainly for space reasons.

[29] This is specified, and argued for, in RFC 3023.

[30] Again, according to RFC 3023, which few developers have read. For a lucid explanation of these problems, see Mark Pilgrim’s article “XML on the Web Has Failed” (http://www.xml.com/pub/a/2004/07/21/dive.html).

[31] Marc Hadley, the primary author of the WADL standard, is working on more elegant ways of representing the need to authenticate.
