Chapter 8. REST and ROA Best Practices

By now you should have a good idea of how to build resource-oriented, RESTful web services. This chapter is a pause to gather in one place the most important ideas so far, and to fill in some of the gaps in my coverage.

The gaps exist because the theoretical chapters have focused on basics, and the practical chapters have worked with specific services. I’ve implemented conditional HTTP GET but I haven’t explained it. I’ve implemented HTTP Basic authentication and a client for Amazon’s custom authentication mechanism, but I haven’t compared them to other kinds of HTTP authentication, and I’ve glossed over the problem of authenticating a client to its own user.

The first part of this chapter is a recap of the main ideas of REST and the ROA. The second part describes the ideas I haven’t already covered. I talk about specific features of HTTP and tough cases in resource design. In Chapter 9 I discuss the building blocks of services: specific technologies and patterns that have been used to make successful web services. Taken together, this chapter and the next form a practical reference for RESTful web services. You can consult them as needed when making technology or design decisions.

Resource-Oriented Basics

The only differences between a web service and a web site are the audience (preprogrammed clients instead of human beings) and a few client capabilities. Both web services and web sites benefit from a resource-oriented design based on HTTP, URIs, and (usually) XML.

Every interesting thing your application manages should be exposed as a resource. A resource can be anything a client might want to link to: a work of art, a piece of information, a physical object, a concept, or a grouping of references to other resources.

A URI is the name of a resource. Every resource must have at least one name. A resource should have as few names as possible, and every name should be meaningful.

The client cannot access resources directly. A web service serves representations of a resource: documents in specific data formats that contain information about the resource. The difference between a resource and its representation is somewhat academic for static web sites, where the resources are just files on disk that are sent verbatim to clients. The distinction takes on greater importance when the resource is a row in a database, a physical object, an abstract concept, or a real-world event in progress.

All access to resources happens through HTTP’s uniform interface: the four basic HTTP verbs (GET, POST, PUT, and DELETE) and the two auxiliary verbs (HEAD and OPTIONS). Put complexity in your representations, in the variety of resources you expose, and in the links between resources. Don’t put it in the access methods.

The Generic ROA Procedure

Reprinted from Chapter 6, this is an all-purpose procedure for splitting a problem space into RESTful resources.

This procedure only takes into account the constraints of REST and the ROA. Your choice of framework may impose additional constraints. If so, you might as well take those into account while you’re designing the resources. In Chapter 12 I give a modified version of this procedure that works with Ruby on Rails.

  1. Figure out the data set

  2. Split the data set into resources

    For each kind of resource:

  3. Name the resources with URIs

  4. Expose a subset of the uniform interface

  5. Design the representation(s) accepted from the client

  6. Design the representation(s) served to the client

  7. Integrate this resource into existing resources, using hypermedia links and forms

  8. Consider the typical course of events: what’s supposed to happen? Standard control flows like the Atom Publishing Protocol can help (see Chapter 9).

  9. Consider error conditions: what might go wrong? Again, standard control flows can help.

Addressability

A web service is addressable if it exposes the interesting aspects of its data set through resources. Every resource has its own unique URI: in fact, URI just stands for “Uniform Resource Identifier.” Most RESTful web services expose an infinite number of URIs. Most RPC-style web services expose very few URIs, often as few as one.

Representations Should Be Addressable

A URI should never represent more than one resource. If it did, it wouldn’t be a Uniform Resource Identifier. Furthermore, I suggest that every representation of a resource should have its own URI. This is because URIs are often passed around or used as input to other web services. The expectation then is that the URI designates a particular representation of the resource.

Let’s say you’ve exposed a press release at /releases/104. There’s an English and a Spanish version of the press release, and an HTML and plain-text version of each. Your clients should be able to set the Accept-Language request header to choose an English or Spanish representation of /releases/104, and the Accept request header to choose an HTML or plain-text representation. But you should also give each representation a separate URI: maybe URIs like /releases/104.en, /releases/104.es.html, and /releases/104.txt.
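Either technique is easy to use from client code. Here’s a rough sketch in Ruby (the host www.example.com and the press release URIs are the hypothetical ones from this example): the first request selects the Spanish HTML representation through request headers, and the second names that representation directly in the URI.

require 'net/http'

Net::HTTP.start('www.example.com') do |http|
  # Technique 1: content negotiation against the canonical URI.
  negotiate = Net::HTTP::Get.new('/releases/104')
  negotiate['Accept-Language'] = 'es'
  negotiate['Accept'] = 'text/html'
  negotiated = http.request(negotiate)

  # Technique 2: name the representation you want in the URI itself.
  direct = http.request(Net::HTTP::Get.new('/releases/104.es.html'))

  # Ideally, both requests yield the same representation.
  puts negotiated.body == direct.body
end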

In the bookmarking service from Chapter 7, I exposed two representations of a set of bookmarks: a generic XML representation at /v1/users/leonardr/bookmarks.xml, and an Atom representation at /v1/users/leonardr/bookmarks.atom. I also exposed a canonical URI for the resource at /v1/users/leonardr/bookmarks. A client can set its Accept request header to distinguish between Atom and generic XML representations of /v1/users/leonardr/bookmarks, or it can tweak the URI to get a different representation. Both techniques work, and both techniques are RESTful, but a URI travels better across clients if it specifies a resource and a representation.

It’s OK for a client to send information in HTTP request headers: headers are a good place for sensitive information like authentication credentials, or information that’s different for every client. But headers shouldn’t be the only tool a client has to specify which representation is served or which resource is selected.

State and Statelessness

There are two types of state in a RESTful service. There’s resource state, which is information about resources, and application state, which is information about the path the client has taken through the application. Resource state stays on the server and is only sent to the client in the form of representations. Application state stays on the client until it can be used to create, modify, or delete a resource. Then it’s sent to the server as part of a POST, PUT, or DELETE request, and becomes resource state.

A RESTful service is “stateless” if the server never stores any application state. In a stateless application, the server considers each client request in isolation and in terms of the current resource state. If the client wants any application state to be taken into consideration, the client must submit it as part of the request. This includes things like authentication credentials, which are submitted with every request.

The client manipulates resource state by sending a representation as part of a PUT or POST request. (DELETE requests work the same way, but there’s no representation.) The server manipulates the client’s application state by sending representations in response to the client’s GET requests. This is where the name “Representational State Transfer” comes from.

Connectedness

The server can guide the client from one application state to another by sending links and forms in its representations. I call this connectedness because the links and forms connect the resources to each other. The Fielding thesis calls this “hypermedia as the engine of application state.”

In a well-connected service, the client can make a path through the application by following links and filling out forms. In a service that’s not connected, the client must use predefined rules to construct every URI it wants to visit. Right now the human web is very well-connected, because most pages on a web site can be reached by following links from the main page. Right now the programmable web is not very well-connected.

The server can also guide the client from one resource state to another by sending forms in its representations. Forms guide the client through the process of modifying resource state with a PUT or POST request, by giving hints about what representations are acceptable.

Links and forms reveal the levers of state: requests the client might make in the future to change application or resource state. Of course, the levers of state can be exposed only when the representation format supports links or forms. A hypermedia format like XHTML is good for this; so is an XML format that can have XHTML or WADL embedded in it.

The Uniform Interface

All interaction between clients and resources is mediated through a few basic HTTP methods. Any resource will expose some or all of these methods, and a method does the same thing on every resource that supports it.

A GET request is a request for information about a resource. The information is delivered as a set of headers and a representation. The client never sends a representation along with a GET request.

A HEAD request is the same as a GET request, except that only the headers are sent in response. The representation is omitted.

A PUT request is an assertion about the state of a resource. The client usually sends a representation along with a PUT request, and the server tries to create or change the resource so that its state matches what the representation says. A PUT request with no representation is just an assertion that a resource should exist at a certain URI.

A DELETE request is an assertion that a resource should no longer exist. The client never sends a representation along with a DELETE request.

A POST request is an attempt to create a new resource from an existing one. The existing resource may be the parent of the new one in a data-structure sense, the way the root of a tree is the parent of all its leaf nodes. Or the existing resource may be a special “factory” resource whose only purpose is to generate other resources. The representation sent along with a POST request describes the initial state of the new resource. As with PUT, a POST request doesn’t need to include a representation at all.

A POST request may also be used to append to the state of an existing resource, without creating a whole new resource.

An OPTIONS request is an attempt to discover the levers of state: to find out which subset of the uniform interface a resource supports. It’s rarely used. Today’s services specify the levers of state up front, either in human-readable documentation or in hypermedia documents like XHTML and WADL files.

If you find yourself wanting to add another method or additional features to HTTP, you can overload POST (see “Overloading POST” below), but you probably just need to add another kind of resource. If you start wanting to add transactional support to HTTP, you should probably expose transactions as resources that can be created, updated, and deleted. See “Resource Design” later in this chapter for more on this technique.

Safety and Idempotence

A GET or HEAD request should be safe: a client that makes a GET or HEAD request is not requesting any changes to server state. The server might decide on its own to change state (maybe by logging the request or incrementing a hit counter), but it should not hold the client responsible for those changes. Making any number of GET requests to a certain URI should have the same practical effect as making no requests at all.

A PUT or DELETE request should be idempotent. Making more than one PUT or DELETE request to a given URI should have the same effect as making only one. One common problem: PUT requests that set resource state in relative terms like “increment value by 5.” Making 10 PUT requests like that is a lot different from just making one. PUT requests should set items of resource state to specific values.
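Here’s a sketch of the difference from the client’s side. It reuses the checking account URI from the transactions example later in this chapter; the balance form variable and the relative “+5” convention are invented for illustration.

require 'net/http'

Net::HTTP.start('www.example.com') do |http|
  # Idempotent: sets an item of resource state to a specific value.
  # Sending this request ten times leaves the balance at 150, the same
  # as sending it once.
  put = Net::HTTP::Put.new('/accounts/checking/11')
  put.set_form_data('balance' => '150')
  http.request(put)

  # Not idempotent: a hypothetical server that reads "+5" as "add 5 to
  # the current balance". Sending this ten times increments the balance
  # ten times. Don't expose an operation like this through PUT.
  increment = Net::HTTP::Put.new('/accounts/checking/11')
  increment.set_form_data('balance' => '+5')
  http.request(increment)
end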

The safe methods, GET and HEAD, are automatically idempotent as well. POST requests for resource creation are neither safe nor idempotent. An overloaded POST request might or might not be safe or idempotent. There’s no way for a client to tell, since overloaded POST can do anything at all. You can make POST idempotent with POST Once Exactly (see Chapter 9).

New Resources: PUT Versus POST

You can expose the creation of new resources through PUT, POST, or both. But a client can only use PUT to create resources when it can calculate the final URI of the new resource. In Amazon’s S3 service, the URI path to a bucket is /{bucket-name}. Since the client chooses the bucket name, a client can create a bucket by constructing the corresponding URI and sending a PUT request to it.

On the other hand, the URI to a resource in a typical Rails web service looks like /{database-table-name}/{database-ID}. The name of the database table is known in advance, but the ID of the new resource won’t be known until the corresponding record is saved to the database. To create a resource, the client must POST to a “factory” resource, located at /{database-table-name}. The server chooses a URI for the new resource.
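In client code the two styles look something like this sketch. The bucket name and the weblog factory are invented, and a real S3 request would also need HTTPS and Amazon’s authentication headers, which I’ve left out to keep the sketch short.

require 'net/http'

# PUT: the client can calculate the final URI of the new resource.
Net::HTTP.start('s3.amazonaws.com') do |http|
  put = Net::HTTP::Put.new('/my-new-bucket')
  put.body = ''
  http.request(put)                 # the bucket now lives at /my-new-bucket
end

# POST: the client asks a factory resource to create the new resource,
# and the server reports the URI it chose in the Location header.
Net::HTTP.start('www.example.com') do |http|
  post = Net::HTTP::Post.new('/weblogs')
  post.set_form_data('title' => 'My new weblog')
  response = http.request(post)
  puts response['Location']         # e.g. /weblogs/4: the server picked the ID
end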

Overloading POST

POST isn’t just for creating new resources and appending to representations. You can also use it to turn a resource into a tiny RPC-style message processor. A resource that receives an overloaded POST request can scan the incoming representation for additional method information, and carry out any task whatsoever. This gives the resource a wider vocabulary than one that supports only the uniform interface.

This is how most web applications work. XML-RPC and SOAP/WSDL web services also run over overloaded POST. I strongly discourage the use of overloaded POST, because it ruins the uniform interface. If you’re tempted to expose complex objects or processes through overloaded POST, try giving the objects or processes their own URIs, and exposing them as resources. I show several examples of this in “Resource Design” later in this chapter.

There are two noncontroversial uses for overloaded POST. The first is to simulate HTTP’s uniform interface for clients like web browsers that don’t support PUT or DELETE. The second is to work around limits on the maximum length of a URI. The HTTP standard specifies no limit on how long a URI can get, but many clients and servers impose their own limits: Apache won’t respond to requests for URIs longer than 8 KB. If a client can’t make a GET request to http://www.example.com/numbers/1111111 because of URI length restrictions (imagine a million more ones there if you like), it can make a POST request to http://www.example.com/numbers?_method=GET and put “1111111” in the entity-body.
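Here’s roughly what those two workarounds look like from a Ruby client. The weblog entry URI is hypothetical, and the _method convention is the one from the example above; a real service would document whatever convention it uses.

require 'net/http'

Net::HTTP.start('www.example.com') do |http|
  # Workaround 1: a client that can't make DELETE requests tunnels the
  # real method through POST.
  fake_delete = Net::HTTP::Post.new('/weblogs/myweblog/entries/100')
  fake_delete.set_form_data('_method' => 'DELETE')
  http.request(fake_delete)

  # Workaround 2: the "URI" is too long to GET, so the data goes into
  # the entity-body of a POST instead.
  long_get = Net::HTTP::Post.new('/numbers?_method=GET')
  long_get.content_type = 'text/plain'
  long_get.body = '1' * 1_000_000
  http.request(long_get)
end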

If you want to do without PUT and DELETE altogether, it’s entirely RESTful to expose safe operations on resources through GET, and all other operations through overloaded POST. Doing this violates my Resource-Oriented Architecture, but it conforms to the less restrictive rules of REST. REST says you should use a uniform interface, but it doesn’t say which one.

If the uniform interface really doesn’t work for you, or it’s not worth the effort to make it work, then go ahead and overload POST, but don’t lose the resource-oriented design. Every URI you expose should still be a resource: something a client might want to link to. A lot of web applications create new URIs for operations exposed through overloaded POST. You get URIs like /weblog/myweblog/rebuild-index. It doesn’t make sense to link to that URI. Instead of putting method information in the URI, expose overloaded POST on your existing resources (/weblog/myweblog) and ask for method information in the incoming representation (method=rebuild-index). This way, /weblog/myweblog still acts like a resource, albeit one that doesn’t totally conform to the uniform interface. It responds to GET, PUT, DELETE... and also “rebuild-index” through overloaded POST. It’s still an object in the object-oriented sense.

A rule of thumb: if you’re using overloaded POST, and you never expose GET and POST on the same URI, you’re probably not exposing resources at all. You’ve probably got an RPC-style service.

This Stuff Matters

The principles of REST and the ROA are not arbitrary restrictions. They’re simplifying assumptions that give advantages to resource-oriented services over the competition. RESTful resource-oriented services are simpler, easier to use, more interoperable, and easier to combine than RPC-style services. As I introduced the principles of the ROA in Chapter 4, I gave brief explanations of the ideas underlying the principles. In addition to recapping these ideas to help this chapter serve as a summary, I’d like to revisit them now in light of the real designs I’ve shown for resource-oriented services: the map service of Chapters 5 and 6, and the social bookmarking service of Chapter 7.

Why Addressability Matters

Addressability means that every interesting aspect of your service is immediately accessible from outside. Every interesting aspect of your service has a URI: a unique identifier in a format that’s familiar to every computer-literate person. This identifier can be bookmarked, passed around between applications, and used as a stand-in for the actual resource. Addressability makes it possible for others to make mashups of your service: to use it in ways you never imagined.

In Chapter 4 I compared URIs to cell addresses in a spreadsheet, and to file paths in a command-line shell. The web is powerful in the same way that spreadsheets and command-line shells are powerful. Every piece of information has a structured name that can be used as a reference to the real thing.

Why Statelessness Matters

Statelessness is the simplifying assumption to beat all simplifying assumptions. Each of a client’s requests contains all the application state necessary to understand that request. None of this information is kept on the server, and none of it is implied by previous requests. Every request is handled in isolation and evaluated against the current resource state.

This makes it trivial to scale your application up. If one server can’t handle all the requests, just set up a load balancer and make a second server handle half the requests. Which half? It doesn’t matter, because every request is self-contained. You can assign requests to servers randomly, or with a simple round-robin algorithm. If two servers can’t handle all the requests, you add a third server, ad infinitum. If one server goes down, the others automatically take over for it. When your application is stateless, you don’t need to coordinate activity between servers, sharing memory or creating “server affinity” to make sure the same server handles every request in a “session.” You can throw web servers at the problem until the bottleneck becomes access to your resource state. Then you have to get into database replication, mirroring, or whatever strategy is most appropriate for the way you’ve chosen to store your resource state.

Stateless applications are also more reliable. If a client makes a request that times out, statelessness means the client can resend the request without worrying that its “session” has gone into a strange state that it can’t recover from. If it was a POST request, the client might have to worry about what the request did to the resource state, but that’s a different story. The client has complete control over the application state at all times.

There’s an old joke. Patient: “Doctor, it hurts when I try to scale a system that keeps client state on the server!” Doctor: “Then don’t do that.” That’s the idea behind statelessness: don’t do the thing that causes the trouble.

Why the Uniform Interface Matters

I covered this in detail near the end of Chapter 4, so I’ll just give a brief recap here. If you say to me, “I’ve exposed a resource at http://www.example.com/myresource,” that gives me no information about what that resource is, but it tells me a whole lot about how I can manipulate it. I know how to fetch a representation of it (GET), I know how to delete it (DELETE), I know roughly how to modify its state (PUT), and I know roughly how to spawn a subordinate resource from it (POST).

There are still details to work out: which of these activities the resource actually supports,[24] which representation formats the resource serves and expects, and what this resource represents in the real world. But every resource works basically the same way and can be accessed with a universal client. This is a big part of the success of the Web.

The restrictions imposed by the uniform interface (safety for GET and HEAD, idempotence for PUT and DELETE) make HTTP more reliable. If your request didn’t go through, you can keep resending it with no ill effects. The only exception is with POST requests. (See “POST Once Exactly” in Chapter 9 for ways of making POST idempotent.)

The power of the uniform interface is not in the specific methods exposed. The human web has a different uniform interface—it uses GET for safe operations, and POST for everything else—and it does just fine. The power is the uniformity: everyone uses the same methods for everything. If you deviate from the ROA’s uniform interface (say, by adopting the human web’s uniform interface, or WebDAV’s uniform interface), you switch communities: you gain compatibility with certain web services at the expense of others.

Why Connectedness Matters

Imagine the aggravation if instead of hypertext links, web pages gave you English instructions on how to construct the URI to the next page. That’s how most of today’s RESTful web services work: the resources aren’t connected to each other. This makes web services more brittle than human-oriented web sites, and it means that emergent properties of the Web (like Google’s PageRank) don’t happen on the programmable web.

Look at Amazon S3. It’s a perfectly respectable resource-oriented service. It’s addressable, it’s stateless, and it respects the uniform interface. But it’s not connected at all. The representation of the S3 bucket list gives the name of each bucket, but it doesn’t link to the buckets. The representation of a bucket gives the name of each object in the bucket, but it doesn’t link to the objects. We humans know these objects are conceptually linked, but there are no actual links in the representations (see Figure 8-1).

Figure 8-1. We see links, but there are none

An S3 client can’t get from one resource to another by following links. Instead it must internalize rules about how to construct the URI to a given bucket or object. These rules are given in the S3 technical documentation, not anywhere in the service itself. I demonstrated the rules in “Resources” in Chapter 3. This wouldn’t work on the human web, but in a web service we don’t complain. Why is that?

In general, we expect less from web services than from the human web. We experience the programmable web through customized clients, not generic clients like web browsers. These customized clients can be programmed with rules for URI construction. Most information on the programmable web is also available on the human web, so a lack of connectedness doesn’t hide data from generic clients like search engines. Or else the information is hidden behind an authentication barrier and you don’t want a search engine seeing it anyway.

The S3 service gets away with a lack of connectedness because it only has three simple rules for URI construction: the bucket list lives at the service root, the URI to a bucket is just a slash and the URI-escaped name of the bucket, and the URI to an object is the bucket URI plus a slash and the URI-escaped name of the object. It’s not difficult to program these rules into a client. The only bug that’s at all likely is a failure to URI-escape the bucket or object name. Of course, there are additional rules for filtering and paginating the contents of buckets, which I skimmed over in Chapter 3. Those rules are more complex, and it would be better for S3 representations to provide hypermedia forms instead of making clients construct these URIs on their own.

More importantly, the S3 resources have simple and stable relationships to each other. The bucket list contains buckets, and a bucket contains objects. A link is just an indication of a relationship between two resources. A simple relationship is easy to program into a client, and “contains” is one of the simplest. If a client is preprogrammed with the relationships between resources, links that only serve to convey those relationships are redundant.

The social bookmarking service I implemented in Chapter 7 is a little better-connected than S3. It represents lists of bookmarks as Atom documents full of internal and external links. But it’s not totally connected: its representation of a user doesn’t link to that user’s bookmarks, posting history, or tag vocabulary (look back to Figure 7-1). And there’s no information about where to find a user in the service, or how to post a bookmark. The client is just supposed to know how to turn a username into a URI, and just supposed to know how to represent a bookmark.

It’s easy to see how this is theoretically unsatisfying. A service ought to be self-describing, and not rely on some auxiliary English text that tells programmers how to write clients. It’s also easy to see that a client that relies on rules for URI construction is more brittle. If the server changes those rules, it breaks all the clients. It’s less easy to see the problems that stem from a lack of connectedness when the relationships between resources are complex or unstable. These problems can break clients even when the rules for URI construction never change.

Let’s go back to the mapping service from Chapter 5. My representations were full of hyperlinks and forms, most of which were not technically necessary. Take this bit of markup from the representation of a road map that was in Example 5-6:

<a class="zoom_out" href="/road.1/Earth/37.0,-95.8">Zoom out</a>
<a class="zoom_in" href="/road.3/Earth/37.0,-95.8">Zoom in</a>

Instead of providing these links everywhere, the service provider could put up an English document telling the authors of automated clients how to manipulate the zoom level in the first path variable. That would disconnect some related resources (the road map at different zoom levels), but it would save some bandwidth in every representation and it would have little effect on the actual code of any automated client. Personally, if I was writing a client for this service, I’d rather get from zoom level 8 to zoom level 4 by setting road.4 directly, than by following the “Zoom out” link over and over again. My client will break if the URI construction rule ever changes, but maybe I’m willing to take that risk.

Now consider this bit of markup from the representation of the planet Earth. It’s reprinted from Example 5-7:

 <dl class="place">
  <dt>name</dt> <dd>Earth</dd>
  <dt>maps</dt>
    <ul class="maps">
     <li><a class="map" href="/road/Earth">Road</a></li>
     <li><a class="map" href="/satellite/Earth">Satellite</a></li>
     ...
    </ul>

The URIs are technically redundant. The name of the place indicates that these are maps of Earth, and the link text indicates that there’s a satellite and a road map. Given those two pieces of information, a client can construct the corresponding map URI using a rule like the one for S3 objects: slash, map type, slash, planet name. Since the URIs can be replaced by a simple rule, the service might follow the S3 model and save some bandwidth by presenting the representation of Earth in an XML format like this:

<place name="Earth" type="planet">
 <map type="satellite" />
 <map type="road" />
 ...
</place>

If I was writing a client for this service, I would rather be given those links than have to construct them myself, but it’s up for debate.

Here’s another bit of markup from Example 5-6. These links are to help the client move from one tile on the map to another.

<a class="map_nav" href="46.0518,-95.8">North</a>
<a class="map_nav" href="41.3776,-89.7698">Northeast</a>
<a class="map_nav" href="36.4642,-84.5187">East</a>
<a class="map_nav" href="32.3513,-90.4459">Southeast</a>

It’s technically possible for a client to generate these URIs based on rules. After all, the server is generating them based on rules. But the rules involve knowing how latitude and longitude work, the scale of the map at the current zoom level, and the size and shape of the planet. Any client programmer would agree it’s easier to navigate a map by following the links than by calculating the coordinates of tiles. We’ve reached a point at which the relationships between resources are too complex to be expressed in simple rules. Connectedness becomes very important.
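A client that navigates by following those links needs a little markup parsing and nothing else: no latitude, no scale, no planetary geometry. Here’s a sketch that uses Ruby’s standard REXML library against the fragment above (a real client would also resolve the relative href against the URI of the current tile).

require 'rexml/document'

# A fragment of the road map representation, as served by the map service.
xhtml = %{
  <div>
    <a class="map_nav" href="46.0518,-95.8">North</a>
    <a class="map_nav" href="41.3776,-89.7698">Northeast</a>
  </div>
}

doc = REXML::Document.new(xhtml)
# Find the navigation link labeled "North"; its href is the URI of the
# next tile. The client never calculates a coordinate itself.
north = REXML::XPath.match(doc, "//a[@class='map_nav']").find do |link|
  link.text == 'North'
end
puts north.attributes['href']    # => "46.0518,-95.8"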

This is where Google Maps’s tile-based navigation system pays off (I described that system back in “Representing Maps and Points on Maps” in Chapter 5, if you’re curious). Google Maps addresses its tiles by arbitrary X and Y coordinates instead of latitude and longitude. Finding the tile to the north is usually as easy as subtracting one from the value of Y. The relationships between tiles are much simpler. Nobody made me design my tile system in terms of latitude and longitude. If latitude/longitude calculations are why I have to send navigation links along with every map representation, maybe I should rethink my strategy and expose simpler URIs, so that my clients can generate them more easily.

But there’s another reason why connectedness is valuable: it makes it possible for the client to handle relationships that change over time. Links not only hide the rules about how to build a URI for a given resource, they embody the rules of how resources are related to each other. Here’s a terrifying example to illustrate the point.

A terrifying example

Suppose I get some new map data for my service. It’s more accurate than the old data, but the scale is a little different. At zoom level 8 the client now sees a slightly smaller map than it did before: a tile 256 pixels square depicts an area three-quarters of a mile square, instead of seven-eighths of a mile square.

At first glance, this has no effect on anything. Latitude and longitude haven’t changed, so every point on the old map is in the same place on the new map. Google Maps-style tile URIs would break at this point, because they use X and Y instead of latitude and longitude. When the map data was updated, I’d have to recalculate all the tile images. Many points on the map would suddenly shift to different tiles, and get different X and Y coordinates. But all of my URIs still work. Every point on the map has the same URI it did before.

In this new data set, the URI /road.8/Earth/40.76,-73.98.png still shows part of the island of Manhattan, and the URI /road.8/Earth/40.7709,-73.98.png still shows a point slightly to the north. But the rules have changed for finding the tile directly to the north of another tile. Those two tile graphics are centered on the same coordinates as before, but now each tile depicts a slightly smaller space. They used to be adjacent on the map, but now there’s a gap between them (see Figure 8-2).

Figure 8-2. When clients choose URIs for map tiles: before and after

If a client application finds nearby tiles by following the navigation links I provide, it will automatically adapt to the new map scale. But an application that “already knows” how to turn latitude and longitude into image URIs will suddenly start showing maps that look like MAD Magazine fold-ins.

I made a reasonable change to my service that didn’t change any URIs, but it broke clients that always construct their own URIs. What changed was not the resources but the relationships between them: not the rules for constructing URIs but the rules for driving the application from one state to another. Those rules are embedded in my navigation links, and a client duplicates those rules at its own peril.

And that’s why it’s important to connect your resources to each other. It’s fine to expect your clients to use your rules to construct an initial URI (say, a certain place on the map at a certain zoom level), but if they need to navigate from one URI to another, you should provide appropriate links. As the programmable web matures, connectedness will become more and more important.

Resource Design

You’ll need one resource for each “thing” exposed by your service. “Resource” is about as vague as “thing,” so any kind of data or algorithm you want to expose can be a resource. There are three kinds of resources:

  • Predefined one-off resources, such as your service’s home page or a static list of links to resources. A resource of this type corresponds to something you’ve only got a few of: maybe a class in an object-oriented system, or a database table in a database-oriented system.

  • A large (possibly infinite) number of resources corresponding to individual items of data. A resource of this type might correspond to an object in an object-oriented system, or a database row in a database-oriented system.

  • A large (probably infinite) number of resources corresponding to the possible outputs of an algorithm. A resource of this type might correspond to the results of a query in a database-oriented system. Lists of search results and filtered lists of resources fall into this category.

There are some difficult cases in resource design, places where it seems you must manipulate a resource in a way that doesn’t fit the uniform interface. The answer is almost always to expose the thing that’s causing the problem as a new set of resources. These new resources may be more abstract than the rest of your resources, but that’s fine: a resource can be anything.

Relationships Between Resources

Suppose Alice and Bob are resources in my service. That is, they’re people in the real world, but my service gives them URIs and offers representations of their state. One day Alice and Bob get married. How should this be represented in my service?

A client can PUT to Alice’s URI, modifying her state to reflect the fact that she’s married to Bob, and then PUT to Bob’s URI to say he’s married to Alice. That’s not very satisfying because it’s two steps. A client might PUT to Alice’s URI and forget to PUT to Bob’s. Now Alice is married to Bob but not vice versa.

Instead I should treat the marriage, this relationship between two resources, as a thing in itself: a third resource. A client can declare two people married by sending a PUT request to a “marriage” URI or a POST request to a “registrar” URI (it depends on how I choose to do the design). The representation includes links to Alice and Bob’s URIs: it’s an assertion that the two are married. The server applies any appropriate rules about who’s allowed to get married, and either sends an error message or creates a new resource representing the marriage. Other resources can now link to this resource, and it responds to the uniform interface. A client can GET it or DELETE it (though hopefully DELETEing it won’t be necessary).
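The client’s side of that conversation might look something like the following sketch. The registrar URI, the people’s URIs, and the form-encoded representation are all invented for illustration; the point is that the marriage is created in a single request, and the server gets one chance to enforce its rules.

require 'net/http'

Net::HTTP.start('www.example.com') do |http|
  # Assert that two people are married by creating a new resource that
  # links to both of them.
  post = Net::HTTP::Post.new('/registrar')
  post.set_form_data('spouse1' => '/people/alice', 'spouse2' => '/people/bob')
  response = http.request(post)

  case response
  when Net::HTTPCreated
    puts "Marriage resource created at #{response['Location']}"
  else
    puts "The registrar refused: #{response.code} #{response.message}"
  end
end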

Asynchronous Operations

HTTP has a synchronous request-response model. The client opens an Internet socket to the server, makes its request, and keeps the socket open until the server has sent the response. If the client doesn’t care about the response it can close the socket early, but to get a response it must leave the socket open until the server is ready.

The problem is that not all operations can be completed in the time we expect an HTTP request to take. Some operations take hours or days. An HTTP request would surely be timed out after that kind of inactivity. Even if it didn’t, who wants to keep a socket open for days just waiting for a server to respond? Is there no way to expose such operations asynchronously through HTTP?

There is, but it requires that the operation be split into two or more synchronous requests. The first request spawns the operation, and subsequent requests let the client learn about the status of the operation. The secret is the status code 202 (“Accepted”).

I’ll demonstrate one strategy for implementing asynchronous requests with the 202 status code. Let’s say we have a web service that handles a queue of requests. The client makes its service request normally, possibly without any knowledge that the request will be handled asynchronously. It sends a request like this one:

POST /queue HTTP/1.1
Host: jobs.example.com
Authorization: Basic mO1Tcm4hbAr3gBUzv3kcceP=

Give me the prime factorization of this 100000-digit number:
...

The server accepts the request, creates a new job, and puts it at the end of the queue. It will take a long time for the new job to be completed, or there wouldn’t be a need for a queue in the first place. Instead of keeping the client waiting until the job finally runs, the server sends this response right away:

202 Accepted
Location: http://jobs.example.com/queue/job11a4f9

The server has created a new “job” resource and given it a URI that doesn’t conflict with any other job. The asynchronous operation is now in progress, and the client can make GET requests to that URI to see how it’s going: that is, to get the current state of the “job” resource. Once the operation is complete, any results will become available as a representation of this resource. Once the client is done reading the results it can DELETE the job resource. The client may even be able to cancel the operation by DELETEing its job prematurely.
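Put together, the client’s side of this pattern is a small loop: POST to spawn the job, GET to poll it, DELETE to clean up. The sketch below assumes this hypothetical service reports a finished job by putting the word “completed” somewhere in the job’s representation; a real client would parse whatever format the service actually serves.

require 'net/http'
require 'uri'

# Assumption: a finished job says "completed" in its representation.
def finished?(representation)
  representation.include?('completed')
end

Net::HTTP.start('jobs.example.com') do |http|
  # Spawn the asynchronous operation by POSTing to the job queue.
  post = Net::HTTP::Post.new('/queue')
  post.basic_auth('leonardr', 'mypassword')
  post.content_type = 'text/plain'
  post.body = "Give me the prime factorization of this 100000-digit number:\n..."
  response = http.request(post)                    # expect 202 ("Accepted")
  job_path = URI.parse(response['Location']).path  # e.g. "/queue/job11a4f9"

  # Poll the new "job" resource until the operation completes.
  job = http.request(Net::HTTP::Get.new(job_path))
  until finished?(job.body)
    sleep 60
    job = http.request(Net::HTTP::Get.new(job_path))
  end
  puts job.body

  # Read the results, then clean up (or cancel early) with DELETE.
  http.request(Net::HTTP::Delete.new(job_path))
end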

Again, I’ve overcome a perceived limitation of the Resource-Oriented Architecture by exposing a new kind of resource corresponding to the thing that was causing the problem. In this case, the problem was how to handle asynchronous operations, and the solution was to expose each asynchronous operation as a new resource.

There’s one wrinkle. Because every request to start an asynchronous operation makes the server create a new resource (if only a transient one), such requests are neither safe nor idempotent. This means you can’t spawn asynchronous operations with GET, DELETE, or (usually) PUT. The only HTTP method you can use and still respect the uniform interface is POST. This means you’ll need to expose different resources for asynchronous operations than you would for synchronous operations. You’ll probably do something like the job queue I just demonstrated. You’ll expose a single resource—the job queue—to which the client POSTs to create a subordinate resource—the job. This will hold true whether the purpose of the asynchronous operation is to read some data, to make a calculation (as in the factoring example), or to modify the data set.

Batch Operations

Sometimes clients need to operate on more than one resource at once. You’ve already seen this: a list of search results is a kind of batch GET. Instead of fetching a set of resources one at a time, the client specifies some criteria and gets back a document containing abbreviated representations of many resources. I’ve also mentioned “factory” resources that respond to POST and create subordinate resources. The factory idea is easy to scale up. If your clients need to create resources in bulk, you can expose a factory resource whose incoming representation describes a set of resources instead of just one, and creates many resources in response to a single request.

What about modifying or deleting a set of resources at once? Existing resources are identified by URI, but addressability means an HTTP request can only point to a single URI, so how can you DELETE two resources at once? Remember that URIs can contain embedded URI paths, or even whole other URIs (if you escape them). One way to let a client modify multiple resources at once is to expose a resource for every set of resources. For instance, http://www.example.com/sets/resource1;subdir/resource2 might refer to a set of two resources: the one at http://www.example.com/resource1 and the one at http://www.example.com/subdir/resource2. Send a DELETE to that “set” resource and you delete both resources in the set. Send a PUT instead, with a representation of each resource in the set, and you can modify both resources with a single HTTP request.
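Here’s a client-side sketch of that convention, using the hypothetical set-building rule from the example above. (Members whose names contain characters that are special in URIs would need to be escaped first.)

require 'net/http'

# Build the URI of a "set" resource from the paths of its members.
members = ['resource1', 'subdir/resource2']
set_path = '/sets/' + members.join(';')   # => "/sets/resource1;subdir/resource2"

Net::HTTP.start('www.example.com') do |http|
  # A single DELETE request destroys every resource in the set.
  http.request(Net::HTTP::Delete.new(set_path))
end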

You might be wondering what HTTP status code to send in response to a batch operation. After all, one of those PUTs might succeed while the other one fails. Should the status code be 200 (“OK”) or 500 (“Internal Server Error”)? One solution is to make a batch operation spawn a series of asynchronous jobs. Then you can send 202 (“Accepted”), and show the client how to check on the status of the individual jobs. Or you can use a status code defined by the WebDAV extension to HTTP: 207 (“Multi-Status”).

The 207 status code tells the client to look in the entity-body for a list of status codes like 200 (“OK”) and 500 (“Internal Server Error”). The entity-body is an XML document that tells the client which operations succeeded and which failed. This is not an ideal solution, since it moves information about what happened out of the status code and into the response entity-body. It’s similar to the way overloaded POST moves the method information out of the HTTP method and into the request entity-body. But since there might be a different status code for every operation in the batch, you’re really limited in your options here. Appendix B has more information about the 207 status code.

Transactions

In the Resource-Oriented Architecture, every incoming HTTP request has some resource as its destination. But some services expose operations that span multiple resources. The classic example is an operation that transfers money from a checking to a savings account. In a database-backed system you’d use a transaction to prevent the possibility of losing or duplicating money. Is there a resource-oriented way to implement transactions?

You can expose simple transactions as batch operations, or use overloaded POST, but here’s another way. It involves (you guessed it) exposing the transactions themselves as resources. I’ll show you a sample transaction using the account transfer example. Let’s say the “checking account” resource is exposed at /accounts/checking/11, and the “savings account” resource is exposed at /accounts/savings/55. Both accounts have a current balance of $200, and I want to transfer $50 from checking to savings.

I’ll quickly walk you through the requests and then explain them. First I create a transaction by sending a POST to a transaction factory resource:

POST /transactions/account-transfer HTTP/1.1
Host: example.com

The response gives me the URI of my newly created transaction resource:

201 Created
Location: /transactions/account-transfer/11a5

I PUT the first part of my transaction: the new, reduced balance of the checking account.

PUT /transactions/account-transfer/11a5/accounts/checking/11 HTTP/1.1
Host: example.com

balance=150

I PUT the second part of my transaction: the new, increased balance of the savings account.

PUT /transactions/account-transfer/11a5/accounts/savings/55 HTTP/1.1
Host: example.com

balance=250

At any point up to this I can DELETE the transaction resource to roll back the transaction. Instead, I’m going to commit the transaction:

PUT /transactions/account-transfer/11a5 HTTP/1.1
Host: example.com

committed=true

This is the server’s chance to make sure that the transaction doesn’t create any inconsistencies in resource state. For an “account transfer” transaction the server should check whether the transaction tries to create or destroy any money, or whether it tries to move money from one person to another without authorization. If everything checks out, here’s the response I might get from my final PUT:

200 OK
Content-Type: application/xhtml+xml

...
<a href="/accounts/checking/11">Checking #11</a>: New balance $150
<a href="/accounts/savings/55">Savings #55</a>: New balance $250
...

At this point I can DELETE the transaction and it won’t be rolled back. Or the server might delete it automatically. More likely, it will be archived permanently as part of an audit trail. It’s an addressable resource. Other resources, such as a list of transactions that affected checking account #11, can link to it.

The challenge in representing transactions RESTfully is that every HTTP request is supposed to be a self-contained operation that operates on one resource. If you PUT a new balance to /accounts/checking/11, then either the PUT succeeds or it doesn’t. But during a transaction, the state of a resource is in flux. Look at the checking account from inside the transaction, and the balance is $150. Look at it from outside, and the balance is still $200. It’s almost as though there are two different resources.

That’s how this solution presents it: as two different resources. There’s the actual checking account, at /accounts/checking/11, and there’s one transaction’s view of the checking account, at /transactions/account-transfer/11a5/accounts/checking/11. When I POSTed to create /transactions/account-transfer/11a5/, the service exposed additional resources beneath the transaction URI: probably one resource for each account on the system. I manipulated those resources as I would the corresponding account resources, but my changes to resource state didn’t go “live” until I committed the transaction.

How would this be implemented behind the scenes? Probably with something that takes incoming requests and builds a queue of actions associated with the transaction. When the transaction is committed the server might start a database transaction, apply the queued actions, and then try to commit the database transaction. A failure to commit would be propagated as a failure to commit the web transaction.
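Here’s a very rough sketch of that idea. The ActiveRecord-style Account class stands in for whatever data layer you actually use, and a real implementation would also run the consistency checks described above before applying anything.

# A transaction resource that queues up changes to account resources and
# applies them only when the client commits.
class AccountTransfer
  def initialize
    @actions = []    # queued actions, in the order the client sent them
  end

  # Called when the client PUTs a new balance to a resource like
  # /transactions/account-transfer/11a5/accounts/checking/11.
  def queue_balance_change(account_id, new_balance)
    @actions << lambda do
      account = Account.find(account_id)
      account.balance = new_balance
      account.save!
    end
  end

  # Called when the client PUTs committed=true to the transaction itself.
  # Returns true if the batch went through, false if it was rolled back.
  def commit
    Account.transaction do               # a database transaction
      @actions.each { |action| action.call }
    end
    true
  rescue StandardError
    false
  end
end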

A RESTful transaction is more complex to implement than a database or programming language transaction. Every step in the transaction comes in as a separate HTTP request. Every step identifies a resource and fits the uniform interface. It might be easier to punt and use overloaded POST. But if you implement transactions RESTfully, your transactions have the benefits of resources: they’re addressable, operations on them are transparent, and they can be archived or linked to later. Yet again, the way to deal with an action that doesn’t fit the uniform interface is to expose the action itself as a resource.

When In Doubt, Make It a Resource

The techniques I’ve shown you are not the official RESTful or resource-oriented ways to handle transactions, asynchronous operations, and so on. They’re just the best ones I could think up. If they don’t work for you, you’re free to try another arrangement.

The larger point of this section is that when I say “anything can be a resource” I do mean anything. If there’s a concept that’s causing you design troubles, you can usually fit it into the ROA by exposing it as a new kind of resource. If you need to violate the uniform interface for performance reasons, you’ve always got overloaded POST. But just about anything can be made to respond to the uniform interface.

URI Design

URIs should be meaningful and well structured. Wherever possible, a client should be able to construct the URI for the resource they want to access. This increases the “surface area” of your application. It makes it possible for clients to get directly to any state of your application without having to traverse a bunch of intermediate resources. (But see “Why Connectedness Matters” earlier in this chapter; links are the most reliable way to convey the relationships between resources.)

When designing URIs, use path variables to separate elements of a hierarchy, or a path through a directed graph. Example: /weblogs/myweblog/entries/100 goes from the general to the specific. From a list of weblogs, to a particular weblog, to the entries in that weblog, to a particular entry. Each path variable is in some sense “inside” the previous one.

Use punctuation characters to separate multiple pieces of data at the same level of a hierarchy. Use commas when the order of the items matters, as it does in latitude and longitude: /Earth/37.0,-95.2. Use semicolons when the order doesn’t matter: /color-blends/red;blue.

Use query variables only to suggest arguments being plugged into an algorithm, or when the other two techniques fail. If two URIs differ only in their query variables, it implies that they represent different sets of inputs to the same underlying algorithm.
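To make the three techniques concrete, here’s how a few of the URIs from this section might be built in Ruby (the search query is invented; everything else comes from the examples above):

require 'cgi'

# Path variables for a hierarchy, from the general to the specific.
entry_uri = '/weblogs/myweblog/entries/100'

# Data at the same level of the hierarchy, where order matters: commas.
latitude, longitude = 37.0, -95.2
point_uri = "/Earth/#{latitude},#{longitude}"               # => "/Earth/37.0,-95.2"

# Data at the same level, where order doesn't matter: semicolons.
blend_uri = '/color-blends/' + ['red', 'blue'].join(';')    # => "/color-blends/red;blue"

# Inputs to an algorithm: query variables.
search_uri = '/search?q=' + CGI.escape('jellyfish recipes') # => "/search?q=jellyfish+recipes"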

URIs are supposed to designate resources, not operations on the resources. This means it’s almost never appropriate to put the names of operations in your URIs. If you have a URI that looks like /object/do-operation, you’re in danger of slipping into the RPC style. Nobody wants to link to do-operation: they want to link to the object. Expose the operation through the uniform interface, or use overloaded POST if you have to, but make your URIs designate objects, not operations on the objects.

I can’t make this an ironclad rule, because a resource can be anything. Operations on objects can be first-class objects, similar to how methods in a dynamic programming language are first-class objects. /object/do-operation might be a full-fledged resource that responds to GET, PUT, and DELETE. But if you’re doing this, you’re well ahead of the current web services curve, and you’ve got weightier issues on your mind than whether you’re contravening some best practice I set down in a book.

Outgoing Representations

Most of the documents you serve will be representations of resources, but some of them will be error conditions. Use HTTP status codes to convey how the client should regard the document you serve. If there’s an error, you should set the status code to indicate an appropriate error condition, possibly 400 (“Bad Request”). Otherwise, the client might treat your error message as a representation of the resource it requested.

The status code says what the document is for. The Content-Type response header says what format the document is in. Without this header, your clients won’t know how to parse or handle the documents you serve.
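Here’s a small server-side sketch of both rules, applied to a zoom-level check like the map service’s. The exact range of zoom levels and the Rack-style [status, headers, body] return value are assumptions made to keep the sketch concrete; any framework gives you the same two levers.

# Serve either a representation or an error document, and label each one
# with a status code and a Content-Type.
def road_map_response(zoom)
  unless (0..10).include?(zoom)
    # An error document: say so in the status code, or clients will treat
    # the error message as a representation of the map they asked for.
    return [400, { 'Content-Type' => 'text/plain' },
            ["There is no zoom level #{zoom}.\n"]]
  end
  [200, { 'Content-Type' => 'application/xhtml+xml' },
   ["<!-- the representation of the road map at zoom level #{zoom} -->"]]
end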

Representations should be human-readable, but computer-oriented. The job of the human web is to present information for direct human consumption. The main job of the programmable web is to present the same information for manipulation by computer programs. If your service exposes a set of instrument readings, the focus should be on providing access to the raw data, not on making human-readable graphs. Clients can make their own graphs, or pipe the raw data into a graph-generation service. You can provide graphs as a convenience, but a graph should not be the main representation of a set of numbers.

Representations should be useful: that is, they should expose interesting data instead of irrelevant data that no one will use. A single representation should contain all relevant information necessary to fulfill a need. A client should not have to get several representations of the same resource to perform a single operation.

That said, it’s difficult to anticipate what part of your data set clients will use. When in doubt, expose all the state you have for a resource. This is what a Rails service does by default: it exposes representations that completely describe the corresponding database rows.

A resource’s representations should change along with its state.

Incoming Representations

I don’t have a lot to say about incoming representations, apart from talking about specific formats, which I’ll do in the next chapter. I will mention the two main kinds of incoming representations. Simple representations are usually sets of key-value pairs that set an item of resource state to a given value: username=leonardr. There are lots of representation formats for key-value pairs; form-encoding is the most popular.

If your resource state is too complex to represent with key-value pairs, your service should accept incoming representations in the same format it uses to serve outgoing representations. A client should be able to fetch a representation, modify it, and PUT it back where it found it. It doesn’t make sense to have your clients understand one complex data format for outgoing representations and another, equally complex format for incoming representations.
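That round trip might look something like this sketch from the client’s side. The entry URI is the hypothetical one from earlier in this chapter, and the text substitution stands in for editing a real XML or Atom document.

require 'net/http'

Net::HTTP.start('www.example.com') do |http|
  path = '/weblogs/myweblog/entries/100'

  # Fetch the current representation of the resource...
  entry = http.request(Net::HTTP::Get.new(path))

  # ...modify it (a trivial text substitution stands in for editing a
  # real XML or Atom document)...
  modified = entry.body.sub('draft', 'published')

  # ...and PUT it back where it came from, in the same format it was served.
  put = Net::HTTP::Put.new(path)
  put.body = modified
  put.content_type = entry['Content-Type']
  http.request(put)
end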

Service Versioning

Web sites can (and do) undergo drastic redesigns without causing major problems, because their audience is made of human beings. Humans can look at a web page and understand what it means, so they’re good at adapting to changes. Although URIs on the Web are not supposed to change, in practice they can (and do) change all the time. The consequences are serious—external links and bookmarks still point to the old URIs—but your everyday use of a web site isn’t affected. Even so, after a major redesign, some web sites keep the old version around for a while. The web site’s users need time to adapt to the new system.

Computer programs are terrible at adapting to changes. A human being (a programmer) must do the adapting for them. This is why connectedness is important, and why extensible representation formats (like Atom and XHTML) are so useful. When the client’s options are described by hypermedia, a programmer can focus on the high-level semantic meaning of a service, rather than the implementation details. The implementations of resources, the URIs to the resources, and even the hypermedia representations themselves can change, but as long as the semantic cues are still there, old clients will still work.

The mapping service from Chapter 5 was completely connected and served representations in an extensible format. The URI to a resource followed a certain pattern, but you didn’t need that fact to use the service: the representations were full of links, and the links were annotated with semantic content like “zoom_in” and “coordinates.” In Chapter 6 I added new resources and added new features to the representations, but a client written against the Chapter 5 version would still work. (Except for the protocol change: the Chapter 5 service was served through HTTP, and the Chapter 6 service through HTTPS.) All the semantic cues stayed the same, so the representations still “meant” the same thing.

By contrast, the bookmarking service from Chapter 7 isn’t well connected. You can’t get a representation of a user except by applying a URI construction rule I described in English prose. If I change that rule, any clients you wrote will break. In a situation like this, the service should allow for a transitional period where the old resources work alongside the new ones. The simplest way is to incorporate version information into the resources’ URIs. That’s what I did in Chapter 7: my URIs looked like /v1/users/leonardr instead of /users/leonardr.

Even a well-connected service might need to be versioned. Sometimes a rewrite of the service changes the meaning of the representations, and all the clients break, even ones that understood the earlier semantic cues. When in doubt, version your service.

You can use any of the methods developed over the years for numbering software releases. Your URI might designate the version as v1, or 1.4.0, or 2007-05-22. The simplest way to incorporate the version is to make it the first path variable: /v1/resource versus /v2/resource. If you want to get a little fancy, you can incorporate the version number into the hostname: v1.service.example.com versus v2.service.example.com.

Ideally, you would keep the old versions of your services around until no more clients use them, but this is only possible in private settings where you control all the clients. More realistically, you should keep old versions around until architectural changes make it impossible to expose the old resources, or until the maintenance cost of the old versions exceeds the cost of actively helping your user base migrate.

Permanent URIs Versus Readable URIs

I think there should be an intuitive correspondence between a URI and the resource it identifies. REST doesn’t forbid this, but it doesn’t require it either. REST says that resources should have names, not that the names should mean anything. The URI /contour/Mars doesn’t have to be the URI to the contour map of Mars: it could just as easily be the URI to the radar map of Venus, or the list of open bugs in a bug tracker. But making a correspondence between URI and resource is one of the most useful things you can do for your clients. Usability expert Jakob Nielsen recommends this in his essay “URL as UI”. If your URIs are intuitive enough, they form part of your service’s user interface. A client can get right to the resource they want by constructing an appropriate URI, or surf your resources by varying the URIs.

There’s a problem, though. A meaningful URI talks about the resource, which means it contains elements of resource state. What happens when the resource state changes? Nobody will ever successfully rename the planet Mars (believe me, I’ve tried), but towns change names occasionally, and businesses change names all the time. I ran into trouble in Chapter 6 because I used latitude and longitude to designate a “place” that turned out to be a moving ship. Usernames change. People get married and change their names. Almost any piece of resource state that might add meaning to a URI can change, breaking the URI.

This is why Rails applications expose URIs that incorporate database table IDs, URIs like /weblogs/4. I dissed those URIs in Chapter 7, but their advantage is that they’re based on a bit of resource state that never changes. It’s state that’s totally useless to the client, but it never changes, and that’s worth something too.

Jakob Nielsen makes the case for meaningful URIs, but Tim Berners-Lee makes the case for URI opacity: “meaningless” URIs that never change. Berners-Lee’s Axioms of Web Architecture describes URI opacity like this: “When you are not dereferencing you should not look at the contents of the URI string to gain other information.” That is: you can use a URI as the name of a resource, but you shouldn’t pick the URI apart to see what it says, and you shouldn’t assume that you can vary the resource by varying the URI. Even if a URI really looks meaningful, you can’t make any assumptions.

This is a good rule for a general web client, because there are no guarantees about URIs on the Web as a whole. Just because a URI ends in “.html” doesn’t mean there’s an HTML document on the other side. But today’s average RESTful web service is built around rules for URI construction. With URI Templates, a web service can make promises about whole classes of URIs that fit a certain pattern. The best argument for URI opacity on the programmable web is the fact that a non-opaque URI incorporates resource state that might change. To use another of Tim Berners-Lee’s coinages, opaque URIs are “cool.”[25]

So which is it? URI as UI, or URI opacity? For once in this book I’m going to give you the cop-out answer: it depends. It depends on which is worse for your clients: a URI that has no visible relationship to the resource it names, or a URI that breaks when its resource state changes. I almost always come down on the side of URI as UI, but that’s just my opinion.

To show you how subjective this is, I’d like to break the illusion of the authorial “I” for just a moment. The authors of this book both prefer informative URIs to opaque ones, but Leonard tries to choose URIs using the bits of resource state that are least likely to change. If he designed a weblog service, he’d put the date of a weblog entry in that entry’s URI, but he wouldn’t put the entry title in there. He thinks the title’s too easy to change. Sam would rather put the title in the URI, to help with search engine optimization and to give the reader a clue what content is behind the URI. Sam would handle retitled entries by setting up a permanent redirect at the old URI.

Standard Features of HTTP

HTTP has several features designed to solve specific engineering problems. Many of these features are not widely known, either because the problems they solve don’t come up very often on the human web, or because today’s web browsers implement them transparently. When working on the programmable web, you should know about these features, so you don’t reinvent them or prematurely give up on HTTP as an application protocol.

Authentication and Authorization

By now you probably know that HTTP authentication and authorization are handled with HTTP headers—“stickers” on the HTTP “envelope.” You might not know that these headers were designed to be extensible. HTTP defines two authentication schemes, but there’s a standard way of integrating other authentication schemes into HTTP, by customizing values for the headers Authorization and WWW-Authenticate. You can even define custom authentication schemes and integrate them into HTTP: I’ll show you how that’s done by adapting a small portion of the WS-Security standard to work with HTTP authentication. But first, I’ll cover the two predefined schemes.

Basic authentication

Basic authentication is a simple challenge/response scheme. If you try to access a resource that’s protected by basic authentication, and you don’t provide the proper credentials, you receive a challenge and you have to make the request again. It’s used by the del.icio.us web service I showed you in Chapter 2, as well as my mapping service in Chapter 6 and my del.icio.us clone in Chapter 7.

Here’s an example. I make a request for a protected resource, not realizing it’s protected:

GET /resource.html HTTP/1.1
Host: www.example.com

I didn’t include the right credentials. In fact, I didn’t include any credentials at all. The server sends me the following response:

401 Unauthorized
WWW-Authenticate: Basic realm="My Private Data"

This is a challenge. The server dares me to repeat my request with the correct credentials. The WWW-Authenticate header gives two clues about what credentials I should send. It identifies what kind of authentication it’s using (in this case, Basic), and it names a realm. The realm can be any name you like, and it’s generally used to identify a collection of resources on a site. In Chapter 7 the realm was “Social bookmarking service” (I defined it in Example 7-11). A single web site might have many sets of protected resources guarded in different ways: the realm lets the client know which authentication credentials it should provide. The realm is the what, and the authentication type is the how.

To meet a Basic authentication challenge, the client needs a username and a password. This information might be filed in a cache under the name of the realm, or the client may have to prompt an end user for this information. Once the client has this information, username and password are combined into a single string and encoded with base 64 encoding. Most languages have a standard library for doing this kind of encoding: Example 8-1 uses Ruby to encode a username and password.

Example 8-1. Base 64 encoding in Ruby
#!/usr/bin/ruby
# calculate-base64.rb
USER="Alibaba"
PASSWORD="open sesame"

require 'base64'
puts Base64.encode64("#{USER}:#{PASSWORD}")
# QWxpYmFiYTpvcGVuIHNlc2FtZQ==

This seemingly random string of characters is the value of the Authorization header. Now I can send my request again, using the username and password as Basic auth credentials.

GET /resource.html HTTP/1.1
Host: www.example.com
Authorization: Basic QWxpYmFiYTpvcGVuIHNlc2FtZQ==

The server decodes this string and matches it against its user and password list. If they match, the response is processed further. If not, the request fails, and once again the status code is 401 (“Unauthorized”).

Of course, if the server can decode this string, so can anyone who snoops on your network traffic. Basic authentication effectively transmits usernames and passwords in plain text. One solution to this is to use HTTPS, also known as Transport Layer Security or Secure Sockets Layer. HTTPS encrypts all communications between client and server, incidentally including the Authorization header. When I added authentication to my map service in Chapter 6, I switched from plain HTTP to encrypted HTTPS.
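
If you’d rather not paste headers together by hand, Ruby’s standard Net::HTTP library will build the Authorization header for you. Here’s a minimal sketch (the URI and credentials are made up) that sends Basic credentials over HTTPS, so they’re encrypted in transit:

#!/usr/bin/ruby
# basic-auth-over-https.rb
require 'net/https'

uri = URI.parse('https://www.example.com/resource.html')

http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true   # encrypt the whole exchange, Authorization header included

request = Net::HTTP::Get.new(uri.path)
# Sets "Authorization: Basic QWxpYmFiYTpvcGVuIHNlc2FtZQ=="
request.basic_auth('Alibaba', 'open sesame')

response = http.request(request)
puts response.code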

Digest authentication

HTTP Digest authentication is another way to hide the authorization credentials from network snoops. It’s more complex than Basic authentication, but it’s secure even over unencrypted HTTP. Digest follows the same basic pattern as Basic: the client issues a request, and gets a challenge. Here’s a sample challenge:

401 Unauthorized
WWW-Authenticate: Digest realm="My Private Data",
  qop="auth",
  nonce="0cc175b9c0f1b6a831c399e269772661",
  opaque="92eb5ffee6ae2fec3ad71c777531578f"

This time, the WWW-Authenticate header says that the authentication type is Digest. The header specifies a realm as before, but it also contains three other pieces of information, including a nonce: a random string that changes on every request.

The client’s responsibility is to turn this information into an encrypted string that proves the client knows the password, but that doesn’t actually contain the password. First the client generates a client-side nonce and a sequence number. Then the client makes a single “digest” string out of a huge amount of information: the HTTP method and path from the request, the four pieces of information from the challenge, the username and password, the client-side nonce, and the sequence number. The formula for doing this is considerably more complicated than for Basic authentication (see Example 8-2).

Example 8-2. HTTP digest calculation in Ruby
#!/usr/bin/ruby
# calculate-http-digest.rb
require 'digest/md5'

#Information from the original request
METHOD="GET"
PATH="/resource.html"

# Information from the challenge
REALM="My Private Data"
NONCE="0cc175b9c0f1b6a831c399e269772661",
OPAQUE="92eb5ffee6ae2fec3ad71c777531578f"
QOP="auth"

# Information calculated by or known to the client
NC="00000001"
CNONCE="4a8a08f09d37b73795649038408b5f33"
USER="Alibaba"
PASSWORD="open sesame"

# Calculate the final digest in three steps.
ha1 = Digest::MD5.hexdigest("#{USER}:#{REALM}:#{PASSWORD}")
ha2 = Digest::MD5.hexdigest("#{METHOD}:#{PATH}")
ha3 = Digest::MD5.hexdigest("#{ha1}:#{NONCE}:#{NC}:#{CNONCE}:#{QOP}:#{ha2}")

puts ha3
# 2370039ff8a9fb83b4293210b5fb53e3

The digest string is similar to the S3 request signature in Chapter 3. It proves certain things about the client. You could never produce this string unless you knew the client’s username and password, knew what request the client was trying to make, and knew which challenge the server had sent in response to the first request.

Once the digest is calculated, the client resends the request and passes back all the constants (except, of course, the password), as well as the final result of the calculation:

GET /resource.html HTTP/1.1
Host: www.example.com
Authorization: Digest username="Alibaba",
  realm="My Private Data",
  nonce="0cc175b9c0f1b6a831c399e269772661",
  uri="/resource.html",
  qop=auth,
  nc=00000001,
  cnonce="4a8a08f09d37b73795649038408b5f33",
  response="2370039ff8a9fb83b4293210b5fb53e3",
  opaque="92eb5ffee6ae2fec3ad71c777531578f"

The cryptography is considerably more complicated, but the process is the same as for HTTP Basic auth: request, challenge, response. One key difference is that even the server can’t figure out your password from the digest. When a client initially sets a password for a realm, the server needs to calculate the hash of user:realm:password (ha1 in the example above), and keep it on file. That gives the server the information it needs to calculate the final value of ha3, without storing the user’s actual password.
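
To make that concrete, here’s a rough sketch of the server’s side of the check. It assumes the server has already parsed the client’s Authorization header into a hash; ha1 is computed here from the password so the example will run, but in real life only that hash would be kept on file:

#!/usr/bin/ruby
# verify-http-digest.rb
require 'digest/md5'

# In a real deployment, ha1 is computed once, when the password is set,
# and only the hash is stored.
ha1 = Digest::MD5.hexdigest("Alibaba:My Private Data:open sesame")

# Values parsed out of the client's Authorization header, plus the request method
auth = {
  'uri' => '/resource.html', 'qop' => 'auth',
  'nonce' => '0cc175b9c0f1b6a831c399e269772661', 'nc' => '00000001',
  'cnonce' => '4a8a08f09d37b73795649038408b5f33',
  'response' => '2370039ff8a9fb83b4293210b5fb53e3'
}
method = 'GET'

ha2 = Digest::MD5.hexdigest("#{method}:#{auth['uri']}")
expected = Digest::MD5.hexdigest(
  "#{ha1}:#{auth['nonce']}:#{auth['nc']}:#{auth['cnonce']}:#{auth['qop']}:#{ha2}")

puts(expected == auth['response'] ? "Credentials check out." : "401 Unauthorized")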

A second difference is that every request the client makes is actually two requests. The point of the first request is to get a challenge: it includes no authentication information, and it always fails with a status code of 401 (“Unauthorized”). But the WWW-Authenticate header includes a unique nonce, which the client can use to construct an appropriate Authorization header. It makes a second request, using this header, and this one is the one that succeeds. In Basic auth, the client can avoid the challenge by sending its authorization credentials along with the first request. That’s not possible in Digest.

Digest authentication has some options I haven’t shown here. Specifying qop=auth-int instead of qop=auth means that the calculation of ha2 above must include the request’s entity-body, not just the HTTP method and the URI path. This prevents a man-in-the-middle from tampering with the representations that accompany PUT and POST requests.

My goal here isn’t to dwell on the complex mathematics— that’s what libraries are for. I want to demonstrate the central role the WWW-Authenticate and Authorization headers play in this exchange. The WWW-Authenticate header says, “Here’s everything you need to know to authenticate, assuming you know the secret.” The Authorization header says, “I know the secret, and here’s the proof.” Everything else is parameter parsing and a few lines of code.

WSSE username token

What if neither HTTP Basic nor HTTP Digest works for you? You can define your own standards for what goes into WWW-Authenticate and Authorization. Here’s one real-life example. It turns out that, for a variety of technical reasons, users with low-cost hosting accounts can’t take advantage of either HTTP Basic or HTTP Digest.[26] At one time, this was important to a segment of the Atom community. Coming up with an entirely new cryptographically secure option was beyond the ability of the Atom working group. Instead, they looked to the WS-Security specification, which defines several different ways of authenticating SOAP messages with SOAP headers. (SOAP headers are the “stickers” on the SOAP envelope I mentioned back in Chapter 1.) They took a single idea—WS-Security UsernameToken—from this standard and ported it from SOAP headers to HTTP headers. They defined an extension to HTTP that used WWW-Authenticate and Authorization in a way that people with low-cost hosting accounts could use. We call the resulting extension WSSE UsernameToken, or WSSE for short. (WSSE just means WS-Security Extension. Other extensions would have a claim to the same name, but there aren’t any others right now.)

WSSE is like Digest in that the client runs their password through a hash algorithm before sending it across the network. The basic pattern is the same: the client makes a request, gets a challenge, and formulates a response. A WSSE challenge might look like this:

HTTP/1.1 401 Unauthorized
WWW-Authenticate: WSSE realm="My Private Data", profile="UsernameToken"

Instead of Basic or Digest, the authentication type is WSSE. The realm serves the same purpose as before, and the “profile” tells the client that the server expects it to generate a response using the UsernameToken rules (as opposed to some other rule from WS-Security that hasn’t yet been ported to HTTP headers). The UsernameToken rules mean that the client generates a nonce, then hashes their password along with the nonce and the current date (see Example 8-3).

Example 8-3. Calculating a WSSE digest
#!/usr/bin/ruby
# calculate-wsse-digest.rb
require 'base64'
require 'digest/sha1'

PASSWORD = "open sesame"
NONCE = "EFD89F06CCB28C89",
CREATED = "2007-04-13T09:00:00Z"

puts Base64.encode64(Digest::SHA1.digest("#{NONCE}#{CREATED}#{PASSWORD}"))
# Z2Y59TewHV6r9BWjtHLkKfUjm2k=

Now the client can send a response to the WSSE challenge:

GET /resource.html HTTP/1.1
Host: www.example.com
Authorization: WSSE profile="UsernameToken"
X-WSSE: UsernameToken Username="Alibaba",
  PasswordDigest="Z2Y59TewHV6r9BWjtHLkKfUjm2k=",
  Nonce="EFD89F06CCB28C89",
  Created="2007-04-13T09:00:00Z"

Same headers. Different authentication method. Same message flow. Different hash algorithm. That’s all it takes to extend HTTP authentication. If you’re curious, here’s what those authentication credentials would look like as a SOAP header under the original WS-Security UsernameToken standard.

<wsse:UsernameToken
   xmlns:wsse="http://schemas.xmlsoap.org/ws/2002/xx/secext"
   xmlns:wsu="http://schemas.xmlsoap.org/ws/2002/xx/utility">
   <wsse:Username>Alibaba</wsse:Username>
   <wsse:Password Type="wsse:PasswordDigest">
      Z2Y59TewHV6r9BWjtHLkKfUjm2k=
   </wsse:Password>
   <wsse:Nonce>EFD89F06CCB28C89</wsse:Nonce>
   <wsu:Created>2007-04-13T09:00:00Z</wsu:Created>
</wsse:UsernameToken>

WSSE UsernameToken authentication has two big advantages. It doesn’t send the password in the clear over the network, the way HTTP Basic does, and it doesn’t require any special setup on the server side, the way HTTP Digest usually does. It’s got one big disadvantage. Under HTTP Basic and Digest, the server can keep a one-way hash of the password instead of the password itself. If the server gets cracked, the passwords are still (somewhat) safe. With WSSE UsernameToken, the server must store the password in plain text, or it can’t verify the responses to its challenges. If someone cracks the server, they’ve got all the passwords. The extra complexity of HTTP Digest is meant to stop this from happening. Security always involves tradeoffs like these.
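
To see exactly why the plaintext password has to be on file, here’s a rough sketch of the server’s side of the WSSE check; the nonce, creation date, and digest are the ones from the exchange above:

#!/usr/bin/ruby
# verify-wsse-digest.rb
require 'base64'
require 'digest/sha1'

# WSSE leaves the server no choice but to store this in plain text.
password = "open sesame"

# Values parsed out of the client's X-WSSE header
nonce          = "EFD89F06CCB28C89"
created        = "2007-04-13T09:00:00Z"
claimed_digest = "Z2Y59TewHV6r9BWjtHLkKfUjm2k="

expected = Base64.encode64(Digest::SHA1.digest("#{nonce}#{created}#{password}")).strip
puts(expected == claimed_digest ? "Credentials check out." : "401 Unauthorized")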

Compression

Textual representations like XML documents can be compressed to a fraction of their original size. An HTTP client library can request a compressed version of a representation and then transparently decompress it for its user. Here’s how it works: along with an HTTP request the client sends an Accept-Encoding header that says what kind of compression algorithms the client understands. The two standard values for Accept-Encoding are compress and gzip.

GET /resource.html HTTP/1.1
Host: www.example.com
Accept-Encoding: gzip,compress

If the server understands one of the compression algorithms from Accept-Encoding, it can use that algorithm to compress the representation before serving it. The server sends the same Content-Type it would send if the representation wasn’t compressed. But it also sends the Content-Encoding header, so the client knows the document has been compressed:

200 OK
Content-Type: text/html
Content-Encoding: gzip

[Binary representation goes here]

The client decompresses the data using the algorithm given in Content-Encoding, and then treats it as the media type given as Content-Type. In this case the client would use the gzip algorithm to decompress the binary data back into an HTML document. This technique can save a lot of bandwidth, with very little cost in additional complexity.

You probably remember that I think different representations of a resource should have distinct URIs. Why do I recommend using HTTP headers to distinguish between compressed and uncompressed versions of a representation? Because I don’t think the compressed and uncompressed versions are different representations. Compression, like encryption, is something that happens to a representation in transit, and must be undone before the client can use the representation. In an ideal world, HTTP clients and servers would compress and decompress representations automatically, and programmers should not have to even think about it. Today, most web browsers automatically request compressed representations, but few programmable clients do.
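
Here’s a rough sketch of what a programmable client has to do, using Ruby’s standard Net::HTTP and Zlib libraries. The URI is just an example, and a server is always free to ignore Accept-Encoding and send the representation uncompressed:

#!/usr/bin/ruby
# fetch-compressed.rb
require 'net/http'
require 'zlib'
require 'stringio'

uri = URI.parse('http://www.example.com/resource.html')
request = Net::HTTP::Get.new(uri.path)
request['Accept-Encoding'] = 'gzip'

response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }

body = response.body
if response['Content-Encoding'] == 'gzip'
  # Undo the transfer-time compression before treating the data as HTML.
  body = Zlib::GzipReader.new(StringIO.new(body)).read
end
puts "#{response['Content-Type']}: #{body.size} bytes after decompression"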

Conditional GET

Conditional HTTP GET allows a server and client to work together to save bandwidth. I covered it briefly in Chapter 5, in the context of the mapping service. There, the problem was sending the same map tiles over and over again to clients who had already received them. This is a more general treatment of the same question: how can a service keep from sending representations to clients that already have them?

Neither client nor server can solve this problem alone. If the client retrieves a representation and never talks to the server again, it will never know when the representation has changed. The server keeps no application state, so it doesn’t know when a client last retrieved a certain representation. HTTP isn’t a reliable protocol anyway, and the client might not have received the representation the first time. So when the client requests a representation, the server has no idea whether the client has done this before—unless the client provides that information as part of the application state.

Conditional HTTP GET requires client and server to work together. When the server sends a representation, it sets some HTTP response headers: Last-Modified and/or ETag. When the client requests the same representation, it should send the values for those headers as If-Modified-Since and/or If-None-Match. This lets the server make a decision about whether or not to resend the representation. Example 8-4 gives a demonstration of conditional HTTP GET.

Example 8-4. Make a regular GET request, then a conditional GET request
#!/usr/bin/ruby
# fetch-oreilly-conditional.rb

require 'rubygems'
require 'rest-open-uri'
uri = 'http://www.oreilly.com'

# Make an HTTP request and then describe the response.
def request(uri, *args)
  begin
    response = open(uri, *args)
  rescue OpenURI::HTTPError => e
    response = e.io
  end
    
  puts " Status code: #{response.status.inspect}"
  puts " Representation size: #{response.size}"
  last_modified = response.meta['last-modified']
  etag = response.meta['etag']
  puts " Last-Modified: #{last_modified}"
  puts " Etag: #{etag}"
  return last_modified, etag
end    

puts "First request:"
last_modified, etag = request(uri)

puts "Second request:"
request(uri, 'If-Modified-Since' => last_modified, 'If-None-Match' => etag)

If you run that code once, it’ll fetch http://www.oreilly.com twice: once normally and once conditionally. It prints information about each request. The printed output for the first request will look something like this:

First request:
 Status code: ["200", "OK"]
 Representation size: 41123
 Last-Modified: Sun, 21 Jan 2007 09:35:19 GMT
 Etag: "7359b7-a37c-45b333d7"

The Last-Modified and Etag headers are the ones that make HTTP conditional GET possible. To use them, I make the HTTP request again, but this time I use the value of Last-Modified as If-Modified-Since, and the value of ETag as If-None-Match. Here’s the result:

Second request:
 Status code: ["304", "Not Modified"]
 Representation size: 0
 Last-Modified:
 Etag: "7359b7-a0a3-45b5d90e"

Instead of a 40-KB representation, the second request gets a 0-byte representation. Instead of 200 (“OK”), the status code is 304 (“Not Modified”). The second request saved 40 KB of bandwidth because it made the HTTP request conditional on the representation of http://www.oreilly.com/ actually having changed since last time. The representation didn’t change, so it wasn’t resent.

Last-Modified is a pretty easy header to understand: it’s the last time the representation of this resource changed. You may be able to view this information in your web browser by going to “view page info” or something similar. Sometimes humans check a web page’s Last-Modified time to see how recent the data is, but its main use is in conditional HTTP requests.

If-Modified-Since makes an HTTP request conditional. If the condition is met, the server carries out the request as it would normally. Otherwise, the condition fails and the server does something unusual. For If-Modified-Since, the condition is: “the representation I’m requesting must have changed after this date.” The condition succeeds when the server has a newer representation than the client does. If the client and server have the same representation, the condition fails and the server does something unusual: it omits the representation and sends a status code of 304 (“Not Modified”). That’s the server’s way of telling the client: “reuse the representation you saved from last time.”

Both client and server benefit here. The server doesn’t have to send a representation of the resource, and the client doesn’t have to wait for it. Both sides save bandwidth. This is one of the tricks underlying your web browser’s cache, and there’s no reason not to use it in custom web clients.

How does the server calculate when a representation was last modified? A web server like Apache has it easy: it mostly serves static files from disk, and filesystems already track the modification date for every file. Apache just gets that information from the filesystem. In more complicated scenarios, you’ll need to break the representation down into its component parts and see when each bit of resource state was last modified. In Chapter 7, the Last-Modified value for a list of bookmarks was the most recent timestamp in the list. If you’re not tracking this information, the bandwidth savings you get by supporting Last-Modified might make it worth your while to start tracking it.

Even when a server provides Last-Modified, it’s not totally reliable. Let’s say a client GETs a representation at 12:30:00.3 and sees a Last-Modified with the time “12:30:00.” A tenth of a second later, the representation changes, but the Last-Modified time is still “12:30:00.” If the client tries a conditional GET request using If-Modified-Since, the server will send a 304 (“Not Modified”) response, even though the resource was modified after the original GET. One second is not a high enough resolution to keep track of when a resource changes. In fact, no resolution is high enough to keep track of when a resource changes with total accuracy.

This is not quite satisfactory. The world cries out for a completely reliable way of checking whether or not a representation has been modified since last you retrieved it. Enter the Etag response header. The Etag (it stands for “entity tag”) is a nonsensical string that must change whenever the corresponding representation changes.

The If-None-Match request header is to Etag as the If-Modified-Since request header is to Last-Modified. It’s a way of making an HTTP request conditional. In this case, the condition is “the representation has changed, as embodied in the entity tag.” It’s supposed to be a totally reliable way of identifying changes between representations.

It’s easy to generate a good ETag for any representation. Transformations like the MD5 hash can turn any string of bytes into a short string that’s unique except in pathological cases. The problem is, by the time you can run one of those transformations, you’ve already created the representation as a string of bytes. You may save bandwidth by not sending the representation over the wire, but you’ve already done everything necessary to build it.

The Apache server uses filesystem information like file size and modification time to generate Etag headers for static files without reading their contents. You might be able to do the same thing for your representations: pick the data that tends to change, or summary data that changes along with the representation. Instead of doing an MD5 sum of the entire representation, just do a sum of the important data. The Etag header doesn’t need to incorporate every bit of data in the representation: it just has to change whenever the representation changes.
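
Here’s a rough sketch of that approach for a hypothetical bookmark list, along the lines of the Chapter 7 service. The Etag is an MD5 sum of summary data (the newest timestamp and the number of bookmarks), not of the whole representation:

#!/usr/bin/ruby
# summary-etag.rb
require 'digest/md5'

# Hypothetical resource state: a user's bookmarks.
bookmarks = [
  { :uri => 'http://www.oreilly.com/',  :updated => Time.utc(2007, 1, 21, 9, 35, 19) },
  { :uri => 'http://www.example.com/',  :updated => Time.utc(2007, 1, 20, 12, 0, 0) }
]

# Summarize only the data that changes whenever the list changes.
newest = bookmarks.map { |b| b[:updated] }.max
etag = '"' + Digest::MD5.hexdigest("#{newest.to_i}:#{bookmarks.size}") + '"'

# Hypothetical If-None-Match value sent by the client.
if_none_match = etag

if if_none_match == etag
  puts "304 Not Modified"   # the client's cached copy is still good
else
  puts "200 OK"             # build and send the full representation
end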

If a server provides both Last-Modified and Etag, the client can provide both If-Modified-Since and If-None-Match in subsequent requests (as I did in Example 8-4). The server should make both checks: it should only send a new representation if the representation has changed and the Etag is different.

Caching

Conditional HTTP GET gives the client a way to refresh a representation by making a GET request that uses very little bandwidth if the representation has not changed. Caching gives the client some rough guidelines that can make it unnecessary to make that second GET request at all.

HTTP caching is a complex topic, even though I’m limiting my discussion to client-side caches and ignoring proxy caches that sit between the client and the server.[27] The basics are these: when a client makes an HTTP GET or HEAD request, it might be able to cache the HTTP response document, headers and all. The next time the client is asked to make the same GET or HEAD request, it may be able to return the cached document instead of actually making the request again. From the perspective of the user (a human using a web browser, or a computer program using an HTTP library), caching is transparent. The user triggers a request, but instead of making an actual HTTP request, the client reuses a response it cached earlier and presents it as though it had been freshly retrieved from the server. I’m going to focus on three topics from the point of view of the service provider: how you can tell the client to cache, how you can tell the client not to cache, and when the client might be caching without you knowing it.

Please cache

When the server responds to a GET or HEAD request, it may send a date in the response header Expires. For instance:

Expires: Tue, 30 Jan 2007 17:02:06 GMT

This header tells the client (and any proxies between the server and client) how long the response may be cached. The date may range from a date in the past (meaning the response has expired by the time it gets to the client) to a date a year in the future (which means, roughly, “the response will never expire”). After the time specified in Expires, the response becomes stale. This doesn’t mean that it must be removed from the cache immediately. The client might be able to make a conditional GET request, find out that the response is actually still fresh, and update the cache with a new expiration date.

The value of Expires is a rough guide, not an exact date. Most services can’t predict to the second when a response is going to change. If Expires is an hour in the future, that means the server is pretty sure the response won’t change for at least an hour. But something could legitimately happen to the resource the second after that response is sent, invalidating the cached response immediately. When in doubt, the client can make another HTTP request, hopefully a conditional one.

The server should not send an Expires that gives a date more than a year in the future. Even if the server is totally confident that a particular response will never change, a year is a long time. Software upgrades and other events in the real world tend to invalidate cached responses sooner than you’d expect.

If you don’t want to calculate a date at which a response should become stale, you can use Cache-Control to say that a response should be cached for a certain number of seconds. This response can be cached for an hour:

Cache-Control: max-age=3600
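
Here’s a tiny sketch of how a Ruby service might generate both headers for a response it’s happy to have cached for an hour. The header hash is just for illustration; it isn’t any particular framework’s API:

#!/usr/bin/ruby
# caching-headers.rb
require 'time'   # for Time#httpdate

lifetime = 3600  # one hour, in seconds

headers = {
  'Expires'       => (Time.now + lifetime).httpdate,
  'Cache-Control' => "max-age=#{lifetime}"
}

headers.each { |name, value| puts "#{name}: #{value}" }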

Thank you for not caching

That covers the case when the server would like the client to cache. What about the opposite? Some responses to GET requests are dynamically generated and different every time: caching them would be useless. Some contain sensitive information that shouldn’t be stored where someone else might see it: caching them would cause security problems. Use the Cache-Control header to convey that the client should not cache the representation at all:

Cache-Control: no-cache

Where Expires is a fairly simple response header, the Cache-Control header is very complex. It’s the primary interface for controlling client-side caches and proxy caches between the client and server. It can be sent as a request or as a response header, but I’m just going to talk about its use as a response header, since my focus is on how the server can work with a client-side cache.

I already showed how specifying “max-age” in Cache-Control controls how long a response can stay fresh in a cache. A value of “no-cache” prevents the client from caching a response at all. A third value you might find useful is “private,” which means that the response may be cached by a client cache, but not by any proxy cache between the client and server.

Default caching rules

In the absence of Expires or Cache-Control, section 13 of the HTTP standard defines a complex set of rules about when a client can cache a response. Unless you’re going to set caching headers on every response, you’ll need to know when a client is likely to cache what you send, so that you can override the defaults when appropriate. I’ll summarize the basic common-sense rules here.

In general, the client may cache the responses to its successful HTTP GET and HEAD requests. “Success” is defined in terms of the HTTP status code: the most common success codes are 200 (“OK”), 301 (“Moved Permanently”), and 410 (“Gone”).

Many (poorly-designed) web applications expose URIs that trigger side effects when you GET them. These dangerous URIs usually contain query strings. The HTTP standard recommends that if a URI contains a query string, the response from that URI should not be automatically cached: it should only be cached if the server explicitly says caching is OK. If the client GETs this kind of URI twice, it should trigger the side effects twice, not trigger them once and then get a cached copy of the response from last time.

If a client then finds itself making a PUT, POST, or DELETE request to a URI, any cached responses from that URI immediately become stale. The same is true of any URI mentioned in the Location or Content-Location of a response to a PUT, POST, or DELETE request. There’s a wrinkle here, though: site A can’t affect how the client caches responses from site B. If you POST to http://www.example.com/resource, then any cached response from http://www.example.com/resource is automatically stale. If the response comes back with a Location of http://www.example.com/resource2, then any cached response from http://www.example.com/resource2 is also stale. But if the Location is http://www.oreilly.com/resource2, it’s not OK to consider a cached response from http://www.oreilly.com/resource2 to be stale. The site at www.example.com doesn’t tell www.oreilly.com what to do.

If none of these rules apply, and if the server doesn’t specify how long to cache a response, the decision falls to the client side. Responses may be removed at any time or kept forever. More realistically, a client-side cache should consider a response to be stale after some time between an hour and a day. Remember that a stale response doesn’t have to be removed from the cache: the client might make a conditional GET request to check whether the cached response can still be used. If the condition succeeds, the cached response is still fresh and it can stay in the cache.

Look-Before-You-Leap Requests

Conditional GET is designed to save the server from sending enormous representations to a client that already has them. Another feature of HTTP, less often used, can save the client from fruitlessly sending enormous (or sensitive) representations to the server. There’s no official name for this kind of request, so I’ve come up with a silly name: look-before-you-leap requests.

To make a LBYL request, a client sends a PUT or POST request normally, but omits the entity-body. Instead, the client sets the Expect request header to the string “100-continue”. Example 8-5 shows a sample LBYL request.

Example 8-5. A sample look-before-you-leap request
PUT /filestore/myfile.txt HTTP/1.1
Host: example.com
Content-length: 524288000
Expect: 100-continue

This is not a real PUT request: it’s a question about a possible future PUT request. The client is asking the server: “would you allow me to PUT a new representation to the resource at /filestore/myfile.txt?” The server makes its decision based on the current state of that resource, and the HTTP headers provided by the client. In this case the server would examine Content-length and decide whether it’s willing to accept a 500 MB file.

If the answer is yes, the server sends a status code of 100 (“Continue”). Then the client is expected to resend the PUT request, omitting the Expect and including the 500-MB representation in the entity-body. The server has agreed to accept that representation.

If the answer is no, the server sends a status code of 417 (“Expectation Failed”). The answer might be no because the resource at /filestore/myfile.txt is write-protected, because the client didn’t provide the proper authentication credentials, or because 500 MB is just too big. Whatever the reason, the initial look-before-you-leap request has saved the client from sending 500 MB of data only to have that data rejected. Both client and server are better off.

Of course, a client with a bad representation can lie about it in the headers just to get a status code of 100, but it won’t do any good. The server won’t accept a bad representation on the second request, any more than it would have on the first request.
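
Most HTTP client libraries either handle the 100-continue handshake behind the scenes or ignore it, so here’s a rough sketch at the socket level, just to make the exchange visible. The host and path are made up, and a real server might respond with 401 or 404 instead:

#!/usr/bin/ruby
# look-before-you-leap.rb
require 'socket'

socket = TCPSocket.new('example.com', 80)
socket.write("PUT /filestore/myfile.txt HTTP/1.1\r\n" \
             "Host: example.com\r\n" \
             "Content-length: 524288000\r\n" \
             "Expect: 100-continue\r\n" \
             "\r\n")

# The server answers with a status line before we send a single byte of the
# 500 MB entity-body: "HTTP/1.1 100 Continue" means go ahead and send it;
# "HTTP/1.1 417 Expectation Failed" means don't bother.
puts socket.gets
socket.close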

Partial GET

Partial HTTP GET allows a client to fetch only a subset of a representation. It’s usually used to resume interrupted downloads. Most web servers support partial GET for static content; so does Amazon’s S3 service.

Example 8-6 is a bit of code that makes two partial HTTP GET requests to the same URI. The first request gets bytes 10 through 20, and the second request gets everything from byte 40,000 to the end.

Example 8-6. Make two partial HTTP GET requests
#!/usr/bin/ruby
# fetch-oreilly-partial.rb

require 'rubygems'
require 'rest-open-uri'
uri = 'http://www.oreilly.com/'

# Make a partial HTTP request and describe the response.
def partial_request(uri, range)
  begin
    response = open(uri, 'Range' => range)
  rescue OpenURI::HTTPError => e
    response = e.io
  end

  puts " Status code: #{response.status.inspect}"
  puts " Representation size: #{response.size}"
  puts " Content Range: #{response.meta['content-range']}"
  puts " Etag: #{response.meta['etag']}"
end    

puts "First request:"
partial_request(uri, "bytes=10-20")

puts "Second request:"
partial_request(uri, "bytes=40000-")

When I run that code I see this for the first request:

First request:
 Status code: ["206", "Partial Content"]
 Representation size: 11
 Content Range: bytes 10-20/41123
 Etag: "7359b7-a0a3-45b5d90e"

Instead of 40 KB, the server has only sent me the 11 bytes I requested. Similarly for the second request:

Second request:
 Status code: ["206", "Partial Content"]
 Representation size: 1123
 Content Range: bytes 40000-41122/41123
 Etag: "7359b7-a0a3-45b5d90e"

Note that the Etag is the same in both cases. In fact, it’s the same as it was back when I ran the conditional GET code back in Example 8-4. The value of Etag is always a value calculated for the whole document. That way I can combine conditional GET and partial GET.

Partial GET might seem like a way to let the client access subresources of a given resource. It’s not. For one thing, a client can only address part of a representation by giving a byte range. That’s not very useful unless your representation is a binary data structure. More importantly, if you’ve got subresources that someone might want to talk about separately from the containing resource, guess what: you’ve got more resources. A resource is anything that might be the target of a hypertext link. Give those subresources their own URIs.

Faking PUT and DELETE

Not all clients support HTTP PUT and DELETE. The action of an HTML form can only be GET or POST, and this has made a lot of people think that PUT and DELETE aren’t real HTTP methods. Some firewalls block HTTP PUT and DELETE but not POST. If the server supports it, a client can get around these limitations by tunneling PUT and DELETE requests through overloaded POST. There’s no reason these techniques can’t work with other HTTP actions like HEAD, but PUT and DELETE are the most common.

I recommend a tunneling technique pioneered by today’s most RESTful web frameworks: include the “real” HTTP method in the query string. Ruby on Rails defines a hidden form field called _method which references the “real” HTTP method. If a client wants to delete the resource at /my/resource but can’t make an HTTP DELETE request, it can make a POST request to /my/resource?_method=delete, or include _method=delete in the entity-body. Restlet uses the method variable for the same purpose.

A second technique is to include the “real” HTTP action in the X-HTTP-Method-Override HTTP request header. Google’s GData API recognizes this header. I recommend appending to the query string instead. A client that doesn’t support PUT and DELETE is also likely not to support custom HTTP request headers.
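
Here’s a minimal sketch of a client that tunnels a DELETE through overloaded POST by putting the “real” method in the entity-body, assuming the server understands the _method convention described above (the URI is made up):

#!/usr/bin/ruby
# fake-delete.rb
require 'net/http'
require 'uri'

uri = URI.parse('http://www.example.com/my/resource')

# A POST request whose form data smuggles in the "real" HTTP method.
response = Net::HTTP.post_form(uri, '_method' => 'delete')
puts response.code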

The Trouble with Cookies

A web service that sends HTTP cookies violates the principle of statelessness. In fact, it usually violates statelessness twice. It moves application state onto the server even though it belongs on the client, and it stops clients from being in charge of their own application state.

The first problem is simple to explain. Lots of web frameworks use cookies to implement sessions. They set cookies that look like the Rails cookie I showed you back in Chapter 4:

Set-Cookie: _session_id=c1c934bbe6168dcb904d21a7f5644a2d; path=/

That long hexadecimal number is stored as client state, but it’s not application state. It’s a meaningless key into a session hash: a bunch of application state stored on the server. The client has no access to this application state, and doesn’t even know what’s being stored. The client can only send its cookie with every request and let the server look up whatever application state the server thinks is appropriate. This is a pain for the client, and it’s no picnic for the server either. The server has to keep this application state all the time, not just while the client is making a request.

OK, so cookies shouldn’t contain session IDs: that’s just an excuse to keep application state on the server. What about cookies that really do contain application state? What if you serialize the actual session hash and send it as a cookie, instead of just sending a reference to a hash on the server?

This can be RESTful, but it’s usually not. The cookie standard says that the client can get rid of a cookie when it expires, or when the client terminates. This is a pretty big restriction on the client’s control over application state. If you make 10 web requests and suddenly the server sends you a cookie, you have to start sending this cookie with your future requests. You can’t make those 10 precookie requests unless you quit and start over. To use a web browser analogy, your “Back” button is broken. You can’t put the application in any of the states it was in before you got the cookie.

Realistically, no client follows the cookie standard that slavishly. Your web browser lets you choose which cookies to accept, and lets you destroy cookies without restarting your browser. But clients aren’t generally allowed to modify the server’s cookies, or even understand what they mean. If the client sends application state without knowing what it means, it doesn’t really know what request it’s making. The client is just a custodian for whatever state the server thinks it should send. Cookies are almost always a way for the server to force the client to do what it wants, without explaining why. It’s more RESTful for the server to guide the client to new application states using hypermedia links and forms.

The only RESTful use of cookies is one where the client is in charge of the cookie value. The server can suggest values for a cookie using the Set-Cookie header, just like it can suggest links the client might want to follow, but the client chooses what cookie to send just as it chooses what links to follow. In some browser-based applications, cookies are created by the client and never sent to the server. The cookie is just a convenient container for application state, which makes its way to the server in representations and URIs. That’s a very RESTful use of cookies.

Why Should a User Trust the HTTP Client?

HTTP authentication covers client-server authentication: the process by which the web service client proves to the server that it has some user’s credentials. What HTTP doesn’t cover is why the user should trust the web service client with its credentials. This isn’t usually a problem on the human web, because we implicitly trust our web browsers (even when we shouldn’t, like when there’s spyware present on the system). If I’m using a web application on example.com, I’m comfortable supplying my example.com username and password.

But what if, behind the scenes, the web application on example.com is a client for eBay’s web services? What if it asks me for my eBay authentication information so it can make hidden web service requests to ebay.com? Technically speaking, there’s no difference between this application and a phishing site that pretends to be ebay.com, trying to trick me into giving it my eBay username and password.

The standalone client programs presented in this book authenticate by encoding the end user’s username and password in the Authorization header. That’s how many web services work. It works fine on the human web, because the HTTP clients are our own trusted web browsers. But when the HTTP client is an untrusted program, possibly running on a foreign computer, handing it your username and password is naive at best. There’s another way. Some web services attack phishing by preventing their clients from handling usernames and passwords at all.

In this scenario, the end user uses her web browser (again, trusted implicitly) to get an authorization token. She gives this token to the web service client instead of giving her username and password, and the web service client sends this token in the Authorization header. The end user is basically delegating the ability to make web service calls as herself. If the web service client abuses that ability, its authorization token can be revoked without making the user change her password.

Google, eBay, Yahoo!, and Flickr all have user-client authorization systems of this type. Amazon’s request signing, which I showed you in Chapter 3, fulfills the same function. There’s no official standard, but all four systems are similar in concept, so I’ll discuss them in general terms. When I need to show you specific URIs, I’ll use Google’s and Flickr’s user-client authorization systems as examples.

Applications with a Web Interface

Let’s start with the simplest case: a web application that needs to access a web service such as Google Calendar. It’s the simplest case because the web application has the same user interface as the application that gives out authorization tokens: a web browser. When a web application needs to make a Google web service call, it serves an HTTP redirect that sends the end user to a URI at google.com. The URI might look something like this:

https://www.google.com/accounts/AuthSubRequest
 ?scope=http%3A%2F%2Fwww.google.com%2Fcalendar%2Ffeeds%2F
 &next=http%3A%2F%2Fcalendar.example.com%2Fmy

That URI has two other URIs embedded in it as query variables. The scope variable, with a value of http://www.google.com/calendar/feeds/, is the base URI of the web service we’re trying to get an authorization token for. The next variable, value http://calendar.example.com/my, will be used when Google hands control of the end user’s web browser back to the web application.

When the end user’s browser hits this URI, Google serves a web page that tells the end user that example.com wants to access her Google Calendar account on her behalf. If the user decides she trusts example.com, she authenticates with Google. She never gives her Google username or password to example.com.

After authenticating the user, Google hands control back to the original web application by redirecting the end user’s browser to a URI based on the value of the query variable next in the original request. In this example, next was http://calendar.example.com/my, so Google might redirect the end user to http://calendar.example.com/my?token=IFM29SdTSpKL77INCn. The new query variable token contains a one-time authorization token. The web application can put this token in the Authorization header when it makes a web service call to Google Calendar:

Authorization: AuthSub token="IFM29SdTSpKL77INCn"

Now the web application can make a web-service call as the end user, without actually knowing anything about the end user. The authentication information never leaves google.com, and the authorization token is only good for one request.
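
As a rough sketch, here’s what that call might look like from Ruby. The token is the made-up one from above, and a real Google Calendar request would need a more specific feed URI than this:

#!/usr/bin/ruby
# call-with-authsub-token.rb
require 'net/http'

TOKEN = "IFM29SdTSpKL77INCn"   # the one-time token from the redirect

uri = URI.parse('http://www.google.com/calendar/feeds/')
request = Net::HTTP::Get.new(uri.path)
request['Authorization'] = "AuthSub token=\"#{TOKEN}\""

response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
puts response.code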

Those are the basics. Google’s user-client authorization mechanism has lots of other features. A web service client can use the one-time authorization token to get a “session token” that’s good for more than one request. A client can digitally sign requests, similarly to how I signed Amazon S3 requests back in Chapter 3. These features are different for every user-client authorization mechanism, so I won’t dwell on them here. The point is this flow (shown graphically in Figure 8-3): control moves from the web application’s domain to the web service’s domain. The user authenticates with the web service, and authorizes the foreign web application to act on her behalf. Then control moves back to the web application’s domain. Now the web app has an authorization token that it can use in the Authorization header. It can make web service calls without knowing the user’s username and password.

Figure 8-3. How a web application gets authorization to use Google Calendar

Applications with No Web Interface

For applications that expose a web interface, browser-based user-client authorization makes sense. The user is already in her web browser, and the application she’s using is running on a faraway server. She doesn’t trust the web application with her password, but she does trust her own web browser. But what if the web service client is a standalone application running on the user’s computer? What if it’s got a GUI or command-line interface, but it’s not a web browser?

There are two schools of thought on this. The first is that the end user should trust any client-side application as much as she trusts her web browser. Web applications run on an untrusted computer, but I control every web service client that runs on my computer. I can keep track of what the clients are doing and kill them if they get out of control.

If you as a service designer subscribe to this philosophy, there’s no need to hide the end user’s username and password from desktop clients. They’re all just as trustworthy as the web browser. Google takes this attitude. Its authentication mechanism for client-side applications is different from the web-based one I described above. Both systems are based on tokens, but desktop applications get an authorization token by gathering the user’s username and password and “logging in” as them—not by redirecting the user’s browser to a Google login page. This token serves little purpose from a security standpoint. The client needs a token to make web service requests, but it can only get one if it knows the user’s username and password—a far more valuable prize.

If you don’t like this, then you probably think the web browser is the only client an end user should trust with her username and password. This creates a problem for the programmer of a desktop client. Getting an authentication token means starting up a trusted client—the web browser—and getting the end user to visit a certain URI. For the Flickr service the URI might look like this:

http://flickr.com/services/auth/?perms=write&api_sig=925e1&api_key=1234&frob=abcd

The most important query variable here is frob. That’s a predefined ID, obtained through an earlier web service call, and I’ll use it in a moment. The first thing the end user sees is that her browser suddenly pops up and visits this URI, which shows a Flickr login screen. The end user gives her authentication credentials and authorizes the client with api_key=1234 to act on her behalf. In the Google example above, the web service client was the web application at example.com. Here, the web service client is the application running on the end user’s own desktop.

Without the frob, the desktop client at this point would have to cajole the end user to copy and paste the authorization token from the browser into the desktop client. But the client and the service agreed on a frob ahead of time, and the desktop client can use this frob to get the authorization token. The end user can close her browser at this point, and the desktop client makes a GET request to a URI that looks like this:

http://flickr.com/services/rest/?method=flickr.auth.getToken
&api_sig=1f348&api_key=1234&frob=abcd

The eBay and Flickr web services use a mechanism like this: what Flickr calls a frob, eBay calls an runame. The end user can authorize a standalone client to make web service requests on her behalf, without ever telling it her username or password. I’ve diagrammed the whole process in Figure 8-4.

Figure 8-4. How a web application gets authorization to use Flickr

Some mobile devices have network connectivity but no web browser. A web service that thinks the only trusted client is a web browser must make special allowances for such devices, or live with the fact that it’s locking them out.

What Problem Does this Solve?

Despite appearances, I’ve gone into very little detail: just enough to give you a feel for the two ways an end user might delegate her authority to make web service calls. Even in the high-level view it’s a complex system, and it’s worth asking what problem it actually solves. After all, the end user still has to type her username and password into a web form, and nothing prevents a malicious application writer from sending the browser to a fake authentication page instead of the real page. Phishers redirect people to fake sign-in pages all the time, and a lot of people fall for it. So what does this additional infrastructure really buy?

If you look at a bank or some other web site that’s a common target of phishing attacks, you’ll see a big warning somewhere that looks like this: “Never type in your mybank.com username and password unless you’re using a web browser and visiting a URI that starts with https://www.mybank.com/.” Common sense, right? It’s not the most ironclad guarantee of security, but if you’re careful you’ll be all right. Yet most web services can’t even provide this milquetoast cover. The standalone applications presented throughout this book take your service username and password as input. Can you trust them? If the web site at example.com wants to help you manage your del.icio.us bookmarks, you need to give it your del.icio.us username and password. Do you trust example.com?

The human web has a universal client: the web browser. It’s not a big leap of faith to trust a single client that runs on your computer. The programmable web has different clients for different purposes. Should the end user trust all those clients? The mechanisms I described in this section let the end user use her web browser—which she already trusts—as a way of bestowing lesser levels of trust on other clients. If a client abuses the trust, it can be blocked from making future web service requests. These strategies don’t eliminate phishing attacks, but they make it possible for a savvy end user to avoid them, and they allow service providers to issue warnings and disclaimers. Without these mechanisms, it’s technically impossible for the end user to tell the difference between a legitimate client and a phishing site. They both take your password: the only difference is what they do with it.



[24] In theory, I know how to find out which of these activities are supported: send an OPTIONS request. But right now, nobody supports OPTIONS.

[26] Documented by Mark Pilgrim in “Atom Authentication” on xml.com.

[27] For more detailed coverage, see section 13 of RFC 2616, and Chapter 7 of HTTP: The Definitive Guide, by Brian Totty and David Gourley (O’Reilly).
