Chapter 4. Hypermedia

The story so far: URLs identify resources. A client makes HTTP requests to those URLs. A server sends representations in response, and over time the client builds up a picture of the resource state, as seen through the representations. Eventually the client makes that fateful PUT or POST or PATCH request, sending a representation back to the server and modifying resource state.

Look closer, and you’ll see a question that hasn’t been answered: how does the client know which requests it can make? There are infinitely many URLs. How does a client know which URLs have representations behind them and which ones will give a 404 error? Should the client send an entity-body with its POST request? If so, what should the entity-body look like? HTTP defines a set of protocol semantics, but which subset of those semantics does this web server support on this URL right now?

The missing piece of the puzzle is hypermedia. Hypermedia connects resources to each other, and describes their capabilities in machine-readable ways. Properly used, hypermedia can solve—or at least mitigate—the usability and stability problems found in today’s web APIs.

Like REST, hypermedia isn’t a single technology described by a standards document somewhere. Hypermedia is a strategy, implemented in different ways by dozens of technologies. I’ll cover several hypermedia standards in the next three chapters, and a whole lot more in Chapter 10. It’s up to you to choose the technologies that fit your business requirements.

The hypermedia strategy always has the same goal. Hypermedia is a way for the server to tell the client what HTTP requests the client might want to make in the future. It’s a menu, provided by the server, from which the client is free to choose. The server knows what might happen, but the client decides what actually happens.

There’s nothing new here. The World Wide Web works this way, and we all take it for granted that it should work this way. Anything else would be an unusable throwback to the 1980s. But in the world of APIs, hypermedia is a confusing and controversial topic. That’s why today’s APIs are terrible at managing change.

In this chapter, I want to dispel the mystery of hypermedia, so you can create APIs that have some of the flexibility of the Web.

HTML as a Hypermedia Format

You’re probably already familiar with HTML,[9] so let’s start with an HTML example.

Here’s an HTML <a> tag:

<a href="http://www.youtypeitwepostit.com/messages/">
 See the latest messages
</a>

This tag is a simple hypermedia control. It’s a description of an HTTP request your browser might make in the near future. An <a> tag is a signal to your browser that it can make an HTTP GET request that would look something like this:

GET /messages/ HTTP/1.1
Host: www.youtypeitwepostit.com

The HTML standard says that when the user activates a link, the user “visits” the resource on the other end of the link.[10] In practice, this means fetching a representation of the resource and displaying it in the browser window, replacing the original representation (the one that included the link). Of course, that doesn’t happen automatically. Nothing will happen until the user clicks on the link. An <a> tag is a promise from the web server that a certain URL names a resource you can visit. If you sent a GET request to a URL you made up, such as http://www.youtypeitwepostit.com/give-me-the-messages?please=true, you’d probably just get a 404 error.

Compare the <a> tag to another of HTML’s hypermedia controls, the <img> tag:

<img src="http://www.example.com/logo.png" />

The <img> tag also describes an HTTP request your browser might make in the near future, but there’s no implication that you’re moving from one document to another. Instead, the representation of the linked resource is supposed to be embedded as an image in the current document. When your browser finds an <img> tag, it makes the request for the image automatically, without asking you to click on anything. Then it incorporates the representation in the document you’re viewing, again without asking your permission.

Let’s look at a more complex hypermedia control—an HTML form:

<form action="http://www.youtypeitwepostit.com/messages/" method="post">
  <input type="text" name="message" value="" required />
  <input type="submit" name="submit" value="Post" />
</form>

This form describes a request to the URL http://www.youtypeitwepostit.com/messages/. That’s the same URL I used for the <a> tag. But the <a> tag described a GET request, and this form describes a POST request.

This form doesn’t just give you the URL and send you off to make a POST request. There are also two controls—a text field and a submit button—which are rendered as GUI elements in a web browser.

When you click the submit button, the value you entered in the text field and the value on the button are transformed into a representation, according to rules set down in the HTML specification. Those rules say the media type of the representation will be application/x-www-form-urlencoded, and it will look something like this:

message=Hello%21&submit=Post

Putting it all together, that <form> tag tells your browser that it can make a POST request that looks something like this:

POST /messages/ HTTP/1.1
Host: www.youtypeitwepostit.com
Content-Type: application/x-www-form-urlencoded

message=Hello%21&submit=Post
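The encoding step can be sketched with Python's standard library. This is a minimal illustration, not a description of how any browser works internally; `build_form_post` is a hypothetical helper, and it mirrors the request above, so it leaves out headers such as Content-Length that a real client would also send.

```python
from urllib.parse import urlencode, urlsplit


def build_form_post(action: str, fields: list[tuple[str, str]]) -> str:
    """Return the raw text of the POST request an HTML POST form describes."""
    url = urlsplit(action)
    # urlencode applies the application/x-www-form-urlencoded rules:
    # spaces become "+" and characters such as "!" become %21.
    body = urlencode(fields)
    head = [
        f"POST {url.path} HTTP/1.1",
        f"Host: {url.netloc}",
        "Content-Type: application/x-www-form-urlencoded",
    ]
    # A blank line separates the headers from the entity-body.
    return "\r\n".join(head) + "\r\n\r\n" + body


request = build_form_post(
    "http://www.youtypeitwepostit.com/messages/",
    [("message", "Hello!"), ("submit", "Post")],
)
print(request)
```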

As with the <a> tag, the server’s guiding you, but its hand is pretty light. If you don’t want to fill out this form, you can ignore it. If you do fill out the form, you can put whatever you want in the message field (although the server might reject certain values). The <form> tag is the server telling you that, of all the possible POST requests you might make, there’s one type of request that’s likely to result in something useful. That’s a POST to /messages/, which includes a form-encoded entity-body that includes a value for message.

Here’s one more <form> tag:

<form method="GET" action="http://www.youtypeitwepostit.com/messages/">
 <input type="text" id="query" name="query"/>
 <input type="submit" value="Search"/>
</form>

This form also has a text box you’re supposed to fill out, but the form is telling you to make a GET request, and GET requests don’t include an entity-body. Instead, the data you type into that text box gets incorporated into the request URL—again, according to rules laid out in the HTML specification.

If you fill out this form, the HTTP request your browser makes will look something like this:

GET /messages/?query=rest HTTP/1.1
Host: www.youtypeitwepostit.com
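For a GET form, the same encoding rules produce a query string instead of an entity-body. A minimal sketch, where `build_form_get_url` is a hypothetical helper; following the HTML rules, the encoded form data replaces whatever query string the action URL already had.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_form_get_url(action: str, fields: list[tuple[str, str]]) -> str:
    """Return the URL an HTML GET form would request for the given values."""
    scheme, netloc, path, _old_query, fragment = urlsplit(action)
    # The form data becomes the query component, replacing any existing one.
    return urlunsplit((scheme, netloc, path, urlencode(fields), fragment))


url = build_form_get_url("http://www.youtypeitwepostit.com/messages/", [("query", "rest")])
print(url)  # http://www.youtypeitwepostit.com/messages/?query=rest
```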

To sum up, the familiar HTML controls allow the server to describe four kinds of HTTP requests.

  • The <a> tag describes a GET request for one specific URL, which is made only if the user triggers the control.
  • The <img> tag describes a GET request for one specific URL, which happens automatically, in the background.
  • The <form> tag with method="POST" describes a POST request to one specific URL, with a custom entity-body constructed by the client. The request is only made if the user triggers the control.
  • The <form> tag with method="GET" describes a GET request to a custom URL constructed by the client. The request is only made if the user triggers the control.

HTML also defines some more exotic hypermedia controls, and other data formats may define controls that are stranger still. All of them fall under the formal definition of hypermedia given in the Fielding dissertation:

Hypermedia is defined by the presence of application control information embedded within, or as a layer above, the presentation of information.

The World Wide Web is full of HTML documents, and the documents are full of things people like to read—prices, statistics, personal messages, prose, and poetry. But all of those things fall under presentation of information. In terms of presentation of information, the Web isn’t much different from a printed book.

It’s the application control information that distinguishes an HTML document from a book. I’m talking about the hypermedia controls that people interact with all the time but rarely examine closely: the <img> tags that tell the browser to embed certain images, the <a> tags that transport the end user to another part of the Web, and the <script> tags that supply JavaScript for the browser to execute.

An HTML document that contains a poem will probably also feature a link to “Other poems by this author,” or a form that lets the reader “Rate this poem.” This is application control information that couldn’t show up in a printed book of poetry. The presence of application control information can certainly reduce the emotional impact of a poem, but an HTML document containing only the text of a poem is not a full participant in the Web. It’s just simulating a printed book.

URI Templates

The custom URLs you can create using an HTML <form> tag are limited in form. http://www.youtypeitwepostit.com/messages/?search=rest doesn’t look very nice. On a technical level, this doesn’t matter. URLs don’t have to look nice. URLs don’t even need to make sense to human eyes. But we humans prefer nice-looking URLs, like http://www.youtypeitwepostit.com/search/rest.

HTML’s hypermedia controls have no way of telling a browser how to construct a URL like http://www.youtypeitwepostit.com/search/rest. But URI Templates, a different hypermedia technology, can do this. URI Templates are defined in RFC 6570, and they look like this:

http://www.youtypeitwepostit.com/search/{search}

That’s not a valid URL, because it contains curly brackets. Those brackets identify the string as a URI Template. RFC 6570 tells you how to turn that string into an infinite number of URLs. It says you can replace {search} with any string you want; any character that isn’t allowed at that point in a URL gets percent-encoded. Substituting rest for {search} gives you http://www.youtypeitwepostit.com/search/rest.

This HTML form:

<form method="GET" action="http://www.youtypeitwepostit.com/messages/">
 <input type="text" id="query" name="query"/>
 <input type="submit" value="Search"/>
</form>

is exactly equivalent to this URI Template:

http://www.youtypeitwepostit.com/messages/?query={query}

That’s a very common case, so the URI Templates standard defines a shortcut for URLs that include a query string. This URI Template is exactly equivalent to the previous one, and it’s also equivalent to the previous HTML form:

http://www.youtypeitwepostit.com/messages/{?query}
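To make the expansion rules concrete, here is a deliberately tiny expander. It handles only two of the expression types in RFC 6570, simple {var} and form-style query {?var}, each with a single variable. `expand` is a hypothetical name; a real client would use a complete RFC 6570 implementation.

```python
import re
from urllib.parse import quote

# Matches "{name}" or "{?name}" containing a single variable name.
_EXPR = re.compile(r"\{(\??)([A-Za-z0-9_]+)\}")


def expand(template: str, variables: dict[str, str]) -> str:
    """Expand {var} and {?var} (a single-variable subset of RFC 6570)."""

    def replace(match: re.Match) -> str:
        operator, name = match.group(1), match.group(2)
        if name not in variables:
            return ""  # undefined variables expand to nothing
        # Only unreserved characters survive; "/", spaces, etc. are percent-encoded.
        value = quote(variables[name], safe="")
        if operator == "?":
            return f"?{name}={value}"
        return value

    return _EXPR.sub(replace, template)


print(expand("http://www.youtypeitwepostit.com/search/{search}", {"search": "rest"}))
# http://www.youtypeitwepostit.com/search/rest
print(expand("http://www.youtypeitwepostit.com/messages/{?query}", {"query": "rest"}))
# http://www.youtypeitwepostit.com/messages/?query=rest
```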

The URI Templates standard is full of examples, but here are a few more sample templates, along with just a few of the URLs you can get from them:

If parameter values are set to:
   var   := "title"
   hello := "Hello World!"
   path  := "/foo/bar"

Then these URI templates:
   http://www.example.org/greeting?g={+hello}
   http://www.example.org{+path}/status
   http://www.example.org/document#{+var}

Expand to these URLs:
   http://www.example.org/greeting?g=Hello%20World!
   http://www.example.org/foo/bar/status
   http://www.example.org/document#title
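The + in those templates selects reserved expansion, which, unlike simple {var} expansion, leaves characters such as /, !, and # alone. The sketch below shows the difference using Python's `urllib.parse.quote` with two different sets of safe characters. The helper names are hypothetical, and it simplifies one rule: a literal % is always encoded, whereas RFC 6570 would pass an existing percent-encoded triplet through unchanged.

```python
from urllib.parse import quote

# RFC 3986 reserved characters: gen-delims followed by sub-delims.
RESERVED = ":/?#[]@" + "!$&'()*+,;="


def simple_expand(value: str) -> str:
    """{var}: percent-encode everything except unreserved characters."""
    return quote(value, safe="")


def reserved_expand(value: str) -> str:
    """{+var}: also let reserved characters through unencoded (simplified)."""
    return quote(value, safe=RESERVED)


params = {"var": "title", "hello": "Hello World!", "path": "/foo/bar"}
print("http://www.example.org/greeting?g=" + reserved_expand(params["hello"]))
print("http://www.example.org" + reserved_expand(params["path"]) + "/status")
print("http://www.example.org/document#" + reserved_expand(params["var"]))
print(simple_expand(params["path"]))  # %2Ffoo%2Fbar
```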

Although a URI Template is shorter and more flexible than an HTML GET form, the two technologies aren’t much different. URI Templates and HTML forms allow a web server to describe an infinite number of URLs with a short string. The HTTP client can plug in some values, choose one URL from that infinite family, and make a GET request to that specific URL.

URI Templates don’t make sense on their own. A URI Template needs to be embedded in a hypermedia format. The idea is that every standard that needs this functionality should just use URI Templates, instead of defining a custom format, which is what was happening before RFC 6570 was published.

URI Versus URL

I’ve put this off for as long as I can, but now I need to explain the difference between URL (the term I use almost everywhere in this book) and URI (the more general term used in the names of technologies such as URI Templates). Most web APIs deal exclusively with URLs, so for most of this book, the distinction doesn’t matter. But when it’s important (as it will be in Chapter 12), it’s really important.

A URL is a short string used to identify a resource. A URI is also a short string used to identify a resource. Every URL is a URI. They’re described in the same standard: RFC 3986.

What’s the difference? As far as this book is concerned, the difference is this: there’s no guarantee that a URI has a representation. A URI is nothing but an identifier. A URL is an identifier that can be dereferenced. That is, a computer can somehow take a URL and get a representation of the underlying resource.

If you see an http: URI, you know how a computer can get a representation: by making an HTTP GET request. If you see an ftp: URI, you know how a computer can get a representation: by starting up an FTP client and executing certain FTP commands. These URIs are URLs. They have protocols associated with them: rules for obtaining representations of these resources (very detailed rules that a computer can follow).

Here’s a URI that’s not a URL: urn:isbn:9781449358063. It designates a resource: the print edition of this book. Not any particular copy of this book, but the abstract concept of an entire edition. (Remember that a resource can be anything at all.) This URI is not a URL because… what’s the protocol? How would a computer get a representation? You can’t do it.

Without a URL, you can’t get a representation. Without representations, there can be no representational state transfer. A resource that’s not identified by a URL cannot fulfill many of the Fielding constraints. It can’t fulfill the self-descriptive message constraint, because it can’t send any messages. A representation can link to a URI that’s not a URL (<a href="urn:isbn:9781449358063">), but that won’t fulfill the hypermedia constraint, because a client can’t follow the link.

Here’s a URL that identifies the print edition of this book: http://shop.oreilly.com/product/0636920028468.do. You can send a GET request to this URL and get a representation of the edition. Not a physical copy of the book, but an HTML document that conveys some of its resource state: the title, the number of pages, and so on. The HTML document also contains hypermedia, like links to the book’s authors—not the people themselves, but some information about them. A resource identified by a URL can fulfill all the Fielding constraints.
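A client can’t tell from an identifier alone whether a representation exists, but it can check whether it knows a protocol for the URI’s scheme. A minimal sketch; the set of supported schemes is an assumption about one particular client, not something defined by RFC 3986.

```python
from urllib.parse import urlsplit

# Hypothetical: the schemes this particular client knows how to dereference.
DEREFERENCEABLE_SCHEMES = {"http", "https", "ftp"}


def can_dereference(uri: str) -> bool:
    """True if this client knows a protocol for getting a representation."""
    return urlsplit(uri).scheme.lower() in DEREFERENCEABLE_SCHEMES


print(can_dereference("http://shop.oreilly.com/product/0636920028468.do"))  # True
print(can_dereference("urn:isbn:9781449358063"))  # False
```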

There are some good reasons to use URIs that aren’t URLs, and I’ll cover them when I discuss the resource description strategy in Chapter 12. But it’s a pretty rare situation. In general, when your web API refers to a resource, it should use a URL with the http or https scheme, and that URL should work: it should serve a useful representation in response to a GET request.

The Link Header

Here’s a technology that puts hypermedia where you might not expect it: inside the headers of an HTTP request or response. RFC 5988 defines an extension to HTTP, a header called Link. This header lets you add simple hypermedia controls to entity-bodies that don’t normally support hypermedia at all, like JSON objects and binary image files.

Here’s a plain-text representation of a story that’s been split into multiple parts with cliffhangers (the entity-body of this HTTP response contains the first part of the story, and the Link header points to the second part):

HTTP/1.1 200 OK
Content-Type: text/plain
Link: <http://www.example.com/story/part2>;rel="next"

It was a dark and stormy night. Suddenly, a...
(continued in part 2)

The Link header has approximately the same functionality as an HTML <a> tag. I recommend you use real hypermedia formats whenever possible, but when that’s not an option, the Link header can be very useful.
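In the simple case, reading a Link header takes little code. The sketch below assumes each link value has the form <URI>; rel="..." and that no URI contains a comma; a complete RFC 5988 parser has to handle more (other parameters, quoted commas). `parse_link_header` is a hypothetical helper.

```python
def parse_link_header(value: str) -> dict[str, str]:
    """Map each rel value to its target URI (simplified Link header parsing)."""
    links = {}
    for link_value in value.split(","):
        target, _, params = link_value.strip().partition(";")
        uri = target.strip().strip("<>")
        for param in params.split(";"):
            key, _, val = param.strip().partition("=")
            if key.strip().lower() == "rel":
                # rel may hold several space-separated relation types.
                for rel in val.strip().strip('"').split():
                    links[rel] = uri
    return links


links = parse_link_header('<http://www.example.com/story/part2>;rel="next"')
print(links["next"])  # http://www.example.com/story/part2
```

A client reading the story would look up "next" and, if it is present, GET that URL to continue.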

The LINK and UNLINK extension methods use the Link header. This example from Chapter 3 (which assigns an author to the story) should make a little more sense now:

LINK /story HTTP/1.1
Host: www.example.com
Link: <http://www.example.com/~drmilk>;rel="author"

What Hypermedia Is For

I’ll be covering a lot of hypermedia data formats in this book, but at this point telling you about one technology after another won’t help very much. We need to take a step back and see what hypermedia is for.

Hypermedia controls have three jobs:

  • They tell the client how to construct an HTTP request: what HTTP method to use, what URL to use, what HTTP headers and/or entity-body to send.
  • They make promises about the HTTP response, suggesting the status code, the HTTP headers, and/or the data the server is likely to send in response to a request.
  • They suggest how the client should integrate the response into its workflow.
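One way to picture those three jobs is as the fields of a record a client fills in when it reads a hypermedia control. The `HypermediaControl` class below is purely illustrative; it is not part of any format or library, and the image media type shown is an assumption standing in for "some kind of image".

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Workflow(Enum):
    """How the response fits into the client's workflow (job 3)."""
    REPLACE = "outbound"  # e.g. <a>: the response replaces the current state
    EMBED = "embedded"    # e.g. <img>: the response is added to the current state


@dataclass
class HypermediaControl:
    # Job 1: how to construct the request.
    method: str
    url: str
    headers: dict[str, str] = field(default_factory=dict)
    body_fields: list[str] = field(default_factory=list)  # values the client must supply
    # Job 2: promises about the response (None means no promise is made).
    expected_media_type: Optional[str] = None
    # Job 3: how to integrate the response.
    workflow: Workflow = Workflow.REPLACE
    automatic: bool = False  # followed without asking the user?


a_tag = HypermediaControl("GET", "http://www.youtypeitwepostit.com/messages/")
img_tag = HypermediaControl(
    "GET", "http://www.example.com/logo.png",
    expected_media_type="image/*", workflow=Workflow.EMBED, automatic=True,
)
post_form = HypermediaControl(
    "POST", "http://www.youtypeitwepostit.com/messages/",
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    body_fields=["message"],
)
```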

HTML GET forms and URI Templates feel similar because they do the same job. They both tell the client how to construct a URL for use in an HTTP GET request.

Guiding the Request

An HTTP request has four parts: the method, the target URL, the HTTP headers, and the entity-body. Hypermedia controls can guide the client into specifying all four of these.

This HTML <a> tag specifies both the target URL and the HTTP method to use:

<a href="http://www.example.com/">An outbound link</a>

The target URL is defined explicitly, in the href attribute. The HTTP method is defined implicitly: the HTML spec says that an <a> tag becomes a GET request when the end user clicks the link.

This HTML form defines the method, the target URL, and the entity-body of a potential future HTTP request:

<form action="/stores" method="post">
  <input type="text" name="storeName" value="" />
  <input type="text" name="nearbyCity" value="" />
  <input type="submit" value="Add" />
</form>

Both the HTTP method and the target URL are defined explicitly. The entity-body is defined in terms of a set of questions for the client. The client needs to figure out what values it wants to provide for the variables storeName and nearbyCity. Then it can construct a form-encoded entity-body that the server will accept. (Who says it needs to be form-encoded? That’s defined implicitly, by HTML’s rules for processing a <form> tag.)
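Here is a sketch of a client doing that work with Python’s built-in `html.parser`: it reads the form, looks up a value for each named text field in a dictionary the client supplies, and builds the form-encoded entity-body. `FormReader` and the store details are hypothetical, and the sketch ignores everything except <form> and <input>, assuming the document holds a single form.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormReader(HTMLParser):
    """Collect the action, method, and text-field names of a <form>."""

    def __init__(self) -> None:
        super().__init__()
        self.action = ""
        self.method = "get"  # HTML's default when no method is given
        self.field_names: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input" and attrs.get("type", "text") == "text":
            name = attrs.get("name")
            if name:
                self.field_names.append(name)


form_html = """
<form action="/stores" method="post">
  <input type="text" name="storeName" value="" />
  <input type="text" name="nearbyCity" value="" />
  <input type="submit" value="Add" />
</form>
"""

reader = FormReader()
reader.feed(form_html)
# The client decides what values to provide; these are made up.
answers = {"storeName": "Example Books", "nearbyCity": "Springfield"}
body = urlencode([(name, answers[name]) for name in reader.field_names])
print(reader.method.upper(), reader.action)  # POST /stores
print(body)  # storeName=Example+Books&nearbyCity=Springfield
```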

This URI Template specifies the target URL of an HTTP request, and nothing else:

http://www.youtypeitwepostit.com/messages/{?search}

The target URL is defined in terms of a variable that needs to be filled in, just like the entity-body of an HTML form would be. The client uses an algorithm to turn the URI Template and its desired value for the search variable into a real URL: say, for example, http://www.youtypeitwepostit.com/messages/?search=rest.

A URI Template defines nothing about the HTTP request except for the target URI. It’s not telling you to make a GET request, a POST request, or any kind of request in particular. That’s why I said URI Templates don’t make sense on their own, and why they need to be combined with another hypermedia technology.

Here’s an HTML form that tells the client to set a specific value for the HTTP header Content-Type:

<form method="POST" enctype="text/plain">
  ...
</form>

Ordinarily, the entity-body of an HTML POST form is form-encoded, and sent over the network with the Content-Type header set to application/x-www-form-urlencoded. But specifying the enctype attribute of the <form> tag overrides this behavior. A form with enctype="text/plain" tells the browser to encode its entity-body in a plain text format, and to send it over the network with the Content-Type header set to text/plain.
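The sketch below shows how the enctype value changes the entity-body, with the Content-Type header following along. It covers only the two encodings discussed here (multipart/form-data is left out), and `encode_form` is a hypothetical helper.

```python
from urllib.parse import urlencode


def encode_form(fields: list[tuple[str, str]],
                enctype: str = "application/x-www-form-urlencoded") -> tuple[str, str]:
    """Return (Content-Type, entity-body) for a form submission."""
    if enctype == "text/plain":
        # One "name=value" line per field, each ending in CR LF; no percent-encoding.
        body = "".join(f"{name}={value}\r\n" for name, value in fields)
    elif enctype == "application/x-www-form-urlencoded":
        body = urlencode(fields)
    else:
        raise ValueError(f"unsupported enctype: {enctype}")
    return enctype, body


fields = [("message", "Hello!"), ("submit", "Post")]
print(encode_form(fields))
print(encode_form(fields, "text/plain"))
```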

This isn’t a great example, because the enctype attribute only changes the Content-Type header as a side effect of changing the entity-body. But it is the best example I can come up with using a popular hypermedia format like HTML.

Hypermedia controls generally leave an HTTP client free to send whatever headers it wants. But this laissez-faire attitude is only a convention. A hypermedia control can describe an HTTP request in great detail. It can instruct the client to send an HTTP request to a specific URL, using a specific HTTP method, providing an entity-body constructed according to specific rules, and providing specific values for specific HTTP headers.

Promises About the Response

Here’s another HTML tag:

<img src="http://www.example.com/logo.png" />

Like an <a> tag, an <img> tag is a promise that the client can make a GET request to a particular URL. But the <img> tag makes another promise: that the server will send some kind of image representation in response to GET.

Here’s another example—a simple XML hypermedia control from the Atom Publishing Protocol (which I’ll discuss in more detail in Chapter 6):

<link rel="edit" href="http://example.org/posts/1"/>

This looks simple enough; in fact, this <link> tag could legally show up in an HTML document. But interpreted according to the AtomPub standard, that rel="edit" gives you a lot of information about the resource at http://example.org/posts/1.

First, rel="edit" says that the resource at http://example.org/posts/1 supports PUT and DELETE as well as GET. You can GET a representation of this resource, modify the representation, and PUT it back to change the resource’s state. That’s a perfectly standard use of HTTP, and perhaps not something that needs to be stated explicitly. But given that most HTTP resources don’t respond to PUT or DELETE, it’s worth spelling out.

More important, rel="edit" means you needn’t speculate about what kind of representation you’ll get if you send a GET request to http://example.org/posts/1. You’ll get back the kind of document AtomPub calls a Member Entry. (The details aren’t important right now; skip to Chapter 6 if you want to learn more about AtomPub.)

The server is making a promise to the client: if you make that GET request, you’ll receive an AtomPub Member Entry representation in return. The client doesn’t have to make a blind GET and see what the Content-Type says. It knows the representation will be of type application/atom+xml, and it also knows something about the representation’s application semantics.
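A client that understands AtomPub can act on that promise without a probing request. The sketch below finds rel="edit" links in an Atom entry with `xml.etree.ElementTree` and attaches what the client may assume about each target. The `EDIT_PROMISE` table is this sketch’s own summary of the promises described above, not a structure defined by AtomPub, and the sample entry is invented.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# What this client assumes about the target of a rel="edit" link.
EDIT_PROMISE = {
    "methods": {"GET", "PUT", "DELETE"},
    "media_type": "application/atom+xml",  # a Member Entry document
}

entry_xml = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>First post</title>
  <link rel="edit" href="http://example.org/posts/1"/>
</entry>"""


def edit_links(document: str) -> dict[str, dict]:
    """Map each rel="edit" target URL to the promises the client relies on."""
    root = ET.fromstring(document)
    return {
        link.get("href"): EDIT_PROMISE
        for link in root.iter(f"{ATOM_NS}link")
        if link.get("rel") == "edit"
    }


for url, promise in edit_links(entry_xml).items():
    print(url, sorted(promise["methods"]), promise["media_type"])
```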

Workflow Control

The third job of hypermedia is to describe the relationships between resources. This is best explained by an example. Here’s an HTML <a> tag:

<a href="http://www.example.com/">An outbound link</a>

If you click this link in your web browser, the browser will move to the web page mentioned in the link’s href attribute. The old page will become completely irrelevant, except as an item in your browser history. The <a> tag is an outbound link: a hypermedia control that, when activated, replaces the client’s application state with a brand new state.

Compare this to the <img> tag in HTML:

<img src="http://www.example.com/logo.png" />

This is a link, but it’s not an outbound link; it’s an embedded link. Embedded links don’t replace the client’s application state. They augment it. If you visit a web page whose HTML includes this <img> tag, the image is automatically loaded in a separate HTTP request (without you having to click anything), and displayed in the same window as the web page itself. You’re still on the same page, but now you have more information.

An HTML document can embed more than images. Here’s some HTML markup that downloads and runs some executable code written in JavaScript:

<script type="application/javascript" src="/my_javascript_application.js"></script>

Here’s some markup that downloads a CSS stylesheet and applies it to the main document:

<link rel="stylesheet" type="text/css" href="/my_stylesheet.css"/>

Here’s some markup that embeds another full HTML document inside this one:

<iframe src="/another-document.html"></iframe>

All of these are embedded links. The process of embedding one document in another is also called transclusion.
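The split between outbound and embedded links can be written down as a small lookup keyed on the element (and, for <link>, its rel value). This sketch covers only the elements discussed in this section; `classify` is a hypothetical helper.

```python
from typing import Optional


def classify(tag: str, attrs: dict[str, str]) -> Optional[tuple[str, str]]:
    """Return ("outbound" or "embedded", url) for a link element, else None."""
    if tag == "a" and "href" in attrs:
        return "outbound", attrs["href"]
    if tag in ("img", "script", "iframe") and "src" in attrs:
        return "embedded", attrs["src"]
    if tag == "link" and attrs.get("rel") == "stylesheet" and "href" in attrs:
        return "embedded", attrs["href"]
    return None


examples = [
    ("a", {"href": "http://www.example.com/"}),
    ("img", {"src": "http://www.example.com/logo.png"}),
    ("script", {"src": "/my_javascript_application.js"}),
    ("link", {"rel": "stylesheet", "href": "/my_stylesheet.css"}),
    ("iframe", {"src": "/another-document.html"}),
]
for tag, attrs in examples:
    print(tag, classify(tag, attrs))
```

A browser-like client would fetch the "embedded" URLs right away and keep the "outbound" ones until the user decides to move on.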

Of course, a client is free to ignore the server’s guidance. There are browser extensions that prevent the browser from transcluding the files referenced by <script> tags, and options to override the formatting instructions specified by stylesheets for greater readability. The point of these tags, as with the <form> tag, is to give the client hints as to which HTTP requests are likely to get the client what it wants. The client is always free not to make a request.

Beware of Fake Hypermedia!

There are a lot of existing APIs that were designed by people who understood the benefits of hypermedia, but that don’t technically contain any hypermedia. Imagine a bookstore API that serves a JSON representation like this:

HTTP/1.1 200 OK
Content-Type: application/json

{
 "title": "Example: A Novel",
 "description": "http://www.example.com/"
}

This is a representation of a book. The description field happens to look like a URL: http://www.example.com/. But is this a link? Is description supposed to link to a resource that gives the description? Or is it supposed to be a textual description, and some smart aleck typed in some text that happens to be a valid URL?

Formally speaking, "http://www.example.com/" is a string. The application/json media type doesn’t define any hypermedia controls, so even if some part of a representation really looks like a hypermedia link, it’s not! It’s just a string!

If you’re trying to consume an API like this, you won’t get very far dogmatically denying the existence of links. Instead, you’ll read some human-readable documentation written by the API provider. That documentation will explain the conventions the provider used to embed hypermedia links in a format (JSON) that doesn’t support hypermedia. Then you’ll know how to distinguish between links and strings, and you’ll be able to write a client that can detect and follow the hypermedia links.

But your client will only work for that specific API. The documentation you read is the documentation for a one-off fiat standard. The next API you use will have a different set of conventions for embedding hypermedia links in JSON, and you’ll have to do the work all over again.
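From the client’s side, the problem looks like this: `json.loads` hands back the description as an ordinary string, and only an out-of-band rule can turn it into a link. The rule below (a fixed set of field names the provider’s documentation calls links) is hypothetical and stands in for whatever one API’s documentation says.

```python
import json

representation = '{"title": "Example: A Novel", "description": "http://www.example.com/"}'
book = json.loads(representation)

# As far as JSON is concerned, this is just a string.
print(type(book["description"]).__name__)  # str

# Hypothetical, API-specific convention learned from human-readable docs.
LINK_FIELDS_FOR_THIS_API = {"description"}


def links_in(obj: dict) -> dict[str, str]:
    """Return the fields that this one API's documentation says are links."""
    return {key: value for key, value in obj.items() if key in LINK_FIELDS_FOR_THIS_API}


print(links_in(book))  # {'description': 'http://www.example.com/'}
```

The knowledge lives in `LINK_FIELDS_FOR_THIS_API`, outside the representation, which is why the client has to be rewritten for the next API.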

That’s why API designers shouldn’t design APIs that serve plain JSON. You should use a media type that has real support for hypermedia. Your users will thank you. They’ll be able to use preexisting libraries written against the media type, rather than writing new ones specifically for your API.

JSON has been the most popular representation format for APIs for quite a while, but as recently as a couple years ago, there were no JSON-based hypermedia formats. As you’ll see in the next few chapters, that has changed. Don’t worry that you’ll have to give up JSON to gain real hypermedia.

The Semantic Challenge: How Are We Doing?

At the end of Chapter 1, I set out a challenge: “How can we program a computer to decide which links to click?” A web browser works by passing the representations it gets to a human, who makes all the decisions. How can we get similar behavior without consulting a human at each step?

Providing the links is a step in the right direction. Out of the infinite set of legal HTTP requests, a hypermedia document explains which requests might be useful right now, on this particular site. The client doesn’t have to guess.

But that’s not enough. Suppose an HTML document contains only two links, A and B. Two possible requests the client might make. How does the client choose? On what basis can it make its decision?

Well, suppose one of those links is represented by an HTML <img> tag, and the other is represented by a <script> tag. As far as HTTP is concerned, there’s no difference between these two links. They have the same protocol semantics. They both trigger a GET request to a predetermined URL. But the two links have different application semantics. The representation at the other end of an <img> tag is supposed to be displayed as an image, and the representation at the other end of a <script> tag is supposed to be executed as client-side code.

For some clients, that’s enough information to make a decision. A client designed to scrape all the images from a web page will follow the link in the <img> tag and ignore the link in the <script> tag.
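A minimal sketch of such an image scraper, using Python’s `html.parser`: it collects the src of each <img> tag and never follows <script>, because only <img> carries the application semantics it cares about. The class name and sample page are hypothetical, and a real scraper would also resolve relative URLs (for example with `urllib.parse.urljoin`).

```python
from html.parser import HTMLParser


class ImageScraper(HTMLParser):
    """Collect the targets of <img> links; ignore every other kind of link."""

    def __init__(self) -> None:
        super().__init__()
        self.image_urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)


page = """<html><body>
  <img src="http://www.example.com/logo.png" />
  <script src="/my_javascript_application.js"></script>
</body></html>"""

scraper = ImageScraper()
scraper.feed(page)
print(scraper.image_urls)  # ['http://www.example.com/logo.png']
```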

This shows that hypermedia controls can bridge the semantic gap. They can tell the client why it might want to make a certain HTTP request.

But for most clients, the distinction between <img> and <script> isn’t enough information to make a decision. “Image” and “script” are very generic bits of application semantics. The application described by HTML is the World Wide Web, a very flexible application that’s used for all sorts of things.

When I think about application semantics, I usually think on a higher level than that. I think about the concepts that separate a wiki from an online store. They’re both websites, they both use embedded images and scripts, but they mean very different things.

A hypermedia format doesn’t have to be generic like HTML. It can be defined in enough detail to convey the application semantics of a wiki or a store. In the next chapter, I’ll talk about hypermedia formats that are designed to represent one specific type of problem. Outside that problem space, they’re practically useless. But within their limits, they meet the semantic challenge very well.



[9] There are two HTML specifications you should know about: the HTML 4 spec and the HTML 5 spec. Both are open standards produced by the W3C. HTML 4 has been stable for over 10 years; HTML 5 is a work in progress.

[10] That’s in section 12.1.1 of the HTML 4 specification.
