RESTful Web APIs

Chapter 4. Hypermedia

The story so far: URLs identify resources. A client makes HTTP requests to those URLs. A server sends representations in response, and over time the client builds up a picture of the resource state, as seen through the representations. Eventually the client makes that fateful PUT or POST or PATCH request, sending a representation back to the server and modifying resource state.

Look closer, and you'll see a question that hasn't been answered: how does the client know which requests it can make? There are infinitely many URLs. How does a client know which URLs have representations behind them and which ones will give a 404 error? Should the client send an entity-body with its POST request? If so, what should the entity-body look like? HTTP defines a set of protocol semantics, but which subset of those semantics does this web server support on this URL right now?

The missing piece of the puzzle is hypermedia. Hypermedia connects resources to each other, and describes their capabilities in machine-readable ways. Properly used, hypermedia can solve (or at least mitigate) the usability and stability problems found in today's web APIs.

Like REST, hypermedia isn't a single technology described by a standards document somewhere. Hypermedia is a strategy, implemented in different ways by dozens of technologies. I'll cover several hypermedia standards in the next three chapters, and a whole lot more in Chapter 10. It's up to you to choose the technologies that fit your business requirements.

The hypermedia strategy always has the same goal. Hypermedia is a way for the server to tell the client what HTTP requests the client might want to make in the future. It's a menu, provided by the server, from which the client is free to choose. The server knows what might happen, but the client decides what actually happens.

There's nothing new here. The World Wide Web works this way, and we all take it for granted that it should work this way. Anything else would be an unusable throwback to the 1980s. But in the world of APIs, hypermedia is a confusing and controversial topic. That's why today's APIs are terrible at managing change.

In this chapter, I want to dispel the mystery of hypermedia, so you can create APIs that have some of the flexibility of the Web.

HTML as a Hypermedia Format

You're probably already familiar with HTML,^[9] so let's start with an HTML example.

Here's an HTML <a> tag:

<a href="http://www.youtypeitwepostit.com/messages/">
 See the latest messages
</a>

This tag is a simple hypermedia control. It's a description of an HTTP request your browser might make in the near future. An <a> tag is a signal to your browser that it can make an HTTP GET request that would look something like this:

GET /messages/ HTTP/1.1
Host: www.youtypeitwepostit.com

The HTML standard says that when the user activates a link, the user "visits" the resource on the other end of the link.^[10] In practice, this means fetching a representation of the resource and displaying it in the browser window, replacing the original representation (the one that included the link). Of course, that doesn't happen automatically. Nothing will happen until the user clicks on the link. An <a> tag is a promise from the web server that a certain URL names a resource you can visit. If you sent a GET request to a URL you made up, such as http://www.youtypeitwepostit.com/give-me-the-messages?please=true, you'd probably just get a 404 error.

Compare the <a> tag to another of HTML's hypermedia controls, the <img> tag:

<img src="http://www.example.com/logo.png" />

The <img> tag also describes an HTTP request your browser might make in the near future, but there's no implication that you're moving from one document to another. Instead, the representation of the linked resource is supposed to be embedded as an image in the current document. When your browser finds an <img> tag, it makes the request for the image automatically, without asking you to click on anything. Then it incorporates the representation in the document you're viewing, again without asking your permission.

Let's look at a more complex hypermedia control, an HTML form:

<form action="http://www.youtypeitwepostit.com/messages" method="post">
  <input type="text" name="message" value="" required="true" />
  <input type="submit" name="submit" value="Post" />
</form>

This form describes a request to the URL http://www.youtypeitwepostit.com/messages/. That's the same URL I used for the <a> tag. But the <a> tag described a GET request, and this form describes a POST request.

This form doesn't just give you the URL and send you off to make a POST request. There are also two controls (a text field and a submit button), which are rendered as GUI elements in a web browser.

When you click the submit button, the value you entered in the text field and the value on the button are transformed into a representation, according to rules set down in the HTML specification. Those rules say the media type of the representation will be application/x-www-form-urlencoded, and it will look something like this:

message=Hello%21&submit=Post

Putting it all together, that <form> tag tells your browser that it can make a POST request that looks something like this:

POST /messages HTTP/1.1
Host: www.youtypeitwepostit.com
Content-Type: application/x-www-form-urlencoded

message=Hello%21&submit=Post
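Here's a minimal Python sketch of that encoding step, assuming the user typed "Hello!" into the message field and the form was submitted with a button named submit whose value is Post. The helper build_form_post is hypothetical; it serializes only the request line, the two headers shown above, and the body, and ignores details such as Content-Length and character encoding.

```python
from urllib.parse import urlencode, urlsplit


def build_form_post(action: str, fields: dict[str, str]) -> str:
    """Serialize a method="post" form submission as raw HTTP/1.1 request text.

    Hypothetical helper: it mimics what a browser does for a form that
    uses the default enctype (application/x-www-form-urlencoded).
    """
    url = urlsplit(action)
    # Form-encode the fields: name=value pairs joined by "&", with
    # characters such as "!" percent-encoded ("!" becomes "%21").
    body = urlencode(fields)
    return (
        f"POST {url.path} HTTP/1.1\r\n"
        f"Host: {url.netloc}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        "\r\n"
        f"{body}"
    )


request = build_form_post(
    "http://www.youtypeitwepostit.com/messages",
    {"message": "Hello!", "submit": "Post"},
)
print(request)
```

The client supplies only the values; the method, the target URL, and the encoding rules all come from the form.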

As with the <a> tag, the server's guiding you, but its hand is pretty light. If you don't want to fill out this form, you can ignore it. If you do fill out the form, you can put whatever you want in the message field (although the server might reject certain values). The <form> tag is the server telling you that, of all the possible POST requests you might make, there's one type of request that's likely to result in something useful. That's a POST to /messages, which includes a form-encoded entity-body that includes a value for message.

Here's one more <form> tag:

<form method="GET" action="http://www.youtypeitwepostit.com/messages/">
 <input type="text" id="query" name="query"/>
 <input type="submit" value="Search"/>
</form>

This form also has a text box you're supposed to fill out, but the form is telling you to make a GET request, and GET requests don't include an entity-body. Instead, the data you type into that text box gets incorporated into the request URL, again according to rules laid out in the HTML specification.

If you fill out this form, the HTTP request your browser makes will look something like this:

GET /messages/?query=rest HTTP/1.1
Host: www.youtypeitwepostit.com
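The GET case can be sketched the same way. In this hypothetical Python helper, the form fields are encoded with the standard library's urllib.parse.urlencode and become the query string of the action URL, replacing any query string the action URL already had; this sketch leaves the fragment alone.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_form_get_url(action: str, fields: dict[str, str]) -> str:
    """Build the URL requested when a method="GET" form is submitted.

    Hypothetical helper: the form-encoded fields replace any query string
    already present in the action URL; the fragment is kept as-is.
    """
    scheme, netloc, path, _old_query, fragment = urlsplit(action)
    return urlunsplit((scheme, netloc, path, urlencode(fields), fragment))


url = build_form_get_url(
    "http://www.youtypeitwepostit.com/messages/", {"query": "rest"}
)
print(url)  # http://www.youtypeitwepostit.com/messages/?query=rest
```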

To sum up, the familiar HTML controls allow the server to describe four kinds of HTTP requests.

The <a> tag describes a GET request for one specific URL, which is made only if the user triggers the control.
The <img> tag describes a GET request for one specific URL, which happens automatically, in the background.
The <form> tag with method="POST" describes a POST request to one specific URL, with a custom entity-body constructed by the client. The request is only made if the user triggers the control.
The <form> tag with method="GET" describes a GET request to a custom URL constructed by the client. The request is only made if the user triggers the control.

HTML also defines some more exotic hypermedia controls, and other data formats may define controls that are stranger still. All of them fall under the formal definition of hypermedia given in the Fielding dissertation:

Hypermedia is defined by the presence of application control information embedded within, or as a layer above, the presentation of information.

The World Wide Web is full of HTML documents, and the documents are full of things people like to read: prices, statistics, personal messages, prose, and poetry. But all of those things fall under presentation of information. In terms of presentation of information, the Web isn't much different from a printed book.

It's the application control information that distinguishes an HTML document from a book. I'm talking about the hypermedia controls that people interact with all the time but rarely examine closely: the <img> tags that tell the browser to embed certain images, the <a> tags that transport the end user to another part of the Web, and the <script> tags that supply JavaScript for the browser to execute.

An HTML document that contains a poem will probably also feature a link to "Other poems by this author," or a form that lets the reader "Rate this poem." This is application control information that couldn't show up in a printed book of poetry. The presence of application control information can certainly reduce the emotional impact of a poem, but an HTML document containing only the text of a poem is not a full participant in the Web. It's just simulating a printed book.

URI Templates

The custom URLs you can create using an HTML <form> tag are limited in form. http://www.youtypeitwepostit.com/messages/?search=rest doesn't look very nice. On a technical level, this doesn't matter. URLs don't have to look nice. URLs don't even need to make sense to human eyes. But we humans prefer nice-looking URLs, like http://www.youtypeitwepostit.com/search/rest.

HTML's hypermedia controls have no way of telling a browser how to construct a URL like http://www.youtypeitwepostit.com/search/rest. But URI Templates, a different hypermedia technology, can do this. URI Templates are defined in RFC 6570, and they look like this:

http://www.youtypeitwepostit.com/search/{search}

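That's not a valid URL, because it contains curly brackets. Those brackets identify the string as a URI Template. RFC 6570 tells you how to turn that string into an infinite number of URLs. It says you can replace {search} with any string you want; during expansion, any characters that would be unsafe at that point in a URL are percent-encoded. Replacing {search} with rest, for example, gives you http://www.youtypeitwepostit.com/search/rest.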

This HTML form:

<form method="GET" action="http://www.youtypeitwepostit.com/messages/">
 <input type="text" id="query" name="query"/>
 <input type="submit" value="Search"/>
</form>

is exactly equivalent to this URI Template:

http://www.youtypeitwepostit.com/messages/?query={query}

That's a very common case, so the URI Templates standard defines a shortcut for URLs that include a query string. This URI Template is exactly equivalent to the previous one, and it's also equivalent to the previous HTML form:

http://www.youtypeitwepostit.com/messages/{?query}

The URI Templates standard is full of examples, but here are a few more sample templates, along with just a few of the URLs you can get from them:

If parameter values are set to:
   var   := "title"
   hello := "Hello World!"
   path  := "/foo/bar"

Then these URI templates:
   http://www.example.org/greeting?g={+hello}
   http://www.example.org{+path}/status
   http://www.example.org/document#{+var}

Expand to these URLs:
   http://www.example.org/greeting?g=Hello%20World!
   http://www.example.org/foo/bar/status
   http://www.example.org/document#title
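To show that expansion really is mechanical, here's a small Python sketch that implements only three pieces of RFC 6570: simple expansion ({var}), reserved expansion ({+var}), and form-style query expansion ({?var}). It is not a complete implementation: there are no explode or prefix modifiers, no list or map values, and it doesn't preserve existing percent-encoded triplets the way reserved expansion should. The function name expand is hypothetical; a real client should use a full RFC 6570 library.

```python
import re
from urllib.parse import quote

# RFC 3986 reserved characters: gen-delims followed by sub-delims.
RESERVED = ":/?#[]@!$&'()*+,;="


def expand(template: str, values: dict[str, str]) -> str:
    """Expand a small subset of RFC 6570 URI Templates.

    Supports {var}, {+var}, and {?var,...}. Undefined variables expand
    to nothing. No explode/prefix modifiers, lists, or maps.
    """
    def replace(match: re.Match) -> str:
        operator, names = match.group(1), match.group(2).split(",")
        defined = [name for name in names if name in values]
        if operator == "+":
            # Reserved expansion: reserved characters pass through unencoded.
            return ",".join(quote(values[n], safe=RESERVED) for n in defined)
        if operator == "?":
            # Form-style query expansion: "?name=value&name=value".
            pairs = [f"{n}={quote(values[n], safe='')}" for n in defined]
            return "?" + "&".join(pairs) if pairs else ""
        # Simple expansion: only unreserved characters pass through.
        return ",".join(quote(values[n], safe="") for n in defined)

    return re.sub(r"\{([+?]?)([^}]+)\}", replace, template)


values = {"var": "title", "hello": "Hello World!", "path": "/foo/bar"}
print(expand("http://www.example.org/greeting?g={+hello}", values))
print(expand("http://www.example.org{+path}/status", values))
print(expand("http://www.example.org/document#{+var}", values))
print(expand("http://www.youtypeitwepostit.com/messages/{?query}", {"query": "rest"}))
```

With these values, the three sample templates expand to the three URLs listed above, and {?query} produces the same URL as ?query={query}. Note that simple expansion of {hello} would give Hello%20World%21, because "!" is reserved and only {+hello} lets it through.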

Although a URI Template is shorter and more flexible than an HTML GET form, the two technologies aren't much different. URI Templates and HTML forms allow a web server to describe an infinite number of URLs with a short string. The HTTP client can plug in some values, choose one URL from that infinite family, and make a GET request to that specific URL.

URI Templates don't make sense on their own. A URI Template needs to be embedded in a hypermedia format. The idea is that every standard that needs this functionality should just use URI Templates, instead of defining a custom format, which is what was happening before RFC 6570 was published.

URI Versus URL

I've put this off for as long as I can, but now I need to explain the difference between URL (the term I use almost everywhere in this book) and URI (the more general term used in the names of technologies such as URI Templates). Most web APIs deal exclusively with URLs, so for most of this book, the distinction doesn't matter. But when it's important (as it will be in Chapter 12), it's really important.

A URL is a short string used to identify a resource. A URI is also a short string used to identify a resource. Every URL is a URI. They're described in the same standard: RFC 3986.

What's the difference? As far as this book is concerned, the difference is this: there's no guarantee that a URI has a representation. A URI is nothing but an identifier. A URL is an identifier that can be dereferenced. That is, a computer can somehow take a URL and get a representation of the underlying resource.

If you see an http: URI, you know how a computer can get a representation: by making an HTTP GET request. If you see an ftp: URI, you know how a computer can get a representation: by starting up an FTP client and executing certain FTP commands. These URIs are URLs. They have protocols associated with them: rules for obtaining representations of these resources (very detailed rules that a computer can follow).

Here's a URI that's not a URL: urn:isbn:9781449358063. It designates a resource: the print edition of this book. Not any particular copy of this book, but the abstract concept of an entire edition. (Remember that a resource can be anything at all.) This URI is not a URL because... what's the protocol? How would a computer get a representation? You can't do it.

Without a URL, you can't get a representation. Without representations, there can be no representational state transfer. A resource that's not identified by a URL cannot fulfill many of the Fielding constraints. It can't fulfill the self-descriptive message constraint, because it can't send any messages. A representation can link to a URI that's not a URL (<a href="urn:isbn:9781449358063">), but that won't fulfill the hypermedia constraint, because a client can't follow the link.
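From a client's point of view, the distinction comes down to whether it knows a retrieval protocol for the URI's scheme. Here's a minimal Python sketch; the DEREFERENCEABLE_SCHEMES table and the how_to_dereference function are hypothetical, and the table reflects only what this particular client knows how to do, not a registry of all schemes.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical: the schemes this client has a retrieval protocol for.
DEREFERENCEABLE_SCHEMES = {
    "http": "send an HTTP GET request",
    "https": "send an HTTP GET request over TLS",
    "ftp": "connect with an FTP client and retrieve the file",
}


def how_to_dereference(uri: str) -> Optional[str]:
    """Return how to get a representation, or None if the URI is only a name."""
    scheme = urlsplit(uri).scheme.lower()
    return DEREFERENCEABLE_SCHEMES.get(scheme)


print(how_to_dereference("http://shop.oreilly.com/product/0636920028468.do"))
print(how_to_dereference("urn:isbn:9781449358063"))  # None: no protocol to follow
```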

Here's a URL that identifies the print edition of this book: http://shop.oreilly.com/product/0636920028468.do. You can send a GET request to this URL and get a representation of the edition. Not a physical copy of the book, but an HTML document that conveys some of its resource state: the title, the number of pages, and so on. The HTML document also contains hypermedia, like links to the book's authors (not the people themselves, but some information about them). A resource identified by a URL can fulfill all the Fielding constraints.

There are some good reasons to use URIs that aren't URLs, and I'll cover them when I discuss the resource description strategy in Chapter 12. But it's a pretty rare situation. In general, when your web API refers to a resource, it should use a URL with the http or https scheme, and that URL should work: it should serve a useful representation in response to a GET request.

The Link Header

Here's a technology that puts hypermedia where you might not expect it: inside the headers of an HTTP request or response. RFC 5988 defines an extension to HTTP, a header called Link. This header lets you add simple hypermedia controls to entity-bodies that don't normally support hypermedia at all, like JSON objects and binary image files.

Here's a plain-text representation of a story that's been split into multiple parts with cliffhangers (the entity-body of this HTTP response contains the first part of the story, and the Link header points to the second part):

HTTP/1.1 200 OK
Content-Type: text/plain
Link: <http://www.example.com/story/part2>;rel="next"

It was a dark and stormy night. Suddenly, a...
(continued in part 2)
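A client that understands the Link header can find the next part of the story without knowing anything about text/plain. Here's a simplified Python sketch; the function name parse_link_header is hypothetical, and the sketch assumes parameter values contain no commas or semicolons, which a full RFC 5988 parser would have to handle.

```python
import re

# One link-value: a <URI> followed by zero or more ;-separated parameters.
LINK_VALUE = re.compile(r"<([^>]*)>((?:\s*;\s*[^;,]+)*)")
# One parameter, e.g. rel="next" or rel=next.
PARAM = re.compile(r';\s*([^=;,\s]+)\s*=\s*"?([^";,]*)"?')


def parse_link_header(header: str) -> dict[str, str]:
    """Map each relation type in a Link header to its target URL.

    Simplified: assumes no commas or semicolons inside quoted values,
    and keeps only the last target seen for a repeated rel.
    """
    links: dict[str, str] = {}
    for target, params in LINK_VALUE.findall(header):
        for name, value in PARAM.findall(params):
            if name.lower() == "rel":
                # A rel value may list several space-separated relation types.
                for rel in value.split():
                    links[rel] = target
    return links


header = '<http://www.example.com/story/part2>;rel="next"'
print(parse_link_header(header))  # {'next': 'http://www.example.com/story/part2'}
```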

The Link header has approximately the same functionality as an HTML <a> tag. I recommend you use real hypermedia formats whenever possible, but when that's not an option, the Link header can be very useful.

The LINK and UNLINK extension methods use the Link header. This example from Chapter 3 (which assigns an author to the story) should make a little more sense now:

LINK /story HTTP/1.1
Host: www.example.com
Link: <http://www.example.com/~drmilk>;rel="author"

What Hypermedia Is For

I'll be covering a lot of hypermedia data formats in this book, but at this point telling you about one technology after another won't help very much. We need to take a step back and see what hypermedia is for.

Hypermedia controls have three jobs:

They tell the client how to construct an HTTP request: what HTTP method to use, what URL to use, what HTTP headers and/or entity-body to send.
They make promises about the HTTP response, suggesting the status code, the HTTP headers, and/or the data the server is likely to send in response to a request.
They suggest how the client should integrate the response into its workflow.

HTML GET forms and URI Templates feel similar because they do the same job. They both tell the client how to construct a URL for use in an HTTP GET request.

Guiding the Request

An HTTP request has four parts: the method, the target URL, the HTTP headers, and the entity-body. Hypermedia controls can guide the client into specifying all four of these.

This HTML <a> tag specifies both the target URL and the HTTP method to use:

<a href="http://www.example.com/">An outbound link</a>

The target URL is defined explicitly, in the href attribute. The HTTP method is defined implicitly: the HTML spec says that an <a> tag becomes a GET request when the end user clicks the link.

This HTML form defines the method, the target URL, and the entity-body of a potential future HTTP request:

<form action="/stores" method="post">
  <input type="text" name="storeName" value=""  />
  <input type="text" name="nearbyCity" value="" />
  <input type="submit" value="Add" />
</form>

Both the HTTP method and the target URL are defined explicitly. The entity-body is defined in terms of a set of questions for the client. The client needs to figure out what values it wants to provide for the variables storeName and nearbyCity. Then it can construct a form-encoded entity-body that the server will accept. (Who says it needs to be form-encoded? That's defined implicitly, by HTML's rules for processing a <form> tag.)

This URI Template specifies the target URL of an HTTP request, and nothing else:

http://www.youtypeitwepostit.com/messages/{?search}

The target URL is defined in terms of a variable that needs to be filled in, just like the entity-body of an HTML form would be. The client uses an algorithm to turn the URI Template and its desired value for the search variable into a real URL: say, for example, http://www.youtypeitwepostit.com/messages/?search=rest.

A URI Template defines nothing about the HTTP request except for the target URI. It's not telling you to make a GET request, a POST request, or any kind of request in particular. That's why I said URI Templates don't make sense on their own, and why they need to be combined with another hypermedia technology.

Hereâs an HTML form that tells the client to set a specific value for the HTTP header Content-Type:

<form method="POST" enctype="text/plain">
  ...
</form>

Ordinarily, the entity-body of an HTML POST form is form-encoded, and sent over the network with the Content-Type header set to application/x-www-form-urlencoded. But specifying the enctype attribute of the <form> tag overrides this behavior. A form with enctype="text/plain" tells the browser to encode its entity-body in a plain text format, and to send it over the network with the Content-Type header set to text/plain.

This isn't a great example, because the enctype attribute only changes the Content-Type header as a side effect of changing the entity-body. But it is the best example I can come up with using a popular hypermedia format like HTML.
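The difference between the two encodings is easiest to see side by side. Here's a Python sketch that encodes the store form from earlier both ways. The field values are made up, the helper encode_form is hypothetical, and the text/plain branch follows HTML5's rule of putting each name=value pair on its own CRLF-terminated line with no escaping; multipart/form-data is left out.

```python
from urllib.parse import urlencode


def encode_form(fields: list[tuple[str, str]],
                enctype: str = "application/x-www-form-urlencoded") -> tuple[str, str]:
    """Return (Content-Type header value, entity-body) for a form submission."""
    if enctype == "text/plain":
        # One "name=value" per line, CRLF-terminated, no percent-encoding.
        body = "".join(f"{name}={value}\r\n" for name, value in fields)
    elif enctype == "application/x-www-form-urlencoded":
        body = urlencode(fields)
    else:
        raise ValueError(f"unsupported enctype: {enctype}")
    # The Content-Type header follows directly from the enctype.
    return enctype, body


fields = [("storeName", "Corner Shop"), ("nearbyCity", "Springfield")]
for enctype in ("application/x-www-form-urlencoded", "text/plain"):
    content_type, body = encode_form(fields, enctype)
    print(f"Content-Type: {content_type}")
    print(repr(body))
```

As the text says, the header changes only because the body format changes; the form never names the header directly.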

Hypermedia controls generally leave an HTTP client free to send whatever headers it wants. But this laissez-faire attitude is only a convention. A hypermedia control can describe an HTTP request in great detail. It can instruct the client to send an HTTP request to a specific URL, using a specific HTTP method, providing an entity-body constructed according to specific rules, and providing specific values for specific HTTP headers.

Promises About the Response

Here's another HTML tag:

<img src="http://www.example.com/logo.png" />

Like an <a> tag, an <img> tag is a promise that the client can make a GET request to a particular URL. But the <img> tag makes another promise: that the server will send some kind of image representation in response to GET.

Here's another example, a simple XML hypermedia control from the Atom Publishing Protocol (which I'll discuss in more detail in Chapter 6):

<link rel="edit" href="http://example.org/posts/1"/>

This looks simple enough; in fact, this <link> tag could legally show up in an HTML document. But interpreted according to the AtomPub standard, that rel="edit" gives you a lot of information about the resource at http://example.org/posts/1.

First, rel="edit" says that the resource at http://example.org/posts/1 supports PUT and DELETE as well as GET. You can GET a representation of this resource, modify the representation, and PUT it back to change the resource's state. That's a perfectly standard use of HTTP, and perhaps not something that needs to be stated explicitly. But given that most HTTP resources don't respond to PUT or DELETE, it's worth spelling out.

More important, rel="edit" means the client needn't speculate about what kind of representation it will get if it sends a GET request to http://example.org/posts/1. It will get back the kind of document AtomPub calls a Member Entry. (The details aren't important right now; skip to Chapter 6 if you want to learn more about AtomPub.)

The server is making a promise to the client: if you make that GET request, you'll receive an AtomPub Member Entry representation in return. The client doesn't have to make a blind GET and see what the Content-Type says. It knows the representation will be of type application/atom+xml, and it also knows something about the representation's application semantics.
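Here's a Python sketch of a client acting on that promise before it makes any request. It assumes the link appears inside an Atom entry in the Atom namespace, and the REL_PROMISES table and promised_capabilities function are hypothetical: the table simply records, for the edit relation, the methods and media type described above.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical table: what a link with a given rel value promises,
# as the client learned it from the AtomPub spec. Only "edit" is here.
REL_PROMISES = {
    "edit": {
        "methods": ("GET", "PUT", "DELETE"),
        "media_type": "application/atom+xml",  # an AtomPub Member Entry
    },
}


def promised_capabilities(entry_xml: str) -> list[dict]:
    """List each link whose rel this client understands, plus its promises."""
    root = ET.fromstring(entry_xml)
    found = []
    for link in root.iter(f"{ATOM}link"):
        promise = REL_PROMISES.get(link.get("rel"))
        if promise is not None:
            found.append({"href": link.get("href"), **promise})
    return found


ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Example post</title>
  <link rel="edit" href="http://example.org/posts/1"/>
</entry>"""

for capability in promised_capabilities(ENTRY):
    print(capability)
```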

Workflow Control

The third job of hypermedia is to describe the relationships between resources. This is best explained by an example. Here's an HTML <a> tag:

<a href="http://www.example.com/">An outbound link</a>

If you click this link in your web browser, the browser will move to the web page mentioned in the link's href attribute. The old page will become completely irrelevant, except as an item in your browser history. The <a> tag is an outbound link: a hypermedia control that, when activated, replaces the client's application state with a brand new state.

Compare this to the <img> tag in HTML:

<img src="http://www.example.com/logo.png" />

This is a link, but it's not an outbound link; it's an embedded link. Embedded links don't replace the client's application state. They augment it. If you visit a web page whose HTML includes this <img> tag, the image is automatically loaded in a separate HTTP request (without you having to click anything), and displayed in the same window as the web page itself. You're still on the same page, but now you have more information.

An HTML document can embed more than images. Here's some HTML markup that downloads and runs some executable code written in JavaScript:

<script type="application/javascript" src="/my_javascript_application.js"></script>

Here's some markup that downloads a CSS stylesheet and applies it to the main document:

<link rel="stylesheet" type="text/css" href="/my_stylesheet.css"/>

Here's some markup that embeds another full HTML document inside this one:

<iframe src="/another-document.html"></iframe>

All of these are embedded links. The process of embedding one document in another is also called transclusion.
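A client can sort these links into outbound and embedded without fetching anything, just by looking at the tags. Here's a minimal Python sketch using the standard library's html.parser; the LinkClassifier class and the LINK_TAGS table are hypothetical, and the table is deliberately incomplete, covering only the tags shown in this section.

```python
from html.parser import HTMLParser

# Hypothetical table: which attribute holds the URL, and whether following
# it replaces the application state ("outbound") or augments it ("embedded").
LINK_TAGS = {
    "a": ("href", "outbound"),
    "img": ("src", "embedded"),
    "script": ("src", "embedded"),
    "iframe": ("src", "embedded"),
}


class LinkClassifier(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str, str]] = []  # (kind, tag, url)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "stylesheet":
            # <link rel="stylesheet"> embeds formatting instructions.
            self.links.append(("embedded", tag, attrs.get("href", "")))
        elif tag in LINK_TAGS:
            attr, kind = LINK_TAGS[tag]
            if attrs.get(attr):
                self.links.append((kind, tag, attrs[attr]))


page = """
<a href="http://www.example.com/">An outbound link</a>
<img src="http://www.example.com/logo.png" />
<script type="application/javascript" src="/my_javascript_application.js"></script>
<link rel="stylesheet" type="text/css" href="/my_stylesheet.css"/>
<iframe src="/another-document.html"></iframe>
"""
parser = LinkClassifier()
parser.feed(page)
for kind, tag, url in parser.links:
    print(kind, tag, url)
```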

Of course, a client is free to ignore the server's guidance. There are browser extensions that prevent the browser from transcluding the files referenced by <script> tags, and options to override the formatting instructions specified by stylesheets for greater readability. The point of these tags, as with the <form> tag, is to give the client hints as to which HTTP requests are likely to get the client what it wants. The client is always free not to make a request.

Beware of Fake Hypermedia!

There are a lot of existing APIs that were designed by people who understood the benefits of hypermedia, but that don't technically contain any hypermedia. Imagine a bookstore API that serves a JSON representation like this:

HTTP/1.1 200 OK
Content-Type: application/json

{
 "title": "Example: A Novel",
 "description": "http://www.example.com/"
}

This is a representation of a book. The description field happens to look like a URL: http://www.example.com/. But is this a link? Is description supposed to link to a resource that gives the description? Or is it supposed to be a textual description, and some smart aleck typed in some text that happens to be a valid URL?

Formally speaking, "http://www.example.com/" is a string. The application/json media type doesn't define any hypermedia controls, so even if some part of a representation really looks like a hypermedia link, it's not! It's just a string!

If you're trying to consume an API like this, you won't get very far dogmatically denying the existence of links. Instead, you'll read some human-readable documentation written by the API provider. That documentation will explain the conventions the provider used to embed hypermedia links in a format (JSON) that doesn't support hypermedia. Then you'll know how to distinguish between links and strings, and you'll be able to write a client that can detect and follow the hypermedia links.
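Here's what such a one-off client might look like in Python. The convention it relies on (any top-level string field whose name ends in _url is a link) is invented for this illustration, as are the author_url field in the sample representation and the links_by_convention function; nothing in application/json supports either.

```python
import json


def links_by_convention(representation: str) -> dict[str, str]:
    """Find links in a plain JSON object using one API's private convention.

    Hypothetical convention: a top-level string field whose name ends in
    "_url" is a link. application/json itself says nothing of the kind.
    """
    data = json.loads(representation)
    return {
        key: value
        for key, value in data.items()
        if key.endswith("_url") and isinstance(value, str)
    }


body = """{
  "title": "Example: A Novel",
  "description": "http://www.example.com/",
  "author_url": "http://www.example.com/authors/1"
}"""
print(links_by_convention(body))  # {'author_url': 'http://www.example.com/authors/1'}
```

Under this convention the URL-shaped description field is correctly treated as a string, but point the same function at an API with a different convention and it will find nothing, or the wrong things.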

But your client will only work for that specific API. The documentation you read is the documentation for a one-off fiat standard. The next API you use will have a different set of conventions for embedding hypermedia links in JSON, and you'll have to do the work all over again.

That's why API designers shouldn't design APIs that serve plain JSON. You should use a media type that has real support for hypermedia. Your users will thank you. They'll be able to use preexisting libraries written against the media type, rather than writing new ones specifically for your API.

JSON has been the most popular representation format for APIs for quite a while, but as recently as a couple of years ago, there were no JSON-based hypermedia formats. As you'll see in the next few chapters, that has changed. Don't worry that you'll have to give up JSON to gain real hypermedia.

The Semantic Challenge: How Are We Doing?

At the end of Chapter 1, I set out a challenge: "How can we program a computer to decide which links to click?" A web browser works by passing the representations it gets to a human, who makes all the decisions. How can we get similar behavior without consulting a human at each step?

Providing the links is a step in the right direction. Out of the infinite set of legal HTTP requests, a hypermedia document explains which requests might be useful right now, on this particular site. The client doesn't have to guess.

But that's not enough. Suppose an HTML document contains only two links, A and B. Two possible requests the client might make. How does the client choose? On what basis can it make its decision?

Well, suppose one of those links is represented by an HTML <img> tag, and the other is represented by a <script> tag. As far as HTTP is concerned, there's no difference between these two links. They have the same protocol semantics. They both trigger a GET request to a predetermined URL. But the two links have different application semantics. The representation at the other end of an <img> tag is supposed to be displayed as an image, and the representation at the other end of a <script> tag is supposed to be executed as client-side code.

For some clients, that's enough information to make a decision. A client designed to scrape all the images from a web page will follow the link in the <img> tag and ignore the link in the <script> tag.

This shows that hypermedia controls can bridge the semantic gap. They can tell the client why it might want to make a certain HTTP request.
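The image-scraping client described above can be sketched in a few lines of Python with the standard library's html.parser. The class name ImageScraper is hypothetical; the point is that it chooses which link to follow purely from the application semantics of the tag, since the <img> and <script> links look identical at the HTTP level.

```python
from html.parser import HTMLParser


class ImageScraper(HTMLParser):
    """Collect the URLs an image-scraping client would fetch.

    It acts only on <img> (display this as an image) and ignores
    <script> (execute this as code), even though both mean a GET.
    """

    def __init__(self) -> None:
        super().__init__()
        self.image_urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)


scraper = ImageScraper()
scraper.feed('<img src="http://www.example.com/logo.png" />'
             '<script src="/my_javascript_application.js"></script>')
print(scraper.image_urls)  # ['http://www.example.com/logo.png']
```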

But for most clients, the distinction between <img> and <script> isn't enough information to make a decision. "Image" and "script" are very generic bits of application semantics. The application described by HTML is the World Wide Web, a very flexible application that's used for all sorts of things.

When I think about application semantics, I usually think on a higher level than that. I think about the concepts that separate a wiki from an online store. They're both websites, they both use embedded images and scripts, but they mean very different things.

A hypermedia format doesn't have to be generic like HTML. It can be defined in enough detail to convey the application semantics of a wiki or a store. In the next chapter, I'll talk about hypermedia formats that are designed to represent one specific type of problem. Outside that problem space, they're practically useless. But within their limits, they meet the semantic challenge very well.

^[9]There are two HTML specifications you should know about: the HTML 4 spec and the HTML 5 spec. Both are open standards produced by the W3C. HTML 4 has been stable for over 10 years; HTML 5 is a work in progress.

^[10]That's in section 12.1.1 of the HTML 4 specification.

RESTful Web APIs by Leonard Richardson, Mike Amundsen, Sam Ruby