Chapter 4. Hypermedia
The story so far: URLs identify resources. A client makes HTTP requests to those URLs. A server sends representations in response, and over time the client builds up a picture of the resource state, as seen through the representations. Eventually the client makes that fateful PUT or POST or PATCH request, sending a representation back to the server and modifying resource state.
Look closer, and you'll see a question that hasn't been answered: how does the client know which requests it can make? There are infinitely many URLs. How does a client know which URLs have representations behind them and which ones will give a 404 error? Should the client send an entity-body with its POST request? If so, what should the entity-body look like? HTTP defines a set of protocol semantics, but which subset of those semantics does this web server support on this URL right now?
The missing piece of the puzzle is hypermedia. Hypermedia connects resources to each other, and describes their capabilities in machine-readable ways. Properly used, hypermedia can solve, or at least mitigate, the usability and stability problems found in today's web APIs.
Like REST, hypermedia isn't a single technology described by a standards document somewhere. Hypermedia is a strategy, implemented in different ways by dozens of technologies. I'll cover several hypermedia standards in the next three chapters, and a whole lot more in Chapter 10. It's up to you to choose the technologies that fit your business requirements.
The hypermedia strategy always has the same goal. Hypermedia is a way for the server to tell the client what HTTP requests the client might want to make in the future. It's a menu, provided by the server, from which the client is free to choose. The server knows what might happen, but the client decides what actually happens.
There's nothing new here. The World Wide Web works this way, and we all take it for granted that it should work this way. Anything else would be an unusable throwback to the 1980s. But in the world of APIs, hypermedia is a confusing and controversial topic. That's why today's APIs are terrible at managing change.
In this chapter, I want to dispel the mystery of hypermedia, so you can create APIs that have some of the flexibility of the Web.
HTML as a Hypermedia Format
You're probably already familiar with HTML,[9] so let's start with an HTML example.
Here's an HTML <a> tag:
<a href="http://www.youtypeitwepostit.com/messages/"> See the latest messages </a>
This tag is a simple hypermedia control. It's a description of an HTTP request your browser might make in the near future. An <a> tag is a signal to your browser that it can make an HTTP GET request that would look something like this:
GET /messages/ HTTP/1.1
Host: www.youtypeitwepostit.com
The HTML standard says that when the user activates a link, the user "visits" the resource on the other end of the link.[10] In practice, this means fetching a representation of the resource and displaying it in the browser window, replacing the original representation (the one that included the link). Of course, that doesn't happen automatically. Nothing will happen until the user clicks on the link. An <a> tag is a promise from the web server that a certain URL names a resource you can visit. If you sent a GET request to a URL you made up, such as http://www.youtypeitwepostit.com/give-me-the-messages?please=true, you'd probably just get a 404 error.
Compare the <a> tag to another of HTML's hypermedia controls, the <img> tag:
<img src="http://www.example.com/logo.png" />
The <img> tag also describes an HTTP request your browser might make in the near future, but there's no implication that you're moving from one document to another. Instead, the representation of the linked resource is supposed to be embedded as an image in the current document. When your browser finds an <img> tag, it makes the request for the image automatically, without asking you to click on anything. Then it incorporates the representation in the document you're viewing, again without asking your permission.
Let's look at a more complex hypermedia control, an HTML form:
<form action="http://www.youtypeitwepostit.com/messages" method="post">
  <input type="text" name="message" value="" required="required" />
  <input type="submit" name="submit" value="Post" />
</form>
This form describes a request to the URL http://www.youtypeitwepostit.com/messages. That's the same resource the <a> tag linked to (give or take a trailing slash). But the <a> tag described a GET request, and this form describes a POST request.
This form doesn't just give you the URL and send you off to make a POST request. There are also two controls (a text field and a submit button), which are rendered as GUI elements in a web browser.
When you click the submit button, the value you entered in the text field and the value on the button are transformed into a representation, according to rules set down in the HTML specification. Those rules say the media type of the representation will be application/x-www-form-urlencoded, and it will look something like this:
message=Hello%21&submit=Post
Putting it all together, that <form> tag tells your browser that it can make a POST request that looks something like this:
POST /messages HTTP/1.1
Host: www.youtypeitwepostit.com
Content-Type: application/x-www-form-urlencoded

message=Hello%21&submit=Post
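The encoding step can be sketched in a few lines of Python. This is only an illustration of what a browser does with the form above, not a description of any particular browser's code. The sample input "Hello!" is an assumption, and the Content-Length header is added here because a real request would need one.

```python
from urllib.parse import urlencode

# Field names come from the example form; "Hello!" is assumed sample input.
fields = [("message", "Hello!"), ("submit", "Post")]

# urlencode() produces an application/x-www-form-urlencoded string.
body = urlencode(fields)

# Assemble the text of the POST request the form describes.
request_text = (
    "POST /messages HTTP/1.1\r\n"
    "Host: www.youtypeitwepostit.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(body)}\r\n"
    "\r\n"
    f"{body}"
)

print(body)
```

The server never sees the form itself, only the representation the client built from it.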
As with the <a> tag, the server's guiding you, but its hand is pretty light. If you don't want to fill out this form, you can ignore it. If you do fill out the form, you can put whatever you want in the message field (although the server might reject certain values). The <form> tag is the server telling you that, of all the possible POST requests you might make, there's one type of request that's likely to result in something useful. That's a POST to /messages, which includes a form-encoded entity-body that includes a value for message.
Here's one more <form> tag:
<form method="GET" action="http://www.youtypeitwepostit.com/messages/">
  <input type="text" id="query" name="query"/>
  <input type="submit" value="Search"/>
</form>
This form also has a text box you're supposed to fill out, but the form is telling you to make a GET request, and GET requests don't include an entity-body. Instead, the data you type into that text box gets incorporated into the request URL, again according to rules laid out in the HTML specification.
If you fill out this form, the HTTP request your browser makes will look something like this:
GET /messages/?query=rest HTTP/1.1
Host: www.youtypeitwepostit.com
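Here's a rough Python sketch of how a client might turn a GET form into a URL. The function name get_form_url is made up for this example, and "rest" is assumed sample input. Replacing the action URL's query string with the encoded form data mirrors what the HTML rules call for.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

def get_form_url(action: str, fields: dict) -> str:
    """Build the URL that submitting a GET form would request."""
    scheme, netloc, path, _old_query, fragment = urlsplit(action)
    # The encoded form data replaces whatever query string the action URL had.
    return urlunsplit((scheme, netloc, path, urlencode(fields), fragment))

url = get_form_url("http://www.youtypeitwepostit.com/messages/", {"query": "rest"})
print(url)
```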
To sum up, the familiar HTML controls allow the server to describe four kinds of HTTP requests.
- The <a> tag describes a GET request for one specific URL, which is made only if the user triggers the control.
- The <img> tag describes a GET request for one specific URL, which happens automatically, in the background.
- The <form> tag with method="POST" describes a POST request to one specific URL, with a custom entity-body constructed by the client. The request is only made if the user triggers the control.
- The <form> tag with method="GET" describes a GET request to a custom URL constructed by the client. The request is only made if the user triggers the control.
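To make the four cases concrete, here's a minimal Python sketch that scans an HTML document and records which request each control describes. The class name ControlFinder and the "user"/"automatic" labels are inventions for this example, and it ignores everything a real browser would also handle (relative URLs, other tags, form fields).

```python
from html.parser import HTMLParser

class ControlFinder(HTMLParser):
    """Collects (method, target, trigger) for <a>, <img>, and <form> tags."""

    def __init__(self):
        super().__init__()
        self.controls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.controls.append(("GET", attrs["href"], "user"))
        elif tag == "img" and "src" in attrs:
            self.controls.append(("GET", attrs["src"], "automatic"))
        elif tag == "form":
            # HTML forms default to GET when no method is given.
            method = (attrs.get("method") or "get").upper()
            self.controls.append((method, attrs.get("action", ""), "user"))

page = """
<a href="http://www.youtypeitwepostit.com/messages/">See the latest messages</a>
<img src="http://www.example.com/logo.png" />
<form action="http://www.youtypeitwepostit.com/messages" method="post">
  <input type="text" name="message" />
</form>
<form method="GET" action="http://www.youtypeitwepostit.com/messages/">
  <input type="text" name="query" />
</form>
"""

finder = ControlFinder()
finder.feed(page)
for control in finder.controls:
    print(control)
```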
HTML also defines some more exotic hypermedia controls, and other data formats may define controls that are stranger still. All of them fall under the formal definition of hypermedia given in the Fielding dissertation:
Hypermedia is defined by the presence of application control information embedded within, or as a layer above, the presentation of information.
The World Wide Web is full of HTML documents, and the documents are full of things people like to read: prices, statistics, personal messages, prose, and poetry. But all of those things fall under presentation of information. In terms of presentation of information, the Web isn't much different from a printed book.
It's the application control information that distinguishes an HTML document from a book. I'm talking about the hypermedia controls that people interact with all the time, but rarely examine closely: the <img> tags that tell the browser to embed certain images, the <a> tags that transport the end user to another part of the Web, and the <script> tags that supply JavaScript for the browser to execute.
An HTML document that contains a poem will probably also feature a link to "Other poems by this author," or a form that lets the reader "Rate this poem." This is application control information that couldn't show up in a printed book of poetry. The presence of application control information can certainly reduce the emotional impact of a poem, but an HTML document containing only the text of a poem is not a full participant in the Web. It's just simulating a printed book.
URI Templates
The custom URLs you can create using an HTML <form> tag are limited in form. http://www.youtypeitwepostit.com/messages/?query=rest doesn't look very nice. On a technical level, this doesn't matter. URLs don't have to look nice. URLs don't even need to make sense to human eyes. But we humans prefer nice-looking URLs, like http://www.youtypeitwepostit.com/search/rest.
HTML's hypermedia controls have no way of telling a browser how to construct a URL like http://www.youtypeitwepostit.com/search/rest. But URI Templates, a different hypermedia technology, can do this. URI Templates are defined in RFC 6570, and they look like this:
http://www.youtypeitwepostit.com/search/{search}
That's not a valid URL, because it contains curly brackets. Those brackets identify the string as a URI Template. RFC 6570 tells you how to turn that string into an infinite number of URLs. It says you can replace {search} with any string you want, so long as that string would be valid in a URL.
This HTML form:
<form method="GET" action="http://www.youtypeitwepostit.com/messages/">
  <input type="text" id="query" name="query"/>
  <input type="submit" value="Search"/>
</form>
is exactly equivalent to this URI Template:
http://www.youtypeitwepostit.com/messages/?query={query}
That's a very common case, so the URI Templates standard defines a shortcut for URLs that include a query string. This URI Template is exactly equivalent to the previous one, and it's also equivalent to the previous HTML form:
http://www.youtypeitwepostit.com/messages/{?query}
The URI Templates standard is full of examples, but here are a few more sample templates, along with just a few of the URLs you can get from them:
If parameter values are set to:
var := "title"
hello := "Hello World!"
path := "/foo/bar"

Then these URI templates:
http://www.example.org/greeting?g={+hello}
http://www.example.org{+path}/status
http://www.example.org/document#{+var}

Expand to these URLs:
http://www.example.org/greeting?g=Hello%20World!
http://www.example.org/foo/bar/status
http://www.example.org/document#title
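The expansion rules can be sketched in Python. This is a deliberately small subset of RFC 6570, assuming string values only: it handles simple expansion ({var}), reserved expansion ({+var}), and form-style query expansion ({?var}). It does not handle lists, prefixes, the explode modifier, or values that already contain percent-encoded triplets. The function name expand is made up for this example; a real client would use a full RFC 6570 library.

```python
import re
from urllib.parse import quote

# quote() always leaves letters, digits, and "_.-~" alone (the RFC 3986 unreserved set).
UNRESERVED = "-._~"
RESERVED = ":/?#[]@!$&'()*+,;="

def expand(template: str, values: dict) -> str:
    """Expand a URI Template using a small subset of RFC 6570."""
    def replace(match):
        expression = match.group(1)
        operator = expression[0] if expression[0] in "+?" else ""
        names = expression[len(operator):].split(",")
        if operator == "?":
            # Form-style query expansion: ?name=value&name=value
            pairs = [f"{n}={quote(str(values[n]), safe=UNRESERVED)}"
                     for n in names if n in values]
            return "?" + "&".join(pairs) if pairs else ""
        # Reserved expansion ("+") lets reserved characters through unencoded.
        safe = UNRESERVED + RESERVED if operator == "+" else UNRESERVED
        return ",".join(quote(str(values[n]), safe=safe) for n in names if n in values)
    return re.sub(r"\{([^{}]+)\}", replace, template)

params = {"var": "title", "hello": "Hello World!", "path": "/foo/bar"}
print(expand("http://www.example.org/greeting?g={+hello}", params))
print(expand("http://www.example.org{+path}/status", params))
print(expand("http://www.youtypeitwepostit.com/messages/{?query}", {"query": "rest"}))
```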
Although a URI Template is shorter and more flexible than an HTML GET form, the two technologies aren't much different. URI Templates and HTML forms allow a web server to describe an infinite number of URLs with a short string. The HTTP client can plug in some values, choose one URL from that infinite family, and make a GET request to that specific URL.
URI Templates don't make sense on their own. A URI Template needs to be embedded in a hypermedia format. The idea is that every standard that needs this functionality should just use URI Templates, instead of defining a custom format, which is what was happening before RFC 6570 was published.
URI Versus URL
I've put this off for as long as I can, but now I need to explain the difference between URL (the term I use almost everywhere in this book), and URI (the more general term used in the names of technologies such as URI Templates). Most web APIs deal exclusively with URLs, so for most of this book, the distinction doesn't matter. But when it's important (as it will be in Chapter 12), it's really important.
A URL is a short string used to identify a resource. A URI is also a short string used to identify a resource. Every URL is a URI. They're described in the same standard: RFC 3986.
What's the difference? As far as this book is concerned, the difference is this: there's no guarantee that a URI has a representation. A URI is nothing but an identifier. A URL is an identifier that can be dereferenced. That is, a computer can somehow take a URL and get a representation of the underlying resource.
If you see an http: URI, you know how a computer can get a representation: by making an HTTP GET request. If you see an ftp: URI, you know how a computer can get a representation: by starting up an FTP client and executing certain FTP commands. These URIs are URLs. They have protocols associated with them: rules for obtaining representations of these resources (very detailed rules that a computer can follow).
Here's a URI that's not a URL: urn:isbn:9781449358063. It designates a resource: the print edition of this book. Not any particular copy of this book, but the abstract concept of an entire edition. (Remember that a resource can be anything at all.) This URI is not a URL because... what's the protocol? How would a computer get a representation? You can't do it.
Without a URL, you can't get a representation. Without representations, there can be no representational state transfer. A resource that's not identified by a URL cannot fulfill many of the Fielding constraints. It can't fulfill the self-descriptive message constraint, because it can't send any messages. A representation can link to a URI that's not a URL (<a href="urn:isbn:9781449358063">), but that won't fulfill the hypermedia constraint, because a client can't follow the link.
Here's a URL that identifies the print edition of this book: http://shop.oreilly.com/product/0636920028468.do. You can send a GET request to this URL and get a representation of the edition. Not a physical copy of the book, but an HTML document that conveys some of its resource state: the title, the number of pages, and so on. The HTML document also contains hypermedia, like links to the book's authors (not the people themselves, but some information about them). A resource identified by a URL can fulfill all the Fielding constraints.
There are some good reasons to use URIs that aren't URLs, and I'll cover them when I discuss the resource description strategy in Chapter 12. But it's a pretty rare situation. In general, when your web API refers to a resource, it should use a URL with the http or https scheme, and that URL should work: it should serve a useful representation in response to a GET request.
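From a client's point of view, the distinction comes down to whether it knows a protocol for the URI's scheme. Here's a small Python sketch of that check. The set of schemes and the function name is_url are assumptions for this example: they describe one hypothetical client, not a rule from any standard.

```python
from urllib.parse import urlsplit

# Assumption: the schemes this hypothetical client knows how to dereference.
DEREFERENCEABLE_SCHEMES = {"http", "https", "ftp"}

def is_url(uri: str) -> bool:
    """True if this client has a protocol for getting a representation."""
    return urlsplit(uri).scheme.lower() in DEREFERENCEABLE_SCHEMES

for uri in ("http://shop.oreilly.com/product/0636920028468.do",
            "urn:isbn:9781449358063"):
    print(uri, is_url(uri))
```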
The Link Header
Here's a technology that puts hypermedia where you might not expect it: inside the headers of an HTTP request or response. RFC 5988 defines an extension to HTTP, a header called Link. This header lets you add simple hypermedia controls to entity-bodies that don't normally support hypermedia at all, like JSON objects and binary image files.
Here's a plain-text representation of a story that's been split into multiple parts with cliffhangers (the entity-body of this HTTP response contains the first part of the story, and the Link header points to the second part):
HTTP/1.1 200 OK
Content-Type: text/plain
Link: <http://www.example.com/story/part2>;rel="next"

It was a dark and stormy night. Suddenly, a... (continued in part 2)
The Link header has approximately the same functionality as an HTML <a> tag. I recommend you use real hypermedia formats whenever possible, but when that's not an option, the Link header can be very useful.
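A client has to pull the target URL and the link relation out of the header value before it can follow the link. Here's a minimal Python sketch of that step. It's a simplified parser written for this example (the name parse_link_header is made up), and it covers only the common shape of the header, not every corner of the RFC's grammar.

```python
import re

# A link value: <target> followed by zero or more ;name=value parameters.
LINK_PATTERN = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^\s;,]+))*)'
)
PARAM_PATTERN = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')

def parse_link_header(value: str) -> list:
    """Return a list of (target_url, params) pairs from a Link header value."""
    links = []
    for target, raw_params in LINK_PATTERN.findall(value):
        params = {name.lower(): quoted or bare
                  for name, quoted, bare in PARAM_PATTERN.findall(raw_params)}
        links.append((target, params))
    return links

links = parse_link_header('<http://www.example.com/story/part2>;rel="next"')
print(links)
```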
The LINK and UNLINK extension methods use the Link header. This example from Chapter 3 (which assigns an author to the story) should make a little more sense now:
LINK /story HTTP/1.1
Host: www.example.com
Link: <http://www.example.com/~drmilk>;rel="author"
What Hypermedia Is For
I'll be covering a lot of hypermedia data formats in this book, but at this point telling you about one technology after another won't help very much. We need to take a step back and see what hypermedia is for.
Hypermedia controls have three jobs:
- They tell the client how to construct an HTTP request: what HTTP method to use, what URL to use, what HTTP headers and/or entity-body to send.
- They make promises about the HTTP response, suggesting the status code, the HTTP headers, and/or the data the server is likely to send in response to a request.
- They suggest how the client should integrate the response into its workflow.
HTML GET forms and URI Templates feel similar because they do the same job. They both tell the client how to construct a URL for use in an HTTP GET request.
Guiding the Request
An HTTP request has four parts: the method, the target URL, the HTTP headers, and the entity-body. Hypermedia controls can guide the client into specifying all four of these.
This HTML <a> tag specifies both the target URL and the HTTP method to use:
<a href="http://www.example.com/">An outbound link</a>
The target URL is defined explicitly, in the href attribute. The HTTP method is defined implicitly: the HTML spec says that an <a> tag becomes a GET request when the end user clicks the link.
This HTML form defines the method, the target URL, and the entity-body of a potential future HTTP request:
<form action="/stores" method="post">
  <input type="text" name="storeName" value="" />
  <input type="text" name="nearbyCity" value="" />
  <input type="submit" value="Add" />
</form>
Both the HTTP method and the target URL are defined explicitly. The entity-body is defined in terms of a set of questions for the client. The client needs to figure out what values it wants to provide for the variables storeName and nearbyCity. Then it can construct a form-encoded entity-body that the server will accept. (Who says it needs to be form-encoded? That's defined implicitly, by HTML's rules for processing a <form> tag.)
This URI Template specifies the target URL of an HTTP request, and nothing else:
http://www.youtypeitwepostit.com/messages/{?search}
The target URL is defined in terms of a variable that needs to be filled in, just like the entity-body of an HTML form would be. The client uses an algorithm to turn the URI Template and its desired value for the search variable into a real URL: say, for example, http://www.youtypeitwepostit.com/messages/?search=rest.
A URI Template defines nothing about the HTTP request except for the target URI. It's not telling you to make a GET request, a POST request, or any kind of request in particular. That's why I said URI Templates don't make sense on their own, why they need to be combined with another hypermedia technology.
Here's an HTML form that tells the client to set a specific value for the HTTP header Content-Type:
<form method="POST" enctype="text/plain"> ... </form>
Ordinarily, the entity-body of an HTML POST form is form-encoded, and sent over the network with the Content-Type header set to application/x-www-form-urlencoded. But specifying the enctype attribute of the <form> tag overrides this behavior. A form with enctype="text/plain" tells the browser to encode its entity-body in a plain text format, and to send it over the network with the Content-Type header set to text/plain.
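To see what changes, here's a short Python sketch that encodes the same fields both ways. The function name encode_text_plain and the sample field values are assumptions for this example. The plain text format shown (one name=value pair per line, each ending in CRLF) follows my reading of the HTML specification's text/plain encoding.

```python
from urllib.parse import urlencode

def encode_text_plain(fields) -> str:
    """Encode form fields as text/plain: one name=value line per field."""
    return "".join(f"{name}={value}\r\n" for name, value in fields)

# Sample field values, assumed for illustration.
fields = [("message", "Hello!"), ("submit", "Post")]

form_encoded = urlencode(fields)          # Content-Type: application/x-www-form-urlencoded
plain_text = encode_text_plain(fields)    # Content-Type: text/plain

print(form_encoded)
print(plain_text)
```

Nothing is escaped in the plain text version, which is why it's rarely a good choice for data a program needs to parse.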
This isn't a great example, because the enctype attribute only changes the Content-Type header as a side effect of changing the entity-body. But it is the best example I can come up with using a popular hypermedia format like HTML.
Hypermedia controls generally leave an HTTP client free to send whatever headers it wants. But this laissez-faire attitude is only a convention. A hypermedia control can describe an HTTP request in great detail. It can instruct the client to send an HTTP request to a specific URL, using a specific HTTP method, providing an entity-body constructed according to specific rules, and providing specific values for specific HTTP headers.
Promises About the Response
Here's an HTML <img> tag:
<img src="http://www.example.com/logo.png" />
Like an <a> tag, an <img> tag is a promise that the client can make a GET request to a particular URL. But the <img> tag makes another promise: that the server will send some kind of image representation in response to GET.
Here's another example: a simple XML hypermedia control from the Atom Publishing Protocol (which I'll discuss in more detail in Chapter 6):
<link rel="edit" href="http://example.org/posts/1"/>
This looks simple enough; in fact, this <link> tag could legally show up in an HTML document. But interpreted according to the AtomPub standard, that rel="edit" gives you a lot of information about the resource at http://example.org/posts/1.
First, rel="edit" says that the resource at http://example.org/posts/1 supports PUT and DELETE as well as GET. You can GET a representation of this resource, modify the representation, and PUT it back to change the resource's state. That's a perfectly standard use of HTTP, and perhaps not something that needs to be stated explicitly. But given that most HTTP resources don't respond to PUT or DELETE, it's worth spelling out.
More important, rel="edit" means you needn't speculate about what kind of representation you'll get if you send a GET request to http://example.org/posts/1. You'll get back the kind of document AtomPub calls a Member Entry. (The details aren't important right now; skip to Chapter 6 if you want to learn more about AtomPub.)
The server is making a promise to the client: if you make that GET request, you'll receive an AtomPub Member Entry representation in return. The client doesn't have to make a blind GET and see what the Content-Type says. It knows the representation will be of type application/atom+xml, and it also knows something about the representation's application semantics.
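Here's a rough Python sketch of a client acting on that promise. The surrounding <entry> document and its title are invented for this example, and the REL_PROMISES table is just one way a client might write down what it believes rel="edit" means, based on the description above.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# A made-up Atom entry wrapped around the link from the text.
entry_xml = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Example post</title>
  <link rel="edit" href="http://example.org/posts/1"/>
</entry>"""

# What this client assumes each link relation promises about the target.
REL_PROMISES = {
    "edit": {"methods": ("GET", "PUT", "DELETE"),
             "media_type": "application/atom+xml"},
}

found = []
for link in ET.fromstring(entry_xml).iter(ATOM + "link"):
    promise = REL_PROMISES.get(link.get("rel"))
    if promise is not None:
        found.append((link.get("href"), promise["methods"], promise["media_type"]))

print(found)
```

The client learns all of this before making a single request to http://example.org/posts/1.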
Workflow Control
The third job of hypermedia is to describe the relationships between resources. This is best explained by an example. Here's an HTML <a> tag:
<a href="http://www.example.com/">An outbound link</a>
If you click this link in your web browser, the browser will move to the web page mentioned in the link's href attribute. The old page will become completely irrelevant, except as an item in your browser history. The <a> tag is an outbound link: a hypermedia control that, when activated, replaces the client's application state with a brand new state.
Compare this to the <img> tag in HTML:
<img src="http://www.example.com/logo.png" />
This is a link, but it's not an outbound link; it's an embedded link. Embedded links don't replace the client's application state. They augment it. If you visit a web page whose HTML includes this <img> tag, the image is automatically loaded in a separate HTTP request (without you having to click anything), and displayed in the same window as the web page itself. You're still on the same page, but now you have more information.
An HTML document can embed more than images. Here's some HTML markup that downloads and runs some executable code written in JavaScript:
<script type="application/javascript" src="/my_javascript_application.js"></script>
Here's some markup that downloads a CSS stylesheet and applies it to the main document:
<link rel="stylesheet" type="text/css" href="/my_stylesheet.css"/>
Here's some markup that embeds another full HTML document inside this one:
<iframe src="/another-document.html"></iframe>
All of these are embedded links. The process of embedding one document in another is also called transclusion.
Of course, a client is free to ignore the server's guidance. There are browser extensions that prevent the browser from transcluding the files referenced by <script> tags, and options to override the formatting instructions specified by stylesheets for greater readability. The point of these tags, as with the <form> tag, is to give the client hints as to which HTTP requests are likely to get the client what it wants. The client is always free not to make a request.
Beware of Fake Hypermedia!
There are a lot of existing APIs that were designed by people who understood the benefits of hypermedia, but that don't technically contain any hypermedia. Imagine a bookstore API that serves a JSON representation like this:
HTTP/1.1 200 OK
Content-Type: application/json

{
  "title": "Example: A Novel",
  "description": "http://www.example.com/"
}
This is a representation of a book. The description field happens to look like a URL: http://www.example.com/. But is this a link? Is description supposed to link to a resource that gives the description? Or is it supposed to be a textual description, and some smart aleck typed in some text that happens to be a valid URL?
Formally speaking, "http://www.example.com/" is a string. The application/json media type doesn't define any hypermedia controls, so even if some part of a representation really looks like a hypermedia link, it's not! It's just a string!
If you're trying to consume an API like this, you won't get very far dogmatically denying the existence of links. Instead, you'll read some human-readable documentation written by the API provider. That documentation will explain the conventions the provider used to embed hypermedia links in a format (JSON) that doesn't support hypermedia. Then you'll know how to distinguish between links and strings, and you'll be able to write a client that can detect and follow the hypermedia links.
But your client will only work for that specific API. The documentation you read is the documentation for a one-off fiat standard. The next API you use will have a different set of conventions for embedding hypermedia links in JSON, and you'll have to do the work all over again.
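Here's what that looks like from the client's side, as a short Python sketch. The LINK_FIELDS set stands in for a convention you'd learn from one provider's documentation. It's entirely hypothetical, which is the point: nothing in the JSON itself says which strings are links.

```python
import json

body = '{"title": "Example: A Novel", "description": "http://www.example.com/"}'
document = json.loads(body)

# The JSON parser hands back ordinary strings. Nothing marks one as a link.
print(type(document["description"]).__name__)

# Hypothetical one-off convention, learned from this API's documentation:
# "the description field contains a URL."
LINK_FIELDS = {"description"}

links = {name: value for name, value in document.items() if name in LINK_FIELDS}
print(links)
```

Point this client at a different API and LINK_FIELDS is wrong.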
That's why API designers shouldn't design APIs that serve plain JSON. You should use a media type that has real support for hypermedia. Your users will thank you. They'll be able to use preexisting libraries written against the media type, rather than writing new ones specifically for your API.
JSON has been the most popular representation format for APIs for quite a while, but as recently as a couple years ago, there were no JSON-based hypermedia formats. As you'll see in the next few chapters, that has changed. Don't worry that you'll have to give up JSON to gain real hypermedia.
The Semantic Challenge: How Are We Doing?
At the end of Chapter 1, I set out a challenge: "How can we program a computer to decide which links to click?" A web browser works by passing the representations it gets to a human, who makes all the decisions. How can we get similar behavior without consulting a human at each step?
Providing the links is a step in the right direction. Out of the infinite set of legal HTTP requests, a hypermedia document explains which requests might be useful right now, on this particular site. The client doesn't have to guess.
But that's not enough. Suppose an HTML document contains only two links, A and B. Two possible requests the client might make. How does the client choose? On what basis can it make its decision?
Well, suppose one of those links is represented by an HTML <img> tag, and the other is represented by a <script> tag. As far as HTTP is concerned, there's no difference between these two links. They have the same protocol semantics. They both trigger a GET request to a predetermined URL. But the two links have different application semantics. The representation at the other end of an <img> tag is supposed to be displayed as an image, and the representation at the other end of a <script> tag is supposed to be executed as client-side code.
For some clients, that's enough information to make a decision. A client designed to scrape all the images from a web page will follow the link in the <img> tag and ignore the link in the <script> tag.
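Such a client is easy to sketch in Python. The class name ImageLinkCollector and the sample page are made up for this example, and the sketch only collects the URLs it would fetch rather than making any requests.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ImageLinkCollector(HTMLParser):
    """Collects the targets of <img> links and ignores every other control."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative links against the URL of the page.
                self.image_urls.append(urljoin(self.base_url, src))
        # <script> and all other tags are deliberately skipped.

page = '<img src="/logo.png" /><script src="/my_javascript_application.js"></script>'
collector = ImageLinkCollector("http://www.example.com/")
collector.feed(page)
print(collector.image_urls)
```

The decision is made entirely on the application semantics of the tag. Both links would trigger the same kind of GET request.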
This shows that hypermedia controls can bridge the semantic gap. They can tell the client why it might want to make a certain HTTP request.
But for most clients, the distinction between <img> and <script> isn't enough information to make a decision. "Image" and "script" are very generic bits of application semantics. The application described by HTML is the World Wide Web, a very flexible application that's used for all sorts of things.
When I think about application semantics, I usually think on a higher level than that. I think about the concepts that separate a wiki from an online store. They're both websites, they both use embedded images and scripts, but they mean very different things.
A hypermedia format doesn't have to be generic like HTML. It can be defined in enough detail to convey the application semantics of a wiki or a store. In the next chapter, I'll talk about hypermedia formats that are designed to represent one specific type of problem. Outside that problem space, they're practically useless. But within their limits, they meet the semantic challenge very well.
[9] There are two HTML specifications you should know about: the HTML 4 spec and the HTML 5 spec. Both are open standards produced by the W3C. HTML 4 has been stable for over 10 years; HTML 5 is a work in progress.
[10] That's in section 12.1.1 of the HTML 4 specification.