Chapter 4. Metadata Design

HTTP Headers

Various forms of metadata may be conveyed through the entity headers contained within HTTP’s request and response messages. HTTP defines a set of standard headers, some of which provide information about a requested resource. Other headers indicate something about the representation carried by the message. Finally, a few headers serve as directives to control intermediary caches.

This brief chapter suggests a set of rules to help REST API designers work with HTTP’s standard headers.

Rule: Content-Type must be used

The Content-Type header names the type of data found within a request or response message’s body. The value of this header is a specially formatted text string known as a media type, which is the subject of the section Media Types. Clients and servers rely on this header’s value to tell them how to process the sequence of bytes in a message’s body.

Rule: Content-Length should be used

The Content-Length header gives the size of the entity-body in bytes. In responses, this header is important for two reasons. First, a client can know whether it has read the correct number of bytes from the connection. Second, a client can make a HEAD request to find out how large the entity-body is, without downloading it.
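As a minimal sketch (the helper name `body_headers` is hypothetical), a server can derive both headers from the encoded body. Content-Length counts bytes after encoding, not characters, so it must be computed from the encoded bytes:

```python
from typing import Dict, Tuple


def body_headers(text: str, media_type: str, charset: str = "utf-8") -> Tuple[bytes, Dict[str, str]]:
    """Encode a textual body and return it with matching Content-Type/Content-Length."""
    body = text.encode(charset)
    headers = {
        # The media type tells the receiver how to interpret the bytes.
        "Content-Type": f"{media_type}; charset={charset}",
        # Content-Length is the size of the *encoded* body in bytes.
        "Content-Length": str(len(body)),
    }
    return body, headers
```

For example, `body_headers("Pelé", "text/plain")` reports a length of 5, because "é" takes two bytes in UTF-8; using `len(text)` would undercount any non-ASCII body.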

Rule: Last-Modified should be used in responses

The Last-Modified header applies to response messages only. The value of this response header is a timestamp that indicates the last time that something happened to alter the representational state of the resource. Clients and cache intermediaries may rely on this header to determine the freshness of their local copies of a resource’s state representation. This header should always be supplied in response to GET requests.
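Clients and caches typically send a stored Last-Modified value back in an If-Modified-Since request header. The sketch below, with hypothetical function names and assuming timezone-aware timestamps, formats the header and evaluates that condition. HTTP-dates carry only one-second resolution, so the server's timestamp is truncated before comparing:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def last_modified_header(modified: datetime) -> str:
    """Format a timezone-aware timestamp as an HTTP-date for Last-Modified."""
    # HTTP-dates are expressed in GMT with one-second resolution.
    utc = modified.astimezone(timezone.utc).replace(microsecond=0)
    return format_datetime(utc, usegmt=True)


def is_modified_since(modified: datetime, if_modified_since: Optional[str]) -> bool:
    """Return False when the client's copy (stamped If-Modified-Since) is still current."""
    if not if_modified_since:
        return True
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return True  # Ignore an unparseable date, as if the header were absent.
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # e.g. a "-0000" zone
    # Compare at one-second resolution, since that is all an HTTP-date carries.
    return modified.astimezone(timezone.utc).replace(microsecond=0) > since
```

When `is_modified_since` returns False for a GET, the API can answer 304 (“Not Modified”) instead of resending the representation.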

Rule: ETag should be used in responses

The value of ETag is an opaque string that identifies a specific “version” of the representational state contained in the response’s entity. The entity is the HTTP message’s payload, which is composed of a message’s headers and body. The entity tag may be any string value, so long as it changes along with the resource’s representation. This header should always be sent in response to GET requests.

Clients may choose to save an ETag header’s value for use in future GET requests as the value of the conditional If-None-Match request header. If the REST API concludes that the entity tag hasn’t changed, it can save time and bandwidth by responding with 304 (“Not Modified”) instead of sending the representation again.
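A minimal server-side sketch of that conditional GET follows. It assumes the current entity tag has already been computed, and the function names are hypothetical. If-None-Match may list several tags or be `*`, and it uses weak comparison, so a `W/` prefix is ignored when comparing:

```python
import re
from typing import Dict, Optional, Tuple

# Matches one entity tag, optionally weak (W/"..."), inside an If-None-Match list.
_ETAG = re.compile(r'(?:W/)?"[^"]*"')


def _opaque(tag: str) -> str:
    # Weak comparison ignores the W/ prefix and compares the quoted values.
    return tag[2:] if tag.startswith("W/") else tag


def conditional_get(current_etag: str, body: bytes,
                    if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Answer a GET, returning 304 when the client's cached entity tag still matches."""
    headers = {"ETag": current_etag}
    if if_none_match:
        value = if_none_match.strip()
        cached = _ETAG.findall(value)
        if value == "*" or any(_opaque(t) == _opaque(current_etag) for t in cached):
            # Not Modified: the client's copy is current, so no body is sent.
            return 304, headers, b""
    headers["Content-Length"] = str(len(body))
    return 200, headers, body
```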

Warning

Generating an ETag from a machine-specific value is a bad idea. Specifically, don’t generate ETag values from an inconsistent source, like a host-specific notion of a file’s last modified time. Doing so may result in different ETag values being attributed to the same representation, which is likely to confuse the API’s clients and intermediaries.
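One way to follow this advice, assuming the server can produce the full representation bytes before sending them, is to hash the bytes themselves. SHA-256 and the 32-character truncation below are illustrative choices, and `strong_etag` is a hypothetical name:

```python
import hashlib


def strong_etag(representation: bytes) -> str:
    """Derive an ETag from the representation's bytes alone.

    Every server that produces the same bytes computes the same tag,
    unlike a tag built from a host's file-modification time.
    """
    digest = hashlib.sha256(representation).hexdigest()
    # Entity tags are quoted strings; a truncated digest keeps headers short.
    return f'"{digest[:32]}"'
```

The trade-off is that the server must render the representation before it can compare tags; a version counter stored alongside the resource is another host-independent option.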

Rule: Stores must support conditional PUT requests

A store resource uses the PUT method for both insert and update, which means it is difficult for a REST API to know the true intent of a client’s PUT request. Through headers, HTTP provides the necessary support to help an API resolve any potential ambiguity. A REST API must rely on the client to include the If-Unmodified-Since and/or If-Match request headers to express its intent. The If-Unmodified-Since request header asks the API to proceed with the operation if, and only if, the resource’s state representation hasn’t changed since the time indicated by the header’s timestamp value. The If-Match header’s value is an entity tag, which the client remembers from an earlier response’s ETag header value. The If-Match header makes the request conditional on an exact match between the header’s supplied entity tag value and the representational state’s current entity tag value, as stored or computed by the REST API.

The following example illustrates how a REST API can support conditional PUT requests using these two headers.

Two client programs, client#1 and client#2, use a REST API’s /objects store resource to share some information between them. Client#1 sends a PUT request in order to store some new data that it identifies with a URI path of /objects/2113. This is a new URI that the REST API has never seen before, meaning that it does not map to any previously stored resource. Therefore, the REST API interprets the request as an insert and creates a new resource based on the client’s provided state representation and then it returns a 201 (“Created”) response.

Some time later, client#2 decides to share some data, and it requests the exact same storage URI (/objects/2113). Now the REST API is able to map this URI to an existing resource, so the intent of the client’s request is unclear. The REST API has not been given enough information to decide whether it should overwrite client#1’s stored resource state with the new data from client#2. In this scenario, the API is forced to return a 409 (“Conflict”) response to client#2’s request. The API should also provide some additional information about the error in the response’s body.

If client#2 decides to update the stored data, it may retry its request with an If-Match header. However, if the supplied header value does not match the current entity tag value, the REST API must return error code 412 (“Precondition Failed”). If the supplied condition does match, the REST API must update the stored resource’s state and return a 200 (“OK”) or 204 (“No Content”) response. If the response includes an updated representation of the resource’s state, the API must include values for the Last-Modified and ETag headers that reflect the update.
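The walkthrough above can be condensed into a sketch. It assumes an in-memory dictionary in place of real storage, handles only a single entity tag (or `*`) in If-Match, and omits If-Unmodified-Since; the `Store` class and `_etag` helper are hypothetical:

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


def _etag(data: bytes) -> str:
    # Content-derived tag, so every server computes the same value.
    return '"' + hashlib.sha256(data).hexdigest()[:16] + '"'


@dataclass
class Store:
    """In-memory stand-in for a store resource such as /objects (hypothetical)."""
    items: Dict[str, bytes] = field(default_factory=dict)

    def put(self, path: str, data: bytes,
            if_match: Optional[str] = None) -> Tuple[int, Dict[str, str]]:
        current = self.items.get(path)
        if current is None:
            if if_match is not None:
                # If-Match presupposes an existing representation.
                return 412, {}
            self.items[path] = data  # Unknown URI: treat the PUT as an insert.
            return 201, {"Location": path, "ETag": _etag(data)}
        if if_match is None:
            # Existing resource but no precondition: the intent is ambiguous.
            return 409, {}
        tag = if_match.strip()
        if tag != "*" and tag != _etag(current):
            return 412, {}  # The client's copy is stale.
        self.items[path] = data  # Precondition holds: treat the PUT as an update.
        return 204, {"ETag": _etag(data)}
```

Replaying the scenario: client#1's first PUT gets 201, client#2's unconditional PUT gets 409, and a retry carrying client#2's remembered ETag succeeds only if nobody changed the resource in between.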

Note

HTTP supports conditional requests with the GET, POST, and DELETE methods in the same fashion that is illustrated by the example above. This pattern is the key that allows writable REST APIs to support collaboration between their clients.

Rule: Location must be used to specify the URI of a newly created resource

The Location response header’s value is a URI that identifies a resource that may be of interest to the client. In response to the successful creation of a resource within a collection or store, a REST API must include the Location header to designate the URI of the newly created resource.
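A small sketch of building that header value for a resource created inside a collection; the host `api.example.org` and the name `location_for` are hypothetical. Note that `urljoin` replaces the last path segment unless the base URI ends with a slash, so the sketch normalizes it first:

```python
from urllib.parse import quote, urljoin


def location_for(collection_uri: str, new_id: str) -> str:
    """Build the Location value for a resource created inside a collection or store."""
    # Without a trailing "/", urljoin would replace the collection's own segment.
    base = collection_uri if collection_uri.endswith("/") else collection_uri + "/"
    # Percent-encode the identifier so it stays a single path segment.
    return urljoin(base, quote(new_id, safe=""))
```

For example, `location_for("http://api.example.org/objects", "2113")` yields `http://api.example.org/objects/2113`.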

In a 202 (“Accepted”) response, this header may be used to direct clients to the operational status of an asynchronous controller resource.

Rule: Cache-Control, Expires, and Date response headers should be used to encourage caching

Caching is one of the most useful features built on top of HTTP. You can take advantage of caching to reduce client-perceived latency, to increase reliability, and to reduce the load on an API’s servers. Caches can be anywhere. They can be in the API’s server network, content delivery networks (CDNs), or the client’s network.

When serving a representation, include a Cache-Control header with a max-age value (in seconds) equal to the freshness lifetime. For example:

Cache-Control: max-age=60, must-revalidate

To support legacy HTTP/1.0 caches, a REST API should include an Expires header with the expiration date-time. Its value is the time at which the API generated the representation plus the freshness lifetime. REST APIs should also include a Date header with the date-time at which the API returned the response. Including this header helps clients compute the freshness lifetime as the difference between the values of the Expires and Date headers. For example:

Date: Tue, 15 Nov 1994 08:12:31 GMT
Expires: Thu, 01 Dec 1994 16:00:00 GMT
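A minimal sketch that emits all three headers together, assuming the representation is generated at response time (so Date and the Expires base coincide); `expiration_headers` is a hypothetical name:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Dict, Optional


def expiration_headers(max_age: int, now: Optional[datetime] = None) -> Dict[str, str]:
    """Headers that let HTTP/1.1 and HTTP/1.0 caches keep a response for max_age seconds."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc).replace(microsecond=0)
    return {
        # HTTP/1.1 caches use the max-age directive (in seconds).
        "Cache-Control": f"max-age={max_age}",
        # Date: when the response was generated.
        "Date": format_datetime(now, usegmt=True),
        # HTTP/1.0 caches only understand Expires: generation time + lifetime.
        "Expires": format_datetime(now + timedelta(seconds=max_age), usegmt=True),
    }
```

With `max_age=60` and a Date of `Tue, 15 Nov 1994 08:12:31 GMT`, the Expires value is `Tue, 15 Nov 1994 08:13:31 GMT`, and the difference between the two recovers the 60-second lifetime.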

Rule: Cache-Control, Expires, and Pragma response headers may be used to discourage caching

If a REST API’s response must not be cached, add a Cache-Control header with the no-cache and no-store directives. In this case, also add the Pragma: no-cache and Expires: 0 header values to interoperate with legacy HTTP/1.0 caches.

Rule: Caching should be encouraged

The no-cache directive prevents caches from serving a stored response without first revalidating it with the server. REST APIs should not use it unless absolutely necessary. Using a small max-age value, as opposed to adding the no-cache directive, lets clients use cached copies for at least a short while without significantly impacting freshness.
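The two policies above can be sketched as a single choice. The convention that a non-positive `max_age` means "must not be cached" is a hypothetical one for this example, as is the function name:

```python
from typing import Dict


def cache_policy_headers(max_age: int) -> Dict[str, str]:
    """Prefer a short freshness lifetime; fall back to disabling caching entirely.

    Hypothetical convention: max_age <= 0 marks a response that must never be stored.
    """
    if max_age > 0:
        # Even a few seconds of freshness lets caches absorb bursts of requests.
        return {"Cache-Control": f"max-age={max_age}"}
    return {
        "Cache-Control": "no-cache, no-store",
        # Legacy HTTP/1.0 caches ignore Cache-Control, so repeat the intent.
        "Pragma": "no-cache",
        "Expires": "0",
    }
```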

Rule: Expiration caching headers should be used with 200 (“OK”) responses

Set expiration caching headers in responses to successful GET and HEAD requests. Although POST is cacheable, most caches treat this method as non-cacheable. You need not set expiration headers on other methods.

Rule: Expiration caching headers may optionally be used with 3xx and 4xx responses

In addition to successful responses with the 200 (“OK”) response code, consider adding caching headers to 3xx and 4xx responses. Known as negative caching, this helps reduce the amount of redirecting and error-triggering load on a REST API.
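The last two rules can be combined into one policy function. This is a sketch only: the 60-second and 10-second lifetimes are hypothetical defaults, and the function name is illustrative:

```python
from typing import Optional


def freshness_lifetime(method: str, status: int,
                       ok_ttl: int = 60, negative_ttl: int = 10) -> Optional[int]:
    """Return a max-age for this response, or None to omit expiration headers."""
    if method.upper() not in ("GET", "HEAD"):
        return None  # Expiration headers are not needed for other methods.
    if status == 200:
        return ok_ttl
    if 300 <= status < 500:
        # Negative caching: a shorter lifetime for redirects and client errors.
        return negative_ttl
    return None  # Other statuses (e.g. 5xx): no expiration headers.
```

Keeping the negative lifetime short limits how long a cache keeps serving a 404 after the resource is later created.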

Rule: Custom HTTP headers must not be used to change the behavior of HTTP methods

You can optionally use custom headers for informational purposes only. Implement clients and servers such that they do not fail when they do not find expected custom headers.

If the information you are conveying through a custom HTTP header is important for the correct interpretation of the request or response, include that information in the body of the request or response or the URI used for the request. Avoid custom headers for such usages.

Media Types

To identify the form of the data contained within a request or response message body, the Content-Type header’s value references a media type.[25]

Media Type Syntax

Media types have the following syntax:

type "/" subtype *( ";" parameter )

The type value may be one of: application, audio, image, message, model, multipart, text, or video. A typical REST API will most often work with media types that fall under the application type. In a hierarchical fashion, the media type’s subtype value is subordinate to its type.

Parameters may follow the type/subtype in the form of attribute=value pairs, each preceded by a semicolon (;) character. A media type’s specification may designate parameters as either required or optional. Parameter names are case-insensitive. Parameter values are normally case-sensitive and may be enclosed in double quote (“ ”) characters. When more than one parameter is specified, their ordering is insignificant.

The two examples below demonstrate a Content-Type header value that references a media type with a single charset parameter:

Content-type: text/html; charset=ISO-8859-4
Content-type: text/plain; charset="us-ascii"
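A compact parser that applies these syntax rules might look like the following sketch. `MediaType` and `parse_media_type` are hypothetical names. It lower-cases the case-insensitive parts (type, subtype, parameter names), keeps parameter values as written, and unquotes quoted values:

```python
import re
from typing import Dict, NamedTuple


class MediaType(NamedTuple):
    type: str
    subtype: str
    params: Dict[str, str]


# One ";name=value" pair; the value is a token or a double-quoted string.
_PARAM = re.compile(r';\s*([^\s;=]+)\s*=\s*("(?:[^"\\]|\\.)*"|[^;\s]*)')


def parse_media_type(value: str) -> MediaType:
    """Split a Content-Type value into type, subtype and parameters."""
    head, sep, rest = value.partition(";")
    type_, slash, subtype = head.strip().partition("/")
    if not slash or not type_ or not subtype:
        raise ValueError(f"not a media type: {value!r}")
    params: Dict[str, str] = {}
    for match in _PARAM.finditer(sep + rest):
        name, raw = match.groups()
        if raw.startswith('"'):
            # Drop the surrounding quotes and undo backslash escapes.
            raw = re.sub(r'\\(.)', r'\1', raw[1:-1])
        # Parameter names are case-insensitive; values keep their case.
        params[name.lower()] = raw
    # Type and subtype are case-insensitive, so normalize them as well.
    return MediaType(type_.lower(), subtype.lower(), params)
```

Both example headers above parse to a `charset` parameter: `ISO-8859-4` for the first and `us-ascii` (without quotes) for the second.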

Registered Media Types

The Internet Assigned Numbers Authority[26] (IANA) governs the set of registered media types and provides links to each type’s published specification (RFC). The IANA allows anyone to propose a new media type by filling out the “Application for Media Type” form found at http://www.iana.org/cgi-bin/mediatypes.pl.

Some commonly used registered media types are listed below:

text/plain

A plain text format with no specific content structure or markup.[27]

text/html

Content that is formatted using the HyperText Markup Language (HTML).[28]

image/jpeg

An image compression method that was standardized by the Joint Photographic Experts Group (JPEG).[29]

application/xml

Content that is structured using the Extensible Markup Language (XML).[30]

application/atom+xml

Content that uses the Atom Syndication Format (Atom), which is an XML-based format that structures data into lists known as feeds.[31]

application/javascript

Source code written in the JavaScript programming language.[32]

application/json

The JavaScript Object Notation (JSON) text-based format that is often used by programs to exchange structured data.[33]

Vendor-Specific Media Types

Media types use the subtype prefix “vnd” to indicate that they are owned or controlled by a “vendor.” Vendor-specific media types convey a clear description of a message’s content to the programs that understand their meaning. Unlike their more common counterparts, vendor-specific media types impart application-specific metadata that makes a message more meaningful to the web component that receives it.

Vendor-specific media types may also be registered with the IANA. For example, the following vendor-specific types are among the many listed in the IANA’s registry (http://www.iana.org/assignments/media-types):

application/vnd.ms-excel
application/vnd.lotus-notes
text/vnd.sun.j2me.app-descriptor

Media Type Design

Client developers are encouraged to rely on the self-descriptive features of a REST API. In other words, client programs should hardcode as few API-specific details as possible. This goal influences many aspects of a REST API’s design, including opaque URIs, hypermedia-based actions with resource state awareness, and descriptive media types.

Rule: Application-specific media types should be used

REST APIs treat the body of an HTTP request or response as part of an application-specific interaction. While the body may be formatted using languages such as JSON or XML, it usually has semantics that require special processing beyond simply parsing the language’s syntax.

As an example, consider a REST API URI such as http://api.soccer.restapi.org/players/2113 that responds to GET requests with a representation of a player resource that is formatted using JSON. If the Content-Type header field value declares that the response’s media type is application/json, it has accurately conveyed the body content’s syntax but has disregarded the semantics and structure of the player representation. The response’s Content-Type header simply tells a client that it should expect some JSON-formatted text.

Alternatively, the response’s Content-Type header field should communicate that the body contains a representation of a player document that is formatted with JSON. To help achieve this goal, the WRML framework, which was introduced in the section WRML, uses a descriptive media type: application/wrml. The example below shows WRML’s media type used to describe a player form that is formatted using JSON:

# NOTE: the line breaks below are for the sake of visual clarity.

application/wrml;                                            (1)
    format="http://api.formats.wrml.org/application/json";  (2)
    schema="http://api.schemas.wrml.org/soccer/Player"       (3)

(1) The WRML media type.[34]

(2) The required format parameter’s value identifies a document resource that describes the JSON format itself.

(3) The required schema parameter’s value identifies a separate document that details the Player resource type’s form, which is independent of the media type’s format parameter’s value.
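Composing this media type programmatically is straightforward; the sketch below uses a hypothetical helper name. Both values are written as quoted strings because URIs contain characters such as "/" and ":", which are not allowed in an unquoted parameter value:

```python
def wrml_media_type(format_uri: str, schema_uri: str) -> str:
    """Compose an application/wrml media type from its two required parameters."""
    # "/" and ":" are separators in HTTP tokens, so URI values must be quoted.
    return f'application/wrml; format="{format_uri}"; schema="{schema_uri}"'
```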

This media type may appear excessive when compared to simpler ones like application/json. However, the trade-off is worthwhile, since this media type communicates distinct and complementary pieces of information about a message’s content directly to clients. The application/wrml media type’s self-descriptive and pluggable design reduces the need for information to be communicated out-of-band and then hardcoded by client developers.

Note

See Media Type Representation, which describes how this media type’s format and schema documents should be represented.

Media Type Format Design

Most media types identify a format using a simple string, like application/json. Instead, by using a format parameter with a URI value, the WRML media type directs client programs to a cacheable document that provides links to other documents related to the format. In the example above, the representation of the document referenced by the format parameter (http://api.formats.wrml.org/application/json) contains links to related web resources, such as http://www.json.org and http://www.rfc-editor.org/rfc/rfc4627.txt.

More importantly, by leveraging REST’s code-on-demand constraint, the format document’s representation can provide links to formatting and parsing code, which clients can download and execute to serialize and deserialize an HTTP message body’s content. By providing this code, available for various programming languages and runtime environments, an API can programmatically teach its clients how to interoperate with its representation formats. The future-proof nature of this design may prove especially useful when a REST API wishes to adopt a new format that is not yet widely supported by its clients.
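Fetching and executing downloaded code is beyond a short example. The sketch below shows only the client-side half of the design: dispatching on the format parameter’s URI to a registered serializer and deserializer. The registry and function names are hypothetical, and the JSON codec is registered by hand rather than downloaded:

```python
import json
from typing import Any, Callable, Dict, Tuple

Codec = Tuple[Callable[[Any], bytes], Callable[[bytes], Any]]

# Hypothetical client-side registry: format URI -> (serialize, deserialize).
# A code-on-demand client would populate entries from code linked by the
# format document; here a single JSON codec is registered by hand.
CODECS: Dict[str, Codec] = {
    "http://api.formats.wrml.org/application/json": (
        lambda obj: json.dumps(obj).encode("utf-8"),
        lambda data: json.loads(data.decode("utf-8")),
    ),
}


def decode_body(format_uri: str, body: bytes) -> Any:
    """Deserialize a message body using the codec registered for its format."""
    try:
        _, deserialize = CODECS[format_uri]
    except KeyError:
        # A code-on-demand client would fetch the format document here.
        raise LookupError(f"no codec for format {format_uri}") from None
    return deserialize(body)
```

Because the client keys its behavior on the format URI rather than a hardcoded media type string, adding a new format means adding a registry entry, not changing the dispatch logic.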

The section “Rule: A consistent form should be used to represent media type formats” outlines the structure of a format document’s representation.

Media Type Schema Design

As discussed next in Chapter 5, a resource’s state representation consists of fields and links. For a given “class” of resource, the set of expected fields and context-sensitive links can be described by a schema document. The WRML media type’s schema parameter references a cacheable schema document, which describes a resource type’s fields and links independently of any specific representational format. This separation of concerns allows multiple representation formats to be negotiated by clients and supported by REST APIs with relative ease. With a set of standard primitive types, outlined in Field Representation, a schema document can describe a resource representation’s fields in a format-independent manner.

The section “Rule: A consistent form should be used to represent media type schemas” details the structure of a schema document’s representation.

Media Type Schema Versioning

The different versions of a given schema should be organized as different schema documents, with distinct URIs. This design is borrowed from the approach traditionally used by the W3C[35] and IETF[36] for versioning the URIs of Internet Drafts on their way to becoming approved standards. The example below shows the URI of a schema document that details the fields and links of a soccer Player resource type:

http://api.schemas.wrml.org/soccer/Player-2

The -2 suffix designates the version number of the Player resource type’s schema. As a rule, the current version of the resource type’s schema should always be made available through a separate resource identifier, without a numeric suffix. The example below demonstrates the design of the Player resource type’s current schema URI:

http://api.schemas.wrml.org/soccer/Player

The URI of a resource type’s current schema version always identifies the concept of the most recent version. A schema document URI that ends with a number permanently identifies a specific version of the schema. Therefore, the latest version of a schema is always modeled by two separate resources, which conceptually overlap for as long as the numbered version is also the current one. During that time, the two distinct resources, with two separate URIs, consistently have the same state representation.
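A sketch of working with this naming convention follows, assuming (as in the example above) that a trailing hyphen and digits always mark a version and never appear in an unversioned schema name. Both function names are hypothetical:

```python
import re
from typing import Optional, Tuple

# A trailing "-<digits>" marks a permanently versioned schema URI.
_VERSIONED = re.compile(r"^(?P<base>.+?)-(?P<version>\d+)$")


def split_schema_uri(uri: str) -> Tuple[str, Optional[int]]:
    """Return (current-version URI, pinned version number or None)."""
    match = _VERSIONED.match(uri)
    if match is None:
        return uri, None  # No suffix: the URI always means "the latest version".
    return match.group("base"), int(match.group("version"))


def versioned_schema_uri(current_uri: str, version: int) -> str:
    """Build the permanent URI for one version of a schema."""
    return f"{current_uri}-{version}"
```

A client that pins `.../soccer/Player-2` keeps a stable contract, while one that follows `.../soccer/Player` opts in to every new version.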

Rule: Media type negotiation should be supported when multiple representations are available

Allow clients to negotiate for a given format and schema by submitting an Accept header with the desired media type. For example:

# NOTE: the line breaks below are for the sake of visual clarity.

Accept:  application/wrml;
             format="http://api.formats.wrml.org/text/html";   (1)
             schema="http://api.schemas.wrml.org/soccer/Team"  (2)

(1) Using media type negotiation, clients can select a format.

(2) Using media type negotiation, clients can select the schema version that will work best for them.

Additionally, to facilitate browser-based viewing and debugging of a REST API’s responses, consider supporting raw media types as shown in the example below:

Accept:  application/json

This will allow web browser add-ons such as JSONView to render a REST API’s responses as JSON.
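A server-side sketch of this negotiation follows. It assumes the API lists the media types it can produce, and that every non-q parameter the client names (such as format and schema) must match exactly. It ignores specificity ranking and escaped quotes, and all function names are hypothetical:

```python
from typing import Dict, List, Optional, Tuple


def _split_outside_quotes(text: str, sep: str) -> List[str]:
    """Split on sep, ignoring separators inside double-quoted strings."""
    parts: List[str] = []
    buf: List[str] = []
    quoted = False
    for ch in text:
        if ch == '"':
            quoted = not quoted
        if ch == sep and not quoted:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))
    return [p.strip() for p in parts if p.strip()]


def _parse(media: str) -> Tuple[str, Dict[str, str]]:
    """Return the lower-cased type/subtype and a dict of unquoted parameters."""
    head, *params = _split_outside_quotes(media, ";")
    parsed: Dict[str, str] = {}
    for param in params:
        name, _, value = param.partition("=")
        parsed[name.strip().lower()] = value.strip().strip('"')
    return head.lower(), parsed


def _type_matches(pattern: str, actual: str) -> bool:
    p_type, _, p_sub = pattern.partition("/")
    a_type, _, a_sub = actual.partition("/")
    return p_type in ("*", a_type) and p_sub in ("*", a_sub)


def negotiate(accept: str, available: List[str]) -> Optional[str]:
    """Pick the available media type with the highest q-value the client accepts."""
    best: Optional[str] = None
    best_q = 0.0
    for media_range in _split_outside_quotes(accept, ","):
        range_type, range_params = _parse(media_range)
        q = float(range_params.pop("q", "1"))
        for candidate in available:
            cand_type, cand_params = _parse(candidate)
            if not _type_matches(range_type, cand_type):
                continue
            # Every parameter the client named (e.g. format, schema) must match.
            if any(cand_params.get(k) != v for k, v in range_params.items()):
                continue
            if q > best_q:  # q=0 never wins; ties keep the earlier candidate.
                best, best_q = candidate, q
    return best
```

If `negotiate` returns None, the API would typically answer 406 (“Not Acceptable”). Listing plain `application/json` among the available types is what lets a browser add-on request the raw form.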

Rule: Media type selection using a query parameter may be supported

To enable simple links and easy debugging, REST APIs may support media type selection via a query parameter named accept with a value format that mirrors that of the Accept HTTP request header. For example:

GET /bookmarks/mikemassedotcom?accept=application/xml

This is a more precise and generic approach to media type identification that should be preferred over the common alternative of appending a virtual file extension like .xml to the URI’s path. The virtual file extension approach binds the resource and its representation together, implying that they are one and the same.

Warning

Media type selection (or negotiation) via a query parameter is a form of tunneling that conveys metadata in the URI rather than in HTTP’s intended slot: the Accept header. Therefore it should be used with careful consideration.
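A sketch of honoring the query parameter on the server follows. Letting the query parameter take precedence over the Accept header is a hypothetical design choice, as is the function name:

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def effective_accept(request_target: str, accept_header: Optional[str]) -> Optional[str]:
    """Prefer an ?accept= query parameter over the Accept header when both exist."""
    values = parse_qs(urlsplit(request_target).query).get("accept")
    if values:
        # parse_qs percent-decodes the value but also turns "+" into a space,
        # so clients must send "+" as %2B (e.g. application/atom%2Bxml).
        return values[0]
    return accept_header
```

The resulting value can then go through the same negotiation logic as a real Accept header, so the tunneled form stays a thin alias rather than a second code path.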

Recap

This chapter covered the design rules for a REST API’s metadata conveyed through HTTP headers and media types. Table 4-1 summarizes the vocabulary terms that were used in this chapter.

Table 4-1. Vocabulary review
Term | Description
Atom Syndication Format (Atom) | An XML-based format that structures data into lists known as “feeds.”
Conditional request | A client-initiated interaction with a precondition that the server is expected to honor.
Entity | An HTTP request or response payload, which is metadata in header fields and content in a body.
Entity tag | An opaque string value that designates the “version” of a given HTTP response message’s headers and body.
Extensible Markup Language (XML) | A standardized application profile of SGML that is used by many applications to exchange data.
Internet Assigned Numbers Authority (IANA) | The entity with many governance-related duties, which include overseeing global IP address allocation and media type registration.
Media type negotiation | A client-initiated process that selects the form of a response message’s representation.
Media type schema | A Web-oriented description of a form that is composed of fields and links.
Negative caching | Directing intermediaries to serve copies of responses that did not result in a 2xx status code.
Vendor-specific media type | A form descriptor that is owned and controlled by a specific organization.

Table 4-2 recaps a REST API’s use of the HTTP headers.

Table 4-2. HTTP response header summary
Header | Purpose
Content-Type | Identifies the entity body’s media type
Content-Length | The size (in bytes) of the entity body
Last-Modified | The date-time of the resource representation’s last change
ETag | Indicates the version of the response message’s entity
Cache-Control | A TTL-based caching value (in seconds)
Location | Provides the URI of a resource



[25] Media types were originally known as “MIME types,” which stood for Multipurpose Internet Mail Extensions.

[34] The application/wrml media type’s IANA registration is pending, see http://www.wrml.org for the most up-to-date information.

[35] World Wide Web Consortium (W3C), http://www.w3.org.

[36] The Internet Engineering Task Force (IETF), http://www.ietf.org.
