Web Client Programming with Perl
Automating Tasks on the Web
By Clinton Wong
1st Edition March 1997
This book is out of print, but it has been made available online through the O'Reilly Open Books Project.
HTTP headers are used to transfer all sorts of information between client and server. There are four categories of headers:
General
Information not related to the client, server, or HTTP
Request
Preferred document formats and server parameters
Response
Information about the server sending the response
Entity
Information on the data being sent between the client and server
General headers and entity headers are the same for both the server and client.
All headers in HTTP messages contain the header name followed by a colon (:), then a space, and the value of the header. Header names are case-insensitive (thus, Content-Type is the same as Content-type). The value of a header can extend over multiple lines by preceding each extra line with at least one space or tab.
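The syntax rules above can be sketched in a few lines of Python. This is only an illustration, not code from the book: the function name parse_headers is hypothetical, names are lowercased so that lookups are case-insensitive, continuation lines are folded into the previous value, and a repeated header simply overwrites the earlier one.

```python
def parse_headers(raw: str) -> dict:
    """Parse a block of header lines into a dict keyed by lowercased name."""
    headers = {}
    last_name = None
    for line in raw.splitlines():
        if not line:
            break  # a blank line ends the header block
        if line[0] in " \t" and last_name is not None:
            # Continuation line: fold it into the previous header's value.
            headers[last_name] += " " + line.strip()
            continue
        name, _, value = line.partition(":")
        last_name = name.strip().lower()  # header names are case-insensitive
        headers[last_name] = value.strip()
    return headers
```

With this sketch, a Content-type line and a Content-Type line both end up under the key content-type.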
This chapter covers the most recent draft of the HTTP 1.1 specification that was available at publication time (draft 7), as well as some headers that are not in the spec but are in common use regardless.
General headers are used in both client requests and server responses. Some may be more specific to either a client or server message.
The Cache-control header specifies desired behavior from a caching system, as used in proxy servers. For example:
Both clients and servers can use the Cache-control header to specify parameters for the cache or to request certain kinds of documents from the cache. The caching directives are specified in a comma-separated list.
Cache request directives are:
no-cache
Do not cache. The proxy should not send a cached copy of the document and should always request and return the newest copy from the origin-server. The response from the server must not be cached by a proxy.
no-store
Remove information promptly after forwarding. The cache should not store anything about the client request or server response. This option prevents the accidental storing of secure or sensitive information in the cache.
max-age = seconds
Do not send responses older than seconds. The cache can send a cached document that has been retrieved within a certain number of seconds from the time it was sent by the origin server.
max-stale [ = seconds ]
The cache can send a cached document that is older than its expiration date. If seconds are given, it must not be expired by more than that time.
min-fresh = seconds
Send data only if still fresh after the specified number of seconds. The cache can send a cached document only if there are at least a certain number of seconds between now and its expiration time.
only-if-cached
Do not retrieve new data. The cache can send a document only if it is in the cache, and should not contact the origin-server to see if a newer copy exists. This option is useful when network connectivity from the cache to origin-server is poor.
Cache response directives are:
public
The document is cacheable by any cache.
private
The document is not cacheable by a shared cache.
no-cache
Do not cache the returning document. This prevents caches from returning requested documents when they are stale.
no-store
Do not store the returning document. Remove information promptly after forwarding.
no-transform
Do not convert the entity-body. Useful for applications that require that the message received is exactly what was sent by the server.
must-revalidate
The cache must verify the status of stale documents, i.e., the cache cannot blindly use a document that has expired.
proxy-revalidate
Like must-revalidate, but it applies only to shared (public) caches, which must verify the status of stale documents. Private client caches are exempt.
max-age = seconds
The document should be considered stale in the specified number of seconds from the time of retrieval.
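Since the directives arrive as a comma-separated list, a client or proxy has to split the header value before it can act on it. The following Python sketch shows one way to do that. The function name parse_cache_control is hypothetical, and the split on commas is a simplification that would mishandle a quoted argument containing a comma.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Map each directive name to its argument, or None if it has none."""
    directives: Dict[str, Optional[str]] = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, arg = item.partition("=")
        # Directives such as must-revalidate carry no argument at all.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives
```

For a value of max-age=3600, must-revalidate the result holds the string 3600 under max-age and None under must-revalidate.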
The Connection header specifies options desired for this connection but not for further connections by proxies. For example:
The close option signifies that either the client or server wishes to end the connection (i.e., this is the last transaction). The keep-alive option signifies that the client wishes to keep the connection open. The default behavior of web applications differs between HTTP 1.0 and 1.1.
By default, HTTP 1.1 uses persistent connections, where the connection does not automatically close after a transaction. When an HTTP 1.1 web client no longer has any requests, or the server has reached some preprogrammed limit in spending resources on the client, a Connection: close header indicates that no more transactions will proceed, and the connection closes after the current one. An HTTP 1.1 client or server that doesn't support persistent connections should always use the Connection: close header.
HTTP 1.0, on the other hand, does not have persistent connections by default. If a 1.0 client wishes to use persistent connections, it uses the keep-alive parameter. A Connection: keep-alive header is issued by both HTTP 1.0 clients and servers for each transaction under persistent connections. The last transaction does not have a Connection: keep-alive header, and behaves like a Connection: close header under HTTP 1.1. HTTP 1.0 servers that do not support persistent connections will not have a Connection: keep-alive header in their response, and the client should disconnect after the first transaction completes.
Use of the keep-alive parameter is known to cause problems with proxy servers that do not understand persistent connections for HTTP 1.0. If a proxy server blindly forwards the Connection: keep-alive header, the origin-server and initial client are using persistent connections while the proxy server is not. The origin server maintains the network connection when the proxy server expects a disconnect; timing problems follow.
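The different defaults in HTTP 1.0 and 1.1 come down to a small decision that can be sketched in Python. The function name and its arguments are hypothetical, and the sketch deliberately ignores the proxy problem just described: it looks only at the protocol version and the Connection header of a single message.

```python
def keeps_connection_open(http_version: str, connection_header: str = "") -> bool:
    """Decide whether the connection stays open after this transaction."""
    tokens = {t.strip().lower() for t in connection_header.split(",") if t.strip()}
    if http_version == "1.1":
        # HTTP 1.1: persistent unless one side sends Connection: close.
        return "close" not in tokens
    # HTTP 1.0: persistent only if keep-alive is requested explicitly.
    return "keep-alive" in tokens
```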
See Chapter 3, Learning HTTP, for more information on persistent connections.
There are three formats that can be used to express the date. The preferred date format is RFC 1123. For example:
Mon, 06 May 1996 04:57:00 GMT
The preferred RFC 1123 format specifies all dates in a fixed length string in Greenwich Mean Time (GMT). GMT is always used in HTTP to prevent any misunderstandings among computers communicating in different time zones. The valid days are: Mon, Tue, Wed, Thu, Fri, Sat, and Sun. The months are: Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, and Dec. Single-digit days are written with a leading zero (06, not 6).
For backwards compatibility, the RFC 850 and ANSI C asctime( ) formats are also acceptable:
Monday, 06-May-96 04:57:00 GMT
Mon May  6 04:57:00 1996
The RFC 850 format (RFC 850 was later obsoleted by RFC 1036) is similar to the one in RFC 1123, except that the string length varies, depending on the day of the week, and the year is specified in two digits instead of four. This makes date parsing more difficult. It is recommended that web clients use the previous format (RFC 1123) instead of this one. The valid days are: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday. The months are: Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, and Dec. Single-digit days are written with a leading zero.
ANSI C's asctime( ) format is not encouraged, since there can be misunderstandings about the time zone used by the computer. The valid days are: Mon, Tue, Wed, Thu, Fri, Sat, and Sun. The months are: Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, and Dec. Single-digit days are padded with a leading space instead of a zero.
Despite a heavy preference for RFC 1123's format, current web clients and servers should be able to recognize all three formats. However, when designing web programs, it is desirable to use RFC 1123 when generating dates. Future versions of HTTP may not support the latter two formats.
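A program that follows this advice accepts all three formats but generates only the RFC 1123 form. The Python sketch below shows one possible approach; the function names are hypothetical, and it assumes the default C (English) locale, because strptime matches day and month names according to the current locale.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",  # RFC 1123 (preferred)
    "%A, %d-%b-%y %H:%M:%S GMT",  # RFC 850 / RFC 1036
    "%a %b %d %H:%M:%S %Y",       # ANSI C asctime()
)


def parse_http_date(text: str) -> datetime:
    """Try each of the three formats in turn; the result is always in GMT."""
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError("unrecognized HTTP date: %r" % text)


def format_http_date(when: datetime) -> str:
    """Always generate the preferred RFC 1123 form."""
    return format_datetime(when.astimezone(timezone.utc), usegmt=True)
```

All three example strings shown earlier parse to the same moment, and formatting that moment gives back the RFC 1123 string.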
The MIME-Version header specifies the version of MIME (Multipurpose Internet Mail Extensions) used in the HTTP transaction. This header indicates that the entity-body conforms to a particular version of MIME. If the transaction involves MIME-encoded data, but this header is omitted, the default value is assumed to be 1.0.
Unfortunately, some servers use this header for all transactions, regardless of the entity-body's actual format. For this reason, the HTTP/1.0 protocol suggests that this header should be ignored. If this header is encountered, the entity-body may not have any MIME messages.
The Pragma header specifies directives for proxy and gateway systems. Since many proxy systems may exist between a client and server, Pragma headers must pass through each proxy. When the Pragma header reaches the server, the header may be ignored by the server software.
The only directive defined in HTTP/1.0 is the no-cache directive. It is used to tell caching proxies to contact the server for the requested document, instead of using its local cache. This allows the client to request the most up-to-date document from the original web server, without receiving a cached copy from an intermediate proxy server.
The Pragma header is an HTTP 1.0 feature, and is maintained in HTTP 1.1 for backward compatibility. No new Pragma directives will be defined in the future.
The Transfer-Encoding header specifies that the message is encoded. This is not the same as content-encoding (an entity-body header, discussed later), since transfer-encodings are a property of the message, not of the entity-body. For example:
In the HTTP 1.1 specification, chunked is the only encoding method supported.
The chunked transfer-encoding encodes the message as a series of chunks followed by entity-headers, as shown in Figure A-1. The chunks and entity-headers are in a client's request entity-body or server response entity-body. Each chunk contains a chunk size specified in base 16, followed by CRLF. After that, the chunk body, whose length is specified in the chunk size, is presented, followed by a CRLF. Consecutive chunks are specified one after another, with the last chunk having a length of zero followed by CRLF. Entity-headers follow the chunks, terminated by a CRLF on a line by itself.
Figure A-1. Chunked transfer encoding
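The decoding procedure just described can be sketched in Python as follows. The function name decode_chunked is hypothetical, the whole message body is assumed to be in memory already, and anything after a semicolon on a chunk-size line is ignored.

```python
def decode_chunked(data: bytes):
    """Return (body, trailer_lines) for a chunked entity-body."""
    body = bytearray()
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        # The chunk size is written in base 16 on a line of its own.
        size = int(data[pos:eol].split(b";")[0].strip(), 16)
        pos = eol + 2
        if size == 0:
            break  # a zero-length chunk marks the end of the chunks
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk body and the CRLF that follows it
    # What remains are the entity-headers, ended by an empty line.
    trailer = [line.decode("latin-1") for line in data[pos:].split(b"\r\n") if line]
    return bytes(body), trailer
```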
Using the Upgrade header, the client can list additional protocols that it understands and would prefer to use when talking to the server. If the server wishes to use the alternate protocol, it returns a response code of 101 and indicates which protocol it is upgrading to, with the Upgrade header. After the terminating CRLF in the server's header response, the protocol switches.
Portion of client request:
Portion of server response:
HTTP/1.1 101 Upgrading Protocols
Via: protocol host
The Via header is updated by proxy servers as messages pass from client to server and from server to client. Each proxy server appends its protocol and protocol version, hostname, port number, and comment to a comma-separated list on the Via header. If the Via header does not exist, the first proxy creates it. This information is useful for debugging purposes. If the protocol name is HTTP, it can be omitted. For HTTP, a port number of 80 can be omitted. Comments are optional.
Via: 1.1 proxy.ora.com, 1.0 proxy.internic.gov
See the discussion of the TRACE method in Chapter 3 for more information.
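As a rough illustration of how a proxy maintains the Via header, the Python sketch below appends one entry per hop. The function name append_via and the use of a plain dictionary for the headers are assumptions made for the example; the protocol name, port, and comment are left out, as the text above allows for HTTP on port 80.

```python
def append_via(headers: dict, host: str, version: str = "1.1") -> None:
    """Add this proxy to the Via list, creating the header if it is absent."""
    entry = "%s %s" % (version, host)
    existing = headers.get("Via")
    headers["Via"] = "%s, %s" % (existing, entry) if existing else entry
```

Calling it first for proxy.ora.com and then for proxy.internic.gov with version 1.0 produces the value shown in the example above.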
Client Request Headers
Client header data communicates the client's configuration and preferred document formats to the server. Request headers are used in a client message to provide information about the client.
Accept: type/subtype qvalue
Specifies media types that the client prefers to accept. For example:
Accept: text/*, image/gif
Multiple media types can be listed separated by commas. The optional qvalue represents, on a scale of 0 to 1, an acceptable quality level for accept types. See Appendix B, Reference Tables, for a listing of some commonly-accepted media types. See the section "Media Types" in Chapter 3 for more information.
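A server that honors these preferences has to split the list and rank the entries by qvalue. The Python sketch below assumes the notation used by the HTTP specification, in which the qvalue is attached to a media type as a ;q= parameter (for instance text/*;q=0.5). That notation is not shown in the text above, and the function name parse_accept is hypothetical.

```python
def parse_accept(value: str):
    """Return (media_type, q) pairs, most preferred first."""
    prefs = []
    for item in value.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # a missing qvalue means full preference
        for param in parts[1:]:
            name, _, val = param.partition("=")
            if name.strip().lower() == "q":
                q = float(val)
        prefs.append((parts[0], q))
    # sorted() is stable, so entries with equal qvalues keep their order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)
```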
Accept-Charset: character_set qvalue
Specifies the character sets that the client prefers. Multiple character sets can be listed separated by commas. The optional qvalue represents, on a scale of 0 to 1, an acceptable quality level for nonpreferred character sets. If this header is not specified, the server assumes the default of US-ASCII and ISO-8859-1 (a superset of US-ASCII), which are both specified in RFC 1521. For a list of character sets, refer to Appendix B. For example:
Through the Accept-Encoding header, a client may specify what encoding algorithms it understands. If this header is omitted, the server will send the requested entity-body without any additional encoding. Encoding mechanisms can be used to reduce consumption of scarce resources, at the expense of less expensive resources. For example, large files may be compressed to reduce transmission time over slow network connections.
In the HTTP/1.0 specification, two encoding mechanisms are defined: x-gzip and x-compress. Multiple encoding schemes can be listed, separated by commas. For reasons of compatibility with historical practice, gzip and compress should be considered the same as x-gzip and x-compress.
x-gzip
Jean-Loup Gailly's GNU zip compression scheme
x-compress
Modified Lempel-Ziv compression scheme
There is no guarantee that the requested encoding mechanism has been applied to the entity-body returned by the server. If the client specifies an Accept-encoding header, it should examine the server's Content-encoding header to see if an encoding mechanism was applied. If the Content-encoding header has been omitted, no encoding mechanism was applied.
Accept-Language: language qvalue
Specifies the languages that the client prefers. If a client wants to specify a preference for a particular language, it does so in the Accept-Language header. If a server contains the same document in multiple languages, it will send the document in the language of the client's preference, when available. For example:
Multiple languages can be listed separated by commas. The optional qvalue represents, on a scale of 0 to 1, an acceptable quality level for nonpreferred languages. Languages are written with their two-letter abbreviations (e.g., en for English, de for German, fr for French, etc.). See Appendix B for a listing of languages.
Authorization: scheme credentials
Provides the client's authorization to access data at a URI. When a requested document requires authorization, the server returns a WWW-Authenticate header describing the type of authorization required. The client then repeats the request with the proper authorization information.
The HTTP/1.0 specification defines the BASIC authorization scheme, where the authorization parameter is the string of username:password encoded in base 64. For example, for the username of "webmaster" and a password of "zrqma4v", the authorization header would look like this:
Authorization: BASIC d2VibWFzdGVyOnpycW1hNHY=
The value decodes into webmaster:zrqma4v.
See Chapter 3 for more information on using the Authorization header.
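The encoding step is easy to reproduce and check. The Python sketch below builds the header value from a username and password and decodes the example value shown above; the function name basic_authorization is hypothetical. Note that base 64 is an encoding, not encryption, so anyone who sees the header can recover the password.

```python
import base64


def basic_authorization(username: str, password: str) -> str:
    """Build the value of an Authorization header for the BASIC scheme."""
    pair = "%s:%s" % (username, password)
    token = base64.b64encode(pair.encode("latin-1")).decode("ascii")
    return "BASIC " + token


# Decoding the example value recovers the username:password string.
decoded = base64.b64decode("d2VibWFzdGVyOnpycW1hNHY=").decode("latin-1")
```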
The Cookie header contains a name/value pair of information stored for that URL. For example:
Multiple cookies can be specified, separated by semicolons. This header is used by browsers that support Netscape persistent cookies, and it is not included in the HTTP standard. See Chapter 3 for more information on cookies.
An issue arises with proxy servers in regard to the headers. Both the Set-Cookie and Cookie headers should be propagated through the proxy, even if a page is cached or has not been modified (according to the If-Modified-Since condition). The Set-Cookie header should also never be cached by the proxy.
Gives the email address of the user executing the client. The From header helps the server identify the source of malformed requests or excessive resource usage. For example:
This header should be sent when possible, but should not be sent without the user's consent, in the interest of privacy. However, when running clients that use excessive network or server resources, it is advisable to include this header, in the event that an administrator would like to contact the client user.
Host: hostname[:port]
The hostname and port number of the server contacted by the client. Useful for software multihoming. For example:
Host: www.ora.com:80
Clients must supply this information in HTTP 1.1, so servers with multiple hostnames can easily differentiate between ambiguous URLs.
Specifies that the URI data is to be sent only if it has been modified since the date given as the value of this header. This is useful for client-side caching. For example:
If-Modified-Since: Sat, 04 May 1996 12:17:34 GMT
If the document has not been modified, the server returns a code of 304, indicating that the client should use the local copy. The specified date should follow the format described under the Date header. See the "Client Caching" section in Chapter 3 for more information.
The If-Match header makes the request conditional: the entity is requested only if it matches one of the given entity tags (see the ETag entity header). An asterisk ( * ) matches any entity, and the transaction continues only if the entity exists. See the "Client Caching" section in Chapter 3 for more information.
The If-None-Match header makes the request conditional: the entity is requested only if it does not match any of the given entity tags (see the ETag entity header). An asterisk ( * ) matches any entity; if the entity doesn't exist, the transaction continues. See the "Client Caching" section in Chapter 3 for more information.
If-Range: entity_tag date
A conditional requesting only the portion of the entity that is missing, if it has not been changed, and the entire entity if it has. Used in conjunction with the Range header to indicate the entity tag or last modified time of a document on the server. For example:
If-Range: Sat, 04 May 1996 12:17:34 GMT
If the document has not been modified, the server returns the byte range given by the Range header; otherwise, it returns all of the new document. Either an entity tag or a date can be used to identify the partial entity already received; see the Date header for information on the format for dates. See the section "Retrieving Content" in Chapter 3 for more information.
Specifies that the entity-body should be sent only if the document has not been modified since a given date. For example:
If-Unmodified-Since: Sun, 05 May 1996 04:03:56 GMT
The specified date should follow the format described under the Date header. See the "Client Caching" section in Chapter 3 for more information.
Limits the number of proxies or gateways that can forward the request. Useful for debugging with the TRACE method, avoiding infinite loops. For example:
A proxy server that receives a Max-Forwards value of zero (0) should return the request headers to the client in its response entity-body. See the discussion of the TRACE method in Chapter 3 for more information.
The Proxy-Authorization header is used by a client to identify itself to a proxy requiring authorization.
Range: bytes=n-m
Specifies the partial range(s) requested from the document. For example:
Multiple ranges can be listed, separated by commas. If the first number in a byte range is missing, the range counts back from the end of the document (the last m bytes). If the second number is missing, the range runs from byte n to the end of the document. The first byte is byte 0. See Chapter 3 for more information.
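The three forms of a byte range can be resolved against a document of known length as sketched below in Python. The function name resolve_ranges is hypothetical, and ranges that fall entirely outside the document are simply dropped, which is a choice made for this example.

```python
def resolve_ranges(header: str, length: int):
    """Turn a 'bytes=...' value into inclusive (first, last) byte offsets."""
    unit, _, spec = header.partition("=")
    if unit.strip().lower() != "bytes":
        raise ValueError("only byte ranges are handled here")
    ranges = []
    for part in spec.split(","):
        first, _, last = part.strip().partition("-")
        if first == "":
            # "-m": the final m bytes of the document
            start, end = max(length - int(last), 0), length - 1
        elif last == "":
            # "n-": from byte n to the end of the document
            start, end = int(first), length - 1
        else:
            # "n-m": both ends given; clip m to the last byte
            start, end = int(first), min(int(last), length - 1)
        if start <= end:
            ranges.append((start, end))
    return ranges
```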
The Referer header gives the URL of the document that refers to the requested URL (i.e., the source document of the link). For example:
See Chapter 3 for more information.
Gives identifying information about the client program. For example:
User-Agent: Mozilla 3.0b
See Chapter 3 for more information.
Server Response Headers
The response headers described here are used in server responses to communicate information about the server and how it may handle requests.
The Accept-Ranges header indicates the acceptance of range requests for a URI, specifying either the range unit (e.g., bytes) or none if no range requests are accepted. For example:
The Age header indicates the age of the document in seconds. For example:
Proxy-Authenticate: scheme realm
Indicates the authentication scheme and parameters applicable to the proxy for this URI and the current connection. Used with response 407 (Proxy Authentication Required).
Indicates methods supported by the server as a comma-separated list. Intended for declaration of nonstandard methods supported at this site. For example:
Public: GUNZIP-GET, UNCOMPRESS-GET
For methods applicable only to an individual URI, see the Allow header.
Specifies a time when the server can handle requests. Used with response code 503 (Service Unavailable). It contains either an integer number of seconds or a GMT date and time (as described by the Date header formats). If the value is an integer, it is interpreted as the number of seconds to wait after the request was issued. For example:
Retry-After: Sat, 18 May 1996 06:59:37 GMT
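A client can turn either form of the value into a concrete time before retrying. The Python sketch below is one way to do it; the function name retry_time is hypothetical, and the date branch relies on the standard library's parsedate_to_datetime, so it is only expected to handle the preferred RFC 1123 format.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def retry_time(value: str, issued: datetime) -> datetime:
    """Return the moment at which the request may be retried."""
    value = value.strip()
    if value.isdigit():
        # An integer is a number of seconds after the request was issued.
        return issued + timedelta(seconds=int(value))
    # Otherwise the value is a GMT date and time.
    return parsedate_to_datetime(value)
```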
Contains the name and version number of the server. For example:
If security holes are discovered in a particular server, the Server header information may be used to indicate a site's vulnerability. For that reason, it's a good idea for servers to make it easy for administrators to suppress sending this header in the server configuration, if their server has a well-known bug.
Set-Cookie: name=value options
Contains a name/value pair of information to retain for this URL. This header is used with browsers that support Netscape persistent cookies, and it is not included in the HTTP standard. For example:
expires = date
The cookie becomes invalid after the specified date.
path = pathname
The URL range for which the cookie is valid.
domain = domain_name
The domain name range for which the cookie is valid.
secure
Return the cookie only under a secure connection.
The Vary header specifies that the entity has multiple sources and may therefore vary according to the specified list of request headers.
Multiple headers can be listed, separated by commas. An asterisk ( * ) means that another factor, other than the request headers, may affect the document that is returned.
Warning: code host string
Indicates information additional to that in the status code, for use by caching proxies. For example:
Warning: Response stale
The host field contains the name or pseudonym of the server host, with an optional port number. The two-digit warning codes and their recommended descriptive strings are:
10 Response is stale
The response data is known to be stale.
11 Revalidation failed
The response data is known to be stale because the proxy failed to revalidate the data.
12 Disconnected operation
The cache is disconnected from the network.
13 Heuristic expiration
The data is older than 24 hours, and the cache heuristically chose a freshness lifetime greater than 24 hours.
14 Transformation applied
The proxy has changed the encoding or media type of the document, as specified by the Content-Encoding or Content-Type headers.
99 Miscellaneous warning
Arbitrary information to be logged or presented to the user.
WWW-Authenticate: scheme realm
A request for authentication, used with the 401 (Unauthorized) response code. It specifies the authorization scheme and realm of authorization required from a client at the requested URI. Many different authorization realms can exist on a server. A common authorization scheme is BASIC, which requires a username and password. For example:
WWW-Authenticate: BASIC realm="Admin"
When returned to the client, this header indicates that the BASIC type of authorization data in the appropriate realm should be returned in the client's Authorization header.
Entity headers are used in both client requests and server responses. They supply information about the entity body in an HTTP message.
Contains a comma-separated list of methods that are allowed at a specified URI. In a server response it is used with code 405 (Method Not Allowed) to inform the client of valid methods available for the requested information. For example:
Allow: GET, HEAD
Some methods may not apply to a URL, and the server must verify that the methods supplied by the client make sense with the given URL.
The Content-Base header specifies the base URL for resolving relative URLs. The base URL must be written as an absolute URL. For example:
Specifies the encoding scheme(s) used for the transferred entity-body. Values are gzip (or x-gzip) and compress (or x-compress). If multiple encoding schemes are specified (in a comma-separated list), they must be listed in the order in which they were applied to the source data.
The server should attempt to use an encoding scheme used by the client's Accept-Encoding header. The client may use this information to determine how to decode the document after it is transferred.
See the description of the Accept-Encoding header earlier in this appendix for a listing of possible values. For example:
Specifies the language(s) that the transferred entity-body is intended for. Languages are represented by their two-letter abbreviations (e.g., en for English, fr for French). The server should attempt to use a language specified by the client's Accept-Language header. (See Appendix B for a listing of possible values.) This header is useful when a client specifies a preference for one language over another for a given URL. For example:
This header specifies the length of the data (in bytes) of the transferred entity-body. For example:
Due to the dynamic nature of some requests, the content length is sometimes unknown and this header is omitted.
The Content-Location header supplies the URL for the entity, in cases where a document has multiple entities with separately accessible locations. The URL can be either an absolute or relative URL. For example:
See the section "Retrieving Content" in Chapter 3 for more information.
The Content-MD5 header supplies an MD5 digest of the entity, for checking the integrity of the message upon receipt.
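The integrity check can be sketched in Python as shown below. One detail is an assumption drawn from the specification rather than from the text above: the 128-bit digest is carried in the header as a base64 string, not as hexadecimal. The function names are hypothetical.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Compute the digest string a sender would place in the header."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def verify_content_md5(body: bytes, header_value: str) -> bool:
    """Check a received entity-body against the digest that came with it."""
    return content_md5(body) == header_value.strip()
```

A mismatch tells the receiver only that the body changed in transit; it offers no protection against deliberate tampering, since an attacker could replace the digest as well.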
Content-Range: bytes n-n/m
Specifies where the accompanying partial entity-body should be inserted, and the total size of the full entity-body. For example:
Content-Range: bytes 6143-7166/15339
See the section "Retrieving Content" in Chapter 3 for more information.
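A client assembling a document from partial responses needs the three numbers in this header. The Python sketch below extracts them with a regular expression; the function name parse_content_range is hypothetical, and only the byte-range form shown above is handled.

```python
import re

_CONTENT_RANGE = re.compile(r"bytes\s+(\d+)-(\d+)/(\d+)$")


def parse_content_range(value: str):
    """Return (first, last, total) from a 'bytes first-last/total' value."""
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError("unsupported Content-Range: %r" % value)
    first, last, total = (int(group) for group in match.groups())
    return first, last, total
```

For the example above, the part covers bytes 6143 through 7166 of a 15339-byte entity, so the accompanying entity-body is 7166 - 6143 + 1 = 1024 bytes long.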
The Content-Transfer-Encoding header specifies any transformations that occurred to the data for transport over the network. For example:
Between web servers and clients, this header is usually not needed, since no encoding is needed. Possible encoding schemes are:
7bit
Data represented by short lines of US-ASCII data.
8bit
Data represented by short lines, but may contain non-ASCII data. (High-order bit may be set.)
binary
Data may not be in short lines, and can be non-ASCII characters.
base64
Data is encoded in base64 ASCII. (See Section 5.2 of RFC 1521 for details.)
quoted-printable
Special characters replaced with an equal sign (=) followed by the ASCII value in hex. (See Section 5.1 of RFC 1521 for complete details.)
Describes the media type and subtype of an entity-body. It uses the same values as the client's Accept header, and the server should return media types that conform with the client's preferred formats. For example:
See the discussion of media types in Chapter 3 for more information.
The ETag header defines the entity tag for use with the If-Match and If-None-Match request headers. See the discussion of client caching in Chapter 3 for more information.
Specifies the time when a document may change, or when its information becomes invalid. After that time, the document may or may not change or be deleted. The value is a date and time in a valid format as described for the Date header. For example:
Expires: Sat, 20 May 1995 03:32:38 GMT
This is useful for cache management. The Expires header means that it is unlikely that the document will change before the given time. This does not imply that the document will be changed or deleted at that time. It is only an advisory that the document will not be modified until the specified time.
See the discussion on client caching in Chapter 3 for more information.
Specifies when the specified URL was last modified. The value is a date and time in a valid format as described for the Date header. If a client has a copy of the URL in its cache that is older than the last-modified date, it should be refreshed. See the discussion on client caching in Chapter 3 for more information. For example:
Last-Modified: Sat, 20 May 1995 03:32:38 GMT
The Location header specifies the new location of a document, usually with response code 201 (Created), 301 (Moved Permanently), or 302 (Moved Temporarily). The URL given must be written as an absolute URL. For example:
The URI header specifies the new location of a document, usually with response code 201 (Created), 301 (Moved Permanently), or 302 (Moved Temporarily). For example:
An optional vary parameter may also be used in this header, indicating multiple documents at the URI in the following categories: type, language, version, encoding, charset, and user-agent. Sending these parameters in a server response prompts the client to specify its preferences appropriately in the new request. The use of the URI header is deprecated in HTTP 1.1 in favor of the Location, Content-Location, and Vary headers.
Summary of Support Across HTTP Versions
The following is a listing of all HTTP headers supported by each version of HTTP so far.
© 2001, O'Reilly & Associates, Inc.