Web Client Programming with Perl

Automating Tasks on the Web

By Clinton Wong
1st Edition March 1997

This book is out of print, but it has been made available online through the O'Reilly Open Books Project.

Appendix A
HTTP Headers

HTTP headers are used to transfer all sorts of information between client and server. There are four categories of headers:

General

Information not related to the client, server, or HTTP

Request

Preferred document formats and server parameters

Response

Information about the server sending the response

Entity

Information on the data being sent between the client and server

General headers and entity headers are the same for both the server and client.

All headers in HTTP messages contain the header name followed by a colon (:), then a space, and the value of the header. Header names are case-insensitive (thus, Content-Type is the same as Content-type). The value of a header can extend over multiple lines by preceding each extra line with at least one space or tab.
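The header syntax described above can be illustrated with a minimal Python sketch. The function name parse_headers and the sample header block are assumptions made for illustration. The sketch also keeps only the last value when a header name repeats, which is a simplification.

```python
def parse_headers(raw: str) -> dict:
    """Parse a CRLF-separated header block into a dict keyed by lowercase name."""
    headers = {}
    last_name = None
    for line in raw.split("\r\n"):
        if not line:
            break  # a blank line ends the header block
        if line[0] in " \t" and last_name is not None:
            # Continuation line: it starts with a space or tab, so it
            # extends the value of the previous header.
            headers[last_name] += " " + line.strip()
            continue
        name, _, value = line.partition(":")
        last_name = name.strip().lower()  # header names are case-insensitive
        headers[last_name] = value.strip()
    return headers


# Hypothetical header block with one folded (multi-line) value.
raw = (
    "Content-type: text/html\r\n"
    "Accept: text/*,\r\n"
    "\timage/gif\r\n"
    "\r\n"
)
parsed = parse_headers(raw)
print(parsed["content-type"])  # text/html
print(parsed["accept"])        # text/*, image/gif
```

Storing names in lowercase means that Content-Type and Content-type end up under the same key.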

This chapter covers the most recent draft of the HTTP 1.1 specification that was available at publication time (draft 7), as well as some headers that are not in the spec but are in common use regardless.

General Headers

General headers are used in both client requests and server responses. Some may be more specific to either a client or server message.

Cache-Control: directives

The Cache-control header specifies desired behavior from a caching system, as used in proxy servers. For example:

Cache-control: no-cache

Both clients and servers can use the Cache-control header to specify parameters for the cache or to request certain kinds of documents from the cache. The caching directives are specified in a comma-separated list.

Cache request directives are:

no-cache

Do not cache. The proxy should not send a cached copy of the document and should always request and return the newest copy from the origin-server. The response from the server must not be cached by a proxy.

no-store

Remove information promptly after forwarding. The cache should not store anything about the client request or server response. This option prevents the accidental storing of secure or sensitive information in the cache.

max-age = seconds

Do not send responses older than seconds. The cache can send a cached document that has been retrieved within a certain number of seconds from the time it was sent by the origin server.

max-stale [ = seconds ]

The cache can send a cached document that is older than its expiration date. If seconds are given, it must not be expired by more than that time.

min-fresh = seconds

Send data only if still fresh after the specified number of seconds. The cache can send a cached document only if there are at least a certain number of seconds between now and its expiration time.


only-if-cached

Do not retrieve new data. The cache can send a document only if it is in the cache, and should not contact the origin-server to see if a newer copy exists. This option is useful when network connectivity from the cache to origin-server is poor.

Cache response directives are:

public

The document is cacheable by any cache.

private

The document is not cacheable by a shared cache.

no-cache

Do not cache the returning document. This prevents caches from returning requested documents when they are stale.

no-store

Do not store the returning document. Remove information promptly after forwarding.

no-transform

Do not convert the entity-body. Useful for applications that require that the message received is exactly what was sent by the server.

must-revalidate

The cache must verify the status of stale documents, i.e., the cache cannot blindly use a document that has expired.

proxy-revalidate

Client must revalidate data except for private client caches. Public caches must verify the status of stale documents. Like must-revalidate, excluding private caches.

max-age= seconds

The document should be considered stale in the specified number of seconds from the time of retrieval.
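As a rough illustration of the comma-separated directive list, the following Python sketch splits a Cache-Control value into directive names and optional arguments. The function name parse_cache_control is an assumption. The sketch does not handle quoted arguments that themselves contain commas.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, arg = item.partition("=")
        # Directives such as max-age carry an argument; others stand alone.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


request = parse_cache_control("max-age=60, max-stale")
response = parse_cache_control("private, max-age=3600, must-revalidate")
print(request)   # {'max-age': '60', 'max-stale': None}
print(response)  # {'private': None, 'max-age': '3600', 'must-revalidate': None}
```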

Connection: options

Specifies options desired for this connection but not for further connections by proxies. For example:

Connection: close

The close option signifies that either the client or server wishes to end the connection (i.e., this is the last transaction). The keep-alive option signifies that the client wishes to keep the connection open. The default behavior of web applications differs between HTTP 1.0 and 1.1.

By default, HTTP 1.1 uses persistent connections, where the connection does not automatically close after a transaction. When an HTTP 1.1 web client no longer has any requests, or the server has reached some preprogrammed limit in spending resources on the client, a Connection: close header indicates that no more transactions will proceed, and the connection closes after the current one. An HTTP 1.1 client or server that doesn't support persistent connections should always use the Connection: close header.

HTTP 1.0, on the other hand, does not have persistent connections by default. If a 1.0 client wishes to use persistent connections, it uses the keep-alive parameter. A Connection: keep-alive header is issued by both HTTP 1.0 clients and servers for each transaction under persistent connections. The last transaction does not have a Connection: keep-alive header, and behaves like a Connection: close header under HTTP 1.1. HTTP 1.0 servers that do not support persistent connections will not have a Connection: keep-alive header in their response, and the client should disconnect after the first transaction completes.
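The different defaults in HTTP 1.0 and 1.1 can be summarized in a short Python sketch. The function connection_persists is a hypothetical helper, not part of any library. It only models the rules described above and ignores the proxy issues.

```python
def connection_persists(http_version: str, connection_header: str = "") -> bool:
    """Decide whether the connection stays open after the current transaction."""
    options = {
        opt.strip().lower()
        for opt in connection_header.split(",")
        if opt.strip()
    }
    if http_version == "HTTP/1.1":
        # HTTP 1.1: persistent unless one side sends Connection: close.
        return "close" not in options
    # HTTP 1.0: not persistent unless Connection: keep-alive is sent.
    return "keep-alive" in options


print(connection_persists("HTTP/1.1"))                # True
print(connection_persists("HTTP/1.1", "close"))       # False
print(connection_persists("HTTP/1.0"))                # False
print(connection_persists("HTTP/1.0", "Keep-Alive"))  # True
```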

Use of the keep-alive parameter is known to cause problems with proxy servers that do not understand persistent connections for HTTP 1.0. If a proxy server blindly forwards the Connection: keep-alive header, the origin-server and initial client are using persistent connections while the proxy server is not. The origin server maintains the network connection when the proxy server expects a disconnect; timing problems follow.

See Chapter 3, Learning HTTP, for more information on persistent connections.

Date: dateformat

There are three formats that can be used to express the date. The preferred date format is RFC 1123. For example:

Mon, 06 May 1996 04:57:00 GMT

The preferred RFC 1123 format specifies all dates in a fixed length string in Greenwich Mean Time (GMT). GMT is always used in HTTP to prevent any misunderstandings among computers communicating in different time zones. The valid days are: Mon, Tue, Wed, Thu, Fri, Sat, and Sun. The months are: Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, and Dec. A single-digit day is written with a leading zero, which keeps the string length fixed.

For backwards compatibility, the RFC 850 and ANSI C asctime( ) formats are also acceptable:

Monday, 06-May-96 04:57:00 GMT
Mon May  6 04:57:00 1996

The RFC 850 format (RFC 850 has since been obsoleted by RFC 1036) is similar to the one in RFC 1123, except that the string length varies, depending on the day of the week, and the year is specified in two digits instead of four. This makes date parsing more difficult. It is recommended that web clients use the RFC 1123 format instead of this one. The valid days are: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday. The months are: Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, and Dec. A single-digit day is written with a leading zero.

ANSI C's asctime( ) format is not encouraged, since it does not state a time zone and there can be misunderstandings about the time zone used by the computer. The valid days are: Mon, Tue, Wed, Thu, Fri, Sat, and Sun. The months are: Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, and Dec. A single-digit day is padded with a leading space instead of a zero.

Despite a heavy preference for RFC 1123's format, current web clients and servers should be able to recognize all three formats. However, when designing web programs, it is desirable to use RFC 1123 when generating dates. Future versions of HTTP may not support the latter two formats.
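A client that accepts all three formats but generates only the preferred one might look like the following Python sketch. The names parse_http_date and format_http_date are assumptions. The sketch relies on the default C locale for day and month names, and it treats the asctime( ) form as GMT.

```python
from datetime import datetime, timezone

# One strptime pattern per accepted date format.
FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",  # RFC 1123 (preferred)
    "%A, %d-%b-%y %H:%M:%S GMT",  # RFC 850
    "%a %b %d %H:%M:%S %Y",       # ANSI C asctime()
)


def parse_http_date(text: str) -> datetime:
    """Try each known format in turn and return a GMT-aware datetime."""
    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
        return parsed.replace(tzinfo=timezone.utc)
    raise ValueError("unrecognized HTTP date: %r" % text)


def format_http_date(moment: datetime) -> str:
    """Always generate the preferred RFC 1123 form."""
    return moment.astimezone(timezone.utc).strftime("%a, %d %b %Y %H:%M:%S GMT")


for sample in (
    "Mon, 06 May 1996 04:57:00 GMT",
    "Monday, 06-May-96 04:57:00 GMT",
    "Mon May  6 04:57:00 1996",
):
    print(format_http_date(parse_http_date(sample)))
# Each line prints: Mon, 06 May 1996 04:57:00 GMT
```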

MIME-Version: version

The MIME-Version header specifies the version of MIME (Multipurpose Internet Mail Extensions) used in the HTTP transaction. This header indicates that the entity-body conforms to a particular version of MIME. If the transaction involves MIME-encoded data, but this header is omitted, the default value is assumed to be 1.0.

Unfortunately, some servers use this header for all transactions, regardless of the entity-body's actual format. For this reason, the HTTP/1.0 protocol suggests that this header should be ignored. If this header is encountered, the entity-body may not have any MIME messages. For example:

MIME-version: 1.0

Pragma: no-cache

The Pragma header specifies directives for proxy and gateway systems. Since many proxy systems may exist between a client and server, Pragma headers must pass through each proxy. When the Pragma header reaches the server, the header may be ignored by the server software.

The only directive defined in HTTP/1.0 is the no-cache directive. It is used to tell caching proxies to contact the server for the requested document, instead of using its local cache. This allows the client to request the most up-to-date document from the original web server, without receiving a cached copy from an intermediate proxy server.

The Pragma header is an HTTP 1.0 feature, and is maintained in HTTP 1.1 for backward compatibility. No new Pragma directives will be defined in the future. For example:

Pragma: no-cache

Transfer-Encoding: encoding_type

The Transfer-Encoding header specifies that the message is encoded. This is not the same as content-encoding (an entity-body header, discussed later), since transfer-encodings are a property of the message, not of the entity-body. For example:

Transfer-Encoding: chunked

In the HTTP 1.1 specification, chunked is the only encoding method supported.

The chunked transfer-encoding encodes the message as a series of chunks followed by entity-headers, as shown in Figure A-1. The chunks and entity-headers are in a client's request entity-body or server response entity-body. Each chunk contains a chunk size specified in base 16, followed by CRLF. After that, the chunk body, whose length is specified in the chunk size, is presented, followed by a CRLF. Consecutive chunks are specified one after another, with the last chunk having a length of zero followed by CRLF. Entity-headers follow the chunks, terminated by a CRLF on a line by itself.

Figure A-1. Chunked transfer encoding
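The chunk layout described above can be sketched in Python as follows. The function names and the chunk size of 8 bytes are arbitrary choices for illustration. The encoder emits no trailing entity-headers, and the decoder ignores any that follow the last chunk.

```python
def encode_chunked(body: bytes, chunk_size: int = 8) -> bytes:
    """Encode body as a series of chunks ending with a zero-length chunk."""
    out = bytearray()
    for start in range(0, len(body), chunk_size):
        chunk = body[start:start + chunk_size]
        # Chunk size in base 16, CRLF, chunk body, CRLF.
        out += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    out += b"0\r\n"  # the last chunk has a length of zero
    out += b"\r\n"   # no entity-headers here, only the terminating CRLF
    return bytes(out)


def decode_chunked(data: bytes) -> bytes:
    """Reassemble the original body from chunked data."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size = int(data[pos:line_end].decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            break  # zero-length chunk: end of the chunks
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk body and its trailing CRLF
    return bytes(body)


encoded = encode_chunked(b"Hello, chunked world!")
print(encoded)
# b'8\r\nHello, c\r\n8\r\nhunked w\r\n5\r\norld!\r\n0\r\n\r\n'
print(decode_chunked(encoded))  # b'Hello, chunked world!'
```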


Upgrade: protocol/version

Using the Upgrade header, the client can list additional protocols that it understands and would prefer to use when talking to the server. If the server wishes to use one of these alternate protocols, it returns a response code of 101 and indicates which protocol it is upgrading to, with the Upgrade header. After the terminating CRLF in the server's header response, the protocol switches.

Portion of client request:

Upgrade: HTTP/1.2

Portion of server response:

HTTP/1.1 101 Upgrading Protocols
Upgrade: HTTP/1.2

Via: protocol host

The Via header is updated by proxy servers as messages pass from client to server and from server to client. Each proxy server appends its protocol and protocol version, hostname, port number, and comment to a comma-separated list on the Via header. If the Via header does not exist, the first proxy creates it. This information is useful for debugging purposes. If the protocol name is HTTP, it can be omitted. For HTTP, a port number of 80 can be omitted. Comments are optional.


Via: 1.1, 1.0

See the discussion of the TRACE method in Chapter 3 for more information.

Client Request Headers

Client header data communicates the client's configuration and preferred document formats to the server. Request headers are used in a client message to provide information about the client.

Accept: type/subtype qvalue

Specifies media types that the client prefers to accept. For example:

Accept: text/*, image/gif

Multiple media types can be listed separated by commas. The optional qvalue represents, on a scale of 0 to 1, an acceptable quality level for accept types. See Appendix B, Reference Tables, for a listing of some commonly-accepted media types. See the section "Media Types" in Chapter 3 for more information.
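To show how a server might read the qvalue, here is a small Python sketch that orders the media types in an Accept value by preference. The function name parse_accept is an assumption, and the q= parameter syntax follows the HTTP 1.1 specification. A type without a qvalue is treated as 1.

```python
def parse_accept(value: str) -> list:
    """Return (media_type, qvalue) pairs, most preferred first."""
    preferences = []
    for item in value.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0]
        if not media_type:
            continue
        quality = 1.0  # default when no qvalue is given
        for param in parts[1:]:
            name, _, val = param.partition("=")
            if name.strip().lower() == "q":
                quality = float(val)
        preferences.append((media_type, quality))
    # sort() is stable, so equal qvalues keep their original order.
    preferences.sort(key=lambda pair: pair[1], reverse=True)
    return preferences


print(parse_accept("text/*;q=0.5, image/gif"))
# [('image/gif', 1.0), ('text/*', 0.5)]
```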

Accept-Charset: character_set qvalue

Specifies the character sets that the client prefers. Multiple character sets can be listed separated by commas. The optional qvalue represents, on a scale of 0 to 1, an acceptable quality level for nonpreferred character sets. If this header is not specified, the server assumes the default of US-ASCII and ISO-8859-1 (a superset of US-ASCII), which are both specified in RFC 1521. For a list of character sets, refer to Appendix B. For example:

Accept-charset: ISO-8859-7

Accept-Encoding: encoding_types

Through the Accept-Encoding header, a client may specify what encoding algorithms it understands. If this header is omitted, the server will send the requested entity-body without any additional encoding. Encoding mechanisms can be used to reduce consumption of scarce resources, at the expense of less expensive resources. For example, large files may be compressed to reduce transmission time over slow network connections.

In the HTTP/1.0 specification, two encoding mechanisms are defined: x-gzip and x-compress. Multiple encoding schemes can be listed, separated by commas. For reasons of compatibility with historical practice, gzip and compress should be considered the same as x-gzip and x-compress.

Encoding Mechanism    Encoded By
x-gzip                Jean-Loup Gailly's GNU zip compression scheme
x-compress            Modified Lempel-Ziv compression scheme

For example:

Accept-encoding: x-gzip

There is no guarantee that the requested encoding mechanism has been applied to the entity-body returned by the server. If the client specifies an Accept-encoding header, it should examine the server's Content-encoding header to see if an encoding mechanism was applied. If the Content-encoding header has been omitted, no encoding mechanism was applied.

Accept-Language: language qvalue

Specifies the languages that the client prefers. If a server holds the same document in multiple languages, it will send the document in the client's preferred language, when available. For example:

Accept-language: en

Multiple languages can be listed separated by commas. The optional qvalue represents, on a scale of 0 to 1, an acceptable quality level for nonpreferred languages. Languages are written with their two-letter abbreviations (e.g., en for English, de for German, fr for French, etc.). See Appendix B for a listing of languages.

Authorization: scheme credentials

Provides the client's authorization to access data at a URI. When a requested document requires authorization, the server returns a WWW-Authenticate header describing the type of authorization required. The client then repeats the request with the proper authorization information.

The HTTP/1.0 specification defines the BASIC authorization scheme, where the authorization parameter is the string of username:password encoded in base 64. For example, for the username of "webmaster" and a password of "zrma4v", the authorization header would look like this:

Authorization: BASIC d2VibWFzdGVyOnpybWE0dg==

The value decodes into webmaster:zrma4v.
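The encoding step can be reproduced with Python's standard base64 module, as in the sketch below. The helper names are assumptions. Base 64 is an encoding rather than encryption, so anyone who sees the header can recover the password.

```python
import base64


def basic_credentials(username: str, password: str) -> str:
    """Join username and password with a colon and encode in base 64."""
    raw = ("%s:%s" % (username, password)).encode("ascii")
    return base64.b64encode(raw).decode("ascii")


def decode_basic_credentials(encoded: str) -> tuple:
    """Reverse the encoding; the password may itself contain colons."""
    username, _, password = base64.b64decode(encoded).decode("ascii").partition(":")
    return username, password


token = basic_credentials("webmaster", "zrma4v")
print("Authorization: BASIC " + token)
print(decode_basic_credentials(token))  # ('webmaster', 'zrma4v')
```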

See Chapter 3 for more information on using the Authorization header.

Cookie: name=value

Contains a name/value pair of information stored for that URL. For example:

Cookie: acct=03847732

Multiple cookies can be specified, separated by semicolons. This header is used by browsers that support Netscape persistent cookies, which are not included in the HTTP standard. See Chapter 3 for more information on cookies.

An issue arises with proxy servers in regard to the headers. Both the Set-Cookie and Cookie headers should be propagated through the proxy, even if a page is cached or has not been modified (according to the If-Modified-Since condition). The Set-Cookie header should also never be cached by the proxy.

From: email_address

Gives the email address of the user executing the client. The From header helps the server identify the source of malformed requests or excessive resource usage.


This header should be sent when possible, but should not be sent without the user's consent, in the interest of privacy. However, when running clients that use excessive network or server resources, it is advisable to include this header, in the event that an administrator would like to contact the client user.

Host: hostname port

The hostname and port number of the server contacted by the client. Useful for software multihoming. For example:

Host: 80

Clients must supply this information in HTTP 1.1, so servers with multiple hostnames can easily differentiate between ambiguous URLs.

If-Modified-Since: date

Specifies that the URI data is to be sent only if it has been modified since the date given as the value of this header. This is useful for client-side caching. For example:

If-Modified-Since: Sat, 04 May 1996 12:17:34 GMT

If the document has not been modified, the server returns a code of 304, indicating that the client should use the local copy. The specified date should follow the format described under the Date header. See the "Client Caching" section in Chapter 3 for more information.

If-Match: entity_tag

A conditional requesting the entity only if it matches the given entity tags (see the ETag entity header). An asterisk ( * ) matches any entity, and the transaction continues only if the entity exists. See the "Client Caching" section in Chapter 3 for more information.

If-None-Match: entity_tag

A conditional requesting the entity only if it does not match any of the given entity tags (see the ETag entity header). An asterisk ( * ) matches any entity; if the entity doesn't exist, the transaction continues. See the "Client Caching" section in Chapter 3 for more information.

If-Range: entity_tag date

A conditional requesting only the portion of the entity that is missing, if it has not been changed, and the entire entity if it has. Used in conjunction with the Range header to indicate the entity tag or last modified time of a document on the server. For example:

If-Range: Sat, 04 May 1996 12:17:34 GMT

If the document has not been modified, the server returns the byte range given by the Range header; otherwise, it returns all of the new document. Either an entity tag or a date can be used to identify the partial entity already received; see the Date header for information on the format for dates. See the section "Retrieving Content" in Chapter 3 for more information.

If-Unmodified-Since: date

Specifies that the entity-body should be sent only if the document has not been modified since a given date. For example:

If-Unmodified-Since: Sun, 05 May 1996 04:03:56 GMT

The specified date should follow the format described under the Date header. See the "Client Caching" section in Chapter 3 for more information.

Max-Forwards: n

Limits the number of proxies or gateways that can forward the request. Useful for debugging with the TRACE method, avoiding infinite loops. For example:

Max-Forwards: 3

A proxy server that receives a Max-Forwards value of zero (0) should return the request headers to the client in its response entity-body. See the discussion of the TRACE method in Chapter 3 for more information.

Proxy-Authorization: credentials

Used for a client to identify itself to a proxy requiring authorization.

Range: bytes= n-m

Specifies the partial range(s) requested from the document. For example:

Range: bytes=1024-2047,4096-

Multiple ranges can be listed, separated by commas. If the first number in a byte range is missing, the range is assumed to count from the end of the document. If the second number is missing, the range is byte n to the end of the document. The first byte is byte 0. See Chapter 3 for more information.
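The three forms of byte range can be resolved against a known document length as in the following Python sketch. The function resolve_ranges and the document length of 5000 bytes are assumptions for illustration. Ranges that fall entirely outside the document are dropped.

```python
def resolve_ranges(value: str, length: int) -> list:
    """Turn a Range value into inclusive (first, last) byte positions."""
    unit, _, specs = value.partition("=")
    if unit.strip() != "bytes":
        raise ValueError("only byte ranges are handled here")
    resolved = []
    for spec in specs.split(","):
        first, _, last = spec.strip().partition("-")
        if first == "":
            # "-n": the last n bytes of the document.
            start, end = max(length - int(last), 0), length - 1
        elif last == "":
            # "n-": from byte n to the end of the document.
            start, end = int(first), length - 1
        else:
            # "n-m": clip the end to the last byte that exists.
            start, end = int(first), min(int(last), length - 1)
        if start <= end:
            resolved.append((start, end))
    return resolved


print(resolve_ranges("bytes=1024-2047,4096-", 5000))
# [(1024, 2047), (4096, 4999)]
print(resolve_ranges("bytes=-500", 5000))
# [(4500, 4999)]
```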

Referer: url

Gives the URL of the document that refers to the requested URL (i.e., the source document of the link).


See Chapter 3 for more information.

User-Agent: string

Gives identifying information about the client program. For example:

User-Agent: Mozilla 3.0b 

See Chapter 3 for more information.

Server Response Headers

The response headers described here are used in server responses to communicate information about the server and how it may handle requests.

Accept-Ranges: bytes|none

Indicates the acceptance of range requests for a URI, specifying either the range unit (e.g., bytes) or none if no range requests are accepted. For example:

Accept-Ranges: bytes

Age: seconds

Indicates the age of the document in seconds. For example:

Age: 3521

Proxy-Authenticate: scheme realm

Indicates the authentication scheme and parameters applicable to the proxy for this URI and the current connection. Used with response 407 (Proxy Authentication Required).

Public: methods

Indicates methods supported by the server as a comma-separated list. Intended for declaration of nonstandard methods supported at this site.


For methods applicable only to an individual URI, see the Allow header.

Retry-After: date|seconds

Specifies a time when the server can handle requests. Used with response code 503 (Service Unavailable). It contains either an integer number of seconds or a GMT date and time (as described by the Date header formats). If the value is an integer, it is interpreted as the number of seconds to wait after the request was issued. For example:

Retry-After: 3600
Retry-After: Sat, 18 May 1996 06:59:37 GMT
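A client could turn either form of the value into a number of seconds to wait, as in this Python sketch. The function retry_delay is hypothetical. It takes the current time as a parameter so the result is reproducible, and it accepts only the RFC 1123 date format.

```python
from datetime import datetime, timezone


def retry_delay(value: str, now: datetime) -> int:
    """Return how many seconds to wait before retrying the request."""
    value = value.strip()
    if value.isdigit():
        return int(value)  # integer form: a number of seconds
    # Date form: the wait is the gap between the given time and now.
    when = datetime.strptime(value, "%a, %d %b %Y %H:%M:%S GMT")
    when = when.replace(tzinfo=timezone.utc)
    return max(0, int((when - now).total_seconds()))


# Assumed "current time" one hour before the date in the example header.
now = datetime(1996, 5, 18, 5, 59, 37, tzinfo=timezone.utc)
print(retry_delay("3600", now))                           # 3600
print(retry_delay("Sat, 18 May 1996 06:59:37 GMT", now))  # 3600
```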

Server: string

Contains the name and version number of the server. For example:

Server: NCSA/1.3

If security holes are discovered in a particular server, the Server header information may be used to indicate a site's vulnerability. For that reason, it's a good idea for servers to make it easy for administrators to suppress sending this header in the server configuration, if their server has a well-known bug.

Set-Cookie: name=value options

Contains a name/value pair of information to retain for this URL. This header is used by browsers that support Netscape persistent cookies, which are not included in the HTTP standard. For example:

Set-Cookie: acct=03845324

Options are:



expires = date

The cookie becomes invalid after the specified date.

path = pathname

The URL range for which the cookie is valid.

domain = domain_name

The domain name range for which the cookie is valid.


secure

Return the cookie only under a secure connection.

Vary: headers

Specifies that the entity has multiple sources and may therefore vary according to the specified list of request header(s). For example:

Vary: Accept-Language,Accept-Encoding

Multiple headers can be listed, separated by commas. An asterisk ( * ) means that another factor, other than the request headers, may affect the document that is returned.

Warning: code host string

Indicates information additional to that in the status code, for use by caching proxies. For example:

Warning: Response stale 

The host field contains the name or pseudonym of the server host, with an optional port number. The two-digit warning codes and their recommended descriptive strings are:

10 Response stale

The response data is known to be stale.

11 Revalidation failed

The response data is known to be stale because the proxy failed to revalidate the data.

12 Disconnected operation

The cache is disconnected from the network.

13 Heuristic expiration

The data is older than 24 hours, and the cache heuristically chose a freshness lifetime greater than 24 hours.

14 Transformation applied

The proxy has changed the encoding or media type of the document, as specified by the Content-Encoding or Content-Type headers.

99 Miscellaneous warning

Arbitrary information to be logged or presented to the user.

WWW-Authenticate: scheme realm

A request for authentication, used with the 401 (Unauthorized) response code. It specifies the authorization scheme and realm of authorization required from a client at the requested URI. Many different authorization realms can exist on a server. A common authorization scheme is BASIC, which requires a username and password. For example:

WWW-Authenticate: BASIC realm="Admin"

When returned to the client, this header indicates that the BASIC type of authorization data in the appropriate realm should be returned in the client's Authorization header.

Entity Headers

Entity headers are used in both client requests and server responses. They supply information about the entity body in an HTTP message.

Allow: methods

Contains a comma-separated list of methods that are allowed at a specified URI. In a server response it is used with code 405 (Method Not Allowed) to inform the client of valid methods available for the requested information. For example:

Allow: GET, HEAD 

Some methods may not apply to a URL, and the server must verify that the methods supplied by the client make sense with the given URL.

Content-Base: url

Specifies the base URL for resolving relative URLs. The base URL must be written as an absolute URL.


Content-Encoding: encoding_schemes

Specifies the encoding scheme(s) used for the transferred entity-body. Values are gzip (or x-gzip) and compress (or x-compress). If multiple encoding schemes are specified (in a comma-separated list), they must be listed in the order in which they were applied to the source data.

The server should attempt to use an encoding scheme used by the client's Accept-Encoding header. The client may use this information to determine how to decode the document after it is transferred.

See the description of the Accept-Encoding header earlier in this appendix for a listing of possible values. For example:

Content-Encoding: x-gzip

Content-Language: languages

Specifies the language(s) that the transferred entity-body is intended for. Languages are represented by their two-letter abbreviations (e.g., en for English, fr for French). The server should attempt to use a language specified by the client's Accept-Language header. (See Appendix B for a listing of possible values.) This header is useful when a client specifies a preference for one language over another for a given URL. For example:

Content-Language: fr

Content-Length: n

This header specifies the length of the data (in bytes) of the transferred entity-body. For example:

Content-Length: 47293

Due to the dynamic nature of some requests, the content length is sometimes unknown and this header is omitted.

Content-Location: url

Supplies the URL for the entity, in cases where a document has multiple entities with separately accessible locations. The URL can be either an absolute or relative URL.


See the section "Retrieving Content" in Chapter 3 for more information.

Content-MD5: digest

Supplies an MD5 digest of the entity, for checking the integrity of the message upon receipt.
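An integrity check of this kind might be sketched in Python as follows. The function names are assumptions. The sketch assumes that the header value is the base 64 encoding of the 128-bit MD5 digest, as defined in RFC 1864.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Compute a Content-MD5 value: the base 64 form of the MD5 digest."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def body_is_intact(body: bytes, header_value: str) -> bool:
    """Compare the received digest with one computed from the body."""
    return content_md5(body) == header_value.strip()


# Hypothetical entity-body used only for this example.
body = b"<html><body>Hello</body></html>"
digest = content_md5(body)
print("Content-MD5: " + digest)
print(body_is_intact(body, digest))         # True
print(body_is_intact(body + b"x", digest))  # False
```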

Content-Range: bytes n-n/m

Specifies where the accompanying partial entity-body should be inserted, and the total size of the full entity-body. For example:

Content-Range: bytes 6143-7166/15339 

See the section "Retrieving Content" in Chapter 3 for more information.

Content-Transfer-Encoding: scheme

Specifies any transformations that occurred to the data for transport over the network. For example:

Content-Transfer-Encoding: base64

Between web servers and clients, this header is usually not needed, since no encoding is needed. Possible encoding schemes are:

7bit

Data represented by short lines of US-ASCII data.

8bit

Data represented by short lines, but may contain non-ASCII data. (High-order bit may be set.)

binary

Data may not be in short lines, and can be non-ASCII characters.

base64

Data is encoded in base64 ASCII. (See Section 5.2 of RFC 1521 for details.)

quoted-printable

Special characters replaced with an equal sign (=) followed by the ASCII value in hex. (See Section 5.1 of RFC 1521 for complete details.)

Content-Type: type/subtype

Describes the media type and subtype of an entity-body. It uses the same values as the client's Accept header, and the server should return media types that conform with the client's preferred formats. For example:

Content-type: text/html

See the discussion of media types in Chapter 3 for more information.

ETag: entity_tag

Defines the entity tag for use with the If-Match and If-None-Match request headers. See the discussion of client caching in Chapter 3 for more information.

Expires: date

Specifies the time when a document may change, or when its information becomes invalid. After that time, the document may or may not change or be deleted. The value is a date and time in a valid format as described for the Date header. For example:

Expires: Sat, 20 May 1995 03:32:38 GMT

This is useful for cache management. The Expires header means that it is unlikely that the document will change before the given time. This does not imply that the document will be changed or deleted at that time. It is only an advisory that the document will not be modified until the specified time.

See the discussion on client caching in Chapter 3 for more information.

Last-Modified: date

Specifies when the specified URL was last modified. The value is a date and time in a valid format as described for the Date header. If a client has a copy of the URL in its cache that is older than the last-modified date, it should be refreshed. See the discussion on client caching in Chapter 3 for more information. For example:

Last-Modified: Sat, 20 May 1995 03:32:38 GMT

Location: url

Specifies the new location of a document, usually with response code 201 (Created), 301 (Moved Permanently), or 302 (Moved Temporarily). The URL given must be written as an absolute URL.


URI: uri

Specifies the new location of a document, usually with response code 201 (Created), 301 (Moved Permanently), or 302 (Moved Temporarily). For example:

URI: <>

An optional vary parameter may also be used in this header, indicating multiple documents at the URI in the following categories: type, language, version, encoding, charset, and user-agent. Sending these parameters in a server response prompts the client to specify its preferences appropriately in the new request. The use of the URI header is deprecated in HTTP 1.1 in favor of the Location, Content-Location, and Vary headers.

Summary of Support Across HTTP Versions

The following is a listing of all HTTP headers supported by each version of HTTP so far.

[Table: headers supported in HTTP 0.9, HTTP 1.0, and HTTP 1.1. The listings are missing from this copy.]


© 2001, O'Reilly & Associates, Inc.