Appendix B: Reference Tables

This appendix contains several tables that will be useful when negotiating HTTP content. Covered in this appendix are:

Media Types

Whenever an entity-body is sent via HTTP, a media type must be sent using the Content-type header. Also, web clients can use the Accept header to define which media types the client can handle.

Character Encoding

In URL-encoded data (as described in Chapter 3, Learning HTTP), any “special” characters such as spaces and punctuation must be encoded with a % escape sequence.

Languages

Entity-bodies can be sent with a Content-language header, to declare what language the entity is written in. Clients can declare which languages they can handle, using the Accept-language header.

Character Sets

Clients can use the Accept-charset header to declare which character sets they are capable of handling.

Media Types

Listed below are media types that are registered with the Internet Assigned Numbers Authority (IANA). According to the HTTP specification, use of nonregistered media types is discouraged.

The IANA media list is available in RFC 1700. A more readable document describing the assigned media types is available at ftp://ftp.isi.edu/in-notes/iana/assignments/media-types/.

Several methods can be used to identify the media type of a document. The easiest method, but the least accurate, is to map well-known file extensions to media types. For example, a file whose name ends in “.GIF” would map to “image/gif”. In practice, however, nothing verifies that the file is in fact a GIF file.
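As a minimal sketch of the extension-mapping approach, the following Python example uses the standard mimetypes module. The helper name guess_media_type and the fallback of application/octet-stream are assumptions made for illustration, not something this appendix prescribes.

```python
import mimetypes


def guess_media_type(filename, default="application/octet-stream"):
    """Map a filename's extension to a media type.

    Only the name is examined; the file's contents are never checked,
    so a mislabeled file gets the wrong media type.
    """
    media_type, _encoding = mimetypes.guess_type(filename)
    # guess_type returns None for extensions it does not know about.
    return media_type or default


if __name__ == "__main__":
    for name in ("logo.GIF", "notes.txt", "mystery.qqqzzz"):
        print(name, "->", guess_media_type(name))
```

Because only the filename is consulted, a text file renamed to logo.GIF would still be reported as image/gif.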

A more accurate method examines the structure or data format of the file and maps it to a media type. For some media types, magic numbers make this possible. For example, all GIF files begin with the three uppercase ASCII letters “GIF”, and all JPEG files begin with the bytes 0xFFD8 (hexadecimal notation). This method, however, is more time-consuming.
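The magic-number check can be sketched in a few lines of Python. Only the two signatures mentioned above are included; the table MAGIC_NUMBERS and the function sniff_media_type are hypothetical names for this example, and a real detector would need many more signatures.

```python
from typing import Optional

# Leading byte signatures ("magic numbers") for the two formats named above.
MAGIC_NUMBERS = (
    (b"GIF", "image/gif"),        # the ASCII letters G, I, F
    (b"\xff\xd8", "image/jpeg"),  # the bytes 0xFF 0xD8
)


def sniff_media_type(data: bytes) -> Optional[str]:
    """Return a media type if the data starts with a known signature."""
    for signature, media_type in MAGIC_NUMBERS:
        if data.startswith(signature):
            return media_type
    return None  # unknown: the caller must fall back to another method


if __name__ == "__main__":
    print(sniff_media_type(b"GIF89a\x01\x00\x01\x00"))
    print(sniff_media_type(b"\xff\xd8\xff\xe0\x00\x10JFIF"))
    print(sniff_media_type(b"plain text"))
```

Only the first few bytes are needed for this check, but the file has to be opened and read, which is the extra cost compared with looking at the extension alone.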

Under some filesystems, media types may be mapped by examining the file type/creator attribute of the file. While this is easily achieved under MacOS's HFS, other filesystems (DOS, NTFS, BSD) do not have these file attributes.

Table B-1: Internet Media Types

Type Subtype
text plain
text richtext
text enriched
text tab-separated-values
text html
text sgml
multipart mixed
multipart alternative
multipart digest
multipart parallel
multipart appledouble
multipart header-set
multipart form-data
multipart related
multipart report
multipart voice-message
message rfc822
message partial
message external-body
message news
message http
application octet-stream
application postscript
application oda
application atomicmail
application andrew-inset
application slate
application wita
application dec-dx
application dca-rft
application activemessage
application rtf
application applefile
application mac-binhex40
application news-message-id
application news-transmission
application wordperfect5.1
application pdf
application zip
application macwriteii
application msword
application remote-printing
application mathematica
application cybercash
application commonground
application iges
application riscos
application eshop
application x400-bp
application sgml
application cals-1840
application vnd.framemaker
application vnd.mif
application vnd.ms-excel
application vnd.ms-powerpoint
application vnd.ms-project
application vnd.ms-works
application vnd.ms-tnef
application vnd.svd
application vnd.music-niff
application vnd.ms-artgalry
application vnd.truedoc
application vnd.koan
image jpeg
image gif
image ief
image g3fax
image tiff
image cgm
image naplps
image vnd.dwg
image vnd.svf
image vnd.dxf
audio basic
audio 32kadpcm
video mpeg
video quicktime
video vnd.vivo

Character Encoding

When the client sends data to a CGI program using the Content-type of application/x-www-form-urlencoded, certain special characters are encoded to eliminate ambiguity. Table B-2 shows which characters are transformed and which are not transformed. For more information on URLs, see RFC 1738.
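As an illustration, Python's standard urllib.parse module performs this encoding. The helper encode_form and the sample field values below are assumptions chosen for the example. Note that in form-encoded data a space is written as +, while other special characters become % escape sequences.

```python
from urllib.parse import quote_plus, unquote_plus, urlencode


def encode_form(fields):
    """Encode a mapping of form fields as application/x-www-form-urlencoded."""
    return urlencode(fields)


if __name__ == "__main__":
    # A single value: the space becomes "+", punctuation becomes %XX.
    print(quote_plus("Hello, World!"))

    # A whole form: name=value pairs joined by "&".
    body = encode_form({"name": "O'Reilly & Associates", "q": "100% Perl"})
    print(body)

    # Decoding reverses the transformation.
    print(unquote_plus("Hello%2C+World%21"))
```

The resulting string is what a client would send as the entity-body of a POST request, or append to a URL after the ? in a GET request.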

Table B-2: Character Encoding

Languages

A language tag has the form:

<primary-tag> <-subtag>

where zero or more subtags are allowed. The primary-tag specifies the language, and the subtag specifies parameters to the language, like dialect information, country identification, or script variations. RFC 1766 contains the complete documentation of languages and parameter usage. The key values for the primary-tag and subtag are outlined in Tables B-3 and B-4, respectively.

Examples:

de (German)
en (English)
en-us (English, USA)
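A language tag can be taken apart by splitting on the hyphen. The sketch below assumes tags are compared without regard to case, as RFC 1766 specifies; the function name parse_language_tag is hypothetical.

```python
def parse_language_tag(tag):
    """Split a language tag into (primary_tag, list_of_subtags).

    Tags are lowercased first because language tags are
    case-insensitive.
    """
    primary, *subtags = tag.strip().lower().split("-")
    return primary, subtags


if __name__ == "__main__":
    for tag in ("de", "en", "en-us", "EN-US"):
        print(tag, "->", parse_language_tag(tag))
```

A server comparing a client's Accept-language values against the languages it has available could use the primary tag alone as a fallback when no exact match on the full tag exists.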

Table B-3 lists the primary language tags as defined in ISO 639 and RFC 1766.

Table B-3: Primary Language Types

Primary Tag Language
aa Afar
ab Abkhazian
af Afrikaans
am Amharic
ar Arabic
as Assamese
ay Aymara
az Azerbaijani
ba Bashkir
be Byelorussian
bg Bulgarian
bh Bihari
bi Bislama
bn Bengali; Bangla
bo Tibetan
br Breton
ca Catalan
co Corsican
cs Czech
cy Welsh
da Danish
de German
dz Bhutani
el Greek
en English
eo Esperanto
es Spanish
et Estonian
eu Basque
fa Persian
fi Finnish
fj Fiji
fo Faeroese
fr French
fy Frisian
ga Irish
gd Scots Gaelic
gl Galician
gn Guarani
gu Gujarati
ha Hausa
he Hebrew
hi Hindi
hr Croatian
hu Hungarian
hy Armenian
ia Interlingua
id Indonesian
ie Interlingue
ik Inupiak
is Icelandic
it Italian
iu Inuktitut
iw Hebrew
ja Japanese
jw Javanese
ka Georgian
kk Kazakh
kl Greenlandic
km Cambodian
kn Kannada
ko Korean
ks Kashmiri
ku Kurdish
ky Kirghiz
la Latin
ln Lingala
lo Laothian
lt Lithuanian
lv Latvian, Lettish
mg Malagasy
mi Maori
mk Macedonian
ml Malayalam
mn Mongolian
mo Moldavian
mr Marathi
ms Malay
mt Maltese
my Burmese
na Nauru
ne Nepali
nl Dutch
no Norwegian
oc Occitan
om (Afan) Oromo
or Oriya
pa Punjabi
pl Polish
ps Pashto, Pushto
pt Portuguese
qu Quechua
rm Rhaeto-Romance
rn Kirundi
ro Romanian
ru Russian
rw Kinyarwanda
sa Sanskrit
sd Sindhi
sg Sangho
sh Serbo-Croatian
si Singhalese
sk Slovak
sl Slovenian
sm Samoan
sn Shona
so Somali
sq Albanian
sr Serbian
ss Siswati
st Sesotho
su Sundanese
sv Swedish
sw Swahili
ta Tamil
te Telugu
tg Tajik
th Thai
ti Tigrinya
tk Turkmen
tl Tagalog
tn Setswana
to Tonga
tr Turkish
ts Tsonga
tt Tatar
tw Twi
ug Uighur
uk Ukrainian
ur Urdu
uz Uzbek
vi Vietnamese
vo Volapuk
wo Wolof
xh Xhosa
yi Yiddish
yo Yoruba
za Zhuang
zh Chinese
zu Zulu

Table B-4 lists the language subtypes as defined in ISO 3166.

Table B-4: Language Subtypes

Subtype Country
AD Andorra
AE United Arab Emirates
AF Afghanistan
AG Antigua and Barbuda
AI Anguilla
AL Albania
AM Armenia
AN Netherlands Antilles
AO Angola
AQ Antarctica
AR Argentina
AS American Samoa
AT Austria
AU Australia
AW Aruba
AZ Azerbaijan
BA Bosnia-Herzegovina
BB Barbados
BD Bangladesh
BE Belgium
BF Burkina Faso
BG Bulgaria
BH Bahrain
BI Burundi
BJ Benin
BM Bermuda
BN Brunei Darussalam
BO Bolivia
BR Brazil
BS Bahamas
BT Bhutan
BV Bouvet Island
BW Botswana
BY Belarus
BZ Belize
CA Canada
CC Cocos (Keeling) Isl.
CF Central African Rep.
CG Congo
CH Switzerland
CI Ivory Coast
CK Cook Islands
CL Chile
CM Cameroon
CN China
CO Colombia
CR Costa Rica
CS Czechoslovakia
CU Cuba
CV Cape Verde
CX Christmas Island
CY Cyprus
CZ Czech Republic
DE Germany
DJ Djibouti
DK Denmark
DM Dominica
DO Dominican Republic
DZ Algeria
EC Ecuador
EE Estonia
EG Egypt
EH Western Sahara
ES Spain
ET Ethiopia
FI Finland
FJ Fiji
FK Falkland Isl. (Malvinas)
FM Micronesia
FO Faroe Islands
FR France
FX France (European Ter.)
GA Gabon
GB Great Britain (UK)
GD Grenada
GE Georgia
GH Ghana
GI Gibraltar
GL Greenland
GP Guadeloupe (Fr.)
GQ Equatorial Guinea
GF Guyana (Fr.)
GM Gambia
GN Guinea
GR Greece
GT Guatemala
GU Guam (US)
GW Guinea Bissau
GY Guyana
HK Hong Kong
HM Heard & McDonald Isl.
HN Honduras
HR Croatia
HT Haiti
HU Hungary
ID Indonesia
IE Ireland
IL Israel
IN India
IO British Indian O. Terr.
IQ Iraq
IR Iran
IS Iceland
IT Italy
JM Jamaica
JO Jordan
JP Japan
KE Kenya
KG Kirgistan
KH Cambodia
KI Kiribati
KM Comoros
KN St. Kitts Nevis Anguilla
KP Korea (North)
KR Korea (South)
KW Kuwait
KY Cayman Islands
KZ Kazachstan
LA Laos
LB Lebanon
LC Saint Lucia
LI Liechtenstein
LK Sri Lanka
LR Liberia
LS Lesotho
LT Lithuania
LU Luxembourg
LV Latvia
LY Libya
MA Morocco
MC Monaco
MD Moldavia
MG Madagascar
MH Marshall Islands
ML Mali
MM Myanmar
MN Mongolia
MO Macau
MP Northern Mariana Isl.
MQ Martinique (Fr.)
MR Mauritania
MS Montserrat
MT Malta
MU Mauritius
MV Maldives
MW Malawi
MX Mexico
MY Malaysia
MZ Mozambique
NA Namibia
NC New Caledonia (Fr.)
NE Niger
NF Norfolk Island
NG Nigeria
NI Nicaragua
NL Netherlands
NO Norway
NP Nepal
NR Nauru
NT Neutral Zone
NU Niue
NZ New Zealand
OM Oman
PA Panama
PE Peru
PF Polynesia (Fr.)
PG Papua New Guinea
PH Philippines
PK Pakistan
PL Poland
PM St. Pierre & Miquelon
PN Pitcairn
PT Portugal
PR Puerto Rico (US)
PW Palau
PY Paraguay
QA Qatar
RE Reunion (Fr.)
RO Romania
RU Russian Federation
RW Rwanda
SA Saudi Arabia
SB Solomon Islands
SC Seychelles
SD Sudan
SE Sweden
SG Singapore
SH St. Helena
SI Slovenia
SJ Svalbard & Jan Mayen Isl.
SK Slovak Republic
SL Sierra Leone
SM San Marino
SN Senegal
SO Somalia
SR Suriname
ST St. Tome and Principe
SU Soviet Union
SV El Salvador
SY Syria
SZ Swaziland
TC Turks & Caicos Islands
TD Chad
TF French Southern Terr.
TG Togo
TH Thailand
TJ Tadjikistan
TK Tokelau
TM Turkmenistan
TN Tunisia
TO Tonga
TP East Timor
TR Turkey
TT Trinidad & Tobago
TV Tuvalu
TW Taiwan
TZ Tanzania
UA Ukraine
UG Uganda
UK United Kingdom
UM US Minor Outlying Isl.
US United States
UY Uruguay
UZ Uzbekistan
VA Vatican City State
VC St.Vincent & Grenadines
VE Venezuela
VG Virgin Islands (British)
VI Virgin Islands (US)
VN Vietnam
VU Vanuatu
WF Wallis & Futuna Islands
WS Samoa
YE Yemen
YU Yugoslavia
ZA South Africa
ZM Zambia
ZR Zaire
ZW Zimbabwe

Character Sets

Table B-5 lists the character sets that may be used with the Accept-charset HTTP header and in the charset parameter of the Content-type header. This list does not describe all of the possible character sets that can appear in the headers. For a comprehensive list of character sets, their aliases, and pointers to more descriptive documents, refer to RFC 1700.
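As a rough sketch of how a client might declare character sets, the Python example below keeps only the names that the local installation can actually decode and then builds an Accept-charset header line from them. The function names and the sample charset names are assumptions for illustration, and the lookup relies on Python's own codec registry rather than the IANA list.

```python
import codecs


def supported_charsets(names):
    """Return the charset names that this Python installation recognizes."""
    usable = []
    for name in names:
        try:
            codecs.lookup(name)  # raises LookupError for unknown names
        except LookupError:
            continue
        usable.append(name)
    return usable


def accept_charset_header(names):
    """Build an Accept-charset header line from a list of charset names."""
    return "Accept-charset: " + ", ".join(names)


if __name__ == "__main__":
    wanted = ["iso-8859-1", "us-ascii", "no-such-charset"]
    print(accept_charset_header(supported_charsets(wanted)))
```

Filtering first means the client never advertises a character set that it could not decode if the server chose to send it.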

Table B-5: Character Sets
