For the most part, characters within documents that are not part of a tag are rendered as is by the browser. However, some characters have special meaning and are not directly rendered, while other characters can’t be typed into the source document from a conventional keyboard. Special characters need either a special name or a numeric character encoding for inclusion in a document.
As has become obvious in the discussion and examples leading up to this section, three characters in source documents have very special meaning: the less-than sign (<), the greater-than sign (>), and the ampersand (&). These characters delimit tags and special character references. They’ll confuse a browser if left dangling alone or with improper tag syntax, so you have to go out of your way to include their actual, literal characters in your documents.[26]
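The substitution an author, or a script that generates HTML, has to make for these three characters is mechanical. The following is a minimal Python sketch. The function name `escape_markup` is hypothetical. Python's standard `html.escape(s, quote=False)` does the same job.

```python
def escape_markup(text: str) -> str:
    """Replace the three markup-delimiting characters with entity references."""
    # Replace the ampersand first. Otherwise the '&' that begins '&lt;'
    # and '&gt;' would itself be escaped a second time.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    return text


if __name__ == "__main__":
    print(escape_markup("if a > b && c < d"))
    # → if a &gt; b &amp;&amp; c &lt; d
```

The order of the replacements is the one design point that matters. Escaping `&` last would turn an already-escaped `&lt;` into `&amp;lt;`.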
Similarly, you have to use a special encoding to include double quotation mark characters within a quoted string, or when you want to include a special character that doesn’t appear on your keyboard but is part of the ISO Latin-1 character set implemented and supported by most browsers.
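The same idea covers the other two cases. This Python sketch uses a hypothetical `encode_special` function. It writes a double quotation mark as the `&quot;` entity so that it can sit inside a double-quoted attribute value. It writes every character outside 7-bit ASCII, such as é or ©, as a numeric reference built from the character's code point. For characters in the Latin-1 range, that code point is the same as the character's Latin-1 position.

```python
def encode_special(text: str) -> str:
    """Encode double quotes and non-ASCII characters as character references."""
    out = []
    for ch in text:
        if ch == '"':
            # Named entity for the double quotation mark.
            out.append("&quot;")
        elif ord(ch) > 127:
            # Numeric reference: '&#', the decimal code point, then ';'.
            out.append(f"&#{ord(ch)};")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    value = encode_special('He said "Olé"')
    print(f'<img src="x.gif" alt="{value}">')
    # → <img src="x.gif" alt="He said &quot;Ol&#233;&quot;">
```

This sketch deliberately leaves `<`, `>`, and `&` untouched. A complete encoder would also escape those characters, starting with the ampersand.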
To include a special character in your document, enclose either its standard entity name or a pound sign (#) and its numeric position in the Latin-1 standard character set[27] inside a leading ampersand and an ending semicolon, without any spaces in between. Whew. That’s a long explanation for what is really a simple thing to do, as the following examples illustrate. The first example shows how to include a greater-than sign in a snippet of code by using the character’s entity name. The second demonstrates how to include a greater-than sign in your text by referencing its Latin-1 numeric value:
if a &gt; b, then t = 0
if a &#62; b, then t = 0
Both examples cause the text to be rendered as:
if a > b, then t = 0
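The two references, `&gt;` by name and `&#62;` by number, name the same character. You can check this outside a browser with Python's standard `html.unescape`, which decodes character references the way a browser does. This is a small sketch, not a model of browser rendering.

```python
import html

# The entity-name form and the numeric form of the same line.
by_name = "if a &gt; b, then t = 0"
by_number = "if a &#62; b, then t = 0"

# '>' sits at position 62 in ASCII and Latin-1, hence '&#62;'.
print(ord(">"))                  # → 62

# Both references decode to the same literal character.
print(html.unescape(by_name))    # → if a > b, then t = 0
print(html.unescape(by_number))  # → if a > b, then t = 0
```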
The complete set of character entity values and names is given in Appendix F. You could write an entire document using character encodings, but that would be silly.
[26] The only exception is that these characters may appear literally within the <listing> and <xmp> tags, but this is a moot point, since the tags are obsolete.
[27] The popular ASCII character set is a subset of the more comprehensive Latin-1 character set. Composed by the well-respected International Organization for Standardization (ISO), the Latin-1 set is a list of all letters, numbers, punctuation marks, and so on commonly used by Western language writers, organized by number and encoded with special names. Appendix F contains the complete Latin-1 character set and encoding.
Get HTML & XHTML: The Definitive Guide, 5th Edition now with the O’Reilly learning platform.