Character Entities
Characters not found in the normal alphanumeric character set, such
as < and &, must be specified in HTML and XHTML documents using
character references. This process is known as
escaping the character. In (X)HTML documents, escaped
characters are indicated by character references that begin with &
and end with ; (a semicolon). The character may be referred to by its
Numeric Character Reference (NCR) or by a predefined character entity
name.
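As a minimal sketch of escaping in practice, the following uses Python's standard `html` module. The sample string `raw` is invented for illustration, and Python is only one of many tools that can perform this step.

```python
import html

# Hypothetical sample text containing characters that are special in (X)HTML.
raw = 'if a < b && name == "x"'

# html.escape replaces &, <, and > with character references;
# with the default quote=True it also escapes " and '.
escaped = html.escape(raw)
print(escaped)  # if a &lt; b &amp;&amp; name == &quot;x&quot;

# html.unescape turns the references back into the original characters.
restored = html.unescape(escaped)
print(restored == raw)  # True
```

Each reference in the output begins with & and ends with ; as described above.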
A Numeric Character Reference refers to a character by its Unicode
code point in either decimal or hexadecimal form. Decimal character
references use the syntax &#nnnn;. Hexadecimal values are indicated by
an “x”: &#xhhhh;. For example, the less-than (<) character could be
identified as &#60; (decimal) or &#x3C; (hexadecimal).
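The two syntaxes can be illustrated with a short Python sketch. The helper names `decimal_ncr` and `hex_ncr` are hypothetical, chosen for this example; they build a reference from a character's Unicode code point.

```python
import html

def decimal_ncr(ch: str) -> str:
    """Build a decimal reference (&#nnnn;) for a single character."""
    return f"&#{ord(ch)};"

def hex_ncr(ch: str) -> str:
    """Build a hexadecimal reference (&#xhhhh;) for a single character."""
    return f"&#x{ord(ch):X};"

print(decimal_ncr("<"))  # &#60;
print(hex_ncr("<"))      # &#x3C;

# Both forms decode back to the same character.
print(html.unescape(decimal_ncr("<")) == html.unescape(hex_ncr("<")) == "<")  # True
```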
Character entities are abbreviated names for characters, such as
&lt; for the less-than symbol.
Character entities are predefined in the DTDs of markup languages such as
HTML and XHTML as a convenience to authors because they may be easier to
remember than Numeric Character References.
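To see how an entity name maps to a code point, one option is Python's `html.entities.name2codepoint` table, which covers the HTML 4 entity names. The helper `entity_to_ncr` below is a hypothetical name used only for this sketch.

```python
from html.entities import name2codepoint

def entity_to_ncr(name: str) -> str:
    """Return the decimal character reference for an entity name such as 'lt'."""
    return f"&#{name2codepoint[name]};"

# Print each entity name, its numeric equivalent, and the character itself.
for name in ("lt", "gt", "amp", "quot"):
    print(f"&{name};", entity_to_ncr(name), chr(name2codepoint[name]))
```

An unknown name raises `KeyError`, which is a reasonable default for a lookup like this.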
ASCII Character Set
HTML and XHTML documents use the standard 7-bit ASCII character
set in their source. The first 32 characters in ASCII (code points 0
through 31, not listed) are such device controls as backspace (&#8;)
and carriage return (&#13;) and are not appropriate for use in
HTML documents.
HTML 4.01 defines only four entities in this character range: less
than (&lt;, &#60;), greater than (&gt;, &#62;), ampersand
(&amp;, &#38;), and quotation mark (&quot;, &#34;), that are
necessary ...