Unicode Explained

by Jukka K. Korpela
June 2006
Beginner
688 pages
26h 18m
English
O'Reilly Media, Inc.
Content preview from Unicode Explained
—i.e., notations of the form &#n; or &#xn;. The document character set is the character
code (mapping of integers to characters) according to which the n in such notations is
to be interpreted.
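
As a small illustration of how the n in such notations is read as a code number, the following Python sketch decodes a decimal and a hexadecimal reference to the same character. It relies on the standard library function html.unescape, which follows HTML5 parsing rules; the sample value 233 (hexadecimal E9) is chosen here only for illustration and does not come from the text.

```python
import html

# Decimal and hexadecimal references to the same code number, 233 = 0xE9.
decimal_ref = "&#233;"
hex_ref = "&#xE9;"

decoded_decimal = html.unescape(decimal_ref)
decoded_hex = html.unescape(hex_ref)

# Both notations name the character whose code number is n,
# regardless of the encoding in which the document itself is stored.
same_character = decoded_decimal == decoded_hex == chr(0xE9)
print(same_character, f"U+{ord(decoded_decimal):04X}")
```

The point of the sketch is only that the reference is resolved through the code number, not through the bytes of the document's encoding.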
In particular, HTML and XML specifications do not impose Unicode semantics on
characters, for two reasons: they formally refer to ISO 10646, not the Unicode standard,
and even if they referred to Unicode, this would not constitute a requirement on
conformance to the standard. Of course, software that processes HTML or XML
documents may apply Unicode semantics and rules, such as line breaking rules, but this is
not a requirement. Only for some features related to directionality do HTML
specifications refer to Unicode rules normatively.
The HTML specifications contain some special restrictions on the use of control
characters, as listed in Table 11-2. There is usually little reason why control characters other
than line breaks and sometimes horizontal tabs would appear in HTML documents.
They may, however, appear as a result of conversions. The rules for them are somewhat
different in HTML up to and including HTML 4.01 and in XHTML. (Technically, the
SGML declaration for HTML 4.01 disallows U+000C, but the prose discusses it as an
allowed character. In any case, it would be treated as whitespace, not as a page eject character.)
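
Since stray control characters typically creep in through conversions, it can help to locate them before publishing a document. The following Python sketch reports every character with Unicode general category Cc (the C0 and C1 controls, plus U+007F) other than tab, line feed, and carriage return. The function name, the exempted set, and the sample string are assumptions made for this illustration; the sketch does not encode the specific rules of Table 11-2 and only shows where such characters occur.

```python
import unicodedata

# Assumption for this sketch: tab, line feed, and carriage return are
# treated as ordinary whitespace and are not reported.
EXEMPT = {"\t", "\n", "\r"}

def find_control_characters(text):
    """Return (index, 'U+XXXX') pairs for control characters not in EXEMPT."""
    hits = []
    for index, ch in enumerate(text):
        # General category Cc covers U+0000..U+001F and U+007F..U+009F.
        if unicodedata.category(ch) == "Cc" and ch not in EXEMPT:
            hits.append((index, f"U+{ord(ch):04X}"))
    return hits

# Hypothetical input containing a form feed (U+000C) and a C1 control (U+0085).
sample = "line one\nline\x0ctwo\x85"
found = find_control_characters(sample)
print(found)
```

Whether a reported character is actually disallowed depends on the HTML or XHTML version in question, so the output is a list of candidates to check against the table.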
Table 11-2. C0 and C1 Control characters in HTML
Character(s) Explanation Use in HTML
U+0000..U+0008 ...
ISBN: 059610121X