Unicode Explained
by Jukka K. Korpela
June 2006
Content level: Beginner
688 pages
26h 18m
English
O'Reilly Media, Inc.
Content preview from Unicode Explained
On the other hand, the Unicode encodings are defined for noncharacters and for
unassigned code points, too. If some data contains, for example, the code point U+FFFF,
which is defined to be a noncharacter, the data is incorrect as Unicode character data.
However, it is processed in a well-defined way when encoding the data in UTF-8,
UTF-16, or UTF-32. This guarantees that conversions between Unicode encodings do
not remove such errors but allow them to be detected.
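As an illustration of the point above, here is a minimal sketch in Python (not taken from the book). It assumes the behavior of Python's built-in codecs, which encode U+FFFF without complaint, and it introduces a hypothetical helper, is_noncharacter, that checks the ranges the Unicode Standard reserves for noncharacters (U+FDD0..U+FDEF and the last two code points of each plane).

```python
# Sketch: a noncharacter survives conversion between the Unicode encodings,
# so a later check can still detect it. Helper names are hypothetical.

NONCHAR = "\uffff"  # U+FFFF, defined to be a noncharacter


def is_noncharacter(code_point):
    """Return True for U+FDD0..U+FDEF and for U+xxFFFE / U+xxFFFF in any plane."""
    return 0xFDD0 <= code_point <= 0xFDEF or (code_point & 0xFFFE) == 0xFFFE


# Each encoding has a well-defined byte sequence for U+FFFF.
utf8 = NONCHAR.encode("utf-8")
utf16 = NONCHAR.encode("utf-16-be")
utf32 = NONCHAR.encode("utf-32-be")

# Convert UTF-8 -> UTF-16 -> UTF-32 and back to a string.
converted = utf8.decode("utf-8").encode("utf-16-be").decode("utf-16-be")
converted = converted.encode("utf-32-be").decode("utf-32-be")

# The error is preserved by the conversions and can still be detected.
still_detectable = any(is_noncharacter(ord(ch)) for ch in converted)
print(utf8, utf16, utf32, still_detectable)
```

The conversions neither reject nor silently drop the code point; deciding what to do with it is left to a separate validation step such as the check shown here.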
The encodings UTF-8, UTF-16, and UTF-32 are all self-synchronizing. This feature,
also known as auto-synchronization, means that if malformed data (i.e., data that is
not possible according to the definition of the encoding) is encountered, only one code
point needs to be rejected. The start of the representation of the next code point can
be recognized easily. This helps guard against errors caused by data corruption in
transfer or storage: the effects of errors are local. If you have data like “Foobar” and
the character “b” is corrupted in storage or transfer, the data appears as “Foo?ar”
(where ? indicates corrupted data). In some other encodings, all data following a
corrupted character might appear as corrupted.
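The “Foobar” example can be reproduced with a short Python sketch (an illustration, not code from the book). It assumes UTF-8 and Python's built-in decoder with errors="replace", which substitutes U+FFFD for malformed bytes. The helper next_boundary is hypothetical; it relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx, so the start of the next code point can be found by skipping them.

```python
# Sketch: self-synchronization in UTF-8. Helper names are hypothetical.


def is_continuation(byte):
    """UTF-8 continuation bytes have the form 10xxxxxx."""
    return (byte & 0xC0) == 0x80


def next_boundary(data, pos):
    """Return the index of the first byte at or after pos that starts a code point."""
    while pos < len(data) and is_continuation(data[pos]):
        pos += 1
    return pos


# Corrupt the byte for "b" in "Foobar": 0xFF can never occur in valid UTF-8.
damaged = bytearray("Foobar".encode("utf-8"))
damaged[3] = 0xFF
recovered = bytes(damaged).decode("utf-8", errors="replace")
print(recovered)  # only the damaged position is replaced by U+FFFD

# Resynchronizing from the middle of a multi-byte character:
# "β" is encoded as the two bytes CE B2, so index 4 is a continuation byte.
data = "Fooβar".encode("utf-8")
resume_at = next_boundary(data, 4)
print(data[resume_at:].decode("utf-8"))
```

The damage stays local: the text after the corrupted byte decodes normally, and a reader that starts in the middle of a character needs to skip at most a few bytes to find the next boundary.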
Sample program code, in the C language, for conversions between the Unicode
encoding forms is available at http://www.unicode.org/Public/PROGRAMS/CVTUTF/.
UTF-32 and UCS-4
UTF-32 uses a 32-bit ...
ISBN: 059610121X