Unicode Explained

by Jukka K. Korpela
June 2006
Beginner
688 pages
English
O'Reilly Media, Inc.
CHAPTER 6
Unicode Encodings
This chapter describes UTF-8 and other encodings for Unicode in detail, including algorithmic descriptions and practical considerations for choosing an encoding. It concentrates on the UTF-8, UTF-16, and UTF-32 encodings, which are the current official Unicode encodings. However, some older encodings are described as well, even though not all of them are, strictly speaking, character encodings.
If you are not interested in the technicalities of encodings, you might read just the last section of this chapter (“Choosing an Encoding”). It summarizes the practical criteria, but those criteria can be fully understood only if you know the technical foundations.
Unicode Encodings in General
As described in Chapter 3, an encoding is a mapping from code numbers (which represent characters) to sequences of code units. In practice, a code unit is an octet (8-bit byte), a double octet (16-bit quantity), or a quadruple octet (32-bit quantity). Such units are used because modern computers are designed to process data objects of these sizes efficiently.
Thus, the simplest encoding for Unicode maps each code number to a quadruple octet that represents the number as a single integer in binary notation. Such an encoding, UTF-32, is, however, too inefficient for most practical purposes.
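The UTF-32 mapping described above is simple enough to write by hand. In this sketch, the hypothetical function `utf32be_encode` writes each code number as one big-endian 32-bit integer, and the result is checked against Python's built-in `utf-32-be` codec. Comparing sizes illustrates one concrete cost, storage space for text that is mostly ASCII; other aspects of efficiency are not measured here.

```python
def utf32be_encode(s: str) -> bytes:
    """Hypothetical helper: write each code number as one
    32-bit integer, most significant octet first."""
    return b"".join(ord(ch).to_bytes(4, "big") for ch in s)


sample = "hello, \u20ac"  # seven ASCII characters plus the euro sign
manual = utf32be_encode(sample)

# The hand-written mapping agrees with Python's built-in codec.
assert manual == sample.encode("utf-32-be")

print(len(manual), "octets in UTF-32")
print(len(sample.encode("utf-8")), "octets in UTF-8")
```

Here the eight characters take 32 octets in UTF-32 but only 10 octets in UTF-8.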
Within a code unit of 16 or 32 bits, the order in which the octets are interpreted ...
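Octet order matters as soon as a code unit is wider than one octet. As a sketch, the code below uses Python's big-endian (`-be`) and little-endian (`-le`) codec variants to serialize the same 16-bit and 32-bit code units in both orders, and shows that reading octets in the wrong order yields a different character. The sample characters are arbitrary choices for illustration (Python 3.8 or later assumed, for `bytes.hex` with a separator).

```python
# EURO SIGN, code number 0x20AC, as a single 16-bit code unit in both octet orders.
euro = "\u20ac"
big16 = euro.encode("utf-16-be")     # most significant octet first
little16 = euro.encode("utf-16-le")  # least significant octet first
print(big16.hex(" "), "|", little16.hex(" "))  # 20 ac | ac 20

# LATIN CAPITAL LETTER A as a single 32-bit code unit in both octet orders.
big32 = "A".encode("utf-32-be")
little32 = "A".encode("utf-32-le")
print(big32.hex(" "), "|", little32.hex(" "))  # 00 00 00 41 | 41 00 00 00

# Decoding big-endian octets as if they were little-endian gives U+AC20 instead.
misread = big16.decode("utf-16-le")
print(f"U+{ord(misread):04X}")  # U+AC20
```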
ISBN: 059610121X