Web Design in a Nutshell, 3rd Edition

by Jennifer Robbins
February 2006
Intermediate to advanced
826 pages
63h 42m
English
O'Reilly Media, Inc.
Content preview from Web Design in a Nutshell, 3rd Edition

Unicode encodings

Many character sets, such as the ISO 8859 series, have only one encoding method. Unicode, however, may be encoded in a number of ways. So although the code points never change, a character may be represented by 1 to 4 bytes, depending on the encoding form. The encoding forms for Unicode are:

UTF-8

This is a variable-width format that uses 1 byte for characters in the ASCII set, 2 bytes for additional character ranges, and 3 bytes for the rest of the BMP (Basic Multilingual Plane). Characters in the supplementary planes use 4 bytes. UTF-8 is the recommended Unicode encoding for web documents and other Internet technologies.

UTF-16

Uses 2 bytes for BMP characters and 4 bytes (a pair of 2-byte surrogate code units) for supplementary characters. UTF-16 is another option for web documents.

UTF-32

Uses 4 bytes for all characters.
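
The byte counts listed above can be checked with a short script. This is only a minimal sketch in Python, which the book itself does not use; the helper name `encoded_widths` and the sample characters are assumptions chosen for illustration. The big-endian codec names (`utf-16-be`, `utf-32-be`) are used so that no byte order mark is added to the count.

```python
# Sketch: count how many bytes each Unicode encoding form needs per character.
# encoded_widths is a hypothetical helper, not something from the book.

ENCODINGS = ("utf-8", "utf-16-be", "utf-32-be")


def encoded_widths(ch):
    """Return a dict mapping each encoding name to the encoded length in bytes."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}


# One sample character from each UTF-8 width class (illustrative choices).
samples = [
    ("\u0025", "ASCII percent sign"),
    ("\u00e9", "Latin small e with acute"),
    ("\u20ac", "euro sign, elsewhere in the BMP"),
    ("\U0001D11E", "musical G clef, supplementary plane"),
]

for ch, description in samples:
    widths = encoded_widths(ch)
    print(f"U+{ord(ch):04X} ({description}): {widths}")
```

With these samples, the UTF-8 width grows from 1 to 4 bytes, UTF-16 stays at 2 bytes until the supplementary character, and UTF-32 is always 4 bytes.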

For example, the code point for the percent sign is U+0025, but it is represented by the hexadecimal byte value 25 in UTF-8, 00 25 in UTF-16, and 00 00 00 25 in UTF-32 (shown here in big-endian byte order). There are other things at work in the encoding as well, such as byte order, but this gives you a feel for the difference in encoding forms.
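
The percent sign example can be reproduced as follows. Again, this is a hedged sketch in Python rather than anything from the book, and `hex_bytes` is a hypothetical helper name. It assumes the explicit big-endian and little-endian codecs, because Python's plain `utf-16` and `utf-32` codecs prepend a byte order mark and use the platform's byte order.

```python
# Sketch: show the bytes produced for U+0025 (percent sign) in each encoding form.
# hex_bytes is a hypothetical helper, not something from the book.


def hex_bytes(text, encoding):
    """Encode text and return its bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{byte:02X}" for byte in text.encode(encoding))


percent = "\u0025"

# Big-endian forms: the most significant byte comes first.
print("UTF-8:    ", hex_bytes(percent, "utf-8"))      # 25
print("UTF-16 BE:", hex_bytes(percent, "utf-16-be"))  # 00 25
print("UTF-32 BE:", hex_bytes(percent, "utf-32-be"))  # 00 00 00 25

# Little-endian forms reverse the byte order within each code unit.
print("UTF-16 LE:", hex_bytes(percent, "utf-16-le"))  # 25 00
print("UTF-32 LE:", hex_bytes(percent, "utf-32-le"))  # 25 00 00 00
```

The little-endian lines illustrate one of the "other things at work": the same code point can appear as different byte sequences depending on byte order.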

Publisher Resources

ISBN: 0596009879