XML in a Nutshell, 3rd Edition
by Elliotte Rusty Harold, W. Scott Means
September 2004
Intermediate to advanced
712 pages
24h 45m
English
O'Reilly Media, Inc.

UCS-2 and UTF-16

UCS-2, also known as ISO-10646-UCS-2, represents each character as a two-byte, unsigned integer between 0 and 65,535. Thus the capital letter A, code point 65 in Unicode, is represented by the two bytes 00 and 41 (in hexadecimal). The capital letter B, code point 66, is represented by the two bytes 00 and 42. The two bytes 03 and A3 represent the capital Greek letter Σ, code point 931.
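As a rough illustration of the mapping just described, the following Python sketch converts a character to its two-byte UCS-2 form, most significant byte first. The function name `ucs2_bytes` is made up for this example and is not part of any library.

```python
def ucs2_bytes(ch: str) -> bytes:
    """Return the two-byte form of one character, most significant byte first.

    Hypothetical helper for illustration only.
    """
    code_point = ord(ch)
    if code_point > 0xFFFF:
        # UCS-2 only covers code points 0 through 65,535.
        raise ValueError("code point does not fit in two bytes")
    return code_point.to_bytes(2, byteorder="big")


if __name__ == "__main__":
    for ch in "ABΣ":
        # Prints the character, its decimal code point, and its two bytes in hex.
        print(ch, ord(ch), ucs2_bytes(ch).hex(" ").upper())
```

Run as a script, this prints `A 65 00 41`, `B 66 00 42`, and `Σ 931 03 A3`, matching the byte pairs given above.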

UCS-2 comes in two variations, big endian and little endian. In big-endian UCS-2, the most significant byte of the character comes first. In little-endian UCS-2, the order is reversed. Thus, in big-endian UCS-2, the letter A is #x0041. In little-endian UCS-2, the bytes are swapped, and A is #x4100. In big-endian UCS-2, the letter B is #x0042; in little-endian UCS-2, it’s #x4200. In big-endian UCS-2, the letter Σ is #x03A3; in little-endian UCS-2, it’s #xA303. In this book we use big-endian notation, but parsers cannot assume this. They must be able to determine the endianness from the document itself.
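The byte swapping can be seen directly in a short Python sketch. The helper `ucs2_code_unit` is a hypothetical name used only here. The sketch also assumes that, for characters in the 0 to 65,535 range used in these examples, Python's built-in `utf-16-be` and `utf-16-le` codecs produce the same bytes as UCS-2, and it checks that assumption as it goes.

```python
def ucs2_code_unit(ch: str, byteorder: str) -> bytes:
    """Return the two bytes for one character in the given byte order.

    byteorder is "big" or "little". Hypothetical helper for illustration.
    """
    code_point = ord(ch)
    if code_point > 0xFFFF:
        raise ValueError("code point does not fit in two bytes")
    return code_point.to_bytes(2, byteorder=byteorder)


if __name__ == "__main__":
    for ch in "ABΣ":
        big = ucs2_code_unit(ch, "big")
        little = ucs2_code_unit(ch, "little")
        # Cross-check against Python's UTF-16 codecs, which agree with
        # UCS-2 for these characters.
        assert big == ch.encode("utf-16-be")
        assert little == ch.encode("utf-16-le")
        print(f"{ch}: big-endian #x{big.hex().upper()}, "
              f"little-endian #x{little.hex().upper()}")
```

For the letter A, this reports #x0041 in big-endian order and #x4100 in little-endian order, as in the text.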

To distinguish between big-endian and little-endian UCS-2, a document encoded in UCS-2 customarily begins with Unicode character #xFEFF, the zero-width nonbreaking space, more commonly called the byte-order mark. This character has the advantage of being invisible. Furthermore, if its bytes are swapped, the resulting #xFFFE character doesn’t actually exist. Thus, a program can look at the first two bytes of a UCS-2 document and tell immediately whether the document is big endian, ...
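A minimal Python sketch of that first-two-bytes check follows. The function name `detect_byte_order` is hypothetical. The sketch assumes the input is already known to use a two-byte encoding, so it does not try to rule out other encodings whose opening bytes might look similar. The constants `codecs.BOM_UTF16_BE` and `codecs.BOM_UTF16_LE` come from Python's standard library.

```python
import codecs
from typing import Optional


def detect_byte_order(data: bytes) -> Optional[str]:
    """Inspect the first two bytes for a byte-order mark.

    Returns "big", "little", or None when no byte-order mark is present.
    Hypothetical helper for illustration only.
    """
    prefix = data[:2]
    if prefix == codecs.BOM_UTF16_BE:  # bytes FE FF: #xFEFF stored big endian
        return "big"
    if prefix == codecs.BOM_UTF16_LE:  # bytes FF FE: #xFEFF stored little endian
        return "little"
    return None


if __name__ == "__main__":
    # The letter A preceded by a byte-order mark, in each byte order.
    print(detect_byte_order(b"\xfe\xff\x00\x41"))
    print(detect_byte_order(b"\xff\xfe\x41\x00"))
    # No byte-order mark: the byte order cannot be read from the first two bytes.
    print(detect_byte_order(b"\x00\x41"))
```

Returning `None` rather than guessing leaves the caller free to fall back on other information when the mark is absent.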

ISBN: 0596007647