A Historical Note
Much of the terminology surrounding the representation of Unicode in bits is relatively new, going back only a few years in Unicode's history. Indeed, some of the concepts we discuss here were originally called by different names.
When it was first designed, Unicode was a fixed-length, 16-bit standard. The abstract encoding space was 16 bits wide (a single 256 × 256 plane), and only one character encoding form existed: a straightforward mapping of 16-bit code points to 16-bit unsigned integers in memory. The single official encoding scheme placed a special sentinel value at the start of a Unicode file so that systems could auto-detect the byte order of the system that created the file.
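The original scheme described above can be illustrated with a minimal Python sketch. It assumes the sentinel is the code point U+FEFF (the byte order mark, which the passage does not name), and the function names `encode_16bit_with_sentinel`, `detect_byte_order`, and `decode_16bit` are hypothetical, chosen only for this illustration. It is not how any particular system implemented the scheme.

```python
import struct

# Assumption: the sentinel is U+FEFF, the byte order mark.
SENTINEL = 0xFEFF


def encode_16bit_with_sentinel(text, byteorder="big"):
    """Write each code point as one 16-bit unsigned integer, sentinel first."""
    fmt = ">H" if byteorder == "big" else "<H"
    units = [SENTINEL] + [ord(ch) for ch in text]
    if any(u > 0xFFFF for u in units):
        raise ValueError("code point outside the original 16-bit space")
    return b"".join(struct.pack(fmt, u) for u in units)


def detect_byte_order(data):
    """Return 'big' or 'little' from the leading sentinel, or None if absent."""
    if data[:2] == b"\xfe\xff":
        return "big"
    if data[:2] == b"\xff\xfe":
        return "little"
    return None


def decode_16bit(data):
    """Read the sentinel, then map each 16-bit integer back to a code point."""
    order = detect_byte_order(data)
    if order is None:
        raise ValueError("no sentinel found at the start of the data")
    body = data[2:]
    if len(body) % 2:
        raise ValueError("data length is not a multiple of 16 bits")
    fmt = ">H" if order == "big" else "<H"
    return "".join(
        chr(struct.unpack_from(fmt, body, i)[0]) for i in range(0, len(body), 2)
    )
```

A reader that sees the bytes `FE FF` first treats the rest as big-endian; one that sees `FF FE` treats it as little-endian. Either way, `decode_16bit(encode_16bit_with_sentinel("Hi", order))` recovers `"Hi"`.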
Early in the life of the standard, ...