Choosing an Encoding

Given that you've wisely chosen to use Unicode for your documents, the next question is which encoding of Unicode to pick. Unicode is a character set that assigns a distinct numeric code point to each of almost 100,000 characters. The characters assigned code points from 0 to 65,535 are sometimes referred to as Plane 0 or the Basic Multilingual Plane (BMP for short). The BMP includes the most common characters from most of the world's living languages, including the Roman alphabet, Cyrillic, Arabic, Greek, Hebrew, Hangul, the most common Han ideographs, and many more. Plane 1, spanning code points 65,536 to 131,071, includes musical notation, many mathematical symbols, and the scripts of several dead languages, such as Old Italic. Plane 2 (code points ...
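
Because each plane described above covers exactly 65,536 code points, the plane of a character follows from integer division of its code point. The following is a minimal Python sketch of that arithmetic; the helper names `plane_of` and `in_bmp` are hypothetical, chosen only for illustration, and do not come from the book or from any library.

```python
PLANE_SIZE = 65_536  # code points per plane (0x10000)


def plane_of(ch: str) -> int:
    """Return the Unicode plane number of a single character."""
    # ord() gives the numeric code point; integer division by the
    # plane size gives the plane that code point falls in.
    return ord(ch) // PLANE_SIZE


def in_bmp(ch: str) -> bool:
    """Return True if the character lies in Plane 0 (the BMP)."""
    return plane_of(ch) == 0


if __name__ == "__main__":
    samples = [
        ("Latin capital A", "A"),                  # U+0041
        ("Cyrillic capital Zhe", "\u0416"),        # U+0416
        ("Hangul syllable Han", "\uD55C"),         # U+D55C
        ("Old Italic letter A", "\U00010300"),     # U+10300
        ("Musical symbol G clef", "\U0001D11E"),   # U+1D11E
    ]
    for name, ch in samples:
        print(f"U+{ord(ch):04X}  plane {plane_of(ch)}  {name}")
```

Run as a script, this lists the first three sample characters in Plane 0 and the Old Italic and musical characters in Plane 1, matching the ranges given in the text.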
