Effective XML: 50 Specific Ways to Improve Your XML by Elliotte Rusty Harold

Choosing an Encoding

Given that you've wisely chosen to use Unicode for your documents, the next question is which encoding of Unicode to pick. Unicode is a character set that assigns each of almost 100,000 characters a distinct numeric code point. The characters assigned code points from 0 to 65,535 are sometimes referred to as Plane 0 or the Basic Multilingual Plane (BMP for short). The BMP includes the most common characters from most of the world's living scripts, including the Roman alphabet, Cyrillic, Arabic, Greek, Hebrew, Hangul, the most common Han ideographs, and many more. Plane 1, spanning code points 65,536 to 131,071, includes musical notation, many mathematical symbols, and the scripts of several dead languages, such as Old Italic. Plane 2 (code points ...
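The plane arithmetic described above can be sketched in a few lines of Python. This is only an illustration, not code from the book: the helper name `plane_of` is hypothetical, and the upper bound of 0x10FFFF (17 planes in total) is a property of the Unicode code space that the passage above does not itself state. The loop also prints how many bytes each sample character occupies in UTF-8 and UTF-16, since that size difference is part of what makes the choice of encoding matter.

```python
def plane_of(code_point: int) -> int:
    """Return the Unicode plane that contains the given code point.

    Hypothetical helper for illustration. Each plane holds 65,536
    code points, so the plane number is the code point divided by
    65,536 (equivalently, shifted right by 16 bits).
    """
    # Assumption not stated in the passage: the Unicode code space
    # ends at 0x10FFFF, which gives 17 planes (0 through 16).
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point}")
    return code_point >> 16


samples = [
    "A",           # Roman alphabet
    "\u042F",      # Cyrillic capital letter YA
    "\u05D0",      # Hebrew letter ALEF
    "\uD55C",      # Hangul syllable HAN
    "\U0001D11E",  # musical symbol G CLEF
    "\U00010300",  # Old Italic letter A
]

for ch in samples:
    cp = ord(ch)
    utf8_len = len(ch.encode("utf-8"))
    utf16_len = len(ch.encode("utf-16-be"))  # big-endian, no byte order mark
    print(f"U+{cp:04X}  plane {plane_of(cp)}  "
          f"UTF-8: {utf8_len} bytes  UTF-16: {utf16_len} bytes")
```

The first four samples fall in the BMP and the last two in Plane 1, matching the ranges given in the text.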
