Effective XML: 50 Specific Ways to Improve Your XML by Elliotte Rusty Harold

Normalization Forms

For reasons of compatibility with legacy character sets, as well as out-and-out mistakes, a number of characters have more than one representation in Unicode. For example, ü can be represented either as the single precomposed character ü (U+00FC) or as a u followed by a combining diaeresis (U+0308). XML 1.0[1] treats these two forms as distinct. For example, München spelled with the precomposed ü is not the same as München spelled with a u followed by a combining diaeresis, even though the two look identical on screen. You can see that this might be a bit of a problem.
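The following is a minimal Python sketch, not taken from the book, that shows the two spellings side by side. It assumes only the standard library: `unicodedata.normalize` for normalization and `xml.etree.ElementTree` (backed by expat) for parsing. The variable names are illustrative. The parser compares and reports text by code point and does not normalize, so the two spellings stay distinct until the program normalizes them explicitly.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Two spellings of "München" that render identically.
precomposed = "M\u00fcnchen"   # ü as the single code point U+00FC
decomposed = "Mu\u0308nchen"   # u followed by U+0308 COMBINING DIAERESIS

# Plain code-point comparison sees two different strings of different lengths.
print(precomposed == decomposed)           # False
print(len(precomposed), len(decomposed))   # 7 8

# The parser passes the text through unchanged, so the parsed values differ too.
city_a = ET.fromstring("<city>" + precomposed + "</city>").text
city_b = ET.fromstring("<city>" + decomposed + "</city>").text
print(city_a == city_b)                    # False


def nfc(s: str) -> str:
    """Normalize to NFC (canonical composition)."""
    return unicodedata.normalize("NFC", s)


# After NFC normalization, u + U+0308 is composed into U+00FC and the strings match.
print(nfc(city_a) == nfc(city_b))          # True
```

NFC is used here only as one choice. Normalizing both strings to NFD (the decomposed form) would make them compare equal as well.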

[1] This is one of the few changes that may be made in XML 1.1. However, exactly how or when characters will be normalized has not yet been finalized.

While such differences are not significant to XML parsing, they ...