Unicode Demystified by Richard Gillam

Language-Sensitive Comparison on Unicode Text

On top of the considerations discussed previously, which you have to deal with no matter which encoding standard you use for your characters, Unicode adds a few more interesting complications.

Unicode Normalization

Unlike in most other encoding schemes, many characters and sequences of characters have multiple legal representations in Unicode. One of the requirements of supporting Unicode is that (provided you support all of the characters involved) all representations of a character be treated as equal. Thus, whether you represent “ä” with

U+00E4 LATIN SMALL LETTER A WITH DIAERESIS

or

U+0061 LATIN SMALL LETTER A
U+0308 COMBINING DIAERESIS

it should look and behave the same way everywhere. ...
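One common way to honor this requirement in a comparison routine is to bring both strings to the same Unicode normalization form before comparing them. The following is a minimal sketch in Python using the standard library's `unicodedata` module. The helper name `canonically_equal` is hypothetical, and the choice of NFC is an assumption: NFD would serve equally well for a plain equality test. It addresses only canonical equivalence. It does not perform language-sensitive collation.

```python
import unicodedata

# Two canonically equivalent spellings of "ä"
precomposed = "\u00E4"       # U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "\u0061\u0308"  # U+0061 LATIN SMALL LETTER A + U+0308 COMBINING DIAERESIS


def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after converting both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A naive code-point comparison treats the two spellings as different strings.
print(precomposed == decomposed)                    # False
# Comparing normalized forms treats them as the same character.
print(canonically_equal(precomposed, decomposed))   # True
# NFD splits the precomposed form into base letter + combining mark.
print([hex(ord(c)) for c in unicodedata.normalize("NFD", precomposed)])  # ['0x61', '0x308']
```

Normalizing both operands on every comparison is simple but repeats work. A program that compares the same strings many times might instead normalize text once, when it enters the system.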