Compatibility Decompositions

Canonical composites are just one kind of compatibility character; in fact, they're not even the only kind of composite character. Unicode is also rife with compatibility composites, which account for 3,165 assigned code point values in Unicode 3.1. Every one of these characters has an assigned code point value in some other encoding standard in reasonably widespread use. They are characters from those standards that wouldn't have made it into Unicode on their own merits, but were given their own code point values in Unicode so that text can be converted from the source encodings to Unicode and back again without losing any of the original information (this ability is usually referred to as “round-trip compatibility”).
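To make the distinction concrete, here is a minimal sketch in Python using the standard unicodedata module. It is an illustration, not something taken from the Unicode standard itself: the helper name compatibility_tag is hypothetical, the results reflect whatever Unicode version your Python build ships with (not necessarily Unicode 3.1), and the choice of Shift-JIS half-width katakana as the round-trip example is an assumption made for demonstration purposes.

```python
import unicodedata


def compatibility_tag(ch):
    """Return the tag (such as '<compat>') on a character's decomposition
    mapping, or None if the mapping is canonical or absent.

    Hypothetical helper for illustration; it relies on the convention that
    unicodedata.decomposition() prefixes compatibility mappings with a
    tag in angle brackets.
    """
    mapping = unicodedata.decomposition(ch)
    if mapping.startswith("<"):
        return mapping.split()[0]
    return None


E_ACUTE = "\u00E9"       # LATIN SMALL LETTER E WITH ACUTE (canonical composite)
FI_LIGATURE = "\uFB01"   # LATIN SMALL LIGATURE FI (compatibility composite)

# A canonical composite has an untagged mapping; a compatibility
# composite carries a tag describing what kind of variant it is.
print(compatibility_tag(E_ACUTE))
print(compatibility_tag(FI_LIGATURE))

# Canonical decomposition (NFD) leaves the ligature alone;
# compatibility decomposition (NFKD) replaces it with plain letters.
print(unicodedata.normalize("NFD", FI_LIGATURE) == FI_LIGATURE)
print(unicodedata.normalize("NFKD", FI_LIGATURE))

# Round-trip compatibility, assuming Shift-JIS as the source encoding:
# half-width and full-width katakana A are different byte sequences there,
# so Unicode keeps two code points to preserve the difference.
HALFWIDTH_A = "\uFF71"
FULLWIDTH_A = "\u30A2"
for ch in (HALFWIDTH_A, FULLWIDTH_A):
    legacy_bytes = ch.encode("shift_jis")
    assert legacy_bytes.decode("shift_jis") == ch  # nothing lost either way
    print(unicodedata.name(ch), legacy_bytes)
```

Note that applying a compatibility normalization form to HALFWIDTH_A folds it into FULLWIDTH_A, after which converting back to Shift-JIS no longer reproduces the original bytes. That loss is exactly what the separate code point exists to prevent.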

A few important ...
