
Not all compatibility characters are compatibility decomposable. Many
of them have decompositions that are canonical.
W3C Normalization
The World Wide Web Consortium (W3C) favors Normalization Form C on the Web,
and it additionally suggests stronger normalization rules in HTML and XML documents.
The stronger rules are external to Unicode, since they relate to markup, not
plain text. They are briefly described here because of their practical impact. The rules are
described in more detail in the document “Character Model for the World Wide Web
1.0: Normalization,” http://www.w3.org/TR/charmod-norm/. Note, however, that this
document is officially only a Working Draft (a work in progress).
The W3C normalization rules require that text be in NFC and additionally forbid the
occurrence of character references and entity references that would make the text non-
normalized, if replaced by the characters that they denote. For example, by Unicode
rules, NFC does not allow the appearance of “e” followed by a combining acute accent,
since this combination must be replaced by the precomposed character é. The W3C
normalization rules also forbid the indirect appearance of the combination, for example,
as in e&#x301; (where &#x301; is a character reference that denotes the combining
acute accent U+0301).
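
The difference between the two levels of checking can be illustrated with a short Python sketch. It is only an approximation of the rule described above, not an implementation of the W3C draft: it tests whether text is in NFC as written, and then whether it is still in NFC after character and entity references have been replaced by the characters they denote. The helper names is_nfc and is_fully_normalized are made up for this illustration, and html.unescape from the standard library is assumed to be an adequate stand-in for reference expansion in HTML.

```python
import html
import unicodedata


def is_nfc(text: str) -> bool:
    """Return True if the text is already in Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text


def is_fully_normalized(markup: str) -> bool:
    """Hypothetical helper: the markup must be in NFC both as written
    and after its character/entity references have been expanded."""
    expanded = html.unescape(markup)  # "&#x301;" becomes U+0301
    return is_nfc(markup) and is_nfc(expanded)


direct = "caf\u00e9"        # precomposed é (U+00E9)
indirect = "cafe&#x301;"    # "e" followed by a reference to U+0301

print(is_nfc(indirect))               # True: the source text is plain ASCII
print(is_fully_normalized(indirect))  # False: expansion gives e + U+0301
print(is_fully_normalized(direct))    # True
```

The indirect form passes a plain NFC check because the reference itself consists of ASCII characters only; the problem becomes visible only when the reference is expanded.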
On the Web, expressions like e&#x301; are rarely used in practice, since the corresponding
precomposed ...