
Character Set Standard Oddities
While the contents of character set standards are, for the most part, assumed to be
error-free by software developers and users, many of them do exhibit some interesting
characteristics that can only be described as oddities. For example, some character set
standards contain duplicate characters, some include characters that do not or should
not exist (although one could argue that such characters now exist by virtue of being in
a character set standard), some do not contain some characters that should have been
included (because they are needed to complete character pairs, to form a known word,
or to provide the corresponding traditional form of a simplified character), and some
became endowed with fictitious extensions. The following sections detail these oddities
and draw examples from common character set standards.
Duplicate Characters
The most common character set oddity is duplicate characters. In some cases, duplicate
characters are intentional. Take, for instance, the hanja in KS X 1001:2004 that are ordered
according to their readings. For those hanja that have been classified with multiple
readings, multiple instances of the hanja have been encoded in that standard. For reasons
of compatibility, Unicode needs to propagate these duplicate characters, which it does
through the use of i ...