
However, ISO 8859-5 characters, such as Latin letters, that already exist in Unicode in
other blocks were not included in the Cyrillic block. (Many characters in Figure 4-1
look like Latin letters, but they are Cyrillic letters.) In effect, the characters in
ISO 8859-5 with code numbers A1 to FF were copied directly to the Unicode range
U+0401 to U+045F, except that characters that exist in other blocks (such as Basic
Latin and Latin-1 Supplement) were omitted. The rest of the range U+0400 to U+04FF
(U+0400 and columns 046 through 04F in the code chart illustrated in Figure 4-1) was
used for Cyrillic characters not present in ISO 8859-5.
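
The copying rule can be sketched in a few lines of Python. This is only an
illustration, not a replacement for a real codec: the names OFFSET, ELSEWHERE,
to_code_point, and decode_iso8859_5 are made up for this example, and the three
exceptions listed (soft hyphen, numero sign, section sign) are the characters in
A1 to FF that Unicode encodes outside the Cyrillic block.

```python
# Shift that maps ISO 8859-5 code A1 onto U+0401 (and so FF onto U+045F).
OFFSET = 0x0401 - 0xA1  # 0x360

# Codes in A1..FF whose characters already exist in other Unicode blocks,
# so they are NOT found at byte + OFFSET.
ELSEWHERE = {
    0xAD: 0x00AD,  # SOFT HYPHEN (Latin-1 Supplement)
    0xF0: 0x2116,  # NUMERO SIGN (Letterlike Symbols)
    0xFD: 0x00A7,  # SECTION SIGN (Latin-1 Supplement)
}


def to_code_point(byte: int) -> int:
    """Map one ISO 8859-5 code number (0..255) to a Unicode code point."""
    if byte < 0xA1:
        # ASCII, control codes, and no-break space keep their code numbers.
        return byte
    if byte in ELSEWHERE:
        return ELSEWHERE[byte]
    return byte + OFFSET


def decode_iso8859_5(data: bytes) -> str:
    """Decode ISO 8859-5 bytes by applying to_code_point to each byte."""
    return "".join(chr(to_code_point(b)) for b in data)


print(decode_iso8859_5(bytes([0xBF, 0xE0, 0xD8, 0xD2, 0xD5, 0xE2])))  # Привет
```

The three-entry exception table is the whole price of not coding a character
twice. In practice one would use a library codec instead, such as Python's
built-in "iso8859_5", which should give the same results as this sketch.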
The omission of already coded characters follows the principle of not coding the same
character twice, even though this prevents simple correspondence between other char-
acter codes and Unicode. If the Cyrillic block were just a copy of the ISO 8859-5 code
table, shifted to a different range, transcoding between ISO 8859-5 and Unicode would
be trivial. However, many other things would have become more complex had such an
approach been taken. For example, all ASCII characters would appear in many
copies in different blocks. This would waste coding space and make even simple tests
like “is this character ‘X’?” more complicated: the character being tested would need
to be compared with every copy of “X” in the different blocks.
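
The difference can be sketched as follows. The offsets in
HYPOTHETICAL_COPY_OFFSETS are invented purely for illustration; they describe
the rejected design, in which whole code tables would have been copied into
Unicode at different offsets, and not Unicode as it actually is. The function
names are made up as well.

```python
# Invented offsets at which complete 8-bit code tables might have been copied.
# In real Unicode, the code points these produce hold unrelated characters.
HYPOTHETICAL_COPY_OFFSETS = (0x0000, 0x0100, 0x0200, 0x0360)


def is_letter_x(ch: str) -> bool:
    """The test as it works in Unicode: 'X' has exactly one code point."""
    return ch == "X"


def is_letter_x_if_duplicated(ch: str) -> bool:
    """The same test if every copied table carried its own 'X'."""
    copies = {ord("X") + offset for offset in HYPOTHETICAL_COPY_OFFSETS}
    return ord(ch) in copies
```

Every comparison, search, and sort would need a table like this one, and the
table would have to grow each time another character code was copied in.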
This explanation ...