
If no previous character codes had been taken into account when defining Unicode,
the use of the coding space would undoubtedly be different. It would be based on
grouping by usage. The order of blocks would probably be different too. As things stand,
the CJK characters, for example, are distributed into blocks in a manner that looks
rather random.
The reasons for making the first two blocks essentially copies of ASCII and ISO 8859-1
are both technical and cultural. The assignment helps efficiency: each ASCII character
is represented as a single octet in UTF-8, while UTF-8 itself stays simple. It also
helps continuity, since people who have worked with ASCII and ISO 8859-1 can find
their characters easily.
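The efficiency point can be checked directly. The following is a minimal Python sketch, not part of the original text. It relies only on the standard str.encode and bytes.decode methods, and the names latin1_matches and utf8_lengths are illustrative.

```python
# Each ASCII character encodes to exactly one octet in UTF-8,
# and that octet equals the character's code point.
for ch in "Az~":
    encoded = ch.encode("utf-8")
    assert len(encoded) == 1 and encoded[0] == ord(ch)

# Every ISO 8859-1 byte value decodes to the Unicode code point
# with the same number (the first 256 code points).
latin1_matches = all(
    ord(bytes([b]).decode("latin-1")) == b for b in range(256)
)

# Outside the ASCII range, UTF-8 needs more than one octet per character.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in ["A", "é", "€"]}

print(latin1_matches)
print(utf8_lengths)
```

Note that the second block of Unicode matches ISO 8859-1 in code point numbers only: in UTF-8, characters such as é occupy two octets, so only the ASCII range keeps its one-octet representation.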
The evolving nature of Unicode also makes some illogical assignments more or less
necessary. New needs have led to allocation of blocks and ranges in a manner that
cannot be smoothly integrated with old allocations. All the different extension blocks
reflect the gradual incorporation of scripts and characters into Unicode.
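As a rough illustration of how such gradual allocation looks in practice, the sketch below lists a few of the CJK ideograph blocks with their code point ranges and looks up which one a character falls in. The table is deliberately incomplete (later extensions are omitted), and the names CJK_BLOCKS and cjk_block_of are hypothetical helpers, not part of any library.

```python
# A partial, hand-written table of CJK ideograph blocks: (name, first, last).
# Assumption: only four blocks are listed here, for illustration.
CJK_BLOCKS = [
    ("CJK Unified Ideographs Extension A", 0x3400, 0x4DBF),
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
    ("CJK Compatibility Ideographs", 0xF900, 0xFAFF),
    ("CJK Unified Ideographs Extension B", 0x20000, 0x2A6DF),
]


def cjk_block_of(ch):
    """Return the name of the listed block containing ch, or None."""
    cp = ord(ch)
    for name, first, last in CJK_BLOCKS:
        if first <= cp <= last:
            return name
    return None


print(cjk_block_of("中"))           # U+4E2D
print(cjk_block_of("\u3400"))       # first code point of Extension A
print(cjk_block_of("\U00020000"))   # first code point of Extension B
```

Extension A sits below the main block in the coding space, while Extension B lies outside the Basic Multilingual Plane altogether, which is the kind of non-contiguous layout the paragraph above describes.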
Questions and Answers
The Unicode web site contains a Frequently Asked Questions (FAQ) section, divided
into topics and categories, at http://www.unicode.org/faq/. You will probably find it very
useful, especially if you take some time now to have a look at its table of contents, so
that you roughly know what you can expect to find there. The following list of