
size storage units for all characters. The encodings used in practice tend to be more
complicated. Methods for performing code point order sorting on UTF-8 and other
Unicode encodings are described in the Unicode Standard, section 5.17 “Binary Order.”
It also discusses the even more technical orders based on the numeric values of code
units (such as the octets that constitute UTF-8 encoded data) rather than code points.
Problems of legacy software
In simple programming tasks, comparisons of character and string data are sometimes
based on comparisons of code points. This applies mainly to basic Latin letters in
contexts where it can be assumed (or it just is assumed) that we need not deal with any
other letters and that the case of letters is fixed (e.g., due to previous case folding). This
explains code like if((ch >= 'A') && (ch <= 'Z')) for testing whether the value of ch
is an uppercase basic Latin letter. Such code can be efficient, but it also silently assumes
that the letters A to Z have contiguous codes, as in ASCII. Nowadays it is usually better
to use library subroutines (e.g., if(isalpha(ch)) in C), making the code more readable and
more portable without sacrificing efficiency. We will discuss such methods in Chapter 11.
As a user of programs, you may encounter sorted data and sorting routines that use
simple code point order. For example, if you use a tool for automatic generation of
an index for a publication, you might notice that the index is sorted that way. If
the entries ...