Sorting
Many CS101 textbooks demonstrate string sorting by comparing code points. Unfortunately, this does not work in the real world, even in ASCII, much less in Unicode. Most obviously, real sorts (such as the one in the index at the back of this book) treat capital letters the same as their lowercase equivalents. Lichenstein should appear after language, not before it as it does when ordered by code points, where uppercase L (ASCII code point 76) precedes lowercase l (ASCII code point 108). Less obviously, punctuation marks generally appear before all letters, whether they're # (ASCII code point 35), [ (ASCII code point 91), or ~ (ASCII code point 126). And of course sorting is language dependent. While converting all characters to upper case and lexically ordering the resulting strings may give passable results ...
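The difference between the two orderings is easy to see on a small list. The following Python sketch is only an illustration: the word list is invented for the example, and comparing on `str.upper` stands in for the "convert to upper case" approach mentioned above. It is not a substitute for a real, language-aware collation.

```python
# Hypothetical sample data chosen to exercise case and punctuation.
words = ["language", "Lichenstein", "lambda", "#pragma", "[index]", "~tilde"]

# Python compares strings by code point, so sorted() with no key
# reproduces the textbook ordering: 'L' (76) sorts before 'l' (108).
by_code_point = sorted(words)

# Comparing on an uppercased copy makes the sort ignore case,
# but punctuation is still placed purely by its code point.
by_upper_case = sorted(words, key=str.upper)

print(by_code_point)
# ['#pragma', 'Lichenstein', '[index]', 'lambda', 'language', '~tilde']
print(by_upper_case)
# ['#pragma', 'lambda', 'language', 'Lichenstein', '[index]', '~tilde']
```

Uppercasing fixes the position of Lichenstein relative to language, but [ and ~ still land after the letters rather than before them, which is why this approach is passable at best.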