Searching

The other major use for string comparison is searching.[7] By now, it should be apparent that sequences of Unicode code points that aren't bit-for-bit identical should still compare as equal in many situations, and a text-searching routine should therefore return them as hits. You have to deal not only with Unicode canonical and compatibility equivalents, but also with linguistically equivalent strings. For example, if you're searching for “cooperate,” you probably want to find “coöperate” and “co-operate” as well. As you've undoubtedly figured out, the best strategy is not to match Unicode code points, but rather to convert both the search key and the string being searched to collation ...
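To make the idea of matching on something other than raw code points concrete, here is a minimal Python sketch. It does not build real collation keys. As a stand-in, it folds each string with compatibility decomposition (NFKD), drops combining marks, applies case folding, and treats hyphens as ignorable. The names `fold`, `find_all`, and `IGNORABLE` are hypothetical, and the choice of what to ignore is an assumption made only for this illustration.

```python
import unicodedata

# Assumption for this sketch: these hyphen characters are ignorable when matching.
IGNORABLE = {"-", "\u2010", "\u2011"}


def fold(text):
    """Return (keys, starts): the folded characters of `text`, and for each
    one the index in `text` of the character it was derived from."""
    keys, starts = [], []
    for i, ch in enumerate(text):
        # Decompose one character at a time so indices map back to `text`.
        for d in unicodedata.normalize("NFKD", ch):
            if unicodedata.combining(d) or d in IGNORABLE:
                continue  # skip accents and hyphens entirely
            for c in d.casefold():
                keys.append(c)
                starts.append(i)
    return keys, starts


def find_all(haystack, needle):
    """Return the substrings of `haystack` whose folded form equals the
    folded form of `needle`."""
    hay, starts = fold(haystack)
    key, _ = fold(needle)
    hits = []
    if not key:
        return hits
    n = len(key)
    for pos in range(len(hay) - n + 1):
        if hay[pos:pos + n] == key:
            begin = starts[pos]
            end = starts[pos + n - 1] + 1
            # Include combining marks that trail the last matched character.
            while end < len(haystack) and unicodedata.combining(haystack[end]):
                end += 1
            hits.append(haystack[begin:end])
    return hits


if __name__ == "__main__":
    print(find_all("We coöperate, co-operate, and COOPERATE.", "cooperate"))
```

Because the folded key sequence keeps a map back to the original indices, each hit is reported as it actually appears in the searched text, accents and hyphens included. A production search would instead compare collation elements at a chosen strength for a specific language, which this simplification does not attempt.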
