August 2018
Intermediate to advanced
366 pages
10h 14m
English
We already know from the Cleanup text recipe how str.translate works: each character is looked up in a translation table and substituted with the replacement specified in the table.
So, what we need is a translation table that maps "Ü" to "U" and "ç" to "c", and so on.
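As a minimal sketch, such a table could be written by hand with the standard str.maketrans helper. The MANUAL_TABLE name and the sample strings are illustrative only:

```python
# Hand-written translation table: maps each accented character we thought of
# to its plain counterpart. str.maketrans converts the character keys into
# the code points (ints) that str.translate looks up.
MANUAL_TABLE = str.maketrans({"Ü": "U", "ç": "c", "à": "a"})

print("Über façade à la carte".translate(MANUAL_TABLE))  # → Uber facade a la carte
# Any character we forgot to list is left untouched:
print("café".translate(MANUAL_TABLE))  # → café
```

This works, but only for the characters we remembered to list: "é" is missing from the table, so "café" comes back unchanged.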
But how can we know all these mappings? One interesting property of these characters is that they can be considered plain characters with an added symbol, much like à can be considered an a with an accent.
Unicode accounts for this through Unicode equivalence, which provides multiple ways to write what is considered the same character. What we are really interested in is the decomposed form, which writes a character as a sequence of separate symbols that together define it. ...
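The unicodedata module in the standard library exposes these forms directly: unicodedata.normalize("NFD", ...) produces the decomposed form and unicodedata.normalize("NFC", ...) the composed one. The sketch below first shows the two ways of writing à, then builds a translation table that fills itself in on demand from the decomposed form, dropping the combining marks (the symbols with a nonzero unicodedata.combining value). UnaccentedTable is a hypothetical name, and this is one possible way to derive the mappings, not necessarily the recipe's own implementation. It relies on str.translate looking characters up with table[code_point], which lets a dict subclass compute missing entries through __missing__:

```python
import unicodedata

# "à" can be written as one precomposed code point (U+00E0) ...
composed = "\u00e0"
# ... or, in decomposed (NFD) form, as "a" followed by a combining grave accent (U+0300).
decomposed = unicodedata.normalize("NFD", composed)
print([hex(ord(c)) for c in decomposed])  # → ['0x61', '0x300']
print(composed == decomposed)  # → False: different code points
print(unicodedata.normalize("NFC", decomposed) == composed)  # → True


class UnaccentedTable(dict):
    """Hypothetical translation table that computes its entries lazily.

    str.translate looks up each code point with table[code_point]; for a
    dict subclass, a missing key triggers __missing__, where we derive
    the replacement from the character's decomposed form.
    """

    def __missing__(self, code_point):
        nfd = unicodedata.normalize("NFD", chr(code_point))
        # Keep only the base symbols, dropping combining marks (accents, cedillas, ...).
        base = "".join(c for c in nfd if not unicodedata.combining(c))
        # Recompose what is left (e.g. Hangul jamo back into a syllable).
        replacement = unicodedata.normalize("NFC", base)
        self[code_point] = replacement  # cache: each character is decomposed once
        return replacement


table = UnaccentedTable()
print("Über façade à la carte, café".translate(table))  # → Uber facade a la carte, cafe
```

Unlike the hand-written table, nothing has to be listed in advance: any character whose decomposition includes combining marks is reduced to its base symbols the first time it is seen, while the final NFC step keeps characters such as Hangul syllables, which decompose without any combining marks, intact.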