October 2018
If we need to convert incoming bytes into Unicode, we're clearly also going to have situations where we convert outgoing Unicode into byte sequences. This is done with the encode method on the str class, which, like the decode method, takes a character set (in Python 3, both default to UTF-8 when none is given). The following code creates a Unicode string and encodes it in different character sets:
characters = "cliché"
print(characters.encode("UTF-8"))
print(characters.encode("latin-1"))
print(characters.encode("CP437"))
print(characters.encode("ascii"))
The first three encodings each produce a different byte sequence for the accented character. The fourth one can't even handle that character:
b'clich\xc3\xa9'
b'clich\xe9'
b'clich\x82'
Traceback (most recent call last):
File ...
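The failure in the ascii case is a UnicodeEncodeError, so a program can catch it rather than crash. The following is a minimal sketch of that idea, not part of the original example: the try_encode helper and the results dictionary are hypothetical names introduced here for illustration. It also decodes each successful result with the same character set to show that the round trip recovers the original string:

```python
characters = "cliché"


def try_encode(text, encoding):
    """Hypothetical helper: return the encoded bytes, or None if the
    character set cannot represent every character in text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# Map each character set name to its encoded bytes (or None on failure).
results = {
    name: try_encode(characters, name)
    for name in ("UTF-8", "latin-1", "CP437", "ascii")
}

for name, encoded in results.items():
    if encoded is None:
        print(f"{name}: cannot represent {characters!r}")
    else:
        # Decoding with the same character set recovers the original string.
        print(f"{name}: {encoded!r} -> {encoded.decode(name)!r}")
```

Returning None keeps the sketch short; real code might prefer to log the error or let the exception propagate, depending on how much the lost data matters.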