August 2014
Beginner to intermediate
304 pages
7h 10m
English
A common problem in text processing is encountering text with a nonstandard character encoding. Ideally, all text would be ASCII or UTF-8, but that is not the reality. When you have text that is neither ASCII nor UTF-8 and you don't know its character encoding, you need to detect the encoding and convert the text to a standard encoding before doing any further processing.
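To make the detect-then-convert idea concrete, here is a minimal sketch that uses only the Python standard library. It is not the statistical detection a dedicated library performs: it simply tries a short list of candidate encodings in order and keeps the first one that decodes without error. The function names decode_with_fallback and to_utf8, and the candidate list, are assumptions made for this illustration.

```python
# Illustrative sketch only: trial decoding, not statistical detection.
# The function names and the candidate list are hypothetical.

def decode_with_fallback(data, candidates=("ascii", "utf-8", "latin-1")):
    """Return (text, encoding_name) for the first candidate that decodes cleanly."""
    for name in candidates:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            # This candidate does not fit the bytes; try the next one.
            continue
    # Unreachable with the default candidates, because latin-1 accepts any
    # byte sequence, but possible if the caller passes a stricter list.
    raise ValueError("none of the candidate encodings could decode the data")


def to_utf8(data, candidates=("ascii", "utf-8", "latin-1")):
    """Decode bytes of unknown encoding and re-encode them as UTF-8."""
    text, _ = decode_with_fallback(data, candidates)
    return text.encode("utf-8")


if __name__ == "__main__":
    raw = "café".encode("latin-1")   # bytes that are not valid UTF-8
    text, used = decode_with_fallback(raw)
    print(used, text)
    print(to_utf8(raw))
```

Because latin-1 maps every byte to a character, it always succeeds and can silently produce the wrong text when the real encoding is something else. That limitation is why a real detector looks at byte statistics rather than relying on trial decoding alone.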
You'll need to install the charade module using sudo pip install charade or sudo easy_install charade. You can learn more about charade at https://pypi.python.org/pypi/charade.
Encoding detection and conversion functions are provided in encoding.py. These are simple wrapper functions around the charade ...