July 2017
Intermediate to advanced
360 pages
8h 26m
English
Stopwords, like many other important features, are tied to a specific language, so it's often necessary to detect the language before moving on to any other step. A simple, free, and reliable solution is provided by the langdetect library, which has been ported from Google's language detection system. Its usage is straightforward:
```python
>>> from langdetect import detect
>>> print(detect('This is English'))
en
>>> print(detect('Dies ist Deutsch'))
de
```
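Since the reason for detecting the language is to select language-specific resources such as stopword lists, the following is a minimal sketch of that step. The tiny `STOPWORDS` sets and the `remove_stopwords` helper are hypothetical placeholders (real pipelines load complete lists, for example from a stopwords corpus). The detector is passed in as a function so that langdetect's `detect` can be plugged in directly.

```python
from typing import Callable, Dict, List, Set

# Hypothetical, deliberately tiny stopword lists keyed by ISO 639-1 code.
STOPWORDS: Dict[str, Set[str]] = {
    "en": {"this", "is", "a", "the", "and"},
    "de": {"dies", "ist", "ein", "der", "und"},
}


def remove_stopwords(text: str, detect: Callable[[str], str]) -> List[str]:
    """Detect the language of `text`, then drop that language's stopwords."""
    lang = detect(text)
    # Unknown language: fall back to an empty set, so every token is kept.
    stopwords = STOPWORDS.get(lang, set())
    # Naive whitespace tokenization, lowercased to match the stopword sets.
    return [tok for tok in text.lower().split() if tok not in stopwords]
```

With langdetect installed, this would be called as `remove_stopwords('Dies ist Deutsch', detect)`. Because langdetect's algorithm is non-deterministic, short or ambiguous inputs can occasionally yield different results between runs; setting `DetectorFactory.seed = 0` (imported from `langdetect`) before detecting makes the output reproducible.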
The function returns ISO 639-1 language codes (https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes), which can be used as keys in a dictionary to look up the full language name. When the text is more complex, detection can be more difficult, and it's useful to know whether there are any ...
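A dictionary lookup of the kind described above might look like the following sketch. `ISO_639_1_NAMES` is a hypothetical, partial mapping covering only a few languages; unknown codes are returned unchanged rather than raising an error. This fallback matters because langdetect also emits a few region-qualified codes such as `zh-cn` and `zh-tw`, which are not plain ISO 639-1 codes.

```python
# Hypothetical, partial mapping from ISO 639-1 codes to English language names.
ISO_639_1_NAMES = {
    "de": "German",
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "it": "Italian",
}


def language_name(code: str) -> str:
    """Return the full name for an ISO 639-1 code, or the code itself if unknown."""
    # Normalize case so that "DE" and "de" resolve to the same entry.
    return ISO_639_1_NAMES.get(code.lower(), code)
```

Combined with langdetect, `language_name(detect('Dies ist Deutsch'))` would turn the code `de` into `German`.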