Natural Language Toolkit (NLTK)
NLTK was originally created in 2001 as part of a computational linguistics course in the Department of Computer and Information Science at the University of Pennsylvania. Since then it has been developed and expanded with the help of dozens of contributors. It has been adopted in courses at dozens of universities, and serves as the basis of many research projects. Table 2 lists the most important NLTK modules.
Table 2. Language processing tasks and corresponding NLTK modules with examples of functionality
| Language processing task | NLTK modules | Functionality |
|---|---|---|
| Accessing corpora | nltk.corpus | Standardized interfaces to corpora and lexicons |
| String processing | nltk.tokenize, nltk.stem | Tokenizers, sentence tokenizers, stemmers |
| Collocation discovery | nltk.collocations | t-test, chi-squared, point-wise mutual information |
| Part-of-speech tagging | nltk.tag | n-gram, backoff, Brill, HMM, TnT |
| Classification | nltk.classify, nltk.cluster | Decision tree, maximum entropy, naive Bayes, EM, k-means |
| Chunking | nltk.chunk | Regular expression, n-gram, named entity |
| Parsing | nltk.parse | Chart, feature-based, unification, probabilistic, dependency |
| Semantic interpretation | nltk.sem, nltk.inference | Lambda calculus, first-order logic, model checking |
| Evaluation metrics | nltk.metrics | Precision, recall, agreement coefficients |
| Probability and estimation | nltk.probability | Frequency distributions, smoothed probability distributions |
| Applications | nltk.app, nltk.chat | Graphical concordancer, parsers, WordNet browser, chatbots |
| Linguistic fieldwork | … | … |
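As a taste of how these modules fit together, the following is a minimal sketch using two entries from Table 2, `nltk.tokenize` and `nltk.probability`. It assumes NLTK is installed (`pip install nltk`); the classes chosen here (`WordPunctTokenizer`, `FreqDist`) work without downloading any extra corpus data, and the sample sentence is illustrative, not from the book.

```python
# Sketch of two NLTK modules from Table 2; assumes NLTK is installed.
# WordPunctTokenizer and FreqDist require no extra corpus downloads.
from nltk.tokenize import WordPunctTokenizer
from nltk.probability import FreqDist

text = "NLTK was created in 2001. NLTK is now used in many courses."

# nltk.tokenize: split raw text into word and punctuation tokens
tokens = WordPunctTokenizer().tokenize(text)

# nltk.probability: build a frequency distribution over the tokens
fdist = FreqDist(t.lower() for t in tokens)
print(fdist.most_common(3))
```

Most other modules in the table follow the same pattern: construct an object (a tagger, a classifier, a parser) and call a small number of well-named methods on it.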