Before processing natural language text, we usually need to normalize it. Normalization mainly involves eliminating punctuation, converting the entire text to lowercase or uppercase, converting numbers into words, expanding abbreviations, canonicalizing the text, and so on.
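Several of these steps can be sketched in plain Python. The example below is a minimal illustration, not an NLTK API. It assumes two small hypothetical lookup tables, `ABBREVIATIONS` and `DIGIT_WORDS`, and a hypothetical helper `normalize`. It lowercases the text, expands a few known abbreviations, and spells out standalone single digits.

```python
import re

# Hypothetical lookup tables for illustration only; a real system would need
# a much larger abbreviation list and a full number-to-words converter.
ABBREVIATIONS = {"u.s.": "united states", "dr.": "doctor"}
DIGIT_WORDS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
               "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}


def normalize(text):
    """Lowercase text, expand known abbreviations, and spell out single digits."""
    # Step 1: case folding, so "Dr." and "dr." are treated alike.
    text = text.lower()
    # Step 2: expand abbreviations before any punctuation is removed, because
    # the keys themselves contain periods. This is naive substring replacement.
    for abbr, expansion in ABBREVIATIONS.items():
        text = text.replace(abbr, expansion)
    # Step 3: replace each standalone ASCII digit (not part of a longer number)
    # with its word form.
    text = re.sub(r"\b[0-9]\b", lambda m: DIGIT_WORDS[m.group()], text)
    return text


print(normalize("Dr. Smith met 3 guests from the U.S."))
# doctor smith met three guests from the united states
```

The order of the steps matters here. Lowercasing first keeps the abbreviation table small, and expanding abbreviations before stripping punctuation keeps keys such as "u.s." recognizable.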
It is sometimes desirable to remove punctuation while tokenizing. Punctuation removal is one of the primary normalization tasks when working with NLTK.
Consider the following example:
>>> text=[" It is a pleasant evening.","Guests, who came from US arrived at the venue","Food was tasty."]
>>> from nltk.tokenize import word_tokenize
>>> tokenized_docs=[word_tokenize(doc) for doc in text]
>>> print(tokenized_docs)
[['It', ...
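The punctuation tokens that `word_tokenize` produces can then be removed. The sketch below is one common approach, not necessarily the book's exact code. It builds a regular expression from Python's `string.punctuation`, strips those characters from every token, and drops tokens that end up empty. The helper name `remove_punctuation` is hypothetical. To keep the example runnable without NLTK's tokenizer data, the input token lists are hard-coded on the assumption that they match what `word_tokenize` returns for two of the example sentences.

```python
import re
import string

# Character class matching any single ASCII punctuation character;
# re.escape protects characters such as "]", "\\" and "-" inside the class.
PUNCT_RE = re.compile("[%s]" % re.escape(string.punctuation))


def remove_punctuation(tokenized_docs):
    """Strip punctuation from each token and drop tokens that become empty."""
    cleaned_docs = []
    for doc in tokenized_docs:
        cleaned = []
        for token in doc:
            stripped = PUNCT_RE.sub("", token)
            if stripped:  # skip tokens that were pure punctuation, e.g. "." or ","
                cleaned.append(stripped)
        cleaned_docs.append(cleaned)
    return cleaned_docs


# Assumed tokenizer output, hard-coded so the sketch runs without NLTK.
docs = [["It", "is", "a", "pleasant", "evening", "."],
        ["Food", "was", "tasty", "."]]
print(remove_punctuation(docs))
# [['It', 'is', 'a', 'pleasant', 'evening'], ['Food', 'was', 'tasty']]
```

Because punctuation is stripped from inside tokens as well, a token like "U.S." becomes "US" and "n't" becomes "nt". If such tokens should survive intact, the alternative is to drop only tokens that consist entirely of punctuation.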