December 2018
Intermediate to advanced
274 pages
7h 46m
English
Any text in a sentence that is not relevant to the context of the data can be termed noise.
For example, noise can include language stop words (commonly used words in a language, such as is, am, the, of, and in), URLs or links, social media entities (mentions, hashtags), and punctuation.
To remove noise from a sentence, the general approach is to maintain a dictionary of noise words, iterate through the tokens of the sentence, and remove every token that appears in the dictionary. The dictionary of noise words is updated frequently so that it keeps covering the noise that shows up in the data.
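The dictionary-based approach described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the `NOISE_WORDS` set, the three regular expressions, and the `remove_noise` function name are all assumptions made for this example, and a real noise dictionary would be much larger and tuned to the data at hand.

```python
import re

# Hypothetical noise dictionary (an assumption for this sketch);
# in practice it would be larger and updated as new noise is found.
NOISE_WORDS = {"is", "am", "the", "of", "in", "a", "this"}

# Illustrative patterns for the other noise types mentioned in the text.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")   # URLs or links
ENTITY_PATTERN = re.compile(r"[@#]\w+")              # mentions and hashtags
PUNCT_PATTERN = re.compile(r"[^\w\s]")               # punctuation


def remove_noise(sentence, noise_words=NOISE_WORDS):
    """Strip URLs, social media entities, punctuation, and noise words."""
    text = URL_PATTERN.sub(" ", sentence)
    text = ENTITY_PATTERN.sub(" ", text)
    text = PUNCT_PATTERN.sub(" ", text)
    # Tokenize on whitespace and drop every token found in the dictionary.
    tokens = text.split()
    kept = [token for token in tokens if token.lower() not in noise_words]
    return " ".join(kept)


if __name__ == "__main__":
    print(remove_noise("This is a sample text, see https://example.com #nlp @user!"))
```

Keeping the noise words in a set makes each lookup constant time, and passing the dictionary as a parameter lets it be swapped or extended without changing the function.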