February 2018
Beginner to intermediate
364 pages
10h 32m
English
Depending on the tokenizer and its input, you may want to remove punctuation from the resulting list of tokens. The regexp_tokenize function with '\w+' as the pattern handles this well: the pattern matches only runs of word characters (letters, digits, and underscores), so punctuation never appears in the output. word_tokenize, by contrast, does not remove punctuation. It returns most punctuation marks as separate tokens, which then have to be filtered out in a later step.
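The difference can be sketched in a few lines of Python. This is a minimal illustration rather than the book's own code: the sample sentence, the helper drop_punctuation, and the hand-written token list standing in for word_tokenize output are all assumptions made for the example. The fallback definition of regexp_tokenize is only a stand-in for environments where NLTK is not installed.

```python
import re
import string

try:
    from nltk.tokenize import regexp_tokenize
except ImportError:
    # Stand-in for environments without NLTK: for a group-free pattern
    # such as r"\w+", re.findall yields the same tokens.
    def regexp_tokenize(text, pattern):
        return re.findall(pattern, text)


def drop_punctuation(tokens):
    """Hypothetical helper: drop tokens made up entirely of punctuation."""
    return [t for t in tokens if not all(ch in string.punctuation for ch in t)]


text = "Hello, world! Is this working?"

# r"\w+" matches runs of letters, digits and underscores,
# so punctuation marks never become tokens.
regexp_tokens = regexp_tokenize(text, r"\w+")

# Hand-written list of the kind word_tokenize typically returns for the
# same sentence (assumed here, not produced by calling word_tokenize).
word_style_tokens = ["Hello", ",", "world", "!", "Is", "this", "working", "?"]
filtered_tokens = drop_punctuation(word_style_tokens)

print(regexp_tokens)
print(filtered_tokens)
```

One trade-off to keep in mind: '\w+' also splits on apostrophes, so a contraction such as "don't" comes out as the two tokens "don" and "t". Filtering punctuation-only tokens after tokenizing leaves whatever word-internal handling the tokenizer applied untouched.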