June 2016
Beginner to intermediate
1783 pages
71h 22m
English
Articles and newswires are a huge, periodic source of knowledge about events at different points in time. Text classification is the preprocessing step that sorts all these documents into a specific corpus, and text categorization is the basis of text processing.
We will now introduce an N-gram-based text-classification algorithm. An N-gram is an N-character slice of a longer string; for example, the 2-grams of "TEXT" are "TE", "EX", and "XT". The key step of this algorithm is computing the N-gram frequency profiles.
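The N-gram and frequency-profile ideas can be sketched as follows. This is a minimal Python sketch, not the book's own code. The function names `char_ngrams` and `ngram_profile` are hypothetical. Several parameters are illustrative assumptions, not details taken from this text:

* the range of N (1 to `max_n`),
* the alphabetical tie-breaking,
* the `top_k` cutoff.

```python
from __future__ import annotations

from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """Return every contiguous n-character slice of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_profile(text: str, max_n: int = 3, top_k: int = 300) -> list[str]:
    """Hypothetical helper: build an N-gram frequency profile.

    Counts all N-grams for N = 1..max_n (assumed range), ranks them from
    most to least frequent, and keeps the top_k entries (assumed cutoff).
    """
    counts: Counter[str] = Counter()
    for n in range(1, max_n + 1):
        counts.update(char_ngrams(text, n))
    # Sort by descending count; ties are broken alphabetically so the
    # ranking is deterministic (an assumption, not from the source).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gram for gram, _ in ranked[:top_k]]


print(char_ngrams("TEXT", 2))               # ['TE', 'EX', 'XT']
print(ngram_profile("banana", max_n=2))     # ['a', 'an', 'n', 'na', 'b', 'ba']
```

The sketch returns a ranked list rather than raw counts, because a rank-ordered profile is convenient for comparing one document's profile against another's. Real systems typically also normalize the text (for example, by lowercasing or tokenizing) before extracting N-grams.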
Before introducing the algorithm, we illustrate a couple of the concepts it relies on:
The summarized pseudocode for the ...