April 2018
566 pages
For this application, we will use basic statistical feature extraction to generate features from raw text data. In the NLP domain, raw text must be converted into a numerical format before an ML algorithm can be applied to it. Many techniques are available, including indexing, count-based vectorization, Term Frequency-Inverse Document Frequency (TF-IDF), and so on. I have already discussed the concept of TF-IDF in Chapter 4, Generate features using TF-IDF:
Indexing is used mainly for fast data retrieval. In indexing, we assign a unique identification number. This number can be assigned in alphabetical order or based ...
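To make indexing and count-based vectorization concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the implementation used in this chapter: it assumes the identification numbers are given to words, that tokens are produced by lowercasing and splitting on whitespace, and that IDs follow alphabetical order. The names `build_index` and `count_vector` are hypothetical helpers introduced only for this example.

```python
from collections import Counter


def build_index(documents):
    """Assign each distinct token a unique integer ID in alphabetical order.

    Assumption: tokens are lowercase, whitespace-separated words.
    """
    vocabulary = sorted({token for doc in documents for token in doc.lower().split()})
    return {token: token_id for token_id, token in enumerate(vocabulary)}


def count_vector(document, index):
    """Turn one document into a list of token counts, ordered by token ID."""
    counts = Counter(document.lower().split())
    # `index` was built in ID order, so iterating over it keeps positions aligned.
    return [counts.get(token, 0) for token in index]


docs = ["the movie was good", "the movie was bad bad"]
index = build_index(docs)
vectors = [count_vector(doc, index) for doc in docs]

print(index)    # {'bad': 0, 'good': 1, 'movie': 2, 'the': 3, 'was': 4}
print(vectors)  # [[0, 1, 1, 1, 1], [2, 0, 1, 1, 1]]
```

Each document becomes a fixed-length numeric vector whose positions correspond to the token IDs, which is the kind of numerical input an ML algorithm can consume. In practice, a library vectorizer would typically replace these hand-written helpers.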