Hands-On Natural Language Processing with Python by Rajalingappaa Shanmugamani, Rajesh Arumugam

Vectorizing the data

The final stage of preprocessing the data is to vectorize, or quantize, our dialogs and candidates. This means converting each word or token into an integer value, so that any sequence of words becomes a sequence of integers, one per word.
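As a minimal illustration of this word-to-integer mapping, the sketch below builds a vocabulary index from a few example sentences and converts one sentence into its integer sequence. The helper names `build_word_index` and `to_ids`, the lowercase whitespace tokenization, and the example sentences are hypothetical and are not taken from the book. Ids start at 1 so that 0 stays free to mean "empty word" during padding.

```python
def build_word_index(texts):
    """Hypothetical helper: give each distinct lowercase token an id starting at 1.

    Id 0 is deliberately left unused so it can serve as the padding ("empty word") value.
    """
    vocab = sorted({token for text in texts for token in text.lower().split()})
    return {word: i + 1 for i, word in enumerate(vocab)}


def to_ids(text, word_idx):
    """Hypothetical helper: map a whitespace-tokenized sentence to its list of word ids."""
    return [word_idx[token] for token in text.lower().split()]


texts = ["hello what can i help you with today", "i would like a table"]
word_idx = build_word_index(texts)
print(to_ids("i would like a table", word_idx))  # prints [5, 11, 6, 1, 7]
```

Because the vocabulary is sorted before numbering, the same set of texts always produces the same ids, which keeps the mapping reproducible between runs.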

We will first write a method to vectorize the candidate texts. Each vectorized candidate must have a fixed length in words (sentence_size), so any candidate vector shorter than sentence_size has to be padded with 0s, where 0 corresponds to an empty word:

def vectorize_candidates(candidates, word_idx, sentence_size):
    # Determine shape of final vector
    shape = (len(candidates), sentence_size)
    candidates_vector ...
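The excerpt above is cut off, so the following is only a minimal sketch of how such a function could be completed, not the book's implementation. It assumes that each candidate is already a list of tokens, that words missing from `word_idx` map to 0, and that candidates longer than `sentence_size` are truncated; all three are assumptions. It returns plain Python lists of shape `(len(candidates), sentence_size)` rather than a framework tensor.

```python
def vectorize_candidates(candidates, word_idx, sentence_size):
    """Sketch (assumed behavior): turn tokenized candidates into fixed-length id vectors.

    - Each candidate is assumed to be a list of tokens.
    - Unknown tokens map to 0 (assumption), the same id used for padding.
    - Candidates longer than sentence_size are truncated (assumption).
    """
    # Determine shape of final vector: one row per candidate, sentence_size columns.
    shape = (len(candidates), sentence_size)
    # Start with all zeros, so any position left unfilled is an "empty word".
    candidates_vector = [[0] * shape[1] for _ in range(shape[0])]
    for row, candidate in enumerate(candidates):
        ids = [word_idx.get(word, 0) for word in candidate][:sentence_size]
        # Overwrite the leading positions; the trailing zeros remain as padding.
        candidates_vector[row][:len(ids)] = ids
    return candidates_vector


word_idx = {"i": 1, "would": 2, "like": 3, "a": 4, "table": 5}
candidates = [["a", "table"], ["i", "would", "like", "a", "table"], ["pizza"]]
print(vectorize_candidates(candidates, word_idx, sentence_size=4))
# prints [[4, 5, 0, 0], [1, 2, 3, 4], [0, 0, 0, 0]]
```

Right-padding keeps the real tokens at the start of every row. Note that mapping unknown words to 0 makes them indistinguishable from padding, which is a simplification; a dedicated "unknown" id is a common alternative.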