Tokenization and converting text into a sequence of integers

We start by tokenizing the articles and then converting each article from text into a sequence of integers, using the following code:

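library(keras)

# Tokenization: fit a tokenizer on the training articles; only the most
# frequent words are kept (word indices 1 to num_words - 1, i.e. 1 to 499)
token <- text_tokenizer(num_words = 500) %>%
  fit_text_tokenizer(trainx)

# Text to sequence of integers: each article becomes a vector of word indices
trainx <- texts_to_sequences(token, trainx)
testx <- texts_to_sequences(token, testx)

# Example: the integer sequence for the seventh training article
trainx[[7]]
#  [1]  98   4  41   5   4   2   4 425   5  20   4   9   4 195   5 157   1  18
# [19]  87   3  90   3  59   1 169 346   2  29  52 425   6  72 386 110 331  24
# [37]   5   4   3  31   3  22   7  65  33 169 329  10 105   1 239  11   4  31
# [55]  11 422   8  60 163 318  10  58 102   2 137 329 277  98  58 287  20  81
# [73]   3 142   9   6  87   3  49  20 142   2 142   6   2  60  13   1 470   8
# [91] 137 190 ...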
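To make concrete what the tokenizer does, here is a minimal base-R sketch of the same idea. It is not the keras implementation: `tokenize_words()`, `simple_fit_tokenizer()` and `simple_texts_to_sequences()` are hypothetical helpers written for illustration, and keras's own punctuation filtering and tie-breaking may differ in detail. The sketch ranks words by frequency (the most frequent word gets index 1), then converts each text into integer indices, dropping any word whose index is not below `num_words`.

```r
# Hypothetical helpers: a simplified, base-R illustration of
# frequency-based tokenization, not the keras implementation.

tokenize_words <- function(text) {
  # lowercase, replace punctuation with spaces, split on whitespace
  text <- tolower(text)
  text <- gsub("[[:punct:]]", " ", text)
  words <- strsplit(text, "[[:space:]]+")[[1]]
  words[words != ""]
}

simple_fit_tokenizer <- function(texts, num_words) {
  words <- unlist(lapply(texts, tokenize_words))
  # count words; factor levels in first-occurrence order keep ties stable
  counts <- table(factor(words, levels = unique(words)))
  # order() is stable, so equally frequent words keep first-occurrence order
  ranked <- names(counts)[order(-as.vector(counts))]
  # most frequent word gets index 1; index 0 is never assigned
  word_index <- setNames(seq_along(ranked), ranked)
  list(word_index = word_index, num_words = num_words)
}

simple_texts_to_sequences <- function(tok, texts) {
  lapply(texts, function(text) {
    idx <- tok$word_index[tokenize_words(text)]
    # drop unknown words (NA) and words outside the top num_words - 1
    idx <- idx[!is.na(idx) & idx < tok$num_words]
    unname(idx)
  })
}

# Usage on two toy texts
texts <- c("The cat sat on the mat.", "The dog sat!")
tok <- simple_fit_tokenizer(texts, num_words = 5)
seqs <- simple_texts_to_sequences(tok, texts)
print(seqs)
```

Here the word index is the = 1, sat = 2, cat = 3, on = 4, mat = 5, dog = 6. With `num_words = 5`, only indices 1 to 4 survive, so "mat" and "dog" are dropped and the two texts become `1 3 2 4 1` and `1 2`. The same cut-off is why every value in the article sequence above is below 500.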
