We will start by tokenizing the articles and then converting each one from text into a sequence of integers. To do this, we can use the following code:
# Tokenization
token <- text_tokenizer(num_words = 500) %>%
  fit_text_tokenizer(trainx)

# Text to sequence of integers
trainx <- texts_to_sequences(token, trainx)
testx <- texts_to_sequences(token, testx)

# Examples
trainx[[7]]
 [1]  98   4  41   5   4   2   4 425   5  20   4   9   4 195   5 157   1  18
[19]  87   3  90   3  59   1 169 346   2  29  52 425   6  72 386 110 331  24
[37]   5   4   3  31   3  22   7  65  33 169 329  10 105   1 239  11   4  31
[55]  11 422   8  60 163 318  10  58 102   2 137 329 277  98  58 287  20  81
[73]   3 142   9   6  87   3  49  20 142   2 142   6   2  60  13   1 470   8
[91] 137 190 ...
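To make the effect of the `num_words` argument more concrete, the following is a minimal base R sketch of the idea behind the tokenizer: words are ranked by frequency, the most frequent word receives index 1, and only indices below `num_words` are kept when a text is converted to integers. This is an illustration under simplifying assumptions, not the keras implementation. The toy corpus `docs` and the helper `to_sequence()` are hypothetical names introduced here, and the sketch splits on whitespace only, whereas the keras tokenizer also filters punctuation by default.

```r
# Toy corpus standing in for the articles (hypothetical data)
docs <- c("the cat sat",
          "the cat ran",
          "the dog")

# Split each document into lowercase words
words <- strsplit(tolower(docs), "\\s+")

# Rank words by frequency: the most frequent word gets index 1
freq <- sort(table(unlist(words)), decreasing = TRUE)
vocab <- setNames(seq_along(freq), names(freq))

# Keep only indices below num_words; rarer words are dropped
num_words <- 3
to_sequence <- function(w) {
  idx <- unname(vocab[w])
  idx[idx < num_words]
}

# One integer sequence per document
seqs <- lapply(words, to_sequence)
print(seqs)
```

Here "the" and "cat" are the two most frequent words, so they are the only ones that survive with `num_words <- 3`, and the third document shrinks to a single integer. The same logic explains why no value in `trainx[[7]]` reaches 500, and why a sequence can be shorter than the article it came from.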