Padding and truncation of sequences

When developing the author classification model, the integer sequences for the training and test texts all need to be of equal length. We can achieve this by padding and truncating the sequences of integers, as follows:

# Padding and truncation
trainx <- pad_sequences(trainx, maxlen = 300)
testx <- pad_sequences(testx, maxlen = 300)
dim(trainx)
[1] 2500  300

Here, we specify the maximum length of all the sequences, maxlen, as 300. Any article whose sequence is longer than 300 integers is truncated to 300, and any article whose sequence is shorter than 300 integers is padded with zeroes until it reaches 300. Note that for padding and truncation, a default setting of "pre" has been used and is not specifically ...
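To make the effect of the "pre" setting concrete, the following is a minimal base-R sketch of front padding and front truncation on a toy list of sequences. The helper pad_pre and the toy data are hypothetical and are used only for illustration; this is not the keras implementation of pad_sequences, only an approximation of its behavior under the "pre" setting with zero as the padding value.

```r
# Hypothetical helper (not part of keras): pads with `value` at the front
# and, for sequences longer than maxlen, drops integers from the front.
pad_pre <- function(seqs, maxlen, value = 0L) {
  out <- matrix(value, nrow = length(seqs), ncol = maxlen)
  for (i in seq_along(seqs)) {
    s <- seqs[[i]]
    n <- length(s)
    if (n == 0) next                 # empty sequence: row stays all padding
    if (n > maxlen) {
      s <- s[(n - maxlen + 1):n]     # "pre" truncation: keep the last maxlen values
      n <- maxlen
    }
    out[i, (maxlen - n + 1):maxlen] <- s   # "pre" padding: values sit at the end
  }
  out
}

# Toy data standing in for integer-encoded articles (assumed values)
toy <- list(c(5L, 8L),
            c(1L, 2L, 3L, 4L, 5L, 6L),
            c(7L, 7L, 7L, 7L))

padded <- pad_pre(toy, maxlen = 4)
padded[1, ]   # 0 0 5 8  (two zeroes added at the front)
padded[2, ]   # 3 4 5 6  (first two integers dropped)
dim(padded)   # 3 4
```

With the keras package for R, the same behavior can be requested explicitly by passing padding = "pre" and truncating = "pre" to pad_sequences; switching either argument to "post" would pad or truncate at the end of each sequence instead.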
