Padding and truncation of sequences

When developing the author classification model, the integer sequences for all training and test articles need to be of equal length. We can achieve this by padding and truncating the sequences of integers, as follows:

library(keras)

# Padding and truncation
trainx <- pad_sequences(trainx, maxlen = 300)
testx <- pad_sequences(testx, maxlen = 300)
dim(trainx)
# [1] 2500  300

Here, we set the maximum length of all sequences, maxlen, to 300. Any article whose sequence is longer than 300 integers is truncated, and any article whose sequence is shorter than 300 integers is padded with zeros. The result is a matrix with one row per article and 300 columns, as the dim(trainx) output confirms. Note that for padding and truncation, the default setting of "pre" has been used and is not specifically ...
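To make the effect of the "pre" and "post" settings concrete, here is a minimal base-R sketch that mimics what pad_sequences() does to a list of integer vectors. The function pad_sequences_sketch() and the toy data are hypothetical illustrations, not the keras implementation; it assumes each sequence is a plain integer vector and that zero is the padding value.

```r
# Hypothetical helper: a base-R sketch of the padding/truncation behaviour
# of keras::pad_sequences(). It is NOT the keras implementation.
pad_sequences_sketch <- function(seqs, maxlen, padding = "pre",
                                 truncating = "pre", value = 0L) {
  # Start with a matrix filled with the padding value: one row per sequence
  out <- matrix(as.integer(value), nrow = length(seqs), ncol = maxlen)
  for (i in seq_along(seqs)) {
    s <- as.integer(seqs[[i]])
    n <- length(s)
    if (n > maxlen) {
      # "pre" drops integers from the start; "post" drops them from the end
      s <- if (truncating == "pre") s[(n - maxlen + 1):n] else s[1:maxlen]
      n <- maxlen
    }
    if (n > 0) {
      # "pre" leaves the zeros before the sequence; "post" leaves them after
      cols <- if (padding == "pre") (maxlen - n + 1):maxlen else 1:n
      out[i, cols] <- s
    }
  }
  out
}

# Hypothetical toy data: one short and one long sequence
toy <- list(c(5, 8, 2), c(1, 2, 3, 4, 5, 6, 7))
pre <- pad_sequences_sketch(toy, maxlen = 5)
# Row 1 is 0 0 5 8 2 (zeros added at the front)
# Row 2 is 3 4 5 6 7 (the first two integers dropped)
post <- pad_sequences_sketch(toy, maxlen = 5, padding = "post",
                             truncating = "post")
# Row 1 is 5 8 2 0 0; row 2 is 1 2 3 4 5
```

With the "pre" defaults, the most recent (final) integers of a long article are kept and the zeros sit at the start of short articles. keras::pad_sequences() exposes the same choice through its padding and truncating arguments, each of which accepts "pre" or "post".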
