How to do it...

Before jumping to the model-building part, let's clean the input data:

  1. First, we need to create a custom function, clean_data(), in order to convert the messy data into a cleaned dataset. We will apply this function to both the reviews and the associated summaries and then put the cleaned versions into a DataFrame for easy data manipulation:
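clean_data <- function(data, remove_stopwords = TRUE) {
  # Normalize case and expand contractions (for example, "don't" to "do not")
  data <- tolower(data)
  data <- replace_contraction(data)
  # Strip HTML line breaks, then collapse runs of punctuation and spaces
  data <- gsub('<br />', '', data)
  data <- gsub('[[:punct:] ]+', ' ', data)
  data <- gsub("[^[:alnum:]\\-\\.\\s]", " ", data)
  # Note: "&" has already been replaced above, so this pattern no longer matches
  data <- gsub('&amp', '', data)
  # Optionally drop English stop words and rejoin the remaining tokens
  data <- if (remove_stopwords) {
    paste0(unlist(rm_stopwords(data, tm::stopwords("english"))), collapse = " ")
  } else {
    data
  }
  data
...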
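
Because the listing above is cut off, the following is a minimal base-R sketch of the same idea, not the recipe's actual code: clean two text columns and collect the results in a DataFrame. The names clean_text_sketch and toy_stopwords, the column names Text and Summary, and the sample rows are all hypothetical. A tiny hand-written stop-word list stands in for tm::stopwords("english") so that the example runs without extra packages. Keeping stop words in the summaries is only an illustrative choice to show the flag in use.

```r
# Hypothetical stand-in for tm::stopwords("english"); illustration only.
toy_stopwords <- c("the", "a", "is", "and", "it", "this")

# Hypothetical helper that mirrors the cleaning steps using base R only.
clean_text_sketch <- function(text, remove_stopwords = TRUE) {
  text <- tolower(text)
  text <- gsub("<br />", " ", text, fixed = TRUE)  # drop HTML line breaks
  text <- gsub("[[:punct:]]+", " ", text)          # punctuation to spaces
  text <- gsub("[^[:alnum:] ]", " ", text)         # anything else unusual
  if (remove_stopwords) {
    # Split each string into words, drop stop words and empty tokens
    text <- vapply(strsplit(text, " +"), function(words) {
      keep <- nzchar(words) & !(words %in% toy_stopwords)
      paste(words[keep], collapse = " ")
    }, character(1))
  }
  trimws(gsub(" +", " ", text))                    # tidy leftover spaces
}

# Made-up sample data; the column names are assumptions.
raw <- data.frame(
  Text = c("This is GREAT coffee!<br />Love it.", "The tea is bland & weak..."),
  Summary = c("Great coffee", "Bland tea"),
  stringsAsFactors = FALSE
)

# Clean both columns and keep the results side by side.
cleaned <- data.frame(
  text = clean_text_sketch(raw$Text, remove_stopwords = TRUE),
  summary = clean_text_sketch(raw$Summary, remove_stopwords = FALSE),
  stringsAsFactors = FALSE
)
print(cleaned)
```

Keeping the raw and cleaned columns in separate DataFrames makes it easy to spot-check a few rows before moving on to model building.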
