O'Reilly logo

R Deep Learning Cookbook by Achyutuni Sri Krishna Rao, Dr. PKS Prakash

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

How to do it...

Here is how we go about preprocessing:

  1. Load the required packages:
load_packages=c("janeaustenr","tidytext","dplyr","stringr","ggplot2","wordcloud","reshape2","igraph","ggraph","widyr","tidyr") 
lapply(load_packages, require, character.only = TRUE) 
  1. Load the Pride and Prejudice dataset. The line_num attribute is analogous to the line number printed in the book:
Pride_Prejudice <- data.frame("text" = prideprejudice, 
                              "book" = "Pride and Prejudice", 
                              "line_num" = 1:length(prideprejudice), 
                              stringsAsFactors=F) 
  1. Now, perform tokenization to restructure the one-string-per-row format to a one-token-per-row format. Here, the token can refer to a single word, a group of characters, co-occurring words (n-grams), sentences, paragraphs, ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required