Advanced Machine Learning with R

by Cory Lesmeister, Dr. Sunil Kumar Chinnamgari
May 2019
Intermediate to advanced
664 pages
15h 41m
English
Packt Publishing
Content preview from Advanced Machine Learning with R

Word frequency in all addresses

To remove stop words from tidy-format data, you can use the stop_words data frame provided in the tidytext package. Load that tibble into the environment, then anti-join it against your tokens on the word column:

> library(tidytext)
> data(stop_words)
> sotu_tidy <- sotu_unnest %>%
    dplyr::anti_join(stop_words, by = "word")
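To see what the anti-join does, it can help to run it on a tiny example. The following is a minimal sketch, not the book's code: `toy_tokens` and `toy_stop` are hypothetical stand-ins for sotu_unnest and stop_words, invented here so the snippet runs without the State of the Union data or the tidytext package.

```r
library(dplyr)

# Hypothetical stand-in for sotu_unnest: one token per row
toy_tokens <- tibble::tibble(
  year = c(1790, 1790, 1790, 1790, 1790),
  word = c("the", "government", "of", "the", "people")
)

# Hypothetical stand-in for stop_words, which has `word` and `lexicon` columns
toy_stop <- tibble::tibble(
  word = c("the", "of"),
  lexicon = "toy"
)

# anti_join() keeps only the rows of toy_tokens whose word has no match in toy_stop
toy_tidy <- dplyr::anti_join(toy_tokens, toy_stop, by = "word")

nrow(toy_tokens)  # row count before removing stop words
nrow(toy_tidy)    # row count after removing stop words
toy_tidy$word     # the remaining tokens
```

Because the join is keyed on `word` only, the extra `lexicon` column in the stop word table does not end up in the result; the output keeps just the columns of the left-hand table.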

Notice that the data shrank from 1.97 million observations to 778,161. Now you can look at the top words. I don't do it in the following code, but you can assign the result to a data frame if you so choose:

> sotu_tidy %>%
    dplyr::count(word, sort = TRUE)
# A tibble: 29,558 x 2
   word           n
   <chr>      <int>
 1 government  7573
 2 congress    5759
 3 united      5102
 4 people      4219
 5 country     3564
 6 public      3413
 7 time        3138
 8 war ...
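Saving the counts rather than just printing them might look like the following sketch. It is an illustration under stated assumptions: `toy_tidy` is a hypothetical six-row stand-in for sotu_tidy, and `word_counts` and `top_words` are names chosen here, not taken from the book.

```r
library(dplyr)

# Hypothetical stand-in for sotu_tidy: one non-stop-word token per row
toy_tidy <- tibble::tibble(
  word = c("government", "people", "government", "war", "people", "government")
)

# Assign the result to an object instead of only printing it;
# count() returns a tibble with columns `word` and `n`, sorted by n descending
word_counts <- toy_tidy %>%
  dplyr::count(word, sort = TRUE)

# Keep only the most frequent words (here, the top two)
top_words <- head(word_counts, 2)

top_words
```

Once the counts live in an object, they can be filtered, joined, or plotted like any other data frame.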

ISBN: 9781838641771