April 2018
Using R, the script becomes:
# saved from https://transcripts.factcheck.org/remarks-president-trump-tax-reform-event/
path <- "C:/Users/Dan/trump.txt"
text <- readLines(path, encoding="UTF-8")
It's more interesting to start processing the speech:
# create corpus
#install.packages("tm", repos='http://cran.us.r-project.org')
library(tm)
vs <- VectorSource(text)
elem <- getElem(stepNext(vs))
result <- readPlain(elem, "en", "idi")
txt <- Corpus(vs)
summary(txt)
This creates a corpus from the speech contents:

Some of the text processing that you can perform:
# convert to lower case
txtlc <- tm_map(txt, tolower)
inspect(txt[1])
...
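As a rough illustration of how such processing steps can be chained, here is a minimal sketch using the tm package. It is not the original script: the two sample lines and the names sample_text, corpus, and cleaned are made up for illustration, standing in for the speech text. The sketch wraps tolower in content_transformer(), which recent versions of tm generally expect for plain functions that are not tm transformations.

```r
library(tm)

# Hypothetical sample lines standing in for the speech text (assumption)
sample_text <- c("Tax Reform is COMING.",
                 "We will cut taxes, and cut them again!")

# Each element of the character vector becomes one document
corpus <- VCorpus(VectorSource(sample_text))

# content_transformer() wraps a plain function so each result stays a tm document
corpus <- tm_map(corpus, content_transformer(tolower))
# Built-in tm transformations can be passed to tm_map() directly
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Pull the processed text back out as a plain character vector
cleaned <- vapply(
  seq_len(length(corpus)),
  function(i) paste(content(corpus[[i]]), collapse = " "),
  character(1)
)
print(cleaned)
```

The order of the steps matters here: lowercasing first lets removeWords match the lowercase stop word list, and stripWhitespace runs last to collapse the gaps left by removed words.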