Analyzing PDF reports in a folder with the tm package

Text analytics is basically a way to perform quantitative analysis on qualitative information stored in text. In this recipe, we will create a corpus of documents from PDF files and perform descriptive analytics on them, looking for the most frequent terms.

This is a particularly useful recipe for professionals who work with PDF reports.

In this recipe, we will explore the full text of the Italian medieval masterpiece Divine Comedy by Dante Alighieri. You can find out more on Wikipedia at https://en.wikipedia.org/wiki/Divine_Comedy:

Get RStudio for R Statistical Computing Cookbook now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.