Analyzing PDF reports in a folder with the tm package

Text analytics is basically a way to perform quantitative analysis on qualitative information stored in text. In this recipe, we will create a corpus of documents from PDF files and perform descriptive analytics on them, looking for the most frequent terms.

This is a particularly useful recipe for professionals who work with PDF reports.

In this recipe, we will explore the full text of the Italian medieval masterpiece Divine Comedy by Dante Alighieri. You can find out more on Wikipedia at https://en.wikipedia.org/wiki/Divine_Comedy:

Get RStudio for R Statistical Computing Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.