October 2017
Beginner to intermediate
236 pages
7h 38m
English
A PDF file stores its text in different layers, and the pdf_text() function reads those layers and imports the relevant text into the R environment. In this example, the PDF files you supplied contain only plain text. The same package can also extract bookmarks from a PDF, if there are any, through its pdf_toc() function.
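As a minimal sketch of what this looks like in practice, the snippet below draws a small two-page PDF with base R's pdf() device and reads it back with pdftools. The file and the names demoFile, pageText, and toc are made up for illustration and are not part of the earlier example. pdf_text() returns a character vector with one element per page, and pdf_toc() is the pdftools call for bookmarks, which this toy file does not contain.

```r
library(pdftools)

# Hypothetical demo file: a two-page PDF drawn with base R graphics
demoFile <- tempfile(fileext = ".pdf")
pdf(demoFile)
plot.new(); text(0.5, 0.5, "First page")
plot.new(); text(0.5, 0.5, "Second page")
invisible(dev.off())

# pdf_text() returns one character string per page
pageText <- pdf_text(demoFile)
length(pageText)

# pdf_toc() returns the bookmarks (table of contents) as a list;
# this toy file has none, so little or nothing is expected here
toc <- pdf_toc(demoFile)
str(toc)
```

Because each page arrives as a separate string, page-level processing (for example, searching only the first page) reduces to ordinary vector indexing.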
If you want to extract the metadata of a PDF file, such as font names, author name, version, and so on, you can use the following functions:
> pdf_fonts(pdfFileNames[1])
            name     type embedded file
1 ELGFYK+Calibri truetype     TRUE
> info <- pdf_info(pdfFileNames[1])
> names(info)
[1] "version"     "pages"       "encrypted"   "linearized"  "keys"        "created"     "modified"
[8] "metadata"    "locked"      "attachments" ...
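The object returned by pdf_info() is an ordinary R list, so individual fields can be read with the $ operator. The following is a hedged sketch that assumes a throwaway one-page PDF created with base R's pdf() device; demoFile is a hypothetical name, and the values you see will depend on your own file.

```r
library(pdftools)

# Hypothetical demo file: a one-page PDF drawn with base R graphics
demoFile <- tempfile(fileext = ".pdf")
pdf(demoFile)
plot.new(); text(0.5, 0.5, "Metadata demo")
invisible(dev.off())

# pdf_info() returns a list; pick out the fields of interest
info <- pdf_info(demoFile)
info$pages      # number of pages
info$version    # PDF version
info$encrypted  # whether the file is encrypted
info$keys       # document keys such as title or producer, if present

# pdf_fonts() returns a data frame with one row per font
fonts <- pdf_fonts(demoFile)
fonts[, c("name", "type", "embedded")]
```

Author and title information, when the PDF carries it, typically lives under info$keys rather than at the top level of the list.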