November 2019
Intermediate to advanced
346 pages
9h 36m
English
We begin by downloading the PDFiD tool and placing our PDF files in a convenient location for analysis (Step 1). The tool is free and simple to use. Next, we import IPython's io module so that we can capture the output of an external program, namely PDFiD (Step 2). In Step 3 and Step 5, we define a function, PDF to FV, that takes a PDF file and featurizes it: it runs the PDFiD tool on the file and then parses the tool's output into a convenient form. When we run it on the PDFSamples\PythonBrochure.pdf file, our functions output the following vector:
[1096, 1095, 1061, 1061, 0, 0, 2, 32, 0, 43, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
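The parsing half of this process can be sketched in isolation. The snippet below is a minimal illustration and not the recipe's own code. It assumes the PDFiD report has already been captured as a string, and that each feature line ends with an integer count, optionally followed by a second number in parentheses. The helper name `parse_pdfid_output` and the sample report are hypothetical and made up for illustration; the sample is not real tool output.

```python
import re

# Assumption: a feature line ends with a whitespace-separated integer count,
# optionally followed by a second number in parentheses, e.g. "1(1)".
_COUNT = re.compile(r"\s(\d+)(?:\(\d+\))?\s*$")


def parse_pdfid_output(text):
    """Turn a captured PDFiD-style report into a list of integer counts.

    Lines that do not end in a count (such as header lines) are skipped.
    """
    counts = []
    for line in text.splitlines():
        match = _COUNT.search(line)
        if match:
            counts.append(int(match.group(1)))
    return counts


# Hypothetical, abbreviated report used only to exercise the parser.
sample_output = """PDFiD 0.2.5 example.pdf
 PDF Header: %PDF-1.6
 obj                    7
 endobj                 7
 stream                 2
 endstream              2
 /Page                  1
 /JS                    1(1)
"""

print(parse_pdfid_output(sample_output))  # [7, 7, 2, 2, 1, 1]
```

Matching on the trailing count, rather than slicing off a fixed number of header lines, keeps the sketch tolerant of small differences in the report layout, at the cost of relying on the assumed line format.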
Now that we are able to featurize ...