November 2019
Intermediate to advanced
346 pages
9h 36m
English
In the following steps, you will featurize a collection of PDF files using the PDFiD script:
from IPython.utils import io
def PDF_to_FV(file_path): """Featurize a PDF file using pdfid."""
with io.capture_output() as captured: %run -i pdfid $file_path out = captured.stdout
out1 = out.split("\n")[2:-2] return [int(x.split()[-1]) for x in out1]