
Intelligent Document Capture with Ephesoft - Second Edition by Jon Solove, Michael Muller, Ike Kavas, Pat Myers


No blank forms available for training

Classification is most accurate when the system is trained with blank forms (forms that have not been filled in). If blank forms are not available, accurate classification can still be achieved in one of two ways.

The first option is to redact the samples you have (remove sensitive and instance-unique data) before uploading them to Ephesoft for training.

The second option involves editing the HOCR file that is created after clicking on Learn Files in the Batch Class Management administrative interface. The HOCR file is the XML representation of the OCR output.

The XML file can be edited to remove any content that is not part of the blank form. After the XML file is updated, click on Learn Files again to update the index ...
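For a large set of samples, hand-editing the XML can be tedious, so the cleanup could be scripted. The following is a minimal sketch in Python, not Ephesoft's own tooling. The element names `Span` and `Value`, the sample XML, and the function `strip_filled_in_text` are all assumptions made for illustration; check them against the HOCR file your installation actually produces (including any XML namespaces) before relying on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for an HOCR file.
# The real element names and nesting may differ.
SAMPLE_HOCR = """<HocrPages>
  <HocrPage>
    <Spans>
      <Span><Value>Name:</Value></Span>
      <Span><Value>John</Value></Span>
      <Span><Value>Smith</Value></Span>
      <Span><Value>Date:</Value></Span>
      <Span><Value>01/02/2015</Value></Span>
    </Spans>
  </HocrPage>
</HocrPages>"""


def strip_filled_in_text(xml_text, blank_form_words,
                         span_tag="Span", value_tag="Value"):
    """Remove every span whose text is not part of the blank form."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so visit each element as a
    # potential parent. Snapshot the elements first so that removals
    # do not disturb the iteration.
    for parent in list(root.iter()):
        for span in list(parent.findall(span_tag)):
            value = span.findtext(value_tag, default="").strip()
            if value not in blank_form_words:
                parent.remove(span)
    return ET.tostring(root, encoding="unicode")


# Keep only the labels that would be printed on the blank form.
cleaned = strip_filled_in_text(SAMPLE_HOCR, {"Name:", "Date:"})
remaining = [v.text for v in ET.fromstring(cleaned).iter("Value")]
print(remaining)  # ['Name:', 'Date:']
```

The sketch keeps a span only when its text matches a known blank-form label exactly, which is the conservative choice: anything unrecognized is treated as filled-in data and dropped. After writing the cleaned XML back to the HOCR file, click on Learn Files again as described above.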
