Intelligent Document Capture with Ephesoft by Clifford Laurin, Michael Muller, Ike Kavas, Pat Myers

No blank forms for training

Classification is most accurate when the system is trained with blank forms. If blank forms are not available, accurate classification can still be achieved by editing the OCR output of filled-in samples.

Clicking Learn Files in the Batch Class Management administrative interface causes Ephesoft to run OCR on the sample documents and save the result as an hOCR file, an HTML representation of the OCR output.

The HTML file can then be edited to remove any content that is not part of the blank form, such as the values typed or written into the form's fields. After saving the edited file, click Learn Files again to update the index files used by Ephesoft. This does not overwrite your edits to the HTML file; the file is regenerated only if the source TIFF is updated.
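Editing the HTML by hand is practical for a few samples; for many samples, a small script can do the same cleanup. The following is a minimal, hypothetical sketch, not an Ephesoft tool. It assumes the file follows standard hOCR conventions (word elements with class `ocrx_word` or `ocr_word` and a `bbox x0 y0 x1 y1` entry in the `title` attribute) and is well-formed XHTML; Ephesoft's actual output may differ. The function `strip_filled_in` and the idea of describing filled-in areas as pixel rectangles are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialize XHTML elements without an "ns0:" prefix.
ET.register_namespace("", XHTML_NS)

# Assumed hOCR word classes: "ocrx_word" (newer Tesseract) and "ocr_word" (older output).
WORD_CLASSES = {"ocrx_word", "ocr_word"}
# Matches the "bbox x0 y0 x1 y1" property inside an hOCR title attribute.
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


def word_bbox(elem):
    """Return (x0, y0, x1, y1) if elem is an hOCR word with a bbox, else None."""
    if not WORD_CLASSES.intersection(elem.get("class", "").split()):
        return None
    match = BBOX_RE.search(elem.get("title", ""))
    return tuple(int(v) for v in match.groups()) if match else None


def inside(box, region):
    """True if box lies entirely within region; both are (x0, y0, x1, y1)."""
    return (box[0] >= region[0] and box[1] >= region[1]
            and box[2] <= region[2] and box[3] <= region[3])


def strip_filled_in(hocr_text, regions):
    """Remove hOCR words whose bounding box falls inside any of the regions.

    regions: hypothetical pixel rectangles (x0, y0, x1, y1) drawn around the
    filled-in data, in the same coordinate system as the hOCR bboxes.
    Returns (edited_document, number_of_words_removed).
    """
    root = ET.fromstring(hocr_text)
    removed = 0
    # Snapshot the tree first so removing children does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            box = word_bbox(child)
            if box is not None and any(inside(box, r) for r in regions):
                parent.remove(child)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed
```

Words that only partly overlap a region are kept, so printed labels next to a field survive. ElementTree drops the XML declaration and DOCTYPE on output, so keep a copy of the original file, compare the result before replacing it, and then run Learn Files again as described above.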