Chapter 16

Learning Algorithms for Document Layout Analysis

Simone Marinai, Dipartimento di Sistemi e Informatica, Università degli Studi di Firenze, Italy

Abstract

In this chapter we describe several approaches that have been proposed for using learning algorithms to analyze the layout of digitized documents. Layout analysis encompasses all the techniques used to infer the organization of the page layout of document images. From a physical point of view, the layout can be described as composed of blocks, in most cases rectangular, that are arranged on the page and contain homogeneous content, such as text, vector graphics, or illustrations. From a logical point of view, text blocks can have different meanings on the basis of their content ...

From Handbook of Statistics.