OCR pipeline with Spark

Image processing and computer vision are two long-established but still actively developing research areas that make heavy use of many types of machine learning algorithms. In several use cases, the relationships that link patterns of image pixels to higher-level concepts are extremely complex, hard to define, and computationally intensive to model.

From a practical point of view, it's relatively easy for a human being to recognize whether an object is a face, a dog, or a letter or other character. However, defining these patterns explicitly is difficult under certain circumstances. Additionally, image-related datasets are often noisy.

In this section, we will develop a model similar to those used at the core of the Optical ...

