The next step is to find where the text is located and extract it. There are two common strategies for this:
- Using connected component analysis: searching for groups of connected pixels in the image. This is the technique used in this chapter.
- Using classifiers to search for a previously trained letter texture pattern: texture features such as Haralick features and wavelet transforms are often used for this. Another option is to identify maximally stable extremal regions (MSERs). This approach is more robust for text on a complex background and will be studied in Chapter 11, Text Recognition with Tesseract. You can read about Haralick features on Haralick's own website, at http://haralick.org/journals/TexturalFeatures.pdf ...
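To make the connected component idea concrete, here is a minimal pure-Python sketch. It assumes the image has already been binarized into a grid of 0 (background) and 1 (text pixels); the helper names `label_components` and `bounding_boxes` are hypothetical and are not OpenCV functions. In practice OpenCV provides `connectedComponentsWithStats` for this job, but the sketch shows what such a function does internally: a breadth-first flood fill that gives each group of touching pixels its own label, followed by one bounding box per label, which is what you would then crop to extract a candidate text region.

```python
from collections import deque


def label_components(binary, connectivity=8):
    """Label groups of connected non-zero pixels in a 2D grid.

    Returns (count, labels): labels[y][x] is 0 for background and
    1..count for the component the pixel belongs to.
    """
    height = len(binary)
    width = len(binary[0]) if height else 0
    labels = [[0] * width for _ in range(height)]

    if connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif connectivity == 8:
        offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                   if (dy, dx) != (0, 0)]
    else:
        raise ValueError("connectivity must be 4 or 8")

    count = 0
    for y in range(height):
        for x in range(width):
            # Start a new component at every unlabeled foreground pixel.
            if binary[y][x] and labels[y][x] == 0:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                # Breadth-first flood fill over neighboring foreground pixels.
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in offsets:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return count, labels


def bounding_boxes(labels, count):
    """Return one (x, y, width, height) box per label 1..count."""
    extents = {}
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            if lab:
                if lab in extents:
                    x0, y0, x1, y1 = extents[lab]
                    extents[lab] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
                else:
                    extents[lab] = (x, y, x, y)
    return [(x0, y0, x1 - x0 + 1, y1 - y0 + 1)
            for x0, y0, x1, y1 in (extents[i] for i in range(1, count + 1))]


# Usage: a tiny binarized "image" containing three separate blobs.
image = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
]
count, labels = label_components(image)
print(count, bounding_boxes(labels, count))
```

The choice of connectivity matters for text: with 8-connectivity, pixels that touch only diagonally (common in thin, slanted strokes) stay in the same component, while 4-connectivity would split them apart.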