Solving complex CAPTCHAs
The CAPTCHA system tested so far was relatively straightforward to solve: the black font color meant that the text could easily be distinguished from the background, and the text was level, so it did not need to be rotated for Tesseract to interpret it accurately. Websites often use simple custom CAPTCHA systems similar to this one, and in these cases an OCR solution is practical. However, if a website uses a more complex system, such as Google's reCAPTCHA, OCR will take a lot more effort and may become impractical.
In these examples, the text is placed at different angles and with different fonts and colors, so considerably more work is needed to clean and preprocess the image before OCR ...
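As a rough illustration of one such cleanup step, the sketch below reduces colored text to black-on-white by converting each pixel to grayscale and then applying a threshold. It is not the book's own code: the function names `to_grayscale`, `binarize`, and `preprocess` are hypothetical, the image is assumed to be a nested list of (R, G, B) tuples rather than a real image object, and the threshold of 128 is an assumed value that would need tuning for each CAPTCHA. It does nothing about rotated text or unusual fonts, which would need further steps.

```python
# Sketch only: pixels are plain (R, G, B) tuples in nested lists, so no
# imaging library is required. All names here are illustrative.

def to_grayscale(rgb_rows):
    """Convert rows of (R, G, B) tuples to 0-255 luminance values.

    Uses the common ITU-R 601 luma weights (0.299, 0.587, 0.114)
    with integer arithmetic.
    """
    return [
        [(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row]
        for row in rgb_rows
    ]


def binarize(gray_rows, threshold=128):
    """Map pixels darker than the threshold to 0 (text), the rest to 255."""
    return [[0 if value < threshold else 255 for value in row]
            for row in gray_rows]


def preprocess(rgb_rows, threshold=128):
    """Grayscale, then threshold: colored glyphs become solid black."""
    return binarize(to_grayscale(rgb_rows), threshold)


if __name__ == "__main__":
    # A 2x2 toy image: white and pale-yellow background pixels,
    # plus red and dark-blue "text" pixels.
    sample = [
        [(255, 255, 255), (200, 30, 30)],
        [(20, 20, 160), (240, 240, 200)],
    ]
    print(preprocess(sample))  # [[255, 0], [0, 255]]
```

With an imaging library such as Pillow, the same two steps would typically be done with `Image.convert('L')` followed by `Image.point(...)`, and the result passed to Tesseract. The fixed threshold is the weak point of this approach, since light-colored text on a light background may fall on the wrong side of it.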