November 2019
Intermediate to advanced
346 pages
9h 36m
English
Our starting point is to collect a large corpus of CAPTCHAs (step 1). You can find these in captcha_images.7z. Alternatively, since the code for Really Simple CAPTCHA is available online, you can modify it to generate a large number of CAPTCHAs. Another option is to use bots to scrape CAPTCHAs. Next, in step 2, we specify the folder where the CAPTCHA images are stored and enumerate all of the CAPTCHAs in it so that we can begin processing them. In step 3, we define a function that converts each CAPTCHA image to grayscale and thresholds it. This reduces the amount of computation (one channel instead of three) and makes it easier to determine where one character ends and the next one begins. We then define a function to obtain the label of a CAPTCHA ...
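Steps 2 and 3 can be illustrated with a small sketch. It assumes the images are PNG files in a single folder and that each image has already been loaded as an RGB NumPy array (for example with Pillow's `np.asarray(Image.open(path).convert("RGB"))`; note that OpenCV's `cv2.imread` returns BGR order instead). The helper names `list_captcha_files`, `to_grayscale`, `preprocess_captcha` and `column_spans`, the fixed cutoff of 128, and the column-projection segmentation are all hypothetical choices made for illustration. They are not the recipe's own code, which may use OpenCV thresholding and contour detection instead.

```python
from pathlib import Path

import numpy as np


def list_captcha_files(folder, pattern="*.png"):
    """Step 2 (sketch): enumerate CAPTCHA image paths in `folder`, sorted for reproducibility.

    Assumes all CAPTCHAs are PNG files stored directly in `folder`.
    """
    return sorted(Path(folder).glob(pattern))


def to_grayscale(image):
    """Convert an H x W x 3 (or x 4) RGB array to H x W grayscale.

    Uses ITU-R BT.601 luma weights; assumes RGB channel order, not BGR.
    A 2-D input is treated as already grayscale.
    """
    if image.ndim == 2:
        return image.astype(np.float64)
    weights = np.array([0.299, 0.587, 0.114])
    return image[..., :3].astype(np.float64) @ weights


def preprocess_captcha(image, cutoff=128):
    """Step 3 (sketch): grayscale, then binarize with a fixed (hypothetical) cutoff.

    Dark pixels, assumed to be character strokes, become 1; the light background becomes 0.
    """
    gray = to_grayscale(image)
    return (gray < cutoff).astype(np.uint8)


def column_spans(binary):
    """Return (start, end) column ranges that contain ink; `end` is exclusive.

    A simple column-projection heuristic showing why a binary image makes it
    easier to see where one character ends and the next begins. It will merge
    characters that touch or overlap horizontally.
    """
    has_ink = binary.any(axis=0)
    spans, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x  # entering a run of inked columns
        elif not ink and start is not None:
            spans.append((start, x))  # leaving a run of inked columns
            start = None
    if start is not None:
        spans.append((start, len(has_ink)))  # run reaches the right edge
    return spans


if __name__ == "__main__":
    # Synthetic 5 x 10 white image with two black vertical strokes.
    img = np.full((5, 10, 3), 255, dtype=np.uint8)
    img[:, 1:3] = 0
    img[:, 6:8] = 0
    binary = preprocess_captcha(img)
    print(column_spans(binary))  # [(1, 3), (6, 8)]
```

A fixed cutoff keeps the sketch dependency-free. On real CAPTCHAs with varying contrast, an adaptive choice such as Otsu's method (available in OpenCV via `cv2.threshold` with the `cv2.THRESH_OTSU` flag) is usually more robust.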