July 2016
Beginner to intermediate
462 pages
9h 14m
English
Corporate websites are usually built by teams or departments using specialized tools and templates. Much of the content is generated on the fly and relies heavily on JavaScript and CSS. This means that even if we download a page, we still have to evaluate its JavaScript code, at the very least, before we see the final content. One way to do this from a Python program is with the Selenium API. Selenium's main purpose is testing websites, but nothing stops us from using it to scrape them.
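As a rough illustration of the idea, the following minimal sketch loads a page in a real browser through Selenium and returns the page source as the browser sees it. The helper names (default_driver, browser, fetch_rendered_html) and the choice of Firefox are assumptions made for this example, not code from the book's bundle. It also assumes that Selenium and a matching browser driver are installed.

```python
from contextlib import contextmanager


def default_driver():
    """Start a Firefox session (assumes Selenium and a Firefox driver are installed)."""
    # Imported lazily so that this module can be loaded without Selenium.
    from selenium import webdriver
    return webdriver.Firefox()


@contextmanager
def browser(driver_factory=default_driver, wait_seconds=10):
    """Yield a driver and always close the browser afterwards."""
    driver = driver_factory()
    try:
        # The implicit wait applies to later element lookups on this driver.
        driver.implicitly_wait(wait_seconds)
        yield driver
    finally:
        driver.quit()


def fetch_rendered_html(url, driver_factory=default_driver):
    """Open url in the browser and return the page source it reports."""
    with browser(driver_factory) as driver:
        driver.get(url)
        return driver.page_source
```

Passing the driver factory as a parameter is a design choice for this sketch: it lets us swap in another browser, or a stand-in object in a unit test, without touching the scraping logic. Pages that keep loading content after the initial page load may need an explicit wait before the source is read.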
Instead of scraping a website, we will scrape an IPython Notebook: the test_widget.ipynb file in this book's code bundle. To simulate browsing this web page, we provide a unit test class in test_simulating_browsing.py. In case you wondered, ...