December 2018
Beginner to intermediate
684 pages
21h 9m
English
We will use the browser automation tool Selenium to operate a headless Firefox browser that will load the page and render its HTML content for us.
The following code opens the Firefox browser:
from selenium import webdriver

# create a driver called Firefox
driver = webdriver.Firefox()
Let's close the browser:
# close it
driver.close()
To retrieve the HTML source code using Selenium and Firefox, do the following:
import re
import time

from bs4 import BeautifulSoup

# visit the opentable listing page (url is assumed to be defined earlier)
driver = webdriver.Firefox()
driver.get(url)
time.sleep(1)  # wait 1 second

# retrieve the html source
html = driver.page_source
html = BeautifulSoup(html, "lxml")
for booking in html.find_all('div', {'class': 'booking'}):
    match = re.search(r'\d+', booking.text)
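The loop above stops at the `re.search` call, which returns `None` when the text contains no digits. A minimal sketch of turning that match into an integer follows. The function name `parse_booking_count` and the sample strings are hypothetical stand-ins for real `booking.text` values.

```python
import re


def parse_booking_count(text):
    """Return the first integer found in text, or None if there is none."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


# Hypothetical strings standing in for booking.text values
samples = ["Booked 24 times today", "", "Booked 3 times today"]
print([parse_booking_count(s) for s in samples])  # prints [24, None, 3]
```

Checking for `None` before calling `match.group()` keeps the loop from raising an `AttributeError` on listings that show no booking count.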