December 2018
Beginner to intermediate
684 pages
21h 9m
English
Now you need only combine the relevant elements scraped from the website into a feature that you could use in a model to predict economic activity in geographic regions or foot traffic in specific neighborhoods.
With Selenium, you can follow the links to the next pages and quickly build a dataset of over 10,000 restaurants in NYC, which you could then update periodically to track a time series. First, we set up a function that parses the content of the pages we plan to crawl:
import pandas as pd
from bs4 import BeautifulSoup

def parse_html(html):
    data, item = pd.DataFrame(), {}
    soup = BeautifulSoup(html, 'lxml')
    for i, resto in enumerate(soup.find_all('div', class_='rest-row-info')):
        item['name'] = resto.find('span', class_='rest-row-name-...
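Since the listing above is truncated, a minimal, self-contained sketch of the same parsing pattern may help. The HTML snippet and the full class names here are hypothetical stand-ins for the site's actual markup, and the rows are gathered into a list of dicts before building the DataFrame, which is slightly different from, but equivalent in spirit to, the book's approach:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for one page of restaurant listings
html = """
<div class='rest-row-info'>
  <span class='rest-row-name-text'>Balthazar</span>
</div>
<div class='rest-row-info'>
  <span class='rest-row-name-text'>Carbone</span>
</div>
"""

def parse_html(html):
    # Parse the page and collect one dict per restaurant row
    soup = BeautifulSoup(html, 'html.parser')  # 'lxml' also works if installed
    rows = []
    for resto in soup.find_all('div', class_='rest-row-info'):
        rows.append({'name': resto.find('span',
                                        class_='rest-row-name-text').text})
    # Assemble all rows into a single DataFrame
    return pd.DataFrame(rows)

df = parse_html(html)
print(df['name'].tolist())  # ['Balthazar', 'Carbone']
```

In a real crawl, the same function would be applied to each page's source as Selenium pages through the results, concatenating the per-page DataFrames into one dataset.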