Exercises
12.1 (Web Scraping with the Requests and Beautiful Soup Libraries) Web pages are excellent sources of text to use in NLP tasks. In the following IPython session, you’ll use the requests library to download the www.python.org home page’s content. This is called web scraping. You’ll then use the Beautiful Soup library to extract only the text from the page. Eliminate the stop words in the resulting text, then use the wordcloud module to create a word cloud based on the text.
In [1]: import requests
In [2]: response = requests.get('https://www.python.org')
In [3]: response.content # gives back the page's HTML
In [4]: from bs4 import BeautifulSoup
In [5]: soup = BeautifulSoup(response.content, ...
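The remaining steps of the exercise (extract the text, drop the stop words, build the word cloud) could be sketched roughly as follows. This is only one possible approach, not the book's solution. The helper names page_text, remove_stop_words and save_word_cloud are hypothetical, the 'html.parser' argument is an assumed parser choice (the session above elides the actual argument), and STOP_WORDS is a tiny illustrative list rather than a complete one.

```python
import re

# Tiny illustrative stop-word list (an assumption); a real solution would
# likely use a fuller list, such as the one provided by NLTK.
STOP_WORDS = {'a', 'an', 'and', 'the', 'is', 'in', 'of', 'to', 'for',
              'with', 'on', 'it', 'that', 'this'}


def page_text(url):
    """Download a web page and return the text Beautiful Soup extracts."""
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail early on HTTP errors
    # 'html.parser' is Python's built-in parser; this choice is an assumption
    soup = BeautifulSoup(response.content, 'html.parser')
    return soup.get_text(separator=' ', strip=True)


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into words and drop the stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return ' '.join(word for word in words if word not in stop_words)


def save_word_cloud(text, filename):
    """Render a word cloud for the text and write it to an image file."""
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=600, background_color='white')
    cloud.generate(text).to_file(filename)
```

With these helpers, the whole exercise reduces to a call such as save_word_cloud(remove_stop_words(page_text('https://www.python.org')), 'python_org.png'), where the output filename is again just an example.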