Exercises

12.1 (Web Scraping with the Requests and Beautiful Soup Libraries) Web pages are excellent sources of text to use in NLP tasks. In the following IPython session, you’ll use the requests library to download the www.python.org home page’s content. This is called web scraping. You’ll then use the Beautiful Soup library³⁷ to extract only the text from the page. Eliminate the stop words in the resulting text, then use the wordcloud module to create a word cloud based on the text.
37. The library's module name is bs4 (for Beautiful Soup 4).
```
In [1]: import requests

In [2]: response = requests.get('https://www.python.org')

In [3]: response.content # gives back the page's HTML

In [4]: from bs4 import BeautifulSoup

In [5]: soup = BeautifulSoup(response.content, ...
```
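The remaining steps of the exercise (extract the text, eliminate stop words, build the word cloud) might be sketched as follows. This is only one possible approach, not the book's solution. The function names `page_text`, `remove_stop_words` and `scrape_to_word_cloud`, the output filename, and the image dimensions are all hypothetical. The parser argument is cut off in the session above, so the sketch assumes Python's built-in `'html.parser'`. It also assumes the stop-word set that ships with the wordcloud module; another list, such as NLTK's, would work as well.

```python
import re

from bs4 import BeautifulSoup


def page_text(html):
    """Return only the human-readable text of an HTML document."""
    # Assumption: 'html.parser' (built into Python); the session above
    # does not show which parser it passes to BeautifulSoup.
    soup = BeautifulSoup(html, 'html.parser')
    # Discard script and style elements; their contents are not page text.
    for tag in soup(['script', 'style']):
        tag.decompose()
    return soup.get_text(separator=' ', strip=True)


def remove_stop_words(text, stop_words):
    """Return the lowercase words in text that are not in stop_words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word in words if word not in stop_words]


def scrape_to_word_cloud(url, filename):
    """Download url, strip stop words, and save a word cloud image."""
    # Imported here so the two helpers above can be used without
    # requests or wordcloud installed.
    import requests
    from wordcloud import STOPWORDS, WordCloud

    response = requests.get(url)
    words = remove_stop_words(page_text(response.content), STOPWORDS)
    cloud = WordCloud(width=800, height=600, background_color='white')
    cloud.generate(' '.join(words))
    cloud.to_file(filename)
```

Calling `scrape_to_word_cloud('https://www.python.org', 'python_org.png')` would then write the image to the current folder. The two helper functions take plain strings, so they can be tried on a small HTML snippet without any network access.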

Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and The Cloud by Paul J. Deitel, Harvey M. Deitel
