Exercises

12.1 (Web Scraping with the Requests and Beautiful Soup Libraries) Web pages are excellent sources of text to use in NLP tasks. In the following IPython session, you’ll use the requests library to download the www.python.org home page’s content. This is called web scraping. You’ll then use the Beautiful Soup library to extract only the text from the page. Eliminate the stop words in the resulting text, then use the wordcloud module to create a word cloud based on the text.

    In [1]: import requests
    
    In [2]: response = requests.get('https://www.python.org')
    
    In [3]: response.content # gives back the page's HTML
    
    In [4]: from bs4 import BeautifulSoup
    
    In [5]: soup = BeautifulSoup(response.content, ...
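
The remaining steps of the exercise (extract the text, eliminate stop words, build the word cloud) might be sketched as follows. This is a minimal sketch under stated assumptions, not the book's solution: the helper names `remove_stop_words`, `page_text` and `save_word_cloud`, the tiny `SAMPLE_STOP_WORDS` set, and the choice of Python's built-in `'html.parser'` are all illustrative. The third-party imports are deferred into the functions that need them, so the stop-word helper can be tried without a network connection or those packages installed.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set (an assumption); a real run would likely
# use a fuller list, such as NLTK's English stop words.
SAMPLE_STOP_WORDS = {'a', 'an', 'and', 'in', 'is', 'of', 'the', 'to'}


def remove_stop_words(text, stop_words=SAMPLE_STOP_WORDS):
    """Return the lowercased words in text that are not stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word in words if word not in stop_words]


def page_text(url):
    """Download url and return the text Beautiful Soup extracts from it."""
    # Deferred imports: only this function needs requests and bs4.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise an exception on HTTP error codes
    # 'html.parser' is Python's built-in parser (assumed here).
    soup = BeautifulSoup(response.content, 'html.parser')
    # get_text joins the page's text nodes; strip=True trims whitespace.
    return soup.get_text(separator=' ', strip=True)


def save_word_cloud(words, filename):
    """Render a word cloud from a non-empty list of words; save as an image."""
    from wordcloud import WordCloud  # deferred: needs the wordcloud package

    frequencies = Counter(words)  # word -> number of occurrences
    cloud = WordCloud(width=800, height=400, background_color='white')
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(filename)
```

With network access and the packages installed, calling `save_word_cloud(remove_stop_words(page_text('https://www.python.org')), 'python_org_cloud.png')` would write the image to a file; the filename is hypothetical. Counting word frequencies first keeps the stop-word filtering separate from the rendering step, so either piece can be swapped out.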
