At this point, you know how to extract valuable data from an existing HTML, XML, CSV, or JSON file, or even from plain text. You understand HTML and XML tags and their structure, and you can separate tags from data and normalize words (at least to some extent). There are a lot of powerful projects that literally require just that—and some patience. Let’s practice!
- Broken Link Detector*
Write a program that, given the URL of a web page, reports the names and destinations of the broken links on that page. For the purpose of this exercise, a link is broken if an attempt to open it with urllib.request.urlopen fails, that is, raises an exception.
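One possible starting point for the Broken Link Detector is sketched below. It is only a sketch under a few assumptions: it uses the standard library's html.parser instead of a third-party parser, it takes the "name" of a link to be the text inside the <a> element, and it decodes the page as UTF-8. The names LinkCollector, extract_links, find_broken, and report_broken_links are made up for this illustration. The opener parameter exists only so that the checking logic can be tried without network access.

```python
from html.parser import HTMLParser
from urllib.error import URLError
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect (name, href) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (name, href) pairs
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html, base_url):
    """Return (name, absolute_url) pairs for the links in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    # Relative destinations are resolved against the page's own URL.
    return [(name, urljoin(base_url, href)) for name, href in parser.links]


def find_broken(links, opener=urlopen):
    """Return the (name, url) pairs that the opener fails to open."""
    broken = []
    for name, url in links:
        try:
            opener(url).close()
        except Exception:
            # Per the exercise's definition, any failure to open the
            # link counts as broken (HTTP errors, bad schemes, timeouts).
            broken.append((name, url))
    return broken


def report_broken_links(page_url):
    """Fetch a page and print the name and destination of each broken link."""
    with urlopen(page_url) as response:
        # Assumption: the page is UTF-8; undecodable bytes are replaced.
        html = response.read().decode("utf-8", errors="replace")
    for name, url in find_broken(extract_links(html, page_url)):
        print(f"{name!r} -> {url}")
```

Calling report_broken_links with the URL of a page prints one line per broken link. Note that under the exercise's definition, links with schemes that urlopen cannot handle (such as mailto:) are also reported as broken; you may want to filter those out in your own solution.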
- Wikipedia Miner**
MediaWiki (a Wikimedia project) provides a JSON-based API that enables programmatic access to Wikipedia ...