Case study
To tie everything together, we'll write a simple link collector, which will visit a website and collect every link on every page it finds on that site. Before we start, though, we'll need some test data. Write a few HTML files that contain links to each other and to other sites on the Internet, something like this:
<html>
    <body>
        <a href="contact.html">Contact us</a>
        <a href="blog.html">Blog</a>
        <a href="esme.html">My Dog</a>
        <a href="/hobbies.html">Some hobbies</a>
        <a href="/contact.html">Contact AGAIN</a>
        <a href="http://www.archlinux.org/">Favorite OS</a>
    </body>
</html>
Name one of the files index.html so it shows up first when pages are served. Make sure the other files exist, and keep things complicated ...
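To make the goal concrete, here is a minimal sketch of the core step a link collector has to perform: pull every href out of one page and resolve it against that page's URL. It is not the case study's own implementation. The names LinkParser and collect_links, and the address http://localhost:8000/index.html, are assumptions made up for illustration; only html.parser.HTMLParser and urllib.parse.urljoin come from the standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Hypothetical helper: remember the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def collect_links(html, page_url):
    """Return the links found in html as absolute URLs.

    Relative links such as "contact.html" and root-relative links such
    as "/hobbies.html" are resolved against page_url; absolute links
    are returned unchanged.
    """
    parser = LinkParser()
    parser.feed(html)
    return [urljoin(page_url, href) for href in parser.hrefs]


# The sample page from above, inlined so the sketch runs on its own.
SAMPLE = """
<html>
    <body>
        <a href="contact.html">Contact us</a>
        <a href="blog.html">Blog</a>
        <a href="esme.html">My Dog</a>
        <a href="/hobbies.html">Some hobbies</a>
        <a href="/contact.html">Contact AGAIN</a>
        <a href="http://www.archlinux.org/">Favorite OS</a>
    </body>
</html>
"""

# Assumed address of the page; any base URL would work the same way.
links = collect_links(SAMPLE, "http://localhost:8000/index.html")
for link in links:
    print(link)
```

Note that "contact.html" and "/contact.html" resolve to the same URL when the page sits at the root of the site, but they would differ for a page in a subdirectory. That is one reason the test files should mix both styles of link.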