Building a tiny web crawler

Let's crawl the web! For this recipe, we'll build a small spider that scans web pages for hyperlinks, then follows those links and scans the pages they point to for further links. It repeats this process, recursively discovering more links, until it reaches the maximum search depth.
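The depth-limited crawl described above can be sketched as follows. This is only an illustration under stated assumptions, not the recipe's own implementation: `fake-web` and `fetch-links` are hypothetical stand-ins that replace real HTTP requests with an in-memory map of pages to links, and `crawl` explores one level of links per iteration until `max-depth` is reached.

```clojure
(require '[clojure.set :as set])

;; Hypothetical stand-in for the web: each URL maps to the links on that page.
(def fake-web
  {"a" ["b" "c"]
   "b" ["d"]
   "c" ["a" "d"]
   "d" ["e"]
   "e" []})

(defn fetch-links
  "Hypothetical helper: returns the hyperlinks found on `url`.
  A real crawler would issue an HTTP request and parse the HTML here."
  [url]
  (get fake-web url []))

(defn crawl
  "Returns the set of URLs reachable from `start-url` by following
  at most `max-depth` links, exploring one level per iteration."
  [start-url max-depth]
  (loop [depth    0
         frontier #{start-url}   ; pages discovered at the current level
         visited  #{start-url}]  ; every page seen so far
    (if (or (>= depth max-depth) (empty? frontier))
      visited
      (let [found    (set (mapcat fetch-links frontier))
            new-urls (set/difference found visited)]
        (recur (inc depth) new-urls (into visited new-urls))))))

(crawl "a" 1) ; the start page plus the pages it links to directly
(crawl "a" 3) ; everything reachable within three hops
```

Tracking `visited` keeps the sketch from revisiting pages that link back to each other, such as "c" pointing back to "a" in the sample data.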

The crawler's running time can grow exponentially with the search depth because, for every single page, it has to open as many HTTP connections as it discovers hyperlinks. If it did this sequentially, exploring only a single link at each step, the overall time taken to crawl many levels of links would be equal to the sum of the time it takes to process ...
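To put rough numbers on that growth, here is a back-of-the-envelope estimate. It assumes, purely for illustration, that every page holds the same number of links b, that all links lead to distinct pages, and that the start page sits at depth 0. The symbols b, d, N and t_i are introduced here and do not come from the recipe.

```latex
% Pages visited when crawling to depth d with b links per page (b > 1)
N(b, d) = \sum_{k=0}^{d} b^{k} = \frac{b^{d+1} - 1}{b - 1}

% Sequential crawl time: the per-page processing times t_i simply add up
T_{\text{seq}} = \sum_{i=1}^{N(b, d)} t_i

% Example: b = 10, d = 3
N(10, 3) = 1 + 10 + 100 + 1000 = 1111
```

At an assumed 0.5 seconds per page, those 1111 pages would take about 555 seconds, a little over nine minutes, when fetched one after another.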

Clojure Data Structures and Algorithms Cookbook by Rafik Naccache
