So we were looking for a mash-up with characteristics that would:
Generate custom sets of up to one million URLs to test, using a URL list from alexa.com
Manage those large sets of URLs in the context of the test framework (Spider)
Launch a selected version of Firefox, open a page, collect memory leak and assertion information, and then quit Firefox (Sisyphus)
Load extensions (Sisyphus)
Continue across crashes (Sisyphus)
In addition to launching web pages, spider through their links for more rigorous testing (Spider)
We could pull the URLs from a site such as alexa.com. At first we knew only about Alexa’s top-100 and top-500 URL lists. During a meeting we played with this idea and looked around the Alexa site (must have been a boring meeting). We discovered it had a top-million link. Why stop at 50,000 or 100,000 websites? Now we could download large lists of top URLs and scale our tests to 1,000, 50,000, or 1,000,000 sites! We could also create custom sets of sites.
We downloaded the Top 1,000,000 Sites list from Alexa (http://s3.amazonaws.com/alexa-static/top-1m.csv.zip) and converted the list into a format that the test script would understand (just a simple .txt file with a URL in the form of http://example.com on each line). This was very flexible, and we could add as many URLs as we liked.
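The conversion step described above can be sketched in a few lines of Python. This is a minimal sketch, not the script we actually used: it assumes the Alexa CSV rows have the form "rank,domain" (e.g. "1,example.com") and simply prefixes each domain with http:// to produce the one-URL-per-line .txt format.

```python
import csv
import io

def alexa_csv_to_urls(csv_text):
    """Convert Alexa top-sites CSV rows ("rank,domain") into
    one http://domain URL per line, the format the test script expects."""
    urls = []
    for rank, domain in csv.reader(io.StringIO(csv_text)):
        urls.append("http://" + domain)
    return "\n".join(urls) + "\n"

# Hypothetical sample input standing in for top-1m.csv:
sample = "1,example.com\n2,example.org\n"
print(alexa_csv_to_urls(sample), end="")
```

Because the output is plain text, trimming it to the top 1,000 or top 50,000 sites, or splicing in a custom set of URLs, is just ordinary line editing.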
The Spider tool included a database. This database could track URLs and related information more effectively and flexibly than a flat text file. ...