Beautiful Data by Toby Segaran, Jeff Hammerbacher

Conclusion and Future Work

We described an approach to surfacing content from the Deep Web, thereby making that content accessible through search-engine queries. The most significant requirement of our system is that it be completely automatic (and hence scale to the Web) and that it retrieve content from any domain in any language. Interestingly, these stringent requirements pushed us toward a relatively simple and elegant solution, showing that simplicity is often the key to solving hard problems.

There are many directions for future work on surfacing the Deep Web. In particular, certain patterns in forms can be identified and exploited to broaden the coverage of our crawl. For example, pairs of fields are often related to each other (e.g., MinPrice and MaxPrice), and entering valid, carefully chosen pairs of values can surface more pages.
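
To make the MinPrice/MaxPrice idea concrete, here is a minimal sketch of one way such value pairs might be chosen. It is an illustration under stated assumptions, not the method used by the system described in this chapter: the candidate values, the adjacent-bucket strategy, and the helper names `adjacent_range_pairs` and `build_queries` are all hypothetical. Only the field names MinPrice and MaxPrice come from the text.

```python
from urllib.parse import urlencode


def adjacent_range_pairs(candidates):
    """Return (low, high) pairs covering consecutive, non-overlapping ranges.

    Assumption: pairing each sorted candidate value with its successor keeps
    every pair valid (low < high) and avoids submitting overlapping ranges.
    """
    values = sorted(set(candidates))
    return list(zip(values, values[1:]))


def build_queries(candidates, min_field="MinPrice", max_field="MaxPrice"):
    """Build one URL query string per range pair for a hypothetical form."""
    return [
        urlencode({min_field: low, max_field: high})
        for low, high in adjacent_range_pairs(candidates)
    ]


if __name__ == "__main__":
    # Hypothetical candidate prices; duplicates and ordering do not matter.
    for query in build_queries([500, 0, 100, 100]):
        print(query)
```

Treating the two fields as a correlated pair, rather than filling each independently, avoids invalid submissions where the minimum exceeds the maximum, which would likely return empty result pages.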
