We have described an approach to surfacing content from the Deep Web, thereby making that content accessible through search-engine queries. The most significant requirement of our system is that it be completely automatic (and hence scale to the Web) and that it retrieve content from any domain in any language. Interestingly, these stringent requirements pushed us toward a relatively simple and elegant solution, showing that simplicity is often the key to solving hard problems.
There are many directions for future work on surfacing the Deep Web. In particular, there are certain patterns in forms that can be identified to broaden the coverage of our crawl. For example, pairs of fields are often related to each other (e.g.,
MaxPrice), and entering valid and carefully chosen pairs of values can result in surfacing more pages.
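
As a rough illustration of what choosing valid pairs of values could look like, the following minimal sketch enumerates only those value pairs in which the lower bound is strictly below the upper bound and turns each pair into a GET submission URL. It is not the system described in this paper: the field names `min_price` and `max_price`, the URL `http://example.com/search`, the helper functions, and the candidate values are all hypothetical and chosen purely for illustration.

```python
from itertools import combinations
from urllib.parse import urlencode


def valid_range_pairs(candidates):
    """Return all (low, high) pairs with low < high from the candidate values."""
    # Sorting the de-duplicated values guarantees that every 2-combination
    # is emitted with the smaller value first.
    values = sorted(set(candidates))
    return list(combinations(values, 2))


def submission_urls(action_url, low_field, high_field, candidates):
    """Build one GET URL per valid (low, high) pair for two related fields."""
    return [
        f"{action_url}?{urlencode({low_field: low, high_field: high})}"
        for low, high in valid_range_pairs(candidates)
    ]


if __name__ == "__main__":
    # Hypothetical form action, field names, and candidate values.
    for url in submission_urls(
        "http://example.com/search", "min_price", "max_price", [300, 100, 200]
    ):
        print(url)
```

Treating the two fields jointly avoids submissions whose range is empty or inverted, which would be generated if each field were filled in independently of the other.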