
Big Data: Storage, Sharing, and Security
The purpose of [9] is to crawl all entity documents in deep web data sources, such
as documents containing product names and other attributes. Since the authors have access to
Google's query logs, they collect the log entries (in the format <query, url_clicked, times_clicked>)
directed toward a target data source and consider only queries that were clicked at least twice.
The relevant entity names are then extracted from the qualifying log queries. The
extraction is based on Freebase data [37], which provides 22 million entity names. Finally, all
extracted entity names will be sent to the target data ...
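
The log filtering and entity extraction steps described above can be illustrated with a minimal sketch. It is not the implementation from [9]: the matching strategy (looking up every contiguous word sequence of a query in an entity dictionary), the function name `extract_entity_names`, the host `shop.example.com`, and the tiny entity set standing in for the Freebase names are all assumptions made for illustration only.

```python
from collections import namedtuple
from urllib.parse import urlparse

# One log record in the <query, url_clicked, times_clicked> format.
LogEntry = namedtuple("LogEntry", ["query", "url_clicked", "times_clicked"])


def extract_entity_names(log, target_host, entity_names, min_clicks=2):
    """Collect known entity names from queries that led to the target source.

    Only entries whose clicked URL is on target_host and whose click count
    is at least min_clicks are considered (hypothetical helper).
    """
    found = set()
    for entry in log:
        # Keep only clicks that landed on the target data source.
        if urlparse(entry.url_clicked).netloc != target_host:
            continue
        # Keep only queries clicked at least min_clicks times.
        if entry.times_clicked < min_clicks:
            continue
        tokens = entry.query.lower().split()
        # Assumed matching rule: test every contiguous word sequence
        # of the query against the entity dictionary.
        for start in range(len(tokens)):
            for end in range(start + 1, len(tokens) + 1):
                candidate = " ".join(tokens[start:end])
                if candidate in entity_names:
                    found.add(candidate)
    return found


# Made-up sample data; the entity set stands in for the Freebase names.
sample_log = [
    LogEntry("buy canon eos 5d", "http://shop.example.com/p/1", 3),
    LogEntry("nikon d90 review", "http://shop.example.com/p/2", 1),
    LogEntry("canon eos 5d manual", "http://other.example.org/x", 5),
    LogEntry("cheap kindle fire", "http://shop.example.com/p/3", 2),
]
sample_entities = {"canon eos 5d", "nikon d90", "kindle fire"}

seed_queries = extract_entity_names(sample_log, "shop.example.com", sample_entities)
print(sorted(seed_queries))
```

In this sketch the second entry is dropped because it was clicked only once, and the third because its click went to a different site; the names that remain would serve as the queries issued against the target data source.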