Experiments on Query Expansion for Internet Yellow
Page Services Using Web Log Mining
Yusuke Ohura†, Katsumi Takahashi†‡, Iko Pramudiono†, Masaru Kitsuregawa†
† Institute of Industrial Science, University of Tokyo
4-6-1 Komaba, Meguro-ku, Tokyo
153-8505, Japan
{ohura,katsumi,iko,kitsure}@tkl.iis.u-tokyo.ac.jp
‡ NTT Information Sharing Platform Laboratories
Nippon Telegraph and Telephone Corporation
3-9-11 Midori-cho, Musashino-shi, Tokyo
180-8585, Japan
takahashi.katsumi@lab.ntt.co.jp
Abstract
A tremendous amount of access log data accumulates at
many Web sites. Several efforts to mine this data and
apply the results to support end-users or to redesign a
Web site's structure have been proposed. This paper
describes our trial of utilizing access logs from a
commercial yellow page service called "iTOWNPAGE". Our
initial statistical analysis reveals that many users
search various categories together, even non-sibling
ones in the provided hierarchy, or finish their search
without any results that match their queries. To solve
these problems, we first cluster user requests from the
access logs using an enhanced K-means clustering
algorithm and then apply the clusters to query
expansion. Our method includes a two-step expansion
that 1) recommends categories similar to the request,
and 2) suggests related categories even though they are
not similar in the existing category hierarchy. We also
report some evaluations that show the effectiveness of
the prototype system.
Permission to copy without fee all or part of this material is
granted provided that the copies are not made or distributed for
direct commercial advantage, the VLDB copyright notice and
the title of the publication and its date appear, and notice is
given that copying is by permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires a fee
and/or special permission from the Endowment.
Proceedings of the 28th VLDB Conference,
Hong Kong, China, 2002
1 Introduction
Rapid progress in storage capacity and processor
performance has given us the chance to analyze the huge
log data left on Web servers. With the early success of
"click-stream" analysis, many research groups and
companies are paying more attention to Web log mining
techniques. Using those techniques, several proposals
have been made to support end-users or to redesign Web
sites. As far as we know, however, no technical report
on mining log data at this scale is publicly available.
This paper reports the results of log data mining and
query expansion experiments on a large commercial Web
service called "iTOWNPAGE", an online Japanese
telephone directory system. We analyze 450 million
lines of iTOWNPAGE log data and create session clusters
from 24 million lines of selected log data. Our initial
statistical analysis found that many categories that
are not siblings in the given yellow pages hierarchy
are searched together in one user session. We also
found that many queries fail because no results match
the user's request. To cope with these problems, we
propose a query expansion method that uses clusters of
user requests obtained by our enhanced K-means
clustering of the log data. Our method includes a
two-step expansion that 1) recommends categories
similar to the request, and 2) suggests categories that
are not similar in the existing hierarchy but were
found to be related in the log analysis. We also report
the implementation details of our prototype and some
evaluations that demonstrate its effectiveness.
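To make the two-step expansion concrete, the following minimal sketch shows one way it could operate once session clusters are available. It is an illustration under stated assumptions, not the prototype's implementation: the toy hierarchy `PARENT`, the toy clusters `CLUSTERS`, the function `expand_query`, and the rule that "similar" means "shares a parent in the hierarchy" are all hypothetical, and the clusters are given by hand rather than produced by the enhanced K-means algorithm.

```python
from collections import Counter

# Hypothetical toy hierarchy: category -> parent category.
PARENT = {
    "Hotels": "Lodging",
    "Business Hotels": "Lodging",
    "Ryokan": "Lodging",
    "Rental Cars": "Transport",
    "Taxis": "Transport",
}

# Hypothetical session clusters: for each cluster, how often each
# category was requested by the sessions assigned to that cluster.
CLUSTERS = [
    Counter({"Hotels": 40, "Business Hotels": 25, "Rental Cars": 10}),
    Counter({"Taxis": 30, "Rental Cars": 12}),
]


def expand_query(category, clusters, parent, top_n=3):
    """Return (similar, related) category lists for a requested category.

    similar: co-clustered categories sharing the request's parent.
    related: co-clustered categories from elsewhere in the hierarchy.
    """
    # Score every category that shares a cluster with the request.
    scores = Counter()
    for cluster in clusters:
        if category in cluster:
            for other, count in cluster.items():
                if other != category:
                    scores[other] += count

    similar, related = [], []
    for other, _ in scores.most_common():
        same_parent = (
            category in parent and parent.get(other) == parent[category]
        )
        (similar if same_parent else related).append(other)
    return similar[:top_n], related[:top_n]


if __name__ == "__main__":
    print(expand_query("Hotels", CLUSTERS, PARENT))
    print(expand_query("Rental Cars", CLUSTERS, PARENT))
```

In this sketch the first list corresponds to step 1 (categories similar to the request) and the second to step 2 (categories that are distant in the hierarchy but co-occur in the logs). Ranking by summed request counts is a simplification chosen for brevity.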