Vertical Mining of Frequent
Patterns from Uncertain Data
Laila A. Abd-Elmegid, Mohamed E. El-Sharkawi,
Laila M. El-Fangary and Yehia K. Helmy
ABSTRACT
Ecient algorithms have been developed for mining frequent patterns in tra-
ditional data where the content of each transaction is denitely known. ere
are many applications that deal with real data sets where the contents of
the transactions are uncertain. Limited research work has been dedicated for
mining frequent patterns from uncertain data. is is done by extending the
state of art horizontal algorithms proposed for mining precise data to be suit-
able with the uncertainty environment. Vertical mining is a promising ap-
proach that is experimentally proved to be more ecient than the horizontal
mining. In this paper we extend the state-of-art vertical mining algorithm
Eclat for mining frequent patterns from uncertain data producing the pro-
posed UEclat algorithm. In addition, we compared the proposed UEclat al-
gorithm with the UF-growth algorithm. Our experimental results show that
V M  F P  U D 257
the proposed algorithm outperforms the UF-growth algorithm by at least one
order of magnitude.
Keywords: Frequent patterns, Uncertain data, Vertical mining, Tidset, Diset,
Association rules, Data mining
Introduction
Frequent pattern mining has been a focused theme in data mining research for
over a decade. It is a core technique used in many mining tasks like sequential
pattern mining, structured pattern mining, correlation mining, associative clas-
sication, and frequent pattern-based clustering (C. Zhu, X. Zhang, J. Sun, and
B. Huang, 2009), as well as their broad applications (H. Kriegel, P. Kroger and
A. Zimek, 2009) (Y. Koh, N. Rountree, R. O’Keefe, 2008) (A.Ceglar and J. Rod-
dick, 2006). So, a great eort has been dedicated to this research and tremendous
progress has been made to develop ecient and scalable algorithms for frequent
pattern mining (A. Ceglar and J. Roddick, 2006) (Z. Zheng, R. Kohavi, and L.
Mason, 2001) (J. Han, H. Cheng, D. Xin, and X. Yan, 2007). All these algo-
rithms deal with precise data sets (J. Han, J. Pei, Y. Yin and R. Mao, 2004) (M.
Zaki, 2000) (P. Shenoy, J. Haritsa, S. Sudarshan, G. Bhalotia, M. Bawa and D.
Shah, 2000) (M. Zaki and K. Gouda, 2003)(W. Consue, and W. Kurutach, 2003)
(B. Goethals, 2004)(M. Song, S. Rajasekaran, 2006). Such data is characterized
by known and denite existence of the items or events in the transactions. How-
ever, there are datasets where the exact existence of items in the transactions can-
not be gained. ese datasets are called uncertain data. e existence of an item
in a transaction is best captured by a likelihood measure or a probability (Chui,
C.-K., Kao, B., Hung, E.). As an example, a medical dataset may contain a table
of patients’ records, each of which contains a set of diseases that a patient suers.
In such case the physician may highly suspect (but cannot guarantee) that a pa-
tient suers from a specic disease. So he expresses his suspection by a probability
of the existence of such disease (H., Li, H., Yang, Q. 2007). Another example
of uncertain dataset is pattern recognition applications. Given a satellite picture,
image processing techniques can be applied to extract features that indicate the
presence or absence of certain target objects (such as bunkers). Due to noises and
limited resolution, the presence of a feature in a spatial area is often uncertain
and expressed as a probability (Dai, X., Yiu, M.L., et al. 2005). Figure 1 shows an
example of precise and uncertain data sets. Few algorithms have been dedicated
for mining frequent patterns from uncertain data. All these algorithms follow the
horizontal data representation.

Get Data Structure and Software Engineering now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.