研究生: |
朱聖池 Chu, Sheng-Chih |
---|---|
論文名稱: |
從單一圖型中有效率探勘常見鄰近樣式之研究 Efficiently Finding Neighborhood Patterns in a Large Labeled Graph |
指導教授: |
柯佳伶
Koh, Jia-Ling |
學位類別: |
碩士 Master |
系所名稱: |
資訊工程學系 Department of Computer Science and Information Engineering |
論文出版年: | 2015 |
畢業學年度: | 103 |
語文別: | 中文 |
論文頁數: | 65 |
中文關鍵詞: | 樣式探勘 、常見鄰近樣式 、單一圖型探勘 |
英文關鍵詞: | pattern mining, frequent neighborhood pattern, single graph mining |
論文種類: | 學術論文 |
相關次數: | 點閱:115 下載:7 |
分享至: |
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
圖型為具結構性的資料結構,能夠完整描述真實世界的資料關聯性。而單一圖型中的常見鄰近樣式在支持度計算上不僅能維持向下包含特性之外,同時支持度也較能提供明確的意義。我們認為先前研究提出探勘常見鄰近樣式之FNP演算法,因採用類似Apriori 演算法的圖型組合方式,將列舉過多不存在的候選樣式。因此本論文採用先前gSpan圖型成長的方法,進行常見鄰近子圖樣式探勘。本論文所提出的gSFNP方法,利用內嵌圖型結構來加速圖型成長,並運用最小深先搜尋碼來檢查圖型同構問題,同時提出在MapReduce平行框架處理的gSFNP_MR演算法以解決系統記憶體不足的問題。由實驗結果顯示,此兩種方法在探勘鄰近子圖樣式的執行時間上都能比FNP更快速地探勘出常見鄰近樣式。
Graph is a powerful abstraction of structural data, which is applied to model the various relations among data in a real world. Recently, a new kind of patterns called frequent neighborhood patterns is defined for a large labeled graph. Frequent neighborhood patterns have the downward closure property of the support measure and provide meaningful interpretations of pattern mining. The previous work used an Apriori-like approach to combine the discovered frequent neighborhood patterns into larger candidate patterns, many of the generated candidates may not appear in the graph. In this thesis, we propose an algorithm, which is called the gSFNP algorithm, to find frequent neighborhood patterns efficiently. By applying a pattern growth approach, a data structure of pattern is designed to store the information of the matched sub-graphs for speeding up the following pattern growth computation. Besides, the minimum DFS code of a pattern is used to avoid finding graph isomorphic patterns. Moreover, we propose another MapReduce version of the gSFNP algorithm, which is called the gSFNP_MR algorithm, to solve the problem of insufficient memory in a centralized environment. Finally, we evaluate the performance of gSFNP and gSFNP_MR. The experimental results show that both the proposed algorithms have shorter response time than the previous work.
[1] Y. R. Cho, and A. Zhang. “Predicting Protein Function by Frequent Functional Association Pattern Mining in Protein Interaction Networks”. Trans. Info. Tech. Biomed,14(1):30-36,Jan.2010
[2] L. Dehaspe, H. Toivonen, and R. D. King. “Finding frequent substructures in chemical compounds. In R. Agrawal, P. Stolorz, and G. Piatetsky-Shapiro, editors, Proc. of the 4th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-98), pages30–36. AAAI Press, 1998.
[3] M. Elseidy , E. Abdelhamid, S. Skiadopoulos, and P. Kalnis. “GRAMI: Frequent Subgraph and Pattern Mining in a Singls Large Graph.”, In VLDB'14 2014
[4] J. Han, and J. R. Wen. “Mining Frequent Neighborhood Patterns in a Large Labeled Graph.” In CIKM'13, 2013.
[5] J. Han, and J. R. Wen. “Within-network classification using radius-contrained neighborhood patterns.” , in Proc. The ACM Intl. Conf. on Information and Knowledge Management,CIKM,2014.
[6] S. Hill , B. Srichandan , and R. Sunderraman. An Iterative MapReduce Approach to Frequent Subgraph Mining in Biological Datasets. In ACM-BCB 2012
[7] L. B Holder, D. J Cook, S. Djoko,et al. “Substructure discovery in the subdue system.” In AAAI Workshop on Knowledge Discovery in Database,KDD-94,pages 169-180, 1994.
[8] J. Huan, W. Wang and J. Prins.”Efficient Mining of Frequent Subgraph in the Presence of Isomorphism.” In ICDM'03, 2003.
[9] A. Inokuchi, T.Washio,and H.Motoda. An apriori-based algorithm for mining frequent substructures from graph data. In Proc. of the 4th European Conf. on Principles of Data Mining and Knowledge Discovery,PKDD'00,pp 13-23,2000.
[10] M. Kuramochi and G. Karypis. Frequent subgraph discovery. In Proc. of 2001 IEEE International Conference on Data Mining (ICDM),November 2001.
[11] M. Kuramochi and G. Karypis. An efficient algorithm for discovering frequent subgraphs. In IEEE Transacrion on Knowledge and Data Engineering,16(9),pp.1038 - 1051,2004.
[12] M. Kuramochi and G. Karypis. “Finding frequent patterns in a large sparse graph.” In Data Mining and Knowledge Discovery,11(3):243-271,2005
[13] C. W. Leung , E. P. Lim , D. Lo and J. Weng. “Mining interesting link formation rules in social networks.”, In Proc. the ACM Intl.Conf.on Information and Knowledge Management,CIKM'10,pp.209-218,2010.
[14] W. Lin,X. Xiao, and G. Ghinita. “Large-scale frequent subgraph mining in MapReduce” in Proc. the IEEE Inter. Conf. on Data Engineering, ICDE’14,2014.
[15] Y. Liu, X. Jiang, H. Chen , J. Ma , and X. Zhang. “MapReduce-Based Pattern Finding Algorithm Applied in Motif Detection for Prescription Compatibility Network.”, In Advanced Parallel Processing Technologies,pages 341-355, Springer,2009.
[16] W. Lu ,G. Chen,A. KH Tung and F. Zhao.”Efficiently extracting frequent subgraphs using mapreduce. In 2013 IEEE International Conference on Big Data, Page 639-647. IEEE, 2013.
[17] T. Meinl, M. Worlein, I. Fischer and M. Philippsen. “Mining molecular datasets on symmetric multiprocessor systems. In IEEE International Conference on systems,Man and Cybernetics, volume 2 of SMC ’06,pages 1269-1274.IEEE, 2006.
[18] S. Nijssen and J. N. Kok. “A quickstart in frequent structure mining can make a difference.” In Proc. the ACM Intl. Conf. on Knowledge Discovery and Data Mining,SIGKDD'04,pp. 647-652,2004
[19] X.Yan, and J. Han. “gSpan:graph-based substructure pattern mining.” In ICDM'02 2002.