簡易檢索 / 詳目顯示

研究生: 朱聖池
Chu, Sheng-Chih
論文名稱: 從單一圖型中有效率探勘常見鄰近樣式之研究
Efficiently Finding Neighborhood Patterns in a Large Labeled Graph
指導教授: 柯佳伶
Koh, Jia-Ling
學位類別: 碩士
Master
系所名稱: 資訊工程學系
Department of Computer Science and Information Engineering
論文出版年: 2015
畢業學年度: 103
語文別: 中文
論文頁數: 65
中文關鍵詞: 樣式探勘常見鄰近樣式單一圖型探勘
英文關鍵詞: pattern mining, frequent neighborhood pattern, single graph mining
論文種類: 學術論文
相關次數: 點閱:115下載:7
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 圖型為具結構性的資料結構,能夠完整描述真實世界的資料關聯性。而單一圖型中的常見鄰近樣式在支持度計算上不僅能維持向下包含特性之外,同時支持度也較能提供明確的意義。我們認為先前研究提出探勘常見鄰近樣式之FNP演算法,因採用類似Apriori 演算法的圖型組合方式,將列舉過多不存在的候選樣式。因此本論文採用先前gSpan圖型成長的方法,進行常見鄰近子圖樣式探勘。本論文所提出的gSFNP方法,利用內嵌圖型結構來加速圖型成長,並運用最小深先搜尋碼來檢查圖型同構問題,同時提出在MapReduce平行框架處理的gSFNP_MR演算法以解決系統記憶體不足的問題。由實驗結果顯示,此兩種方法在探勘鄰近子圖樣式的執行時間上都能比FNP更快速地探勘出常見鄰近樣式。

    Graph is a powerful abstraction of structural data, which is applied to model the various relations among data in a real world. Recently, a new kind of patterns called frequent neighborhood patterns is defined for a large labeled graph. Frequent neighborhood patterns have the downward closure property of the support measure and provide meaningful interpretations of pattern mining. The previous work used an Apriori-like approach to combine the discovered frequent neighborhood patterns into larger candidate patterns, many of the generated candidates may not appear in the graph. In this thesis, we propose an algorithm, which is called the gSFNP algorithm, to find frequent neighborhood patterns efficiently. By applying a pattern growth approach, a data structure of pattern is designed to store the information of the matched sub-graphs for speeding up the following pattern growth computation. Besides, the minimum DFS code of a pattern is used to avoid finding graph isomorphic patterns. Moreover, we propose another MapReduce version of the gSFNP algorithm, which is called the gSFNP_MR algorithm, to solve the problem of insufficient memory in a centralized environment. Finally, we evaluate the performance of gSFNP and gSFNP_MR. The experimental results show that both the proposed algorithms have shorter response time than the previous work.

    目錄 I 圖目錄 III 摘要 1 Abstract 2 誌謝 3 第一章 緒論 4 1.1 研究動機 4 1.2 研究目的 5 1.3 論文方法 8 1.4 論文架構 9 第二章 文獻探討 10 2.1 圖型探勘技術簡介 10 2.2 多筆圖型探勘常見子圖樣式技術 11 2.3 單一圖型常見子圖樣式探勘技術 14 2.4 常見子圖樣式平行探勘技術 16 第三章 問題及相關名詞定義 18 3.1 問題定義 18 3.2 鄰近子圖樣式編碼 20 3.3 圖型特徵儲存結構 24 第四章 鄰近樣式圖型探勘演算法 26 4.1 鄰近子圖樣式列舉方法 26 4.2 避免樣式重覆列舉方法 34 4.3 鄰近子圖樣式演算法完整流程 38 第五章 MapReduce平行圖型探勘 42 第六章 演算法執行效率評估 48 6.1 資料來源及圖型資料產生方式 48 6.2 實驗分析 50 6.3 實驗結果總結 61 第七章 結論及未來發展方向 62 參考文獻 63

    [1] Y. R. Cho, and A. Zhang. “Predicting Protein Function by Frequent Functional Association Pattern Mining in Protein Interaction Networks”. Trans. Info. Tech. Biomed,14(1):30-36,Jan.2010
    [2] L. Dehaspe, H. Toivonen, and R. D. King. “Finding frequent substructures in chemical compounds. In R. Agrawal, P. Stolorz, and G. Piatetsky-Shapiro, editors, Proc. of the 4th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-98), pages30–36. AAAI Press, 1998.
    [3] M. Elseidy , E. Abdelhamid, S. Skiadopoulos, and P. Kalnis. “GRAMI: Frequent Subgraph and Pattern Mining in a Singls Large Graph.”, In VLDB'14 2014
    [4] J. Han, and J. R. Wen. “Mining Frequent Neighborhood Patterns in a Large Labeled Graph.” In CIKM'13, 2013.
    [5] J. Han, and J. R. Wen. “Within-network classification using radius-contrained neighborhood patterns.” , in Proc. The ACM Intl. Conf. on Information and Knowledge Management,CIKM,2014.
    [6] S. Hill , B. Srichandan , and R. Sunderraman. An Iterative MapReduce Approach to Frequent Subgraph Mining in Biological Datasets. In ACM-BCB 2012
    [7] L. B Holder, D. J Cook, S. Djoko,et al. “Substructure discovery in the subdue system.” In AAAI Workshop on Knowledge Discovery in Database,KDD-94,pages 169-180, 1994.
    [8] J. Huan, W. Wang and J. Prins.”Efficient Mining of Frequent Subgraph in the Presence of Isomorphism.” In ICDM'03, 2003.
    [9] A. Inokuchi, T.Washio,and H.Motoda. An apriori-based algorithm for mining frequent substructures from graph data. In Proc. of the 4th European Conf. on Principles of Data Mining and Knowledge Discovery,PKDD'00,pp 13-23,2000.
    [10] M. Kuramochi and G. Karypis. Frequent subgraph discovery. In Proc. of 2001 IEEE International Conference on Data Mining (ICDM),November 2001.
    [11] M. Kuramochi and G. Karypis. An efficient algorithm for discovering frequent subgraphs. In IEEE Transacrion on Knowledge and Data Engineering,16(9),pp.1038 - 1051,2004.
    [12] M. Kuramochi and G. Karypis. “Finding frequent patterns in a large sparse graph.” In Data Mining and Knowledge Discovery,11(3):243-271,2005
    [13] C. W. Leung , E. P. Lim , D. Lo and J. Weng. “Mining interesting link formation rules in social networks.”, In Proc. the ACM Intl.Conf.on Information and Knowledge Management,CIKM'10,pp.209-218,2010.
    [14] W. Lin,X. Xiao, and G. Ghinita. “Large-scale frequent subgraph mining in MapReduce” in Proc. the IEEE Inter. Conf. on Data Engineering, ICDE’14,2014.
    [15] Y. Liu, X. Jiang, H. Chen , J. Ma , and X. Zhang. “MapReduce-Based Pattern Finding Algorithm Applied in Motif Detection for Prescription Compatibility Network.”, In Advanced Parallel Processing Technologies,pages 341-355, Springer,2009.
    [16] W. Lu ,G. Chen,A. KH Tung and F. Zhao.”Efficiently extracting frequent subgraphs using mapreduce. In 2013 IEEE International Conference on Big Data, Page 639-647. IEEE, 2013.
    [17] T. Meinl, M. Worlein, I. Fischer and M. Philippsen. “Mining molecular datasets on symmetric multiprocessor systems. In IEEE International Conference on systems,Man and Cybernetics, volume 2 of SMC ’06,pages 1269-1274.IEEE, 2006.
    [18] S. Nijssen and J. N. Kok. “A quickstart in frequent structure mining can make a difference.” In Proc. the ACM Intl. Conf. on Knowledge Discovery and Data Mining,SIGKDD'04,pp. 647-652,2004
    [19] X.Yan, and J. Han. “gSpan:graph-based substructure pattern mining.” In ICDM'02 2002.

    下載圖示
    QR CODE