簡易檢索 / 詳目顯示

研究生: 蔡明瑾
論文名稱: 資料流中常見子圖樣式改變之漸進式偵測
Incremental Detection for Frequent Sub-Graph Patterns Changing on Data Streams
指導教授: 柯佳伶
Koh, Jia-Ling
學位類別: 碩士
Master
系所名稱: 資訊教育研究所
Graduate Institute of Information and Computer Education
論文出版年: 2006
畢業學年度: 94
語文別: 中文
論文頁數: 80
中文關鍵詞: 資料串流改變偵測常見子圖樣式
英文關鍵詞: data stream change detection, frequent sub-graphs patterns
論文種類: 學術論文
相關次數: 點閱:151下載:10
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 圖型為具結構性的資料結構,能夠完整描述真實世界的資料關聯性。探勘常見子圖必須考慮子圖間為同質異構體的問題,因此處理複雜度高,應用在資料流探勘的成本更高。本論文考慮從單位時間的累積圖型資料集中探勘常見子圖樣式的問題,我們認為在資料流中資料樣式的改變會經過一個過渡期,因此未必在每一個時間點皆需要進行重新探勘。本論文將初始時間點探勘出的常見子圖樣式稱為基底常見子圖樣式,提出從新進累積圖型資料集中有效率地偵測比對是否包含基底常見子圖樣式的方法,近似估算出基底常見子圖樣式在新進累積圖型資料集中仍為常見的比例值,以判定資料流中常見子圖樣式趨勢轉移的情況,並建議需重新進行新資料中子圖樣式探勘的時間點。本論文所提出的FGCD方法建立不同的圖型資料結構,並運用常見子圖樣式的向下包含特性來加速比對過程。由實驗結果顯示,此方法可有效比對估算出新進累積圖型資料集中仍維持常見之基底子圖樣式,當樣式趨勢尚未發生改變時,比重新進行探勘在更快的時間內提供常見子圖樣式的近似結果。

    Graph is a kind of structural data, which is applied to model the various relations among data in real world. Mining frequent sub-graph patterns, being equal to solve the problem of checking graph isomorphism, is a NP hard problem. Therefore, mining frequent sub-graph patterns in data streams is an even more complicated problem. In this thesis, graph data at every time point is collected for mining frequent sub-graph patterns at the time point. We assume that the changing of frequent sub-graph patterns will take several time points. Therefore, it is not necessary to re-mine frequent sub-graph patterns at each time point. The frequent sub-graph patterns discovered at the first time point are named base patterns. An efficient method, named FGCD algorithm, is proposed to detect the change of base patterns at the following time points, the FGCD algorithm approximately counts the frequencies of base patterns in the set of newly coming graphs, and calculates the percentage of remaining frequent patterns to decide whether the trend of frequent sub-graph patterns is changing or not and trigger to perform the re-mining of frequent sub-graph patterns. The storage structures of graphs are designed and the downward closure property among frequent sub-graphs is applied in the proposed method to efficiently match the sub-graphs patterns. According to experimental results, FGCD can approximately estimate the percentage of base patterns that remain frequent. When the trend of frequent sub-graph patterns does not change, FGCD algorithm provides a more efficient way than re-mining to maintain the frequent sub-graph patterns approximately.

    附表目錄..................................................................................................Ⅱ 附圖目錄..................................................................................................Ⅲ 第一章 緒論..............................................................................................1 1-1 背景與研究動機.............................................................................................1 1-2 相關研究 ........................................................................................................3 1-3 論文方法 ......................................................................................................10 1-4 論文架構 ......................................................................................................11 第二章 相關名詞定義.........................................................................................12 2-1 基本名詞定義...............................................................................................13 2-2 子圖樣式編碼...............................................................................................15 第三章 儲存圖型特徵結構....................................................................19 3-1 圖型特徵結構...............................................................................................19 3-2 基底常見子圖之圖型特徵結構...................................................................20 3-3 新進累積圖型資料集之圖型特徵結構.......................................................33 第四章 偵測常見子圖樣式改變演算法................................................40 4-1 比對流程處理概念........................................................................................40 4-2 樹狀結構之基底常見子圖樣式特徵比對....................................................46 4-3 具有相等邊之基底常見子圖樣式特徵比對................................................55 4-4 具有返回邊之基底常見子圖樣式特徵比對................................................57 4-5 比對誤判 .......................................................................................................61 4-6 減少基底常見子圖樣式比對技巧 .............................................................. 65 4-7 趨勢轉移判斷 ...............................................................................................67 第五章 演算法執行效率評估.................................................................69 5-1 資料來源及圖型資料串流產生方式............................................................69 5-2 實驗評估........................................................................................................72 5-3 實驗結果總結................................................................................................79 第六章 結論.............................................................................................80 參考文獻...................................................................................................81

    [1] M. Cohen and E. Gudes, “Diagonally Subgraphs Pattern Mining,” in Proc. of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’04), 2004.
    [2] B Goethals, E Hoekx, J Van den Bussche, “Mining Tree Queries in a Graph,” in Proc. of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’05), 2005.
    [3] L. Huan, W. Wang and J. Prins, “SPIN: Mining Maximal Frequent Subgraphs from Graph Databases,” in Proc. of the 10th ACM SIGKDD International Conference on Knowledge Discovery (KDD’04), 2004.
    [4] J. Huan, W. Wang, D. Bandyopadhyay, J. Snoeyink, J. Prins and A. Tropsha, “Mining Protein Family Specific Residue Packing Patterns from Protein Structure Graphs,” in 8th Annual International Conference on Research in Computational Molecular Biology (RECOMB’04), pages 308-315, 2004.
    [5] R. Jin, C. Wang, D. Polshakov, S. Parthasarathy, and G. Agrawal, “Discovering Frequent Topological Structures from Graph Datasets,” in Proc. of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’05), pages 606-611, 2005.
    [6] M. Kuramochi and G. Karypis, “An Efficient Algorithm for Discovering Frequent Subgraphs,” in Proc. of the 2001 International Conference on Data Mining (ICDM’01), 2001.
    [7] M. Kuramochi and G. Karypis, “Finding Frequent Patterns in a Large Sparse Graph.,” in SIAM International Conference on Data Mining(SDM’04), pages 243-271, 2004.
    [8] R. Read and D. Corneil, “The graph isomorph disease,” Journal of Graph Theory, 1:339-363, 1977.
    [9] G. S. Manku and R. Motwani, “Approximate Frequency Counts over Data Streams,” in Proc. of the Conference on Very Large Data Bases, pages 346-357, 2002.
    [10] X. Yan and J. Han, “gSpan: Graph-Based Substructure Pattern Mining,” in Proc. of the 2002 International Conference on Data Mining (ICDM’02), 2002.
    [11] X. Yan and J. Han, “CloseGraph: Mining Closed Frequent Graph Patterns,” in Proc. of the 9th ACM SIGKDD International Conference on Knowledge Discovery (KDD’03), 2003.
    [12] X. Yan, X. Jasmine Zhou and J. Han, “Mining Closed Relational Graphs with Connectivity Constraints,” in Proc. of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining(KDD’05), pages 324-333, 2005.
    [13] Q. Zheng, K. Xu and S. Ma, “When to Update Sequential Pattern”, in Proc. of the 7th Pacific-Asia In Conference on Knowledge Discovery and Data Mining, 2003.

    QR CODE