簡易檢索 / 詳目顯示

研究生: 林靜怡
Ching-Yi Lin
論文名稱: 資料流常見樣式改變偵測方法之研究
Frequent Patterns Change Detection over Data Streams
指導教授: 柯佳伶
Koh, Jia-Ling
學位類別: 碩士
Master
系所名稱: 資訊教育研究所
Graduate Institute of Information and Computer Education
論文出版年: 2007
畢業學年度: 95
語文別: 中文
論文頁數: 52
中文關鍵詞: 資料流資料探勘常見樣式改變偵測
英文關鍵詞: Data Stream, Data Mining, Frequent Pattern, Change Detection
論文種類: 學術論文
相關次數: 點閱:190下載:24
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 近來許多實際應用,其資料都是以資料流的形式產生,探勘資料流中最近常見樣式是其中一個重要的研究問題,而採用滑動視窗可有效探勘出資料流中最近常見樣式,然而在每個時間點對滑動視窗中的資料重新進行探勘將耗費許多計算成本。我們認為在常見樣式沒有顯著變化的情況下,先前探勘出的常見樣式可以近似代表目前的常見樣式,因此本論文針對特例資料流和一般化資料流環境提出監視常見樣式的方法FCDS和FCDG,藉由監視滑動視窗內的樣式出現次數,在每個時間點估算出最新的最近常見樣式集合和先前所探勘出來的常見樣式集合是否有顯著的改變差異,以提供是否重新進行探勘的判斷。由實做FCDS和FCDG之實驗結果顯示,監視常見樣式之出現次數及狀態變化情況,可有效偵測出最近常見樣式改變的狀況,使能在發現常見資料樣式已大幅改變時,重新探勘得到最新的最近樣式。此外,FCDS和FCDG之執行時間相較於每個時間點進行探勘為少,節省許多計算時間的成本,並可從維護結構中提供最近常見樣式變化狀況的資訊。

    Recently, the data of many real applications is generated in the form of data streams. Therefore, mining recent frequent patterns over data streams is one of the important research issues in data mining. According to the literatures, the recently frequent patterns are discovered effectively by using a sliding window. However, it is costly to perform frequent patterns re-mining in a sliding window at each time point. In this thesis, it is adopted that the previously discovered frequent patterns represent the current frequent patterns approximately when the change of frequent patterns is not significant. Accordingly, two methods, named FCDS and FCDG, are proposed to monitor the frequent patterns over a special data stream and a general data stream environment, respectively. By monitoring the appearing frequencies of patterns in a sliding window, the difference between the new recently frequent patterns and the previous ones is estimated at each time point. Only when the change is more than a user-defined threshold value, a re-mining process is required to find the exact frequent patterns in the sliding window. The experimental results show that FCDS and FCDG detect the change of the recent frequent patterns effectively and efficiently. The execution time of FCDS and FCDG is less than the time of re-mining frequent patterns in a sliding window significantly. Furthermore, from the maintained data structures, the changing information of recent frequent patterns is provided.

    目 錄 附表目錄 ii 附圖目錄 iii 第一章 緒論 1                   第一節 背景與研究動機 1 第二節 相關研究 2 第三節 論文方法 8 第四節 論文架構 9    第二章 問題定義及知識背景 10                   第一節 問題定義 10 第二節 最近常見樣式改變量 13 第三章 特例資料流常見樣式改變偵測 14 第一節 樣式監視樹結構 14 第二節 樣式監視處理過程 17 第三節 估算改變量 21 第四章 一般化資料流常見樣式改變偵測 29 第一節 原常見樣式樹結構 29 第二節 樣式監視處理過程 32 第三節 估算改變量 33 第五章 演算法效率評估 41 第一節 資料產生方式 41 第二節 實驗評估 42 第二節 實驗結果總結 50 第六章 結論 51 參考文獻 52

    [1] Y. Chi, H. Wang , P. S. Yu ,and R. R. Muntz,” Moment: Maintaining closed frequent itemsets over a stream sliding window,” In Proceedings of International Conference on Data Mining and Knowleadge Discovery, 2004.
    [2] N.Jiang and L.Gruenwald,”CFI-Stream:mining closed frequent itemsets in data streams,”In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,2006
    [3] S.Ben-David , J. Gehrke, and D. Kifer,”Detecting Change in Data Streams, ” In Proceedings of the 30th International Conference on Very Large Database, 2004.
    [4] J. H. Chang and W. S. Lee, “A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams,” In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003.
    [5] G.. S. Manku and R. Chen Motwani, “Approximate frequent counts over data Streams,” In Proceedings of the 28th International Conference on Very Large Database, 2002.
    [6] Y. B. Don, “Approximately Mining Frequent Representative Itemsets on Data Streams,” Technical Report ,National Taiwan Normal University,2006.
    [7] J. Pei, J. Han, and R. Mao,"CLOSET : An efficient algorithm for mining frequent closed itemsets," In Proceedings of ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery , 2000.
    [8] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” In Proceedings of the ACM SIGMOD International Conference on
    Management of Data, 2000.

    [9] R. Agrawal and R. Srikant, “Fast Algorithm for Mining Association Rule in Large Databases,” In Proceedings of the 20th International Conference on Very Large Database, 1994.
    [10] D.Lee and W. S. Lee,“Finding Maximal Frequent Itemsets over Online Data Streams Adaptively,” In Proceedings of the 5th IEEE International Conference on Data Mining,2005
    [11] S. N. Shin and J.L. Koh,”An Approximate Approach for Mining Recently Frequent Itemsets from Data Streams,” In Proceedings of 8th International Conference on Data Warehousing and Knowledge Discovery(DaWaK),2006.
    [12] M. J. Tsai,” Incremental Detection for Frequent Sub-Graph Patterns Changing on Data Streams,” Technical Report ,National Taiwan Normal University,2006.
    [13] J. L. Huang and M. S. Chen,”On Exploring the Power-Law Relationship in the Itemset Support Distribution,”In Proceedings of the 10th International Conference on Extending Database Technology (EDBT), 2006.

    QR CODE