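Graduate Student: 董原賓
Thesis Title: A Study on Approximately Mining Frequent Representative Patterns over Data Streams (近似探勘資料流常見資料代表樣式之研究)
Advisor: 柯佳伶
Degree: Master's
Department: Computer Science and Information Engineering
Year of Publication: 2006
Academic Year of Graduation: 94 (ROC calendar, i.e., 2005–2006)
Language: Chinese
Keywords: generalized frequency changing point method, approximate mining
Thesis Type: Academic thesis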
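Abstract: Mining frequent itemsets over data streams has recently become an important research topic. In practice, most users are more interested in recent information, and defining the data range with a sliding window allows the recently frequent itemsets of a data stream to be mined effectively. This thesis therefore proposes the generalized frequency changing point (NFCP) algorithm. Instead of recording every transaction in the sliding window, NFCP stores summary information about when itemsets occur in an FP-tree-like structure, from which it can efficiently update outdated itemset information and mine the recently frequent itemsets.

In addition, the number of frequent itemsets typically grows exponentially as the minimum support threshold is lowered. To reduce redundancy in the mining result, this thesis incorporates a method for mining representative patterns, which quickly and approximately finds the recently frequent representative patterns of the data stream from the storage structure and thereby further condenses the result. Experiments with an implementation of NFCP show that maintaining summary information about the frequency changing points of itemsets allows the recently frequent representative patterns in the current transaction window to be mined approximately and effectively, with a guarantee that no pattern is missed. NFCP also requires very little maintenance time. Therefore, when a data stream does not need to be mined at every time point but a mining request may arrive at any time, NFCP's efficient maintenance keeps it ready to mine recently frequent itemsets on demand, saving further processing cost.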
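The sketch below is not NFCP. It is a minimal, exact brute-force baseline that makes the two target results concrete: the recently frequent itemsets of a sliding window, and a condensed set of representative patterns. It stores every transaction in the window, which is precisely the cost NFCP is designed to avoid, so it is only useful as a reference on toy data. Several things here are assumptions, since the abstract does not define them: the window is count-based (the last `size` transactions), the minimum support is a fraction of the window length, and "representative pattern" follows the δ-cover idea of Xin et al. (VLDB 2005), where a superset P covers Q when their relative support difference is at most δ. The names `SlidingWindow`, `recent_frequent`, and `representative_patterns` are hypothetical.

```python
from collections import deque
from itertools import combinations


class SlidingWindow:
    """Hypothetical count-based window over the most recent `size` transactions."""

    def __init__(self, size):
        # deque with maxlen drops the oldest transaction automatically
        self.buffer = deque(maxlen=size)

    def add(self, transaction):
        self.buffer.append(frozenset(transaction))

    def recent_frequent(self, min_sup):
        """Exact support of every itemset with support >= min_sup * window length."""
        threshold = min_sup * len(self.buffer)
        counts = {}
        for txn in self.buffer:
            items = sorted(txn)
            # Brute force: enumerate every non-empty subset (exponential; toy data only)
            for k in range(1, len(items) + 1):
                for combo in combinations(items, k):
                    key = frozenset(combo)
                    counts[key] = counts.get(key, 0) + 1
        return {s: c for s, c in counts.items() if c >= threshold}


def representative_patterns(freq, delta):
    """Greedy delta-cover (assumed definition): every itemset is covered by a chosen pattern."""

    def covers(p, q):
        # p covers q if q is a subset of p and the relative support loss is at most delta
        return q <= p and (freq[q] - freq[p]) / freq[q] <= delta

    uncovered = set(freq)
    reps = []
    while uncovered:
        # Pick the pattern covering the most uncovered itemsets; prefer longer,
        # then lexicographically smaller patterns so ties are deterministic
        best = min(
            freq,
            key=lambda p: (-sum(covers(p, q) for q in uncovered), -len(p), sorted(p)),
        )
        reps.append(best)
        uncovered -= {q for q in uncovered if covers(best, q)}
    return reps


window = SlidingWindow(size=4)
for t in [["a", "b", "c"], ["a", "b"], ["a", "b", "c"], ["a", "c"], ["b", "c"]]:
    window.add(t)  # the fifth add evicts the first transaction

freq = window.recent_frequent(min_sup=0.5)
reps = representative_patterns(freq, delta=0.4)
print(sorted(sorted(p) for p in reps))  # [['a', 'b'], ['a', 'c'], ['b', 'c']]
```

In this example the window holds the last four transactions, so {a, b, c} occurs only once and falls below the threshold of 2, leaving six recently frequent itemsets. With δ = 0.4 they collapse to the three 2-itemsets, because each single item's support (3) is within 40% of its 2-itemset supersets' support (2); with δ = 0 no itemset can cover another, and all six are kept.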

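Table of Contents
Chapter 1 Introduction
  1-1 Background and Motivation
  1-2 Related Work
  1-3 Proposed Method
  1-4 Thesis Organization
Chapter 2 Problem Definition and Background
  2-1 Problem Definition
  2-2 Frequency Changing Point Method
Chapter 3 Generalized Frequency Changing Point Method
  3-1 Storage Structure for Occurrence Summary Information
  3-2 Maintaining the Summary Structure
  3-3 Example
Chapter 4 Mining Frequent Representative Patterns
  4-1 Storage Structure
  4-2 Pruning Candidate Patterns
  4-3 Steps for Mining Recently Frequent Representative Patterns
  4-4 Example
Chapter 5 Performance Evaluation
  5-1 Data Generation
  5-2 Experimental Evaluation
  5-3 Summary of Experimental Results
Chapter 6 Conclusion
References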
