研究生: |
龔毓婷 Yu-Ting, Kung |
---|---|
論文名稱: |
以位元序列為基礎探勘容錯重複樣式之研究 An Efficient Approach for Mining Fault-Tolerant Repeating Patterns based on Bit Sequences |
指導教授: |
柯佳伶
Koh, Jia-Ling |
學位類別: |
碩士 Master |
系所名稱: |
資訊教育研究所 Graduate Institute of Information and Computer Education |
論文出版年: | 2004 |
畢業學年度: | 92 |
語文別: | 英文 |
論文頁數: | 83 |
中文關鍵詞: | 重複樣式 、容錯探勘 、位元序列 、前K個樣式探勘 、資料序列 |
英文關鍵詞: | Repeating Patterns, Fault-Tolerant Mining, Bit Sequences, Top-K patterns Mining, Data Sequence |
論文種類: | 學術論文 |
相關次數: | 點閱:214 下載:1 |
分享至: |
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
本論文提出一個有效率的方法對資料序列探勘出前K個非顯然且滿足最小長度限制的容錯重複樣式。我們擴展出現位元序列的表示法,設計出容錯出現位元序列,在考慮有插入或刪除錯誤的容錯情況下,用來表示候選樣式在資料序列中的出現位置。本論文提出二個演算法,分別命名為TFTRP-Mine (Top-K non-trivial FT-RPs Mining)及RE-TFTRP-Mine (REfinement of TFTRP-Mine)。兩個演算法皆根據論文中歸納出的遞迴公式,可系統性地計算出一個樣式的容錯出現位元序列,因而很有效率地得到每個侯選樣式的容錯出現次數。此外,RE-TFTRP-Mine演算法採用額外兩個技巧來砍除搜尋空間以加快探勘效率。由實驗結果得知,當K及min_len的值較小時,RE-TFTRP-Mine比TFTRP-Mine有較好的執行效率;而由實際的樂曲資料之實驗顯示,當探勘過程中有考慮容錯比對時,可以找出更多重要且隱藏的重複樣式。
An efficient way of mining top-K non-trivial fault-tolerant repeating patterns (FT-RPs in short) with length no less than min_len for data sequences is proposed in this thesis. By extending the idea of appearing bit sequences, fault-tolerant appearing bit sequences are defined to represent the locations where candidate patterns appear in a data sequence with insertion/deletion errors allowed. Two algorithms, named TFTRP-Mine (Top-K non-trivial FT-RPs Mining) and RE-TFTRP-Mine (REfinement of TFTRP-Mine), respectively, are proposed. Both of two algorithms use the recursive formulas to obtain fault-tolerant appearing bit sequences of a pattern systematically and then the fault-tolerant frequency of each candidate pattern could be counted quickly. Besides, RE-TFTRP-Mine adopts two additional strategies to prune the searching space in order to increase the mining efficiency. From experiment results, we can know that RE-TFTRP-Mine outperforms TFTRP-Mine algorithm when K and min_len are small. In addition, when adopting fault tolerant mining, more important and implicit repeating patterns could be found for music objects.
Bibliography
[1] C. C. Liu, J. L. Hsu, and A. L. P. Chen, “Efficient Theme and Non-Trivial Repeating Pattern Discovering in Music Databases,” in Proceedings of the 15th International Conference on Data Engineering (ICDE’99), 1999.
[2] C. C. Liu, J. L. Hsu, and A. L. P. Chen, “An Approximate String Matching Algorithm for Content-Based Music Data Retrieval,” in Proceedings of the International Conference on Multimedia Computing and System (ICMCS’99), 1999.
[3] J. Ayres, J. Flannick, J. Gehrke, and T. Yiu, “Sequential Pattern Mining using A Bitmap Representation,” in Proceedings of the Eighth International Conference on Knowledge Discovery and Data Mining (ACM SIGKDD’02), 2002.
[4] J. Han, G. Dong, and Y. Yin, “Efficient Mining of Partial Periodic Patterns in Time Series Database,” in Proceedings of the 1999 IEEE International Conference on Data Engineering (ICDE’99), 1999.
[5] J. Han, J. Wang, Y. Lu, and P. Tzvetkov, “Mining Top-K Frequent Closed Patterns without Minimum Support”, in Proceedings of 2002 International Conference on Data Mining (ICDM'02), 2002.
[6] J. L. Hsu, C. C. Liu, and A. L.P. Chen, “Efficient Repeating Pattern Finding in Music Databases,” in Proceedings of the Seventh International Conference on Information and Knowledge Management (ACM CIKM’98), 1998.
[7] J. L. Koh and W. D. C. Yu, “Efficient Feature Mining in Music Objects,” Lecture Notes in Computer Science: DEXA’01: Database and Expert Systems Applications, pp. 221-231, Springer-Verlag, 2001. (EI)
[8] J. L. Koh and W. D. C. Yu, “Mining Repeating and Sequential Patterns from Music Databases.” Technique Report in Department of Information and Computer Education, National Taiwan Normal University, 1999.
[9] J. L. Koh and P. W. Yo, “An Efficient Approach for Mining Fault-Tolerant Frequent Itemsets based on Bit Sequences,” Technique Report in Department of Information and Computer Education, National Taiwan Normal University, 2003.
[10] J. Pei, A. K.H. Tung, and J. Han, “Fault-Tolerant Frequent Pattern Mining: Problem and Challenges,” in Proceedings of ACM-SIGMOD International Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD’01), 2001.
[11] J. Yang, W. Wang, and P. Yu, “Infominer: mining surprising periodic patterns,” in Proceedings of the seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM SIGKDD’01), 2001.
[12] P. Tzvetkov, X. Yan, and J. Han, “TSP: Mining Top-K Closed Sequential Patterns”, in Proceedings of 2003 International Conference on Data Mining (ICDM'03), 2003.
[13] S.-S Wang and S,-Y. Lee, “Mining Fault-Tolerant Frequent Patterns In Large Database,” in Proceedings of Workshop on Software Engineering and Database Systems, International Computer Symposium, Taiwan, 2002.