研究生: |
涂益郎 |
---|---|
論文名稱: |
以FP-tree結構為基礎之近似常見項目集探勘法 |
指導教授: | 柯佳伶 |
學位類別: |
碩士 Master |
系所名稱: |
資訊工程學系 Department of Computer Science and Information Engineering |
論文出版年: | 2009 |
畢業學年度: | 97 |
語文別: | 中文 |
論文頁數: | 69 |
中文關鍵詞: | FP-tree 、探勘 、常見項目集 、近似常見項目集 、誤差容忍參數 、核心樣式參數 |
英文關鍵詞: | mining, FP-growth, approximate frequent itemset, FP-tree |
論文種類: | 學術論文 |
相關次數: | 點閱:201 下載:4 |
分享至: |
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
本論文針對交易資料庫運用FP-tree結構可壓縮儲存交易資料的特性,提出以FP-tree儲存結構為基礎之近似常見項目集探勘法,稱為 FP-AFI演算法( FP-tree Approximate Frequent Itemsets mining algorithm )。透過分析容錯包含項目集之交易資料集合間的遞迴關係,擴展FP-tree執行投影的方法,可分別找出包含某個項目與不包含某個項目的conditional FP-tree。FP-AFI演算法以深先搜尋的方式,從Header Table中取出符合核心樣式門檻值的項目產生候選項目集,系統化地建構出其對應的conditional FP-tree,並透過conditional FP-tree根節點所記錄的計數值及conditional FP-tree的編碼資訊,快速獲得該項目集的容錯支持度及各項目支持度,以確認是否為一近似常見項目集。在探勘的過程中僅需掃瞄資料庫兩次,可省去大量讀取交易資料所需的時間。由實驗結果顯示,當最小支持度門檻值訂為較小或交易資料的筆數較多時,此方法較之前已提出的近似常見項目集探勘演算法FT-Apriori及AFI有顯著的執行效率增進。
In this thesis, an algorithm called FP-AFI( FP-tree Approximate Frequent Itemsets mining ) is proposed for mining approximate frequent itemsets. The FP-AFI applies the characteristics of FP-tree structure to compress transaction data. Through analyzing the recursive relationship of the sets of transactions which fault tolerant contain an itemset, the projection operation on FP-tree is extended to obtain the conditional FP-tree which contains a specific item or not. The FP-AFI algorithm applies a depth-first search strategy to generate candidate itemsets by checking the threshold value of a core pattern from the item supports counted in the Header Table. For each candidate itemset, its corresponding fault-tolerant conditional FP-trees are constructed systematically. Accordingly, the approximate support of the candidate and the item supports of each item in the candidate are obtained easily from the counter stored in the root node and the coding vectors of the fault-tolerant conditional FP-trees. Such that, a candidate itemset is confirmed to be an approximate frequent itemset or not efficiently. The experimental results show that, when there are many transactions or the support is small, the performance efficiency of FP-AFI algorithm is better than the FT-Apriori and AFI algorithms proposed previously.
[1]R. Agrawal and R. Srikant, “Fast Algorithm for Mining Association Rule in Large Databases,” in Proc. of the 20th International Conference on Very Large Data Bases, 1994.
[2]J. Han, J. Pei and Y. Yin, “Mining Frequent Patterns without Candidate Generation, ” in Proc. of the 2000 ACM SIGMOD International Conference on Management of Data, 2000.
[3]H. Cheng, P. S. Yu and J. Han, “AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery,” in Proc. of the 6th IEEE International Conference on Data Mining ( ICDM 2006 ), 2006.
[4]J. Liu, S. Paulsen, X. Sun, W. Wang, A. Nobel and J. Prins, “Mining approximate frequent itemsets in the presence of noise: Algorithm and analysis,” in Proc. of the 6th SIAM International Conference on Data Mining ( SDM 2006 ), 2006.
[5]H. Cheng, P. S. Yu and J. Han, “Approximate Frequent Itemset Mining In the Presence of Random Noise,” Soft Computing for Knowledge Discovery and Data Mining. Oded Maimon and Lior Rokach. Springer, pages 363-389, 2008.
[6]J. Pei, A. K. H. Tung and J. Han, “Fault-Tolerant Frequent Pattern Mining: Problems and Challenges,” in Proc. of the ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery ( DMKD 2001 ), 2001.
[7]R. Gupta, G. Fang, B. Field, M. Steinbach and V. Kumar, “Quantitative evaluation of approximate frequent pattern mining algorithms,” in Proc. of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining ( KDD 2008 ), 2008.
[8]UCI machine learning repository. ( http://www.ics.uci.edu/mlearn/MLSumm ary.html )
[9]K. Wang, L. Tang, J. Han and J. Liu, “Top Down FP-Growth for Association Rule Mining,” in Proc. of Advances in Knowledge Discovery and Data Mining, 6th Pacific-Asia Conference ( PAKDD 2002 ), 2002.
[10]J. L. Koh and P. W. Yo, “An Efficient Approach for Mining Fault-Tolerant Frequent Patterns Based on Bit Vector Representations,” in Proc. of the 10th International Conference, Database Systems for Advanced Applications, ( DASFAA 2005 ), 2005.
[11]J. K. Seppänen and H. Mannila, “Dense itemsets,” in Proc. of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining ( KDD 2004 ), 2004.
[12]C. Yang, U. M. Fayyad and P. S. Bradley, “Efficient discovery of error-tolerant frequent itemsets in high dimensions,” in Proc. of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining ( KDD 2001 ), 2001.
[13]M. S. Chen, J. Han and P. S. Yu, “Data Mining: An Overview from a Database Perspective,” in Proc. of the IEEE Transactions on Knowledge and Data Engineering, Volume 8(6): pages 866-883, 1996.
[14]IBM inc. http://www.almaden.ibm.com/cs/quest/syndata.html.