研究生: |
嚴翠鳳 Tsui-Feng Yen |
---|---|
論文名稱: |
資料流序列中資料項預測方法之研究 A Pattern-based Method for Item Predictions over Data Streams |
指導教授: |
柯佳伶
Koh, Jia-Ling |
學位類別: |
碩士 Master |
系所名稱: |
資訊教育研究所 Graduate Institute of Information and Computer Education |
論文出版年: | 2007 |
畢業學年度: | 95 |
語文別: | 中文 |
論文頁數: | 45 |
中文關鍵詞: | 資料流 、重覆樣式 、資料探勘 、概念變動 |
英文關鍵詞: | Data Stream, Repeating Pattern, Data Mining, Concept Change |
論文種類: | 學術論文 |
相關次數: | 點閱:168 下載:15 |
分享至: |
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
不同於以往的靜態交易資料庫,愈來愈多的應用之資料輸入方式形成資料流型態,其輸入的快速及連續使得在資料串流預測上的面臨重要的挑戰,資料會隨著時間不斷快速進入,因此必須提供極有效率的處理,同時,資料分佈及隱含樣式也可能隨著時間而改變。因此,針對以上的挑戰,本篇論文提出一個稱為預測樹的樹狀結構,可以快速的從訓練資料集中探勘出重覆樣式產生資料項預測則。當概念發生變動時,需重新探勘最近視窗內的重覆樣式,產生新的預測規則以適應目前的概念。本論文提出的第一個方法稱為ERT,藉著計算滑動視窗內的預測錯誤率來判斷是否發生概念變動,若錯誤率大於給定的最小錯誤率門檻值,則重新探勘產生新的預測規則,並且將先前預測準確率高的規則與新規則整合。另一個方法則是每個固定的時間點會觸發重新探勘,根據產生新規則時是否有與先前的規則整合又可以分為WANR及WRNR兩種方法。實驗結果顯示,WRAR的預測錯誤率略高於其他兩種方法,而ERT在三種方法中是最有效率且最有效的,因為此方法僅在偵測到概念變動時重新調整規則。
Because of progressing of various electronic equipments, more and more data of applications is collected quickly and constantly to form a data stream. Two challenges arise when performing item predictions in a data stream. The first one is that the data is continuously inputted in high-speed, such that it is required to perform the processing efficiently. Besides, the data distribution and the implicit patterns might change over time. In this thesis, a structure named prediction-tree is proposed to discover prediction rules from repeating patterns in the training data quickly. For adapting the concept changes, it is necessary to generate new prediction rules by re-mining repeating patterns in the most recent sliding window. The first approach, named ERT, is to monitor the accuracy of predictions in a sliding window for detecting the concept changes. When the error rate in a sliding window is higher than a given threshold value, new prediction rules are generated by re-mining repeating patterns. Then the previous prediction rules with high accuracy are remained to be combined with the new generated ones. The other approach is to trigger the re-mining every other non-overlapping data window. Two variations of the window-based triggering approach, named WANR and WRNR, are provided according to whether the previous rules are remained to be combined with the new ones or not. The experimental results show that the error rate of WRNR is slightly higher than the others. However, ERT is the most efficient and effective one among the three methods because it needs to adjust rules only when the concept changes are detected.
[1] Y.Yang, X Wu and Xingquanzhu,” Mining in Anticipation for Concept Change:Proactive-Reactive Prediction in Data Streams,” in Proc. of the 6th
IEEE International Conference on Data Mining (ICDM), 2006
[2] E.Keogh and S.Kasetty, “On the need for time series data mining benchmarks: a survey and empirical demonstration, ”in Proc. of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD),2002
[3] I. Bouzouita, S. Elloumi and S. B. Yahia ,”GARC : A New Associative Classification Approach,” in Proc. of 16th International Conference on Database and Expert Systems Applications (DEXA),2006.
[4] Gh. Gasmi1, S. Ben Yahia1, E. Mephu Nguifo and Y. Slimani,” A new Informative Generic Base of Association Rules,” In Proc. of the International 9th Pacific-Asia Conference on Knowledge Data Discovery (PAKDD),2005
[5] Wang, H Mannila, H Toivonen and A.Verkamo, ”Discovering Frequent Episodes in Sequnces,” In Proc of the 1st International Conference on Knowledge Discovery and Data Mining(KDD), 1995
[6] H. M O.R.Zaļane and M.L.Antonie, “On pruning and tuning rules for associative classifiers.” in Proc. 9th International conference on knowledge Based Intelligence Information And Engineering Systems (KES),2005
[7] X. Yin and J. Han, “CAPR :Classification based on Predictive Association Rules,” in Proc of the 3rd SIAM International Conference on Data Mining(SDM),2003
[8] H.Mannila, H.Toivonen, and Verkamo, “An Efficient algorithms for discovering association rules,” in Proc of AAAI'94 Workshop Knowledge Discovery in Databases (KDD),2004
[9] J.S. Park, M. S. Chen and P. S. Yu, ”An effective hash-based algorithm for mining association rule,” in Proc. of 1995 ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD).1995
[10] J.Han,J.Pei, and Y.Yin.Mining, ”Frequent patterns without candidate generation. In Proc. 2000 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD), 2000
[11] J.L. Koh and W.D.C.Yu, “Efficient Feature Mining in Music Objects,” in Proc. of 11th International Conference on Database and Expert Systems Applications (DEXA),2001
[12] J.L. Koh and Y.T. Kung, “An Efficient Approach for Mining Top-K Fault-Tolerant Repeating Patterns,” in Proc. of Lecture Notes in Computer Science Database Systems for Advanced Applications(DASFAA) ,2006
[13] P. Wang, H. Wang, X. Wu, W. Wang, and B. Shi, “On Reducing Classifier
Granularity in Mining Concept-Drifting Data Streams,” in Proc. of the 5th
IEEE International Conference on Data Mining (ICDM), 2005.
[14] P.M. Chou,” Approximately Mining Recently Repeating Patterns on Data Streams ,” Technical Report ,National Taiwan Normal University,2006.