
Graduate student: 黃士恆 (Shih-Heng Huang)
Thesis title: 利用Shannon's Entropy作特徵維度切割以改善貝氏分類器正確率之研究
A Study of Increasing the Classification Accuracy of Naive Bayesian Classifier Utilizing Shannon's Entropy-based Feature Division Approach
Advisor: 何宏發 (Ho, Hong-Fa)
Degree: Master
Department: Department of Industrial Education
Year of publication: 2005
Academic year of graduation: 93 (ROC calendar)
Language: Chinese
Number of pages: 46
Chinese keywords: 貝氏分類器, 黃金分割, Shannon's Entropy, 特徵維度分割
English keywords: Bayesian Classifier, Golden Section, Shannon's Entropy, Feature Dimension Section
Thesis type: Academic thesis
    This study proposes a Bayesian classifier, that is, a classifier based on Bayes' theorem from probability theory. From the class labels of the training samples it estimates the probability of each interval of each input feature, and then computes, for a test case, the probability of each class. The study regards this classifier as optimal both in theory and in practice. Because no suitable method exists for dividing each feature dimension into intervals quickly and correctly, the study uses the golden section as the division method and Shannon's Entropy as the criterion for judging whether a division is appropriate, so as to obtain the best intervals. A further aim is for the division to be fully automatic and applicable to different data sets.
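The combination of golden section and Shannon's Entropy described above can be illustrated with a minimal Python sketch. It is one plausible reading of the idea, not the thesis's actual procedure: the function names (`shannon_entropy`, `split_entropy`, `golden_section_cut`), the size-weighted entropy objective, and the tolerance `tol` are all assumptions made for illustration. Golden-section search is only guaranteed to find the minimum of a unimodal objective, so the sketch returns the best cut point it actually evaluated.

```python
import math
from collections import Counter

GOLDEN = (math.sqrt(5) - 1) / 2  # inverse golden ratio, about 0.618


def shannon_entropy(labels):
    """Shannon's entropy (in bits) of the class labels inside one interval."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_entropy(values, labels, cut):
    """Size-weighted entropy of the two intervals produced by cutting at `cut`."""
    left = [y for x, y in zip(values, labels) if x <= cut]
    right = [y for x, y in zip(values, labels) if x > cut]
    n = len(labels)
    return (len(left) * shannon_entropy(left)
            + len(right) * shannon_entropy(right)) / n


def golden_section_cut(values, labels, tol=1e-3):
    """Search [min, max] of one feature for a low-entropy cut point.

    Assumption (hypothetical objective): a lower weighted entropy means a
    better division. Returns (best cut evaluated, its weighted entropy).
    """
    lo, hi = min(values), max(values)
    a = hi - GOLDEN * (hi - lo)  # lower interior probe
    b = lo + GOLDEN * (hi - lo)  # upper interior probe
    fa = split_entropy(values, labels, a)
    fb = split_entropy(values, labels, b)
    best_cut, best_h = (a, fa) if fa <= fb else (b, fb)
    while hi - lo > tol:
        if fa <= fb:
            # Keep [lo, b]; the old lower probe becomes the new upper probe.
            hi, b, fb = b, a, fa
            a = hi - GOLDEN * (hi - lo)
            fa = split_entropy(values, labels, a)
            if fa < best_h:
                best_cut, best_h = a, fa
        else:
            # Keep [a, hi]; the old upper probe becomes the new lower probe.
            lo, a, fa = a, b, fb
            b = lo + GOLDEN * (hi - lo)
            fb = split_entropy(values, labels, b)
            if fb < best_h:
                best_cut, best_h = b, fb
    return best_cut, best_h


# Toy data: class "a" clusters at low values, class "b" at high values.
values = [1.0, 1.5, 2.0, 2.5, 7.0, 7.5, 8.0, 8.5]
labels = ["a"] * 4 + ["b"] * 4
cut, h = golden_section_cut(values, labels)
print(f"cut={cut:.3f}, weighted entropy={h:.3f}")
```

Applying the same search recursively to each resulting interval would yield a multi-interval division; whether the thesis does this, and what stopping rule it uses, is not stated in this record.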
    The experiments show that not every data set reaches a high accuracy, because of problems in the data distribution and in the relative importance of the data. Although this result underlines that the independence assumption behind the naive Bayesian classifier must hold, the overall performance is good, and the classifier's ability to handle high-dimensional data and its high tolerance make it a usable classifier.
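To make explicit where the independence assumption enters, the decision rule of a naive Bayesian classifier over divided feature intervals can be written as follows. The notation is assumed for illustration and is not taken from the thesis.

```latex
% Assumed notation (not from the thesis):
%   c            a class label
%   x_j          value of feature j in the test case, j = 1, ..., d
%   I_j(x_j)     the interval of feature j that contains x_j
%   n_c          number of training samples of class c
%   n_{c,j}(I)   number of those samples whose feature j falls in interval I
\hat{c} = \arg\max_{c} \; P(c) \prod_{j=1}^{d} P\bigl(x_j \in I_j(x_j) \mid c\bigr),
\qquad
P\bigl(x_j \in I \mid c\bigr) \approx \frac{n_{c,j}(I)}{n_c}
```

The product form is justified only when the features are conditionally independent given the class, so strongly correlated features can distort the result. It also shows that a single interval with a zero count forces the whole product to zero, which is presumably why the thesis outline includes a section on expanding zero-probability intervals.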

    Chinese Abstract
    English Abstract
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
     1.1 Research Background and Motivation
     1.2 Research Objectives
     1.3 Research Scope and Limitations
     1.4 Definitions of Terms
    Chapter 2 Literature Review
     2.1 Bayesian Classifier
      2.1.1 Bayes' Theorem
      2.1.2 Classifiers Based on Bayes' Theorem
     2.2 Golden Section
      2.2.1 Principle of the Golden Section
      2.2.2 Internal and External Ratios of the Golden Section
      2.2.3 Golden Section Search Example
     2.3 Shannon's Entropy
      2.3.1 Definition of Shannon's Entropy
      2.3.2 Calculation Example of Shannon's Entropy
    Chapter 3 Design of a Bayesian Classifier with Automatic Feature-Dimension Interval Division
     3.1 Algorithm Flowchart
     3.2 Golden Section Based on Shannon's Entropy
     3.3 Recombining Intervals
      3.3.1 Skewed Expansion
      3.3.2 Midpoint Expansion
      3.3.3 Proportional Expansion
     3.4 Computing the Resulting Probability Values
    Chapter 4 Experimental Results
     4.1 Pima Indians Diabetes
     4.2 Wisconsin Breast Cancer
     4.3 BUPA liver disorders
     4.4 Wine recognition data
     4.5 Promoter
    Chapter 5 Discussion, Conclusions, and Future Work
     5.1 Discussion
      5.1.1 Dividing Feature-Dimension Intervals
      5.1.2 Expanding Zero-Probability Intervals
      5.1.3 Missing Feature Dimensions
      5.1.4 Removing Weakly Correlated Feature Dimensions
     5.2 Conclusions
     5.3 Future Work
    References

