| Field | Value |
|---|---|
| Graduate Student | 蔡明碩 (TSAI, Ming-Shuo) |
| Thesis Title | Adjustment Methods for Support Vector Machines with Imbalanced Data |
| Advisor | 張少同 (Chang, Shao-Tung) |
| Committee Members | 張少同 (Chang, Shao-Tung), 呂翠珊 (Lu, Tsui-Shan), 李孟峰 (Li, Meng-Feng) |
| Oral Defense Date | 2024/07/23 |
| Degree | Master |
| Department | Department of Mathematics |
| Year of Publication | 2024 |
| Academic Year | 112 |
| Language | Chinese |
| Pages | 30 |
| Keywords (Chinese, translated) | Support Vector Machine, Imbalanced Data Classification, Random Data Generation Method, Oversampling Technique, Binary Search |
| Keywords (English) | Support Vector Machine, Random Data Generation Method, SMOTE Method, Imbalanced Data Classification, Binary Search |
| Research Method | Experimental design |
| DOI URL | http://doi.org/10.6345/NTNU202401582 |
| Thesis Type | Academic thesis |
Across real datasets, data imbalance is a common phenomenon in machine learning and can markedly affect the results of model training. Among the many proposed solutions, the most widely used is the Synthetic Minority Over-sampling Technique (SMOTE), which addresses the imbalance while achieving highly accurate classification.
In this study, we generate random data under different parameter settings to balance the class proportions, and examine how Support Vector Machines (SVM) perform when classifying imbalanced data. Like SMOTE, this method generates data to bring the class proportions closer to balance; in our experiments, the two approaches achieve similar results.
In addition, we use a binary search algorithm to improve the results of the original SVM, raising the classification performance on the minority class. The binary-search SVM obtains better classification results without generating any data.
Finally, we compare these results with SMOTE. The experiments show that the binary-search SVM yields better classification of the minority class, and that the random data generation method, which balances the class proportions, can also improve classification results when the class proportions are close.
In various real datasets, data imbalance is a common phenomenon that can significantly impact the outcomes of model training in the field of machine learning. Among the various proposed solutions, one of the most commonly used methods is Synthetic Minority Over-sampling Technique (SMOTE), which addresses data imbalances while achieving highly accurate classification.
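The thesis code is not reproduced on this record page, but SMOTE's core step — interpolating between a minority sample and one of its k nearest minority neighbors — can be sketched as follows (an illustrative sketch only; the function name and parameter choices are ours, not the thesis's):

```python
import numpy as np

def smote_sample(X_min, k=3, n_new=4, rng=None):
    """Generate n_new synthetic minority samples: pick a minority point,
    pick one of its k nearest minority neighbors, and return a random
    point on the line segment between them (the SMOTE interpolation)."""
    rng = np.random.default_rng(rng)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        x = X_min[i]
        d = np.linalg.norm(X_min - x, axis=1)  # distances to all minority points
        d[i] = np.inf                          # exclude the point itself
        neighbors = np.argsort(d)[:k]          # indices of k nearest neighbors
        nb = X_min[rng.choice(neighbors)]
        gap = rng.random()                     # uniform in [0, 1)
        out.append(x + gap * (nb - x))         # point on the segment x -> nb
    return np.array(out)
```

Because each synthetic point lies on a segment between two existing minority points, the generated data never leaves the convex hull of the minority class.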
In this thesis, we explore the performance of Support Vector Machines (SVM) in classifying imbalanced datasets by generating random data under different parameter settings, thereby balancing the data distribution. Additionally, we utilize a binary search algorithm to fine-tune the results of the original SVM, enhancing classification performance for the minority class.
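The abstract does not specify how the random data are parameterized; one plausible sketch (our assumption, not the thesis's method) is to fit a Gaussian to the minority class and draw the missing samples from it:

```python
import numpy as np

def balance_by_random_generation(X_min, n_needed, rng=None):
    """Draw n_needed synthetic minority samples from a multivariate
    normal fitted to the minority class (sample mean and covariance).
    This is one possible parameterization of 'random data generation'."""
    rng = np.random.default_rng(rng)
    mu = X_min.mean(axis=0)                 # per-feature sample mean
    cov = np.cov(X_min, rowvar=False)       # sample covariance matrix
    return rng.multivariate_normal(mu, cov, size=n_needed)
```

Varying the distribution parameters (here the fitted mean and covariance) changes how tightly the synthetic points cluster around the observed minority data.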
Finally, we compare the results with the SMOTE method. Experimental results indicate that the random data generation method, which balances the data distribution, can improve classification outcomes when data proportions are similar. Moreover, it achieves comparable classification performance to SMOTE.
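The binary-search adjustment can be illustrated on precomputed SVM decision scores: since minority recall is non-increasing in the decision threshold, bisection finds the largest threshold that still meets a target recall. This is a minimal sketch under our own conventions (label 1 is the minority class and is assumed to score higher), not the thesis implementation:

```python
def tune_threshold(scores, labels, target_recall=1.0, iters=60):
    """Binary-search a decision threshold on SVM decision scores so that
    the minority class (label 1) reaches the target recall. Lowering the
    threshold labels more points as minority, so recall is monotone in it."""
    lo, hi = min(scores), max(scores)
    n_pos = sum(labels)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        tp = sum(1 for s, y in zip(scores, labels) if s >= mid and y == 1)
        if tp / n_pos >= target_recall:
            lo = mid   # target met: try a stricter (higher) threshold
        else:
            hi = mid   # too few minority points caught: loosen it
    return lo
```

Shifting the threshold this way trades majority-class precision for minority-class recall without generating any synthetic data, which matches the motivation for the binary-search SVM above.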
Moro, S., Cortez, P., & Rita, P. (2014). A data-driven approach to predict the success of bank telemarketing. Decision Support Systems, 62, 22-31.
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321-357.
Lin, A. (2019). Binary search algorithm. WikiJournal of Science, 2(1), 1-13.
Boyd, S., & Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press.
Vapnik, V. (2013). The Nature of Statistical Learning Theory. Springer Science & Business Media.
Bauschke, H. H., Dao, M. N., Noll, D., & Phan, H. M. (2015). On Slater's condition and finite convergence of the Douglas–Rachford algorithm. Journal of Global Optimization, 65(2), 329-349.
Van Horn, G., & Perona, P. (2017). The devil is in the tails: Fine-grained classification in the wild. arXiv preprint arXiv:1709.01450.
Choi, J. M. (2010). A selective sampling method for imbalanced data learning on support vector machines. Graduate thesis, Iowa State University.
Liu, N., Li, X., Qi, E., Xu, M., Li, L., & Gao, B. (2020). A novel ensemble learning paradigm for medical diagnosis with imbalanced data. IEEE Access, 8, 171263-171280.
Platt, J. (1998). Sequential minimal optimization: A fast algorithm for training support vector machines. Microsoft Research.
Leslie, C., Eskin, E., & Noble, W. S. (2002). The spectrum kernel: A string kernel for SVM protein classification. In Biocomputing 2002 (pp. 564-575). World Scientific.
Yu, H., & Kim, S. (2012). SVM tutorial: Classification, regression, and ranking. Springer, Berlin, 479-506.
Seref, O., Kundakcioglu, O. E., Prokopyev, O. A., & Pardalos, P. M. (2009). Selective support vector machines. Journal of Combinatorial Optimization, 17, 3-20.