
Graduate Student: Wen, Xin (温鑫)
Thesis Title: An Improved Classification Method of Green Coffee Beans Based on Image Concatenation and Deep Learning (基於圖像串接和深度學習的改良生咖啡豆分類方法)
Advisor: Su, Chung-Yen (蘇崇彥)
Oral Examination Committee: Su, Chung-Yen (蘇崇彥); Lai, I-Wei (賴以威); Perng, Jau-Woei (彭昭暐)
Oral Defense Date: 2024/06/12
Degree: Master
Department: Department of Electrical Engineering
Year of Publication: 2024
Graduation Academic Year: 112 (ROC calendar)
Language: Chinese
Number of Pages: 57
Keywords: Deep Learning, Image Recognition, Unsharp Masking, Edge Detection, MobileViT, MobileNetV3
DOI URL: http://doi.org/10.6345/NTNU202400897
Document Type: Academic thesis
Usage Statistics: Views: 144; Downloads: 0
  • To address the difficulty of classifying green coffee beans in image recognition and to improve precision, this thesis proposes fusing the outputs of different feature-extraction algorithms by concatenating different image enhancement techniques, thereby raising recognition accuracy for green coffee beans. To obtain a variety of key features from the original images, we selected nine common image enhancement methods: adaptive thresholding, bit-plane slicing, black-hat, Canny edge detection, grayscale, histogram equalization, Laplacian filtering, top-hat, and unsharp masking. We first pick, from these nine algorithms, the methods most correlated with the ground truth, and replace only the RGB planes of the original image with the outputs of those higher-correlation methods, using the multiple features to improve the model's discriminability. In this study we experimented with MobileViT, choosing the higher-correlation processing methods as the material for feature fusion; the dataset produced by image concatenation served as the new input for retraining. The classification method without any image enhancement was taken as the baseline. In the two-class case, the combination of bit-plane slicing, histogram equalization, and unsharp masking reached 96.9% accuracy, about 5.5% above the original method; on the same dataset with background removal, the same combination reached 97.0%. In the three-class case, the same combination reached 96.8% and 97.4%, improvements of 6.7% and 4.9% over the original method. Finally, we validated the results with MobileNetV3: in the two-class case, the same enhancement combination attained the highest accuracies of 99.12% and 99.21% on images without and with background removal, improvements of 0.39% and 0.44%; in the three-class case, the improvements were about 0.92% and 0.79%, with accuracies of 98.73% and 99.25%.

    To address the classification challenges and improve accuracy in recognizing green coffee beans through image classification, this paper proposes a method that enhances classification accuracy by concatenating different image enhancement techniques, merging features from various extraction algorithms. To extract crucial features from the original images, we chose nine common image enhancement methods: adaptive thresholding, bit-plane slicing, black-hat, Canny edge detection, grayscale, histogram equalization, Laplacian filtering, top-hat, and unsharp masking. From these nine algorithms, we selected the methods with higher correlation to the ground truth and replaced the RGB planes of the original image with the outputs of those methods, enhancing the model's recognition capability with multiple features. In this study, we conducted experiments using MobileViT, selecting the higher-correlation processing methods as the material for feature fusion; the image dataset generated through image concatenation served as the new input for retraining. We took the method without any preprocessing as the baseline. In the two-class case, the combination of bit-plane slicing, histogram equalization, and unsharp masking achieved an accuracy of 96.9%, an improvement of approximately 5.5% over the original method; on the same dataset with background removal, the same combination achieved 97.0%. In the three-class case, the same combination achieved accuracies of 96.8% and 97.4%, respectively, improvements of 6.7% and 4.9% over the original method. Finally, we validated the results using MobileNetV3. In the two-class case, the same combination of enhancement algorithms achieved the highest accuracies of 99.12% and 99.21% on images without and with background removal, respectively, improvements of 0.39% and 0.44% over the original method. In the three-class case, it achieved accuracies of 98.73% and 99.25%, improvements of approximately 0.92% and 0.79%.
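The selection-and-concatenation step described above can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the helper names (`pearson`, `concat_top3`), the toy images, and the use of a single reference image as the ground truth are assumptions for illustration. The idea is to rank the enhancement outputs by Pearson correlation with the reference and keep the three most correlated ones as the new three-channel input in place of the RGB planes.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient between two images, flattened."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom else 0.0

def concat_top3(reference, enhanced):
    """Rank enhancement outputs by correlation with the reference and
    stack the top three as the new 3-channel (RGB-replacing) input."""
    ranked = sorted(enhanced,
                    key=lambda name: pearson(reference, enhanced[name]),
                    reverse=True)
    top3 = ranked[:3]
    return np.stack([enhanced[name] for name in top3], axis=-1), top3
```

In the thesis, `enhanced` would hold the nine enhancement outputs for one bean image; the stacked array then replaces that image in the training set.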

    Chapter 1 Introduction
      1.1 Research Background and Motivation
      1.2 Research Objectives
      1.3 Research Methods
    Chapter 2 Literature Review
      2.1 Image Recognition
        2.1.1 CNN
        2.1.2 Transformer
      2.2 Coffee Bean Defect Classification
        2.2.1 Types of Coffee Bean Defects
        2.2.2 Coffee Bean Classification Models
      2.3 Voting Strategy and Image Concatenation
      2.4 Image Classification Models
        2.4.1 MobileViT
        2.4.2 MobileNetV3
    Chapter 3 Related Image Enhancement Methods and Selection Strategy
      3.1 Pearson Product-Moment Correlation Coefficient
      3.2 Selection Strategy and Image Concatenation
        3.2.1 Voting Strategy
        3.2.2 Image Concatenation
      3.3 Grayscale
      3.4 Adaptive Thresholding
      3.5 Histogram Equalization
      3.6 Canny Edge Detection
      3.7 Laplacian Filtering
      3.8 Top-Hat Edge Detection
      3.9 Black-Hat Edge Detection
      3.10 Bit-Plane Slicing
      3.11 Unsharp Masking
        3.11.1 Parameter Design
    Chapter 4 Experimental Results and Analysis
      4.1 Hardware and Environment Configuration
      4.2 Coffee Bean Dataset
      4.3 Experimental Results
        4.3.1 Results on Images Without Background Removal
        4.3.2 Results on Images With Background Removal
        4.3.3 Effect of the Three-Class Split on Accuracy
        4.3.4 Analysis
      4.4 Validation
        4.4.1 Validation Results
    Chapter 5 Conclusion
    References
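Of the enhancement methods in the outline above, unsharp masking (Section 3.11) adds the high-frequency residual `original - blurred` back to the image to sharpen edges. The following is a minimal NumPy sketch, not the thesis code: the box blur stands in for whatever smoothing kernel the thesis uses, and `amount` loosely corresponds to the strength parameter tuned in Section 3.11.1.

```python
import numpy as np

def box_blur(img, k=3):
    """k x k box blur with edge padding (a stand-in for a Gaussian blur)."""
    img = np.asarray(img, dtype=np.float64)
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def unsharp_mask(img, amount=1.0):
    """Unsharp masking: sharpened = original + amount * (original - blurred)."""
    img = np.asarray(img, dtype=np.float64)
    sharpened = img + amount * (img - box_blur(img))
    return np.clip(sharpened, 0, 255).astype(np.uint8)
```

Applied to a step edge, the residual overshoots on the bright side and undershoots on the dark side, which is what increases perceived edge contrast.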

