Graduate student: | 陳建豪 Chen, Jian-Hao |
---|---|
Thesis title: | 使用人工智慧晶片實作之自動樂譜辨識與打擊樂演奏系統 Robotic Percussion System Incorporating an Automatic Sheet Music Recognition System Using Artificial Intelligence Chip |
Advisor: | 王偉彥 Wang, Wei-Yen |
Oral defense committee: | 翁慶昌 Wong, Ching-chang; 盧明智 Lu, Ming-Chih; 呂成凱 Lu, Cheng-Kai; 許陳鑑 Hsu, Chen-Chien; 王偉彥 Wang, Wei-Yen |
Oral defense date: | 2022/08/17 |
Degree: | Master |
Department: | Department of Electrical Engineering |
Publication year: | 2022 |
Academic year of graduation: | 110 (ROC calendar, 2021-2022) |
Language: | Chinese |
Number of pages: | 55 |
Keywords (Chinese): | 樂譜辨識、深度學習、Delta 機械手臂、人工智慧晶片 |
Keywords (English): | music score recognition, deep learning, delta robot, artificial intelligence chip |
Research method: | Experimental design |
DOI URL: | http://doi.org/10.6345/NTNU202201582 |
Thesis type: | Academic thesis |
Neural-network research on high-resolution optical image recognition has matured in recent years. Large Convolutional Neural Network (CNN) architectures, however, carry a high computational cost, so reducing that cost while keeping accuracy acceptable remains a worthwhile research direction. This thesis therefore replaces a large object-detection CNN with an artificial intelligence chip specialized for computer vision tasks, which detects the positions of the notes, and uses a purpose-designed lightweight CNN to recognize the pitch of each detected note. Dividing the complex task between two lightweight CNNs yields an optical music score recognition (OMR) system. The thesis also presents a control program that integrates the OMR system with a Delta robot arm. A camera captures printed sheet music, the OMR system detects and recognizes the notes, and the control program obtains the results over a Universal Asynchronous Receiver/Transmitter (UART) link. Once the playing order has been determined from the recognition results, the program drives the Delta robot arm to play a metallophone (鐵琴) automatically. Finally, the OMR system is tested on printed sheet music to verify its recognition accuracy.
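The abstract does not say how the playing order is derived from the recognition results. As a rough illustration only, the sketch below assumes each recognized note arrives as a bounding-box centre plus a pitch label, and that notes are played row by row from top to bottom and left to right within a row. The `Note` type, the `playing_order` function, and the `row_gap` threshold are hypothetical names introduced here, not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    """One recognized note (hypothetical structure, not from the thesis)."""
    x: float    # horizontal centre of the detected note box, in pixels
    y: float    # vertical centre of the detected note box, in pixels
    pitch: str  # label from the pitch classifier, e.g. "C4"


def playing_order(notes, row_gap=80.0):
    """Return pitch labels ordered by staff row, then left to right.

    row_gap is an assumed pixel threshold: a note whose y exceeds the
    previous note's y by more than row_gap starts a new staff row.
    """
    rows = []
    # Walk the notes from top to bottom and chain nearby ones into one row.
    for note in sorted(notes, key=lambda n: n.y):
        if rows and note.y - rows[-1][-1].y <= row_gap:
            rows[-1].append(note)
        else:
            rows.append([note])
    # Within each row, play from left to right.
    return [n.pitch for row in rows for n in sorted(row, key=lambda n: n.x)]


detections = [
    Note(300, 110, "E4"),
    Note(100, 100, "C4"),
    Note(200, 95, "D4"),
    Note(150, 300, "G4"),
    Note(50, 310, "F4"),
]
order = playing_order(detections)
```

A fixed pixel threshold is the simplest possible row test; a real system would more likely derive rows from the detected staff lines, which this sketch does not model.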