Graduate student: 黃朝慶 (HUANG, Chao-Ching)
Thesis title: Automatic Music Score Recognition and Robotic Percussion System (自動樂譜辨識與打擊樂機器人系統)
Advisors: 王偉彥 (Wang, Wei-Yen); 蔣欣翰 (Chiang, Hsin-Han)
Degree: Master
Department: Department of Electrical Engineering
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar, 2019-2020)
Language: Chinese
Number of pages: 93
Keywords: music score recognition, delta robot, deep learning, digital image processing
DOI URL: http://doi.org/10.6345/NTNU202001197
Thesis type: Academic thesis
  • Optical music recognition (OMR) is the recognition of musical notation in images of music scores, where notes record pitch and duration. After extensive research and experimentation, recognition of high-resolution score images has reached a mature state. Camera-captured scores, however, are affected by ambient illumination, viewing angle (perspective distortion), and blur, so further research is still needed. This work is our first attempt to apply deep learning architectures to a camera-based score recognition system.

    The system works in four stages. First, a Hough line detection algorithm automatically detects the score in the live camera image and locates the staff areas, giving the boundary of each staff row. Only the note information inside the staves is of interest, and restricting recognition to these areas excludes lyrics and drawings and thus reduces recognition errors. Second, each staff row is fed separately into a YOLOv3 detector, which is based on Darknet-53, to detect, segment, and classify the notes into six categories: whole notes, half notes, quarter notes, eighth notes, half rests, and quarter rests. Third, the detected notes are sorted by their position in the score and passed to a convolutional neural network (CNN) that recognizes pitch; eleven pitch classes, from C3 to F4, are currently supported. Finally, the pitch and duration of each note are displayed on the interface, and the system communicates with a Delta robot through one of its serial ports (RS232) so that the robotic arm plays the instrument. The only input the system needs is a webcam picture of the score.
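The steps that follow detection in the abstract (finding the range of each staff row, ordering the detected notes, and pairing each with a pitch and a duration) can be illustrated with a minimal sketch. This is not the thesis implementation: the class labels, the `Detection` record, the beat values (counted in quarter-note beats), the reading of "C3 to F4" as the eleven natural pitches, and the assumption that line detection returns exactly five clean lines per staff are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed duration of each of the six classes, in quarter-note beats.
# The label strings are hypothetical, not the thesis's class names.
BEATS = {
    "whole": 4.0,
    "half": 2.0,
    "quarter": 1.0,
    "eighth": 0.5,
    "half_rest": 2.0,
    "quarter_rest": 1.0,
}

# Assumed reading of "eleven classes from C3 to F4": the natural pitches only.
PITCHES = ["C3", "D3", "E3", "F3", "G3", "A3", "B3", "C4", "D4", "E4", "F4"]


@dataclass
class Detection:
    staff: int                      # staff row index, counted top to bottom
    x: float                        # horizontal centre of the bounding box
    kind: str                       # one of the keys of BEATS
    pitch_id: Optional[int] = None  # index into PITCHES; None for rests


def group_staves(line_ys: List[float], lines_per_staff: int = 5) -> List[Tuple[float, float]]:
    """Turn y-coordinates of detected horizontal lines into (top, bottom) staff ranges.

    Simplification: assumes every staff contributes exactly `lines_per_staff`
    lines and that no spurious or duplicate lines are present.
    """
    ys = sorted(line_ys)
    if len(ys) % lines_per_staff != 0:
        raise ValueError("line count is not a multiple of lines_per_staff")
    return [
        (ys[i], ys[i + lines_per_staff - 1])
        for i in range(0, len(ys), lines_per_staff)
    ]


def to_sequence(detections: List[Detection]) -> List[Tuple[str, float]]:
    """Order detections in reading order and return (pitch or 'rest', beats) pairs."""
    ordered = sorted(detections, key=lambda d: (d.staff, d.x))
    sequence = []
    for d in ordered:
        name = "rest" if d.kind.endswith("_rest") else PITCHES[d.pitch_id]
        sequence.append((name, BEATS[d.kind]))
    return sequence


if __name__ == "__main__":
    staves = group_staves([10, 20, 30, 40, 50, 110, 120, 130, 140, 150])
    print(staves)
    notes = [
        Detection(0, 120.0, "quarter", 0),
        Detection(0, 40.0, "half", 4),
        Detection(1, 10.0, "quarter_rest"),
        Detection(0, 200.0, "eighth", 10),
    ]
    print(to_sequence(notes))
```

Sorting on the pair `(staff, x)` reproduces reading order (staff by staff, left to right), so the resulting list could be handed to whatever component drives the robot over the serial link.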

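    Table of contents
    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Research Motivation and Background
      1.2 Literature Review
        1.2.1 Instrument-Playing Robots
        1.2.2 Optical Music Recognition Techniques
        1.2.3 Convolutional Neural Networks
        1.2.4 Automated Robotic Arm Control Based on Learning from Human Demonstration
      1.3 Thesis Organization
    Chapter 2 Software/Hardware Architecture and Design
      2.1 System Architecture
      2.2 Experimental Platform
      2.3 Hardware Implementation
        2.3.1 Computing Core
        2.3.2 Sensors
        2.3.3 Robotic Arm and End Effector
        2.3.4 Robotic Arm Mechanism Design
        2.3.5 Communication Equipment
      2.4 Software
    Chapter 3 Music Score Recognition
      3.1 Perspective Transformation
      3.2 Staff Detection and Segmentation
      3.3 Note Detection and Duration Recognition
      3.4 Pitch Recognition
      3.5 Building the Training Dataset
    Chapter 4 Robotic Arm Imitation System Based on Human Performance
      4.1 Arm Mechanism Design
      4.2 Control and Connection Methods
        4.2.1 Power Connection
        4.2.2 Communication Connection
        4.2.3 Robotic Arm Brake System
        4.2.4 Robotic Arm Speed Control
      4.3 Robotic Arm Kinematics
        4.3.1 Kinematic Model
        4.3.2 Delta Robot Parameter Settings
        4.3.3 Inverse Kinematics
      4.4 Sound Recognition System
        4.4.1 System Architecture
        4.4.2 FFT-Based Fast Pitch Matching
      4.5 Hybrid Mode
        4.5.1 Prelude-Triggered Mode
        4.5.2 Cooperative Mode
    Chapter 5 Experimental Results and Discussion
      5.1 Note Detection and Recognition Results with YOLO v2 and YOLO v3
      5.2 CNN-Based Pitch Recognition Results
      5.3 Automated Robotic Arm Performance Based on Score Recognition
      5.4 FFT-Based Fast Pitch Matching Applied to Imitating Human Performance
      5.5 Hybrid Mode Experiments
        5.5.1 Prelude-Triggered Mode
        5.5.2 Ensemble Mode
    Chapter 6 Conclusions and Future Work
      6.1 Conclusions
      6.2 Future Work
    References
    Autobiography
    Academic Achievements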


    Electronic full text: public release delayed until 2025/08/01