
Author: Lu, Sheng-Kan (盧聖侃)
Thesis Title: 應用深度學習演算法之海報文字區域檢測實驗 (An experiment with application of deep learning algorithm to detect texts area for poster)
Advisor: Chang, Yen-Jung (張晏榕)
Committee Members: Chou, Tzren-Ru (周遵儒); Lin, Ling-Yuan (林玲遠); Chang, Yen-Jung (張晏榕)
Oral Defense Date: 2022/06/14
Degree: Master
Department: Department of Graphic Arts and Communications (圖文傳播學系)
Year of Publication: 2022
Academic Year of Graduation: 110
Language: Chinese
Number of Pages: 41
Keywords (Chinese): 海報版面 (poster layout), 深度學習 (deep learning)
Keywords (English): Mask R-CNN, Yolov4, Poster Layout, Deep Learning
Research Method: Experimental design
DOI URL: http://doi.org/10.6345/NTNU202201588
Thesis Type: Academic thesis
    Abstract (Chinese): In recent years, the widespread adoption of digital technology has driven the growth of the Internet. As Internet technology evolves and social media and other applications proliferate, digital images have become a primary source of information in society. In today's information-saturated environment, the poster remains one of the most common media for conveying information and a ubiquitous form of visual expression in everyday life. A method that can detect the text areas in a poster would not only extract those areas as input for subsequent analysis but also make posters easier for users to retrieve online. With the rise of deep learning, more and more researchers apply it to image analysis and object detection. Among these methods, Mask R-CNN and Yolov4 represent two-stage and one-stage object detection respectively, and both have produced strong results in areas such as defect detection, face detection, and traffic condition monitoring. However, most of this work detects objects in natural scenes, and such methods are rarely applied to graphic design. To extract the text areas of poster images, this study therefore trains both Mask R-CNN and Yolov4 to detect text in posters. The experimental results show that Mask R-CNN reaches an mAP50 of 79.0% for text-area detection, while Yolov4 reaches 85.1%. This indicates that both object detection methods can locate text areas within a poster layout and can supply data for future text recognition. Comparing the outputs of the two algorithms further shows that Yolov4 detects text areas more accurately and is less affected by design factors such as color, text size, and text spacing.
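    To make the mAP50 figures concrete: under this evaluation scheme a predicted text box counts as correct only when its intersection-over-union (IoU) with a labelled box is at least 0.5. The following Python snippet is a minimal illustrative sketch of that criterion, not the thesis code, and the box coordinates are hypothetical.

        def iou(box_a, box_b):
            """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
            ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
            ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
            inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
            area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
            area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
            union = area_a + area_b - inter
            return inter / union if union else 0.0

        pred = (40, 60, 420, 140)    # hypothetical detector output for a text block
        truth = (50, 65, 430, 150)   # hypothetical hand-labelled text block
        print(iou(pred, truth) >= 0.5)  # True -> counted as a correct detection at the mAP50 threshold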

    Abstract (English): In poster design, designers often simplify and stylize information to capture the audience's attention quickly; the text in a poster must be brief and clear at a glance. A detection method that identifies the text areas in a poster would not only extract those areas as information for subsequent analysis but also make posters easier for users to retrieve on the Internet. With the progress of deep learning and improvements in computer hardware, many researchers now use deep learning for image analysis and object detection. Among the many object detection methods, Mask R-CNN and Yolov4 represent the two-stage and one-stage approaches respectively, and both offer strong accuracy and computational efficiency. These methods have been applied to problems such as object defect detection, face detection, and traffic condition detection. However, most of this work detects objects in natural scenes and is rarely applied to graphic design. To understand the ability of deep learning in poster layout analysis, this study trains two detection methods, Mask R-CNN and Yolov4, to detect text in poster images. The experimental results show that Mask R-CNN reaches an mAP50 of 79.0%, while Yolov4 reaches an mAP50 as high as 85.1%. This means that both object detection methods can locate the text areas in a poster layout and can provide data for future text recognition.
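    As an illustration of how a detector trained this way could be applied to a new poster, the sketch below uses OpenCV's DNN module to load a Darknet-format Yolov4 model and crop every detected text region for a later recognition step. The .cfg/.weights file names, the 416x416 input size, and the thresholds are assumptions for illustration, not the configuration reported in the thesis.

        import cv2

        # Load a (hypothetical) trained Yolov4 text-region detector in Darknet format.
        net = cv2.dnn.readNetFromDarknet("yolov4-poster.cfg", "yolov4-poster.weights")
        model = cv2.dnn_DetectionModel(net)
        model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

        poster = cv2.imread("poster.jpg")
        class_ids, scores, boxes = model.detect(poster, confThreshold=0.5, nmsThreshold=0.4)

        # Crop each detected text region so it can feed a future text-recognition step.
        for i, (x, y, w, h) in enumerate(boxes):
            cv2.imwrite(f"text_region_{i}.png", poster[y:y + h, x:x + w])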

    Table of Contents:
    Abstract (Chinese); Abstract (English); Table of Contents
    Chapter 1: Introduction
        1.1 Research Background and Motivation
        1.2 Research Purpose and Questions
        1.3 Research Scope and Limitations
    Chapter 2: Literature Review
        2.1 Poster Image Layout (poster layout design; poster design elements; poster image text)
        2.2 Deep Learning Algorithm: Mask R-CNN (semantic and instance segmentation; Mask R-CNN network architecture)
        2.3 Deep Learning Algorithm: Yolov4 (one-stage object detection; Yolov4 network architecture)
    Chapter 3: Research Methods
        3.1 Data Sources and Description
        3.2 Image Preprocessing (labelme; labelImg)
        3.3 Model Training (Mask R-CNN network structure; Yolov4 network structure)
        3.4 Evaluation Methods (confusion matrix; derived metrics)
    Chapter 4: Results
        4.1 Experimental Results and Data
        4.2 Experimental Analysis and Discussion
    Chapter 5: Conclusions and Recommendations
        5.1 Research Conclusions
        5.2 Research Recommendations
    References
    Acknowledgements
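    The image-preprocessing section lists labelme and labelImg as the annotation tools. As a minimal sketch of the kind of conversion this step implies, the snippet below turns a labelme JSON polygon annotation into the normalized "class x_center y_center width height" lines that Yolov4 training expects; the file name is hypothetical, and the thesis may have used labelImg to produce this format directly.

        import json

        def labelme_to_yolo(json_path, class_id=0):
            """Convert labelme polygon annotations to YOLO-format label lines."""
            with open(json_path, encoding="utf-8") as f:
                ann = json.load(f)
            img_w, img_h = ann["imageWidth"], ann["imageHeight"]
            lines = []
            for shape in ann["shapes"]:
                xs = [p[0] for p in shape["points"]]
                ys = [p[1] for p in shape["points"]]
                x_min, x_max = min(xs), max(xs)
                y_min, y_max = min(ys), max(ys)
                # Normalize the enclosing box to the image size, as YOLO expects.
                x_c = (x_min + x_max) / 2 / img_w
                y_c = (y_min + y_max) / 2 / img_h
                w = (x_max - x_min) / img_w
                h = (y_max - y_min) / img_h
                lines.append(f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")
            return lines

        # Example (hypothetical file): one line per annotated text region.
        print(labelme_to_yolo("poster_001.json"))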

