
Graduate Student: Cheng, Hsiang-Sheng (鄭翔升)
Thesis Title: Neural network model deployment tools for SoC based on RISC-V cores
Advisor: Hwang, Wen-Jyi (黃文吉)
Committee Members: Tung, Yi-Chih (董一志); Yeh, Tso-Zen (葉佐任); Hwang, Wen-Jyi (黃文吉)
Oral Defense Date: 2024/01/15
Degree: Master's
Department: Department of Computer Science and Information Engineering
Year of Publication: 2024
Academic Year of Graduation: 112 (2023-2024)
Language: Chinese
Number of Pages: 51
English Keywords: RISC-V, TinyML, Model deployment
DOI URL: http://doi.org/10.6345/NTNU202400185
Document Type: Academic thesis
  • This thesis implements a model deployment tool for RISC-V SoCs that integrates model construction, model quantization, and model deployment into a single software package. A custom Intermediate Representation is used to automatically convert a neural network into C code executable on the SoC, with the primary goal of simplifying the model deployment workflow during the development phase of a TinyML system. The tool's deployment results, including the neural network inference dataflow and its performance, are verified on a Genesys2 FPGA implementing an SoC built around a Rocket Core and the Gemmini AI accelerator.
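The abstract describes a flow in which a custom Intermediate Representation of a trained network is lowered to C code for on-SoC inference. As a rough illustration of that idea only, the sketch below turns a toy layer-list IR into a C source string; the IR schema and the names `emit_c`, `dense_layer`, and `relu_layer` are hypothetical and do not reflect the thesis's actual format or runtime API.

```python
# Toy IR-to-C generator, loosely modeled on the deployment flow described
# in the abstract. All names and the IR schema are illustrative assumptions.

def emit_c(ir):
    """Turn a toy layer-list IR into a C source string."""
    lines = ["#include <stdint.h>", ""]
    calls = []
    for i, layer in enumerate(ir):
        if layer["op"] == "dense":
            w = layer["weights"]  # flattened int8 weights
            vals = ", ".join(str(v) for v in w)
            lines.append(f"static const int8_t w{i}[{len(w)}] = {{{vals}}};")
            calls.append(f"    dense_layer(buf, w{i}, {layer['in']}, {layer['out']});")
        elif layer["op"] == "relu":
            calls.append(f"    relu_layer(buf, {layer['size']});")
    lines.append("")
    lines.append("void model_infer(int8_t *buf) {")
    lines.extend(calls)
    lines.append("}")
    return "\n".join(lines)

# A two-layer toy network: one dense layer followed by ReLU.
ir = [
    {"op": "dense", "weights": [1, -2, 3, 4], "in": 2, "out": 2},
    {"op": "relu", "size": 2},
]
print(emit_c(ir))
```

In a real flow of this kind, the emitted calls would target the accelerator's runtime (here, Gemmini's matmul/conv routines) rather than generic CPU kernels.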

    Acknowledgements i
    Abstract ii
    Table of Contents iii
    List of Figures v
    List of Tables vii
    Chapter 1: Introduction 1
      1-1 Research Background 1
      1-2 Research Objectives 3
      1-3 Research Challenges 3
      1-4 Research Contributions 4
    Chapter 2: Background Theory 5
      2-1 Chipyard Framework 5
      2-2 Rocket Core 6
      2-3 AI Accelerator Gemmini 7
        2-3-1 Gemmini Architecture 7
        2-3-2 Using Gemmini 8
      2-4 2-D Convolution 9
        2-4-1 2-D Convolution Basics 9
        2-4-2 Accelerating 2-D Convolution with Gemmini 9
      2-5 Depthwise Separable 2-D Convolution 12
        2-5-1 Depthwise Separable 2-D Convolution Basics 12
        2-5-2 Accelerating Depthwise Separable 2-D Convolution with Gemmini 12
      2-6 Neural Network Quantization 15
        2-6-1 Quantization Basics 15
        2-6-2 BRECQ Quantization Framework 16
    Chapter 3: Research Methods 17
      3-1 Intermediate Representation 19
      3-2 Model Generation Function in System 22
      3-3 Model Quantization in System 24
      3-4 Hardware Inference for AI Model 25
        3-4-1 Gemmini Fully Connected Layer 26
        3-4-2 Gemmini Convolution-2D Layer 28
        3-4-3 Gemmini Depth-wise Convolution-2D Layer 31
    Chapter 4: Experimental Results and Performance Analysis 33
      4-1 Experimental Environment 33
      4-2 Deploying Models via the GUI 35
        4-2-1 System Graphical User Interface 35
        4-2-2 Integration with an Automated Optical Inspection System 36
        4-2-3 Experimental Model Architectures 37
        4-2-4 Experimental SoC Architecture 40
      4-3 Evaluation Methods 42
      4-4 Performance Analysis of the Model Deployment Tool 43
        4-4-1 Inference Performance of Model 1 43
        4-4-2 Inference Performance of Model 2 45
        4-4-3 Inference Performance of Model 3 47
    Chapter 5: Conclusion 49
    References 50

