
Author: Tsai, Chia-Yu (蔡佳諭)
Thesis title: Implementation of 1-D Convolution in Systolic Array based on RISC-V Architecture (基於RISC-V架構之脈動陣列一維卷積運算研究)
Advisor: Hwang, Wen-Jyi (黃文吉)
Oral defense committee: Yeh, Tso-Zen (葉佐任); Pao, Hsing-Kuo (鮑興國); Hwang, Wen-Jyi (黃文吉)
Oral defense date: 2022/07/27
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of publication: 2022
Academic year of graduation: 110
Language: Chinese
Number of pages: 69
Keywords (translated from Chinese): deep learning accelerator, 1-D convolution
Keywords (English): Gemmini, RISC-V, Systolic Array
Research method: experimental design
DOI URL: http://doi.org/10.6345/NTNU202201366
Thesis type: academic thesis
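  • Because of their product positioning, most existing edge devices lack the computing power needed to run AI model applications. Pairing such a device with a hardware AI accelerator, so that it has enough capacity to run AI models, is one solution to this problem.
    This thesis studies Gemmini, a hardware AI accelerator platform based on the RISC-V architecture. Building on the RISC-V custom instructions, it designs a 1-D convolution program that performs its computation on the accelerator, so that the platform can be applied more broadly in neural networks.
    The program is run on an FPGA that includes the Gemmini platform, with clock cycles as the measure of computation speed. Two comparisons are made: model computation with and without the accelerator, and 1-D convolution executed directly on Gemmini versus executed on Gemmini after rearranging the data. Together, these comparisons verify Gemmini's acceleration and the feasibility of using it directly to compute a 1-D CNN.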
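    The two ways of computing a 1-D convolution compared in the abstract can be illustrated with a minimal sketch. It assumes the data rearrangement is an im2col-style unfolding followed by a general matrix multiplication, which the abstract itself does not confirm. The sketch is plain Python rather than Gemmini code, and every function name in it (`conv1d_direct`, `im2col_1d`, `matmul`, `conv1d_gemm`) is hypothetical; it only shows that both formulations produce the same output.

```python
# Illustrative only: plain Python, not Gemmini's API. Stride 1, no padding.
# Assumption: "rearranging the data" means an im2col-style unfolding.

def conv1d_direct(x, w):
    """x: [c_in][length], w: [c_out][c_in][k] -> y: [c_out][l_out]."""
    c_in, length = len(x), len(x[0])
    k = len(w[0][0])
    l_out = length - k + 1
    return [[sum(x[c][i + j] * w[o][c][j]
                 for c in range(c_in) for j in range(k))
             for i in range(l_out)]
            for o in range(len(w))]

def im2col_1d(x, k):
    """Unfold x into a matrix [l_out][c_in * k], one row per sliding window."""
    l_out = len(x[0]) - k + 1
    return [[x[c][i + j] for c in range(len(x)) for j in range(k)]
            for i in range(l_out)]

def matmul(a, b):
    """Plain matrix multiplication: a is [m][n], b is [n][p]."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def conv1d_gemm(x, w):
    """Same convolution expressed as im2col followed by one matrix product."""
    k = len(w[0][0])
    cols = im2col_1d(x, k)                          # [l_out][c_in * k]
    # Flatten the weights in the same (channel, tap) order used by im2col_1d.
    w_mat = [[w[o][c][j] for o in range(len(w))]    # [c_in * k][c_out]
             for c in range(len(x)) for j in range(k)]
    y_t = matmul(cols, w_mat)                       # [l_out][c_out]
    return [list(col) for col in zip(*y_t)]         # [c_out][l_out]

x = [[1, 2, 3, 4, 5], [0, 1, 0, 1, 0]]              # 2 input channels
w = [[[1, 0, -1], [1, 1, 1]], [[2, 0, 0], [0, 0, 1]]]  # 2 filters, kernel size 3
assert conv1d_direct(x, w) == conv1d_gemm(x, w)
```

    In this sketch the unfolded matrix repeats each input sample up to `k` times, so the rearranged form trades extra memory and data movement for a single regular matrix product. That cost is one reason a comparison against the direct form is of interest.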

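    Acknowledgments i
    Abstract ii
    Table of Contents iii
    List of Figures v
    List of Tables vii
    Chapter 1 Background and Objectives 1
    1-1 Research Background 1
    1-2 Research Difficulties 3
    1-3 Research Objectives 4
    1-4 Research Contributions 5
    Chapter 2 Fundamentals 6
    2-1 Introduction to Chipyard 7
    2-2 Rocket Chip 9
    2-3 Gemmini 10
    2-4 Weight-Stationary Computation Flow of the Systolic Array 12
    2-5 Principles of 1-D Convolution 18
    2-6 General Matrix Multiplication 19
    Chapter 3 Implementation 21
    3-1 Gemmini Computation Process 21
    3-2 Combining the Systolic Array with 1-D Convolution 23
    3-2-1 1-D Convolution Formula 24
    3-2-2 1-D Convolution Program 27
    3-2-3 Systolic Array Computation on Different Types of Data 29
    3-3 Comparison of 1-D Convolution Methods 50
    Chapter 4 Experimental Results and Performance Analysis 51
    4-1 Experimental Environment 51
    4-2 Experimental Design 53
    4-2-1 Recognition Targets 54
    4-2-2 AI Recognition Model 55
    4-2-3 Experiment Run Time 60
    4-3 Experimental Results and Performance Analysis 61
    Chapter 5 Conclusion 65
    References 66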

