Graduate student: | 余松恬 Yu, Song-Tien |
---|---|
Thesis title: | 通用型脈動陣列 AI 加速器:評估適用性與效能研究 (A Study on the Applicability and Performance Evaluation of a General-Purpose Systolic Array AI Accelerator) |
Advisor: | 黃文吉 Hwang, Wen-Jyi |
Oral defense committee: | 葉佐任 Yeh, Tso-Zen; 董一志 Tung, Yi-Chih; 黃文吉 Hwang, Wen-Jyi |
Oral defense date: | 2023/07/17 |
Degree: | 碩士 Master |
Department: | 資訊工程學系 Department of Computer Science and Information Engineering |
Year of publication: | 2023 |
Academic year of graduation: | 111 |
Language: | Chinese |
Number of pages: | 51 |
Chinese keywords: | 脈動陣列硬體加速器 (systolic array hardware accelerator), 邊緣運算 (edge computing), 神經網路模型 (neural network models) |
English keywords: | Gemmini, RISC-V |
DOI URL: | http://doi.org/10.6345/NTNU202301110 |
Thesis type: | Academic thesis |
Usage: | Views: 199, Downloads: 30 |
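This thesis evaluates the applicability and performance of a general-purpose systolic array AI hardware accelerator across different types of neural network models. As deep learning is widely deployed in edge computing, hardware accelerator design has become key to improving edge computing efficiency. Configuring a dedicated hardware accelerator for every kind of neural network is impractical, however: if the accelerator configuration has to change frequently as the model architecture changes, the cost burden is high.
This thesis proposes a configuration for a general-purpose AI systolic array hardware accelerator. The goal is to address the hardware adaptation problem in neural network applications, so that a single hardware accelerator can serve multiple neural network architectures. The thesis also builds an SoC platform that integrates a RISC-V core with the general-purpose AI hardware accelerator and implements it on an FPGA board, which provides an evaluation platform under realistic conditions.
Gemmini is chosen as the representative general-purpose systolic array AI hardware accelerator. Under different hardware configurations, experiments are run on two representative neural network models: an image-based component recognition model built on a two-dimensional convolutional neural network, and a gesture recognition model built on one-dimensional convolution. Combining the performance evaluation with FPGA hardware resource usage, the study proposes suitable hardware configuration choices for general-purpose systolic array accelerators as a reference for researchers in the AI field.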
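As an illustration of why a matrix-multiplication accelerator can serve a one-dimensional convolution model, the following minimal sketch rewrites a 1-D convolution as a single matrix product, which is the kind of operation a systolic array executes. It is not the thesis's implementation and does not use Gemmini's software API; the function names (`im2col_1d`, `matmul`, `conv1d_as_matmul`) are hypothetical, and the sketch assumes a single input channel, stride 1, and no padding.

```python
def im2col_1d(signal, kernel_size):
    """Unfold a 1-D signal into rows of sliding windows (stride 1, no padding)."""
    n_windows = len(signal) - kernel_size + 1
    return [signal[i:i + kernel_size] for i in range(n_windows)]


def matmul(a, b):
    """Plain matrix multiply; stands in for the work offloaded to a systolic array."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]


def conv1d_as_matmul(signal, kernels):
    """Single-channel 1-D convolution (cross-correlation form) for several kernels.

    Assumption for illustration: all kernels have the same length.
    Returns a matrix of shape (n_windows, n_kernels).
    """
    kernel_size = len(kernels[0])
    windows = im2col_1d(signal, kernel_size)  # (n_windows, kernel_size)
    # One kernel per column: (kernel_size, n_kernels)
    weights = [[kernel[j] for kernel in kernels] for j in range(kernel_size)]
    return matmul(windows, weights)


if __name__ == "__main__":
    result = conv1d_as_matmul([1, 2, 3, 4, 5], [[1, 0, -1], [1, 1, 1]])
    print(result)  # [[-2, 6], [-2, 9], [-2, 12]]
```

Because every output row depends only on one window and the shared weight matrix, the same unfolding applies to two-dimensional convolution with larger windows, which is one reason a single matrix-multiplication engine can cover both model types discussed above.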