| Field | Value |
|---|---|
| Graduate student | Liu, Yi-Fan (劉怡汎) |
| Thesis title | 點格棋中小盤面模型取代大盤面模型訓練之可行性研究 (Feasibility Study on Replacing Large Board Model with Small Board Model in Dots and Boxes) |
| Advisor | Lin, Shun-Shii (林順喜) |
| Committee members | Lin, Shun-Shii (林順喜); Wu, I-Chen (吳毅成); Yen, Shi-Jim (顏士淨); Chen, Jr-Chang (陳志昌); Chou, Hsin-Hung (周信宏) |
| Oral defense date | 2024/07/01 |
| Degree | Master |
| Department | Department of Computer Science and Information Engineering |
| Year of publication | 2024 |
| Academic year of graduation | 112 (2023/24) |
| Language | Chinese |
| Pages | 76 |
| Keywords | Dots and Boxes, AlphaGo Zero, AlphaZero, AlphaZero General |
| Research method | Experimental design |
| DOI | http://doi.org/10.6345/NTNU202400836 |
| Document type | Academic thesis |
Dots and Boxes is a zero-sum, perfect-information, impartial two-player game. Despite its small board, it exhibits high game complexity. This study focuses on the 3×3 board of the game and investigates whether a trained AlphaZero neural network model for the small board can replace the AlphaZero model originally designed for a larger board.

For the implementation, we adopted AlphaZero General, an open-source framework based on the principles outlined in the AlphaGo Zero paper. This accessible Python project lets users implement games and train neural networks on top of the AlphaGo Zero architecture, which saved us the cost of developing from scratch and allowed us to concentrate on the research itself.

The experimental results show that, with 1, 2, and 3 days of neural network training, the 3×3 board AlphaZero General agent, which merges policies by averaging, achieves winning rates of 64%, 58%, and 57% respectively against 4×4 board AlphaZero General agents trained for the same amounts of time. Therefore, under a training-time budget of 3 days, the trained small-board AlphaZero neural network model can replace the large-board model.
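The abstract states only that the small-board agent "merges policies by averaging" when playing on the larger board; the exact scheme is not given here. A minimal sketch of one plausible interpretation, assuming the 3×3 model is evaluated on every 3×3 window of the 4×4 board and the per-edge move probabilities are averaged, follows. All names (`edge_list`, `merge_policy`, `small_policy_fn`) are illustrative and not taken from the thesis.

```python
def edge_list(m):
    """All edges of an m×m-box Dots and Boxes board as (kind, row, col).

    An m×m board has m*(m+1) horizontal and (m+1)*m vertical edges,
    e.g. 24 edges for 3×3 and 40 edges for 4×4.
    """
    horiz = [('h', r, c) for r in range(m + 1) for c in range(m)]
    vert = [('v', r, c) for r in range(m) for c in range(m + 1)]
    return horiz + vert

def merge_policy(small_policy_fn, big_m=4, small_m=3):
    """Average the small-board policy over every small_m×small_m window
    of the big board, producing one policy over the big board's edges.

    small_policy_fn(dr, dc) returns the small model's edge probabilities
    for the window whose top-left box is at (dr, dc).
    """
    big_edges = edge_list(big_m)
    acc = {e: 0.0 for e in big_edges}   # summed probability per big edge
    cnt = {e: 0 for e in big_edges}     # number of windows covering it
    for dr in range(big_m - small_m + 1):
        for dc in range(big_m - small_m + 1):
            probs = small_policy_fn(dr, dc)
            for (kind, r, c), p in zip(edge_list(small_m), probs):
                acc[(kind, r + dr, c + dc)] += p
                cnt[(kind, r + dr, c + dc)] += 1
    # Average over the windows that saw each edge, then renormalise.
    avg = [acc[e] / cnt[e] if cnt[e] else 0.0 for e in big_edges]
    total = sum(avg)
    return [a / total for a in avg]
```

With a uniform dummy policy this produces a valid 40-edge distribution; in practice `small_policy_fn` would run the trained 3×3 network on the window's sub-position.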