
Author: Tseng, Yu-Hao (曾羭豪)
Title: On Implementing Dots and Boxes Based on AlphaZero General and MuZero General Frameworks (基於AlphaZero General與MuZero General框架實現點格棋)
Advisor: Lin, Shun-Shii (林順喜)
Committee members: Lin, Shun-Shii (林順喜); Wu, I-Chen (吳毅成); Yen, Shi-Jim (顏士淨); Chen, Jr-Chang (陳志昌); Chou, Hsin-Hung (周信宏)
Oral defense date: 2023/05/03
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of publication: 2023
Academic year of graduation: 111 (ROC calendar)
Language: Chinese
Number of pages: 104
Chinese keywords: Dots and Boxes (點格棋)
English keywords: Dots and Boxes, AlphaGo Zero, AlphaZero, MuZero
Research methods: experimental design, thematic analysis, comparative research
DOI URL: http://doi.org/10.6345/NTNU202300500
Type of thesis: Academic thesis
    Dots and Boxes (點格棋) is a two-player, impartial, zero-sum, perfect-information game that reaches high complexity even on small boards. This thesis takes Dots and Boxes on a 3×3 board as its subject and implements it on the AlphaGo Zero and MuZero architectures. It further proposes an Exact-win strategy adapted to the consecutive-moves rule, implements it for Dots and Boxes, and applies it to both the training and playing phases of AlphaGo Zero.
    For the implementation, we adopt two open-source codebases, AlphaZero General and MuZero General, which implement the AlphaGo Zero and MuZero papers respectively. Both are easy-to-understand Python projects whose concise structure lets users implement and train games on the AlphaGo Zero and MuZero architectures with little effort, saving the work of building AlphaGo Zero and MuZero from scratch and leaving more attention for the research itself.
    The experimental results verify that the AlphaZero General, Exact-win, and MuZero General agents we implemented achieve winning rates of 98%, 100%, and 32% respectively against the solver-program opponent. They also show that applying the Exact-win strategy in the training stage effectively improves training speed and effectiveness, as well as the stability of the agent's strength late in training. Board-position tests confirm that these agents can indeed find and play the best moves in those positions.

    Dots and Boxes is a two-player, impartial, zero-sum, perfect-information game. In this thesis, Dots and Boxes on a 3×3 board is taken as the subject and implemented on the AlphaGo Zero and MuZero frameworks. An Exact-win strategy adapted to the consecutive-moves rule is proposed and implemented for Dots and Boxes, and is also applied in the training and playing phases of the AlphaGo Zero framework.
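    To make the consecutive-moves rule concrete: drawing an edge that completes one or more boxes scores those boxes for the mover and grants an extra move; otherwise the turn passes. The following is only an illustrative sketch of that rule, assuming a 3×3-box board; the play_edge helper, the edge indexing, and the box_edges table are hypothetical and not taken from the thesis.

```python
# Illustrative sketch of the consecutive-moves rule (not the thesis' code).
# `drawn` holds indices of edges already drawn, `scores` holds both players'
# box counts, and `box_edges` lists the four edge indices around each box.
from typing import List, Sequence, Set

def play_edge(drawn: Set[int], scores: List[int], player: int,
              edge: int, box_edges: Sequence[Sequence[int]]) -> int:
    """Draw `edge` for `player` and return the player to move next."""
    drawn.add(edge)
    completed = sum(1 for box in box_edges
                    if edge in box and all(e in drawn for e in box))
    if completed:                    # the edge closed at least one box:
        scores[player] += completed  # score it and move again
        return player
    return 1 - player                # no box closed: the turn passes
```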
    For the implementation, we use two open-source codebases, AlphaZero General and MuZero General, which are based on the original papers of AlphaGo Zero and MuZero respectively. Through these codebases, we can easily implement and train game-playing programs on the AlphaGo Zero and MuZero architectures, save the effort of building AlphaGo Zero and MuZero from scratch, and pay more attention to the related research topics.
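    As an illustration of what plugging a game into AlphaZero General involves, the sketch below outlines a game adapter in the style of the framework's Game interface (getInitBoard, getActionSize, getValidMoves, getNextState, and so on). It is a sketch under stated assumptions: the 24-edge encoding of a 3×3-box board and the BOX_EDGES numbering are illustrative rather than the thesis' implementation, and a complete adapter would also encode the scores and side to move in the board and implement the remaining interface methods such as getGameEnded, getCanonicalForm, getSymmetries, and stringRepresentation.

```python
# Sketch of a Dots and Boxes adapter in the style of AlphaZero General's Game
# interface; in the real project the class would subclass the framework's Game
# base class. Edge numbering and board encoding are illustrative assumptions.
import numpy as np

# Four edge indices around each of the nine boxes (horizontal edges 0-11 row by
# row, vertical edges 12-23 row by row) -- an assumed layout, not the thesis'.
BOX_EDGES = [(r * 3 + c, (r + 1) * 3 + c, 12 + r * 4 + c, 12 + r * 4 + c + 1)
             for r in range(3) for c in range(3)]

class DotsAndBoxesGame:
    def getInitBoard(self):
        return np.zeros(24, dtype=np.int8)          # no edge drawn yet

    def getActionSize(self):
        return 24                                   # one action per drawable edge

    def getValidMoves(self, board, player):
        return (board == 0).astype(np.int8)         # undrawn edges are legal

    def getNextState(self, board, player, action):
        nxt = board.copy()
        nxt[action] = 1
        closed = any(action in box and all(nxt[e] for e in box) for box in BOX_EDGES)
        # consecutive-moves rule: completing a box keeps the turn, otherwise it passes
        return nxt, (player if closed else -player)
```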
    The experimental results show that the AlphaZero General, Exact-win, and MuZero General agents we implemented achieved winning rates of 98%, 100%, and 32% respectively in games against the solver-program opponent. It is also demonstrated that the Exact-win strategy used in the training stage effectively improves the training speed and effectiveness, as well as the stability of the agent's strength in the later stage of training. Through several board-position tests, it is confirmed that these agents can indeed find the best moves and execute them in those positions.
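    To give a sense of how an Exact-win style rule interacts with the consecutive-moves rule, the sketch below propagates proven values through a search tree: a move that keeps the turn passes the child's proven result through unchanged, while a move that hands the turn over inverts it. The Node fields, the Proof labels, and the simplification that a loss is only concluded once all legal moves are expanded are assumptions for illustration; the thesis' actual Exact-win implementation and its integration with MCTS selection and training may differ.

```python
# Hedged sketch of Exact-win style proof propagation with a keep-the-turn flag
# (not the thesis' actual implementation).
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict

class Proof(Enum):
    UNKNOWN = 0
    WIN = 1          # proven win for the player to move at this node
    LOSS = 2         # proven loss for the player to move at this node

@dataclass
class Node:
    turn_kept: bool = False            # True if the move into this node completed a box
    proof: Proof = Proof.UNKNOWN
    children: Dict[int, "Node"] = field(default_factory=dict)

def update_proof(node: Node) -> None:
    """Derive node.proof from its children; callers repeat this up the search path."""
    if node.proof != Proof.UNKNOWN or not node.children:
        return
    outcomes = []
    for child in node.children.values():
        if child.proof == Proof.UNKNOWN:
            outcomes.append(Proof.UNKNOWN)
        elif child.turn_kept:
            outcomes.append(child.proof)                       # same player still to move
        else:                                                  # turn passed to the opponent
            outcomes.append(Proof.WIN if child.proof == Proof.LOSS else Proof.LOSS)
    if Proof.WIN in outcomes:
        node.proof = Proof.WIN         # one move reaching a proven win is enough
    elif all(o == Proof.LOSS for o in outcomes):
        node.proof = Proof.LOSS        # every (expanded) move leads to a proven loss
```

    During search, a position proven as a win can be played immediately and proven losses can be avoided, which is the intuition behind the reported gains in training speed and late-training stability.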

    Abstract (Chinese) ii
    Abstract (English) iii
    Acknowledgements iv
    Table of Contents v
    List of Figures viii
    List of Tables xii
    Chapter 1 Introduction 1
    Chapter 2 Dots and Boxes 3
      2.1 Game Rules 5
      2.2 Game-Specific Terminology 7
      2.3 Common Strategies 12
    Chapter 3 Literature Review 15
      3.1 Dots and Boxes 15
      3.2 AlphaGo Zero 19
      3.3 AlphaZero 22
      3.4 MuZero 23
      3.5 ResNet 25
      3.6 Exact-win 26
    Chapter 4 Methods and Implementation 32
      4.1 AlphaZero General 32
      4.2 Implementing Dots and Boxes on AlphaZero General 36
      4.3 Implementing the Exact-win Strategy for Dots and Boxes 40
      4.4 MuZero General 45
      4.5 Implementing Dots and Boxes on MuZero General 48
      4.6 Game Board and Neural Network Input 52
      4.7 Neural Network Architecture 56
    Chapter 5 Experiments and Results 61
      5.1 Hardware and Parameter Settings 61
      5.2 Comparison Opponents 63
      5.3 Overall Comparison 69
      5.4 Board-Position Tests 74
        5.4.1 Test Position 1 75
        5.4.2 Test Position 2 78
        5.4.3 Test Position 3 82
        5.4.4 Test Position 4 86
        5.4.5 Test Position 5 90
    Chapter 6 Conclusions and Future Directions 94
    References 96

