
Graduate Student: 劉怡汎 (Liu, Yi-Fan)
Thesis Title: 點格棋中小盤面模型取代大盤面模型訓練之可行性研究
(Feasibility Study on Replacing Large Board Model with Small Board Model in Dots and Boxes)
Advisor: 林順喜 (Lin, Shun-Shii)
Committee Members: 林順喜 (Lin, Shun-Shii), 吳毅成 (Wu, I-Chen), 顏士淨 (Yen, Shi-Jim), 陳志昌 (Chen, Jr-Chang), 周信宏 (Chou, Hsin-Hung)
Oral Defense Date: 2024/07/01
Degree: Master
Department: Department of Computer Science and Information Engineering
Publication Year: 2024
Graduation Academic Year: 112
Language: Chinese
Number of Pages: 76
Chinese Keywords: 點格棋, AlphaGo Zero, AlphaZero, AlphaZero General
English Keywords: Dots and Boxes, AlphaGo Zero, AlphaZero, AlphaZero General
Research Method: Experimental design
DOI URL: http://doi.org/10.6345/NTNU202400836
Thesis Type: Academic thesis
    Dots and Boxes is a zero-sum, perfect-information, impartial two-player game; despite its small board, it has relatively high complexity. This thesis takes Dots and Boxes on a 3×3 board as its research subject and uses a trained small-board AlphaZero neural network model in place of a large-board AlphaZero neural network model.
    For the implementation, we adopt AlphaZero General, an open-source framework project based on the AlphaGo Zero paper. This easy-to-understand open-source Python project lets users readily implement games and train neural networks on the AlphaGo Zero architecture, saving the cost of developing from scratch and allowing more focus on the research itself.
    The experimental results show that, under neural network training times of 1, 2, and 3 days, the 3×3-board AlphaZero General agent, which merges policies by averaging, achieved win rates of 64%, 58%, and 57%, respectively, against a 4×4-board AlphaZero General agent trained for the same amount of time. Therefore, when training time is limited to 3 days, a trained small-board AlphaZero neural network model can replace a large-board AlphaZero neural network model.

    Dots and Boxes is a zero-sum, perfect-information, impartial two-player game. Despite its small board size, it exhibits high game complexity. This study focuses on the 3×3 board version of the game and employs a trained AlphaZero neural network model for the smaller board in place of the model originally designed for the larger board.
    For implementation, we utilized the AlphaZero General open-source framework, which is based on the principles outlined in the AlphaGo Zero paper. This Python-based framework makes it straightforward to implement a game and train a neural network following the AlphaGo Zero architecture. By leveraging this existing framework, we reduced development costs and could allocate resources to other research areas.
    Experimental results demonstrate that, across training durations of 1, 2, and 3 days, the 3×3 board AlphaZero General agent, which merges policies by averaging, outperforms the 4×4 board AlphaZero General agent trained for the same amount of time, achieving winning rates of 64%, 58%, and 57%, respectively. Therefore, within a limited training timeframe of 3 days, the compact AlphaZero neural network model proves effective in substituting for the larger-board model.
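    The abstract's key step — having the trained 3×3-board model act on the 4×4 board by evaluating sub-boards and averaging the resulting move probabilities — can be sketched as follows. This is only an illustrative reconstruction, not the thesis' actual code: the edge indexing scheme (`h_idx`/`v_idx`), the `merge_policy` function, and the `small_policy_fn` interface are all assumptions introduced here. The sketch slides every 3×3 window over the 4×4 board, queries the small-board policy on each window, and averages the probabilities assigned to each full-board edge.

    ```python
    import numpy as np

    def h_idx(r, c, R, C):
        # Index of the horizontal edge above box-row r, box-column c (0 <= r <= R, 0 <= c < C).
        return r * C + c

    def v_idx(r, c, R, C):
        # Index of the vertical edge left of box-column c, box-row r (0 <= r < R, 0 <= c <= C).
        return (R + 1) * C + r * (C + 1) + c

    def merge_policy(small_policy_fn, big_board, R_small=3, R_big=4):
        """Average small-board policies over every R_small x R_small window of the big board.

        small_policy_fn(sub_board) -> probability vector over the 24 edges of a 3x3 sub-board;
        big_board is a 0/1 vector over the 40 edges of the 4x4 board (1 = edge already drawn).
        Hypothetical interface -- the thesis' actual implementation may differ.
        """
        n_big = 2 * R_big * (R_big + 1)        # 40 edges on the 4x4 board
        n_small = 2 * R_small * (R_small + 1)  # 24 edges on a 3x3 sub-board
        acc = np.zeros(n_big)                  # summed probability mass per big-board edge
        cnt = np.zeros(n_big)                  # number of windows covering each edge
        for dr in range(R_big - R_small + 1):  # 2 x 2 = 4 window offsets for 3x3 over 4x4
            for dc in range(R_big - R_small + 1):
                # Map each small-board edge index to its big-board edge index.
                idx_map = np.empty(n_small, dtype=int)
                for r in range(R_small + 1):
                    for c in range(R_small):
                        idx_map[h_idx(r, c, R_small, R_small)] = h_idx(r + dr, c + dc, R_big, R_big)
                for r in range(R_small):
                    for c in range(R_small + 1):
                        idx_map[v_idx(r, c, R_small, R_small)] = v_idx(r + dr, c + dc, R_big, R_big)
                sub_board = big_board[idx_map]          # extract this window's edge states
                p = small_policy_fn(sub_board)          # small-board policy for the window
                acc[idx_map] += p
                cnt[idx_map] += 1
        merged = np.where(cnt > 0, acc / np.maximum(cnt, 1), 0.0)
        merged[big_board == 1] = 0.0                    # mask edges that are already drawn
        s = merged.sum()
        return merged / s if s > 0 else merged
    ```

    With these board sizes every edge of the 4×4 board is covered by at least one window, so the average is well defined everywhere; masking already-drawn edges and renormalizing yields a legal move distribution for the large board.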

    1. Introduction
       1.1 Research Background
       1.2 Research Objectives
    2. Literature Review
       2.1 Dots and Boxes
           2.1.1 Origin of the Game
           2.1.2 Rules of Play
           2.1.3 Game Characteristics
           2.1.4 Game Terminology [5]
           2.1.5 Game Strategies [5, 6]
           2.1.6 Research History
       2.2 AlphaGo Zero
       2.3 AlphaZero
       2.4 Dots and Boxes Implemented on the AlphaZero General and MuZero General Frameworks
    3. Methods and Implementation
       3.1 Implementing Dots and Boxes in Alpha-Zero-General
           3.1.1 Alpha-Zero-General
           3.1.2 The Alpha-Zero-General Program for Dots and Boxes
           3.1.3 Neural Network Architecture
       3.2 Training
       3.3 Stitching the 3×3 Board Neural Network Model to Play Against 4×4 Board Players
    4. Experiments and Results
       4.1 Equipment and Hyperparameter Settings
           4.1.1 Equipment
           4.1.2 Hyperparameter Settings
       4.2 Training Results of the 3×3 Board Neural Network Model
           4.2.1 Recorded by Generation
           4.2.2 Recorded by Day
       4.3 Training Results of the 4×4 Board Neural Network Model
           4.3.1 Recorded by Generation
           4.3.2 Recorded by Day
       4.4 Experimental Results
           4.4.1 Board Tests
           4.4.2 Experiment 1
           4.4.3 Experiment 2
    5. Conclusions and Future Directions
    References

    [1] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, pp. 484-489, 2016.
    [2] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, et al., "Mastering the game of Go without human knowledge," Nature, vol. 550, pp. 354-359, 2017.
    [3] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, et al., "Mastering Chess and Shogi by self-play with a general reinforcement learning algorithm," ArXiv, vol. abs/1712.01815, 2017.
    [4] É. Lucas, H. A. Delannoy, C. A. Laisant, and É. M. H. Lemoine, L'arithmétique amusante: Gauthier-Villars et fils, 1895.
    [5] S. Krebbers, "Monte Carlo tree search for dots-and-boxes," Bachelor's thesis, Leiden Institute of Advanced Computer Science, Leiden University, 2021.
    [6] L. Weaver and T. Bossomaier, "Evolution of neural networks to play the game of dots-and-boxes," arXiv preprint cs/9809111, 1998.
    [7] E. R. Berlekamp, The dots and boxes game: sophisticated child's play: CRC Press, 2000.
    [8] J. Barker and R. Korf, "Solving dots-and-boxes," in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 414-419, 2012.
    [9] Y. Zhuang, S. Li, T. V. Peters, and C. Zhang, "Improving Monte-Carlo tree search for dots-and-boxes with a novel board representation and artificial neural networks," in 2015 IEEE Conference on Computational Intelligence and Games (CIG), pp. 314-321, 2015.
    [10] L. Zhang, Y. Zhang, P. Liu, and L. Guo, "The research to construct dots and boxes battle platform in computer game," in The 27th Chinese Control and Decision Conference (2015 CCDC), pp. 3749-3753, 2015.
    [11] J. Lu and H. Yin, "Using heuristic solver to optimize Monte Carlo tree search in dots-and-boxes," in 2016 Chinese Control and Decision Conference (CCDC), pp. 4288-4291, 2016.
    [12] Y. Zhang, S. Li, and X. Xiong, "A study on the game system of dots and boxes based on reinforcement learning," in 2019 Chinese Control and Decision Conference (CCDC), pp. 6319-6322, 2019.
    [13] P. Agrawal, "Parallelization of the Monte Carlo tree search approach in dots-and-boxes.", Western Kentucky University Faculty of Engineering and Applied Sciences, 2019.
    [14] S. Li, Y. Zhang, M. Ding, and P. Dai, "Research on integrated computer game algorithm for dots and boxes," The Journal of Engineering, vol. 2020, pp. 601-606, 2020.
    [15] D. Allcock, "Best play in dots and boxes endgames," International Journal of Game Theory, vol. 50, pp. 671-693, 2021.
    [16] A. Cotarelo, V. García-Díaz, E. R. Núñez-Valdez, C. González García, A. Gómez, and J. Chun-Wei Lin, "Improving Monte Carlo tree search with artificial neural networks without heuristics," Applied Sciences, vol. 11(5), 2056, 2021.
    [17] A. Pandey, "Solving dots & boxes using reinforcement learning," Colorado State University, 2022.
    [18] 曾羭豪, "Implementing Dots and Boxes Based on the AlphaZero General and MuZero General Frameworks" (基於AlphaZero General與MuZero General框架實現點格棋), Master's thesis, Department of Computer Science and Information Engineering, National Taiwan Normal University, 2023.
    [19] Y.-C. Chen, C.-H. Chen, and S.-S. Lin, "Exact-win strategy for overcoming AlphaZero," in Proceedings of the 2018 International Conference on Computational Intelligence and Intelligent Systems, pp. 26-31, 2018.
    [20] 陳彥吉, "The Sure-Win Strategy of Monte Carlo Tree Search and the Implementation of a Fast Nonogram Solver" (蒙地卡羅樹搜索法的必贏策略以及快速 Nonogram 解題程式的實作), Master's thesis, Department of Computer Science and Information Engineering, National Taiwan Normal University, 2019.
    [21] S. Thakoor, S. Nair, and M. Jhunjhunwala, "Learning to play Othello without human knowledge," Stanford University CS238 Final Project Report, vol. 204, 2017. https://github.com/suragnair/alpha-zero-general (accessed Jul. 9, 2024).
