
Graduate Student: Lin, Yu-Chang (林育璋)
Thesis Title: Using the KataGo Method and Threat Space Search to Improve the Training Performance of AlphaZero in Connect6
Advisor: Lin, Shun-Shii (林順喜)
Oral Defense Committee: Wu, I-Chen (吳毅成); Yen, Shi-Jim (顏士淨); Chen, Jr-Chang (陳志昌); Chou, Hsin-Hung (周信宏); Lin, Shun-Shii (林順喜)
Oral Defense Date: 2023/05/03
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of Publication: 2023
Graduation Academic Year: 111 (ROC calendar; 2022–2023)
Language: Chinese
Number of Pages: 53
Keywords: Computer Games, Reinforcement Learning, Connect6, AlphaZero, KataGo, Parallelization
Research Method: Experimental Design
DOI URL: http://doi.org/10.6345/NTNU202300496
Thesis Type: Academic Thesis
    Since Google DeepMind proposed the AlphaZero algorithm, the AlphaZero approach has replaced traditional search methods in many computer game programs. However, AlphaZero requires a very large amount of computing power to reach the top level. We therefore aim to improve the training efficiency of AlphaZero for the game of Connect6 through program-performance improvements and the assistance of traditional methods, so that top-level strength can be reached on a personal computer.
    This thesis builds on the Alpha-Zero-General open-source code to develop an AlphaZero Connect6 program. We follow galvanise_zero in modifying the MCTS search, adapt the general-purpose Bitboard proposed by OOGiveMeFive for use in Connect6, and adopt the strength-improvement techniques for Connect6 proposed by CZF_Connect6 of National Yang Ming Chiao Tung University.
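    For context, AlphaZero-style MCTS (which galvanise_zero and this thesis build on) chooses which child to descend into with the PUCT selection rule; each program's exact variant may differ, but the standard AlphaZero form is

        a^* = \arg\max_a \Big( Q(s,a) + c_{\mathrm{puct}} \cdot P(s,a) \cdot \frac{\sqrt{\sum_b N(s,b)}}{1 + N(s,a)} \Big)

    where Q(s,a) is the mean value observed for action a, P(s,a) is the policy network's prior, N(s,a) is the visit count, and c_puct controls exploration.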
    This thesis accelerates AlphaZero's training from three directions. The first is program performance: our analysis shows that one bottleneck of Alpha-Zero-General is its MCTS, so we re-implement the MCTS in C++ with parallelization and batched network inference, which greatly improves training efficiency. The second is the neural network: following KataGo, we add Global Pooling and Auxiliary Policy Targets to the network and apply it to Connect6. The third is the quality of the training data: we apply KataGo's Forced Playout and Policy Target Pruning method together with traditional Threat Space Search (TSS). In addition, this thesis proposes a new training scheme that improves how effectively AlphaZero learns when heuristics are added.
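    As a minimal sketch of the batched-MCTS idea above (all names such as Node, NeuralNet::predict_batch, and run_batched_simulations are hypothetical, not the thesis's actual code): the search repeatedly selects leaves while applying a virtual loss so parallel selections spread out, then evaluates the whole batch with a single network call.

    // Hypothetical leaf-batching sketch for MCTS (C++17), for illustration only.
    // Expansion and backpropagation details are omitted.
    #include <cmath>
    #include <vector>

    struct Node {
        std::vector<Node*> children;
        std::vector<float> prior;      // P(s,a) from the policy network
        std::vector<int>   visits;     // N(s,a), including virtual losses
        std::vector<float> value_sum;  // W(s,a)
        bool expanded = false;
    };

    struct NeuralNet {
        // One batched GPU inference for many positions; in this thesis's setting
        // this would go through cppflow/TensorFlow. Stubbed out here.
        void predict_batch(const std::vector<Node*>& /*leaves*/) {}
    };

    // PUCT: argmax_a Q(s,a) + c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)).
    int select_child(const Node& n, float c_puct) {
        int total = 0;
        for (int v : n.visits) total += v;
        int best = 0;
        float best_score = -1e9f;
        for (std::size_t a = 0; a < n.children.size(); ++a) {
            float q = n.visits[a] ? n.value_sum[a] / n.visits[a] : 0.0f;
            float u = c_puct * n.prior[a] * std::sqrt(float(total)) / (1 + n.visits[a]);
            if (q + u > best_score) { best_score = q + u; best = int(a); }
        }
        return best;
    }

    // Collect a batch of leaves, marking each path with a virtual loss so that
    // successive selections diverge, then evaluate them in one network call.
    void run_batched_simulations(Node* root, NeuralNet& net, int batch_size, float c_puct) {
        std::vector<Node*> leaves;
        for (int i = 0; i < batch_size; ++i) {
            Node* n = root;
            while (n->expanded && !n->children.empty()) {
                int a = select_child(*n, c_puct);
                n->visits[a] += 1;        // virtual loss: pretend a visit happened...
                n->value_sum[a] -= 1.0f;  // ...and that it lost, to repel siblings
                n = n->children[a];
            }
            leaves.push_back(n);
        }
        net.predict_batch(leaves);  // one GPU call instead of batch_size calls
        // ...expand leaves, backpropagate real values, revert virtual losses...
    }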
    Using C++, parallelization, and batch prediction, the MCTS search reaches a speedup of 26.4, and the bitboard-based threat space search reaches a speedup of 6.03. In short training runs with equal wall-clock time, the plain AlphaZero method completes more iterations, yet the KataGo-method version still achieves a 57.58% win rate against the original AlphaZero version, and the KataGo-TSS Hybrids version achieves a 70% win rate against it. After training for 500 iterations, all three versions achieve win rates above 65% against NCTU6_Level3.
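    To illustrate why a bitboard accelerates threat detection (a sketch only; the thesis adapts OOGiveMeFive's general-purpose Bitboard, whose layout differs): packing the 19x19 board into a bit array lets a whole direction be scanned with a few shift-and-AND operations instead of per-cell loops.

    // Hypothetical bitboard sketch for Connect6 line detection (C++17).
    // Row-major layout with one padding column so horizontal shifts cannot
    // wrap onto the next row; illustrative only.
    #include <bitset>

    constexpr int W = 19;        // playable columns 0..18
    constexpr int STRIDE = 20;   // column 19 is always-empty padding
    using Board = std::bitset<19 * STRIDE>;

    // True if `stones` contains `run` consecutive stones in one direction.
    // `step` is the bit distance of the direction: 1 = horizontal,
    // STRIDE = vertical, STRIDE + 1 and STRIDE - 1 = the two diagonals.
    bool has_run(Board stones, int step, int run) {
        // After the i-th AND, a set bit means "i+1 stones in a row start here".
        for (int i = 1; i < run; ++i)
            stones &= (stones >> step);
        return stones.any();
    }

    // Usage sketch: six in a row wins Connect6; threat space search asks the
    // same question for shorter runs (e.g. fours and fives with open ends).
    bool six_in_a_row(const Board& stones) {
        return has_run(stones, 1, 6) || has_run(stones, STRIDE, 6) ||
               has_run(stones, STRIDE + 1, 6) || has_run(stones, STRIDE - 1, 6);
    }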

    Chapter 1 Introduction
      1.1 Research Background
      1.2 Research Objectives
    Chapter 2 Literature Review
      2.1 Introduction to Connect6
      2.2 AlphaZero
      2.3 KataGo
      2.4 Threat Space Search
      2.5 Bitboard Representation for Outer-Open Gomoku
      2.6 Methods for Restricting the Move Range in Connect6
      2.7 Methods for Improving the Strength of Connect6 Programs
      2.8 Related Programs
      2.9 Introduction to the Ludii [14] GUI
    Chapter 3 Research Methods
      3.1 Alpha-Zero-General
      3.2 Performance Improvements
      3.3 Small-Scale Test Implementation
      3.4 Implementing the Connect6 Rules
      3.5 Implementing the Connect6 Neural Network
      3.6 Implementing the Connect6 Mask
      3.7 Implementing the Connect6 Bitboard
      3.8 Implementing the KataGo Methods
      3.9 Implementing Connect6 Threat Space Search
      3.10 AlphaZero-TSS Hybrids
      3.11 AlphaZero-TSS Learning
      3.12 ELO Rating
      3.13 Connecting to the Ludii GUI
    Chapter 4 Experiments and Results
      4.1 Experimental Setup
      4.2 Experiment Description
      4.3 MCTS Performance Test
      4.4 Parallel Self-Play Performance Test
      4.5 TSS Performance Test
      4.6 ELO Progression
      4.7 Comparison of the Ability to Find Forced Wins
      4.8 Strength Comparison
      4.9 Matches Against NCTU6
      4.10 Competition Results
    Chapter 5 Conclusions and Future Work
    References

    [1] D. Silver et al., “Mastering the Game of Go with Deep Neural Networks and Tree Search,” Nature, vol. 529, no. 7587, pp. 484–489, Jan. 2016, doi: 10.1038/nature16961.
    [2] D. Silver et al., “Mastering the Game of Go without Human Knowledge,” Nature, vol. 550, no. 7676, pp. 354–359, Oct. 2017, doi: 10.1038/nature24270.
    [3] D. Silver et al., “A General Reinforcement Learning Algorithm that Masters Chess, Shogi, and Go through Self-Play,” Science, vol. 362, no. 6419, pp. 1140–1144, Dec. 2018, doi: 10.1126/science.aar6404.
    [4] D. J. Wu, “Accelerating Self-Play Learning in Go,” Feb. 2019, doi: 10.48550/arXiv.1902.10565.
    [5] 楊子頤, “Applying AlphaZero to Connect6,” Master’s thesis, Institute of Multimedia Engineering, National Chiao Tung University, 2020.
    [6] 黃德彥, “A Study of Artificial Intelligence for Gomoku-Related Games,” Master’s thesis, Institute of Computer Science and Engineering, National Chiao Tung University, 2005.
    [7] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778, doi: 10.48550/arXiv.1512.03385.
    [8] K. He, X. Zhang, S. Ren, and J. Sun, “Identity Mappings in Deep Residual Networks,” in Computer Vision – ECCV 2016, Lecture Notes in Computer Science, vol. 9908, 2016, pp. 630–645, doi: 10.48550/arXiv.1603.05027.
    [9] C. H. Chen, S. S. Lin, and Y. C. Chen, “An Algorithmic Design and Implementation of Outer-Open Gomoku,” in Proc. 2nd International Conference on Computer and Communication Systems (ICCCS 2017), 2017, pp. 26–30, doi: 10.1109/CCOMS.2017.8075180.
    [10] 黃士豪, “Training Neural Networks Using Favorable Conditions: Connect6 as an Example,” Master’s thesis, Department of Computer Science and Information Engineering, National Taiwan Normal University, 2019.
    [11] 李韡, “A Software Framework for AlphaZero-Style Applications,” Master’s thesis, Institute of Computer Science and Engineering, National Chiao Tung University, 2018.
    [12] Little Golem. Retrieved from: https://www.littlegolem.net/jsp/main/
    [13] 蔣秉璁, “A Study on Improving the Strength of Connect6 Programs,” Master’s thesis, Institute of Computer Science and Engineering, National Chiao Tung University, 2015.
    [14] Ludii Portal, https://ludii.games/index.php (accessed Jan. 31, 2023).
    [15] GitHub - suragnair/alpha-zero-general: A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more. Retrieved from: https://github.com/suragnair/alpha-zero-general (accessed Jan. 04, 2023).
    [16] GitHub - richemslie/galvanise_zero: Learning from zero (mostly based off of AlphaZero) in General Game Playing. Retrieved from: https://github.com/richemslie/galvanise_zero
    [17] Easily run TensorFlow models from C++ — cppflow 2.0 documentation. Retrieved from: https://serizba.github.io/cppflow/
    [18] A. Liu, J. Chen, M. Yu, Y. Zhai, X. Zhou, and J. Liu, “Watch the Unobserved: A Simple Approach to Parallelizing Monte Carlo Tree Search.” Retrieved from: https://github.com/liuanji/WU-UCT (accessed Jan. 31, 2023).
    [19] GitHub - leela-zero/leela-zero: Go engine with no human-provided knowledge, modeled after the AlphaGo Zero paper. Retrieved from: https://github.com/leela-zero/leela-zero
    [20] 廖唯辰、黃柏維、鄭紹雄、吳毅成, “Generality of the KataGo Method on Outer-Open Gomoku,” TCGA Computer Games Workshop, 2021.
    [21] 李榮欽, “An Automatic Puzzle Generation System for Connect6,” Master’s thesis, Institute of Computer Science and Engineering, National Chiao Tung University, 2008.
    [22] 陳志宏, “Improving the AlphaZero Algorithm in the Playing and Training Phases,” Ph.D. dissertation, Department of Computer Science and Information Engineering, National Taiwan Normal University, 2021.
    [23] J. Czech, P. Korus, and K. Kersting, “Monte-Carlo Graph Search for AlphaZero,” Dec. 2020. Retrieved from: http://arxiv.org/abs/2012.11045
