Graduate student: | 偕為昭 Jie, Wei-Zhao |
---|---|
Thesis title: | 強化學習與遷移學習應用於六貫棋遊戲 Investigating Reinforcement Learning and Transfer Learning in Hex Game |
Advisor: | 林順喜 Lin, Shun-Shii |
Oral defense committee: | 吳毅成 Wu, I-Chen; 顏士淨 Yen, Shi-Jim; 陳志昌 Chen, Jr-Chang; 周信宏 Chou, Hsin-Hung; 林順喜 Lin, Shun-Shii |
Oral defense date: | 2023/06/28 |
Degree: | Master |
Department: | 資訊工程學系 Department of Computer Science and Information Engineering |
Year of publication: | 2023 |
Academic year of graduation: | 111 (ROC calendar) |
Language: | Chinese |
Number of pages: | 46 |
Chinese keywords: | 六貫棋 (Hex), 強化學習 (reinforcement learning), 遷移學習 (transfer learning) |
English keywords: | Hex, Reinforcement Learning, Transfer Learning |
Research method: | Experimental design |
DOI URL: | http://doi.org/10.6345/NTNU202300891 |
Thesis type: | Academic thesis |
Hex is a two-player board game that first appeared in a Danish newspaper in 1942 under the name Polygon. In 1948, the American mathematician John Forbes Nash Jr. independently reinvented the game, and it became known as Nash. In 1952, the manufacturer Parker Brothers published it under the name Hex. Each pair of opposite sides of the board (top and bottom, left and right) is marked with one color. Players take turns placing pieces on the board, and a player wins by connecting the two opposite sides marked with their own color. Hex is a zero-sum game in which a draw is impossible. Previous research has solved the game for board sizes up to 9×9.
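The winning condition described above can be illustrated with a short search over connected cells. This is a minimal sketch and not code from the thesis: the board encoding (`'B'`, `'W'`, `'.'`), the function name `has_won`, and the convention that `'B'` connects top to bottom while `'W'` connects left to right are all assumptions made for the example.

```python
from collections import deque

# The six neighbours of a cell when a rhombic Hex board is stored as a square grid.
NEIGHBOURS = [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, 1), (1, -1)]


def has_won(board, player):
    """Return True if `player` has connected their two sides.

    board: list of equal-length strings; each cell is 'B', 'W', or '.'.
    Assumption for this sketch: 'B' owns the top and bottom sides,
    'W' owns the left and right sides.
    """
    n = len(board)
    if player == 'B':
        starts = [(0, c) for c in range(n) if board[0][c] == 'B']
    else:
        starts = [(r, 0) for r in range(n) if board[r][0] == 'W']

    def reached_goal(r, c):
        # 'B' must reach the last row, 'W' the last column.
        return r == n - 1 if player == 'B' else c == n - 1

    seen = set(starts)
    queue = deque(starts)
    while queue:  # breadth-first search over the player's own stones
        r, c = queue.popleft()
        if reached_goal(r, c):
            return True
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < n and 0 <= nc < n
            if inside and (nr, nc) not in seen and board[nr][nc] == player:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False


# A completely filled 2x2 board: exactly one side is connected.
full_board = ["BW",
              "WB"]
black_wins = has_won(full_board, 'B')
white_wins = has_won(full_board, 'W')
```

On the filled board in the example, only one of `black_wins` and `white_wins` is true, which reflects the no-draw property on a small scale.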
Since the advent of AlphaZero, game-playing programs have advanced further, and programs developed with this method play strongly. In Hex, the program MoHex, developed at the University of Alberta in Canada, is particularly noteworthy: it has consistently achieved excellent results in competitions and is still being improved.
This thesis uses the AlphaZero training framework for reinforcement learning, assisted by data from positions solved by MoHex. Because training a model for a larger board is costly, we combine it with transfer learning: solved positions on smaller boards are used so that the early self-play stage already produces better game records, rather than starting from zero knowledge. In this way we try to improve the training results of the model for larger boards. We also compare the effects of different parameter-transfer methods during transfer learning.
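One simple way to picture parameter transfer between board sizes is to copy only those parameters whose shapes do not depend on the board. The sketch below is an illustrative assumption, not one of the transfer methods evaluated in the thesis: the layer names (`conv1.weight`, `policy.bias`), the nested-list representation, and the function `transfer_matching` are hypothetical. A convolution kernel has the same shape on any board, while a policy output with one entry per cell does not.

```python
import copy


def shape(x):
    """Shape of a non-empty rectangular nested list, e.g. [[0, 0], [0, 0]] -> (2, 2)."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)


def transfer_matching(small_params, large_params):
    """Copy every parameter whose name and shape match; keep the rest as initialised.

    Both arguments map a (hypothetical) layer name to nested lists of floats.
    Returns the new parameter dict and the names that were transferred.
    """
    result = copy.deepcopy(large_params)
    transferred = []
    for name, value in small_params.items():
        if name in result and shape(value) == shape(result[name]):
            result[name] = copy.deepcopy(value)
            transferred.append(name)
    return result, transferred


# Toy parameters: a single 3x3 kernel is independent of board size, while the
# policy bias has one entry per cell (9 for a 3x3 board, 16 for a 4x4 board).
small = {
    "conv1.weight": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]],
    "policy.bias": [0.5] * 9,
}
large = {
    "conv1.weight": [[0.0] * 3 for _ in range(3)],
    "policy.bias": [0.0] * 16,
}
new_params, transferred = transfer_matching(small, large)
```

Here only `conv1.weight` is carried over, and the board-size-dependent `policy.bias` keeps its fresh initialisation. In a PyTorch setting, a comparable effect could be obtained by filtering a `state_dict` by shape before calling `load_state_dict(..., strict=False)`. Other schemes, such as mapping small-board weights into part of a larger layer, are also possible.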