| Graduate student | 劉浩萱 Liu, Hao-Hsuan |
|---|---|
| Thesis title | AlphaZero Algorithm Combined with Quick Win or Threat Space for Gomoku |
| Advisor | 林順喜 Lin, Shun-Shii |
| Degree | Master's |
| Department | Department of Computer Science and Information Engineering |
| Year of publication | 2020 |
| Academic year of graduation | 108 (ROC calendar, i.e., 2019-2020) |
| Language | Chinese |
| Pages | 59 |
| Keywords | AlphaZero, Neural Network, Quick Win, Threat-Space Search |
| DOI | http://doi.org/10.6345/NTNU202001222 |
| Thesis type | Academic thesis |
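AlphaZero is a general-purpose reinforcement learning algorithm. It uses no human knowledge beyond the game rules and achieves excellent results after training. This study presents two methods, Quick Win and Threat Space, so that the framework can learn the winning information Gomoku requires from the very start of training.
Quick Win teaches the neural network the value of winning quickly. When several moves have the same win rate, the network then prefers the move that wins fastest. Threat Space searches the board for threats, and the network learns from the moves that produce them. The goal is to shorten training time.
This study uses four experimental setups: a linear distance weight, an exponential distance weight, threat search combined with the distance weight, and threat search combined with Monte Carlo Tree Search. We observe whether AlphaZero-based models play stronger because they choose faster-winning moves or learn to form threats.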
AlphaZero is a general reinforcement learning algorithm that achieves superior results after training, given no domain knowledge except the game rules. This thesis introduces the Quick Win and Threat-Space Search methods so that the neural network learns Gomoku's winning information from the beginning of training.
The Quick Win method teaches the neural network to win faster. When the candidate moves show equal winning probabilities, it prefers the move that wins soonest. The Threat-Space Search method searches for threats at every move, so the neural network learns how to create threats and the training period is shortened.
We run four kinds of Gomoku experiments: linear distance weight, exponential distance weight, Threat-Space Search combined with distance weight, and Threat-Space Search combined with Monte Carlo Tree Search (MCTS). We observe whether these AlphaZero-based implementations play more strongly because they choose faster-winning moves or threat moves during the game.
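The abstract describes the threat search only at a high level. The sketch below is a rough illustration that works in two steps:

1. It classifies a single candidate move on a Gomoku board as making a five, making a four (a threat the opponent must answer), or neither.
2. It nudges a policy's move priors toward threat moves before MCTS expansion.

This is a one-ply threat check, not the full threat-space search of Allis et al. Several details are hypothetical choices, not the thesis's implementation:

- the board encoding (1 black, -1 white, 0 empty);
- the freestyle assumption that an overline also wins;
- the names `classify_move` and `threat_biased_priors`;
- the additive `bonus`.

```python
from typing import Dict, List, Tuple

Board = List[List[int]]   # assumed encoding: 1 = black, -1 = white, 0 = empty
Move = Tuple[int, int]
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def classify_move(board: Board, r: int, c: int, player: int) -> int:
    """Return 5 if playing at the empty cell (r, c) makes five (overlines count,
    as in freestyle Gomoku), 4 if it makes a four (a five-cell window through
    (r, c) holding four of player's stones and one empty cell), otherwise 0."""
    n = len(board)
    result = 0
    for dr, dc in DIRECTIONS:
        for start in range(-4, 1):  # every 5-cell window that contains (r, c)
            cells = [(r + (start + i) * dr, c + (start + i) * dc) for i in range(5)]
            if not all(0 <= x < n and 0 <= y < n for x, y in cells):
                continue
            # Count (r, c) as the player's stone, as if the move had been made.
            own = sum(1 for x, y in cells if (x, y) == (r, c) or board[x][y] == player)
            empty = sum(1 for x, y in cells if (x, y) != (r, c) and board[x][y] == 0)
            if own == 5:
                return 5
            if own == 4 and empty == 1:
                result = 4
    return result


def threat_biased_priors(board: Board, player: int,
                         priors: Dict[Move, float],
                         bonus: float = 0.5) -> Dict[Move, float]:
    """Hypothetical combination with MCTS: add `bonus` to the prior of every
    move that makes a five or a four, then renormalize so priors sum to 1."""
    adjusted = {m: p + (bonus if classify_move(board, m[0], m[1], player) else 0.0)
                for m, p in priors.items()}
    total = sum(adjusted.values())
    return {m: v / total for m, v in adjusted.items()}
```

For example, put black stones on (7, 3), (7, 4) and (7, 5) of an empty 15x15 board. Then `classify_move(board, 7, 6, 1)` returns 4, because black now threatens to complete five at (7, 2) or (7, 7). Biasing priors is only one way a threat signal could reach the search. Feeding threat features to the network or forcing threat moves to be expanded first are equally plausible designs.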
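One plausible reading of the "distance weight" in the Quick Win experiments is that it scales each self-play value target by how many plies remain until the game ends. A win reached sooner then yields a larger target. The sketch below assumes exactly that. The following are illustrative assumptions, not formulas taken from the thesis:

- the linear form `1 - d / max_moves`;
- the exponential form `gamma ** d`;
- the default `gamma = 0.95`;
- the function names.

```python
from typing import Callable, List


def linear_weight(d: int, max_moves: int = 225) -> float:
    """Hypothetical linear distance weight: 1.0 when the game ends on this
    move, falling by 1/max_moves per further ply (225 cells on a 15x15 board)."""
    return 1.0 - d / max_moves


def exponential_weight(d: int, gamma: float = 0.95) -> float:
    """Hypothetical exponential distance weight: gamma ** d."""
    return gamma ** d


def value_targets(num_moves: int, winner: int,
                  weight_fn: Callable[[int], float]) -> List[float]:
    """Distance-weighted value targets for one self-play game.

    num_moves: plies played; the first player moves on even plies.
    winner: +1 if the first player won, -1 if the second player won, 0 if drawn.
    Each target is the outcome from the view of the player to move at that
    ply, scaled by weight_fn(d), where d is the number of plies still played
    after this ply's move.
    """
    targets = []
    for ply in range(num_moves):
        player = 1 if ply % 2 == 0 else -1
        z = winner * player          # +1 / -1 / 0 from this player's view
        d = num_moves - 1 - ply      # 0 at the position where the last move is made
        targets.append(z * weight_fn(d))
    return targets


print(value_targets(5, 1, lambda d: exponential_weight(d, 0.5)))
```

Take `gamma = 0.5` and a five-ply game won by the first player. The script prints `[0.0625, -0.125, 0.25, -0.5, 1.0]`: the winner's target grows as the win approaches, and the loser's target becomes more negative. A draw gives zero targets under either weight. The linear weight decays gently over a full board. The exponential weight concentrates credit on the last few moves.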