簡易檢索 / 詳目顯示

研究生: 陳彥吉
Chen, Yen-Chi
論文名稱: 蒙地卡羅樹搜索法的必贏策略以及快速Nonogram解題程式的實作
Exact-win Strategy for Monte Carlo Tree Search and the Implementation of a Fast Nonogram Solver
指導教授: 林順喜
Lin, Shun-Shii
學位類別: 碩士
Master
系所名稱: 資訊工程學系
Department of Computer Science and Information Engineering
論文出版年: 2019
畢業學年度: 107
語文別: 英文
論文頁數: 54
中文關鍵詞: 增強式學習蒙地卡羅樹搜索法AlphaZero必贏策略Nonogram謎題動態規劃位元操作指令
英文關鍵詞: Reinforcement learning, Monte-Carlo Tree Search, AlphaZero, Exact-win, Nonogram, Puzzle, Dynamic Programming, BMI
DOI URL: http://doi.org/10.6345/NTNU201900373
論文種類: 學術論文
相關次數: 點閱:286下載:1
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • DeepMind的AlphaZero展現了增強式學習即使在沒有人類知識的情況下也能表現出超越人類世界冠軍的棋力。然而AlphaZero所使用的蒙地卡羅樹搜索法無法根據遊戲理論值來評估盤面好壞。即使遊戲的結果已經被得知,蒙地卡羅樹仍會拜訪這個節點。在這篇論文中我們提出了Exact-win策略來對蒙地卡羅樹進行剪枝。Exact-win讓MCTS不再去處理已知遊戲理論值的節點,增加發現其他關鍵走步的機會。實驗結果顯示了我們的Exact-win方法在一些即死遊戲上顯著提升了原始MCTS的棋力,像是在井字遊戲和連四棋。在使用了Exact-win策略之後,Exact-win與原始版本的Leela Zero、ELF OpenGo和PhoenixGo對下了100盤後分別取得61、58和51場勝場。雖然DeepMind的AlphaZero仍未開源,但我們期待未來我們的方法也能用來加強AlphaZero。就我們所知,這是第一個可以直接加強AlphaZero的方法。
    在本篇論文中我們也將揭露我們的Nonogram程式Requiem的實作方式,該程式在近幾次的比賽中都以十分顯著的時間差距贏得冠軍。Nonogram是一個單人的紙筆邏輯遊戲,玩家須根據每一行每一列的提示來對二維的方格填入顏色。我們改進了吳老師等人的方法,藉由自由度參數來減少maximal painting的計算開銷。並結合一個設計好的位元盤面表示法來配合BMI指令架構,在加速運算的同時減少記憶體的負載。我們的Nonogram程式正確地解開了2011年到2018年間的所有錦標賽的題目,並且比歷年的程式都來得快。

    DeepMind’s AlphaZero demonstrated that reinforcement learning could reach the strength of over the human champions without human knowledge. However, the Monte-Carlo Tree Search (MCTS) used by AlphaZero cannot know the theoretical value of a board state. Even if the result of a node has been known, MCTS will still revisit this node. In this thesis, we propose the Exact-win strategy to prune the Monte-Carlo tree. Exact-win allows MCTS to no longer deal with the nodes that have the determined theoretical values and increases the chances of discovering other key moves. The experiments show that our Exact-win method substantially enhances the strength of the original MCTS in some sudden death games, such as the Tic-Tac-Toe and the Connect4. After employing the Exact-win strategy, Exact-win won 61, 58, and 51 of per 100 games against the original version of Leela Zero, ELF OpenGo, and PhoenixGo, respectively. DeepMind’s AlphaZero is currently not open source. If they make it open source, we expect that our approach can also promote AlphaZero. As far as we know, this is the first concept to enhance AlphaZero's approach with the concrete experiments performed on Go.
    Also, in this thesis, we reveal the implementation of our Nonogram program, named Requiem, which won all the games by a significant time gap in recent competitions. Nonogram is a pen and paper single-player logic game in which players paint each cell of a two-dimensional grid according to clues for specific rows and columns. We improve the method proposed by Wu et al. by adding a freedom parameter which can significantly reduce the total computation cost of maximal painting. Combining a well-designed bitboard with BMI instruction, we reduce memory loading while accelerating operations. Our Nonogram program solved all 1000 puzzles in every Nonogram Tournament from 2011 to 2018 faster than all nonogram programs.

    LIST OF TABLES viii LIST OF FIGURES ix Chapter 1 Introduction 1 1.1 Motivation Goals 1 1.1.1 Improving AlphaZero 1 1.1.2 More Efficient Nonogram Solver 4 1.2 Contributions 4 1.3 Thesis Outline 5 Chapter 2 Related Works on MCTS 6 2.1 Monte-Carlo Method 6 2.2 Monte-Carlo Tree Search 7 2.3 Improving MCTS 8 2.3.1 MCTS-Solver 8 2.3.2 MCTS-Minimax Hybrids 9 2.4 AlphaZero 10 2.5 Top-level Go Programs 10 2.5.1 Leela Zero 10 2.5.2 PhoenixGo 11 2.5.3 ELF OpenGo 11 Chapter 3 Exact-win Strategy for MCTS 13 3.1 Modify MCTS 13 3.2 Select Action 15 Chapter 4 Experiments of Exact-win Strategy 17 4.1 Tic-Tac-Toe 17 4.2 Connect4 19 4.3 Go 20 4.3.1 Leela Zero 20 4.3.2 PhoenixGo 21 4.3.3 ELF OpenGo 21 Chapter 5 Conclusions of Exact-win Strategy 23 Chapter 6 Introduction of Nonogram 24 Chapter 7 How to Solve a Nonogram? 27 7.1 Solving a Single Line 27 7.2 Fully Probing 28 7.3 Backtracking 29 Chapter 8 Our Nonogram Solver Requiem 31 8.1 Freedom 31 8.2 Fully Probing 35 8.3 Backtracking 36 8.4 High Performance Data Structures 37 Chapter 9 Experiments of Our Nonogram Solver 40 9.1 Past Competitions 40 9.2 Recent Competitions 41 Chapter 10 Concluding Remarks of Nonogram Solver 43 Bibliography 44 Appendix A Publications 49 A.1 Journal Papers 49 A.2 Conference Papers 49 A.3 Under Review 50 Appendix B Honors and Awards 51 B.1 Outstanding Student 51 B.2 Honorary Member of Phi Tau Phi Society 52 B.3 Excellent Oral Presentation 53 Appendix C Computer Game Medals 54

    J. Schaeffer and H. J. van den Herik, "Games, computers, and artificial intelligence," Artificial Intelligence, vol. 134, pp. 1-7, January 2002.
    T.-S. Hsu, S.-C. Hsu, J.-C. Chen, Y.-T. Chiang, B.-N. Chen, Y.-C. Liu, H.-J. Chang, S.-C. Tsai, T.-Y. Lin and G.-Y. Fang, Computers and Classical Board Games An Introduction, Taipei, Taiwan: National Taiwan University Press, 2017.
    C. E. Shannon, "Programming a computer for playing chess," Philosophical Magazine, vol. 41, no. 314, pp. 256-275, 1950.
    D. E. Knuth and R. W. Moore, "An analysis of alpha-beta pruning," Artificial Intelligence, vol. 6, no. 4, pp. 293-326, 1975.
    K. Thompson, "Retrograde Analysis of Certain Endgames," ICGA Journal, vol. 9, no. 3, pp. 131-139, 1 September 1986.
    J. Schaeffer, R. Lake, P. Lu and M. Bryant, "Chinook: the world man-machine Checkers champion," AI Magazine, vol. 17, no. 1, pp. 21-29, 15 March 1996.
    M. Buro, "The Othello Match of the Year: Takeshi Murakami vs. Logistello," ICGA Journal, vol. 20, no. 3, pp. 189-193, 1 September 1997.
    M. Buro, "Toward Opening Book Learning," ICGA Journal, vol. 22, no. 2, pp. 98-102, 1 June 1999.
    M. Buro, "Experiments with Multi-ProbCut and a New High-Quality Evaluation Function for Othello," Games in AI Research, 2000.
    M. Campbell, A. J. Hoane and F.-H. Hsu, "Deep Blue," Artificial Intelligence, vol. 134, no. 1, pp. 57-83, 2002.
    I. Goodfellow, Y. Bengio and A. Courville, Deep Learning, MIT Press, 2016.
    R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning, 1st ed., Cambridge, MA, USA: MIT Press, 1998.
    J. Tromp and G. Farnebäck, "Combinatorics of Go," in Computers and Games, Berlin, Heidelberg, Springer Berlin Heidelberg, 2007, pp. 84-99.
    H. J. van den Herik, J. W. Uiterwijk and J. van Rijswijck, "Games solved: Now and in the future," Artificial Intelligence, vol. 134, no. 1, pp. 277-311, January 2002.
    D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel and D. Hassabis, "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, pp. 484-489, 27 Jan. 2016.
    S.-C. Huang, "New Heuristics for Monte Carlo Tree Search Applied to the Game of Go," National Taiwan Normal University, Dept. of Computer Science and Information Engineering, Taipei, Taiwan, 2011.
    D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. Lillicrap, F. Hui and L. Sifre, "Mastering the game of Go without human knowledge," Nature, vol. 550, pp. 354-359, 18 October 2017.
    D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan and D. Hassabis, Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, arXiv:1712.01815 [cs.AI], 2017.
    D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan and D. Hassabis, "A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play," Science, vol. 362, no. 6419, pp. 1140-1144, 7 Dec. 2018.
    C.-H. Chen, W.-L. Wu, Y.-H. Chen and S.-S. Lin, "Some improvements in Monte Carlo tree search algorithms for sudden death games," ICGA Journal, vol. 40, no. 4, pp. 460-470, 2019.
    Y.-C. Chen, C.-H. Chen and S.-S. Lin, "Exact-Win Strategy for Overcoming AlphaZero," in Proceedings of the 2018 International Conference on Computational Intelligence and Intelligent Systems, Phuket, Thailand, 2018.
    I.-C. Wu, D.-J. Sun, L.-P. Chen, K.-Y. Chen, C.-H. Kuo, H.-H. Kang and H.-H. Lin, "An Efficient Approach to Solving Nonograms," IEEE Transactions on Computational Intelligence and AI in Games, vol. 5, no. 3, pp. 251-264, September 2013.
    K.-Y. Chen, C.-H. Kuo, H.-H. Kang, D.-J. Sun and I.-C. Wu, "GitHub - CGI-LAB/Nonogram," CGI-LAB, [Online]. Available: https://github.com/CGI-LAB/Nonogram. [Accessed 9 June 2019].
    H. Baier and M. Winands, "Nested Monte-Carlo Tree Search for online planning in large MDPs," Frontiers in Artificial Intelligence and Applications, vol. 242, pp. 109-114, Aug. 2012.
    L. Kocsis and C. Szepesvári, "Bandit Based Monte-carlo Planning," in Proceedings of the 17th European Conference on Machine Learning, Berlin, Germany, 2006.
    P. Auer, N. Cesa-Bianchi and P. Fischer, "Finite-time Analysis of the Multiarmed Bandit Problem," Machine Learning, vol. 47, no. 2, pp. 235-256, 1 May 2002.
    P. Auer, N. Cesa-Bianchi, Y. Freund and R. E. Schapire, "Gambling in a rigged casino: The adversarial multi-armed bandit problem," in Proceedings of IEEE 36th Annual Foundations of Computer Science, 1995.
    C. B. Browne, E. Powley, D. Whitehouse, S. M. Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis and S. Colton, "A Survey of Monte Carlo Tree Search Methods," IEEE Transactions on Computational Intelligence and AI in Games, vol. 4, no. 1, pp. 1-43, Mar. 2012.
    M. H. M. Winands, Y. Björnsson and J.-T. Saito, "Monte-Carlo Tree Search Solver," in Computers and Games, Berlin, Heidelberg, Springer Berlin Heidelberg, 2008, pp. 25-36.
    H. Baier and M. H. M. Winands, "Monte-Carlo Tree Search and minimax hybrids," in 2013 IEEE Conference on Computational Inteligence in Games (CIG), Niagara Falls, ON, Canada, 2013.
    H. Baier and M. H. M. Winands, "MCTS-minimax hybrids," IEEE Transactions on Computational Intelligence and AI in Games, vol. 7, no. 2, pp. 167-179, Jun. 2015.
    G.-C. Pascutto, "Leela-Zero," GitHub repository, 2018.
    Q. Zeng, J. Zhang, Z. Zeng, Y. Li, M. Chen and S. Liu, "PhoenixGo," GitHub repository, 2018.
    Y. Tian, Q. Gong, W. Shang, Y. Wu and C. L. Zitnick, "ELF: An extensive, lightweight and flexible research platform for real-time strategy games," in Advances in Neural Information Processing Systems, 2017.
    Y. Tian, J. Ma, Q. Gong, S. Sengupta, Z. Chen, J. Pinkerton and C. L. Zitnick, "ELF OpenGo: An Analysis and Open Reimplementation of AlphaZero," CoRR, vol. abs/1902.04522, 2019.
    N. Ueda and T. Nagao, "NP-completeness Results for NONOGRAM via Parsimonious Reductions," Technical Report TR96-0008, Tokyo Japan, 1996.
    K. J. Batenburg, S. J. Henstra, W. A. Kosters and W. J. Palenstijn, "Constructing Simple Nonograms of Varying Difficulty," PU.M.A. Pure Mathematics and Applications, vol. 20, pp. 1-15, January 2009.
    S.-J. Yen, T.-C. Su, S.-Y. Chiu and J.-C. Chen, "Optimization of Nonogram's Solver by Using an Efficient Algorithm," in 2010 International Conference on Technologies and Applications of Artificial Intelligence, Hsinchu, Taiwan, 2010.
    T. H. Cormen, C. E. Leiserson, R. L. Rivest and C. Stein, Introduction to Algorithms, 3rd ed., Cambridge, MA, USA: MIT Press, 2009.
    L.-P. Chen, C.-Y. Hung and Y.-C. Liu, "A New Simplified Line Solver for Nonogram Puzzle Games," in 2017 TCGA Computer Game Workshop (TCGA 2017), Penhu, Taiwan, 2017.
    K. J. Batenburg and W. A. Kosters, "Solving Nonograms by combining relaxations," Pattern Recognition, vol. 42, no. 8, pp. 1672-1683, 2009.
    D. Cohen, P. Jeavons and M. Gyssens, "A unified theory of structural tractability for constraint satisfaction problems," Journal of Computer and System Sciences, vol. 74, no. 5, pp. 721-743, 2008.
    A. Biere, M. Heule, H. van Maaren and T. Walsh, Handbook of Satisfiability: Volume 185 Frontiers in Artificial Intelligence and Applications, Amsterdam, The Netherlands: IOS Press, 2009.
    "Contraposition - Wikipedia," [Online]. Available: https://en.wikipedia.org/wiki/Contraposition. [Accessed 9 June 2019].
    K.-C. Huang, J.-J. Yeh, W.-C. Huang, Y.-R. Guo and L.-P. Chen, "Exploring Effects of Fully Probing Sequence on Solving Nonogram Puzzles," ICGA Journal, vol. 40, no. 4, pp. 397-405, November 2018.
    L.-P. Chen and K.-C. Huang, "Solving Nonogram puzzles by using group-based fully probing," ICGA Journal, vol. 40, no. 4, pp. 387-396, 2018.
    F. Bacchus and P. van Run, "Dynamic variable ordering in CSPs," in Principles and Practice of Constraint Programming --- CP '95, Berlin, Heidelberg, Springer Berlin Heidelberg, 1995, pp. 258-275.
    "Bitboard Serialization - Chessprogramming wiki," [Online]. Available: https://www.chessprogramming.org/Bitboard_Serialization. [Accessed 10 June 2019].
    "Bit Manipulation Instruction Sets - Wikipedia," [Online]. Available: https://en.wikipedia.org/wiki/Bit_Manipulation_Instruction_Sets. [Accessed 10 June 2019].
    "TCGA Computer Game Tournaments 2017," [Online]. Available: https://www.tcga.tw/taai2017. [Accessed 10 June 2019].
    "ICGA Computer Olympiad 2018," [Online]. Available: https://www.tcga.tw/icga-computer-olympiad-2018. [Accessed 10 June 2019].
    "TCGA Computer Game Tournaments 2018," [Online]. Available: https://www.tcga.tw/taai2018. [Accessed 10 June 2019].
    "TCGA Computer Game Tournaments 2019," [Online]. Available: https://tcga.tw/tournament2019. [Accessed 16 June 2019].
    Y.-C. Chen and S.-S. Lin, "A Fast Nonogram Solver that Won the TAAI 2017 and ICGA 2018 tournaments," ICGA Journal, vol. 41, no. 1, pp. 2-14, 14 June 2019.

    下載圖示
    QR CODE