Graduate Student: 陳志宏 Chen, Chih-Hung
Thesis Title: 在下棋與訓練階段改進AlphaZero演算法 Improving the AlphaZero Algorithm in the Playing and Training Phases
Advisor: 林順喜 Lin, Shun-Shii
Oral Defense Committee: 許舜欽 Hsu, Shun-Chin; 吳毅成 Wu, I-Chen; 顏士淨 Yen, Shi-Jim; 陳志昌 Chen, Jr-Chang; 周信宏 Chou, Hsin-Hung; 張紘睿 Chang, Hung-Jui; 林順喜 Lin, Shun-Shii
Oral Defense Date: 2021/08/25
Degree: Doctor (Ph.D.)
Department: 資訊工程學系 Department of Computer Science and Information Engineering
Publication Year: 2021
Graduation Academic Year: 109 (ROC calendar)
Language: English
Number of Pages: 115
English Keywords: AlphaZero-miniMax Hybrids, Proven-mark strategy, Quick-win strategy, Best-win strategy, Threat-space-reduction, Big-win strategy, Multistage-training strategy
Research Methods: experimental design, survey research, comparative research
DOI URL: http://doi.org/10.6345/NTNU202101339
Thesis Type: Academic thesis
Abstract:
AlphaZero has achieved great success across many challenging games, but it requires enormous computational power to train a strong model. Rather than investing ever more resources, we focus on improving the performance of the AlphaZero algorithm itself. In this work, we introduce seven major enhancements to AlphaZero. First, the AlphaZero-miniMax Hybrids strategy combines the modern AlphaZero approach with traditional search algorithms to improve the strength of the program. Second, the Proven-mark strategy prunes unneeded moves, avoiding the re-sampling problem and increasing the opportunity to explore promising moves. Third, the Quick-win strategy grades rewards according to the length of the game-tree search, rather than treating all wins (or losses) equally. Fourth, the Best-win strategy resolves an inaccurate win-rate problem by backing up the best reward rather than the average. Fifth, the Threat-space-reduction strategy improves the performance of neural-network training under limited resources. Sixth, the Big-win strategy takes the final score into consideration instead of a simple win/loss/draw label. Finally, the Multistage-training strategy improves the quality of the neural network for multistage games. After years of work, we have obtained promising results that already improve the performance of the AlphaZero algorithm on several test domains.
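To make these ideas concrete, the sketch below illustrates two of the strategies on a toy MCTS node: Quick-win's length-graded rewards and Best-win's best-reward (rather than average) backup. This is a minimal illustration assuming a standard AlphaZero-style node; the names (Node, quick_win_reward, best_win_value) and constants (the 0.5 reward floor) are hypothetical and are not taken from the thesis's actual implementation.

```python
# Minimal sketch of the Quick-win and Best-win ideas described in the
# abstract. All names and constants here are illustrative assumptions,
# not the thesis's published code.

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """An MCTS node; values are from the perspective of the player to move."""
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0
    proven_value: Optional[float] = None  # set when game-tree search proves the node

    def average_value(self) -> float:
        """Vanilla AlphaZero backup: the visit-weighted average reward."""
        return self.value_sum / self.visits if self.visits else 0.0


def quick_win_reward(winner: int, to_play: int, ply: int, max_ply: int) -> float:
    """Quick-win: grade terminal rewards by game length so a faster win
    (or a longer loss) earns more, instead of a flat +1/-1."""
    magnitude = 1.0 - 0.5 * (ply / max_ply)  # decays from 1.0 toward an assumed floor of 0.5
    return magnitude if winner == to_play else -magnitude


def best_win_value(node: Node) -> float:
    """Best-win: once search has proven some children, report the best
    proven reward for the mover instead of the visit average, which can
    misstate a position that is already decided."""
    proven = [c.proven_value for c in node.children if c.proven_value is not None]
    if proven:
        # A child's value is from the opponent's perspective, so negate it.
        return max(-v for v in proven)
    return node.average_value()
```

In this framing, the Proven-mark strategy would roughly amount to discarding children whose proven_value shows them to be losing for the mover, so that later simulations are not re-sampled into already-refuted branches.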