Author: |
張乃元 Chang, Nai-Yuan |
Thesis Title: |
改進AlphaZero的大贏策略並應用於黑白棋 The Big Win Strategy: An improvement over AlphaZero approach for Othello |
Advisor: |
Lin, Shun-Shii |
Degree: |
碩士 Master |
Department: |
資訊工程學系 Department of Computer Science and Information Engineering |
Thesis Publication Year: | 2019 |
Academic Year: | 107 |
Language: | Chinese |
Number of pages: | 52 |
Keywords (translated from Chinese): | computer games, Othello, Monte Carlo method, neural networks, deep learning |
Thesis Type: | Academic thesis/ dissertation |
DeepMind's AlphaZero algorithm has achieved great success in computer game playing and has surpassed human performance in many challenging games, but we believe the algorithm still has room for improvement.
The AlphaZero algorithm estimates only whether a game will be won or lost and ignores how many points may be obtained by the end. In a territory-based game such as Go or Othello, the final score margin can be quite large. We therefore propose the Big Win Strategy, which adds score estimation to the AlphaZero algorithm in order to improve its efficiency.
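The difference between a win/loss signal and a score-aware signal can be illustrated with a minimal Python sketch. The abstract does not say how the score is incorporated, so everything here is an assumption made for illustration: the board encoding (+1, -1, 0), the helper names, and in particular `margin_target`, which simply scales the final disc difference to [-1, 1]. It is not the thesis's actual method.

```python
import numpy as np

BOARD_SIZE = 8  # the thesis experiments use 8x8 Othello


def disc_difference(board: np.ndarray, player: int) -> int:
    """Discs of `player` minus discs of the opponent.

    Assumed encoding (illustrative): +1 and -1 mark the two sides, 0 is empty.
    """
    return int(player * board.sum())


def win_loss_target(board: np.ndarray, player: int) -> float:
    """Win/loss-only outcome: +1 for a win, -1 for a loss, 0 for a draw."""
    return float(np.sign(disc_difference(board, player)))


def margin_target(board: np.ndarray, player: int) -> float:
    """Hypothetical score-aware target: disc difference scaled to [-1, 1]."""
    return disc_difference(board, player) / (BOARD_SIZE * BOARD_SIZE)


def make_final_board(own: int, opp: int) -> np.ndarray:
    """Build an artificial final position with the given disc counts."""
    flat = np.zeros(BOARD_SIZE * BOARD_SIZE, dtype=int)
    flat[:own] = 1
    flat[own:own + opp] = -1
    return flat.reshape(BOARD_SIZE, BOARD_SIZE)


narrow_win = make_final_board(33, 31)  # wins by 2 discs
big_win = make_final_board(50, 14)     # wins by 36 discs

for name, board in [("narrow", narrow_win), ("big", big_win)]:
    print(name, win_loss_target(board, 1), margin_target(board, 1))
```

Under these assumptions both positions receive the same win/loss value of 1.0, while the margin-based value separates them (0.03125 versus 0.5625), which is the kind of information a win/loss-only estimate discards.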
In this thesis, we use 8x8 Othello as the test game for the Big Win Strategy. Our experiments use a modified version of alpha-zero-general, an open-source implementation of the AlphaZero algorithm. After 100 iterations, the model using the Big Win Strategy achieves a 78% winning rate against the original AlphaZero model, which shows that the Big Win Strategy significantly improves the AlphaZero algorithm.
[1] Silver, D., Huang, A., Maddison, C., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T. and Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), pp.484-489.
[2] Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., Chen, Y., Lillicrap, T., Hui, F., Sifre, L., van den Driessche, G., Graepel, T. and Hassabis, D. (2017). Mastering the game of Go without human knowledge. Nature, 550(7676), pp.354-359.
[3] Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K. and Hassabis, D. (2018). Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. [Accessed 10 Jul. 2018].
[4] Knuth, D. E. and Moore, R. W. (1975). An analysis of alpha-beta pruning. Artificial Intelligence, 6, pp.293-326.
[5] Browne, C., Powley, E., Whitehouse, D., Lucas, S., Cowling, P., Rohlfshagen, P., Tavener, S., Perez, D., Samothrakis, S. and Colton, S. (2012). A Survey of Monte Carlo Tree Search Methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1), pp.1-43.
[6] Coulom, R. (2007). Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. In: van den Herik, H.J., Ciancarini, P., Donkers, H.H.L.M. (eds) Computers and Games. CG 2006. Lecture Notes in Computer Science, vol 4630. Springer, Berlin, Heidelberg.
[7] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Deep Residual Learning for Image Recognition, arXiv:1512.03385v1 [cs.CV] 10 Dec 2015.
[8] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Identity Mappings in Deep Residual Networks, arXiv:1603.05027v3 [cs.CV] 25 Jul 2016.
[9] Convolutional neural network, Convolutional_neural_network
[10] Project of AlphaZero General, alpha-zero-general.
[11] A Simple Alpha(Go) Zero Tutorial, alphazero.html.
[12] Learning to Play Othello Without Human Knowledge,
[13] Nai-Yuan Chang, Chih-Hung Chen, Shun-Shii Lin, Surag Nair, The Big Win Strategy on Multi-Value Network: An Improvement over AlphaZero Approach for 6x6 Othello, MLMI 2018: Proceedings of the 2018 International Conference on Machine Learning and Machine Intelligence, pp.78-81.