National Taiwan Normal University Theses and Dissertations Full-Text System

Student:	饒鏞 Jao, Yung
Thesis title:	Combining MuZero Algorithm with Consecutive Winning Moves to Improve the Outer-Open Gomoku Program (MuZero 演算法結合連續獲勝走步改良外圍開局五子棋程式)
Advisor:	林順喜 Lin, Shun-Shii
Oral defense committee:	許舜欽 Hsu, Shun-Chin; 吳毅成 Wu, I-Chen; 顏士淨 Yen, Shi-Jim; 陳志昌 Chen, Jr-Chang; 張紘睿 Chang, Hung-Jui; 林順喜 Lin, Shun-Shii
Defense date:	2022/08/03
Degree:	Master
Department:	Department of Computer Science and Information Engineering
Year of publication:	2022
Academic year of graduation:	110 (ROC academic year, 2021-2022)
Language:	Chinese
Number of pages:	47
Keywords:	MuZero, Neural Network, Threat-Space Search, Consecutive Winning Moves
Research methods:	Experimental design, comparative study
DOI URL:	http://doi.org/10.6345/NTNU202201075
Thesis type:	Academic thesis

In 2019, the MuZero algorithm developed by DeepMind used "zero-knowledge" learning to bring artificial intelligence into more general research domains. The original Outer-Open Gomoku program in Muzero-general, which is built on this algorithm, evaluates only the game's terminal state when training its model, which adds considerable uncertainty to training. This study therefore attempts to improve that Outer-Open Gomoku program with consecutive winning moves.

Threat moves are a very important means of winning in Outer-Open Gomoku, and consecutive winning moves are the winning moves obtained by correctly applying threat moves. Building on the principle of consecutive winning moves, this study designs three improvement methods. They differ in whether consecutive winning moves found by threat-space search are supplied during play, and in how different threat-space search designs are combined with rewards for consecutive winning moves in different situations.

Experimental results show that, under the same training time, all three methods successfully improve on the original version. The strongest is the second method, whose threat-space search design adds active offensive moves.
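The abstract describes consecutive winning moves as the winning moves that follow from correctly chaining threat moves. A well-known concrete instance in Gomoku programs is a VCF (victory by continuous fours) search: the attacker plays only moves that create a four, so every defender reply is forced, until a five or an unstoppable double threat is reached. The sketch below is a minimal, hypothetical illustration of that idea and is not the thesis's threat-space search or Muzero-general code. It assumes freestyle rules (five or more in a row wins), considers fours only (no threes), conservatively gives up whenever the defender already has a four, and ignores the Outer-Open opening restriction. The names `vcf`, `winning_squares`, and `is_five` are illustrative.

```python
from typing import List, Optional, Tuple

EMPTY, BLACK, WHITE = 0, 1, 2
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]  # horizontal, vertical, two diagonals

Board = List[List[int]]
Move = Tuple[int, int]


def is_five(board: Board, r: int, c: int, player: int) -> bool:
    """True if the stone at (r, c) belongs to a line of five or more of `player`."""
    n = len(board)
    for dr, dc in DIRECTIONS:
        count = 1
        for sign in (1, -1):  # walk both ways along the line
            rr, cc = r + sign * dr, c + sign * dc
            while 0 <= rr < n and 0 <= cc < n and board[rr][cc] == player:
                count += 1
                rr, cc = rr + sign * dr, cc + sign * dc
        if count >= 5:
            return True
    return False


def winning_squares(board: Board, player: int) -> List[Move]:
    """Empty squares where `player` would complete five with one more stone."""
    n = len(board)
    wins = []
    for r in range(n):
        for c in range(n):
            if board[r][c] == EMPTY:
                board[r][c] = player  # try the move, then undo it
                if is_five(board, r, c, player):
                    wins.append((r, c))
                board[r][c] = EMPTY
    return wins


def vcf(board: Board, player: int, depth: int) -> Optional[List[Move]]:
    """Hypothetical VCF search: attacker moves that win by continuous fours.

    Returns the attacker's moves (the last one completes five or leaves two
    winning squares), or None if no sequence is found within `depth` fours.
    The board is restored to its original state before returning.
    """
    opponent = WHITE if player == BLACK else BLACK
    wins = winning_squares(board, player)
    if wins:
        return [wins[0]]  # immediate five
    if winning_squares(board, opponent) or depth == 0:
        return None  # defender already has a four (simplification) or out of depth
    n = len(board)
    for r in range(n):
        for c in range(n):
            if board[r][c] != EMPTY:
                continue
            board[r][c] = player
            threats = winning_squares(board, player)
            result = None
            if len(threats) >= 2:
                result = [(r, c)]  # open four or double four: only one can be blocked
            elif len(threats) == 1:
                br, bc = threats[0]
                board[br][bc] = opponent  # the defender's forced block
                rest = vcf(board, player, depth - 1)
                board[br][bc] = EMPTY
                if rest is not None:
                    result = [(r, c)] + rest
            board[r][c] = EMPTY
            if result is not None:
                return result
    return None


if __name__ == "__main__":
    board = [[EMPTY] * 9 for _ in range(9)]
    for col in (2, 3, 4):
        board[4][col] = BLACK  # an open three for Black on row 4
    print(vcf(board, BLACK, depth=3))  # [(4, 1)]: this stone makes an open four
```

Restricting the attacker to fours keeps every defender reply unique, which is what keeps this search cheap. A fuller threat-space search in the sense of Allis also considers threes, whose defenses are not unique, so the defender's options must be branched on as well.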

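Chapter 1 Introduction
1. Research Background
2. Research Objectives
3. Research Significance
Chapter 2 Literature Review
1. MuZero
2. Gomoku
3. Threat-Space Search
4. Related Programs
Chapter 3 Methods and Procedures
1. Muzero-general
2. Method 1: Key Double-Threat Reward Method
3. Method 2: Combining Double Threats with Monte Carlo Tree Search
4. Method 3: Consecutive Threat Reward Method
Chapter 4 Experiments and Results
1. Experimental Results of Method 1: Key Double-Threat Reward Method
2. Experimental Results of Method 2: Combining Double Threats with Monte Carlo Tree Search
3. Experimental Results of Method 3: Consecutive Threat Reward Method
4. Experimental Results of Methods 2 and 3 against Yixin
Chapter 5 Conclusions and Future Work
References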
                                

[1] IBM.com, IBM research pages on Deep Blue, https://www.research.ibm.com/deepblue/.
[2] D. Silver, A. Huang, C. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, D. Hassabis, "Mastering
the game of Go with deep neural networks and tree search," Nature, vol.
529(7587), pp.484-489, 2016.
[3] DeepMind, Google DeepMind Challenge Match: Lee Sedol vs AlphaGo, https://www.deepmind.com/research/highlighted-research/alphago/the-challenge-match.
[4] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel, and D. Hassabis (2017). "Mastering the game of Go without human knowledge," Nature, vol. 550(7676), pp.354-359, 2017.
[5] Schrittwieser, I. Antonoglou, T. Hubert, K. Simonyan, L. Sifre, S. 46 Schmitt, A. Guez, E. Lockhart, D. Hassabis, T. Graepel, T. Lillicrap, D. Silver, "Mastering atari, go, chess and shogi by planning with a learned model," Nature, vol. 588(7839), pp. 604–609, 2020.
[6] C. H. Chen, W. L. Wu, Y. H. Chen, and S. S. Lin, "Some improvements in Monte Carlo tree search algorithms for sudden death games," ICGA Journal, vol. 40, no. 4, pp. 460–470, 2019.
[7] H. Shuai, and H. He, "Online scheduling of a residential microgrid via Monte-Carlo tree search and a learned model," IEEE Transactions on Smart Grid, vol. 12(2),pp.1073-1087, 2021.
[8] L. V. Allis, "Searching for solutions in games and artificial intelligence," Ph.D. thesis, University of Limburg, Maastricht, The Netherlands, 1994.
[9] L. V. Allis, H. J. Herik, and M. P. H. Huntjens, "Go-Moku solved by new search techniques," Computational Intelligence, vol. 12, no. 1, pp. 7–23, 1996.
[10] S. S. Lin, and C. Y. Chen, "How to rescue Gomoku? The introduction
of Lin's new rule," (in Chinese) The 2012 Conference on Technologies and Applications of Artificial Intelligence (TAAI 2012), Tainan, Taiwan, 2012.
[11] L. V. Allis, H. van den Herik, and M. P. H. Huntjens, “Go-moku solved 47
by new search techniques,” Computation Intelligence, vol. 12, pp. 7-23, 1996.
[12] 劉浩萱，AlphaZero 演算法結合快贏策略或迫著空間實現於五子棋。國立臺灣師範大學碩士論文，2020。
[13] Muzero-general, https://github.com/werner-duvaud/muzero-general.
[14] Yixin, https://www.aiexp.info/pages/yixin.html.

簡易檢索 / 詳目顯示

相關論文