| Field | Value |
|---|---|
| Author | 邱宣凱 Chiu, Hsuan-Kai |
| Title | 開放式學習應用於優化多目標的連子棋類遊戲 (Multi-Objective Optimization Based on Open-ended Learning Applies to Connection Games) |
| Advisor | 林順喜 Shun-Shii Lin |
| Committee members | 林順喜 Shun-Shii Lin; 吳毅成 I-Chen Wu; 周信宏 Hsin-Hung Chou; 陳志昌 Jr-Chang Chen; 顏士淨 Shi-Jim Yen |
| Oral defense date | 2024/07/01 |
| Degree | Master (碩士) |
| Department | Department of Computer Science and Information Engineering (資訊工程學系) |
| Year of publication | 2024 |
| Academic year | 112 |
| Language | Chinese |
| Pages | 50 |
| Keywords (Chinese) | 連子棋、AlphaZero、開放式學習 |
| Keywords (English) | Connection Games, AlphaZero, Open-ended Learning |
| Research method | Experimental design (實驗設計法) |
| DOI | http://doi.org/10.6345/NTNU202401569 |
| Document type | Academic thesis |
Open-ended learning is an AI approach proposed by Google DeepMind in 2021. Unlike conventional AI, which optimizes a single task as far as possible, an open-ended learning agent is designed to perform many different tasks, taking multi-objective optimization as its goal. Because the concept is still very new, the related literature is relatively sparse and practical implementations remain at an exploratory stage. This study therefore uses relatively familiar techniques and game rules to implement an AI similar or identical in spirit to open-ended learning.
A connection game is a two-player board game in which the players take turns placing stones on a Go-style board; the first player to connect a specified number of their own stones horizontally, vertically, or diagonally wins. This study uses five-, four-, and three-in-a-row variants, reducing the board size along with the winning target (see the sketch below).
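As a concrete illustration of this rule family, the following minimal sketch (a hypothetical helper, not code from the thesis) tests the shared n-in-a-row winning condition; across the three variants only the target n and the board size change.

```python
import numpy as np

def has_n_in_a_row(board: np.ndarray, player: int, n: int) -> bool:
    """Return True if `player` has `n` consecutive stones on `board`
    horizontally, vertically, or diagonally. Empty cells are 0."""
    rows, cols = board.shape
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]  # right, down, two diagonals
    for r in range(rows):
        for c in range(cols):
            if board[r, c] != player:
                continue
            for dr, dc in directions:
                count, rr, cc = 0, r, c
                # Walk along the direction while the stones stay connected.
                while 0 <= rr < rows and 0 <= cc < cols and board[rr, cc] == player:
                    count += 1
                    if count == n:
                        return True
                    rr, cc = rr + dr, cc + dc
    return False

# The same predicate covers all three variants studied here:
# has_n_in_a_row(board, p, 5), has_n_in_a_row(board, p, 4), has_n_in_a_row(board, p, 3)
```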
Since the training data for an open-ended learning AI is generated by the program itself, this study adopts alpha-zero-general, which produces training data through self-play, as the core of our open-ended learning AI. We modify the self-play component of alpha-zero-general so that the trained AI acquires the ability to play under multiple rule sets.
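The modification can be pictured as follows. This is a hedged sketch under stated assumptions, not the thesis code: alpha-zero-general normally self-plays one fixed game, whereas here each episode samples one rule variant, so the training examples mix all target tasks and a single network is pushed toward all of them at once. The function names and board sizes below are illustrative assumptions.

```python
import random

# Illustrative rule variants (the board sizes are assumptions, not the thesis's values).
VARIANTS = [
    {"board_size": 9, "n_in_a_row": 5},  # five-in-a-row on a reduced board
    {"board_size": 7, "n_in_a_row": 4},  # four-in-a-row
    {"board_size": 5, "n_in_a_row": 3},  # three-in-a-row
]

def generate_training_data(num_episodes, play_one_episode):
    """Collect training examples across all rule variants.

    `play_one_episode(variant)` stands in for something like
    alpha-zero-general's Coach.executeEpisode(): it self-plays one game
    under `variant`'s rules and returns its (state, policy, value) examples.
    """
    examples = []
    for _ in range(num_episodes):
        variant = random.choice(VARIANTS)  # sample one task per episode
        examples.extend(play_one_episode(variant))
    return examples
```

Sampling a task per episode keeps the example pool mixed across objectives, which matches the multi-objective aim described above.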
黃德彥 (2004). 五子棋相關棋類人工智慧之研究 [Research on artificial intelligence for Gomoku-related games]. Master's thesis, Institute of Computer Science and Engineering, National Chiao Tung University.
D. Silver et al., “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm,” arXiv:1712.01815, Dec. 2017.
Open-Ended Learning Team, “Open-Ended Learning Leads to Generally Capable Agents,” arXiv:2107.12808, Jul. 2021.
Open-Ended Learning Team, “Generally capable agents emerge from open-ended play,” DeepMind Blog. [Online]. Available: https://deepmind.com/blog/article/generally-capable-agents-emerge-from-open-ended-play
M. Jaderberg, W. M. Czarnecki, I. Dunning, T. Graepel, and L. Marris, “Capture the Flag: the emergence of complex cooperative agents,” DeepMind Blog. [Online]. Available: https://www.deepmind.com/blog/capture-the-flag-the-emergence-of-complex-cooperative-agents
suragnair, “alpha-zero-general,” GitHub repository. [Online]. Available: https://github.com/suragnair/alpha-zero-general
陳昌裕 (2013). 五子棋新棋規與五~七路五子棋勝負問題之研究 [Research on new Gomoku rules and the win/loss problem of five-in-a-row on 5×5 to 7×7 boards]. Master's thesis, Graduate Institute of Computer Science and Engineering, National Taiwan Normal University.
IJCAI International Joint Conference on Artificial Intelligence, ISSN 1045-0823, vol. 2019-August, pp. 4704–4710.