| Graduate Student | 陳冠綸 Chen, Kuan-Lun |
|---|---|
| Thesis Title | 多智能體強化學習中的適當橢圓影響範圍分析 Proper Influence Ellipse Analysis in Multi-Agent Reinforcement Learning |
| Advisors | 陳建隆 Chern, Jann-Long; 黃志煒 Huang, Chih-Wei |
| Oral Defense Committee | 陳建隆 Chern, Jann-Long; 黃志煒 Huang, Chih-Wei; 林政宏 Lin, Cheng-Hung; 陳志有 Chen, Zhi-You |
| Oral Defense Date | 2022/08/12 |
| Degree | Master |
| Department | 數學系 Department of Mathematics |
| Year of Publication | 2022 |
| Academic Year | 110 (2021–2022) |
| Language | English |
| Pages | 22 |
| Keywords | Multi-agent reinforcement learning, Confidence ellipse, StarCraft Multi-Agent Challenge |
| Research Method | Experimental design |
| DOI URL | http://doi.org/10.6345/NTNU202201386 |
| Thesis Type | Academic thesis |
In recent years, reinforcement learning has been applied more and more widely within machine learning. The aim is for an agent that interacts with its environment under certain rules to learn good policies and behavior patterns, so that it achieves its goals more effectively or gives users a better experience. A single agent's interaction with the whole environment is limited, however, so it often needs to cooperate with other agents and obtain their information in order to understand the environment better and choose more effective policies. Yet having one agent observe the entire environment, or exchange information with every agent in it, consumes a great deal of time and resources and is impractical. This raises the question: can a communication mechanism be built that lets agents communicate with one another within a suitably chosen range, so that they exchange messages only within that range and still achieve their goals effectively?
In this thesis, we use the property of exponential decay to show that, under the same policy, the influence two agents exert on each other decays exponentially as the distance between them increases. We then represent the team's formation at each time step by a confidence ellipse and derive a suitable communication-distance model from the lengths of the ellipse's major and minor axes together with the distances between teammates. The experiments are built on the StarCraft Multi-Agent Challenge (SMAC), and training quality is judged by win rate. A model trained with a suitable communication range shows a faster increase in win rate than models that either cannot communicate or lack a suitable communication distance.
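The abstract describes the method only at a high level; as a rough illustration, the sketch below shows one plausible way to fit a confidence ellipse to a team's positions, turn its axes into a communication radius, and weight pairwise influence by an exponentially decaying function of distance. The function names, the `scale` and `decay` parameters, and the 95% chi-square value are illustrative assumptions, not the thesis's actual model.

```python
import numpy as np

def confidence_ellipse_axes(positions, chi2_val=5.991):
    """Half-lengths of the major/minor axes of the confidence ellipse
    fitted to the agents' 2-D positions (one row per agent).
    chi2_val = 5.991 is the 95% quantile of a chi-square with 2 d.o.f."""
    cov = np.cov(positions, rowvar=False)        # 2x2 sample covariance of x, y
    eigvals, _ = np.linalg.eigh(cov)             # eigenvalues in ascending order
    minor, major = np.sqrt(chi2_val * eigvals)   # half-axis lengths
    return major, minor

def communication_range(positions, scale=1.0):
    """Hypothetical rule: communication radius proportional to the mean of
    the ellipse's half-axes (`scale` is an assumed tuning knob)."""
    major, minor = confidence_ellipse_axes(positions)
    return scale * (major + minor) / 2.0

def influence_weight(dist, decay=1.0):
    """Influence between two agents, assumed to decay exponentially with
    their distance, as the abstract describes."""
    return np.exp(-decay * dist)

def communication_mask(positions, scale=1.0):
    """Boolean matrix saying which pairs of agents may exchange messages."""
    r = communication_range(positions, scale)
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    return d <= r

if __name__ == "__main__":
    team = np.array([[0.0, 0.0], [1.5, 0.5], [3.0, 1.0], [0.5, 2.0]])
    print(communication_range(team))
    print(communication_mask(team))
    print(influence_weight(2.0))
```

In an actual SMAC training loop, a mask of this kind would presumably gate which teammates' messages enter each agent's observation; the thesis evaluates the effect of such gating through the resulting win rate.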