| Graduate Student | 陳冠綸 Chen, Kuan-Lun |
|---|---|
| Thesis Title | 多智能體強化學習中的適當橢圓影響範圍分析 Proper Influence Ellipse Analysis in Multi-Agent Reinforcement Learning |
| Advisors | 陳建隆 Chern, Jann-Long; 黃志煒 Huang, Chih-Wei |
| Oral Defense Committee | 陳建隆 Chern, Jann-Long; 黃志煒 Huang, Chih-Wei; 林政宏 Lin, Cheng-Hung; 陳志有 Chen, Zhi-You |
| Oral Defense Date | 2022/08/12 |
| Degree | Master |
| Department | 數學系 Department of Mathematics |
| Year of Publication | 2022 |
| Academic Year | 110 (2021–2022) |
| Language | English |
| Pages | 22 |
| Keywords | Multi-agent reinforcement learning, Confidence ellipse, StarCraft Multi-Agent Challenge |
| Research Method | Experimental design |
| DOI URL | http://doi.org/10.6345/NTNU202201386 |
| Thesis Type | Academic thesis |
In recent years, reinforcement learning has been applied more and more widely within machine learning. The aim is for an agent that interacts with its environment under certain rules to learn good policies and behavior patterns, so that it achieves its goals more effectively or gives users a better experience. A single agent's interaction with the whole environment is limited, however, so it often needs to cooperate with other agents and obtain their information in order to understand the environment better and choose more effective policies. Yet having one agent observe the entire environment, or exchange information with every agent in it, consumes a great deal of time and resources and is impractical. This raises the question: can a communication mechanism be built that lets agents communicate with one another within a suitably chosen range, so that they exchange messages only within that range and still achieve their goals effectively?
In this thesis, we use the property of exponential decay to show that, under the same policy, the influence two agents exert on each other decays exponentially as the distance between them increases. We then represent the team's formation at each time step by a confidence ellipse and derive a suitable communication-distance model from the lengths of the ellipse's major and minor axes together with the distances between teammates. The experiments are built on the StarCraft Multi-Agent Challenge (SMAC), and training quality is judged by win rate. A model trained with a suitable communication range shows a faster increase in win rate than models that either cannot communicate or lack a suitable communication distance.
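The abstract describes the method only at a high level; as a rough illustration, the sketch below shows one plausible way to fit a confidence ellipse to a team's positions, turn its axes into a communication radius, and weight pairwise influence by an exponentially decaying function of distance. The function names, the `scale` and `decay` parameters, and the 95% chi-square value are illustrative assumptions, not the thesis's actual model.

```python
import numpy as np

def confidence_ellipse_axes(positions, chi2_val=5.991):
    """Half-lengths of the major/minor axes of the confidence ellipse
    fitted to the agents' 2-D positions (one row per agent).
    chi2_val = 5.991 is the 95% quantile of a chi-square with 2 d.o.f."""
    cov = np.cov(positions, rowvar=False)        # 2x2 sample covariance of x, y
    eigvals, _ = np.linalg.eigh(cov)             # eigenvalues in ascending order
    minor, major = np.sqrt(chi2_val * eigvals)   # half-axis lengths
    return major, minor

def communication_range(positions, scale=1.0):
    """Hypothetical rule: communication radius proportional to the mean of
    the ellipse's half-axes (`scale` is an assumed tuning knob)."""
    major, minor = confidence_ellipse_axes(positions)
    return scale * (major + minor) / 2.0

def influence_weight(dist, decay=1.0):
    """Influence between two agents, assumed to decay exponentially with
    their distance, as the abstract describes."""
    return np.exp(-decay * dist)

def communication_mask(positions, scale=1.0):
    """Boolean matrix saying which pairs of agents may exchange messages."""
    r = communication_range(positions, scale)
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    return d <= r

if __name__ == "__main__":
    team = np.array([[0.0, 0.0], [1.5, 0.5], [3.0, 1.0], [0.5, 2.0]])
    print(communication_range(team))
    print(communication_mask(team))
    print(influence_weight(2.0))
```

In an actual SMAC training loop, a mask of this kind would presumably gate which teammates' messages enter each agent's observation; the thesis evaluates the effect of such gating through the resulting win rate.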