| Field | Value |
|---|---|
| Graduate Student | 江瑞飛 Rifky Afriza |
| Thesis Title | 新型態合作型深度強化學習方法用於多智能個體協作任務 (A Novel Cooperative Deep Reinforcement Learning to Learn How to Communicate in Multi-Agent Cooperative Tasks) |
| Advisors | 包傑奇 Jacky Baltes; 薩義德 Saeed Saeedvand |
| Oral Defense Committee | 包傑奇 Jacky Baltes; 薩義德 Saeed Saeedvand; 陳瑄易 Chen, Syuan-Yi; 李祖聖 Li, Tzuu-Hseng |
| Defense Date | 2024/07/01 |
| Degree | Master (碩士) |
| Department | 電機工程學系 Department of Electrical Engineering |
| Publication Year | 2024 |
| Graduation Academic Year | 112 |
| Language | English |
| Number of Pages | 39 |
| Chinese Keywords | 多智能體強化學習、深度強化學習 |
| English Keywords | Multi-Agent Reinforcement Learning, Deep Reinforcement Learning |
| Research Methods | Experimental design; action research |
| DOI URL | http://doi.org/10.6345/NTNU202400884 |
| Thesis Type | Academic thesis (學術論文) |
Multi-Agent Reinforcement Learning (MARL) faces formidable challenges in cooperative tasks because of the expansive joint state space. Independent approaches such as Independent Proximal Policy Optimization (IPPO) lack awareness of other agents, while centralized methods such as Multi-Agent Proximal Policy Optimization (MAPPO) rely on centralized learning with decentralized policies. This study introduces a novel communication-centric approach in which agents encode their state information alongside action messages, creating dynamic channels of information exchange. By facilitating information exchange among agents, the approach bridges the gap between individual decision-making and collaborative task completion. Empirical evaluations demonstrate that the method improves convergence and performance across diverse cooperative MARL scenarios, pushing the boundaries of decentralized policy learning within a centralized framework.
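To make the communication-centric idea in the abstract concrete, the following is a minimal sketch, not the thesis's actual architecture: each agent encodes its local observation into a message vector, and each agent's policy conditions on its own observation plus the messages broadcast by the other agents. All module names, network sizes, and dimensions here are illustrative assumptions.

```python
# Hypothetical sketch of a message-passing agent for cooperative MARL.
# Module names and dimensions are assumptions, not the thesis's design.
import torch
import torch.nn as nn


class CommAgent(nn.Module):
    def __init__(self, obs_dim: int, msg_dim: int, act_dim: int, n_agents: int):
        super().__init__()
        # Encoder that turns the agent's local observation into a message vector.
        self.msg_encoder = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, msg_dim)
        )
        # Policy head that sees the local observation plus messages from the other agents.
        self.policy = nn.Sequential(
            nn.Linear(obs_dim + (n_agents - 1) * msg_dim, 64),
            nn.ReLU(),
            nn.Linear(64, act_dim),
        )

    def encode_message(self, obs: torch.Tensor) -> torch.Tensor:
        # Message shared with the other agents at this step.
        return self.msg_encoder(obs)

    def act(self, obs: torch.Tensor, others_msgs: torch.Tensor) -> torch.distributions.Categorical:
        # others_msgs: concatenated messages received from the remaining agents.
        logits = self.policy(torch.cat([obs, others_msgs], dim=-1))
        return torch.distributions.Categorical(logits=logits)


if __name__ == "__main__":
    n_agents, obs_dim, msg_dim, act_dim = 3, 8, 4, 5
    agents = [CommAgent(obs_dim, msg_dim, act_dim, n_agents) for _ in range(n_agents)]
    obs = [torch.randn(1, obs_dim) for _ in range(n_agents)]
    msgs = [a.encode_message(o) for a, o in zip(agents, obs)]
    for i, (a, o) in enumerate(zip(agents, obs)):
        others = torch.cat([m for j, m in enumerate(msgs) if j != i], dim=-1)
        action = a.act(o, others).sample()
        print(f"agent {i} action: {action.item()}")
```

In a PPO-style training loop (as in IPPO or MAPPO), the message encoder would be trained end-to-end with the policy, so the communication channel emerges from the same gradient signal that drives action selection; the exact training scheme used in the thesis is not specified by the abstract.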