
Author: 李安民 (Akbar, Ilham)
Thesis title: A Novel Dual-Actor Proximal Policy Optimization Algorithm for Single and Multi-Agent Humanoid Robot
Thesis title (Chinese): 針對單一和多智能體人形機器人之創新雙演員近端策略優化算法
Advisors: 包傑奇 (Jacky Baltes); 薩義德 (Saeed Saeedvand)
Oral defense committee: 李祖聖 (Li, Tzuu-Hseng); 王偉彥 (Wang, Wei-yen); 包傑奇 (Jacky Baltes); 薩義德 (Saeed Saeedvand)
Oral defense date: 2024/07/01
Degree: Master
Department: Department of Electrical Engineering (電機工程學系)
Year of publication: 2024
Academic year of graduation: 112 (ROC calendar)
Language: English
Number of pages: 68
English keywords: DA-PPO, IDA-PPO, Single Agent, Multi Agent, reinforcement learning, cooperative tasks, humanoid robots, robotic navigation
DOI URL: http://doi.org/10.6345/NTNU202400949
Thesis type: Academic thesis

Single-agent and multi-agent systems are integral to reinforcement learning in dynamic environments for advanced humanoid robotic applications. This thesis introduces the Dual-Actor Proximal Policy Optimization (DA-PPO) algorithm and its extension, Independent Dual-Actor Proximal Policy Optimization (IDA-PPO), designed for robotic navigation and cooperative tasks using the ROBOTIS-OP3 humanoid robot. The study validates the effectiveness of DA-PPO and IDA-PPO across various scenarios, demonstrating significant improvements in both single-agent and multi-agent environments. DA-PPO excels in robotic navigation and movement tasks, outperforming established reinforcement learning methods in both complex environments and basic walking tasks. This success is attributed to its innovative architecture, efficient utilization of hardware resources such as the NVIDIA GeForce RTX 3050, and an effective reward function strategy. IDA-PPO, with its decentralized training and dual-actor policy network, achieves higher mean rewards and faster learning than IPPO and MAPPO: it is 5.49 times faster than MAPPO and 8.22 times faster than IPPO, highlighting its efficiency and adaptability in multi-agent tasks. These findings underscore the importance of algorithmic innovation and hardware capabilities in advancing robotic performance, positioning DA-PPO and IDA-PPO as significant advancements in robotic learning.
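
The abstract does not spell out the dual-actor update, so it is not reproduced here. As background only, the sketch below shows the clipped surrogate objective of standard PPO (Schulman et al., 2017), the baseline that DA-PPO and IDA-PPO extend. It is not the thesis's method. The function name, the argument names, and the default clip_eps=0.2 are illustrative assumptions, not values taken from the thesis.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized).

    Illustrative sketch only: names and the clip_eps default are assumptions,
    and nothing here is specific to DA-PPO or IDA-PPO.
    """
    if not advantages:
        raise ValueError("at least one sample is required")
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Hypothetical log-probabilities and advantages for three samples.
    print(ppo_clip_objective([-1.0, -0.5, -2.0], [-1.1, -0.7, -1.9], [0.5, -0.2, 1.0]))
```

The clipping removes the incentive to move the policy far from the one that collected the data within a single update, which is the "proximal" part of the algorithm's name.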

Chapter 1 Introduction
  1.1 Background
  1.2 Research Aim and Objective
  1.3 Key Contributions
Chapter 2 Literature Review
  2.1 Reinforcement Learning
  2.2 Model-free Reinforcement Learning for Locomotion Control
    2.2.1 Single-Agent Reinforcement Learning
      2.2.1.1 PPO-Proximal Policy Optimization
      2.2.1.2 SAC-Soft Actor Critic
      2.2.1.3 DDPG-Deep Deterministic Policy Gradient
      2.2.1.4 TD3-Twin Delayed DDPG
    2.2.2 Multi-Agent Reinforcement Learning
      2.2.2.1 IPPO-Independent Proximal Policy Optimization
      2.2.2.2 MAPPO-Multi Agent Proximal Policy Optimization
  2.3 Multi Actor Mechanism
Chapter 3 Methodology
  3.1 ROBOTIS-OP3 Robot
  3.2 Dual-Actor Proximal Policy Framework for Single Agent
    3.2.1 Observation Space
    3.2.2 Action Space
    3.2.3 Reward Function
  3.3 Independent Dual-Actor Proximal Policy Framework for Multi Agent
    3.3.1 Observation Space
    3.3.2 Motion Control
      3.3.2.1 Grasping Control
      3.3.2.2 Action Space
    3.3.3 Reward Function
Chapter 4 Experimental Result
  4.1 Experimental Results for Single Agent in Navigation Control
    4.1.1 Joint Position and Velocity Analysis
    4.1.2 Comparative Analysis and Performance in Complex Environments
  4.2 Experimental Results for Multi-Agent Cooperative Object Carrying
    4.2.1 Balance Control of MARL (uprojz Reference = 0.95)
    4.2.2 Upright Balance and Height Stability of the Object
    4.2.3 Inter-Agent Distance Analysis (d(agent1, agent2) Reference = 0.64)
    4.2.4 Yaw Rate Performance (ψ Reference = 0)
    4.2.5 Joint Position and Velocity Analysis of the Multi-Agent System
    4.2.6 Comparative Performance of MARL
Chapter 5 Summary
  5.1 Conclusion
  5.2 Future Work
References

