Graduate student: 李安民
Akbar, Ilham
Thesis title: 針對單一和多智能體人形機器人之創新雙演員近端策略優化算法
A Novel Dual-Actor Proximal Policy Optimization Algorithm for Single and Multi-Agent Humanoid Robot
Advisor: 包傑奇
Jacky Baltes
Saeed Saeedvand
Oral defense committee: 李祖聖
Li, Tzuu-Hseng
Wang, Wei-yen
Jacky Baltes
Saeed Saeedvand
Oral defense date: 2024/07/01
Degree: Master's
Department: Department of Electrical Engineering
Year of publication: 2024
Academic year of graduation: 112
Language: English
Number of pages: 68
Keywords (English): DA-PPO, IDA-PPO, Single Agent, Multi Agent, reinforcement learning, cooperative tasks, humanoid robots, robotic navigation
DOI URL: http://doi.org/10.6345/NTNU202400949
Thesis type: Academic thesis
  • Single-agent and multi-agent systems are integral to reinforcement learning in the dynamic environments of advanced humanoid robotic applications. This thesis introduces the Dual-Actor Proximal Policy Optimization (DA-PPO) algorithm and its extension, Independent Dual-Actor Proximal Policy Optimization (IDA-PPO), designed for robotic navigation and cooperative tasks using the ROBOTIS-OP3 humanoid robot. The study validates the effectiveness of DA-PPO and IDA-PPO across various scenarios, demonstrating significant improvements in both single-agent and multi-agent environments. DA-PPO excels in robotic navigation and movement tasks, outperforming established reinforcement learning methods in both complex environments and basic walking tasks. This success is attributed to its innovative architecture, its efficient use of hardware resources such as the NVIDIA GeForce RTX 3050, and an effective reward function strategy. IDA-PPO, with its decentralized training and dual-actor policy network, achieves higher mean rewards and faster learning than IPPO and MAPPO. IDA-PPO is 5.49 times faster than MAPPO and 8.22 times faster than IPPO, highlighting its superior efficiency and adaptability in multi-agent tasks. These findings underscore the importance of algorithmic innovation and hardware capabilities in advancing robotic performance, positioning DA-PPO and IDA-PPO as significant advancements in robotic learning.

    Chapter 1 Introduction
      1.1 Background
      1.2 Research Aim and Objective
      1.3 Key Contributions
    Chapter 2 Literature Review
      2.1 Reinforcement Learning
      2.2 Model-free Reinforcement Learning for Locomotion Control
        2.2.1 Single-Agent Reinforcement Learning
          PPO - Proximal Policy Optimization
          SAC - Soft Actor Critic
          DDPG - Deep Deterministic Policy Gradient
          TD3 - Twin Delayed DDPG
        2.2.2 Multi-Agent Reinforcement Learning
          IPPO - Independent Proximal Policy Optimization
          MAPPO - Multi Agent Proximal Policy Optimization
      2.3 Multi Actor Mechanism
    Chapter 3 Methodology
      3.1 ROBOTIS-OP3 Robot
      3.2 Dual-Actor Proximal Policy Framework for Single Agent
        3.2.1 Observation Space
        3.2.2 Action Space
        3.2.3 Reward Function
      3.3 Independent Dual-Actor Proximal Policy Framework for Multi Agent
        3.3.1 Observation Space
        3.3.2 Motion Control
          Grasping Control
          Action Space
        3.3.3 Reward Function
    Chapter 4 Experimental Result
      4.1 Experimental Results for Single Agent in Navigation Control
        4.1.1 Joint Position and Velocity Analysis
        4.1.2 Comparative Analysis and Performance in Complex Environments
      4.2 Experimental Results for Multi-Agent Cooperative Object Carrying
        4.2.1 Balance Control of MARL (uprojz Reference = 0.95)
        4.2.2 Upright Balance and Height Stability of the Object
        4.2.3 Inter-Agent Distance Analysis (d(agent1, agent2) Reference = 0.64)
        4.2.4 Yaw Rate Performance (ψ Reference = 0)
        4.2.5 Joint Position and Velocity Analysis of the Multi-Agent System
        4.2.6 Comparative Performance of MARL
    Chapter 5 Summary
      5.1 Conclusion
      5.2 Future Work
    References

