研究生: |
陳宥儒 Chen, Yu-Ju |
---|---|
論文名稱: |
基於馬可夫決策過程之路徑規劃演算法用於複雜動態環境 Path Planning Algorithm Based on Markov Decision Process for Complex Dynamic Environment |
指導教授: |
陳美勇
Chen, Mei-Yung |
口試委員: |
莊季高
Juang, Jih-Gau 林惠勇 Lin, Huei-Yung 蘇順豐 Su, Shun-Feng 陳美勇 Chen, Mei-Yung |
口試日期: | 2023/01/11 |
學位類別: |
碩士 Master |
系所名稱: |
機電工程學系 Department of Mechatronic Engineering |
論文出版年: | 2023 |
畢業學年度: | 111 |
語文別: | 中文 |
論文頁數: | 84 |
中文關鍵詞: | 馬可夫決策過程 、複雜動態環境 、路徑規劃 、雙輪差動機器人 |
英文關鍵詞: | Markov decision process, complex dynamic environment, path planning, two-wheeled mobile robot |
DOI URL: | http://doi.org/10.6345/NTNU202300288 |
論文種類: | 學術論文 |
相關次數: | 點閱:133 下載:75 |
分享至: |
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
本論文提出了一種基於 馬可夫決策過程 的 機器人 路徑規劃演算法 。首先, 需要將目標點設為一個正的獎勵訊號 。其次, 代理人每走一格就會有一個負的獎勵訊號 。 代理人必須最大化其長期累積的總獎勵,這也是代理人的唯一目標 。利用 根據能夠將長期獎勵最大化所得到的策略來決定機器人行走路徑 。最後,將每個位置所得到的策略串聯來,就得到一條最佳路徑 。此外,本篇論文透過設計 馬可夫決策過程中的價值函數,使得規劃出來的路徑能與牆壁與移動障礙物保持一定的安全距離。 最後,在本論文模擬中,代理人在 產生第一條路徑之後,因應環境變化產生其他路徑相當迅速,且會主動閃避移動障礙物 ;而在實驗的部分,使用了搭載機器人作業系統 (Robot Operating System, ROS)的雙輪差動機器人在 有移動的障礙物和移動的人時,皆能有效的產生閃避障礙物之路徑。
此路徑規劃演算法是由馬可夫決策過程發展而成,也是現代機器學習的基石。有別於傳統的路徑規劃演算法,像是 Dijkstra、 A*、 D*之類的演算 這些演算法無法在複雜動態環境有良好表現甚至無法適用於動態環境,本篇論文所提出的基於馬可夫決策過程路徑規劃演算法 是以計算全域地圖上 各點的獎勵訊 號來決定路徑,在 每個時刻、每一個點都會有一個 預期回報的期望值 ,所以在動態變化較大的環境中可以比較即時的更改路徑因此其在動態環境的效率較佳。
This paper presents a robot path planning algorithm based on Markov decision process. First, we need to set the target point as a positive reward signal. Second, there is a negative reward signal for each square the agent moves. The agent must maximize its long-term accumulated total reward, which is the only goal of the agent. Use a policy that maximizes the long-term reward to determine the path. Otherwise, concatenate the policies obtained at each location to obtain an optimal path. In addition, this paper designs the value function in the Markov decision process so that the path can maintain a certain safe distance from walls and moving obstacles. Finally, the agent generates other paths in response to environmental changes quite quickly after generating the first path and will actively dodge moving obstacles in the simulation of this paper; in the experimental part, a two-wheeled robot equipped with a robot operating system is used. When there are moving obstacles and moving people, the robot can effectively generate paths to avoid obstacles.
This path planning algorithm was developed from the Markov decision process and is the cornerstone of modern machine learning. Different from traditional path planning algorithms such as Dijkstra, A*, and D*, these algorithms cannot perform well in complex dynamic environments or even to dynamic environments. The Markov Decision process based path planning algorithm determines the path by calculating the reward signal of each point on the map. At each moment and each point, there will be an expected value of expected return, so it can be compared in real-time in an environment with large dynamic changes. So it is more efficient in a dynamic environment.
[1] M. L. Puterman, “Markov decision processes,” in Handbooks in Operations Research and Management Science, vol. 2, Elsevier, 1990, pp. 331–434.
[2] X. Jing, Motion Planning. BoD – Books on Demand, 2008.
[3] E. W. Dijkstra, “A Note on Two Problems in Connexion with Graphs,” in Edsger Wybe Dijkstra: His Life,Work, and Legacy, 1st ed., vol. 45, New York, NY, USA: Association for Computing Machinery, 1960, pp. 287–290.
[4] F. Duchoň et al., “Path Planning with Modified a Star Algorithm for a Mobile Robot,” Procedia Engineering, vol. 96, pp. 59–69, Jan. 2014.
[5] C. Padgett and K. Kreutz-Delgado, "A grid algorithm for autonomous star identification," in IEEE Transactions on Aerospace and Electronic Systems, vol. 33, no. 1, pp. 202-213, Jan. 1997, doi: 10.1109/7.570743.
[6] W. E. Wong, V. Debroy, R. Gao and Y. Li, "The DStar Method for Effective Software Fault Localization," in IEEE Transactions on Reliability, vol. 63, no. 1, pp. 290-308, March 2014, doi: 10.1109/TR.2013.2285319.
[7] S. Koenig, M. Likhachev, and D. Furcy, “Lifelong Planning A∗,” Artificial Intelligence, vol. 155, no. 1, pp. 93–146, May 2004, doi: 10.1016/j.artint.2003.12.001.
[8] Koenig, Sven, and Maxim Likhachev. "D^* lite." Aaai/iaai 15 (2002): 476-483.
[9] T. Dean, L. P. Kaelbling, J. Kirman, and A. Nicholson, “Planning under time constraints in stochastic domains,” Artificial Intelligence, vol. 76, no. 1–2, pp. 35–74, 1995.
[10] R. A. Howard, “Dynamic programming and markov processes.,” 1960.
[11] M. L. Puterman, Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014.
[12] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT press, 2018.
[13] W. Zhang, Algorithms for partially observable Markov decision processes. Hong Kong University of Science and Technology (Hong Kong), 2001.
[14] M. L. Littman, Algorithms for sequential decision-making. Brown University, 1996.
[15] N. Achour and K. Braikia, “An MDP-based approach oriented optimal policy for path planning,” in 2010 International Conference on Machine and Web Intelligence, 2010, pp. 205–210.
[16] S. J. Russell, Artificial intelligence a modern approach. Pearson Education, Inc., 2010.
[17] J. Blythe, “Decision-theoretic planning,” AI magazine, vol. 20, no. 2, pp. 37–37, 1999.
[18] R. Bellman, “Dynamic programming,” Science, vol. 153, no. 3731, pp. 34–37, 1966.
[19] D. Ferguson and A. Stentz, “Focussed processing of MDPs for path planning,” in 16th IEEE International Conference on Tools with Artificial Intelligence, 2004, pp. 310–317.
[20] R. Washington, “Incremental Markov-model planning,” in Proceedings Eighth IEEE International Conference on Tools with Artificial Intelligence, 1996, pp. 41–47.
[21] P. Laroche, F. Charpillet, and R. Schott, “Mobile robotics planning using abstract markov decision processes,” in Proceedings 11th International Conference on Tools with Artificial Intelligence, 1999, pp. 299–306.
[22] T. Dean and R. Givan, “Model minimization in Markov decision processes,” in AAAI/IAAI, 1997, pp. 106–111.
[23] M. L. Littman, T. L. Dean, and L. P. Kaelbling, “On the complexity of solving Markov decision problems,” arXiv preprint arXiv:1302.4971, 2013.
[24] F. Duchoň et al., “Path planning with modified a star algorithm for a mobile robot,” Procedia Engineering, vol. 96, pp. 59–69, 2014.
[25] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, “Planning and acting in partially observable stochastic domains,” Artificial intelligence, vol. 101, no. 1–2, pp. 99–134, 1998.
[26] J. Blythe, “Planning under uncertainty in dynamic domains,” PhD Thesis, PhD thesis, Computer, 1997.
[27] T. L. Dean, L. P. Kaelbling, J. Kirman, and A. E. Nicholson, “Planning With Deadlines in Stochastic Domains.,” in AAAI, 1993, vol. 93, pp. 574–579.
[28] A. L. Blum and J. C. Langford, “Probabilistic planning in the graph plan framework,” in European Conference on Planning, 1999, pp. 319–332.
[29] M. van Otterlo and M. Wiering, “Reinforcement learning and markov decision processes,” in Reinforcement learning, Springer, 2012, pp. 3–42.
[30] J. Li, Y. Chen, X. Zhao, and J. Huang, “An improved DQN path planning algorithm,” J Supercomput, vol. 78, no. 1, pp. 616–639, Jan. 2022, doi: 10.1007/s11227-021-03878-2.
[31] J. Gao, W. Ye, J. Guo, and Z. Li, “Deep Reinforcement Learning for Indoor Mobile Robot Path Planning,” Sensors, vol. 20, no. 19, Art. no. 19, Jan. 2020, doi: 10.3390/s20195493.