簡易檢索 / 詳目顯示

研究生: 陳宥儒
Chen, Yu-Ju
論文名稱: 基於馬可夫決策過程之路徑規劃演算法用於複雜動態環境
Path Planning Algorithm Based on Markov Decision Process for Complex Dynamic Environment
指導教授: 陳美勇
Chen, Mei-Yung
口試委員: 莊季高
Juang, Jih-Gau
林惠勇
Lin, Huei-Yung
蘇順豐
Su, Shun-Feng
陳美勇
Chen, Mei-Yung
口試日期: 2023/01/11
學位類別: 碩士
Master
系所名稱: 機電工程學系
Department of Mechatronic Engineering
論文出版年: 2023
畢業學年度: 111
語文別: 中文
論文頁數: 84
中文關鍵詞: 馬可夫決策過程複雜動態環境路徑規劃雙輪差動機器人
英文關鍵詞: Markov decision process, complex dynamic environment, path planning, two-wheeled mobile robot
DOI URL: http://doi.org/10.6345/NTNU202300288
論文種類: 學術論文
相關次數: 點閱:133下載:75
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 本論文提出了一種基於 馬可夫決策過程 的 機器人 路徑規劃演算法 。首先, 需要將目標點設為一個正的獎勵訊號 。其次, 代理人每走一格就會有一個負的獎勵訊號 。 代理人必須最大化其長期累積的總獎勵,這也是代理人的唯一目標 。利用 根據能夠將長期獎勵最大化所得到的策略來決定機器人行走路徑 。最後,將每個位置所得到的策略串聯來,就得到一條最佳路徑 。此外,本篇論文透過設計 馬可夫決策過程中的價值函數,使得規劃出來的路徑能與牆壁與移動障礙物保持一定的安全距離。 最後,在本論文模擬中,代理人在 產生第一條路徑之後,因應環境變化產生其他路徑相當迅速,且會主動閃避移動障礙物 ;而在實驗的部分,使用了搭載機器人作業系統 (Robot Operating System, ROS)的雙輪差動機器人在 有移動的障礙物和移動的人時,皆能有效的產生閃避障礙物之路徑。
    此路徑規劃演算法是由馬可夫決策過程發展而成,也是現代機器學習的基石。有別於傳統的路徑規劃演算法,像是 Dijkstra、 A*、 D*之類的演算 這些演算法無法在複雜動態環境有良好表現甚至無法適用於動態環境,本篇論文所提出的基於馬可夫決策過程路徑規劃演算法 是以計算全域地圖上 各點的獎勵訊 號來決定路徑,在 每個時刻、每一個點都會有一個 預期回報的期望值 ,所以在動態變化較大的環境中可以比較即時的更改路徑因此其在動態環境的效率較佳。

    This paper presents a robot path planning algorithm based on Markov decision process. First, we need to set the target point as a positive reward signal. Second, there is a negative reward signal for each square the agent moves. The agent must maximize its long-term accumulated total reward, which is the only goal of the agent. Use a policy that maximizes the long-term reward to determine the path. Otherwise, concatenate the policies obtained at each location to obtain an optimal path. In addition, this paper designs the value function in the Markov decision process so that the path can maintain a certain safe distance from walls and moving obstacles. Finally, the agent generates other paths in response to environmental changes quite quickly after generating the first path and will actively dodge moving obstacles in the simulation of this paper; in the experimental part, a two-wheeled robot equipped with a robot operating system is used. When there are moving obstacles and moving people, the robot can effectively generate paths to avoid obstacles.
    This path planning algorithm was developed from the Markov decision process and is the cornerstone of modern machine learning. Different from traditional path planning algorithms such as Dijkstra, A*, and D*, these algorithms cannot perform well in complex dynamic environments or even to dynamic environments. The Markov Decision process based path planning algorithm determines the path by calculating the reward signal of each point on the map. At each moment and each point, there will be an expected value of expected return, so it can be compared in real-time in an environment with large dynamic changes. So it is more efficient in a dynamic environment.

    誌謝 i 摘要 ii Abstract iii 目錄 v 圖目錄 viii 表目錄 xii 第一章 緒論 1 1.1前言 1 1.2文獻回顧 3 1.2.1圖論上的路徑規劃 3 1.2.2基於馬可夫決策過程之路徑規劃 5 1.3研究動機與目的 6 1.4本論文之貢獻 7 1.5 論文架構 8 第二章 理論基礎 9 2.1馬可夫決策過程(Markov Decision Process, MDP) 9 2.1.1代理人與環境之介面 9 2.1.2目標與獎勵 10 2.1.3回報與分節 11 2.1.4策略與價值函數 12 2.1.5 最佳策略及最佳價值函數 14 2.1.6 動態規劃—策略迭代 16 2.1.7 動態規劃—價值迭代 19 2.2機器人作業系統(Robot Operating System, ROS) 21 2.2.1 檔案系統層(File system level) 22 2.2.2 計算圖層(Computational level) 24 2.2.3 ROS之中的通訊 26 第三章 路徑規劃演算法、設計及實驗流程 28 3.1 馬可夫決策過程之路徑規劃 28 3.1.1 地圖離散化 28 3.1.2 路徑規劃 29 3.1.3 使代理人與牆壁和移動障礙物保持距離 31 3.1.4動態規劃—價值迭代得到策略、路徑 32 3.2 移動機器人實驗流程 34 3.2.1 使用ROS內建SLAM建立地圖並定位 34 3.2.2 實驗流程 36 第四章 實驗設備 37 4.1 工業雙輪差動移動機器人 37 4.2 B1移動機器人上之光學雷達 39 第五章 模擬與實驗結果 41 5.1小地圖之馬可夫決策過程模擬 41 5.2馬可夫決策過程路徑規劃在動態環境之模擬 45 5.2.1 未加牆壁及移動障礙物保護 48 5.2.2 加入與牆壁保持一定距離之設計 53 5.2.3 同時加入與牆壁和移動障礙物保持一定距離之設計 57 5.2.4 不同γ值之比較與探討 60 5.2.5 與A* 演算法比較 61 5.3馬可夫決策過程路徑規劃用於工業移動機器人 69 第六章 結論與未來期望 81 參考文獻 82

    [1] M. L. Puterman, “Markov decision processes,” in Handbooks in Operations Research and Management Science, vol. 2, Elsevier, 1990, pp. 331–434.
    [2] X. Jing, Motion Planning. BoD – Books on Demand, 2008.
    [3] E. W. Dijkstra, “A Note on Two Problems in Connexion with Graphs,” in Edsger Wybe Dijkstra: His Life,Work, and Legacy, 1st ed., vol. 45, New York, NY, USA: Association for Computing Machinery, 1960, pp. 287–290.
    [4] F. Duchoň et al., “Path Planning with Modified a Star Algorithm for a Mobile Robot,” Procedia Engineering, vol. 96, pp. 59–69, Jan. 2014.
    [5] C. Padgett and K. Kreutz-Delgado, "A grid algorithm for autonomous star identification," in IEEE Transactions on Aerospace and Electronic Systems, vol. 33, no. 1, pp. 202-213, Jan. 1997, doi: 10.1109/7.570743.
    [6] W. E. Wong, V. Debroy, R. Gao and Y. Li, "The DStar Method for Effective Software Fault Localization," in IEEE Transactions on Reliability, vol. 63, no. 1, pp. 290-308, March 2014, doi: 10.1109/TR.2013.2285319.
    [7] S. Koenig, M. Likhachev, and D. Furcy, “Lifelong Planning A∗,” Artificial Intelligence, vol. 155, no. 1, pp. 93–146, May 2004, doi: 10.1016/j.artint.2003.12.001.
    [8] Koenig, Sven, and Maxim Likhachev. "D^* lite." Aaai/iaai 15 (2002): 476-483.
    [9] T. Dean, L. P. Kaelbling, J. Kirman, and A. Nicholson, “Planning under time constraints in stochastic domains,” Artificial Intelligence, vol. 76, no. 1–2, pp. 35–74, 1995.
    [10] R. A. Howard, “Dynamic programming and markov processes.,” 1960.
    [11] M. L. Puterman, Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014.
    [12] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT press, 2018.
    [13] W. Zhang, Algorithms for partially observable Markov decision processes. Hong Kong University of Science and Technology (Hong Kong), 2001.
    [14] M. L. Littman, Algorithms for sequential decision-making. Brown University, 1996.
    [15] N. Achour and K. Braikia, “An MDP-based approach oriented optimal policy for path planning,” in 2010 International Conference on Machine and Web Intelligence, 2010, pp. 205–210.
    [16] S. J. Russell, Artificial intelligence a modern approach. Pearson Education, Inc., 2010.
    [17] J. Blythe, “Decision-theoretic planning,” AI magazine, vol. 20, no. 2, pp. 37–37, 1999.
    [18] R. Bellman, “Dynamic programming,” Science, vol. 153, no. 3731, pp. 34–37, 1966.
    [19] D. Ferguson and A. Stentz, “Focussed processing of MDPs for path planning,” in 16th IEEE International Conference on Tools with Artificial Intelligence, 2004, pp. 310–317.
    [20] R. Washington, “Incremental Markov-model planning,” in Proceedings Eighth IEEE International Conference on Tools with Artificial Intelligence, 1996, pp. 41–47.
    [21] P. Laroche, F. Charpillet, and R. Schott, “Mobile robotics planning using abstract markov decision processes,” in Proceedings 11th International Conference on Tools with Artificial Intelligence, 1999, pp. 299–306.
    [22] T. Dean and R. Givan, “Model minimization in Markov decision processes,” in AAAI/IAAI, 1997, pp. 106–111.
    [23] M. L. Littman, T. L. Dean, and L. P. Kaelbling, “On the complexity of solving Markov decision problems,” arXiv preprint arXiv:1302.4971, 2013.
    [24] F. Duchoň et al., “Path planning with modified a star algorithm for a mobile robot,” Procedia Engineering, vol. 96, pp. 59–69, 2014.
    [25] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, “Planning and acting in partially observable stochastic domains,” Artificial intelligence, vol. 101, no. 1–2, pp. 99–134, 1998.
    [26] J. Blythe, “Planning under uncertainty in dynamic domains,” PhD Thesis, PhD thesis, Computer, 1997.
    [27] T. L. Dean, L. P. Kaelbling, J. Kirman, and A. E. Nicholson, “Planning With Deadlines in Stochastic Domains.,” in AAAI, 1993, vol. 93, pp. 574–579.
    [28] A. L. Blum and J. C. Langford, “Probabilistic planning in the graph plan framework,” in European Conference on Planning, 1999, pp. 319–332.
    [29] M. van Otterlo and M. Wiering, “Reinforcement learning and markov decision processes,” in Reinforcement learning, Springer, 2012, pp. 3–42.
    [30] J. Li, Y. Chen, X. Zhao, and J. Huang, “An improved DQN path planning algorithm,” J Supercomput, vol. 78, no. 1, pp. 616–639, Jan. 2022, doi: 10.1007/s11227-021-03878-2.
    [31] J. Gao, W. Ye, J. Guo, and Z. Li, “Deep Reinforcement Learning for Indoor Mobile Robot Path Planning,” Sensors, vol. 20, no. 19, Art. no. 19, Jan. 2020, doi: 10.3390/s20195493.

    下載圖示
    QR CODE