Author: 周建廷 (Chien-Ting Chou)
Thesis title: 利用MapReduce軟體架構於Hadoop叢集進行地貌型直接逕流模組演算之研究
Research on The Computing of Direct Geo Morphology Runoff on Hadoop Cluster by Using MapReduce Framework
Advisor: 葉耀明
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of publication: 2011
Academic year of graduation: 99 (ROC calendar)
Language: Chinese
Number of pages: 62
Keywords (Chinese): Hadoop, MapReduce, 分散式運算, 大量資料處理
Keywords (English): Hadoop, MapReduce, Distributed Computing, Processing for Large Data
Thesis type: Academic thesis
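  • Because of Taiwan's climate and terrain, heavy rain often makes river levels surge within a short time and can cause serious disasters, which underscores the importance of flood forecasting systems in Taiwan. River flow-path flood routing is the most important part of a flood forecasting system; its purpose is to compute the hydrological data of a basin in order to determine whether the flow exceeds the warning level. However, the routing formulas are complex and the basin data are voluminous. Traditional single-host approaches, such as handing the work to a mainframe or having clients connect to a server that does all the processing, often take a long time, so forecasts are not timely enough.
    The programs in this study were developed on Hadoop, an open-source platform developed by the Apache Software Foundation. Hadoop provides a distributed environment for storing and processing large volumes of data, and it offers developers MapReduce, a software framework designed specifically for large-scale data processing, which pools computing resources through distributed computing to speed up the processing of large data sets and reduce computation time. In this study, the river flow-path routing program was written with the MapReduce framework and run on a Hadoop cluster. Measurements across five scenarios show that, in the best case, the routing computation runs about 6 times faster, which improves the performance of the flood forecasting system and makes forecasts more timely.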
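
    The MapReduce pattern mentioned above can be illustrated with a minimal, single-process Python sketch. It is not the thesis's algorithm: the record layout (sub-basin, hour, flow contribution), the function names, and the choice to sum contributions per hour are all assumptions made for illustration, and a real Hadoop job would distribute the map and reduce phases across cluster nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input record: (sub_basin_id, hour, flow contribution in m^3/s).
Record = Tuple[str, int, float]


def map_phase(record: Record) -> Iterator[Tuple[int, float]]:
    """Emit (hour, flow) so every sub-basin's value for one hour shares a key."""
    _sub_basin, hour, flow = record
    yield hour, flow


def shuffle(pairs: Iterable[Tuple[int, float]]) -> Dict[int, List[float]]:
    """Group mapped values by key, as the framework does between map and reduce."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(hour: int, flows: List[float]) -> Tuple[int, float]:
    """Combine all contributions for one hour into a single total (assumed: a sum)."""
    return hour, sum(flows)


def run_job(records: Iterable[Record]) -> Dict[int, float]:
    """Run map, shuffle, and reduce in sequence within one process."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(hour, flows) for hour, flows in sorted(grouped.items()))


if __name__ == "__main__":
    sample = [("A", 0, 1.5), ("B", 0, 2.5), ("A", 1, 3.0)]
    print(run_job(sample))
```

    Because each call to map_phase depends only on its own record, the map work can be split across machines without coordination, which is the property a cluster exploits to shorten run time.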

    Because of Taiwan's weather and landforms, heavy rain often causes a sudden rise in the runoff of some basins and can even lead to serious disasters. Flood information systems are therefore heavily relied upon in Taiwan, especially in typhoon season. Computing the runoff of a basin is the most important module of a flood information system, because it checks whether the runoff exceeds the warning level. However, this module is complicated and data-intensive, and it becomes the bottleneck when real-time information is needed while a typhoon is striking the basins.
    The applications in this thesis are developed on Apache Hadoop, open-source software that provides a distributed storage and computing environment and allows the distributed processing of large data sets across clusters of computers using the MapReduce programming model. We developed the runoff computing module of a basin using the MapReduce framework on a Hadoop cluster. Speeding up the runoff computation increases the efficiency of the flood information system. Running our programs on an 18-node Hadoop cluster, we conclude that the execution of the runoff computation can be sped up by a factor of 6.
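
    To put the reported figures in perspective, the short Python sketch below computes parallel efficiency, a common measure defined as speedup divided by the number of nodes. It assumes that the 6-fold speedup was measured against a single-machine run and that all 18 nodes took part in the best case; the abstract does not confirm either point, and the function name is hypothetical.

```python
def parallel_efficiency(speedup: float, nodes: int) -> float:
    """Return speedup per node; 1.0 would mean perfectly linear scaling."""
    if nodes <= 0:
        raise ValueError("nodes must be a positive integer")
    return speedup / nodes


if __name__ == "__main__":
    # Figures from the abstract: 6x speedup on an 18-node cluster (assumed baseline: one machine).
    efficiency = parallel_efficiency(6.0, 18)
    print(f"parallel efficiency: {efficiency:.2f}")
```

    Under those assumptions the efficiency is one third, which would suggest that a sizeable share of the run time goes to overhead such as job startup, data transfer, or non-parallel steps.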

    Chapter 1 Introduction
      1.1 Research Background and Motivation
      1.2 Research Purpose and Significance
      1.3 Thesis Organization
    Chapter 2 Literature Review
      2.1 Cloud Computing
        2.1.1 Definition of Cloud Computing
        2.1.2 Delivering Cloud Services with Free Software Technologies
        2.1.3 Introduction to Hadoop
      2.2 Hadoop Architecture
        2.2.1 Storage-Related Daemons
        2.2.2 Auxiliary Daemons
        2.2.3 Computation-Related Daemons
        2.2.4 Common Ways to Build a Hadoop Cluster
      2.3 MapReduce Software Framework
      2.4 Overview of the Tamsui River Basin Catchment
    Chapter 3 Writing the Direct Runoff Module with the MapReduce Framework
      3.1 System Architecture
      3.2 Data Types and Computation Flow of the Direct Runoff Module
        3.2.1 Original and Modified Data Formats
        3.2.2 Computation Flow
      3.3 Overview of the Core Algorithm of the Direct Runoff Module
    Chapter 4 System Implementation and Performance Measurement
      4.1 Development and Test Environment
      4.2 Performance Measurement and Comparison of the Geomorphologic Direct Runoff Module
        4.2.1 Key Measurement Factors
        4.2.2 Single-Machine Test
        4.2.3 Discussion of Measurement Scenarios on the Hadoop Cluster
    Chapter 5 Conclusions and Future Work
      5.1 Conclusions
      5.2 Future Work
    Appendix
    References
