| Graduate student: | 周建廷 Chien-Ting Chou |
|---|---|
| Thesis title: | 利用MapReduce軟體架構於Hadoop叢集進行地貌型直接逕流模組演算之研究 Research on The Computing of Direct Geo Morphology Runoff on Hadoop Cluster by Using MapReduce Framework |
| Advisor: | 葉耀明 |
| Degree: | Master |
| Department: | Department of Computer Science and Information Engineering |
| Year of publication: | 2011 |
| Academic year of graduation: | 99 (ROC calendar, i.e., 2010-2011) |
| Language: | Chinese |
| Pages: | 62 |
| Keywords: | Hadoop, MapReduce, Distributed Computing, Processing for Large Data |
| Thesis type: | Academic thesis |
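Because of Taiwan's climate and topography, heavy rainfall often makes river levels rise abruptly and can even cause serious disasters, which underscores how important flood forecasting systems are in Taiwan. River runoff routing is the most important component of a flood forecasting system: it computes the hydrological quantities of a basin to determine whether the discharge exceeds the warning level. However, the routing equations are complex and the volume of basin data is large. Conventional single-host approaches, such as submitting the job to a mainframe or having a client connect to a server that performs the computation, therefore often take a long time, so forecasts cannot be issued in real time.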
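The abstract does not give the routing equations, so the following is only a minimal sketch under a stated assumption: that direct runoff is computed with the common linear-system form, i.e. discrete convolution of excess rainfall with a unit hydrograph (a geomorphology-based model would derive that unit hydrograph from basin geomorphology). The function names, the units, and the sample values are hypothetical; the warning check mirrors the purpose described above, flagging time steps whose discharge exceeds a warning level.

```python
from typing import List


def direct_runoff(excess_rain: List[float], unit_hydrograph: List[float]) -> List[float]:
    """Hypothetical helper: discrete convolution Q[n] = sum_m P[m] * U[n - m].

    excess_rain     -- excess (effective) rainfall per time step, e.g. cm
    unit_hydrograph -- unit hydrograph ordinates, e.g. m^3/s per cm
    """
    n_out = len(excess_rain) + len(unit_hydrograph) - 1
    q = [0.0] * n_out
    for m, p in enumerate(excess_rain):
        for k, u in enumerate(unit_hydrograph):
            q[m + k] += p * u  # rain pulse m contributes to step m + k
    return q


def exceeds_warning(q: List[float], warning_level: float) -> List[int]:
    """Hypothetical helper: indices of time steps whose discharge is above the warning level."""
    return [t for t, flow in enumerate(q) if flow > warning_level]


if __name__ == "__main__":
    rain = [1.0, 2.0]           # hypothetical excess rainfall (cm per step)
    uh = [10.0, 30.0, 20.0]     # hypothetical UH ordinates (m^3/s per cm)
    q = direct_runoff(rain, uh)
    print(q)                        # [10.0, 50.0, 80.0, 40.0]
    print(exceeds_warning(q, 45.0)) # [1, 2]
```

The nested loop costs O(len(rain) x len(uh)) per sub-basin; repeating it for many sub-basins and storm scenarios is the kind of data-heavy workload that motivates distributing the computation.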
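The programs in this study are built on Hadoop, an open-source platform developed by the Apache Software Foundation. Hadoop provides a distributed environment for storing and processing large volumes of data, together with MapReduce, a programming framework designed specifically for large-scale data processing, so that the pooled computing resources of a cluster can process large data sets faster and reduce computation time. In this study, the river runoff routing program is written with the MapReduce framework and run on a Hadoop cluster. Measurements under five scenarios show that, in the best case, routing runs about 6 times faster, which improves the performance of the flood forecasting system and makes forecasts more timely.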
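To show how such a computation can fit the MapReduce model, here is an in-memory Python simulation of the map, shuffle, and reduce phases. It is not the thesis's Hadoop program and does not use the Hadoop API. It assumes, purely for illustration, that each input record describes one sub-basin (hypothetical fields: id, lag to the outlet in time steps, excess rainfall, unit hydrograph), and that the outlet hydrograph is the lagged sum of sub-basin direct runoff, a simplification that ignores channel attenuation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record: (sub-basin id, lag to outlet in steps, excess rainfall, unit hydrograph)
Record = Tuple[str, int, List[float], List[float]]


def map_subbasin(record: Record) -> Iterable[Tuple[int, float]]:
    """Map: emit (outlet time step, discharge contribution) pairs for one sub-basin."""
    _, lag, rain, uh = record
    for m, p in enumerate(rain):
        for k, u in enumerate(uh):
            yield m + k + lag, p * u


def reduce_timestep(t: int, flows: List[float]) -> Tuple[int, float]:
    """Reduce: outlet discharge at step t is the sum of every contribution for that key."""
    return t, sum(flows)


def run_job(records: List[Record]) -> Dict[int, float]:
    """Sequential stand-in for a MapReduce job (Hadoop would run map tasks in parallel)."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for rec in records:                  # map phase
        for t, q in map_subbasin(rec):
            groups[t].append(q)          # shuffle: group values by key (time step)
    return dict(reduce_timestep(t, groups[t]) for t in sorted(groups))  # reduce phase


if __name__ == "__main__":
    basins = [
        ("A", 0, [1.0, 2.0], [10.0, 30.0, 20.0]),  # hypothetical sub-basin data
        ("B", 1, [1.0], [5.0, 5.0]),
    ]
    print(run_job(basins))  # {0: 10.0, 1: 55.0, 2: 85.0, 3: 40.0}
```

Keying on the time step lets independent sub-basins be mapped in parallel while a reducer only has to sum; in a real Hadoop job a combiner could pre-sum contributions on each node to shrink the shuffle.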
Because of Taiwan's weather and landforms, heavy rain often causes a sudden rise in the runoff of some basins and can even lead to serious disasters. Taiwan therefore relies heavily on flood information systems, especially during the typhoon season. Computing the runoff of a basin is the most important module of a flood information system, because it determines whether the runoff exceeds the warning level. However, this module is complicated and data-intensive, and it becomes a bottleneck when real-time information is needed while a typhoon is striking the basins.
The applications in this thesis are developed on Apache Hadoop, open-source software that provides a distributed storage and computing environment and allows large data sets to be processed across clusters of computers using the MapReduce programming model. We have developed the runoff computing module for a basin using the MapReduce framework on a Hadoop cluster; speeding up the runoff computation increases the efficiency of the flood information system. Running our programs on an 18-node Hadoop cluster, we found that runoff computing can be sped up by about 6 times.
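The reported figure can be read as a speedup ratio. Assuming it is measured against a single-host run and that all N = 18 nodes take part in the job (the abstract does not state the baseline explicitly), the corresponding parallel efficiency works out as:

```latex
S = \frac{T_{\text{single}}}{T_{\text{cluster}}} \approx 6,
\qquad
E = \frac{S}{N} = \frac{6}{18} \approx 0.33
```

Under these assumptions the cluster delivers roughly one third of ideal linear scaling.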