研究生: |
黃俊程 Huang, Chun-cheng |
---|---|
論文名稱: |
實現於圖形處理器的高性能平行位置檢知近似字串比對演算法 High-Performance Parallel Location-Aware Algorithms for Approximate String Matching on GPUs |
指導教授: |
林政宏
Lin, Cheng-Hung |
學位類別: |
碩士 Master |
系所名稱: |
電機工程學系 Department of Electrical Engineering |
論文出版年: | 2017 |
畢業學年度: | 105 |
語文別: | 中文 |
論文頁數: | 44 |
中文關鍵詞: | 近似字串比對 、位元平行演算法 、圖形運算處理器 、編輯距離 、非確定有限狀態自動機 、平行演算法 |
英文關鍵詞: | approximate string matching, bit-parallel algorithm, graphic processing units, Levenshtein distance, nondeterministic finite automaton, parallel algorithm |
DOI URL: | https://doi.org/10.6345/NTNU202202622 |
論文種類: | 學術論文 |
相關次數: | 點閱:123 下載:1 |
分享至: |
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
近似字串比對被廣泛的應用於各種不同的領域。例如:去氧核糖核酸序列搜尋比對、電腦文字輸入校正、文字資料探勘以及垃圾電子郵件過濾等。近似字串比對的設計是用來搜索文件中所有近似字串文字樣式字串及使用者定義的插入、刪除、取代的容許誤差值搜索後的匹配的位置。在眾多被提出的近似字串比對演算法中,位元平行演算法被認定是最適、最有效率的演算法。然而,傳統的位元平行演算法無法同時偵測樣式字串匹配字串的起始及結束位置。除此之外,因應現代多核心處理器的硬體發展,以及叢集運算、雲端運算以及大數據處理的需求。如何加速位元平行演算法成為當今重要的課題。本研究分別提出利用多串流平行和高維度平行這兩種位置偵測近似字串比對方法並利用圖形運算處理器(GPU)來加速演算法。實驗結果顯示,比較高維度平行演算法在圖形運算處理器(GPU)以及CPU的執行效能,圖形運算處理器(GPU)版本無論是在作業系統執行環境或者是在圖形顯示處理(GPU)核心執行的情況,效能皆得到顯著的提升。相較於其他相關研究,本研究所提出的高維度平行的方法達到了 11 至 105 倍的效能改善。最終,我們開發了一個讓使用者可以在線上執行位置偵測近似字串比對並找出所有樣式字串匹配結果起點及終點位置網路服務。
Approximate string matching has been widely used in many applications, including deoxyribonucleic acid sequence searching, spell checking, text mining, and spam filters. The method is designed to find all locations of strings that approximately match a pattern in accordance with the number of insertion, deletion, and substitution operations. Among the proposed algorithms, the bit-parallel algorithms are considered to be the best and highly efficient algorithms. However, the traditional bit-parallel algorithms lacks the ability of identifying the start and end positions of a matched pattern. Furthermore, acceleration of the bit-parallel algorithms has become a crucial issue for processing big data nowadays. In this paper, we propose two kinds of parallel location-aware algorithms called data-segmented parallelism and high-degree parallelism as means to accelerate approximate string matching using graphic processing units. Experimental results show that the high-degree parallelism on GPUs achieves significant improvement in system and kernel throughputs compared to CPU counterparts. Compared to state-of-the-art approaches, the proposed high-degree parallelism achieves 11 to 105 times improvement.
Finally, we develop a web service which allows users to perform approximate string matching on line and deliver all matched substrings with the start and end positions.
P. H. Sellers, “The Theory and Computation of Evolutionary Distances: Pattern Recognition,” J. Algorithms 1, 1980, pp.359-373.
E. Ukkonen, “Finding approximate patterns in strings,” J. Algorithms 6, 1985, pp.132–137.
R. Baeza-Yates. Text retrieval: theory and practice. In J. van Leeuwen, editor, Proc. 12th IFIP World Computer Congress, volume I: Algorithms, Software, Architecture, pages 465–476. Elsevier Science, Amsterdam, September 1992.
R. Baeza-Yates and G. Gonnet, “A new approach to text searching,” Communications of the ACM, 35(10): 74–82, October 1992.
R. Baeza-Yates and G. Navarro, “Faster Approximate String Matching,” Algorithmica, vol. 23, 1999, pp. 127-158.
S. Wu and U. Manber, “Fast text searching: allowing errors,” Commun. ACM, vol. 35, 1992, pp. 83-91.
S. Wu, U. Manber, and E. Myers, “A subquadratic algorithm for approximate limited expression matching,” Algorithmica, 15(1):50–67, 1996.
H. Hyyrö, “Improving the bit-parallel NFA of Baeza-Yates and Navarro for approximate string matching,” Information Processing Letters, vol. 108, 2008, pp. 313-319.
CUDA, http://www.nvidia.com/object/cuda_home_new.html
OpenMP, http://openmp.org/wp/
D. Man, K. Nakano, and Y. Ito, “The Approximate String Matching on the Hierarchical Memory Machine,” with Performance Evaluation. Embedded Multicore Socs (MCSoC), Tokyo, Japan, September, 2013.
K. Xu, W. Cui, and Y. Hu, and L. Guo, “Bit-Parallel Multiple Approximate String Matching based on GPU,” Procedia Computer Science, 17(0), 2013, pp. 523-529.
O. Mikael, and A. Yoshinori, “Online approximate string matching with CUDA”, Technical Report of Tokyo Institute of Technology. 1-4. Retrieved March 27, 2014, from http://pds13.egloos.com/pds/200907/26/57/pattmatch-report.pdf