
Author: 陳碧珠 (Chen, Pearl)
Thesis title: 利用專門可比語料庫結合機器翻譯自動提取雙語對譯N連詞:以合約文類為例
Using comparable specialized corpora with machine translation for extracting N-gram translation equivalents: A case study of Chinese and English contracts
Advisor: 高照明 (Gao, Zhao-Ming)
Degree: Doctoral
Department: Graduate Institute of Translation and Interpretation (翻譯研究所)
Year of publication: 2012
Academic year of graduation: 100
Language: Chinese
Number of pages: 189
Keywords (Chinese): 可比語料庫、合約翻譯、機器翻譯、翻譯記憶系統、N連詞、相似度比對
Keywords (English): comparable corpora, contract translation, machine translation, translation memory, N-grams, similarity comparison
Thesis type: Academic thesis
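  • This study starts from the demand for contract translation in the professional translation market. Contracts are a highly specialized genre whose style differs markedly from that of general documents, yet they are also formulaic and repetitive, which makes them well suited to translation memory systems. Over the past decade translation memory systems have become increasingly common in the translation market, but they depend on a parallel database of bilingual translations as the basis for retrieval, recycling existing translations; without a sufficiently large translation database, the tool itself yields little benefit. This is precisely what limits the use of translation memory systems in contract translation: contract documents involve sensitive, confidential information about the contracting parties, so bilingual translated texts are hard to obtain, and accumulating a translation database through human translation is slow and costly, so a contract translation corpus cannot be built quickly for direct use in a translation memory system.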
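    This study therefore starts from monolingual specialized corpora in different languages, known in the literature as comparable corpora, to overcome the shortage of translated texts. Although comparable corpora in a specialized domain are not translations of each other, the domain terms, concepts and common expressions they cover inevitably overlap to a large extent and are mutual translations. The purpose of this study is to explore a feasible method that uses statistical machine translation and string similarity comparison to automatically extract bilingual equivalent contiguous word strings, i.e. N-grams, from a comparable corpus of Chinese and English contracts.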
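    The method first builds a comparable corpus of Chinese and English contracts, using the Internet as the source of texts. Next, corpus query tools are used to extract the keywords and key keywords of the contracts, and N-grams are built from these core keywords. The automatic translation service of Google Translator Toolkit is then applied to produce machine translations of the Chinese and the English N-grams respectively. Finally, the similarity comparison function of a translation memory system is borrowed to compare the English contract N-grams with the machine English translations of the Chinese N-grams; if the two are identical or highly similar, the English N-gram and the corresponding Chinese N-gram are very likely translations of each other. The pairing of Chinese N-grams with English N-grams likewise uses English as the pivot language for similarity comparison. After expert evaluation of the resulting Chinese-English N-gram pairs, the 3-grams to 6-grams with a high match (95% or above) were found to have a mapping accuracy of 83%.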
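    The keyword-anchored N-gram step described above can be sketched as follows. This is a minimal illustration, not the thesis's own tooling (the abstract says corpus query tools were used). The function name `keyword_ngrams`, the 3-to-6 token range as a build-time setting, the `min_freq` cutoff and the sample sentence are all assumptions made for the example. Chinese text would first need word segmentation, which is outside this sketch.

```python
from collections import Counter


def keyword_ngrams(tokens, keywords, n_min=3, n_max=6, min_freq=2):
    """Count contiguous N-grams that contain at least one keyword.

    tokens   : list of word tokens from one corpus (already tokenized)
    keywords : set of core keywords (hypothetical input; the thesis derives
               these from keyword and key-keyword lists)
    Returns a dict mapping each N-gram string to its frequency, keeping only
    N-grams that occur at least min_freq times.
    """
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            if any(tok in keywords for tok in gram):
                counts[gram] += 1
    return {" ".join(g): c for g, c in counts.items() if c >= min_freq}


# Illustrative input only; not taken from the thesis corpus.
sample = (
    "this agreement shall be governed by the laws of the state . "
    "this agreement shall be governed by the laws of taiwan"
).split()
ngrams = keyword_ngrams(sample, {"agreement", "governed"})
print(ngrams["this agreement shall"])
```

    Restricting N-grams to those that contain a keyword keeps the candidate list focused on domain-specific strings; frequent strings with no keyword, such as "by the laws" in the sample, are left out.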
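    The results show that the proposed method is technically relatively simple and feasible, and can extract Chinese and English N-grams that are translations of each other. In translation practice, these N-gram pairs can be imported directly into a translation memory system as a translation resource, or used as search keywords to retrieve related terms, concepts, collocations, sentence patterns and contexts from the contract corpus; in particular, they can locate Chinese and English clauses that are related in content though not direct translations, helping translators improve efficiency and quality. The same resources can also be applied to the teaching of contract translation: using the specialized Chinese and English contract corpora as parallel texts together with the Chinese-English N-gram pairs, students can quickly and effectively retrieve the contract terms, concepts and sentence patterns they need along with their translation equivalents, greatly shortening search time and the learning curve. In computational linguistics, the proposed method may also serve as a reference for information engineering, machine translation and translation memory system development, and could be taken further to extract translation-equivalent terms from Chinese and English contracts, or even extended to the extraction of translation-equivalent contract sentences.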

    This study is motivated by the practical need for contract translation. Business contracts belong to a highly specialized genre, characterized by specific vocabulary, domain terms, formulaic expressions and repetitive standard clauses. These features make contract texts ideal candidates for a translation memory (TM) system, which searches a database of source texts aligned segment by segment with their target-language equivalents and retrieves past translations. A TM system therefore needs a large database of past translations (known as a parallel corpus) to give the best results. Therein lies the difficulty of using a TM system for English-Chinese contract translation: parallel corpora of English and Chinese contracts are scarce.
    To overcome the limitations of parallel corpora, this study turns to comparable corpora, i.e. monolingual corpora of similar design in two or more languages. Comparable corpora of a specialized domain, though not direct translations of each other, contain domain terms, concepts and fixed expressions that are mutual translations. This study aims to explore a simple yet effective method for extracting such translation equivalents from a comparable corpus of Chinese and English contracts by employing statistical machine translation and string similarity comparison.
    First, a comparable corpus of Chinese and English contracts is built from texts collected from the Internet. Second, keyword and key-keyword lists are extracted with concordancing tools, and Chinese and English N-grams are built around these core keywords. Third, the Chinese N-grams are machine-translated into English and the English N-grams into Chinese with Google Translator Toolkit. Finally, the English N-grams are compared with the machine-translated English of the Chinese N-grams, using the built-in similarity comparison function of a TM system. English N-grams that meet or exceed a pre-defined match value are automatically mapped to the corresponding Chinese N-grams to produce a list of English-Chinese N-gram pairs. Chinese N-grams are mapped to candidate English N-gram equivalents by the same procedure. The N-gram pairs were evaluated by experienced contract translators, and the results show that 3-word to 6-word N-grams with a match value of 95% or above have a mapping accuracy of 82%.
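    The matching step can be sketched roughly as below. The abstract does not say how the TM system computes its match value, so this example substitutes the token-level ratio from Python's standard `difflib.SequenceMatcher` as a stand-in. The function names, the Chinese strings, and the "machine-translated" English strings are hypothetical illustrations, not output from Google Translator Toolkit or data from the thesis.

```python
from difflib import SequenceMatcher


def match_score(a, b):
    """Similarity in [0, 1] between two strings, compared on lowercased word tokens.

    Assumption: this ratio only approximates a TM fuzzy-match value.
    """
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def map_ngrams(english_ngrams, zh_to_mt_english, threshold=0.95):
    """Pair Chinese N-grams with English corpus N-grams via machine-translated English.

    english_ngrams   : N-grams extracted from the English contract corpus
    zh_to_mt_english : dict mapping a Chinese N-gram to its machine English translation
    threshold        : minimum score to accept a pair (0.95 mirrors the 95% match value)
    """
    pairs = []
    for zh, mt_en in zh_to_mt_english.items():
        for en in english_ngrams:
            score = match_score(en, mt_en)
            if score >= threshold:
                pairs.append((zh, en, round(score, 2)))
    return pairs


# Hypothetical data for illustration only.
english_ngrams = [
    "this agreement shall be governed by",
    "the laws of the state",
    "in the event of default",
]
zh_to_mt_english = {
    "本合約應受": "This Agreement shall be governed by",
    "如發生違約": "in the case of default",
}
pairs = map_ngrams(english_ngrams, zh_to_mt_english)
print(pairs)
```

    In this sample the first pair matches exactly and is kept, while "in the case of default" differs from "in the event of default" by one of five tokens and falls below the threshold. Lowering `threshold` to 0.75 would admit such partial matches, at the cost of more candidates for human review.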
    The results show that the method is technically simple yet effective. For contract translators, the correctly mapped N-gram pairs can be imported into a TM system as a translation resource, or used as concordance search keywords to retrieve relevant terms, concepts, collocations, sentence patterns and contexts from the comparable corpus. The same resources can also be applied to translator training: students benefit from authentic parallel texts, and the Chinese-English N-gram pairs improve search results and shorten the learning curve. For computational linguistics, the findings suggest further study of the extraction of contract terms, or even sentence-level translation equivalents, from comparable corpora.

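    Chapter 1 Introduction
      1.1 Research Background and Motivation
        1.1.1 Translation Memory Systems
        1.1.2 Corpus Tools
        1.1.3 Contracts as a Specialized Genre
      1.2 Research Purpose, Method, and Scope
      1.3 Thesis Structure
    Chapter 2 Literature Review
      2.1 Corpus Linguistics
        2.1.1 Types of Corpora
        2.1.2 Corpus Keywords
        2.1.3 Term Extraction
      2.2 Machine Translation
      2.3 Contract Genre Studies and Contract Translation
        2.3.1 Stylistic Features of Legal and Contract Texts
        2.3.2 Discourse Structure of Contracts
        2.3.3 Lexical Features of English Contracts
        2.3.4 Syntactic Features of English Contracts
        2.3.5 Lexical and Syntactic Features of Chinese Contracts
        2.3.6 Corpora and Contract Genre Studies
        2.3.7 Contract Translation
      2.4 Computer-Aided Translation Tools
        2.4.1 Electronic Information Search
        2.4.2 Corpora
        2.4.3 Translation Memory Systems, Terminology Management, and Quality Assurance Tools
    Chapter 3 Research Method
      3.1 Building the Comparable Corpus of Contracts
        3.1.1 Collecting Texts
        3.1.2 Building the Corpus
      3.2 Quantitative Corpus Analysis
        3.2.1 Word Lists and Descriptive Statistics
        3.2.2 Keywords and Key Keywords
        3.2.3 N-grams and Collocations
        3.2.4 Concordance Lines
      3.3 Machine Translation and Similarity Comparison
    Chapter 4 Corpus Analysis and N-gram Extraction Results
      4.1 Descriptive Statistics of the Corpus
        4.1.1 English Contracts
        4.1.2 Chinese Contracts
      4.2 Word Lists, Keyword Lists, and Key Keyword Lists
        4.2.1 English Contracts
        4.2.2 Chinese Contracts
      4.3 N-grams
        4.3.1 English Contract N-grams
        4.3.2 Chinese Contract N-grams
      4.4 Machine Translation
      4.5 N-gram Similarity Comparison and Evaluation Results
        4.5.1 Contract N-grams, English to Chinese
        4.5.2 Contract N-grams, Chinese to English
    Chapter 5 Discussion and Conclusion
      5.1 Exact or High Matches (95%~100% match)
      5.2 Partial Matches (75%~94% match)
      5.3 Inappropriate or Incorrect Mappings
        5.3.1 N-grams Too Short, Context Unclear
        5.3.2 Passive and Negative Constructions
        5.3.3 Idiosyncratic Expressions
      5.4 Other Standard Clauses
        5.4.1 Notice Clauses
        5.4.2 Confidentiality Clauses
      5.5 Conclusion
        5.5.1 Comparable Corpora as Parallel Texts
        5.5.2 Bilingual Lexicography
        5.5.3 Comparable Corpora and Machine Translation
        5.5.4 Others
      5.6 Limitations and Future Directions
    References
    Appendices
      Appendix 1: Academia Sinica Part-of-Speech Tag Set for Modern Chinese
      Appendix 2: Top 1,000 High-Frequency Words in English Contracts
      Appendix 3: Top 1,000 High-Frequency Words in Chinese Contracts
      Appendix 4: Chinese-English Contract N-gram Translation Pairs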

