Graduate student: | 陳碧珠 Chen, Pearl |
---|---|
Thesis title: | 利用專門可比語料庫結合機器翻譯自動提取雙語對譯N連詞:以合約文類為例 Using comparable specialized corpora with machine translation for extracting N-gram translation equivalents: A case study of Chinese and English contracts |
Advisor: | 高照明 Gao, Zhao-Ming |
Degree: | 博士 Doctor |
Department: | 翻譯研究所 Graduate Institute of Translation and Interpretation |
Year of publication: | 2012 |
Academic year of graduation: | 100 (ROC academic year) |
Language: | Chinese |
Number of pages: | 189 |
Chinese keywords: | 可比語料庫、合約翻譯、機器翻譯、翻譯記憶系統、N連詞、相似度比對 |
English keywords: | comparable corpora, contract translation, machine translation, translation memory, N-grams, similarity comparison |
Thesis type: | Academic thesis |
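This study starts from the demand for contract translation in the professional translation market. Contracts are a highly specialized genre whose style differs sharply from that of general documents, yet they are also formulaic and repetitive, which makes them well suited to translation memory (TM) systems. Over the past decade TM systems have become increasingly common in the translation market, but a TM system relies on a parallel database of bilingual translations as its retrieval basis so that existing translations can be reused; without a sufficiently large translation database, the tool itself delivers little benefit. This is precisely the constraint on using TM systems for contract translation: contracts involve sensitive, confidential information about the contracting parties, so bilingual translated texts are hard to obtain, and accumulating a translation database through human translation is slow, so a contract translation corpus cannot be built quickly for direct use in a TM system.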
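This study therefore turns to specialized monolingual corpora in different languages, known in the research community as comparable corpora, to overcome the shortage of translated texts. Although comparable corpora of a specialized domain are not translations of each other, the domain terms, concepts, and common expressions they cover necessarily overlap to a large extent and are mutual translations. The aim of this study is to explore a feasible method that uses statistical machine translation and string similarity comparison to automatically extract bilingual equivalent contiguous word sequences, i.e. N-grams, from a comparable corpus of Chinese and English contracts.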
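The method first builds a comparable corpus of Chinese and English contracts, using the Internet as the source of texts. Next, corpus query tools are used to extract the keywords and key keywords of the contracts, and N-grams are built from these core keywords. The automatic translation service of Google Translator Toolkit is then applied to produce machine translations of the Chinese and the English N-grams respectively. Finally, borrowing the similarity matching function of a TM system, the English contract N-grams are compared for string similarity with the machine English translations of the Chinese N-grams; if the two are identical or match closely, the English N-gram and the corresponding Chinese N-gram are very likely translations of each other. Pairing from Chinese N-grams to English N-grams likewise uses English as the intermediary language for similarity comparison. After expert evaluation of the resulting Chinese-English N-gram pairs, three- to six-word N-grams with a high match value (95% or above) were found to have a mapping accuracy of 83%.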
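The N-gram building step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure used in the thesis: the function name `keyword_ngrams`, the 3-to-6-token length range applied at extraction time, the sample sentence, and the keyword set are all hypothetical, and the text is assumed to be already tokenized (Chinese text would first need word segmentation).

```python
from collections import Counter


def keyword_ngrams(tokens, keywords, min_n=3, max_n=6):
    """Count contiguous n-grams (min_n..max_n tokens) containing a keyword.

    Hypothetical helper: the abstract does not specify how N-grams
    were built from the keyword lists.
    """
    counts = Counter()
    for n in range(min_n, max_n + 1):
        # Slide a window of n tokens across the token sequence.
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Keep only windows that include at least one keyword.
            if any(tok in keywords for tok in gram):
                counts[" ".join(gram)] += 1
    return counts


# Illustrative input only; not data from the thesis.
tokens = "this agreement shall be governed by the laws of the state".split()
ngrams = keyword_ngrams(tokens, keywords={"agreement", "governed"})
```

In this sketch an N-gram such as `shall be governed by` is kept because it contains a keyword, while `the laws of` is dropped because it contains none.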
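The results show that the proposed method is technically relatively simple and feasible, and that it does extract Chinese and English N-grams that are translations of each other. In translation practice, these N-gram pairs can be imported directly into a TM system as a translation resource, or used as search keywords to retrieve relevant terms, concepts, collocations, sentence patterns, and contexts from the contract corpora; in particular, they can locate Chinese and English clauses that are related in content though not direct translations, helping translators improve efficiency and quality. The same resources can be applied to teaching contract translation: using the specialized Chinese and English contract corpora as parallel texts, together with the Chinese-English N-gram pairs, students can quickly retrieve the contract terms, concepts, and sentence patterns they need along with their translated expressions, greatly shortening search time and the learning curve. In computational linguistics, the proposed method may also serve as a reference for information engineering, machine translation, and TM system development, and could be extended to extract equivalent contract terms or even equivalent contract sentences.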
This study is motivated by the need for contract translation. Business contracts belong to a highly specialized genre, characterized by specific vocabulary, domain terms, formulaic expressions and repetitive standard clauses. These features make contract texts an ideal candidate for a translation memory (TM) system, which searches a database of aligned source-text segments and their target-language equivalents and retrieves past translations. A TM system thus requires a large database of past translations (known as a parallel corpus) to give the best results. Therein lies the difficulty in using a TM system for English-Chinese contract translation: parallel corpora of English and Chinese contracts are scarce.
To overcome the limitations of parallel corpora, this study turns to comparable corpora, i.e. monolingual corpora of similar design in two or more languages. Comparable corpora of a specialized domain, though not direct translations of each other, contain domain terms, concepts and fixed expressions that are mutual translations. This study aims to explore a simple yet effective method for extracting such translation equivalents from a comparable corpus of Chinese and English contracts by employing statistical machine translation and string similarity comparison.
First, a comparable corpus of Chinese and English contracts is built from texts collected from the Internet. Second, keyword and key-keyword lists are extracted with concordancing tools, and Chinese and English N-grams are built from them. Third, the Chinese N-grams are machine-translated into English and the English N-grams into Chinese with Google Translator Toolkit. Finally, the English N-grams are compared with the Google-translated English, using the built-in similarity comparison function of a TM system. English N-grams that meet or exceed a pre-defined match value are automatically mapped to the corresponding Chinese N-grams to establish a list of English-Chinese N-gram pairs. Chinese N-grams are also mapped to possible English N-gram translation equivalents following the same procedure. These N-gram pairs are evaluated by experienced contract translators, and the results show that 3-word to 6-word N-grams with a match value of 95% and above have a mapping accuracy of 82%.
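The similarity comparison step above can be illustrated with a minimal sketch. It rests on several assumptions: the matching algorithm of the TM system used in the study is not described here, so Python's `difflib.SequenceMatcher` stands in for it; the names `similarity` and `match_ngrams` are hypothetical; and the sample N-grams and their "machine translations" are invented for illustration, not actual Google Translator Toolkit output.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Character-level similarity in [0, 1]; a stand-in for TM fuzzy matching."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_ngrams(english_ngrams, zh_to_mt_english, threshold=0.95):
    """Pair English N-grams with Chinese N-grams via the Chinese MT output.

    zh_to_mt_english maps each Chinese N-gram to its machine English
    translation. A pair is kept when an English N-gram matches that
    translation at or above the threshold (0.95 mirrors the 95% match
    value mentioned in the abstract).
    """
    pairs = []
    for zh, mt_en in zh_to_mt_english.items():
        for en in english_ngrams:
            score = similarity(en, mt_en)
            if score >= threshold:
                pairs.append((en, zh, score))
    # Highest-scoring candidates first.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical data for illustration only.
english_ngrams = [
    "terms and conditions of this agreement",
    "force majeure event",
]
zh_to_mt_english = {
    "本協議的條款和條件": "terms and conditions of this agreement",
    "不可抗力事件": "event of force majeure",
}
pairs = match_ngrams(english_ngrams, zh_to_mt_english)
```

With this invented data only the first Chinese N-gram is paired, because its machine translation is identical to an English N-gram. The second is not, since a different word order pulls the character-level score well below the threshold, which hints at why candidate pairs still need evaluation by human experts.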
The results show that the method employed is technically simple yet effective. For contract translators, the correctly mapped N-gram pairs can be imported into a TM system as a translation resource, or used as concordance search keywords to retrieve the needed terms, concepts, collocations, sentence patterns and contexts from the comparable corpus. The same resources can be applied to translator training. Students will benefit from authentic parallel texts, and using the Chinese-English N-gram pairs will improve search results and shorten the learning curve. For computational linguistics, the findings of this study suggest further research into extracting contract terms or even sentence-level translation equivalents from comparable corpora.