
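Graduate student: 羅巧珊
Lo, Chiao-Shan
Thesis title: 中文動詞上下位關係自動標記法
Automatic Labeling of Hypernymy-Troponymy Relation for Chinese Verbs
Advisor: 謝舒凱
Hsieh, Shu-Kai
Degree: Master
Department: Department of English
Year of publication: 2009
Academic year of graduation: 97 (ROC calendar)
Language: English
Number of pages: 122
Keywords (Chinese): 語義關係自動標記 (automatic labeling of semantic relations), 動詞詞彙語意 (verb lexical semantics), 動詞上下位關係 (verb hypernymy-troponymy relation), 中文詞網 (Chinese WordNet)
Keywords (English): Automatic extraction, Lexical semantic relation, Troponymy, Chinese WordNet
Document type: Academic thesis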
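  • In recent years, wordnets have become one of the most widely used resources in computational linguistics and related fields, and have greatly aided the development of Information Retrieval and Natural Language Processing. A wordnet is built from synonym sets (synsets) and lexical semantic relations. Some wordnets, such as the English-based Princeton WordNet and EuroWordNet, which combines several European languages, are already well developed. However, a wordnet cannot be built by one person in a short time; the labor and time required are considerable. How to build a wordnet efficiently and systematically has therefore become a goal of recent research. Because the semantic relations between words are the main building blocks of a wordnet, extracting lexical semantic relations automatically is an important step in constructing one. The Institute of Linguistics, Academia Sinica, has built the Chinese WordNet (CWN), which consists mainly of mid-frequency words and aims to provide complete sense distinctions for Chinese words. In the current CWN system, however, the semantic relations among synsets are labeled by human judgment, and the number of labeled relations has not yet reached a scale sufficient for practical application. This study therefore proposes a semi-automatic method for labeling semantic relations between words. Focusing on the hypernymy-troponymy relation between verbs, this thesis proposes an automatic labeling method and extracts Chinese verb pairs that stand in this relation.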

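    This thesis proposes two parallel methods. First, specific syntactic sentence patterns (lexical syntactic patterns) are used to automatically extract verb pairs in a hypernymy-troponymy relation from the Chinese WordNet. Second, a bootstrapping method maps semantic relations from the English Princeton WordNet into Chinese on a large scale through the Chinese-English bilingual wordnet built by Academia Sinica (Sinica BOW). Experimental results show that the system can quickly extract large numbers of Chinese verb pairs in a hypernymy-troponymy relation. It is hoped that this method can be applied to the automatic labeling of semantic relations in the Chinese WordNet, which is still under development, and to the automatic construction of ontologies, leading to more efficient construction of comprehensive Chinese lexical knowledge resources.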
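The first, pattern-based method can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the manner pattern "X地V" (a manner adverbial followed by a verb), the toy glosses, and the names `MANNER_PATTERN`, `TOY_GLOSSES`, and `extract_pairs` are hypothetical and are not taken from the thesis or from the Chinese WordNet data.

```python
import re

# Hypothetical manner pattern: "<manner>地<verb>", e.g. "悠閒地走" (walk leisurely).
# This pattern is an illustrative assumption, not one taken from the thesis.
MANNER_PATTERN = re.compile(r"^(?P<manner>.+)地(?P<hypernym>[\u4e00-\u9fff]{1,2})$")

# Toy glosses standing in for dictionary-style sense definitions (invented data).
TOY_GLOSSES = {
    "漫步": "悠閒地走",
    "狂奔": "快速地跑",
    "蘋果": "一種水果",
}


def extract_pairs(glosses):
    """Return (troponym, hypernym) candidates whose gloss matches the pattern."""
    pairs = []
    for word, gloss in glosses.items():
        match = MANNER_PATTERN.match(gloss)
        if match:
            pairs.append((word, match.group("hypernym")))
    return pairs


if __name__ == "__main__":
    for troponym, hypernym in extract_pairs(TOY_GLOSSES):
        print(f"{troponym} is a manner of {hypernym}")
```

A single surface pattern like this one would over- and under-generate on real data, so its output is best treated as a list of candidates for manual review.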

    WordNet-like databases have become crucial resources for lexical semantic studies and computational linguistic applications such as Information Retrieval (IR) and Natural Language Processing (NLP). The fundamental elements of WordNet are synsets (synonymous groupings of words) and the semantic relations among synsets. However, creating such a lexical network is a time-consuming and labor-intensive project, and it is even more difficult for languages with few resources, such as Chinese. Chinese WordNet (CWN), which is composed of middle-frequency words, has been launched by Academia Sinica following a paradigm similar to that of Princeton WordNet. The synset to which each word sense belongs in CWN is manually labeled. However, the lexical semantic relations among synsets in CWN are only partially constructed and lack systematic labeling. Therefore, in this thesis, two independent approaches are proposed to automatically harvest lexical semantic relations, with a focus on the hypernymy-troponymy relation between verbs.

    This thesis describes two approaches for discovering the hypernymy-troponymy relation among verbs. The syntactic pattern-based approach is used because sentence structures can denote relations and reveal information about lexical entries. The bootstrapping approach, on the other hand, aims at exploiting already existing databases and combining them within a common, standard framework. Given large-scale input data, the proposed approaches can rapidly extract large numbers of Chinese verb pairs that are in a hypernymy-troponymy relation, aiding the construction of lexical databases in a more effective way. In addition, it is hoped that these approaches will shed light on the automatic acquisition of other Chinese lexical semantic relations and on ontology learning.
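The bootstrapping idea of reusing relations from an existing database can be sketched in Python as a projection of English (hypernym, troponym) pairs through a translation table. This is a minimal sketch under stated assumptions: the toy relation list, the toy translation table, and the names `EN_RELATIONS`, `EN_TO_ZH`, and `project_relations` are hypothetical stand-ins and do not reflect the actual data or procedure of the thesis.

```python
from itertools import product

# Toy English (hypernym, troponym) verb pairs, standing in for wordnet relations.
EN_RELATIONS = [("walk", "stroll"), ("run", "sprint"), ("eat", "nibble")]

# Toy English-to-Chinese translation table (invented data standing in for a
# bilingual wordnet).
EN_TO_ZH = {
    "walk": ["走"],
    "stroll": ["漫步", "散步"],
    "run": ["跑"],
    "sprint": ["衝刺"],
}


def project_relations(en_relations, en_to_zh):
    """Map each English (hypernym, troponym) pair onto Chinese translation pairs."""
    zh_pairs = []
    for en_hyper, en_tropo in en_relations:
        zh_hypers = en_to_zh.get(en_hyper, [])
        zh_tropos = en_to_zh.get(en_tropo, [])
        # Pairs with a missing translation on either side produce no output.
        for zh_hyper, zh_tropo in product(zh_hypers, zh_tropos):
            if zh_hyper != zh_tropo and (zh_hyper, zh_tropo) not in zh_pairs:
                zh_pairs.append((zh_hyper, zh_tropo))
    return zh_pairs


if __name__ == "__main__":
    for hypernym, troponym in project_relations(EN_RELATIONS, EN_TO_ZH):
        print(f"{troponym} -> {hypernym}")
```

Because one English verb can have several Chinese equivalents, the projection yields every combination of translations, so projected pairs still need to be checked for sense mismatches.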

    1 Introduction
     1.1 Background
     1.2 Motivation
     1.3 Organization of the Thesis
    2 Related Works
     2.1 WordNet-like Resources
      2.1.1 Princeton WordNet
      2.1.2 EuroWordNet
      2.1.3 Sinica Bow
      2.1.4 Chinese WordNet
      2.1.5 HowNet
     2.2 Semantic Relations of Verbs
      2.2.1 Semantic Relations of Verbs in WordNet
      2.2.2 Semantic Relations of Verbs in EuroWordNet
      2.2.3 Other Relations of Verbs
     2.3 Troponymy
      2.3.1 Definition of Troponymy
      2.3.2 Distinguishing Manner
     2.4 Automatic Discovery of Lexical Semantic Relation
      2.4.1 Lexico-Syntactic Pattern-Based Approach
      2.4.2 Clustering-Based Approach
      2.4.3 Bootstrapping Approach
     2.5 Summary
    3 Methodology
     3.1 Syntactic Pattern-Based Approach
      3.1.1 Database: Chinese WordNet
      3.1.2 Data Pre-processing
      3.1.3 Syntactic Patterns in Chinese
      3.1.4 Procedure
     3.2 Bootstrapping Approach
      3.2.1 Data Source
      3.2.2 Procedure
     3.3 Evaluation and Scoring
      3.3.1 Evaluation
      3.3.2 Scoring
     3.4 Summary
    4 Results and Error Analyses
     4.1 Results from Syntactic Pattern-based Approach
      4.1.1 Error Analyses
      4.1.2 Interim Summary
     4.2 Results from Bootstrapping Approach
      4.2.1 Error Analyses
     4.3 Discussion
      4.3.1 Comparison of Two Approaches
      4.3.2 Comparison of the Results
      4.3.3 Comparison of the Error Types
      4.3.4 General Discussion
     4.4 Summary
    5 Conclusion
     5.1 Summary of the Thesis
     5.2 Contribution
     5.3 Limitations of the Present Study and Suggestions for Future Work
    Appendix
     A Programming Code
     B Results from Syntactic Pattern-based Approach
     C Results from Bootstrapping Approach

