
Author: 林祺傑
Lin, Ci-Jie
Thesis title: 新聞面向事實自動擷取與整合之研究
Aspect Retrieval and Integration for News Fact
Advisor: 柯佳伶
Koh, Jia-Ling
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of publication: 2016
Academic year of graduation: 104 (ROC calendar)
Language: Chinese
Number of pages: 74
Keywords (Chinese): 事實句擷取, 新聞事實擷取, 資訊整合
Keywords (English): fact sentence extraction, news fact extraction, information integration
DOI URL: https://doi.org/10.6345/NTNU202203989
Thesis type: Academic thesis
Usage: 58 views, 15 downloads
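  • Information circulates rapidly online, and news media have moved from traditional newspapers and magazines to web platforms for distributing news. For the same news event, however, reports from different media outlets partly overlap and partly differ, so users must spend time and effort consolidating the facts. This thesis therefore proposes a method for automatically extracting factual information from news. Topic keywords are extracted from the body of each report and used to select candidate topic-related fact sentences, and a classifier then determines which candidates are topic-related fact sentences. For fact extraction, the method starts from these topic fact sentences and uses natural language analysis results to extract fact triples, each consisting of a facet term, a relation term, and a description term. For information integration, the method considers both similar facets and similar description semantics among triples: hierarchical clustering groups the fact information by facet, and an incremental merging method combines fact triples that have similar facets or similar description semantics. Experimental results show that fact sentence extraction, triple extraction, and triple merging all perform well. The proposed method can thus automatically integrate information on different facets from related reports, letting users efficiently understand the facts of a news event from multiple angles.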

    The Internet speeds up the flow of information, and in recent years news media have moved from traditional newspapers and magazines to online platforms. However, news documents collected from different media outlets share similar content while each may also provide additional facts, so users must spend considerable time and effort to obtain accurate factual information from them. To address this problem, we propose a method to automatically extract and integrate fact information from news documents. Candidate fact sentences are selected by extracting topic keywords from the news content. Various features of the candidate sentences are then used in a classifier to identify the fact sentences. To provide fact information, triples consisting of a facet term, a relation term, and a description term are extracted by applying a natural language tool to the topic sentences. The similarity between the facet terms of two triples is then used to cluster the extracted triples by agglomerative hierarchical clustering. Within each cluster, an incremental method combines each pair of triples that have similar facet or description terms, yielding integrated fact information. The performance evaluation shows that fact sentence extraction, triple extraction, and triple combination all perform well. The proposed approach can effectively integrate facet information from different news documents, giving users a comprehensive understanding of those documents.
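The clustering and merging steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The `Triple` class, the token-level Jaccard similarity, the 0.5 threshold, the average-link criterion, and the merge rule (same relation term plus a similar facet term) are all assumptions chosen for illustration; the record does not state which similarity measure or linkage the thesis uses, and Chinese text would need word segmentation before tokens could be compared.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Triple:
    facet: str        # what the fact is about
    relation: str     # how the facet and the description are linked
    description: str  # what is stated about the facet


def token_jaccard(a: str, b: str) -> float:
    """Stand-in similarity (an assumption): Jaccard overlap of word tokens."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cluster_by_facet(triples, threshold=0.5):
    """Average-link agglomerative clustering on facet-term similarity."""
    clusters = [[t] for t in triples]

    def linkage(c1, c2):
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(token_jaccard(x.facet, y.facet) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the most similar pair of clusters (i < j always holds).
        (i, j), best = max(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break  # no remaining pair is similar enough to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def merge_incrementally(cluster, threshold=0.5):
    """Fold triples into merged records one at a time (hypothetical rule)."""
    merged = []
    for t in cluster:
        for rec in merged:
            same_relation = rec["relation"] == t.relation
            if same_relation and token_jaccard(rec["facet"], t.facet) >= threshold:
                if t.description not in rec["descriptions"]:
                    rec["descriptions"].append(t.description)
                break
        else:
            # No existing record matched, so start a new one.
            merged.append({"facet": t.facet, "relation": t.relation,
                           "descriptions": [t.description]})
    return merged


# Made-up example triples, not data from the thesis.
triples = [
    Triple("earthquake magnitude", "is", "6.4"),
    Triple("magnitude of earthquake", "is", "6.4 on the Richter scale"),
    Triple("rescue teams", "arrived at", "the collapsed building"),
]
for cluster in cluster_by_facet(triples):
    print(merge_incrementally(cluster))
```

In this sketch the two earthquake triples fall into one cluster and are merged into a single record carrying both descriptions, while the rescue-team triple stays in a cluster of its own. A vector-based word similarity could replace `token_jaccard` without changing the rest of the code.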

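    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Research Motivation
        1.2 Research Objectives
        1.3 Research Scope and Limitations
        1.4 Thesis Method
    Chapter 2 Literature Review
        2.1 Keyword Extraction
        2.2 Fact Information Extraction Methods
    Chapter 3 Topic Keyword Extraction Method
        3.1 Data Preprocessing
        3.2 Keyword Extraction
        3.3 Keyword Expansion
    Chapter 4 Important Fact Sentence Extraction Method
        4.1 Generating Candidate Important Fact Sentences
        4.2 Feature Extraction Methods
            4.2.1 Sentence Structure Features
            4.2.2 Semantic Features
            4.2.3 Sentence Context Features
        4.3 Building the Classification Model
    Chapter 5 Facet Term and Description Term Extraction Method
        5.1 Fact Triple Extraction
        5.2 Triple Information Completion
            5.2.1 Facet Term Completion Method
            5.2.2 Description Term Completion Method
            5.2.3 News Fact Filtering
    Chapter 6 Fact Triple Merging Method
        6.1 Clustering by Similar Facet Semantics
        6.2 Merging by Similar Facets
        6.3 Merging by Similar Description Semantics
    Chapter 7 Experimental Evaluation
        7.1 Experimental Data
        7.2 Evaluation of Important Sentence Extraction
        7.3 Evaluation of Triple Extraction
        7.4 Evaluation of News Fact Merging
        7.5 Summary of Experimental Results
    Chapter 8 Conclusion and Future Research Directions
        8.1 Conclusion
        8.2 Future Directions
    References
    Appendix 1 List of Part-of-Speech Tags
    Appendix 2 Description of Directed Edges in Dependency Analysis
    Appendix 3 News Articles
    Appendix 4 Example Merging Process for the Triples from the Appendix 3 News Articles
    Appendix 5 Merging Results for the Triples from the Appendix 3 News Articles
    Appendix 6 List of Chinese Stop Words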

