
Author: Gu, Yi-Ciao (古怡巧)
Thesis title: The Construction of Humor Corpus (幽默語料庫之建置)
Advisor: Tseng, Yuen-Hsien (曾元顯)
Degree: Master's
Department: Graduate Institute of Library and Information Studies (圖書資訊學研究所)
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar)
Language: Chinese
Number of pages: 110
Chinese keywords: 幽默語料庫, 語料庫建置, 語料庫
English keywords: Humor Corpus, construction of corpus, corpus
DOI URL: http://doi.org/10.6345/THE.NTNU.GLIS.006.2019.A01
Thesis type: Academic thesis
Usage: 167 views, 40 downloads
  • Humor is an important way to relieve the pressures of daily life, and as those pressures grow, so does the demand for humor. To make the most of humorous content, this study builds a humor corpus of reasonable scale that suits Taiwan's cultural context and is written mainly in Traditional Chinese. Its main aims are: (1) to discuss the meaning and value of a humor corpus; (2) to develop a metadata format and a corpus size suited to a humor corpus; (3) to analyze the construction process of the humor corpus and archive it; (4) to classify the collected jokes and resolve inconsistencies in classification; and (5) to explore the extensibility and applications of the humor corpus.
    The research proceeds as follows. First, it summarizes the theory and background relevant to the Humor Corpus, covering "humor", "corpus", and the combination of the two. Second, it collects content from multiple sources and develops appropriate corpus fields and structure, using content analysis and the systems development method from information systems research. Corpus processing tasks include removing duplicate jokes, cataloging, and checking topic consistency; these are done mainly by hand, with programs assisting. Finally, based on preliminary statistics of the corpus, it analyzes applications and future research prospects and designs prospective value-added fields, such as the cause of a joke's humor, negative examples, characters, and a humor-level scoring mechanism, along with corpus expansion and the development of a corpus retrieval system, in order to support chatbots, humor recognition, and humor generation technology.
    In the end, the Humor Corpus contained 3,691 jokes (as of January 2019). It is a specialized corpus and a monitor corpus with both diachronic and synchronic coverage, and its construction process is documented in full. The corpus is not restricted to a single language, but it is mainly in Traditional Chinese. It is a humor corpus suited to Taiwan's cultural context, and it reflects five characteristics of humor: subjectivity, regionality, culture, topicality, and language differences.
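
    The abstract mentions program-assisted removal of repeated jokes, and the thesis lists a Python similarity calculation in Appendix 4, but this record does not reproduce that code. The following is only a minimal sketch of one way such a check could work, assuming character-bigram Jaccard similarity (chosen here because it needs no word segmentation for Chinese text). The function names and the 0.8 threshold are hypothetical and are not taken from the thesis.

```python
from __future__ import annotations

from itertools import combinations


def char_ngrams(text: str, n: int = 2) -> set[str]:
    """Return the set of character n-grams of a text, ignoring whitespace."""
    compact = "".join(text.split())
    if len(compact) < n:
        # Texts shorter than n characters are treated as a single gram.
        return {compact} if compact else set()
    return {compact[i:i + n] for i in range(len(compact) - n + 1)}


def jaccard(a: str, b: str, n: int = 2) -> float:
    """Jaccard similarity of two texts over character n-grams (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two empty texts are considered identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def find_near_duplicates(
    jokes: list[str], threshold: float = 0.8
) -> list[tuple[int, int, float]]:
    """Return (i, j, score) for each pair of jokes scoring at or above threshold.

    The threshold is an illustrative assumption; flagged pairs are meant
    for manual review, not automatic deletion.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(jokes), 2):
        score = jaccard(a, b)
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


if __name__ == "__main__":
    sample = [
        "Why did the chicken cross the road? To get to the other side.",
        "Why did the chicken cross the road?  To get to the other side!",
        "I told my computer a joke, but it did not get it.",
    ]
    for i, j, score in find_near_duplicates(sample):
        print(f"jokes {i} and {j} look similar (score {score:.2f})")
```

    Comparing every pair is quadratic in the number of jokes, which is workable for a corpus of a few thousand entries. Because the abstract describes the cleanup as mainly manual, a sketch like this would only produce candidates for a human to confirm.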

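    Chapter 1 Introduction
        Section 1 Research Motivation
        Section 2 Research Objectives
        Section 3 Research Questions
        Section 4 Definition of Terms
    Chapter 2 Literature Review
        Section 1 Definition and Scope of Humor
        Section 2 Definition, Background, and Applications of Corpora
        Section 3 Related Research on Humor Corpora
        Section 4 Humor Corpus Data and Metadata
    Chapter 3 Research Methods and Implementation
        Section 1 Research Methods
        Section 2 Research Scope and Limitations
        Section 3 Research Framework
        Section 4 Research Implementation and Steps
    Chapter 4 Construction and Analysis of the Humor Corpus
        Section 1 Construction Process
        Section 2 Corpus Collection
        Section 3 Corpus Cleaning
        Section 4 Corpus Cataloging
        Section 5 Analysis of the Humor Corpus
    Chapter 5 Conclusions and Future Research
        Section 1 Conclusions
        Section 2 Future Research
    References
    Appendix 1 List of Domestic Corpora
    Appendix 2 List of Foreign Corpora
    Appendix 3 General Dataset Standard Framework Table from the Dataset Metadata Standard Specification
    Appendix 4 Python Code: Similarity Calculation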

