
Author: 陳彥霖 (Chen, Yan-Lin)
Thesis title: 應用潛在語意分析於試題相似度比較之可行性
The Feasibility of Applying Latent Semantic Analysis to Analyze Item Similarity
Advisor: 何榮桂
Degree: Master
Department: Graduate Institute of Information and Computer Education
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar)
Language: Chinese
Number of pages: 76
Chinese keywords: 潛在語意分析, 試題相似, 評分函式, LSA
English keywords: latent semantic analysis, item similarity, score function, LSA
Thesis type: Academic thesis
  • This study applies latent semantic analysis (LSA) to judging the similarity of test items and examines how different score functions affect the results. LSA works by detecting lexical co-occurrence: given many documents, it can identify synonyms. Synonyms, however, rarely appear within the same item. Based on this and on the characteristics of item keywords, the study proposes training the LSA model on documents related to the items in order to improve the precision of the similarity judgments. The results show that using the Dice measure or the inner product as the score function agrees most closely with the expert ratings. For items on which the experts' similarity ratings were relatively consistent, the correlation reaches 0.9, and the mean correlation is above 0.7. Applying latent semantic analysis to item similarity is therefore a feasible technique.
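    The comparison of score functions described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy keyword-by-item matrix, the function names, the choice of k = 2, and the vector-space form of the Dice measure, 2(a·b) / (|a|² + |b|²), are all assumptions made for illustration. The sketch reduces the matrix with a truncated SVD and then scores item pairs with the inner product, the Dice measure, and the cosine.

```python
import numpy as np


def lsa_item_vectors(term_item, k):
    """Project each item (column) of a term-by-item matrix into a k-dimensional latent space."""
    u, s, vt = np.linalg.svd(term_item, full_matrices=False)
    # Rows of V_k scaled by the top-k singular values are the item coordinates.
    return vt[:k].T * s[:k]


def inner_product(a, b):
    return float(np.dot(a, b))


def dice(a, b):
    # Assumed vector-space form of the Dice measure: 2 a.b / (|a|^2 + |b|^2).
    return float(2.0 * np.dot(a, b) / (np.dot(a, a) + np.dot(b, b)))


def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical keyword counts (rows: keywords, columns: items); values are invented.
term_item = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
    [1, 1, 1, 1],
], dtype=float)

items = lsa_item_vectors(term_item, k=2)

# Items 0 and 1 share most keywords; items 0 and 2 share only the last one.
for name, score in (("inner product", inner_product), ("dice", dice), ("cosine", cosine)):
    print(name, score(items[0], items[1]), score(items[0], items[2]))
```

    In this toy example all three functions rank the pair of items with overlapping keywords above the pair without. Comparing such scores against expert ratings, as the study does, would additionally require a correlation step that is not shown here.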

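    Chinese Abstract
    English Abstract
    List of Tables
    List of Figures
    Chapter 1 Introduction
        Section 1 Research Motivation and Purpose
        Section 2 Research Purpose
    Chapter 2 Literature Review
        Section 1 Item-Writing Techniques and Principles
        Section 2 Latent Semantic Analysis
        Section 3 Keyword Selection
        Section 4 Keyword Weighting
        Section 5 Academia Sinica Chinese Word Segmentation System
    Chapter 3 Research Methods and Procedures
        Section 1 Research Procedures
        Section 2 Research Tools
        Section 3 Experimental Design
    Chapter 4 Results and Discussion
        Section 1 Establishing the External Criterion
        Section 2 Analysis of Score Functions in Judging Different Degrees of Similarity
        Section 3 Analysis of Keyword Selection
        Section 4 Analysis of Training With and Without Related Documents
        Section 5 Research Results
    Chapter 5 Conclusions and Suggestions
        Section 1 Conclusions
        Section 2 Suggestions
    References
    Appendix 1: Academia Sinica Balanced Corpus Part-of-Speech Tag Set
    Appendix 2: Overview of the Senior High School History Item Bank
    Appendix 3: Latent Semantic Analysis System Interface
    Appendix 4: Sample Items Used in the Evaluation
    Appendix 5: Similarity Evaluation System Interface
    Appendix 6: Evaluation System User Manual
    Appendix 7: Expert Evaluation Data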

