Graduate student: | 洪琴婷 Chin-Ting Hung |
---|---|
Thesis title: | LSI-based Document Retrieval |
Advisor: | 邱貴發 Chiou, Guey-Fa |
Degree: | Master |
Department: | Graduate Institute of Information and Computer Education |
Year of publication: | 2002 |
Academic year of graduation (ROC calendar): | 90 |
Language: | English |
Number of pages: | 105 |
Keywords (Chinese): | latent semantic indexing, information retrieval, singular value decomposition, relevance feedback |
Keywords (English): | latent semantic indexing, information retrieval, singular value decomposition, relevance feedback |
Thesis type: | Academic thesis |
Latent Semantic Indexing (LSI) is a retrieval technique that applies Singular Value Decomposition (SVD) to map each document vector into a lower-dimensional space, so that documents are matched by concept rather than by exact terms. LSI has been shown to outperform traditional lexical search methods and to alleviate the synonymy and polysemy problems. Our purposes were to construct an LSI model that facilitates the retrieval process and to propose potential uses of LSI in education.
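The mapping described above can be illustrated with a minimal sketch. It is not the system built in this thesis: the toy term-document matrix, the choice `k=2`, and the helper names `lsi_fit`, `fold_in`, and `cosine` are all assumptions made for illustration, and the query is folded in with the commonly used projection q^T U_k S_k^{-1}.

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents).
# Vocabulary and counts are invented for illustration only.
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 1, 0, 0],   # car
    [0, 1, 1, 0, 0],   # automobile
    [1, 1, 1, 0, 0],   # engine
    [0, 0, 0, 1, 1],   # flower
    [0, 0, 0, 1, 1],   # petal
], dtype=float)

def lsi_fit(A, k):
    """Rank-k truncated SVD: A is approximated by U_k diag(s_k) Vt_k."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def fold_in(q, U_k, s_k):
    """Project a term-space vector into the k-dimensional concept space."""
    return (q @ U_k) / s_k

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

U_k, s_k, Vt_k = lsi_fit(A, k=2)
doc_vecs = Vt_k.T                           # one row per document, k columns

q = np.array([1, 0, 0, 0, 0], dtype=float)  # query containing only "car"
q_hat = fold_in(q, U_k, s_k)
scores = [cosine(q_hat, d) for d in doc_vecs]
```

In this toy collection, document 1 contains "automobile" and "engine" but not "car", so its plain term-space cosine with the query is zero. In the reduced space it scores close to 1, while the two "flower" documents score close to 0, which is the concept-matching behavior the abstract refers to.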
We used five test collections, three English and two Chinese, to verify our LSI model. The standard test collection MED was used to verify the correctness of our system, and the ERIC and English educational abstracts collections were used to test the feasibility of LSI on educational materials. In addition, two Chinese test collections were used to examine the usability of LSI on Chinese documents. Our major concerns in the tests were term weighting, stemming, the number of reduced dimensions, and relevance feedback.
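As an illustration of the term-weighting step, the sketch below applies one common definition of log-entropy weighting: a local weight log(tf + 1) multiplied by a global weight 1 + Σ_j p_ij log p_ij / log n, where p_ij = tf_ij / gf_i and n is the number of documents. The exact formulas used in the thesis are not given in this abstract, so this definition and the function name `log_entropy` are assumptions.

```python
import math

def log_entropy(counts):
    """Weight a term-document count matrix with log-entropy.

    counts[i][j] is the raw frequency of term i in document j.
    Assumes at least two documents, so that log(n_docs) is non-zero.
    """
    n_docs = len(counts[0])
    weighted = []
    for row in counts:
        gf = sum(row)  # global frequency of the term over the collection
        # Sum of p*log(p) over the documents that contain the term
        ent = sum((tf / gf) * math.log(tf / gf) for tf in row if tf > 0)
        g = 1.0 + ent / math.log(n_docs)   # global weight, between 0 and 1
        weighted.append([g * math.log(tf + 1.0) for tf in row])
    return weighted

# Term 0 is spread evenly over all documents; term 1 occurs in one document.
w = log_entropy([[1, 1, 1, 1],
                 [4, 0, 0, 0]])
```

A term spread evenly over every document receives a global weight near 0, while a term concentrated in a single document keeps a global weight of 1, so the weighting favors terms that discriminate between documents.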
The results showed that the LSI model worked well not only for English documents but also for character-based Chinese documents, and that LSI could effectively group semantically related documents. The better-performing weighting schemes were log idf, log entropy, log gfidf, tf idf, and tf gfidf. Stemming significantly improved retrieval, and relevance feedback worked well under different weighting ratios. The best number of dimensions for the ERIC documents was around 50 or 60. In conclusion, we believe that LSI is a suitable model for retrieving relevant documents.
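Relevance feedback with a weighting ratio can be sketched as a weighted sum of the original query vector and the centroid of the documents judged relevant. This is a generic Rocchio-style formulation, not necessarily the scheme evaluated in the thesis; the function name `feedback_query` and the default values of `alpha` and `beta` are hypothetical.

```python
def feedback_query(query, relevant_docs, alpha=1.0, beta=0.5):
    """Return alpha * query + beta * centroid(relevant_docs).

    All vectors are plain lists of floats in the same space
    (for LSI, the reduced k-dimensional space).
    """
    if not relevant_docs:
        # No feedback available: only rescale the original query
        return [alpha * x for x in query]
    n = len(relevant_docs)
    centroid = [sum(col) / n for col in zip(*relevant_docs)]
    return [alpha * q + beta * c for q, c in zip(query, centroid)]

# The ratio alpha:beta controls how far the query moves toward the
# documents the user marked as relevant.
new_q = feedback_query([1.0, 0.0], [[0.0, 1.0], [0.0, 3.0]], alpha=1.0, beta=0.5)
```

The revised query is then compared against the document vectors in the same way as the original query.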
Keywords: latent semantic indexing (LSI), information retrieval (IR), singular value decomposition (SVD), relevance feedback.