簡易檢索 / 詳目顯示

研究生: 張崴
Wei, Chang
論文名稱: Dynamic Generation of a Facet Hierarchy for Web Search Result
Dynamic Generation of a Facet Hierarchy for Web Search Result
指導教授: 柯佳伶
Koh, Jia-Ling
學位類別: 碩士
Master
系所名稱: 資訊工程學系
Department of Computer Science and Information Engineering
論文出版年: 2014
畢業學年度: 102
語文別: 英文
論文頁數: 68
中文關鍵詞: facet hierarchybrowsing costsemanticentropyuser behaviorencoding
英文關鍵詞: facet hierarchy, browsing cost, semantic, entropy, user behavior, encoding
論文種類: 學術論文
相關次數: 點閱:140下載:3
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • In this thesis, we propose a method to construct a facet hierarchy to organize the web search results dynamically. The proposed method is designed by two steps. First, we extract candidate facet terms according to a knowledge base. Second, we construct a facet hierarchy according to the candidate facet terms. We design an objective function to simulate the browsing cost when a user accesses the search results by a facet hierarchy. Accordingly, two algorithms are proposed to construct a facet hierarchy to optimize the objective function. The first one is a bottom-up approaches which select the best facet terms from the lowest level iteratively. The second one is a top-down approach, which uses an entropy function to estimate the expected browsing cost to select facet terms from the top level. Both algorithms are greedy algorithms which find optimal solutions. We evaluate the proposed methods on different distributions of access probability. The experiment results show that the facet hierarchies construct by the proposed methods achieves better performance on saving 30 to 50 percent of expected browsing cost than the one of the existing method.

    In this thesis, we propose a method to construct a facet hierarchy to organize the web search results dynamically. The proposed method is designed by two steps. First, we extract candidate facet terms according to a knowledge base. Second, we construct a facet hierarchy according to the candidate facet terms. We design an objective function to simulate the browsing cost when a user accesses the search results by a facet hierarchy. Accordingly, two algorithms are proposed to construct a facet hierarchy to optimize the objective function. The first one is a bottom-up approaches which select the best facet terms from the lowest level iteratively. The second one is a top-down approach, which uses an entropy function to estimate the expected browsing cost to select facet terms from the top level. Both algorithms are greedy algorithms which find optimal solutions. We evaluate the proposed methods on different distributions of access probability. The experiment results show that the facet hierarchies construct by the proposed methods achieves better performance on saving 30 to 50 percent of expected browsing cost than the one of the existing method.

    ABSTRACT i CONTENTS ii List of Figures iv List of Tables vi Chapter 1 Introduction 1 1.1 Motivation 1 1.2 Goal 3 1.3 Method 3 1.4 Organization 4 Chapter 2 Related Work 5 2.1 Search Results Clustering 5 2.2 Faceted Search 6 2.3 Named Entities Identification 8 Chapter 3 Problem Definition 11 Chapter 4 Methodology 19 4.1 Framework 19 4.2 Topic Term Extraction 20 4.2.1 Entity Mention Annotation 21 4.2.2 Topic Term Selection 22 4.3 Candidate Facet Terms Generation 24 4.3.1 Semantic Filtering on Candidate Facet Terms 27 4.4 Facet Hierarchy Construction 33 4.4.1 Bottom-Up Method 33 4.4.2 Top-Down Method 41 Chapter 5 Experiment 56 5.1 Evaluation on Category Ranking 57 5.2 Evaluation on Expected Browsing Cost 58 5.3 Experiment Discussion 63 Chapter 6 Conclusion and Future Work 64 REFERENCE 65

    [1] U. Scaiella, P. Ferragina, A. Marino, and M. Ciaramita, “Topical clustering of search results” in Proceedings of the fifth ACM international conference on Web search and data mining, 2012.

    [2] L. Zhang, Y. Zhang, Y. Chen, “Summarizing Highly Structured Documents for Effective Search Interaction” in Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, 2012.

    [3] W. Kong, J. Allan, “ Extracting Query Facets from Search Results” in Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval, 2013.

    [4] C. Li, N. Yan, S. B. Roy, L. Lisham, G. Das, “Facetedpedia: Dynamic Generation of Query-Dependent Faceted Interfaces for Wikipedia” in Proceedings of the 19th international conference on World wide web, 2010.

    [5] P. Jiang, H. Hou, L. Chen, S. Chen, C. Yao, C. Li, M. Wang, “Wiki3C: Exploiting Wikipedia for Context-aware Concept Categorization” in Proceedings of the sixth ACM international conference on Web search and data mining, 2013.

    [6] J. Vosecky, D. Jiang, K. W. Leung, W. Ng, “Dynamic Multi-Faceted Topic Discovery in Twitter” in Proceedings of the 22nd ACM international conference on Conference on information & knowledge management, 2013.
    [7] M. D. Hoffman, D. M. Blei, C. Wang, J. Paisley, “Stochastic variational inference”, The Journal of Machine Learning Research Vol.14(1), January 2013.

    [8] C. Carpineto, S. Osiński, G. Romano, D. Weiss, “A Survey of Web Clustering Engines” published in ACM Computing Surveys (CSUR) Vol.41(3), July 2009.

    [9] D. M. Blei, A. Y. Ng, M. I. Jordanm, “Latent dirichlet allocation” published in Journal of Machine Learning Research, 2003.

    [10] X. Zhu, Z. Y. Ming, X. Zhu, T. S. Chua, “Topic Hierarchy Construction for the Organization of Multi-Source User Generated Content” in Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieva, 2013.

    [11] P. Ferragina, U. Scaiella, “TAGME: On-the-fly annotation of short text fragments (by Wikipedia entities)” in Proceedings of the 19th ACM international conference on Information and knowledge management, 2010.

    [12] D. Milne, I. H. Witten, “Learning to Link with Wikipedia” in Proceeding of the 17th ACM conference on Information and knowledge management, 2008.

    [13] S. Dingare, M. Nissim, J. Finkel, C. Manning, C. Grover, “A System For Identifying Named Entities in Biomedical Text: How Results From Two Evaluations Reflect on Both the System and the Evaluations” published in Comparative and Functional Genomics Volume 6, Issue 1-2, pages 77-85, February – March 2005.

    [14] W. Shen, J. Wang, P. Luo, M. Wang, “LINDEN: Linking Named Entities with Knowledge Base via Semantic Knowledge” in Proceedings of the 21st international conference on World Wide Web, 2012.

    [15] W. Shen, J. Wang, P. Luo, M. Wang, “Linking named entities in Tweets with knowledge base via user interest modeling” in Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, 2013.

    [16] A. Ritter, S. Clark, Mausam, O. Etzioni, “Named entity recognition in tweets: An experimental study” in Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2011.

    [17] Q. Mei, X. Shen, C. X. Zhai, “Automatic labeling of multinomial topic models” in Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, 2007.

    [18] J. Zhu, A. Ahmed, E. P. Xing, “MedLDA: maximum margin supervised topic models” in The Journal of Machine Learning Research, 2012.

    [19] D. Agarwal, B. C. Chen, “fLDA: matrix factorization through latent dirichlet allocation” in Proceedings of the third ACM international conference on WSDM, 2010.

    [20] M. Cornolti, P. Ferragina, M. Ciaramita, “A Framework for Benchmarking Entity-Annotation Systems” in Proceedings of the 22nd international conference on World Wide Web, 2013.

    [21] D. Milne, I. H. Witten, “Learning to link with Wikipedia” in Proceedings of the 17th ACM conference on Information and knowledge management, 2008.

    [22] C.E Shannon, "A Mathematical Theory of Communication" in Bell System Technical Journal 27: 379–423, 1948.

    下載圖示
    QR CODE