國立臺灣師範大學博碩士論文全文系統

簡易檢索 / 詳目顯示

回結果列表

研究生：	張崴 Wei, Chang
論文名稱：	Dynamic Generation of a Facet Hierarchy for Web Search Result Dynamic Generation of a Facet Hierarchy for Web Search Result
指導教授：	柯佳伶 Koh, Jia-Ling
學位類別：	碩士 Master
系所名稱：	資訊工程學系 Department of Computer Science and Information Engineering
論文出版年：	2014
畢業學年度：	102
語文別：	英文
論文頁數：	68
中文關鍵詞：	facet hierarchy 、browsing cost 、semantic 、entropy 、user behavior 、encoding
英文關鍵詞：	facet hierarchy, browsing cost, semantic, entropy, user behavior, encoding
論文種類：	學術論文
相關次數：	點閱：361 下載：3
分享至:	分享至facebook 分享至twitter

查詢本校圖書館目錄查詢臺灣博碩士論文知識加值系統勘誤回報

In this thesis, we propose a method to construct a facet hierarchy to organize the web search results dynamically. The proposed method is designed by two steps. First, we extract candidate facet terms according to a knowledge base. Second, we construct a facet hierarchy according to the candidate facet terms. We design an objective function to simulate the browsing cost when a user accesses the search results by a facet hierarchy. Accordingly, two algorithms are proposed to construct a facet hierarchy to optimize the objective function. The first one is a bottom-up approaches which select the best facet terms from the lowest level iteratively. The second one is a top-down approach, which uses an entropy function to estimate the expected browsing cost to select facet terms from the top level. Both algorithms are greedy algorithms which find optimal solutions. We evaluate the proposed methods on different distributions of access probability. The experiment results show that the facet hierarchies construct by the proposed methods achieves better performance on saving 30 to 50 percent of expected browsing cost than the one of the existing method.

ABSTRACT    i
CONTENTS    ii
List of Figures    iv
List of Tables    vi
Chapter 1    Introduction    1
1.1    Motivation    1
1.2    Goal    3
1.3    Method    3
1.4    Organization    4
Chapter 2    Related Work    5
2.1    Search Results Clustering    5
2.2    Faceted Search    6
2.3    Named Entities Identification    8
Chapter 3    Problem Definition    11
Chapter 4    Methodology    19
4.1    Framework    19
4.2    Topic Term Extraction    20
4.2.1    Entity Mention Annotation    21
4.2.2    Topic Term Selection    22
4.3    Candidate Facet Terms Generation    24
4.3.1    Semantic Filtering on Candidate Facet Terms    27
4.4    Facet Hierarchy Construction    33
4.4.1    Bottom-Up Method    33
4.4.2    Top-Down Method    41
Chapter 5    Experiment    56
5.1    Evaluation on Category Ranking    57
5.2    Evaluation on Expected Browsing Cost    58
5.3    Experiment Discussion    63
Chapter 6    Conclusion and Future Work    64
REFERENCE    65

                                

[1] U. Scaiella, P. Ferragina, A. Marino, and M. Ciaramita, “Topical clustering of search results” in Proceedings of the fifth ACM international conference on Web search and data mining, 2012.

[2] L. Zhang, Y. Zhang, Y. Chen, “Summarizing Highly Structured Documents for Effective Search Interaction” in Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, 2012.

[3] W. Kong, J. Allan, “ Extracting Query Facets from Search Results” in Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval, 2013.

[4] C. Li, N. Yan, S. B. Roy, L. Lisham, G. Das, “Facetedpedia: Dynamic Generation of Query-Dependent Faceted Interfaces for Wikipedia” in Proceedings of the 19th international conference on World wide web, 2010.

[5] P. Jiang, H. Hou, L. Chen, S. Chen, C. Yao, C. Li, M. Wang, “Wiki3C: Exploiting Wikipedia for Context-aware Concept Categorization” in Proceedings of the sixth ACM international conference on Web search and data mining, 2013.

[6] J. Vosecky, D. Jiang, K. W. Leung, W. Ng, “Dynamic Multi-Faceted Topic Discovery in Twitter” in Proceedings of the 22nd ACM international conference on Conference on information & knowledge management, 2013.
[7] M. D. Hoffman, D. M. Blei, C. Wang, J. Paisley, “Stochastic variational inference”, The Journal of Machine Learning Research Vol.14(1), January 2013.

[8] C. Carpineto, S. Osiński, G. Romano, D. Weiss, “A Survey of Web Clustering Engines” published in ACM Computing Surveys (CSUR) Vol.41(3), July 2009.

[9] D. M. Blei, A. Y. Ng, M. I. Jordanm, “Latent dirichlet allocation” published in Journal of Machine Learning Research, 2003.

[10] X. Zhu, Z. Y. Ming, X. Zhu, T. S. Chua, “Topic Hierarchy Construction for the Organization of Multi-Source User Generated Content” in Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieva, 2013.

[11] P. Ferragina, U. Scaiella, “TAGME: On-the-fly annotation of short text fragments (by Wikipedia entities)” in Proceedings of the 19th ACM international conference on Information and knowledge management, 2010.

[12] D. Milne, I. H. Witten, “Learning to Link with Wikipedia” in Proceeding of the 17th ACM conference on Information and knowledge management, 2008.

[13] S. Dingare, M. Nissim, J. Finkel, C. Manning, C. Grover, “A System For Identifying Named Entities in Biomedical Text: How Results From Two Evaluations Reflect on Both the System and the Evaluations” published in Comparative and Functional Genomics Volume 6, Issue 1-2, pages 77-85, February – March 2005.

[14] W. Shen, J. Wang, P. Luo, M. Wang, “LINDEN: Linking Named Entities with Knowledge Base via Semantic Knowledge” in Proceedings of the 21st international conference on World Wide Web, 2012.

[15] W. Shen, J. Wang, P. Luo, M. Wang, “Linking named entities in Tweets with knowledge base via user interest modeling” in Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, 2013.

[16] A. Ritter, S. Clark, Mausam, O. Etzioni, “Named entity recognition in tweets: An experimental study” in Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2011.

[17] Q. Mei, X. Shen, C. X. Zhai, “Automatic labeling of multinomial topic models” in Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, 2007.

[18] J. Zhu, A. Ahmed, E. P. Xing, “MedLDA: maximum margin supervised topic models” in The Journal of Machine Learning Research, 2012.

[19] D. Agarwal, B. C. Chen, “fLDA: matrix factorization through latent dirichlet allocation” in Proceedings of the third ACM international conference on WSDM, 2010.

[20] M. Cornolti, P. Ferragina, M. Ciaramita, “A Framework for Benchmarking Entity-Annotation Systems” in Proceedings of the 22nd international conference on World Wide Web, 2013.

[21] D. Milne, I. H. Witten, “Learning to link with Wikipedia” in Proceedings of the 17th ACM conference on Information and knowledge management, 2008.

[22] C.E Shannon, "A Mathematical Theory of Communication" in Bell System Technical Journal 27: 379–423, 1948.

簡易檢索 / 詳目顯示

相關論文