
Author: 鄭舜宸
Shun-Chen, Cheng
Thesis Title: 提供網頁搜尋結果篩選之查詢字詞推薦
Two-level Query Suggestion for Specialization on Web Search Results
Advisor: 柯佳伶
Koh, Jia-Ling
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of Publication: 2014
Academic Year of Graduation: 102 (ROC calendar)
Language: Chinese
Number of Pages: 70
Chinese Keywords: 查詢字推薦, 階層式推薦, 隨機漫走
English Keywords: query suggestions, hierarchical suggestions, random walk
Thesis Type: Academic thesis
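  • This research aims to evaluate and select query suggestion terms from the large number of results returned by a search engine, so that users can filter the results with these terms and spend less effort browsing them. It proposes a two-level query term suggestion method, called M_PhRank: the first level offers topic query terms with broad concepts, and the second level presents subtopic query terms with more specific meaning. The method has three parts: selecting topic query terms, computing the semantic specificity of single words, and selecting subtopic query terms. In the first part, the words that remain after preprocessing are selected according to the number of data objects they cover, and the resulting topic query terms form the first-level suggestions. The second part builds a graph of the words that occur in neighboring positions and runs a random walk algorithm on it to compute how semantically specific each candidate word is within the search results. Finally, given the number of terms to suggest, that number is divided among the topic query terms in proportion to their coverage, which determines how many second-level terms can be suggested under each topic; these terms are then selected, completing the hierarchy. Experiments show that M_PhRank covers more objects that are highly relevant to the query than the baseline method, and that it reduces the growth of the overlap rate as coverage increases. In addition, the user evaluation shows that the query suggestion hierarchy built by M_PhRank provides better query assistance.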
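The proportional allocation step described above can be sketched as follows. This is only an illustration, not the thesis's implementation: the function name `allocate_subtopic_slots`, the example coverage counts, and the largest-remainder rounding rule are all assumptions, since the abstract does not say how fractional shares are rounded.

```python
def allocate_subtopic_slots(coverage, total_slots):
    """Split `total_slots` second-level suggestions among topic terms
    in proportion to each topic term's coverage.

    `coverage` maps a topic term to the number of search results it covers.
    Rounding uses the largest-remainder rule (an assumption for this sketch).
    """
    total = sum(coverage.values())
    if total == 0 or total_slots <= 0:
        return {term: 0 for term in coverage}

    # Ideal (fractional) share of the budget for every topic term.
    shares = {t: total_slots * c / total for t, c in coverage.items()}
    # Start from the integer part of each share.
    slots = {t: int(s) for t, s in shares.items()}

    # Give the slots lost to truncation to the largest fractional remainders.
    leftover = total_slots - sum(slots.values())
    by_remainder = sorted(coverage, key=lambda t: shares[t] - slots[t], reverse=True)
    for t in by_remainder[:leftover]:
        slots[t] += 1
    return slots


# Hypothetical coverage counts for three topic terms.
example = allocate_subtopic_slots({"hotel": 50, "flight": 30, "visa": 20}, 10)
```

With these hypothetical counts, a topic term that covers half of the results receives half of the second-level slots.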

    The goal of this thesis is to automatically suggest query keywords drawn from the results returned by a search engine, so that users can filter a large result set by issuing these keywords as specialized queries. A two-level query suggestion method, called M_PhRank, is proposed. The first level provides topic terms that cover as many search results as possible; the second level provides subtopic terms that have clear meaning and little overlap among the objects they cover. First, the coverage of each word over the search results is computed as its novelty score, which is used to select the topic terms for the first level. Second, the semantic scores of words are estimated by running a random walk algorithm on the word co-occurrence graph. Query keywords consisting of 2-3 non-topic terms form the candidate subtopic terms, and each candidate's semantic score is computed from the semantic scores of its component words. Given the total number of suggestions, the number of subtopic terms placed under each topic term is set in proportion to that topic term's coverage. Finally, the hierarchical query suggestion structure is built from the topic terms in the first level and their corresponding subtopic terms in the second level. Experimental results show that M_PhRank outperforms the baseline method: it provides more semantically specific terms and achieves high coverage with a limited increase in overlap. Moreover, according to a user survey, the hierarchy of query keyword suggestions constructed by M_PhRank receives high satisfaction ratings for query assistance.
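As a rough illustration of scoring words by a random walk over a co-occurrence graph, the sketch below uses a PageRank-style walk with restart. It is not the M_PhRank scoring itself: the window size, the count-based edge weights, the damping factor, and the function names `cooccurrence_graph` and `random_walk_scores` are assumptions made for this example.

```python
from collections import defaultdict


def cooccurrence_graph(docs, window=2):
    """Build an undirected weighted graph in which the weight of an edge is
    the number of times two distinct words appear within `window` positions
    of each other. `docs` is a list of token lists (already preprocessed)."""
    graph = defaultdict(lambda: defaultdict(float))
    for tokens in docs:
        for i, w in enumerate(tokens):
            for v in tokens[i + 1 : i + 1 + window]:
                if v != w:
                    graph[w][v] += 1.0
                    graph[v][w] += 1.0
    return graph


def random_walk_scores(graph, damping=0.85, iterations=50):
    """Power iteration for a random walk with restart (PageRank-style).
    Returns a score per word; the scores sum to 1."""
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    scores = {w: 1.0 / n for w in nodes}
    # Total edge weight leaving each node, used to normalize transitions.
    out_weight = {w: sum(graph[w].values()) for w in nodes}
    for _ in range(iterations):
        updated = {}
        for w in nodes:
            # Probability mass arriving at w from each of its neighbors.
            incoming = sum(
                scores[v] * graph[v][w] / out_weight[v] for v in graph[w]
            )
            updated[w] = (1.0 - damping) / n + damping * incoming
        scores = updated
    return scores


# Hypothetical tokenized snippets from a set of search results.
docs = [
    ["cheap", "hotel", "taipei"],
    ["hotel", "taipei", "station"],
    ["cheap", "flight", "taipei"],
]
scores = random_walk_scores(cooccurrence_graph(docs))
```

In this toy input, "taipei" co-occurs with every other word and ends up with the highest score. How the thesis turns such scores into a measure of semantic specificity, and how it combines word scores into a score for a 2-3 word subtopic term, is described in the thesis itself and is not reproduced here.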

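    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Research Motivation
        1.2 Research Objectives
        1.3 Scope and Limitations
        1.4 Method
        1.5 Thesis Organization
    Chapter 2 Literature Review
        2.1 Query Suggestion
            2.1.1 Query Expansion
            2.1.2 Query Suggestion in Social Tagging Systems
        2.2 Diversity of Search Results
    Chapter 3 Selecting Topic Query Terms
        3.1 Preprocessing
        3.2 Evaluating Concept Breadth
            3.2.1 Novelty Score
            3.2.2 Novelty Score with Entropy
    Chapter 4 Computing the Semantic Specificity of Words
        4.1 Building the Relation Graph
        4.2 Computing Edge Weights
            4.2.1 Frequency-based Weights
            4.2.2 Semantics-based Weights
        4.3 Random Walk
        4.4 Computing Node Weights
    Chapter 5 Generating and Selecting Subtopic Query Terms
        5.1 Generating Candidate Subtopic Query Terms
        5.2 Selecting Subtopic Query Terms
        5.3 Diversified Selection Mechanism
    Chapter 6 Experimental Evaluation and Discussion
        6.1 Data Sources and Environment Settings
            6.1.1 Data Sources
            6.1.2 Data Preprocessing
            6.1.3 Experimental Environment
        6.2 Evaluation Methods
        6.3 Comparison of the Methods Used Within the System
            6.3.1 Query Test Data
            6.3.2 Results
            6.3.3 Discussion of Results
        6.4 Analysis of the Hierarchical Suggestion Structure
            6.4.1 Test Data
            6.4.2 Baseline Methods for Comparison
            6.4.3 Results
            6.4.4 Discussion of Results
        6.5 User Ratings
            6.5.1 Query Test Data
            6.5.2 Results
            6.5.3 Discussion of Results
    Chapter 7 Conclusions and Future Work
        7.1 Conclusions
        7.2 Future Work
    References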

