簡易檢索 / 詳目顯示

研究生: 邱奕智
I-Chih Chiu
論文名稱: 有效率探勘社交標籤系統中前k名擴展查詢字集之研究
An Efficient Method for Mining Top-k Query Expansions on Social Tagging System
指導教授: 柯佳伶
Koh, Jia-Ling
學位類別: 碩士
Master
系所名稱: 資訊工程學系
Department of Computer Science and Information Engineering
論文出版年: 2012
畢業學年度: 101
語文別: 中文
論文頁數: 75
中文關鍵詞: 社交標籤系統前k名擴展查詢字集動態樹狀結構
英文關鍵詞: social-tagging system, top-k query expansion, dynamic tree structure
論文種類: 學術論文
相關次數: 點閱:205下載:2
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 本論文考慮資料物件具有標籤集且含有評分資訊的社交標籤系統,當給定一個查詢找出標籤集中包含所有查詢字的物件資料,本論文方法將對這些物件資料所包含的標籤進行處理,找出可用性分數值最高的前k名擴展查詢字集,且每一個擴展查詢字集都能找到指定數量以上的資料物件。本論文提出的方法分成從查詢結果挑選具代表標籤,以及有效率地探勘前k名擴展查詢字集兩部分。首先,我們運用兩種挑選具代表性標籤的方法:平均差異性及新穎性,計算標籤的代表分數,選定代表分數最高的n個標籤為代表性標籤。接下來,本論文採用一個稱為UT-tree的樹狀結構,用來儲存可形成擴展查詢字集的標籤集資訊。我們提出一個稱為UT-growth的演算法,可從UT-tree中有效率找出可用性最高的前 名擴展查詢字集,且不會產生過多不必要檢查的擴展查詢字集。此外,我們運用動態估算一個擴展查詢字集可用性分數的上限值和下限值的概念,提出一個動態建立UT-tree的方法,並提出dynamic UT-growth演算法,動態更新所找到可用性前 名的擴展查詢字集,若第 名擴展查詢字集的下限值比第 名的擴展查詢字集上限值高,即可提前結束UT-tree的建立及探勘。實驗結果證實先採用代表標籤選取比未採用代表標籤選取,其找出的擴展查詢字集可達到較好的查詢效果。此外,實驗結果顯示UT-growth演算法比相關方法有較好的執行效率,且Dynamic UT-growth演算法在多數情況可提供比UT-growth演算法更有效率的處理。

    In this thesis, we consider the social tagging systems in which each object contains a tag set and a corresponding score. After giving a query to find all the objects whose tagsets contain the query keywords, our goal is to find the top-k utility of query expansions from the tags of the objects such that, each query expansion can find the required number of objects. The proposed methods consist of two parts: to select representative tags from query results and to efficiently discover the top-k query expansions. Firstly, we apply two different functions, AveDiversity and Novelty, to compute the representative score of each tag. Then the tags with the top-n highest scores are selected as the representative tags. Next, we design a tree structure called UT-tree which is used to store the tag sets of objects and their corresponding information for generating query expansions. We propose an algorithm called UT-growth, which can efficiently discover out the top-k utility query expansions from the UT-tree by prevent from generating unnecessary query expansions for checking. In addition, by dynamically estimating the upper bound and lower bound of the utility for a query expansion, we provide a dynamic approach to construct UT-tree and propose the Dynamic UT-growth algorithm. This approach dynamically updates the top-k utility of the query expansions. Accordingly, when the utility lower bound of the kth query expansion is larger than the utility upper bound of the (k+1)th query expansion, the construction and mining process on the UT-tree can be early terminated. The experimental results show that finding representative tags before mining top-k query expansions can improve the searching effectiveness of the discovered top-k query expansions. Furthermore, the UT-growth algorithm has better performance on efficiency than the related method, and the Dynamic UT-growth algorithm can provide even more efficient processing than the UT-growth algorithm in most cases.

    附表目錄 i 附圖目錄 ii 第一章 緒論 1 1.1 研究動機 1 1.2 研究目的 2 1.3 研究的範圍與限制 3 1.4 論文方法 5 1.5 論文架構 6 第二章 相關研究探討 7 2.1 社交標籤簡介 7 2.2 以標籤輔助查詢之技術 9 2.2.1 標籤雲探勘 9 2.2.2 解決標籤語意概念差異 10 2.2.3 查詢之關鍵字推薦 11 第三章 相關名詞及問題定義 13 3.1 相關名詞定義 13 3.2 問題定義 14 第四章 代表標籤挑選方法 17 4.1 代表性評估方法 17 4.1.1 平均相異性程度值 19 4.1.2 新穎性程度值 21 4.2 挑選代表性方法之步驟及演算法 22 第五章 可用性前k名擴展查詢字集探勘方法 26 5.1 UT-TREE儲存結構 26 5.1.1 以r值過濾標籤 27 5.1.2 UT-Tree樹狀結構 29 5.1.3 UT-Tree之Header Table 29 5.1.4 UT-tree建構方法 30 5.2 UT-GROWTH探勘演算法 32 5.2.1 以標籤對UT-tree做投影 32 5.2.2 UT-growth演算法處理過程 34 5.3 DYNAMIC UT-GROWTH探勘演算法 40 5.3.1 可用性分數值上下限估算方法 40 5.3.2 Dynamic UT-Growth演算法處理過程 42 第六章 實驗評估與討論 45 6.1 實驗資料來源及環境設定 45 6.1.1 實驗資料來源 45 6.1.2 資料前處理 46 6.1.3 實驗環境 47 6.2 評估選取代表標籤方法之效果 47 6.2.1 查詢測試資料 48 6.2.2 實驗評估方法 48 6.2.3 實驗結果 50 6.2.4 實驗結果討論 58 6.3 評估前K名擴展查詢字集探勘方法之效率 60 6.3.1 實驗評估方法 60 6.3.2 實驗結果 61 6.3.3 實驗結果討論 70 第七章 結論與未來研究方向 71 7-1 結論 71 7-2 未來研究方向 72 參考文獻 73

    [1] H. S. Al-Khalifa and H. C. Davis, ”Measuring the Semantic Value of Folksonmies,” Innovations in Information Technology, 2006.
    [2] R. Agrawal and R. Srikant, “Fast Algorithm for Mining Association Rule in Large Databases,” in Proceedings of the 20th International Conference on Very Large Databases, 1994.
    [3] T.-S. Chua, J. Tang, R. Hong, H. Li, Z. Luo, and Y.-T. Zheng, ”NUS-WIDE: A Real-World Web Image Database from National University of Singapore,” ACM International Conference on Image and Video Retrieval, 2009.
    [4] J. Fokker, J. Pouwelse, W. Buntine, “Tag-Based Navigation for Peer-to-Peer Wikipedia,” in Proceedings of the 15th international conference on World wide web(WWW), 2006.
    [5] M. Gupta, R. Li, Z. Yin and J. Han, “Survey on Social Tagging Techniques,” in Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining(KDD), 2010.
    [6] S. A. Golder and B. A. Huberman, “The Structure of Collaborative Tagging System, ”Information dynamics Lab, HP Labs.
    [7] J. Gemmell, A. Shepitsen, B. Mobasher and R. Burke, “Personalization in Folksonomies Based on Tag Clustering,” in Proceedings of the 23rd Association for the Advancement of Artificial Intelligence(AAAI), 2008.
    [8] J. Han, J. Pei and Y. Yin, “Mining Frequent Pattern without Candidate Generation,” in Proceedings of the 2000 ACM SIGMOD international conference on Management of data, 2000.
    [9] J.-L. Koh, G.-T. Chiang, and I-C. Chiu “The Strategies for Supporting query Specialization and Query Generalization in Social Tagging System,” in Proceedings of the 4th International Workshop on Social Networks and Social Web Mining(SNSM), 2013.
    [10] D. Liu, X.-S. Hua, L. Yang, M. Wang and H.-J. Zhang, “Tag Ranking”, in Proceedings of the 18th international conference on World wide web(WWW), 2009.
    [11] X. Liang, M. Xie, L. V.S. Lakshmanan, “Adding Structure to Top-K: From Items to Expansions,” in Proceedings of the 20th ACM international conference on Information and knowledge management(CIKM), 2011.
    [12] R. Lémdani, G. Polaillon, N. Bennacer, and Y. Bourda, “A semantic similarity measure for recommender systems,” in Proceedings of the 7th International Conference on Semantic Systems(I-Semantics), 2011
    [13] C. Mesnage and M. J. Carman, “Tag Navigation,” in Proceedings of the 2nd international workshop on Social software engineering and applications, 2009.
    [14] J. Peng, D. Zeng, H. Zhao and F.-Y. Wang, “Collaborative Filtering in Social Tagging Systems Based on Joint Item-Tag Recommendations,” in Proceedings of the 19th ACM international conference on Information and knowledge management(CIKM), 2010.
    [15] D. Skoutas and M. Alrifai, “Tag Clouds Revisited,” in Proceedings of the 20th ACM international conference on Information and knowledge management(CIKM), 2011.
    [16] P.-N. Tan, M.Steinbach, V. Kumar, “Introduction to Data Mining,” 2006.
    [17] D. Vandic, J.-W. v. Dam and F. Hogenboom, “A Semantic Clustering-Based Approach for Searching and Browsing Tag Spaces,” in Proceedings of the 26th ACM Symposium on Applied Computing(SAC), 2011.
    [18] Wikipedia. Tag cloud — wikipedia, the free encyclopedia, 2013.[Online; accessed 25-February-2013].
    [19] L. Wu, L. Yang, , N. Yu, X.-S. Hua, “Learning to tag,” in Proceedings of the 18th international conference on World wide web(WWW), 2009.
    [20] K. Wang, L. Tang, J. Han and J. Lin, “Top Down FP-Growth for Association Rule Mining,” in Proceedings of the 6th Pacific-Asia Conference on Knowledge Discovery and Data Mining(PAKDD), 2002.
    [21]A. Zubiaga, A. P. García-Plaza, V. Fresno, and R. Martínez, “Content-based Clustering for Tag Cloud Visualization,” in Proceedings of International Conference on Advances in Social Networks Analysis and Mining(ASONAM), 2009.

    下載圖示
    QR CODE