研究生: |
施柏先 |
---|---|
論文名稱: |
對社交標籤系統提出有效率的 Top-k 近似查詢處理方法之研究 The Efficient Top-k Similar Processing Method for Similar Search on Social Tagging System |
指導教授: |
柯佳伶
Koh, Jia-Ling |
學位類別: |
碩士 Master |
系所名稱: |
資訊工程學系 Department of Computer Science and Information Engineering |
論文出版年: | 2012 |
畢業學年度: | 101 |
語文別: | 中文 |
論文頁數: | 70 |
中文關鍵詞: | 社交標籤系統 、近似距離公式評估 、Top-k 近似查詢處理方法 |
英文關鍵詞: | Social tagging system, Distance measure evaluation, Top-k similar search algorithm |
論文種類: | 學術論文 |
相關次數: | 點閱:95 下載:1 |
分享至: |
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
本論文研究目的為針對社交標籤網站提供一個有效率的標籤集近似查詢處理系統,我們使用一個多階層的標籤集索引結構來對系統中資料物件的標籤集建立索引,此索引結構內記錄的資訊能夠估算一群資料物件與使用者所給的查詢標籤集合距離的上限及下限值,能夠篩除掉多個和查詢標籤集距離較大的標籤集,使查詢更有效率。本論文根據此索引結構設計了傑卡德距離與重疊距離兩個標籤集距離的計算方法,並提出考量語義的修正計算方法及對應的距離上下限值估算方法,實驗評估結果可顯示這些距離計算方法對近似查詢結果的有效性。本論文亦針對此索引結構提出一個有效率的 Top-k 近似查詢演算法,使用者僅需設定資料結果筆數 ( k ),可找出標籤集和給定查詢標籤集合最相似的前 k 筆資料物件。實驗結果顯示本研究提出的 Top-k 搜尋處理方法和基本方法比較,在多數情況下可有效提昇處理效率。
This thesis mainly aims to provide an efficient similar search system for tag sets on the social tagging system. We use a multi-level index structure to construct an index on the tag sets of the objects. We can estimate the upper/lower bound of the distance value between a given query tag set and the tag sets of a group of objects according to the information maintained in the index structure, which can be used to improve the efficiency of searching by applying the pruning strategies. According to the property of this index structure, we design the Jaccard distance and Overlap distance measure function for evaluating the similarity distance betwen two tag sets. Then the modified distance measure functions and the upper/lower bound estimating methods corresponding to the specific distance measure functions are proposed. The experimental results show that the effectiveness on similarity searches of the proposed distance measure functions. We also proposes an efficient top-k similar search algorithm based on the index structure. Users only need to give the number of required search results, k, the system could find the top-k most similar objects with the query tag set. The experimental results show that, in most test cases, the proposed algorithm performs better in efficiency than the baseline method.
[1] J.-C. Chuang, C.-W. Cho, and A. L.-P. Chen. Similarity search in transaction
databases with a two-level bounding mechanism. In M. Lee, K.-L. Tan, and V. Wu-
wongse, editors, Database Systems for Advanced Applications, volume 3882 of
Lecture Notes in Computer Science, pages 572--586. Springer Berlin Heidelberg,
2006.
[2] B. Ding, H. Wang, R. Jin, J. Han, and Z. Wang. Optimizing index for taxonomy
keyword search. In Proceedings of the 2012 ACM SIGMOD International Con-
ference on Management of Data, SIGMOD '12, pages 493--504, New York, NY,
USA, 2012. ACM.
[3] S. A. Golder and B. A. Huberman. Usage patterns of collaborative tagging systems.
J. Inf. Sci., 32(2):198--208, April 2006.
[4] M. Gupta, R. Li, Z. Yin, and J. Han. Survey on social tagging techniques. SIGKDD
Explor. Newsl., 12(1):58--72, November 2010.
[5] C.-C. Hsieh and J. Cho. Finding similar items by leveraging social tag clouds. In
Proceedings of the 27th Annual ACM Symposium on Applied Computing, SAC '12,
pages 644--651, New York, NY, USA, 2012. ACM.
[6] J.-L. Koh, G.-T.Chiang, and I.-C. Chiu. The strategies for supporting query special-
ization and query generalization in social tagging system. Proceedings of the 4th
International Workshop on Social Networks and Social Web Mining(SNSM), 2013.
[7] J.-L. Koh, N. Shongwe, and C.-W. Cho. A multi-level hierarchical index structure
for supporting efficient similarity search on tag sets. 2012 Sixth International Con-
ference on Research Challenges in Information Science (RCIS), pages 1--12, May
2012.
[8] K.-P. Lee, H.-G. Kim, and H.-J. Kim. A social inverted index for social-tagging-
based information retrieval. J. Inf. Sci., 38(4):313--332, August 2012.
[9] J. I. Lopez-Veyna, V. J. Sosa-Sosa, and I. Lopez-Arevalo. Kesosd: keyword search
over structured data. In Proceedings of the Third International Workshop on Key-
69
word Search on Structured Data, KEYS '12, pages 23--31, New York, NY, USA,
2012. ACM.
[10] G. S. Manku, A. Jain, and A. Das Sarma. Detecting near-duplicates for web crawl-
ing. In Proceedings of the 16th international conference on World Wide Web,
WWW '07, pages 141--150, New York, NY, USA, 2007. ACM.
[11] B. Markines, C. Cattuto, F. Menczer, D. Benz, A. Hotho, and G. Stumme. Evaluat-
ing similarity measures for emergent semantics of social tagging. In Proceedings of
the 18th international conference on World wide web, WWW '09, pages 641--650,
New York, NY, USA, 2009. ACM.
[12] A. Mathes. Folksonomies - cooperative classification and communication through
shared metadata. http://www.adammathes.com/academic/computer-mediated-
communication/folksonomies.html, 2004.
[13] C. Ordonez, E. Omiecinski, and N. Ezquerra. A fast algorithm to cluster high di-
mensional basket data. Proceedings 2001 IEEE International Conference on Data
Mining, pages 633--636, 2001.
[14] R. Schenkel, T. Crecelius, M. Kacimi, S. Michel, T. Neumann, J. X. Parreira, and
G. Weikum. Efficient top-k querying over social-tagging networks. In Proceedings
of the 31st annual international ACM SIGIR conference on Research and develop-
ment in information retrieval, SIGIR '08, pages 523--530, New York, NY, USA,
2008. ACM.
[15] B. Spell. Java api for wordnet searching. http://lyle.smu.edu/ tspell/jaws/, 2008.
[16] V. Zanardi and L. Capra. Social ranking: uncovering relevant content using tag-
based recommender systems. In Proceedings of the 2008 ACM conference on Rec-
ommender systems, RecSys '08, pages 51--58, New York, NY, USA, 2008. ACM.
[17] J. Zobel and A. Moffat. Inverted files for text search engines. ACM Comput. Surv.,
38(2), July 2006