Graduate student: | 江政杰 Cheng-Chieh Chiang |
---|---|
Thesis title: |
以區域為基礎的影像搜尋 — 影樣表達、比對與學習 A Framework for Region-Based Image Retrieval — Image Representation, Matching, and Learning |
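Advisors: | 李忠謀 Lee, Chung-Mou; 洪一平 Hung, Yi-Ping |
Degree: | Doctor |
Department: | Graduate Institute of Information and Computer Education |
Year of publication: | 2007 |
Academic year of graduation: | 95 (ROC calendar) |
Language: | English |
Number of pages: | 92 |
Chinese keywords: | 影像搜尋 (image retrieval), 視覺特徵 (visual features), 影像注釋 (image annotation), 語意間隙 (semantic gap), 相關回饋 (relevance feedback) |
English keywords: | Region-Based Image Retrieval, Visual Features, Image Annotation, Semantic Gap, Relevance Feedback |
Thesis type: | Academic thesis |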
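This thesis studies region-based image retrieval. Region-based image retrieval segments an image into a number of regions, each covering a similar subject or similar features, and then uses these regions to build the index and the retrieval mechanism. We construct a region-based image retrieval framework that includes feature extraction (both visual and semantic features) to represent image information, an image matching mechanism, and an iterative learning mechanism that lets users feed back their opinions.
The thesis first defines a new visual feature for images, which we call the color-size histogram. Its main idea is to combine traditional color features with information on the size of image regions, which increases the expressive power of the image representation. In addition, we design a visual-word-based image feature. This feature is not a plain low-level signal feature but mid-level information aggregated on top of low-level features. We also design a set of semantic features that, through machine learning, automatically extract the semantic concepts contained in an image. With these three levels of feature description, the image representation becomes more complete.
Beyond the three levels of image features, the thesis also designs an iterative machine learning scheme so that the system can infer, from the information users feed back, what type of image the user wishes to find. This learning scheme makes the retrieval results more accurate and compensates for the shortcomings of visual features alone. Thus, from representing image information with image features to designing a learning scheme for image matching, this thesis constructs a complete mechanism for region-based image retrieval.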
This thesis focuses on region-based image retrieval, which uses image regions (parts of an image with homogeneous subjects or features) to index and represent an image. We design a framework for region-based image retrieval that comprises feature extraction, both visual and semantic, for image representation; a similarity measure for image matching and ranking; and an interactive learning scheme for estimating user requests from relevance feedback.
We first propose a new type of visual feature, called the color-size feature, which embeds region-size attributes in color features. The color-size feature not only provides color information but also captures structural information in an image. We also design a visual-word-based image feature that categorizes region features in visual feature spaces. This feature is intended to yield a compact region-based image representation.
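The idea of embedding region size in a color feature can be illustrated with a small sketch. The abstract does not give the exact definition, so everything below is an assumption made for illustration: the function name `color_size_histogram`, the use of each region's dominant quantized color, the size bins expressed as fractions of the image area, and the area weighting are all hypothetical choices, not the formulation used in the thesis.

```python
import numpy as np

def color_size_histogram(color_bins, region_labels, n_colors, size_edges):
    """Joint histogram over (dominant color bin, region-size bin).

    color_bins    : 2-D int array, quantized color index per pixel (assumed input).
    region_labels : 2-D int array of the same shape, region id per pixel.
    n_colors      : number of quantized colors.
    size_edges    : increasing fractions of the image area that separate the
                    size bins (an assumed binning, not taken from the thesis).
    """
    color_bins = np.asarray(color_bins)
    region_labels = np.asarray(region_labels)
    if color_bins.shape != region_labels.shape:
        raise ValueError("color_bins and region_labels must have the same shape")

    hist = np.zeros((n_colors, len(size_edges) + 1), dtype=float)
    total = color_bins.size
    for region_id in np.unique(region_labels):
        mask = region_labels == region_id
        area = int(mask.sum())
        # Dominant quantized color inside this region.
        dominant = int(np.bincount(color_bins[mask], minlength=n_colors).argmax())
        # Size bin chosen from the region's share of the image area.
        size_bin = int(np.searchsorted(size_edges, area / total, side="right"))
        # Weight by relative area so the whole histogram sums to 1.
        hist[dominant, size_bin] += area / total
    return hist


# Toy 2x4 image with three regions of different sizes and colors.
toy_colors = [[0, 0, 1, 1],
              [0, 0, 1, 2]]
toy_regions = [[0, 0, 1, 1],
               [0, 0, 1, 2]]
toy_hist = color_size_histogram(toy_colors, toy_regions, n_colors=3,
                                size_edges=[0.2, 0.4])
```

In this sketch, two images with the same overall color distribution but different region layouts (one large red area versus many small red patches) fall into different size bins, which is the kind of structural information a plain color histogram cannot express.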
However, the semantic gap between visual features and human perception remains a challenge in image understanding and retrieval. Users usually recognize an image by the concepts it conveys, but only low-level feature vectors can be extracted directly from digital images. We address the semantic gap in two ways: (i) image annotation, to discover the semantic content of images, and (ii) relevance feedback, to interactively learn what the user is requesting. We design a hierarchical approach to image annotation so that more information with higher-level concepts can be included in the retrieval task. We then use the proposed annotation to design a semantic-based image feature that carries semantic information as humans perceive it.
We also propose an interactive approach to estimating the user's intention from the positive examples in relevance feedback. The approach considers not only a likelihood measure, which analyzes which representing units best capture the user intention implicit in the positive examples, but also a confusion measure, which records the degree of confusion between any two representing units. Either the proposed visual-word-based image feature or the semantic-based image feature serves as the representing units for user requests in our work. Using the estimated user intention, we then design the image similarity measure for matching and ranking.
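One way to picture how a likelihood measure and a confusion measure might combine is sketched below. The abstract does not state the formulas, so this is only an assumed illustration: images are taken to be histograms over representing units, the likelihood is the mean normalized frequency over the positive examples, the confusion measure is a given matrix with entries in [0, 1], and the function names `estimate_intention` and `rank_images` are hypothetical.

```python
import numpy as np

def estimate_intention(positive_hists, confusion):
    """Estimate a weight per representing unit from positive feedback.

    positive_hists : (n_positive, n_units) counts of units in each positive image;
                     every row is assumed to contain at least one unit.
    confusion      : (n_units, n_units) matrix, confusion[i, j] is the assumed
                     degree to which unit i is confused with unit j.
    """
    pos = np.asarray(positive_hists, dtype=float)
    pos = pos / pos.sum(axis=1, keepdims=True)
    # Likelihood: how strongly each unit appears in the positive examples.
    likelihood = pos.mean(axis=0)
    # Confusion: let unit j inherit weight from the units it is confused with.
    smoothed = likelihood @ np.asarray(confusion, dtype=float)
    return smoothed / smoothed.sum()

def rank_images(database_hists, intention):
    """Score database images against the estimated intention, best first."""
    db = np.asarray(database_hists, dtype=float)
    db = db / db.sum(axis=1, keepdims=True)
    scores = db @ intention
    return np.argsort(-scores, kind="stable"), scores


# Three units; unit 2 never occurs in the positives but is confusable with unit 0.
positives = [[2, 0, 0],
             [1, 1, 0]]
confusion = [[1.0, 0.0, 0.5],
             [0.0, 1.0, 0.0],
             [0.5, 0.0, 1.0]]
intention = estimate_intention(positives, confusion)

database = [[0, 1, 0],
            [0, 0, 1],
            [1, 0, 0]]
order, scores = rank_images(database, intention)
```

In this toy setup the confusion term gives unit 2 a nonzero weight even though it is absent from the positive examples, so an image made only of unit 2 ranks above one made only of unit 1. Whether the thesis combines the two measures this way is not stated in the abstract.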