Basic Search / Detailed Display

Author: 林祐生
You Sheng, Lin
Thesis Title: 以投影片單應性映射之相關特徵進行演講影片分析研究
Using Slides Homographic Characteristics for Speech Video Segmentation
Advisor: 李忠謀
Lee, Chung-Mou
Degree: 碩士
Master
Department: 資訊工程學系
Department of Computer Science and Information Engineering
Thesis Publication Year: 2014
Academic Year: 102
Language: Chinese
Number of pages: 71
Keywords (in Chinese): 候選畫面、影像匹配、隨機抽樣一致、影像單應性
Keywords (in English): Candidate frame extraction, Slide-frame matching, RANSAC, Homography
Thesis Type: Academic thesis/dissertation
Reference times: Clicks: 147, Downloads: 11
    Presentation slide files are a tool that speakers use to support their explanations, provide annotations, and guide the audience to grasp the key points quickly, but their drawback is that they cannot convey details clearly. Paired with streaming audio and video, they can give the audience much richer detail in settings such as lectures, meetings, and speeches, but this in turn calls for a more efficient way to browse the content.
    This thesis proposes an efficient and accurate method for matching slides with audio/video content. The pipeline has three main parts. First, candidate frames are selected from the video stream to reduce the time spent on subsequent matching. Next, image features are extracted and the matching keypoints between each candidate frame and each slide are computed to obtain a similarity score; low-confidence keypoints are then filtered out using the nearest-neighbor distance ratio and RANSAC, and, where conditions permit, the homography is used to estimate the approximate position of the slide in the frame. The proportion of the frame occupied by the slide is used to classify frames into "slide" and "non-slide" classes. The "slide" frames are then matched directly using the similarity scores obtained earlier, and a voting mechanism corrects the results; in the end, 96% of the slide-switch time points are identified correctly.
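    The matching core described above can be sketched with off-the-shelf tools. The snippet below is a minimal illustration under assumptions, not the thesis's actual implementation: it extracts SIFT keypoints with OpenCV, filters matches with Lowe's nearest-neighbor distance ratio, fits a slide-to-frame homography with RANSAC, projects the slide's corners into the frame, and returns the proportion of the frame the slide covers (the quantity used for the "slide"/"non-slide" decision). The threshold values RATIO and MIN_MATCHES and the function name slide_area_ratio are illustrative assumptions.

        import cv2
        import numpy as np

        RATIO = 0.75        # Lowe's distance-ratio threshold (assumed value)
        MIN_MATCHES = 10    # minimum surviving matches before fitting H (assumed)

        sift = cv2.SIFT_create()
        matcher = cv2.BFMatcher(cv2.NORM_L2)

        def slide_area_ratio(slide_img, frame_img):
            # Detect SIFT keypoints and descriptors in the slide and the frame.
            kp_s, des_s = sift.detectAndCompute(slide_img, None)
            kp_f, des_f = sift.detectAndCompute(frame_img, None)
            if des_s is None or des_f is None:
                return None

            # Lowe's ratio test: keep a match only if its nearest neighbor is
            # clearly better than the second nearest.
            knn = matcher.knnMatch(des_s, des_f, k=2)
            good = [p[0] for p in knn
                    if len(p) == 2 and p[0].distance < RATIO * p[1].distance]
            if len(good) < MIN_MATCHES:
                return None

            src = np.float32([kp_s[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
            dst = np.float32([kp_f[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

            # RANSAC discards low-confidence correspondences while fitting the
            # slide-to-frame homography.
            H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
            if H is None:
                return None

            # Project the slide's corners into the frame to locate the slide region.
            h, w = slide_img.shape[:2]
            corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
            projected = cv2.perspectiveTransform(corners, H)

            # Fraction of the frame covered by the projected slide.
            return cv2.contourArea(projected) / (frame_img.shape[0] * frame_img.shape[1])

    A frame with a sufficiently large ratio would be labeled "slide"; per the table of contents (Section 3.4.1), the thesis makes this decision with a Gaussian mixture model rather than a fixed cutoff.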

    Matching slides with video frames gives users a quick way to skim an entire video from any given slide and to jump to any point in the video, which improves the user experience. Manually marking each slide's timestamp in a video, however, is time-consuming. In this research, we develop an automatic process to do this: given a slide file and a video file as input, the proposed method outputs segmented results.
    First, we use a heuristic to eliminate duplicated and similar frames in the recorded speech video. We then apply a SIFT-based matching process and filter the matched candidates by nearest-neighbor ratio ranking, as suggested by D. G. Lowe. Next, a non-slide-frame detection step prunes frames in which no slide is displayed. Before output, we refine the recognition results with a context-scoring mechanism and apply a voting scheme to improve the frame-slide pairs, covering about 96% of the slide-frame switches.
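    The voting refinement can be illustrated with a simple temporal majority vote; this is a sketch of the general idea, not necessarily the exact mechanism used in the thesis. It assumes each candidate frame already carries a best-matching slide label; the labels are smoothed over a centered sliding window (the window size and helper names are assumed) and the slide-switch time points are then read off the label changes.

        from collections import Counter

        def majority_vote(labels, window=5):
            """Replace each label by the most frequent label in a centered window."""
            half = window // 2
            return [Counter(labels[max(0, i - half): i + half + 1]).most_common(1)[0][0]
                    for i in range(len(labels))]

        def switch_points(labels, timestamps):
            """Report (time, new_slide) pairs where the smoothed label changes."""
            return [(timestamps[i], labels[i])
                    for i in range(1, len(labels)) if labels[i] != labels[i - 1]]

        # Example: an isolated misrecognition (slide 9) inside a run of slide 2
        # is voted away, so only the true switches at t=30 and t=70 remain.
        raw = [1, 1, 2, 2, 9, 2, 2, 3, 3, 3]
        times = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
        print(switch_points(majority_vote(raw), times))   # [(30, 2), (70, 3)]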

    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Research Motivation
      1.2 Research Challenges
      1.3 Research Objectives
      1.4 Scope and Limitations
      1.5 Thesis Organization
    Chapter 2 Literature Review
      2.1 Definitions of Terms
      2.2 Candidate Frame Selection
      2.3 Feature Extraction Methods
        2.3.1 Feature Detection
        2.3.2 Feature Descriptor Extraction
        2.3.3 Other Methods
        2.3.4 Keypoint Matching
        2.3.5 Keypoint Matching Evaluation
      2.4 Slide-to-Video Matching Methods
        2.4.1 Direct Feature Comparison
        2.4.2 Occlusion
        2.4.3 Out-of-Focus and Low-Resolution Recordings
        2.4.4 Other Cases
      2.5 Applications of Slide-to-Video Matching
      2.6 Slide Registration
    Chapter 3 Methodology
      3.1 Method Pipeline
      3.2 Preprocessing
        3.2.1 Processing Targets
      3.3 Homographic Characteristics
        3.3.1 Scale-Invariant Feature Transform (SIFT)
        3.3.2 Keypoint Matching
        3.3.3 Homography and Projection
        3.3.4 Random Sample Consensus (RANSAC)
      3.4 Frame Classification
        3.4.1 Classification with a Gaussian Mixture Model
      3.5 Frame Recognition
        3.5.1 Building Additional Features for Frame Recognition
        3.5.2 Voting Mechanism
    Chapter 4 Experiments
      4.1 Environment and Data Setup
        4.1.1 Test Data
      4.2 Evaluation Methods
        4.2.1 Primary Evaluation Metrics
        4.2.2 Running-Time Analysis
        4.2.3 Memory Usage
      4.3 Preprocessing Results
        4.3.1 Frame Reduction Rate and Coverage
      4.4 Frame Classification Results
        4.4.1 Sensitivity
        4.4.2 Switch-Point Hit Rate
      4.5 Frame Recognition Results
        4.5.1 Accuracy
        4.5.2 Coverage and Hit Rate
        4.5.3 Method Comparison
        4.5.4 Timeline Matching Results
      4.6 Time and Storage Analysis
    Chapter 5 Conclusion
      5.1 Future Work
    Appendix A Experimental Figures and Tables
      A.1 Additional Experimental Comparisons
      A.2 Video Thumbnails
    References

    [1] M. A. Fischler and R. C. Bolles. “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography”. In: Communications of the ACM 24.6 (1981), 381–395.
    [2] P. J. Burt and E. H. Adelson. “The Laplacian pyramid as a compact image code”. In: Communications, IEEE Transactions on 31.4 (1983), 532–540.
    [3] C. Harris and M. Stephens. “A combined corner and edge detector.” In: Alvey vision conference. Vol. 15. Manchester, UK, 1988, p. 50.
    [4] M. J. Swain and D. H. Ballard. “Color indexing”. In: International journal of computer vision 7.1 (1991), 11–32.
    [5] H. Zhang, A. Kankanhalli, and S. W. Smoliar. “Automatic partitioning of full-motion video”. In: Multimedia systems 1.1 (1993), 10–28.
    [6] J. R. Smith and S.-F. Chang. “Tools and Techniques for Color Image Retrieval.” In: Storage and Retrieval for Image and Video Databases (SPIE). Vol. 2670. 1996, 2–7.
    [7] G. D. Abowd, C. G. Atkeson, A. Feinstein, C. Hmelo, R. Kooper, S. Long, N. Sawhney, and M. Tani. “Teaching and learning as multimedia authoring: the classroom 2000 project”. In: Proceedings of the fourth ACM international conference on Multimedia. ACM, 1997, 187–198.
    [8] Y. Zhuang, Y. Rui, T. S. Huang, and S. Mehrotra. “Adaptive key frame extraction using unsupervised clustering”. In: Image Processing, 1998. ICIP 98. Proceedings. 1998 International Conference on. Vol. 1. IEEE, 1998, 866–870.
    [9] T. F. Syeda-Mahmood. “Indexing for topics in videos using foils”. In: Computer Vision and Pattern Recognition, 2000. Proceedings. IEEE Conference on. Vol. 2. IEEE, 2000, 312–319.
    [10] R. Hartley and A. Zisserman. Multiple view geometry in computer vision. Cambridge University Press, 2003.
    [11] T. Liu, H.-J. Zhang, and F. Qi. “A novel video key-frame-extraction algorithm based on perceived motion energy model”. In: Circuits and Systems for Video Technology, IEEE Transactions on 13.10 (2003), 1006–1013.
    [12] D. G. Lowe. “Distinctive image features from scale-invariant keypoints”. In: International journal of computer vision 60.2 (2004), 91–110.
    [13] J. Matas, O. Chum, M. Urban, and T. Pajdla. “Robust wide-baseline stereo from maximally stable extremal regions”. In: Image and vision computing 22.10 (2004), 761–767.
    [14] K. Mikolajczyk and C. Schmid. “A performance evaluation of local descriptors”. In: Pattern Analysis and Machine Intelligence, IEEE Transactions on 27.10 (2005), 1615–1630.
    [15] H. Bay, T. Tuytelaars, and L. Van Gool. “Surf: Speeded up robust features”. In: Computer Vision–ECCV 2006. Springer, 2006, 404–417.
    [16] Q. Fan, K. Barnard, A. Amir, A. Efrat, and M. Lin. “Matching slides to presentation videos using SIFT and scene background matching”. In: Proceedings of the 8th ACM international workshop on Multimedia information retrieval. ACM, 2006, 239–248.
    [17] C. Gianluigi and S. Raimondo. “An innovative algorithm for key frame extraction in video summarization”. In: Journal of Real-Time Image Processing 1.1 (2006), 69–88.
    [18] E. Rosten and T. Drummond. “Machine learning for high-speed corner detection”. In: Computer Vision–ECCV 2006. Springer, 2006, 430–443.
    [19] E. Bulut and T. Capin. “Key frame extraction from motion capture data by curve saliency”. In: Proceedings of 20th Annual Conference on Computer Animation and Social Agents, Belgium. 2007.
    [20] Q. Fan, A. Amir, K. Barnard, R. Swaminathan, and A. Efrat. “Temporal modeling of slide change in presentation videos”. In: Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on. Vol. 1. IEEE, 2007, I–989.
    [21] G. Gigonzac, F. Pitie, and A. Kokaram. “Electronic slide matching and enhancement of a lecture video”. In: (2007).
    [22] W. Chen and J. Zhang. “Parametric model for video content analysis”. In: Pattern Recognition Letters 29.3 (2008), 181–191.
    [23] T. Tuytelaars and K. Mikolajczyk. “Local invariant feature detectors: a survey”. In: Foundations and Trends® in Computer Graphics and Vision 3.3 (2008), 177–280.
    [24] Q. Fan, K. Barnard, A. Amir, and A. Efrat. “Accurate alignment of presentation slides with educational video”. In: Multimedia and Expo, 2009. ICME 2009. IEEE International Conference on. IEEE, 2009, 1198–1201.
    [25] L. Juan and O. Gwun. “A comparison of sift, pca-sift and surf”. In: International Journal of Image Processing (IJIP) 3.4 (2009), 143–152.
    [26] T. Lin, B.-J. Yen, C.-H. Chang, H.-F. Yang, and G. C. Lee. “Indexing and teaching focus mining of lecture videos”. In: Multimedia, 2009. ISM’09. 11th IEEE International Symposium on. IEEE, 2009, 681–686.
    [27] X. Wang and M. Kankanhalli. “Robust alignment of presentation videos with slides”. In: Advances in Multimedia Information Processing-PCM 2009. Springer, 2009, 311–322.
    [28] A. Winslow, Q. Tung, Q. Fan, J. Torkkola, R. Swaminathan, K. Barnard, A. Amir, A. Efrat, and C. Gniady. “Studying on the move: enriched presentation video for mobile devices”. In: INFOCOM Workshops 2009, IEEE. IEEE, 2009, 1–6.
    [29] W. Xiangyu, S. Ramanathan, and M. Kankanhalli. “A robust framework for aligning lecture slides with video”. In: Image Processing (ICIP), 2009 16th IEEE International Conference on. IEEE, 2009, 249–252.
    [30] J. Adcock, M. Cooper, L. Denoue, H. Pirsiavash, and L. A. Rowe. “TalkMiner: a lecture webcast search engine”. In: Proceedings of the international conference on Multimedia. ACM, 2010, 241–250.
    [31] N.-M. Cheung, D. Chen, V. Chandrasekhar, S. S. Tsai, G. Takacs, S. A. Halawa, and B. Girod. “Restoration of out-of-focus lecture video by automatic slide matching”. In: Proceedings of the international conference on Multimedia. ACM, 2010, 899–902.
    [32] E. Mair, G. D. Hager, D. Burschka, M. Suppa, and G. Hirzinger. “Adaptive and generic corner detection based on the accelerated segment test”. In: Computer Vision–ECCV 2010. Springer, 2010, 183–196.
    [33] E. Rosten, R. Porter, and T. Drummond. “Faster and better: A machine learning approach to corner detection”. In: Pattern Analysis and Machine Intelligence, IEEE Transactions on 32.1 (2010), 105–119.
    [34] F. Shi and X. Guo. “Keyframe extraction based on kmeans results to adjacent DC images similarity”. In: Signal Processing Systems (ICSPS), 2010 2nd International Conference on. Vol. 1. IEEE, 2010, V1–611.
    [35] E. Tola, V. Lepetit, and P. Fua. “Daisy: An efficient dense descriptor applied to wide-baseline stereo”. In: Pattern Analysis and Machine Intelligence, IEEE Transactions on 32.5 (2010), 815–830.
    [36] Q. Fan, K. Barnard, A. Amir, and A. Efrat. “Robust spatiotemporal matching of electronic slides to presentation videos”. In: Image Processing, IEEE Transactions on 20.8 (2011), 2315–2328.
    [37] S. Leutenegger, M. Chli, and R. Y. Siegwart. “BRISK: Binary robust invariant scalable keypoints”. In: Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011, 2548–2555.
    [38] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski. “ORB: an efficient alternative to SIFT or SURF”. In: Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011, 2564–2571.
    [39] C. Sujatha and U. Mudenagudi. “A Study on Keyframe Extraction Methods for Video Summary”. In: Computational Intelligence and Communication Networks (CICN), 2011 International Conference on. IEEE, 2011, 73–77.
    [40] Z. Wang, B. Fan, and F. Wu. “Local intensity order pattern for feature description”. In: Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011, 603–610.
    [41] A. Alahi, R. Ortiz, and P. Vandergheynst. “Freak: Fast retina keypoint”. In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012, 510–517.
    [42] M. Calonder, V. Lepetit, M. Ozuysal, T. Trzcinski, C. Strecha, and P. Fua. “BRIEF: Computing a local binary descriptor very fast”. In: Pattern Analysis and Machine Intelligence, IEEE Transactions on 34.7 (2012), 1281–1298.
    [43] Y. Kharitonova, Q. Tung, A. Danehy, A. Efrat, and K. Barnard. “Client-side backprojection of presentation slides into educational video”. In: Proceedings of the 20th ACM international conference on Multimedia. ACM, 2012, 1005–1008.
    [44] O. Miksik and K. Mikolajczyk. “Evaluation of local detectors and descriptors for fast feature matching”. In: Pattern Recognition (ICPR), 2012 21st International Conference on. IEEE, 2012, 2681–2684.
    [45] Confreaks. URL: http://www.confreaks.com/.
    [46] Coursera. URL: https://www.coursera.org/courses.
    [47] TED: Ideas worth spreading. URL: https://www.ted.com/.
    [48] VideoLectures.NET. URL: http://videolectures.net/.
    [49] YouTube. Multimedia. URL: http://www.youtube.com/.
    [50] 嶄新的線上學習體驗—NGL 3.0 (A brand-new online learning experience: NGL 3.0). URL: http://ngl.csie.ntnu.edu.tw/login.
