Graduate student: | 楊儒松 Ru-Song Yang |
---|---|
Thesis title: | Content-Based Lecture Videos Analysis and Classification Based on Audio and Visual Cues (基於視覺和聽覺的教學影片內容分析與分類) |
Advisor: | 李忠謀 Lee, Chung-Mou |
Degree: | Master |
Department: | Department of Computer Science and Information Engineering |
Year of publication: | 2012 |
Academic year of graduation: | 100 (ROC calendar) |
Language: | Chinese |
Pages: | 39 |
Keywords: | lecture video analysis, speech emotion recognition, gesture recognition |
Thesis type: | Academic thesis |
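Most classrooms still use blackboards, and lecture videos recorded at a blackboard are quite common. Although such videos pose significant challenges for multimedia semantic analysis, they have rarely been studied. This thesis proposes an audio-visual approach for blackboard lecture videos that examines the speaker's body movements and speech in order to indicate how much attention students should devote to different segments of a lecture video. For the visual analysis, the various postures the speaker adopts while teaching are analyzed to determine what each posture signifies. For the auditory analysis, this study proposes a speech emotion recognition model that classifies the speaker's speech into five vocal emotions (happy, angry, bored, sad, and normal), and then infers the speaker's teaching state from changes in these emotions.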
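The abstract does not say which acoustic features or classifier the emotion model uses. As a minimal sketch only, assuming two classic prosodic cues (short-time energy and zero-crossing rate) and a nearest-centroid classifier as a stand-in for whatever the thesis actually trains, the idea of labelling speech segments with one of the five emotions and then locating emotion changes could look like this. All function names, frame sizes, and the choice of features are hypothetical; a practical system would add pitch or MFCC features, normalize feature scales, and use a stronger classifier such as an SVM.

```python
import math
from statistics import mean, pstdev

# The five vocal emotion classes named in the abstract.
EMOTIONS = ("happy", "angry", "bored", "sad", "normal")


def frame_features(samples, frame_len=400, hop=160):
    """Return (short-time energy, zero-crossing rate) for each frame.

    Hypothetical parameters: 400/160 samples is 25 ms / 10 ms at 16 kHz.
    """
    feats = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        feats.append((energy, crossings / (frame_len - 1)))
    return feats


def utterance_vector(samples):
    """Summarize a segment as [mean energy, std energy, mean ZCR, std ZCR]."""
    feats = frame_features(samples)
    if not feats:
        raise ValueError("signal is shorter than one frame")
    energies = [e for e, _ in feats]
    zcrs = [z for _, z in feats]
    return [mean(energies), pstdev(energies), mean(zcrs), pstdev(zcrs)]


def train_centroids(labeled_vectors):
    """Average the training vectors of each emotion label (assumed stand-in classifier)."""
    grouped = {}
    for label, vec in labeled_vectors:
        if label not in EMOTIONS:
            raise ValueError(f"unknown emotion label: {label}")
        grouped.setdefault(label, []).append(vec)
    return {label: [mean(col) for col in zip(*vecs)] for label, vecs in grouped.items()}


def classify(vector, centroids):
    """Pick the emotion whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


def emotion_changes(labels):
    """Indices of segments whose emotion differs from the preceding segment."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
```

The change indices are the points where, under this sketch, the speaker's teaching state would be re-evaluated.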
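By combining the visual and auditory results, we can estimate the importance of each segment of the lecture, which also reflects its semantic intensity. Learners can then devote an appropriate amount of attention to each segment according to its importance, and thus learn more efficiently from lecture videos.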
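The abstract states that visual and auditory results are combined into a per-segment importance estimate but does not give the fusion rule. One simple possibility, shown here as a hypothetical sketch, is a weighted sum of a posture score and an emotion score, thresholded into attention levels. The posture labels, every score value, the weight `alpha`, and the thresholds are assumptions for illustration, not values from the thesis.

```python
# Hypothetical per-class scores; the thesis does not publish such values.
POSTURE_SCORE = {"pointing": 1.0, "writing": 0.7, "facing_students": 0.5, "other": 0.2}
# Assumed ordering loosely following vocal arousal (angry/happy high, bored low).
EMOTION_SCORE = {"angry": 1.0, "happy": 0.8, "normal": 0.5, "sad": 0.3, "bored": 0.1}


def segment_importance(postures, emotions, alpha=0.5):
    """Weighted sum of posture and emotion scores for each aligned segment."""
    if len(postures) != len(emotions):
        raise ValueError("postures and emotions must cover the same segments")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * POSTURE_SCORE[p] + (1 - alpha) * EMOTION_SCORE[e]
            for p, e in zip(postures, emotions)]


def attention_levels(scores, thresholds=(0.4, 0.7)):
    """Map importance scores to coarse attention levels for the learner."""
    low, high = thresholds
    return ["high" if s >= high else "medium" if s >= low else "low" for s in scores]
```

A linear weighted sum keeps the two modalities interpretable and lets `alpha` shift emphasis between gesture and voice; learning the fusion instead would require segments annotated with ground-truth importance.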
Most classrooms are equipped with blackboards, and blackboards are widely used as a teaching prop in lecture video recordings. However, the multimedia semantic analysis of blackboard lecture recordings has rarely been discussed. This thesis uses an audio-visual research method to examine the speaker's body language and speech in blackboard lecture recordings, and to determine how much attention students should pay to different segments of a recording to enhance their learning. The visual analysis focuses on the semantics implied by the speaker's postures. The audio analysis focuses on variations in the speaker's speech emotion over the course of the lecture. The thesis proposes a speech emotion recognition model that classifies speech into five emotion categories: happy, angry, bored, sad, and normal.
The analysis results indicate the semantic intensity of the speaker and the importance of the speaker's teaching in different segments, allowing students to learn more effectively by adjusting their attention according to that importance throughout the lecture recording.