Graduate student: 馬仲文 Ma, Chung-Wen
Thesis title: 結合臉部表情及聲音之嬰兒情緒辨識系統 (An Infant Emotion Recognition System Using both Facial Expressions and Vocalization)
Advisor: 方瓊瑤 Fang, Chiung-Yao
Degree: Master
Department: 資訊工程學系 Department of Computer Science and Information Engineering
Year of publication: 2015
Graduation academic year: 103
Language: Chinese
Number of pages: 70
Keywords (Chinese): 嬰兒監控系統、臉部偵測、嬰兒情緒辨識、區域三元化圖形 (LTP)、Zernike moments、梅爾頻率倒頻譜係數 (MFCCs)
Keywords (English): infant monitoring system, face detection, infant emotion recognition, local ternary pattern (LTP), Zernike moments, mel-frequency cepstral coefficients (MFCCs)
Thesis type: Academic thesis
An infant's emotional development affects future learning ability and attention, and even adult personality and interpersonal relationships; of all the stages in a person's emotional development, infancy is the most important. Knowing an infant's current emotions and physiological needs, and satisfying them, therefore greatly influences later development. However, before the age of one, infants can express their emotions and physiological needs to their parents only through facial expressions and non-verbal vocalizations. This thesis therefore develops a monitoring system that combines infant facial expressions and vocalization to help convey the infant's emotions in a timely manner, easing the parents' burden and helping them care for the infant properly.
The system runs in two parts: an image part and a sound part. The image part consists of infant face detection and facial feature extraction. After the system reads a sequence of infant images, it extracts the skin-color regions and locates the infant's face among them. It then uses the local ternary pattern to label the contours of the infant's face and accumulates difference images; finally, it computes the Zernike moments of orders 0 to 3 of the accumulated difference image as the facial features. For sound, the commonly used mel-frequency cepstral coefficients and their delta cepstral coefficients serve as the vocal features. Finally, a support vector machine classifies the image features and the sound features separately, and the two classification results are combined into an infant emotion category.
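The facial features described above are the Zernike moments of orders 0 to 3 of the accumulated difference image. A minimal pure-Python sketch of that computation follows; the image is assumed to be a square grayscale array given as a list of lists, and the function names and the per-pixel normalization are illustrative choices, not details taken from the thesis:

```python
import math
import cmath

def radial_poly(n, m, rho):
    """Zernike radial polynomial R_nm(rho)."""
    m = abs(m)
    total = 0.0
    for s in range((n - m) // 2 + 1):
        c = ((-1) ** s * math.factorial(n - s)
             / (math.factorial(s)
                * math.factorial((n + m) // 2 - s)
                * math.factorial((n - m) // 2 - s)))
        total += c * rho ** (n - 2 * s)
    return total

def zernike_moment(img, n, m):
    """Complex Zernike moment A_nm of a square grayscale image mapped
    onto the unit disk; pixels outside the disk are ignored."""
    h, w = len(img), len(img[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    scale = max(cx, cy) or 1.0
    acc = 0 + 0j
    count = 0
    for y in range(h):
        for x in range(w):
            xr, yr = (x - cx) / scale, (y - cy) / scale
            rho = math.hypot(xr, yr)
            if rho > 1.0:
                continue
            theta = math.atan2(yr, xr)
            acc += img[y][x] * cmath.exp(-1j * m * theta) * radial_poly(n, m, rho)
            count += 1
    # Normalize by the number of pixels inside the disk.
    return (n + 1) / math.pi * acc / count

def zernike_features(img):
    """|A_nm| for all valid (n, m) with 0 <= n <= 3, |m| <= n, n - |m| even."""
    orders = [(0, 0), (1, 1), (2, 0), (2, 2), (3, 1), (3, 3)]
    return [abs(zernike_moment(img, n, m)) for n, m in orders]
```

The magnitudes |A_nm| are rotation invariant, which is why Zernike moments are a common choice of shape feature; only the six (n, m) pairs valid up to order 3 are kept.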
There are 100 experimental videos, each containing a single emotion category, with a total length of 100 minutes; the infants filmed are 1 to 7 months old. The average accuracy of infant emotion recognition is about 85.3%, which shows that the recognition results of this system are reasonably reliable.
The emotional development of infants affects their future learning ability, attention, personality, and interpersonal relationships; of all the stages in a person's emotional development, infancy is the most important. However, infants cannot yet use words to express their emotions or physiological needs; others can understand them only through the infants' facial expressions, vocalization, and body movements. Therefore, this study presents an infant emotion recognition system that uses both facial expressions and vocalization to reduce the burden on parents of caring for infants.
The system can be divided into two parts: an image processing part and a speech processing part. The image processing part consists of two main stages: infant face detection and facial expression feature extraction. In the infant face detection stage, the system detects skin-color pixels in the input images and uses connected-component analysis to find the largest skin-color region, which is regarded as the infant's face. In the facial expression feature extraction stage, the system uses the local ternary pattern to label the infant's face contour, accumulates difference images, and computes the Zernike moments of orders 0 to 3 of the cumulative difference image.
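The local ternary pattern step can be sketched as follows: each of a pixel's eight neighbours is coded +1, 0, or -1 relative to the centre value with a tolerance t, and the ternary pattern is then split into an "upper" and a "lower" binary code. The tolerance t = 5 and the function name are illustrative assumptions; the abstract does not state the parameters used:

```python
def ltp_codes(img, x, y, t=5):
    """Local ternary pattern at pixel (x, y) of a grayscale image given as
    a list of rows: each of the 8 neighbours is coded against the centre
    value with tolerance t, and the ternary pattern is split into an
    'upper' binary code (neighbours >= centre + t) and a 'lower' binary
    code (neighbours <= centre - t)."""
    offsets = [(-1, -1), (0, -1), (1, -1), (1, 0),
               (1, 1), (0, 1), (-1, 1), (-1, 0)]
    c = img[y][x]
    upper = lower = 0
    for bit, (dx, dy) in enumerate(offsets):
        p = img[y + dy][x + dx]
        if p >= c + t:
            upper |= 1 << bit
        elif p <= c - t:
            lower |= 1 << bit
    return upper, lower
```

Splitting the ternary code into two binary codes is the standard trick from Tan and Triggs' formulation: it keeps the noise tolerance of the three-level comparison while allowing ordinary binary-pattern histograms downstream.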
In the speech processing part, the system uses the common mel-frequency cepstral coefficients and their delta cepstral coefficients as speech features. Finally, the system uses support vector machines to classify the facial expression features and the vocalization features separately; by combining the two classification results, it obtains the infant's emotion.
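The delta cepstral coefficients mentioned above are conventionally computed as a regression over neighbouring frames, d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2). A minimal sketch, assuming a window of N = 2 frames and edge padding at the sequence boundaries (both illustrative choices not specified in the abstract):

```python
def delta_coefficients(frames, N=2):
    """Delta cepstral coefficients: for each frame t and each static
    coefficient k, a linear regression over the N neighbouring frames,
    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum n^2).
    Frames past the ends reuse the first/last frame (edge padding)."""
    denom = 2 * sum(n * n for n in range(1, N + 1))
    T = len(frames)
    deltas = []
    for t in range(T):
        row = []
        for k in range(len(frames[0])):
            num = 0.0
            for n in range(1, N + 1):
                c_next = frames[min(t + n, T - 1)][k]
                c_prev = frames[max(t - n, 0)][k]
                num += n * (c_next - c_prev)
            row.append(num / denom)
        deltas.append(row)
    return deltas
```

Appending these deltas to the static MFCC vector gives each frame a rough description of how the spectrum is changing, which is what lets a frame-level classifier pick up on the temporal dynamics of a cry.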
There are 100 experimental sequences with a total length of 100 minutes; the infants in these sequences are 1 to 7 months old, and each sequence contains only one emotion. The average recognition rate of infant emotions is about 85.3%, which shows that the proposed system is robust and efficient.