Graduate student: | 林育德 Lin, Yu-De |
---|---|
Thesis title: | 栩栩如生:動畫人物動作重演 Lifelike: Animated Character Motion Transfer |
Advisor: | 方瓊瑤 Fang, Chiung-Yao |
Oral defense committee: | 陳世旺 Chen, Sei-Wang; 黃仲誼 Huang, Chung-I; 羅安鈞 Luo, An-Chun; 方瓊瑤 Fang, Chiung-Yao |
Oral defense date: | 2024/01/20 |
Degree: | 碩士 Master |
Department: | 資訊工程學系 Department of Computer Science and Information Engineering |
Publication year: | 2024 |
Academic year of graduation: | 112 |
Language: | Chinese |
Number of pages: | 42 |
Chinese keywords: | 動畫人物、動作重演、無監督式學習、多尺度特徵融合、資料增強、評估指標、電腦視覺 |
English keywords: | Animated character, Motion transfer, Unsupervised learning, Multi-scale feature fusion, Data augmentation, Evaluation metric, Computer vision |
Research method: | Experimental design |
DOI URL: | http://doi.org/10.6345/NTNU202400486 |
Thesis type: | Academic thesis |
動作重演旨在將驅動影片中人物的動作轉換為來源影像中人物的動作,並依照轉換結果生成一段動作重演動畫。近期研究多探討真實人類至真實人類的動作重演,甚至已經能夠生成真假難辨的動畫。本研究認為動作重演的價值在於將真實人類的動作轉換為動畫人物的動作,因為可提升動畫人物的動作品質,豐富使用者的娛樂體驗。
本研究期望動畫人物可重演真實人類談話時的面部表情、頭部轉動和肩膀移動等動作,於是分別從VoxCeleb資料集收集名人訪談影片和AnimeCeleb資料集收集動畫人物動作影片,並將這兩種類型的影片組成Celebrity450訓練集。藉由讓系統從Celebrity450訓練集中學習真實人類和動畫人物的動作轉換,預測真實人類至動畫人物的動作轉換。現有技術易受到真實人類和動畫人物幾何差異的影響,產生面部動作轉換能力不足的問題,本研究提出多解析度光流技術,讓系統分別從不同解析度的特徵圖學習相應解析度的光流。同時,AnimeCeleb資料集沒有提供動畫人物肩膀移動的動作資訊,造成系統不能學習動畫人物肩膀移動的動作轉換。本研究在動畫人物動作影片進行特定型態的資料增強,透過移動每張影格的像素座標,模擬動畫人物各類型的肩膀動作。此外,當前指標不能明確評估動作轉換的性能,本研究提出反轉評估技術,透過比對重建之來源影像與來源影像的差距,間接評估動作重演的性能。
實驗表明本研究在動畫人物動作重演領域取得重大的突破,多解析度光流技術不僅改善現有技術的缺失,還能將真實人類面部動作的細節轉換至動畫人物;特定型態資料增強讓系統可從訓練集學習動畫人物肩膀移動的動作轉換,實現生動的動畫人物動作重演;反轉評估技術展現穩健的信效度,提供未來研究一個明確可行的評估指標。
Motion transfer aims to transfer the motion of a character in a driving video to the character in a source image and to generate an animation from the transferred result. Recent work has focused on motion transfer between real humans and can already produce animations that are nearly indistinguishable from real footage. We believe the value of motion transfer lies in transferring motion from real humans to animated characters, since this can improve the motion quality of animated characters and enrich the user's entertainment experience.
We expect animated characters to reenact the facial expressions, head rotations, and shoulder movements of real humans while talking. We therefore collected celebrity interview videos from the VoxCeleb dataset and animated character motion videos from the AnimeCeleb dataset, and combined the two types of video into the Celebrity450 training set. By learning the motion transfer of both real humans and animated characters from the Celebrity450 training set, the system can predict motion transfer from real humans to animated characters. Existing techniques are sensitive to the geometric differences between real humans and animated characters, which leads to inadequate facial motion transfer. We propose a multi-resolution optical flow technique that lets the system learn, from the feature map at each resolution, the optical flow at that resolution. Meanwhile, the AnimeCeleb dataset provides no shoulder motion information for animated characters, so the system cannot learn shoulder motion transfer for them. We apply a specific type of data augmentation to the animated character motion videos, shifting the pixel coordinates of each frame to simulate various kinds of shoulder motion. In addition, current metrics cannot clearly evaluate motion transfer performance. We propose an inverse metric technique that indirectly evaluates motion transfer performance by measuring the difference between the reconstructed source image and the original source image.
Experiments show that our methods achieve a significant breakthrough in animated character motion transfer. The multi-resolution optical flow technique not only addresses the shortcomings of existing techniques but also transfers fine facial motion details from real humans to animated characters. The specific data augmentation enables the system to learn shoulder motion transfer for animated characters from the training set, achieving vivid animated character motion transfer. The inverse metric technique demonstrates robust reliability and validity, providing a clear and practical evaluation metric for future work.
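The shoulder-motion augmentation mentioned in the abstract (shifting the pixel coordinates of each frame) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the `shoulder_row` boundary, and the sinusoidal sway controlled by `amplitude` and `period` are all assumptions chosen only to make the idea concrete. Frames are plain nested lists of pixel values so the example needs nothing beyond the standard library.

```python
import math


def shift_row(row, dx, fill=0):
    """Shift one row of pixels horizontally by dx (positive = right).

    Pixels pushed past the border are dropped; vacated positions get `fill`.
    """
    width = len(row)
    if dx == 0:
        return list(row)
    if abs(dx) >= width:
        return [fill] * width
    if dx > 0:
        return [fill] * dx + list(row[:width - dx])
    return list(row[-dx:]) + [fill] * (-dx)


def augment_shoulders(frames, shoulder_row, amplitude, period, fill=0):
    """Hypothetical augmentation: sway the region below the head over time.

    frames       -- list of frames, each a list of pixel rows
    shoulder_row -- assumed index of the first row belonging to the shoulders
    amplitude    -- assumed maximum horizontal shift in pixels
    period       -- assumed number of frames per full sway cycle
    """
    augmented = []
    for t, frame in enumerate(frames):
        # Horizontal offset for this frame, following a sine wave over time.
        dx = round(amplitude * math.sin(2 * math.pi * t / period))
        augmented.append([
            shift_row(row, dx, fill) if y >= shoulder_row else list(row)
            for y, row in enumerate(frame)
        ])
    return augmented


if __name__ == "__main__":
    still = [[1, 2, 3], [4, 5, 6]]  # toy 2x3 frame: row 0 = head, row 1 = shoulders
    video = [still] * 4
    for frame in augment_shoulders(video, shoulder_row=1, amplitude=1, period=4):
        print(frame)
```

Only rows at or below `shoulder_row` move, so the head region stays aligned with the original motion annotations while the lower region gains synthetic motion. A real pipeline would presumably operate on image tensors and may use richer transformations than a rigid horizontal shift.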