Graduate student | 王順達 Wang, Shun-Ta
---|---
Thesis title | 基於頻率域和時序性特徵的假人臉影片偵測 Face Forgery Detection with Frequency and Recurrent Features
Advisor | 葉梅珍 Yeh, Mei-Chen
Oral defense committee | 陳祝嵩, 彭彥璁, 葉梅珍
Oral defense date | 2021/07/30
Degree | Master
Department | Department of Computer Science and Information Engineering
Year of publication | 2021
Academic year of graduation | 109 (ROC calendar)
Language | Chinese
Pages | 35
Chinese keywords | Deep learning, synthetic image forgery, forgery detection, discrete cosine transform, face detection
English keywords | Deep learning, Face Detection, Image Synthesis, Deepfake Forensics, Discrete Cosine Transform
DOI URL | http://doi.org/10.6345/NTNU202101099
Thesis type | Academic thesis
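With the rapid progress of deep generative techniques, more and more deep-learning-generated fake faces are flooding the Internet. Several studies have confirmed that people are increasingly unable to tell whether such faces are real, and ever more convincing fake videos are bound to appear that the public will firmly believe, spreading misinformation and social panic. Deep learning models, however, can pick up subtle cues: semantic, attribute-level, and spectral inconsistencies, and even inconsistencies between frames, rarely escape a well-trained model. Using deep learning models to detect fake faces is therefore essential.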
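In recent years, deep-learning-based fake-face detection has received growing attention. Existing work transforms feature maps into the frequency domain with the discrete cosine transform (DCT) or the Fourier transform and learns features from the spectrum, uses attention mechanisms to make the model learn and emphasize specific local regions, or uses recurrent neural networks to learn inconsistencies between frames. Prior studies, however, often overlook that the real goal of such a model is strong generalization. The forged videos people encounter in the future will not be ones the model saw during training, and as deep generative techniques evolve they will inevitably become harder to tell apart; whether a model can still detect them accurately is what tests an algorithm's generalization ability.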
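The DCT step mentioned above maps a spatial block to frequency coefficients, with the lowest frequencies in the top-left corner. As a minimal pure-Python sketch, assuming an orthonormal type-II DCT applied separably to a square block (the 8×8 size is borrowed from JPEG purely for illustration), it can be written as below. The function names and the `high_pass` band filter are hypothetical illustrations, not the thesis's implementation.

```python
import math


def dct_1d(x):
    """Orthonormal type-II DCT of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        # Orthonormal scaling keeps the transform energy-preserving.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct_2d(block):
    """Separable 2-D DCT-II: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]           # rows[i][v]: horizontal frequency v
    cols = [dct_1d(list(c)) for c in zip(*rows)]  # cols[v][u]: vertical frequency u
    # Transpose back so result[u][v] is (vertical u, horizontal v).
    return [list(r) for r in zip(*cols)]


def high_pass(coeffs, cutoff):
    """Hypothetical band filter: zero every coefficient with u + v < cutoff."""
    return [[c if u + v >= cutoff else 0.0 for v, c in enumerate(row)]
            for u, row in enumerate(coeffs)]


if __name__ == "__main__":
    flat = [[1.0] * 8 for _ in range(8)]
    coeffs = dct_2d(flat)
    print(round(coeffs[0][0], 6))  # prints 8.0: a flat block has only a DC component
```

Because the transform is orthonormal, the total energy of the block is unchanged; a frequency-aware detector can then look for forgery traces in bands (for example the high frequencies kept by `high_pass`) that are hard to see in pixel space.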
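This study therefore combines a convolutional neural network (CNN) that extracts spatial-domain features, frequency-domain features extracted from the DCT spectrum, an attention mechanism that learns to emphasize tampered regions, and a GRU that takes the features learned by these components and further learns temporal features to decide whether a video is real or fake. We also compare two loss functions, Focal Loss and Cross-Entropy Loss, to find the setting with the best generalization. Experiments show that, without pre-training, our architecture achieves state-of-the-art generalization results on the Celeb-DF dataset and also shows significant generalization ability on other datasets.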
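The two losses compared above, Focal Loss and Cross-Entropy Loss, differ only in how they weight each sample. A minimal sketch for the binary real/fake case follows, assuming label 1 means fake and using the default γ = 2, α = 0.25 from Lin et al.'s focal-loss paper; the thesis's actual hyperparameters are not stated here, and the function names are hypothetical.

```python
import math


def binary_cross_entropy(p, y, eps=1e-12):
    """Cross-entropy for one sample; p is the predicted probability that the face is fake (y = 1)."""
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(max(p_t, eps))


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss (Lin et al.): -alpha_t * (1 - p_t) ** gamma * log(p_t)."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha  # assumed: alpha weights the fake class
    # (1 - p_t) ** gamma shrinks the loss of confidently correct ("easy") samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


def mean_loss(loss_fn, probs, labels, **kwargs):
    """Average a per-sample loss over a batch."""
    return sum(loss_fn(p, y, **kwargs) for p, y in zip(probs, labels)) / len(probs)


if __name__ == "__main__":
    for p in (0.95, 0.1):  # an easy and a hard fake sample
        ce = binary_cross_entropy(p, 1)
        fl = binary_focal_loss(p, 1)
        print(f"p={p}: CE={ce:.4f}  focal={fl:.6f}  ratio={fl / ce:.4f}")
```

For a fake sample the ratio of focal loss to cross-entropy is α(1 − p_t)^γ: 0.25 × 0.05² ≈ 0.0006 for a confident prediction p = 0.95, but 0.25 × 0.9² ≈ 0.2 for a poor prediction p = 0.1, so training concentrates on hard examples. With γ = 0 and α = 0.5 the focal loss reduces to half the cross-entropy.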
With the rapid development of deep generative models, more and more fake faces generated by deep learning models, so-called DeepFakes, are spreading across the Internet. A number of studies show that the human eye is becoming less and less capable of judging the authenticity of DeepFakes, a task that will only become harder in the future. DeepFakes also create misinformation and social panic. Deep learning models, however, are able to detect subtle cues, whether semantic, attribute-level, or spectral, and even inconsistencies between frames, which makes such cues hard for DeepFakes to hide. This is why we investigate DeepFake detection with deep learning.
In recent years, DeepFake detection has received increasing attention. Some researchers use the discrete cosine transform, the Fourier transform, and other methods to convert feature maps into the frequency domain and learn features from the spectrum. Others use attention mechanisms to make models emphasize local regions. Still others use recurrent neural networks to learn inconsistencies between frames. However, researchers often overlook the fact that the goal of a DeepFake detection model is a high level of generalizability. After all, the fake clips people encounter in the future will not have been seen during training, and DeepFakes will become more sophisticated as deep generative technology evolves. How effectively a model can detect such DeepFakes therefore depends on the generalizability of the algorithm.
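The attention mechanisms mentioned above let a model weight some spatial regions more than others. One common form, shown here as a pure-Python sketch rather than the thesis's specific design, turns per-location scores into softmax weights and pools the feature map with them. In a trained network a learned layer would produce the scores; here they are hand-set, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, scores):
    """Attention-weighted pooling over spatial locations.

    features: L feature vectors (lists of C floats), one per spatial location.
    scores:   L unnormalised attention logits, one per location.
    Returns the pooled C-dim vector and the attention weights.
    """
    weights = softmax(scores)
    pooled = [0.0] * len(features[0])
    for w, f in zip(weights, features):
        for j, value in enumerate(f):
            pooled[j] += w * value
    return pooled, weights


if __name__ == "__main__":
    # Three locations; the third (e.g. a blended mouth region) gets a high score.
    feats = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    pooled, weights = attention_pool(feats, [0.0, 0.0, 4.0])
    print([round(w, 3) for w in weights], [round(x, 3) for x in pooled])
```

Because the weights sum to 1, a region with a much higher score dominates the pooled vector, which is how attention can emphasize a suspected tampered area.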
We therefore design a novel architecture that uses a convolutional neural network to extract spatial-domain features, the discrete cosine transform to extract frequency-domain features, an attention mechanism to emphasize tampered regions, and a GRU module to learn sequential features from them before distinguishing real from fake. In addition, we evaluate two loss functions, Focal Loss and Cross-Entropy Loss, to pursue the best model generalizability.
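The GRU stage described above consumes one feature vector per frame and carries information across frames. The pure-Python sketch below shows only that recurrence: it assumes the per-frame vectors have already been produced by the earlier branches (random stand-ins here), uses untrained random weights, omits biases, and the names `GRUCell` and `fake_probability` are hypothetical. The gate equations follow the standard GRU formulation (the same form as PyTorch's `torch.nn.GRUCell`, minus the bias terms).

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """Minimal GRU cell with reset and update gates (biases omitted for brevity)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        k = 1.0 / math.sqrt(hidden_size)

        def init(rows, cols):
            return [[rng.uniform(-k, k) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # Input-to-hidden (w_i*) and hidden-to-hidden (w_h*) weights for reset r, update z, candidate n.
        self.w_ir, self.w_iz, self.w_in = (init(hidden_size, input_size) for _ in range(3))
        self.w_hr, self.w_hz, self.w_hn = (init(hidden_size, hidden_size) for _ in range(3))

    def step(self, x, h):
        """One time step: returns the new hidden state h' = (1 - z) * n + z * h."""
        r = [sigmoid(a + b) for a, b in zip(matvec(self.w_ir, x), matvec(self.w_hr, h))]
        z = [sigmoid(a + b) for a, b in zip(matvec(self.w_iz, x), matvec(self.w_hz, h))]
        n = [math.tanh(a + ri * b)
             for a, ri, b in zip(matvec(self.w_in, x), r, matvec(self.w_hn, h))]
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def run(self, frames):
        """Feed per-frame feature vectors in order; return the final hidden state."""
        h = [0.0] * self.hidden_size
        for x in frames:
            h = self.step(x, h)
        return h


def fake_probability(cell, frames, w_out, b_out=0.0):
    """Hypothetical logistic head on the last hidden state: P(video is fake)."""
    h = cell.run(frames)
    return sigmoid(sum(w * hi for w, hi in zip(w_out, h)) + b_out)


if __name__ == "__main__":
    rng = random.Random(1)
    clip = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(10)]  # 10 frames x 16-dim
    cell = GRUCell(input_size=16, hidden_size=8)
    print(fake_probability(cell, clip, w_out=[0.1] * 8))
```

The update gate z decides how much of the previous state survives each frame, so a frame whose features break from the preceding ones can shift the hidden state, which is the kind of frame-to-frame inconsistency the recurrent stage is meant to capture.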
Experiments show that, without pre-training, our model achieves state-of-the-art generalization results on the Celeb-DF dataset, and it also exhibits significant generalizability on other datasets.