
Graduate Student: Rohit Das
Thesis Title: 3D-GANTex: 3D Face Reconstruction with StyleGAN3-based Multi-View Images and 3DDFA-based Mesh Generation
Advisors: Wang, Ko-Chih; Lin, Tzung-Han
Committee Members: Sun, Pei-Li; Yeh, Mei-Chen; Wang, Ko-Chih; Lin, Tzung-Han
Oral Defense Date: 2023/06/27
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of Publication: 2023
Academic Year of Graduation: 111
Language: English
Number of Pages: 89
English Keywords: 3D Face Reconstruction, Generative Adversarial Network (GAN), Latent Space, Texture Map, Multi-View Generation, StyleGAN3
Research Methods: experimental design, participant observation
DOI URL: http://doi.org/10.6345/NTNU202300718
Thesis Type: Academic thesis
Usage: 119 views, 1 download
    Texture estimation from a single image is a challenging task due to the lack of available texture information and limited training data. This thesis proposes a novel approach to texture estimation from a single in-the-wild image using a Generative Adversarial Network (GAN) and 3D Dense Face Alignment (3DDFA). The method first generates multi-view faces using the latent space of the GAN. 3DDFA then generates a 3D face mesh together with a high-resolution texture map that is consistent with the estimated face shape. The generated texture map is subsequently refined through an iterative process that incorporates information from both the input image and the estimated 3D face shape.
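The pipeline described above (invert the image to a latent code, edit the latent to synthesize views, then fit a mesh and texture) can be sketched as follows. Every function body here is a hypothetical, shape-only stub — the real system uses a StyleGAN3 encoder, the StyleGAN3 generator, and 3DDFA, none of which are reproduced here:

```python
import numpy as np

# Shape-only stand-ins for the pipeline stages; array sizes are placeholders.
def invert_to_latent(image):
    """Encoder stage: map an input face image to a latent code."""
    return np.zeros(512)                    # placeholder latent dimension

def synthesize(latent):
    """Generator stage: render a face image from a latent code."""
    return np.zeros((256, 256, 3))          # placeholder resolution

def fit_mesh_and_texture(views):
    """3DDFA stage: fit a 3D mesh and a UV texture map from the views."""
    mesh = np.zeros((1000, 3))              # placeholder vertex count
    uv_map = np.mean(views, axis=0)         # toy stand-in for texture fusion
    return mesh, uv_map

w = invert_to_latent(np.zeros((256, 256, 3)))
pose_offsets = np.eye(3, 512)               # toy pose-editing directions
views = [synthesize(w + 0.5 * d) for d in pose_offsets]
mesh, uv_map = fit_mesh_and_texture(views)
```

The structure mirrors the text: the multi-view images exist only to give 3DDFA enough coverage to produce a shape-consistent texture map.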

    Studies were conducted to investigate the contributions of the individual components of this method, and they show that:
    1. Embedding into the GAN latent space is critical for achieving high-quality results.
    2. Editing the latent space can generate high-quality multi-view images.
    3. A 3D mesh and texture map can be estimated from a single image with very high accuracy.
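Finding 2 corresponds to InterFaceGAN-style semantic editing: a latent code is shifted along the normal direction of a learned pose hyperplane, w' = w + α·n. A minimal numpy sketch, using a random vector as a stand-in for the learned pose direction:

```python
import numpy as np

def edit_latent(w, direction, alpha):
    """Shift latent code w by step alpha along a unit semantic direction."""
    direction = direction / np.linalg.norm(direction)
    return w + alpha * direction

rng = np.random.default_rng(0)
w = rng.standard_normal(512)         # inverted latent of the input face
pose_dir = rng.standard_normal(512)  # stand-in for a learned pose normal
# Sweep pose offsets to obtain latents for multi-view synthesis.
multi_view_latents = [edit_latent(w, pose_dir, a) for a in np.linspace(-3, 3, 5)]
```

Each edited latent, passed through the generator, yields the same identity at a different yaw; α = 0 reproduces the original view.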

    To evaluate the effectiveness of this approach, experiments were conducted on in-the-wild images, and the results were compared against a state-of-the-art 3D scanner. In addition, a subjective evaluation was performed with 16 participants. The results show that the proposed method outperforms existing methods, demonstrating its effectiveness.
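Among the image-quality metrics used in the evaluation, SSIM scores two images on luminance, contrast, and structure. A simplified single-window variant (the standard metric applies the same formula over a sliding window) can be written as:

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM computed over the whole image rather than a
    sliding window; a score of 1.0 means the images are identical."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx**2 + my**2 + c1) * (x.var() + y.var() + c2)
    return num / den

img = np.random.default_rng(1).random((64, 64))
score = global_ssim(img, img)   # identical images score 1.0
```

FSIM and MS-SSIM listed in the table of contents refine the same idea with feature-level and multi-scale comparisons, respectively.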

    The results generated by this method are highly accurate and have the potential to make an important contribution to avatar creation as well as 3D face reconstruction.

    In summary, the proposed method for texture estimation from a single image using the GAN latent space and 3DDFA represents a significant advancement in computer vision, with potential applications in a wide range of fields, including virtual try-on, facial recognition, the beauty industry, and the metaverse.

    1. Introduction 1
      1.1 Objective 1
      1.2 Face Frontalization 2
      1.3 Texture Generation 3
      1.4 3D Model Generation 4
      1.5 Challenges for 3D Model Generation 5
      1.6 Proposed Framework 6
    2. Literature Review 7
      2.1 Generative Adversarial Networks (GAN) 7
      2.2 Latent Space 8
      2.3 Multi-View Face Generation 11
      2.4 Face Rotation 12
      2.5 Encoder 13
      2.6 Texture 16
        2.6.1 Research based on 3D Morphable Model (3DMM) 17
        2.6.2 3D Dense Face Alignment (3DDFA) 18
      2.7 Dataset 19
    3. Methodology 21
      3.1 Latent Space Embedding 21
      3.2 Encoder 22
        3.2.1 ReStyle Encoder 22
      3.3 Encoder4Editing (e4e) Encoder 23
        3.3.1 ReStyle-e4e Encoder 25
      3.4 InterFaceGAN 26
      3.5 Loss Functions 28
        3.5.1 Pixel-Wise Loss 29
        3.5.2 LPIPS Loss 30
        3.5.3 Identity-Based Reconstruction 31
      3.6 Generate 3D Face Model and Texture Maps using 3DDFA 32
        3.6.1 Normalized Coordinate Code (NCC) 32
        3.6.2 Projected Normalized Coordinate Code (PNCC) 32
        3.6.3 Pose Adaptive Convolutions 34
    4. Experiments and Evaluation 36
      4.1 Hardware and Environment 37
      4.2 Dataset: Flickr-Faces High Quality Dataset (FFHQ) 37
      4.3 Embedding the Image to Latent Space 38
      4.4 Multi-View Synthesis using InterFaceGAN 39
      4.5 Generate Texture Map and 3D Model using 3DDFA 40
      4.6 Evaluation Metric for StyleGAN3 Generated Images 41
        4.6.1 Structural Similarity Index (SSIM) 41
        4.6.2 Feature Similarity Index (FSIM) 42
        4.6.3 Perceptual Loss 42
        4.6.4 Multiscale Structural Similarity Index (MS-SSIM) 43
        4.6.5 Evaluation with No Background 44
      4.7 Evaluation Metric for Generated UV Map 44
        4.7.1 Pixel Density 44
        4.7.2 Texture Density 44
      4.8 Evaluation of 3D Face Mesh and Texture using Hardware 50
      4.9 Subjective Evaluation 57
    5. Results and Discussions 71
      5.1 Generation of Multi-View Images 71
      5.2 Generating UV Map 74
      5.3 Creating 3D Model using 3DDFA 76
    6. Conclusion 80
    References 81
    Appendix 88
      Appendix 1 88
      Appendix 2 89

