
Student: HSU, Pin-Chen (徐秉琛)
Thesis Title: 以生成對抗網路為基礎之低光源圖片增強系統使用閃光-非閃光圖片 (Low-light Image Enhancement with Flash and No-Flash Image Pairs Using Generative Adversarial Network)
Advisor: Fang, Chiung-Yao (方瓊瑤)
Oral Defense Committee: Fang, Chiung-Yao (方瓊瑤); Chen, Sei-Wang (陳世旺); Huang, Chung-I (黃仲誼); Luo, An-Chun (羅安鈞); Hsu, Chih-Fan (許之凡)
Oral Defense Date: 2022/06/30
Degree: Master
Department: Department of Computer Science and Information Engineering (資訊工程學系)
Publication Year: 2022
Graduation Academic Year: 110
Language: English
Number of Pages: 42
Chinese Keywords: 生成對抗網路、低光源影像增強、深度學習、注意力機制、閃光燈、影像生成
English Keywords: Generative Adversarial Network, Low-Light Image Enhancement, Flash Image Enhancement, Deep Learning, Attention Mechanism, Flash, Image Generation
Research Method: Experimental design
DOI URL: http://doi.org/10.6345/NTNU202201239
Thesis Type: Academic thesis
Access Counts: 115 views, 3 downloads
  • This study proposes a low-light image enhancement system based on a generative adversarial network. The system combines two photographs, one taken without flash and one taken with flash, to generate a single image that has both a realistic distribution of light and shadow and rich color detail. Its main purpose is to improve the user's experience of taking pictures in low-light environments.
    When shooting with a digital camera in a low-light environment, the sensitivity of the image sensor (ISO value) is usually raised, or the shutter time is lengthened, to maintain normal brightness, but this introduces obvious noise or blurs the image. Photographers may instead use a flash to provide additional illumination; although a flash yields images with faithful colors, it may disturb the light and shadow distribution of the scene, for example by producing extra reflections or shadows, or by making the subject look flat. This study therefore combines the characteristics of the low-light image and the flash image and uses a generative adversarial network to generate a more realistic image.
    The system feeds the two inputs, a low-light image and its corresponding flash image, into a modified generative adversarial network. The network is based on Pix2PixHD with several improvements, including an adjusted model architecture, a loss function changed to the relativistic average least squares loss, and a lightweight attention module (convolutional block attention module, CBAM) added to the generator.
    For this purpose, this study also built a low-light image dataset, the CVIU Short exposure Flash Long exposure (SFL) dataset. The dataset contains 210 image groups, each consisting of three images: a low-light image taken with a short exposure, a flash image taken with the flash, and a ground-truth image taken with a long exposure. These images are used to train and evaluate the system. Experimental results show that the system achieves a peak signal-to-noise ratio (PSNR) of 22.5267 and a structural similarity index (SSIM) of 0.6662 on the test set of the SFL dataset.

    This study proposes a low-light image enhancement system based on a generative adversarial network (GAN). The system takes two kinds of input images, one captured with flash and one without, and generates an enhanced image with a realistic light and shadow distribution and rich color detail. The main aim of this system is to improve the user's experience of taking pictures in a low-light environment.
    When using a digital camera to shoot in a low-light environment, it is common to increase the sensitivity (ISO value) of the image sensor or to prolong the shutter time to maintain normal brightness. However, these actions produce noticeable noise or blur in the image. Alternatively, photographers may use a flash to provide additional lighting. A flash yields images with true colors, but it can destroy the distribution of light and shadow in the environment, such as by creating extra reflections or shadows, or by flattening the subject. Therefore, this study combines the characteristics of images taken with and without flash to generate a more realistic image through a GAN.
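    To make the trade-off concrete, the following toy NumPy sketch (an illustration only, not taken from the thesis) shows that multiplying a dark capture by a digital gain, roughly what raising the ISO does, restores brightness but amplifies the sensor noise by the same factor.

```python
import numpy as np

rng = np.random.default_rng(0)
scene = np.full((100, 100), 0.02)                            # dim scene radiance (arbitrary units)
dark_capture = scene + rng.normal(0.0, 0.005, scene.shape)   # short exposure: dark but only mildly noisy

gain = 16.0                                                  # "raise the ISO": amplify the raw signal
bright_capture = gain * dark_capture                         # brightness is restored...

print(bright_capture.mean())                                 # ~0.32, visibly brighter than ~0.02
print(bright_capture.std() / dark_capture.std())             # ...but the noise grows by the same 16x factor
```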
    This study adopts a modified GAN with two kinds of image input: a low-light image and its corresponding flash image. The proposed network is based on Pix2PixHD with several enhancements, including adjusting the model architecture, changing the loss function to the relativistic average least squares loss, and adding a lightweight attention module, the convolutional block attention module (CBAM), to the generator.
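    For readers who want a concrete picture of the two named components, below is a minimal PyTorch sketch of a generic CBAM block and of the relativistic average least-squares adversarial losses. The tensor shapes, hyperparameters (e.g., the reduction ratio of 16), and function names are illustrative assumptions, not the thesis's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CBAM(nn.Module):
    """Convolutional block attention module: channel attention followed by spatial attention."""

    def __init__(self, channels: int, reduction: int = 16, spatial_kernel: int = 7):
        super().__init__()
        # Channel attention: a shared MLP applied to average- and max-pooled channel descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: a 7x7 convolution over the channel-wise average and max maps.
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1).view(b, c))
        mx = self.mlp(F.adaptive_max_pool2d(x, 1).view(b, c))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)                 # channel attention
        avg_map = x.mean(dim=1, keepdim=True)
        max_map = x.max(dim=1, keepdim=True).values
        attn = torch.sigmoid(self.spatial(torch.cat([avg_map, max_map], dim=1)))
        return x * attn                                                  # spatial attention

def rals_d_loss(real_logits: torch.Tensor, fake_logits: torch.Tensor) -> torch.Tensor:
    """Relativistic average least-squares loss for the discriminator."""
    real_rel = real_logits - fake_logits.mean()
    fake_rel = fake_logits - real_logits.mean()
    return ((real_rel - 1.0) ** 2).mean() + ((fake_rel + 1.0) ** 2).mean()

def rals_g_loss(real_logits: torch.Tensor, fake_logits: torch.Tensor) -> torch.Tensor:
    """Relativistic average least-squares loss for the generator (real/fake roles swapped)."""
    real_rel = real_logits - fake_logits.mean()
    fake_rel = fake_logits - real_logits.mean()
    return ((fake_rel - 1.0) ** 2).mean() + ((real_rel + 1.0) ** 2).mean()
```

    Unlike the standard least-squares GAN objective, the relativistic average generator term rewards generated samples for looking more realistic than the average real sample, which is the key difference introduced by this loss.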
    To this end, this study also established a low-light image dataset, the CVIU Short exposure Flash Long exposure (SFL) dataset. The SFL dataset consists of 210 image triples, each containing three images: a low-light image captured with a short exposure, a corresponding flash image captured with flash, and a corresponding ground-truth image captured with a long exposure. This dataset was used to train and evaluate the proposed system. Experimental results show that the system achieves a peak signal-to-noise ratio (PSNR) of 22.5267 dB and a structural similarity index measure (SSIM) of 0.6662 on the SFL test set.
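    For reference, both metrics can be computed per image pair and averaged over the test set. Below is a minimal sketch assuming scikit-image (version 0.19 or later) is available; the function and variable names are illustrative, not the thesis's evaluation code.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(enhanced: np.ndarray, ground_truth: np.ndarray) -> tuple[float, float]:
    """Compare an enhanced image with its long-exposure ground truth (both H x W x 3, uint8)."""
    psnr = peak_signal_noise_ratio(ground_truth, enhanced, data_range=255)
    ssim = structural_similarity(ground_truth, enhanced, data_range=255, channel_axis=-1)
    return psnr, ssim

# Averaging the per-image scores over all test triples gives dataset-level PSNR/SSIM figures
# comparable in form to the 22.5267 dB / 0.6662 values reported above.
```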

    Chapter 1 Introduction 1
      1.1 Research Motivation 1
      1.2 Background and Difficulty 5
    Chapter 2 Related Work 8
      2.1 Exposure Parameters 8
        2.1.1 ISO Value 9
        2.1.2 Aperture 9
        2.1.3 Shutter Speed 10
        2.1.4 Other Light 10
      2.2 Low-Light and Flash Image Enhancement 11
      2.3 Image Fusion 13
      2.4 Generative Adversarial Networks (GANs) 14
        2.4.1 Overview and Training 15
        2.4.2 Loss Function 17
        2.4.3 Architectures 18
    Chapter 3 Low-Light Image Enhancement System 20
      3.1 Pix2PixHD Baseline 20
      3.2 Improved Architecture 21
        3.2.1 Generator 22
        3.2.2 Discriminator 23
        3.2.3 Loss Function 25
    Chapter 4 Experiments 26
      4.1 Research Environment and Equipment Setup 26
      4.2 SFL Dataset 26
      4.3 Evaluation 28
      4.4 Experimental Results 30
    Chapter 5 Conclusions and Future Work 35
      5.1 Conclusions 35
      5.2 Future Work 36
    References 37

    [1] "The Exposure Triangle," Available: https://actioncamera.blog/2017/02/22/the-exposure-triangle/. (2021, Jan 7)
    [2] "Aperture Diagram," Available: https://indiahikes.com/wp-content/uploads/2018/07/Aperture-diagram-Indiahikes.jpg. (2021, Feb 24)
    [3] "What Is the Perfect Iso Value for Decent Lighting?," Available: https://qph.fs.quoracdn.net/main-qimg-734ee51969a2de137a6abf618f3febf2.webp. (2021, Feb 24)
    [4] "Dslr Photography 101: What Is Shutter Speed?," Available: https://d14pr3cu5atb0x.cloudfront.net/cms/uploads/2015/10/shutter_speed_illustration-582x214.png. (2021, Feb 24)
    [5] A. Schiffhauer (2018, Nov 14). "See the Light with Night Sight," Available: https://www.blog.google/products/pixel/see-light-night-sight/. (2020, Oct 28)
    [6] A. Radford, L. Metz, and S. Chintala, "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks," Proceedings of The International Conference on Learning Representations(ICLR), Puerto Rico, 2016.
    [7] E. Eisemann and F. Durand, "Flash Photography Enhancement Via Intrinsic Relighting," ACM Transactions on Graphics, vol. 23, no. 3, pp. 673-678, 2004.
    [8] S. O'Dea (2020, Jan). "Number of Smartphones Sold to End Users Worldwide from 2007 to 2021," Available: https://www.statista.com/statistics/263437/global-smartphone-sales-to-end-users-since-2007/. (2020, Oct 15)
    [9] A. Khatri (2019, Sep 13). "Iphone 11's Cameras Tested in Real Life, Proving That Apple Has Delivered What It Promised," Available: https://mobygeek.com/mobile/iphone-11s-cameras-tested-in-real-life-proving-that-apple-has-delivered-what-it-promised-8599. (2020, Oct 28)
    [10] E. Qi (2020, Jul 3). "Smartphone Cis Sensors to Top Five Billion Units in 2020 as Quad-Camera Smartphone Designs Ramp Despite Covid-19," Available: https://www.counterpointresearch.com/smartphone-cis-sensors-top-five-billion-mark-in-2020/. (2020, Oct 15)
    [11] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative Adversarial Nets," Proceedings of Neural Information Processing Systems(NIPS), Canada, pp. 2672-2680, 2014.
    [12] M. Brown (2019, Dec 14). "Main Factors to Consider When Buying a Smartphone," Available: https://theappliancesreviews.com/main-factors-to-consider-when-buying-a-smartphone/. (2020, Oct 20)
    [13] Wikipedia. "High-Dynamic-Range Imaging," Available: https://en.wikipedia.org/wiki/High-dynamic-range_imaging#Mid_19th_century. (2020, Oct 27)
    [14] Apple (2020, Oct 13). "Iphone 12 Pro," Available: https://www.apple.com/iphone-12-pro/. (2020, Oct 21)
    [15] Google (2020, Oct 1). "Google Pixel 5," Available: https://store.google.com/product/pixel_5. (2020, Oct 21)
    [16] Samsung (2020, Feb 12). "Galaxy S20 5g," Available: https://www.samsung.com/us/mobile/galaxy-s20-5g/. (2020, Oct 21)
    [17] MarketsandMarkets. "Computational Photography Market by Offering (Camera Modules, Software), Type (Single- and Dual-Lens, 16-Lens), Product (Smartphone Cameras, Standalone Cameras, Machine Vision Cameras), Application (3d Imaging, Ar, Vr, Mr), Region - Global Forecast to 2024," Available: https://www.marketsandmarkets.com/Market-Reports/computational-photography-market-232323308.html. (2020, Oct 28)
    [18] M. Levoy and Y. Pritch (2018, Nov 14). "Night Sight: Seeing in the Dark on Pixel Phones," Available: https://ai.googleblog.com/2018/11/night-sight-seeing-in-dark-on-pixel.html. (2020, Oct 28)
    [19] O. Liba, K. Murthy, Y.-T. Tsai, T. Brooks, T. Xue, N. Karnad, Q. He, J. T. Barron, D. Sharlet, R. Geiss, S. W. Hasinoff, Y. Pritch, and M. Levoy, "Handheld Mobile Photography in Very Low Light," ACM Transactions on Graphics, vol. 38, no. 6, pp. 1-16, 2019, Art. no. 164.
    [20] C. Chen, Q. Chen, J. Xu, and V. Koltun, "Learning to See in the Dark," Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), Salt Lake City, USA, 2018.
    [21] Y. Jiang, X. Gong, D. Liu, Y. Cheng, C. Fang, X. Shen, J. Yang, P. Zhou, and Z. Wang, "Enlightengan: Deep Light Enhancement without Paired Supervision," IEEE Transactions on Image Processing, vol. 30, pp. 2340-2349, 2021.
    [22] J. Ch́avez, R. Mora, and E. Cayllahua-Cahuina, "Ambient Lighting Generation for Flash Images with Guided Conditional Adversarial Networks," Proceedings of 15th International Conference on Computer Vision Theory and Applications(VISAPP), Malta, 2020.
    [23] H. D. Cheng and X. J. Shi, "A Simple and Effective Histogram Equalization Approach to Image Enhancement," Digital Signal Processing, vol. 14, no. 2, pp. 158-170, 2004.
    [24] M. Abdullah-Al-Wadud, M. H. Kabir, M. A. A. Dewan, and O. Chae, "A Dynamic Histogram Equalization for Image Contrast Enhancement," IEEE Transactions on Consumer Electronics vol. 53, no. 2, pp. 593-600, 2007.
    [25] S. Rahman, M. M. Rahman, M. Abdullah-Al-Wadud, G. D. Al-Quaderi, and M. Shoyaib, "An Adaptive Gamma Correction for Image Enhancement," EURASIP Journal on Image and Video Processing, no. 35, 2016.
    [26] X. Guan, S. Jian, P. Hongda, Z. Zhiguo, and G. Haibin, "An Image Enhancement Method Based on Gamma Correction," presented at the International Symposium on Computational Intelligence and Design, Changsha, China, 2009.
    [27] E. H. Land, "The Retinex Theory of Color Vision," Scientific American, vol. 237, pp. 108-128, 1977.
    [28] D. J. Jobson, Z. Rahman, and G. A. Woodell, "Properties and Performance of a Center/Surround Retinex," IEEE Transactions on Image Processing, vol. 6, no. 3, pp. 451-462, 1997.
    [29] D. J. Jobson, Z. Rahman, and G. A. Woodell, "A Multiscale Retinex for Bridging the Gap between Color Images and the Human Observation of Scenes," IEEE Transactions on Image Processing, vol. 6, no. 7, pp. 965-976, 1997.
    [30] D. J. Jobson, Z. Rahman, and G. A. Woodell, "A Multiscale Retinex for Bridging the Gap between Color Images and the Human Observation of Scenes," IEEE Transactions on Image Processing vol. 6, no. 7, pp. 965-976, 1997.
    [31] M. Fan, W. Wang, W. Yang, and J. Liu, "Integrating Semantic Segmentation and Retinex Model for Low Light Image Enhancement," Proceedings of The 28th ACM International Conference on Multimedia(ACM Multimedia), Seattle, pp. 2317-2325, 2020.
    [32] X. Guo, Y. Li, and H. Ling, "Lime: Low-Light Image Enhancement Via Illumination Map Estimation," IEEE Transactions on Image Processing vol. 26, no. 2, pp. 982-993, 2017.
    [33] K. G. Lore, A. Akintayo, and S. Sarkar, "Llnet: A Deep Autoencoder Approach to Natural Low-Light Image Enhancement," Pattern Recognition, vol. 61, pp. 650-662, 2017.
    [34] F. Lv, F. Lu, J. Wu, and C. Lim, "Mbllen: Low-Light Image/Video Enhancement Using Cnns," Proceedings of British Machine Vision Conference(BMVC), Newcastle, UK, 2018.
    [35] L. Tao, C. Zhu, G. Xiang, Y. Li, H. Jia, and X. Xie, "Llcnn: A Convolutional Neural Network for Low-Light Image Enhancement," Proceedings of IEEE visual communication and image processing(VCIP), St. Petersburg, FL, USA, pp. 1-4, 2017.
    [36] G. Kim, D. Kwon, and J. Kwon, "Low-Lightgan: Low-Light Enhancement Via Advanced Generative Adversarial Network with Task-Driven Training," Proceedings of IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, pp. 2811-2815, 2019.
    [37] C. Wei, W. Wang, W. Yang, and J. Liu, "Deep Retinex Decomposition for Low-Light Enhancement," presented at the British Machine Vision Conference, Northumbria, UK, 2018.
    [38] A. Zhu, L. Zhang, Y. Shen, Y. Ma, S. Zhao, and Y. Zhou, "Zero-Shot Restoration of Underexposed Images Via Robust Retinex Decomposition," Proceedings of 2020 IEEE International Conference on Multimedia and Expo(ICME), London, UK, pp. 1-6, 2020.
    [39] H. Jiang and Y. Zheng, "Learning to See Moving Objects in the Dark," Proceedings of IEEE international conference on computer vision(ICCV), pp. 7323-7332, 2019
    [40] N. Capece, F. Banterle, P. Cignoni, F. Ganovelli, R. Scopigno, and U. Erra, "Deepflash: Turning a Flash Selfie into a Studio Portrait," Signal Processing: Image Communication, vol. 77, pp. 28-39, 2019.
    [41] Y. Zheng, E. Blasch, and Z. Liu, "Multispectral Image Fusion and Colorization," 2018.
    [42] G. Petschnigg, R. Szeliski, M. Agrawala, M. Cohen, H. Hoppe, and K. Toyama, "Digital Photography with Flash and No-Flash Image Pairs," ACM Transactions on Graphics, vol. 23, no. 3, pp. 664-672, 2004.
    [43] S. Zhuo, D. Guo, and T. Sim, "Robust Flash Deblurring," Proceedings of The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition(CVPR), San Francisco, USA, pp. 2440-2447, 2010.
    [44] Y. Chang, C. Jung, J. Sun, and F. Wang, "Siamese Dense Network for Reflection Removal with Flash and No-Flash Image Pairs," International Journal of Computer Vision, vol. 128, pp. 1673-1698, 2020.
    [45] Z. Xia, M. Gharbi, F. Perazzi, K. Sunkavalli, and A. Chakrabarti, "Deep Denoising of Flash and No-Flash Pairs for Photography in Low-Light Environments," arXiv preprint arXiv:2012.05116, 2020.
    [46] M. Arjovsky and L. Bottou, "Towards Principled Methods for Training Generative Adversarial Networks," Proceedings of International Conference on Learning Representations(ICLR), Toulon, France, 2017.
    [47] S. Nowozin, B. Cseke, and R. Tomioka, "F-Gan: Training Generative Neural Samplers Using Variational Divergence Minimization," Proceedings of Thirtieth Conference on Neural Information Processing Systems(NIPS), Barcelona, Spain, pp. 271-279, 2016.
    [48] M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein Generative Adversarial Networks," Proceedings of the 34th International Conference on Machine Learning(PMLR), Sydney, Australia, vol. 70, pp. 214-223, 2017.
    [49] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville, "Improved Training of Wasserstein Gans," Proceedings of Thirty-first Conference on Neural Information Processing Systems(NIPS), Long Beach, US, pp. 5769-5779, 2017.
    [50] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-Image Translation with Conditional Adversarial Networks," Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition(CVPR), Honolulu, HI, USA, 2017.
    [51] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks," Proceedings of IEEE International Conference on Computer Vision(ICCV), Venice, Italy, 2017.
    [52] H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, and D. Metaxas, "Stackgan: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks," Proceedings of IEEE International Conference on Computer Vision(ICCV), Venice, Italy, pp. 5907-5915, 2017.
    [53] T. Karras, T. Aila, S. Laine, and J. Lehtinen, "Progressive Growing of Gans for Improved Quality, Stability, and Variation," Proceedings of International Conference on Learning Representations(ICLR), Vancouver, Canada, 2018.
    [54] M. Mirza and S. Osindero, "Conditional Generative Adversarial Nets," 2014.
    [55] J.-Y. Zhu, R. Zhang, D. Pathak, T. Darrell, A. A. Efros, O. Wang, and E. Shechtman, "Toward Multimodal Image-to-Image Translation," Proceedings of International Conference on Neural Information Processing Systems(NIPS), Long Beach, California, USA, pp. 465-476, 2017
    [56] Y. Choi, M. Choi, M. Kim, J.-W. Ha, S. Kim, and J. Choo, "Stargan: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation," Proceedings of IEEE Conference on Computer Vision and Pattern Recognition(CVPR), Salt Lake, USA, pp. 8789-8797, 2018.
    [57] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," Proceedings of Medical Image Computing and Computer-Assisted Intervention(MICCAI), Munich, Germany, vol. 9351, pp. 234-241, 2015.
    [58] T.-C. Wang, M.-Y. Liu, J.-Y. Zhu, A. Tao, J. Kautz, and B. Catanzaro, "High-Resolution Image Synthesis and Semantic Manipulation with Conditional Gans," Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), Salt Lake City, UT, USA, vol. 1, pp. 8798-8807, 2018.
    [59] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," Proceedings of IEEE Conference on Computer Vision and Pattern Recognition(Las Vegas, 2016.
    [60] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon1, "Cbam: Convolutional Block Attention Module," Proceedings of European Conference on Computer Vision(ECCV), Munich, Germany, 2018.
    [61] D. Ulyanov, A. Vedaldi, and V. Lempitsky, "Instance Normalization: The Missing Ingredient for Fast Stylization," 2016.
    [62] A. Jolicoeur-Martineau, "The Relativistic Discriminator: A Key Element Missing from Standard Gan," Proceedings of the 7th International Conference on Learning Representations(ICLR), New Orleans, 2019.
    [63] S. He and R. W. H. Lau, "Saliency Detection with Flash and No-Flash Image Pairs," Proceedings of European Conference on Computer Vision(ECCV), Zurich, Switzerland, pp. 110-124, 2014.
    [64] Y. Aksoy, C. Kim, P. Kellnhofer, S. Paris, M. Elgharib, M. Pollefeys, and W. Matusik, "A Dataset of Flash and Ambient Illumination Pairs from the Crowd," Proceedings of European Conference on Computer Vision(ECCV), Munich, Germany, pp. 634-649, 2018.
    [65] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image Quality Assessment: From Error Visibility to Structural Similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, 2004.
    [66] Google (2016). "Butteraugli," Available: https://github.com/google/butteraugli. (2022, May 5)
