
Student: Huang, Han (黃瀚)
Thesis title: Deep Learning-Assisted Statistical Visualization and Analysis for Distribution-Based Ensemble Scientific Data Summarization
Advisor: Wang, Ko-Chih (王科植)
Oral defense committee: Wang, Ko-Chih (王科植); Yeh, Mei-Chen (葉梅珍); Chi, Ming-Te (紀明德)
Defense date: 2025/01/03
Degree: Master's
Department: Department of Computer Science and Information Engineering
Year of publication: 2025
Academic year of graduation: 113 (ROC calendar)
Language: English
Number of pages: 47
Keywords: Deep learning, distribution-based representation, in situ data processing, large ensemble data
Research method: Experimental design
Thesis type: Academic thesis

    To study complex real-world phenomena using computer simulations, scientists often rely on ensemble datasets generated from multiple simulation runs with varying parameter configurations. This process can produce extreme-scale datasets, making traditional data analysis pipelines impractical due to limited I/O bandwidth and disk capacity. Distribution-based data representations have been proposed as a promising solution.
    Processing data in situ to generate compact distribution-based representations not only alleviates the challenges of limited I/O bandwidth and disk capacity but also enables uncertainty quantification, thus mitigating the risk of misinterpretation. Nevertheless, distribution-based methods inherently sacrifice the spatial information of the data samples within each distribution, potentially reducing precision in the data analysis pipeline. To address this issue, we introduce a deep learning model that reconstructs the data volume from the distribution representation. Instead of using a model that predicts a data block directly from its distribution representation, we propose a deep learning model based on the Gumbel-Sinkhorn Neural Network (GSNN) that learns to map samples drawn from a block's distribution to spatial locations within the block. The model supports high-quality downstream data analysis and visualization, provides point-wise uncertainty quantification, and guarantees that the distribution of each reconstructed data block follows the block's distribution representation.
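    The abstract's central idea is to place values sampled from a block's distribution into the block's voxels through a permutation, instead of regressing voxel values directly. The NumPy sketch below illustrates that idea. It is not the thesis's GS-3DNet, and it rests on several assumptions:
    • The per-block summary is assumed to be a plain histogram.
    • The helper names (`block_histogram`, `sample_from_histogram`, `reconstruct_block`, and so on) are hypothetical.
    • The learned network's scores are replaced by a hypothetical per-voxel `prior`.
    • The hard permutation comes from greedy rounding instead of the Hungarian algorithm.

```python
import numpy as np


def block_histogram(values, bins=8):
    """Compact distribution-based summary of one block (assumed here: a plain histogram)."""
    return np.histogram(values, bins=bins)


def sample_from_histogram(counts, edges, rng):
    """Draw exactly counts[k] values uniformly inside bin k, so the samples reproduce the histogram."""
    bin_idx = np.repeat(np.arange(len(counts)), counts)
    return rng.uniform(edges[bin_idx], edges[bin_idx + 1])


def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis, keeping dims for broadcasting."""
    m = np.max(x, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))


def gumbel_sinkhorn(log_alpha, tau=1.0, n_iters=20, rng=None):
    """Gumbel-Sinkhorn operator: perturb scores with Gumbel noise, scale by 1/tau,
    then alternate row and column normalization in log space (Sinkhorn iterations)."""
    if rng is not None:
        u = rng.uniform(1e-12, 1.0, size=log_alpha.shape)
        log_alpha = log_alpha - np.log(-np.log(u))  # adds Gumbel(0, 1) noise
    log_p = log_alpha / tau
    for _ in range(n_iters):
        log_p = log_p - logsumexp(log_p, axis=1)  # each row sums to 1
        log_p = log_p - logsumexp(log_p, axis=0)  # each column sums to 1
    return np.exp(log_p)  # approximately doubly stochastic


def round_to_permutation(soft):
    """Greedy rounding to a hard permutation: perm[i] is the voxel assigned to sample i.
    (A simple stand-in for the Hungarian algorithm, which finds the optimal matching.)"""
    s = soft.copy()
    perm = np.empty(s.shape[0], dtype=int)
    for _ in range(s.shape[0]):
        i, j = np.unravel_index(np.argmax(s), s.shape)
        perm[i] = j
        s[i, :] = -np.inf  # sample i is placed
        s[:, j] = -np.inf  # voxel j is occupied
    return perm


def reconstruct_block(counts, edges, prior, tau, rng):
    """Place histogram samples into voxels. `prior` is a hypothetical per-voxel cue
    standing in for the scores a trained network would output."""
    samples = sample_from_histogram(counts, edges, rng)
    log_alpha = -(samples[:, None] - prior[None, :]) ** 2  # score of sample i at voxel j
    perm = round_to_permutation(gumbel_sinkhorn(log_alpha, tau, rng=rng))
    recon = np.empty_like(samples)
    recon[perm] = samples  # sample i is written to voxel perm[i]
    return recon, samples


# Hypothetical 4x4x4 block standing in for one block of a simulation field.
rng = np.random.default_rng(0)
z, y, x = np.meshgrid(np.arange(4), np.arange(4), np.arange(4), indexing="ij")
block = (x + y + z) + rng.normal(0.0, 0.1, size=x.shape)
counts, edges = block_histogram(block.ravel(), bins=8)  # what would be stored in situ
prior = (x + y + z).astype(float).ravel()

# Repeated reconstructions give a per-voxel mean and spread (point-wise uncertainty).
recons = np.stack([reconstruct_block(counts, edges, prior, 0.5, rng)[0].reshape(block.shape)
                   for _ in range(10)])
mean, std = recons.mean(axis=0), recons.std(axis=0)
```

    In this sketch, every reconstruction is a permutation of the drawn samples. Its histogram therefore matches the stored `counts` exactly, which mirrors the distribution guarantee described in the abstract. Repeating the reconstruction with fresh samples and fresh Gumbel noise yields `std`, a rough per-voxel uncertainty estimate.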

    1 Introduction
    2 Related Work
      2.1 In Situ Data Processing
      2.2 Distribution-based Data Representation
      2.3 Deep Learning in Scientific Data
    3 Overview
      3.1 Distribution-based Data Representation
      3.2 Data Reconstruction via GS-3DNet
    4 Gumbel-Sinkhorn Deep Learning Model
      4.1 Gumbel-Sinkhorn Deep Learning Architecture
      4.2 Gumbel-Sinkhorn 3DNet
      4.3 Objective Function
      4.4 Distribution Reconstruction
    5 Experiment
      5.1 Dataset and Training Configuration
      5.2 Experiment Setup
      5.3 Quantitative Evaluation
      5.4 Qualitative Evaluation
    6 Discussion
      6.1 Impact of Distribution Modeling Error
      6.2 Impact of Block Size
      6.3 Uncertainty Quantification
    7 Conclusion
    8 Bibliography

