Graduate student: | 洪聖軒 Hung, Sheng-Hsuan |
---|---|
Thesis title: | 深度學習模型Schnet的分析、簡化與改進方法探討 Exploring architecture of schnet: simplification of input and functional expansion |
Advisor: | 蔡明剛 Tsai, Ming-Kang |
Oral defense committee: | 葉丞豪 Yeh, Chen-Hao; 張鈞智 Chang, Chun-Chih; 陳柏琳 Chen, Ber-lin; 孫英傑 Sun, Ying-Chieh; 蔡明剛 Tsai, Ming-Kang |
Oral defense date: | 2022/06/16 |
Degree: | Master |
Department: | Department of Chemistry |
Publication year: | 2022 |
Academic year of graduation: | 110 |
Language: | Chinese |
Number of pages: | 69 |
Keywords (Chinese): | 化學資訊學、機器學習、深度學習、神經網路 |
Keywords (English): | Cheminformatics, Machine learning, Deep learning, Neural network |
Research method: | Big data analysis |
DOI URL: | http://doi.org/10.6345/NTNU202200669 |
Thesis type: | Academic thesis |
In applications of deep learning to chemistry, predicting specific molecular properties with high accuracy and low computational cost has long been an active research direction. SchNet, the model used in this study, is a relatively mature deep learning model for this task: on the QM9 dataset it predicts molecular properties, including the HOMO energy (EH), the LUMO energy (EL), and the gap between them (EG), with mean absolute errors (MAEs) close to or below 1 kcal/mol, at a computational cost far lower than that of conventional DFT calculations. Motivated by this strong predictive performance, this work analyzes the main components of the SchNet architecture and finds data correlations that resemble the inductive effect. Next, bond-step information is generated from the molecular SMILES strings in QM9 and used to replace part of the SchNet architecture, which simplifies the input; the resulting MAEs are about 1 to 2 times those of the original SchNet. The simplified architecture is then tested on molecular data from Deep4Chem for predicting fluorescence emission wavelengths, absorption wavelengths, and quantum yields. Finally, an additional embedding layer that labels differences in molecular environment is added, so that environment information is fed into the model to improve the predictions.
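The bond-step idea can be illustrated with a minimal sketch. It assumes that "bond steps" means the number of bonds on the shortest path between two atoms (the topological distance); the abstract does not define the term, so the thesis implementation may differ. The bond list for the heavy atoms of ethanol (SMILES `CCO`) is written by hand rather than parsed from the SMILES string, and the function name `bond_step_matrix` is hypothetical.

```python
from collections import deque


def bond_step_matrix(n_atoms, bonds):
    """Return an n_atoms x n_atoms matrix of shortest-path bond counts.

    Pairs with no connecting path (disconnected fragments) are marked -1.
    This is an illustrative helper, not code from the thesis.
    """
    # Build an adjacency list from the undirected bond list.
    neighbors = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    matrix = [[-1] * n_atoms for _ in range(n_atoms)]
    for start in range(n_atoms):
        # Breadth-first search from each atom gives bond counts to all others.
        matrix[start][start] = 0
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for nxt in neighbors[atom]:
                if matrix[start][nxt] == -1:
                    matrix[start][nxt] = matrix[start][atom] + 1
                    queue.append(nxt)
    return matrix


# Heavy atoms of ethanol, SMILES "CCO": C0-C1-O2 (hand-written, illustrative).
ethanol_bonds = [(0, 1), (1, 2)]
steps = bond_step_matrix(3, ethanol_bonds)
```

A matrix like this can be built from connectivity alone, without 3D coordinates, which is one possible reading of the input simplification described in the abstract.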
In the subsequent modifications of the SchNet model, the MAEs for predicting the absorption and emission wavelengths of the fluorescent molecules in the Deep4Chem dataset reached 0.131 eV and 0.087 eV, respectively; adding an embedding layer to SchNet reduced these MAEs to 0.083 eV and 0.082 eV. For quantum yield prediction, the two models gave MAEs of 0.336 and 0.292, respectively.
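The wavelength errors are reported in eV, not nm. As a minimal sketch of how such a figure can be computed, the block below converts wavelengths to photon energies with E = hc/λ (hc ≈ 1239.842 eV·nm) and takes the mean absolute error. The wavelength values are made-up illustrative numbers, not data from Deep4Chem, and the helper names are hypothetical; the abstract does not state at which stage the thesis converted units.

```python
HC_EV_NM = 1239.842  # Planck constant times speed of light, in eV*nm


def wavelength_nm_to_ev(wavelength_nm):
    """Convert a wavelength in nm to a photon energy in eV via E = hc / lambda."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    return HC_EV_NM / wavelength_nm


def mean_absolute_error(predicted, reference):
    """Average of |predicted - reference| over paired values."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - r) for p, r in zip(predicted, reference))
    return total / len(predicted)


# Illustrative wavelengths in nm (NOT taken from the Deep4Chem dataset).
reference_nm = [400.0, 500.0, 620.0]
predicted_nm = [410.0, 490.0, 620.0]

mae_ev = mean_absolute_error(
    [wavelength_nm_to_ev(w) for w in predicted_nm],
    [wavelength_nm_to_ev(w) for w in reference_nm],
)
```

Because energy is inversely proportional to wavelength, the same 10 nm error corresponds to a larger energy error at short wavelengths than at long ones, so an MAE in eV and an MAE in nm are not interchangeable.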