Graduate student: 葉宗融
Ye, Zong-Rong
Thesis title: 機器學習方法預測數千種有機螢光團的放射波長
Predicting the emission wavelength of thousands organic fluorophores by the machine learning approach
Advisor: 蔡明剛
Tsai, Ming-Kang
Degree: Master
Department: Department of Chemistry
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar)
Language: English
Number of pages: 34
Keywords (Chinese): 機器學習, 螢光分子, 隨機森林, 聚類分群法
Keywords (English): Machine learning, Fluorescent molecules, Random Forest, K-means clustering
DOI URL: http://doi.org/10.6345/NTNU201900295
Thesis type: Academic thesis
  • For over 70 years, fluorescent molecules have been used in a wide range of applications, such as fluorescent textiles, fluorescent inks, and fluorescent plastic products. Organic fluorescent pigments are also applied in fluorescent inspection, biological probes, and labeling.
    In this research, we imported more than ten thousand organic fluorescent molecules for analysis and used their molecular structure files to generate molecular descriptors. We also applied a clustering method to gain better insight into these diverse organic fluorescent molecules and to aid the modeling. We expect to build a broad and valid model for the selection and design of fluorescent molecules and to spur the development of fluorescent materials. After our informatics processing, some of the descriptors remaining in our model were originally created to describe ring and multiple-bond properties. The selected descriptors agree with our chemical intuition: the descriptors with the highest explanatory weight mostly describe conjugation, which is consistent with chemists' past experience of fluorophore structures.
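
    As an illustration of the clustering step described in the abstract, the following is a minimal sketch, not the thesis's actual code: K-means applied to toy two-dimensional "descriptor" vectors, with the silhouette score used to compare candidate cluster counts. The synthetic data, the deterministic farthest-point seeding, and all function names are assumptions made for this example; the thesis itself works with descriptors generated from molecular structure files.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def init_centroids(points, k):
    """Deterministic farthest-point seeding (an assumption, chosen for reproducibility)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point whose nearest existing centroid is farthest away.
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iteration; returns one cluster label per point."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (singleton clusters score 0)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette score needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # contributes 0 to the sum
        # a: mean distance to the other members of the same cluster.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy data: three well-separated groups in a hypothetical 2D descriptor space.
rng = random.Random(42)
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 10.0)]
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for cx, cy in centers for _ in range(20)]

# Compare candidate cluster counts by their silhouette score.
scores = {k: silhouette_score(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print("best k:", best_k, "silhouette:", round(scores[best_k], 3))
```

    With real molecular descriptors the columns would normally be scaled before clustering, since Euclidean distance is otherwise dominated by the descriptors with the largest numeric range.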

    Table of Contents
        Table of Contents
        Table of Figures
        Table of Tables
        Abstract
        Chinese Abstract (中文摘要)
        Introduction
        Methods
            Data
            Feature Selection and Modeling
                KNN based generalized simulated annealing
                Variance Threshold
                Lasso
                Kmeans clustering
                Silhouette score
                Principal component analysis
                Random Forest
                K-Folds Cross Validation
                Multiple Linear Regression
        Result and Discussion
            Generalized simulated annealing
            Variance threshold
            Kmeans clustering
            Lasso regression
            Model selection
            Descriptors in lassoRF model
        Reference


    The full text is not authorized for public access.