Recommended Readings
NOTE THAT THE LIST OF READINGS PROVIDED IS NOT COMPREHENSIVE, AND THE ORDER DOES NOT INDICATE IMPORTANCE. THE CONTENT WILL BE UPDATED OCCASIONALLY.
High-Resolution Mass Spectrometry and Molecular Characterization
Identifying Small Molecules via High Resolution Mass Spectrometry: Communicating Confidence by Schymanski et al. 2014. Link
Soil Organic Matter Characterization by Fourier Transform Ion Cyclotron Resonance Mass Spectrometry (FTICR MS): A Critical Review of Sample Preparation, Analysis, and Data Interpretation by Bahureksa et al. 2021. Link
Tracking complex mixtures of chemicals in our changing environment by Escher et al. 2020. Link
Mass Spectrometry: A Textbook by Jürgen H. Gross
Critical Assessment of the Chemical Space Covered by LC−HRMS Non-Targeted Analysis by Hulleman et al. 2023. Link
BIODEGRADABLE PLASTICS. LINK
GOOD PAPER FOR MZMINE AND NON-TARGET SCREENING: (Mass spectrometry data processing in MZmine 3: feature detection and annotation) LINK
https://rdrr.io/github/robertyoung3/MSanalyzeNOM/f/ (MSanalyzeNOM package documentation)
Installation: https://rdrr.io/github/robertyoung3/MSanalyzeNOM/
METABOLOMICS AND CHEMOINFORMATICS FOR MASS SPEC
New software tools, databases, and resources in metabolomics: updates from 2020 by Biswapriya B. Misra. LINK
Advances in decomposing complex metabolite mixtures using substructure- and network-based computational metabolomics approaches by Beniddir et al 2020. LINK
Combining HRMS and ML tools
https://kruvelab.com/blog/ (KRUVE LAB)
https://nationalmaglab.org/user-facilities/icr/instruments/ (NATIONAL MAGNETIC LABORATORY)
Graph Neural Networks
A Comprehensive Survey on Graph Neural Networks by Wu et al. 2021. Link
A Survey on Graph Neural Networks for Graph Summarization by Nasrin et al. 2023. Link
A compact review of molecular property prediction with graph neural networks by Wieder et al. 2020. Link
Graph neural networks: A review of methods and applications by Zhou et al. 2020. Link
Computing Graph Neural Networks: A Survey from Algorithms to Accelerators by Abadal et al. 2021. Link
Graph Neural Networks in TensorFlow and Keras with Spektral by Grattarola and Alippi 2021. Link
Graph neural networks for materials science and chemistry by Reiser et al. 2022. Link
Deep learning methods for molecular representation and property prediction by Li et al. 2022. Link
Knowledge-Embedded Message-Passing Neural Networks: Improving Molecular Property Prediction with Human Knowledge by Hasebe 2021. Link
Convolutional Networks on Graphs for Learning Molecular Fingerprints by Duvenaud et al., 2015. Link
Pushing the Boundaries of Molecular Representation for Drug Discovery with the Graph Attention Mechanism by Xiong et al. 2019. Link
Graph attention networks by Veličković et al. 2018. Link
Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks by Segler et al. 2017. Link
Machine Learning for Chemoinformatics
Deep learning in chemistry by Mater and Coote, 2019. Link
Predictive chemistry: machine learning for reaction deployment, reaction development, and reaction discovery by Tu et al. 2023. Link
Deep learning for computational chemistry by Goh et al. 2016. Link
Machine Learning Modeling of Environmentally Relevant Chemical Reactions for Organic Compounds by Zhang and Zhang, 2022. Link
Transfer Learning for Drug Discovery by Cai et al. 2020. Link
Putting Chemical Knowledge to Work in Machine Learning for Reactivity by Jorner 2023. Link
Open-Source Machine Learning in Computational Chemistry by Hagg and Kirschner 2023. Link
DeepChem tutorials: deepchem/examples/tutorials (master branch of the deepchem/deepchem GitHub repository)
https://biopython.org/ (BIOPYTHON TUTORIAL; THIS IS FOR BIOINFORMATICS)
https://docs.scvi-tools.org/en/stable/tutorials/notebooks/api_overview.html (scvi-tools API overview; FOR BIOINFORMATICS)
https://www.cheminformania.com/ (nice site)
https://figshare.com/projects/NCCT_Chemistry_Dashboard_Data/32198 (EPA'S DATA)
https://epa.figshare.com/search?q=:keyword:%20%22PHYSPROP%22 (EPA PHYSPROP data on figshare)
https://github.com/akensert/molgraph (MolGraph)
https://deepchem.io/models/ (MODELS IN DEEPCHEM)
https://github.com/PatWalters/practical_cheminformatics_tutorials
https://github.com/Aouidate/Chemoinformatics-compiliation?tab=readme-ov-file (New)
Molecular Representation for Chemoinformatics
Molecular representations in AI-driven drug discovery: a review and practical guide by David et al. 2020. Link
Learning Molecular representations for medicinal chemistry by Chuang et al. 2020. Link
Geometry-enhanced molecular representation learning for property prediction by Fang et al. 2022. Link
A review of molecular representation in the age of machine learning by Wigh et al. 2022. Link
Self-referencing embedded strings (SELFIES): A 100% robust molecular string representation by Krenn et al. 2020. Link
Combining Machine Learning and Computational Chemistry for Predictive Insights Into Chemical Systems by Keith et al. 2021. Link
Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models by Jiang et al. 2021. Link
Canonicalizing BigSMILES for Polymers with Defined Backbones by Lin et al. 2022. Link
Analyzing learned molecular representation for property prediction by Yang et al. 2019. Link
Importance of Engineered and Learned Molecular Representations in Predicting Organic Reactivity, Selectivity, and Chemical Properties by Gallegos et al. 2021. Link
Generative models: Transformers, Variational Autoencoders (VAEs), and Generative Adversarial Networks (GANs)
Attention Is All You Need by Vaswani et al. 2017. Link
Transformers: State-of-the-Art Natural Language Processing by Wolf et al. 2020. Link
https://github.com/huggingface/transformers (Hugging Face library for transformer models)
Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules by Gómez-Bombarelli et al. 2018. Link
Generative models for molecular discovery: Recent advances and challenges by Bilodeau et al. 2022. Link
Generative Models for De Novo Drug Design by Tong et al. 2021. Link
The Illustrated Transformer by Jay Alammar. Link
Molecule Attention Transformer by Maziarka et al. 2021. Link
A survey of transformers by Lin et al. 2022. Link
Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction by Schwaller et al. 2019. Link
BERT Learns (and Teaches) Chemistry by Payne et al. Link
Transformers for molecular property prediction: Lessons learned from the past five years by Sultan et al. 2024. Link
A Review of Large Language Models and Autonomous Agents in Chemistry by Ramos et al. 2024. Link
Code
https://github.com/AspirinCode/papers-for-molecular-design-using-DL
http://kundajelab.github.io/dragonn/tutorials.html
https://zitniklab.hms.harvard.edu/software/
https://keras.io/examples/ (Keras collection of tutorials and code examples)
https://pytorch.org/ecosystem/ (PyTorch ecosystem for drug discovery and molecular chemistry; focus on TorchDrug and DGL)
https://torchdrug.ai/docs/tutorials/ (TorchDrug tutorials)
Transfer Learning (an important technique in research areas with limited experimental datasets and computational resources)
A survey of transfer learning by Weiss et al. 2016. Link
A Comprehensive Survey on Transfer Learning by Zhuang et al. 2021. Link
Transfer and Multi-task Learning in QSAR Modeling: Advances and Challenges by Simões et al. 2018. Link
Using Rule-Based Labels for Weak Supervised Learning by Goh et al. 2018. Link
Inductive transfer learning for molecular activity prediction: Next-Gen QSAR Models with MolPMoFiT by Li and Fourches, 2020. Link
Chemformer: a pre-trained transformer for computational chemistry by Irwin et al. 2022. Link
CRNNTL: convolutional recurrent neural network and transfer learning for QSAR modelling by Li and Yu, 2022. Link
Machine Learning Methods for Small Data Challenges in Molecular Science. Link
Common Benchmarking and "Potential Transfer Learning" Datasets for Chemoinformatics
https://arxiv.org/pdf/2102.09548.pdf (DRUG DISCOVERY DATA)
Accessing some of these websites online
https://pubchem.ncbi.nlm.nih.gov/docs/pug-rest-tutorial (tutorial for accessing PubChem through PUG-REST)
https://github.com/kjappelbaum/awesome-chemistry-datasets (GOOD COLLECTIONS OF DATASETS FOR BENCHMARKING ML MODELS)
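As a minimal sketch of what the PUG-REST tutorial covers, the Python function below assembles a request URL from the service's input/operation/output pattern: look up a compound by name, request a few computed properties, and ask for JSON output. The function name pubchem_property_url is hypothetical, and the sketch only builds the URL; sending the request needs network access and is shown commented out.

```python
from typing import Sequence
from urllib.parse import quote

# Base URL of PubChem's PUG-REST service
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_property_url(name: str, properties: Sequence[str], fmt: str = "JSON") -> str:
    """Build a PUG-REST URL that looks up compound properties by name (hypothetical helper).

    Pattern: <base>/compound/name/<name>/property/<comma-separated list>/<format>
    """
    if not properties:
        raise ValueError("at least one property is required")
    # Percent-encode the name so spaces and special characters are URL-safe
    input_spec = f"compound/name/{quote(name, safe='')}"
    # Several properties can be requested at once as a comma-separated list
    operation_spec = "property/" + ",".join(properties)
    return f"{PUG_REST_BASE}/{input_spec}/{operation_spec}/{fmt}"


if __name__ == "__main__":
    url = pubchem_property_url("aspirin", ["MolecularFormula", "MolecularWeight"])
    print(url)
    # To actually query the service (requires network access):
    # import json, urllib.request
    # with urllib.request.urlopen(url) as resp:
    #     data = json.load(resp)
    # print(data["PropertyTable"]["Properties"][0])
```

Keeping URL construction separate from the network call makes it easy to check the request before sending it, and to batch many compound names in a loop while respecting PubChem's usage limits.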
Network Analysis and Automated Mechanism and Kinetics Generation
Exploration of Reaction Pathways and Chemical Transformation Networks by Simm et al. 2019. Link
Reproducible molecular networking of untargeted mass spectrometry data using GNPS by Aron et al. 2020. Link
Reaction Mechanism Generator: Automatic construction of chemical kinetic mechanisms (from William Green's group at MIT). Link 1, Link 2
Description of Dissolved Organic Matter Transformational Networks at the Molecular Level by Leyva et al. 2023. Link
Network Analysis for Prioritizing Biodegradation Metabolites of Polycyclic Aromatic Hydrocarbons by Sleight et al. 2020. Link
Feature-based molecular networking in the GNPS analysis environment by Nothias et al. 2020. Link
Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking by Wang et al. 2016. Link
Image analysis for ML
Very Deep Convolutional Networks for Large-Scale Image Recognition by Simonyan and Zisserman. Link
Going Deeper with Convolutions by Szegedy et al. 2015. Link
ImageNet Classification with Deep Convolutional Neural Networks by Krizhevsky et al. Link
Deep Residual Learning for Image Recognition by He et al. 2016. Link
Machine Learning for Proteins
Collection of papers and tools. https://github.com/yangkky/Machine-learning-for-proteins
Sites
GOOD BLOG FOR PROTEIN RELATED WORK: https://portal.valencelabs.com/blogs/post/adventures-of-pop---the-undruggable-protein-iJygl6lv48aIeWX
A guide to machine learning
Image source: https://vas3k.ru/blog/machine_learning/
https://chemintelligence.com/ai-for-chemistry
https://mit6874.github.io/ (Dr. Gifford class outline)
https://www.slideegg.com/puzzle-slide-template-3
https://capd.mit.edu/resources/academic-job-sites/
https://www.norman-network.com/nds/SLE/ (Data available)
https://comptox.epa.gov/dashboard/chemical-lists (Data available)
https://zinc.docking.org/substances/home/ (Data available)
https://en.wikipedia.org/wiki/List_of_chemical_databases (List of databases; very good)
http://www.t3db.ca/ (Toxins)
http://sitem.herts.ac.uk/aeru/iupac/purchase_database.htm (pesticide properties for purchase)
https://www-library.ch.cam.ac.uk/list-useful-databases (Database)
https://www.cambridgemedchemconsulting.com/resources/miscellaneous/databases.html
https://guides.lib.ua.edu/c.php?g=39819&p=4956716
https://fordham.libguides.com/Chemistry/Databases
https://depth-first.com/articles/2011/10/12/sixty-four-free-chemistry-databases/ (good site)
https://www.chemistryviews.org/details/education/10015921/Chemistry_Databases/
https://www.reaxys.com/#/search/advanced (good site)
https://www.cas.org/support/documentation/cas-databases (CAS DATABASES)
https://www.nist.gov/pml/productsservices/physical-reference-data
https://ncifrederick.cancer.gov/scientificlibrary/electronicresources/databases.aspx
https://medium.com/mlearning-ai/what-are-cheminformatics-resources-67783cc788f6 (List of resources)
https://www.echemportal.org/echemportal/content/participants (OECD data sources)
https://qsartoolbox.org/resources/databases/
https://cdxapps.epa.gov/oms-substance-registry-services/search
PlasticDB PAPER: PlasticDB: a database of microorganisms and proteins linked to plastic biodegradation by Gambarini et al. 2022. Link
THE DATABASE: https://plasticdb.org/downloaddata
Sources of images
https://unsplash.com/?utm_source=medium&utm_medium=referral