Model documentation & parameters
Algorithm Version: Which model version to use.
Property goals: One or multiple properties that will be optimized.
Protein target: An amino acid sequence (AAS) of a protein target used for conditioning. Leave blank unless you use affinity as a property goal.
Decoding temperature: The temperature parameter in the SMILES/SELFIES decoder. Higher values lead to more explorative choices; lower values concentrate sampling on the most likely tokens and eventually lead to mode collapse.
Maximal sequence length: The maximal number of SMILES tokens in the generated molecule.
Number of samples: How many samples should be generated (between 1 and 50).
Limit: Hypercube limits in the latent space.
Number of steps: Number of steps for a GP optimization round. More steps take longer. Must be at least Number of initial points.
Number of initial points: Number of initial points evaluated. More points take longer.
Number of optimization rounds: Maximum number of optimization rounds.
Sampling variance: Variance of the Gaussian noise applied during sampling from the optimal point.
Samples for evaluation: Number of samples averaged for each minimization function evaluation.
Max. sampling steps: Maximum number of sampling steps in an optimization round.
Seed: The random seed used for initialization.
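To illustrate the Decoding temperature parameter described above, here is a minimal sketch of temperature-scaled sampling in plain Python. It is not the PaccMannGP decoder: the function names and the four example scores are hypothetical, chosen only to show how dividing the scores by the temperature sharpens or flattens the token distribution.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw decoder scores into probabilities, scaled by a temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical scores for four candidate tokens (not taken from the model).
logits = [2.0, 1.0, 0.5, 0.1]

cold = softmax_with_temperature(logits, 0.2)  # sharply peaked on token 0
warm = softmax_with_temperature(logits, 2.0)  # much flatter distribution

# Sample repeatedly with a fixed seed to compare how often token 0 is picked.
rng = random.Random(42)
cold_draws = [sample_token(logits, 0.2, rng) for _ in range(1000)]
warm_draws = [sample_token(logits, 2.0, rng) for _ in range(1000)]

print("P(token 0) at T=0.2:", round(cold[0], 3))
print("P(token 0) at T=2.0:", round(warm[0], 3))
print("token 0 picked (cold, warm):", cold_draws.count(0), warm_draws.count(0))
```

At the low temperature almost every draw is the top-scoring token, which is the collapse onto a single mode mentioned above; at the high temperature the other tokens are chosen far more often.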
Model card -- PaccMannGP
Model Details: PaccMannGP is a language-based Variational Autoencoder coupled with a Gaussian Process for controlled sampling. This model systematically explores the latent space of a trained molecular VAE.
Developers: Jannis Born, Matteo Manica and colleagues from IBM Research.
Distributors: Original authors' code wrapped and distributed by GT4SD Team (2023) from IBM Research.
Model date: Published in 2022.
Model version: A molecular VAE trained on 1.5M molecules from ChEMBL.
Model type: A language-based molecular generative model that can be explored with Gaussian Processes to generate molecules with desired properties.
Information about training algorithms, parameters, fairness constraints or other applied approaches, and features: Described in the original paper.
Paper or other resource for more information: Active Site Sequence Representations of Human Kinases Outperform Full Sequence Representations for Affinity Prediction and Inhibitor Generation: 3D Effects in a 1D Model (2022; Journal of Chemical Information & Modeling).
License: MIT
Where to send questions or comments about the model: Open an issue on the GT4SD repository.
Intended Use. Use cases that were envisioned during development: Chemical research, in particular drug discovery.
Primary intended uses/users: Researchers and computational chemists using the model for model comparison or research exploration purposes.
Out-of-scope use cases: Production-level inference, producing molecules with harmful properties.
Factors: Not applicable.
Metrics: High reward on generating molecules with desired properties.
Datasets: ChEMBL.
Ethical Considerations: Unclear, please consult with original authors in case of questions.
Caveats and Recommendations: Unclear, please consult with original authors in case of questions.
Model card prototype inspired by Mitchell et al. (2019)
Citation
@article{born2022active,
author = {Born, Jannis and Huynh, Tien and Stroobants, Astrid and Cornell, Wendy D. and Manica, Matteo},
title = {Active Site Sequence Representations of Human Kinases Outperform Full Sequence Representations for Affinity Prediction and Inhibitor Generation: 3D Effects in a 1D Model},
journal = {Journal of Chemical Information and Modeling},
volume = {62},
number = {2},
pages = {240-257},
year = {2022},
doi = {10.1021/acs.jcim.1c00889},
note ={PMID: 34905358},
URL = {https://doi.org/10.1021/acs.jcim.1c00889}
}