Model documentation & parameters

Algorithm Version: Which model version to use.

Property goals: One or multiple properties that will be optimized.

Protein target: The amino acid sequence (AAS) of a protein target used for conditioning. Leave blank unless you use affinity as a property goal.

Decoding temperature: The temperature parameter of the SMILES/SELFIES decoder. Higher values lead to more exploratory choices, whereas lower values tend toward mode collapse.

Maximal sequence length: The maximal number of SMILES tokens in the generated molecule.

Number of samples: How many samples should be generated (between 1 and 50).

Limit: The limits of the hypercube in the latent space within which the optimization searches.

Number of steps: Number of steps in a GP optimization round; more steps make generation slower. Must be at least the number of initial points.

Number of initial points: Number of initial points evaluated; more points make generation slower.

Number of optimization rounds: Maximum number of optimization rounds.

Sampling variance: Variance of the Gaussian noise applied during sampling from the optimal point.

Samples for evaluation: Number of samples averaged for each minimization function evaluation.

Max. sampling steps: Maximum number of sampling steps in an optimization round.

Seed: The random seed used for initialization.
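The decoding-temperature behaviour described above can be illustrated with generic temperature-scaled softmax sampling. This is a minimal sketch, not the PaccMannGP decoder: the toy four-token logits and the function names are hypothetical. Dividing the logits by the temperature T before the softmax makes the distribution sharper for T < 1 (the most likely token is picked almost every time, which is how mode collapse arises) and flatter for T > 1 (more exploratory sampling).

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a flatter, more exploratory distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical decoder logits over a toy 4-token vocabulary.
    logits = [2.0, 1.0, 0.5, -1.0]
    rng = random.Random(42)
    for t in (0.1, 1.0, 2.0):
        probs = softmax_with_temperature(logits, t)
        counts = [0] * len(logits)
        for _ in range(1000):
            counts[sample_token(logits, t, rng)] += 1
        rounded = [round(p, 3) for p in probs]
        print(f"T={t}: probs={rounded}, entropy={entropy(probs):.3f}, counts={counts}")
```

Running the script prints, for three temperatures, the token probabilities, their entropy and empirical counts over 1000 draws; the entropy grows with T.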

Model card -- PaccMannGP

Model Details: PaccMannGP is a language-based Variational Autoencoder (VAE) coupled with a Gaussian Process (GP) for controlled sampling. This model systematically explores the latent space of a trained molecular VAE.
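The sampling-related parameters (Limit, Sampling variance, Samples for evaluation) act on this latent space. The following is a minimal, hypothetical sketch of those three ideas in plain Python: drawing a point from the hypercube, perturbing an optimal point with Gaussian noise of a given variance, and averaging several stochastic evaluations of one point. It omits the Gaussian Process search itself, and the function names, the latent dimension of 8, and the toy score are assumptions, not the GT4SD or PaccMannGP API.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def random_point_in_hypercube(dim: int, limit: float, rng: random.Random) -> Vector:
    """Uniformly draw a latent point from [-limit, limit]^dim (e.g. an initial point)."""
    return [rng.uniform(-limit, limit) for _ in range(dim)]


def sample_around_optimum(
    z_opt: Sequence[float], variance: float, n: int, rng: random.Random
) -> List[Vector]:
    """Perturb the optimal latent point with isotropic Gaussian noise.

    random.gauss expects a standard deviation, so the variance is square-rooted.
    """
    std = math.sqrt(variance)
    return [[zi + rng.gauss(0.0, std) for zi in z_opt] for _ in range(n)]


def averaged_objective(
    z: Sequence[float],
    noisy_score: Callable[[Sequence[float], random.Random], float],
    n_samples: int,
    rng: random.Random,
) -> float:
    """Average several stochastic evaluations (e.g. repeated decodings) of one latent point."""
    return sum(noisy_score(z, rng) for _ in range(n_samples)) / n_samples


def toy_noisy_score(z: Sequence[float], rng: random.Random) -> float:
    """Hypothetical stand-in for 'decode z and score the molecule'; lower is better."""
    return sum(zi ** 2 for zi in z) + rng.gauss(0.0, 0.1)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical values for the Limit, Sampling variance and Samples for evaluation fields.
    limit, variance, samples_for_evaluation = 5.0, 0.1, 4
    z_initial = random_point_in_hypercube(dim=8, limit=limit, rng=rng)
    print("initial objective:", averaged_objective(z_initial, toy_noisy_score, samples_for_evaluation, rng))
    # Pretend the GP search returned the origin as the optimal latent point.
    z_opt = [0.0] * 8
    for z in sample_around_optimum(z_opt, variance, n=3, rng=rng):
        print([round(zi, 2) for zi in z])
```

Averaging over several evaluations smooths out the randomness of stochastic decoding, at the cost of proportionally more evaluations per optimization step.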

Developers: Jannis Born, Matteo Manica and colleagues from IBM Research.

Distributors: Original authors' code wrapped and distributed by the GT4SD Team (2023) from IBM Research.

Model date: Published in 2022.

Model version: A molecular VAE trained on 1.5M molecules from ChEMBL.

Model type: A language-based molecular generative model that can be explored with Gaussian Processes to generate molecules with desired properties.

Information about training algorithms, parameters, fairness constraints or other applied approaches, and features: Described in the original paper.

Paper or other resource for more information: Active Site Sequence Representations of Human Kinases Outperform Full Sequence Representations for Affinity Prediction and Inhibitor Generation: 3D Effects in a 1D Model (2022; Journal of Chemical Information & Modeling).

License: MIT

Where to send questions or comments about the model: Open an issue on the GT4SD repository.

Intended Use. Use cases that were envisioned during development: Chemical research, in particular drug discovery.

Primary intended uses/users: Researchers and computational chemists using the model for model comparison or research exploration purposes.

Out-of-scope use cases: Production-level inference, producing molecules with harmful properties.

Factors: Not applicable.

Metrics: High reward for generating molecules with the desired properties.

Datasets: ChEMBL.

Ethical Considerations: Unclear, please consult with original authors in case of questions.

Caveats and Recommendations: Unclear, please consult with original authors in case of questions.

Model card prototype inspired by Mitchell et al. (2019)

Citation

@article{born2022active,
    author = {Born, Jannis and Huynh, Tien and Stroobants, Astrid and Cornell, Wendy D. and Manica, Matteo},
    title = {Active Site Sequence Representations of Human Kinases Outperform Full Sequence Representations for Affinity Prediction and Inhibitor Generation: 3D Effects in a 1D Model},
    journal = {Journal of Chemical Information and Modeling},
    volume = {62},
    number = {2},
    pages = {240--257},
    year = {2022},
    doi = {10.1021/acs.jcim.1c00889},
    note = {PMID: 34905358},
    URL = {https://doi.org/10.1021/acs.jcim.1c00889}
}