---
library_name: tf-keras
---
## Model description
This repo contains the model and the notebook for [Drug Molecule Generation with VAE](https://keras.io/examples/generative/molecule_generation/).
Full credits go to [Victor Basu](https://www.linkedin.com/in/victor-basu-520958147/)
Reproduced by [Vu Minh Chien](https://www.linkedin.com/in/vumichien/)
Motivation: Using a Variational Autoencoder to generate molecules for drug discovery. Automatic chemical design using a data-driven continuous representation of molecules generates new molecules via efficient exploration of open-ended spaces of chemical compounds. The model consists of three components: Encoder, Decoder, and Predictor. The Encoder converts the discrete representation of a molecule into a real-valued continuous vector, and the Decoder converts these continuous vectors back to discrete molecule representations. The Predictor estimates chemical properties from the latent continuous vector representation of the molecule. Continuous representations allow the use of gradient-based optimization to efficiently guide the search for optimized functional compounds.
![intro](https://bit.ly/3CtPMzM)
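The VAE pieces described above can be sketched in plain Python. This is a minimal, framework-free illustration that assumes the standard Gaussian VAE formulation, where the encoder outputs a `z_mean` and a `z_log_var` per latent dimension. It is not the notebook's Keras code, and `sample_latent`, `kl_to_standard_normal`, and `optimize_latent` are hypothetical helper names. The sketch covers the reparameterization trick used to sample a latent vector, the KL term that pulls the latent distribution toward a standard normal, and plain gradient ascent on a property predictor in latent space.

```python
import math
import random


def sample_latent(z_mean, z_log_var, rng=random):
    """Reparameterization trick: z = mean + exp(0.5 * log_var) * eps, with eps ~ N(0, 1).

    All randomness lives in eps, so z stays differentiable with respect to
    z_mean and z_log_var when this runs inside an autodiff framework.
    """
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(z_mean, z_log_var)
    ]


def kl_to_standard_normal(z_mean, z_log_var):
    """KL(N(mean, exp(log_var)) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(z_mean, z_log_var)
    )


def optimize_latent(z, grad_fn, step=0.1, n_steps=100):
    """Gradient ascent on a property predictor in latent space.

    grad_fn(z) is a hypothetical stand-in for the gradient of the predicted
    property with respect to z; a real model would obtain it via autodiff.
    """
    z = list(z)
    for _ in range(n_steps):
        grad = grad_fn(z)
        z = [zi + step * gi for zi, gi in zip(z, grad)]
    return z
```

For example, with a toy predictor whose score peaks at `z = 2`, `optimize_latent([0.0], lambda z: [-2.0 * (z[0] - 2.0)], step=0.1, n_steps=200)` converges close to `[2.0]`. In the real model, the optimized latent vector would then be passed to the Decoder to obtain a candidate molecule.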
## Intended uses & limitations
In this example, RDKit is used to conveniently and efficiently transform SMILES into molecule objects, and then to obtain sets of atoms and bonds from those objects. SMILES expresses the structure of a given molecule as an ASCII string. The SMILES string is a compact encoding that, for smaller molecules, is relatively human-readable. Encoding molecules as strings also makes it easier to search databases and the web for a given molecule. RDKit uses algorithms to accurately transform a given SMILES string into a molecule object, which can then be used to compute a large number of molecular properties and features.
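Once RDKit has produced atoms and bonds, they still need to be turned into arrays the encoder can consume. The sketch below assumes a graph encoding: a one-hot atom feature matrix (with an extra "no atom" padding channel) plus a one-hot adjacency tensor with one channel per bond type and an extra "no bond" channel. This is one common choice and may not match the notebook exactly. `ATOM_MAPPING`, `BOND_MAPPING`, `MAX_ATOMS`, and `atoms_bonds_to_graph` are hypothetical, and the code uses nested lists so it runs without NumPy or RDKit.

```python
# Hypothetical vocabularies; a real run would build these from the dataset.
ATOM_MAPPING = {"C": 0, "N": 1, "O": 2, "F": 3}
BOND_MAPPING = {"SINGLE": 0, "DOUBLE": 1, "TRIPLE": 2, "AROMATIC": 3}
MAX_ATOMS = 9  # hypothetical cap on heavy atoms per molecule


def atoms_bonds_to_graph(atoms, bonds, max_atoms=MAX_ATOMS):
    """Encode a molecule as (adjacency, features) nested lists.

    atoms: element symbols, e.g. from RDKit's atom.GetSymbol().
    bonds: (begin_idx, end_idx, bond_type_name) triples, e.g. from
        bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(), bond.GetBondType().name.
    """
    bond_dim = len(BOND_MAPPING) + 1  # last channel means "no bond"
    atom_dim = len(ATOM_MAPPING) + 1  # last channel means "no atom" (padding)
    if len(atoms) > max_atoms:
        raise ValueError(f"molecule has {len(atoms)} atoms, max is {max_atoms}")

    adjacency = [
        [[0.0] * max_atoms for _ in range(max_atoms)] for _ in range(bond_dim)
    ]
    features = [[0.0] * atom_dim for _ in range(max_atoms)]

    # One-hot atom types; unused rows are marked as padding.
    for i, symbol in enumerate(atoms):
        features[i][ATOM_MAPPING[symbol]] = 1.0
    for i in range(len(atoms), max_atoms):
        features[i][-1] = 1.0

    # Bonds are undirected, so set both (i, j) and (j, i).
    for begin, end, bond_type in bonds:
        channel = BOND_MAPPING[bond_type]
        adjacency[channel][begin][end] = 1.0
        adjacency[channel][end][begin] = 1.0

    # Any pair without a bond in a real bond channel goes to "no bond".
    for i in range(max_atoms):
        for j in range(max_atoms):
            if not any(adjacency[c][i][j] for c in range(bond_dim - 1)):
                adjacency[-1][i][j] = 1.0
    return adjacency, features
```

With RDKit installed, the inputs for ethanol could come from `mol = Chem.MolFromSmiles("CCO")`, reading `atom.GetSymbol()` for each atom in `mol.GetAtoms()` and `(bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(), bond.GetBondType().name)` for each bond in `mol.GetBonds()`. That yields the atoms `["C", "C", "O"]` and two `"SINGLE"` bonds, which `atoms_bonds_to_graph` turns into a 5×9×9 adjacency tensor and a 9×5 feature matrix.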
## Training and evaluation data
The dataset used in this tutorial is ZINC (A Free Database of Commercially Available Compounds for Virtual Screening). It provides molecules in SMILES representation along with their molecular properties, such as logP (water–octanol partition coefficient), SAS (synthetic accessibility score), and QED (Quantitative Estimate of Drug-likeness).
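The three properties live on different scales (QED lies in [0, 1], SAS runs from 1 to 10, and logP is unbounded in principle), so a predictor trained on all of them may benefit from standardized targets. The sketch below is one possible loader, assuming the data arrives as CSV text with columns named `smiles`, `logP`, `SAS`, and `qed`. The column names, `load_properties`, and the z-score normalization are assumptions, not necessarily what the notebook does.

```python
import csv
import io
import statistics

PROPERTY_COLUMNS = ("logP", "SAS", "qed")  # assumed column names


def load_properties(csv_text, property_columns=PROPERTY_COLUMNS):
    """Parse SMILES and properties, then z-score each property column.

    Returns (smiles, normalized_rows, means, stdevs).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    smiles, raw = [], []
    for row in reader:
        smiles.append(row["smiles"].strip())  # drop stray whitespace/newlines
        raw.append([float(row[col]) for col in property_columns])

    columns = list(zip(*raw))
    means = [statistics.fmean(col) for col in columns]
    # Guard against constant columns, whose population stdev is 0.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    normalized = [
        [(value - mean) / std for value, mean, std in zip(row, means, stdevs)]
        for row in raw
    ]
    return smiles, normalized, means, stdevs
```

The returned `means` and `stdevs` are what you would need to map the predictor's outputs back to the original property units.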
## Model Plot
<details>
<summary>View Model Plot</summary>
![Model Image](./model.png)
</details>
## Output samples
Latent space samples
![Latent spaces](./latent_space_clusters.png)
<details>
<summary>View samples</summary>
![Samples](./samples.png)
</details>