Update README.md
README.md
CHANGED
@@ -7,4 +7,25 @@ tags:
|
|
7 |
- chemistry
|
8 |
- biology
|
9 |
---
|
10 |
-
10 |
+
Chemlactica-125M is a galactica-125m model continually pretrained on data about organic molecules. The pretraining corpus (to be released soon) contains 40B tokens covering
|
11 |
+
110M+ molecules from PubChem, along with their chemical properties (molecular weight, synthetic accessibility score, drug-likeness, etc.)
|
12 |
+
and similarities (Tanimoto similarity between ECFP fingerprints).
|
13 |
+
|
14 |
+
Example prompts:
|
15 |
+
|
16 |
+
`[START_SMILES]CC(=O)OC1=CC=CC=C1C(=O)O[END_SMILES][SAS]` will attempt to predict the synthetic accessibility score (SAS) of the given molecule.
|
17 |
+
|
18 |
+
`[SAS]2.25[/SAS][SIMILAR]0.62 CC(=O)OC1=CC=CC=C1C(=O)O[/SIMILAR][START_SMILES]` will attempt to generate a molecule that has an SAS score of 2.25 and
|
19 |
+
has a 0.62 similarity score to the given molecule.
|
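The two prompt shapes above can be assembled programmatically. The following is a minimal sketch, not an official API: the function names `property_prompt` and `generation_prompt` are hypothetical, and the only things taken from this README are the tag layout of the two example prompts and the two-decimal number format.

```python
def property_prompt(smiles: str, tag: str) -> str:
    """Build a prompt that asks the model to predict the property `tag`."""
    # Layout copied from the first example prompt in this README.
    return f"[START_SMILES]{smiles}[END_SMILES][{tag}]"


def generation_prompt(tag: str, value: float, similar_to: str, similarity: float) -> str:
    """Build a prompt that asks for a molecule with a target property value
    and a target similarity to a reference molecule."""
    # Numbers are written with two decimals, matching the README's examples.
    return (
        f"[{tag}]{value:.2f}[/{tag}]"
        f"[SIMILAR]{similarity:.2f} {similar_to}[/SIMILAR]"
        "[START_SMILES]"
    )


reference = "CC(=O)OC1=CC=CC=C1C(=O)O"  # the molecule used in the examples above
print(property_prompt(reference, "SAS"))
print(generation_prompt("SAS", 2.25, reference, 0.62))
```

Both printed strings reproduce the example prompts character for character. Whether other tags follow exactly the same layout is an assumption to check against the model's outputs.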
20 |
+
|
21 |
+
The model can be wrapped into an optimization loop to traverse the chemical space with evolving prompts.
|
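To make the idea of evolving prompts concrete, here is a toy sketch of such a loop. It is not the optimization algorithm from the upcoming preprint: the `optimize` function, its parameters, and the pool-update rule are all illustrative assumptions, and `generate` and `score` are hypothetical callables that a user would supply (for example, a wrapper around the model and a property oracle).

```python
import random
from typing import Callable, List, Optional, Tuple


def optimize(
    generate: Callable[[str], str],  # hypothetical: prompt -> generated SMILES
    score: Callable[[str], float],   # hypothetical: SMILES -> objective (higher is better)
    seed_smiles: str,
    rounds: int = 10,
    pool_size: int = 5,
    target_sas: float = 2.25,
    similarity: float = 0.62,
    rng: Optional[random.Random] = None,
) -> List[Tuple[float, str]]:
    """Keep a small pool of the best molecules and prompt around them."""
    rng = rng or random.Random(0)
    pool = [(score(seed_smiles), seed_smiles)]
    for _ in range(rounds):
        # Pick a molecule from the pool and use it as the [SIMILAR] reference.
        _, parent = rng.choice(pool)
        prompt = (
            f"[SAS]{target_sas:.2f}[/SAS]"
            f"[SIMILAR]{similarity:.2f} {parent}[/SIMILAR][START_SMILES]"
        )
        child = generate(prompt)
        # Skip empty generations and molecules already in the pool.
        if child and all(child != smiles for _, smiles in pool):
            pool.append((score(child), child))
            pool.sort(key=lambda item: item[0], reverse=True)
            del pool[pool_size:]  # keep only the best candidates
    return pool


# Stand-ins for a real model and a real scoring function, for demonstration only.
def fake_generate(prompt: str) -> str:
    parent = prompt.split("[SIMILAR]")[1].split("[/SIMILAR]")[0].split(" ", 1)[1]
    return parent + "C"


pool = optimize(fake_generate, score=len, seed_smiles="CC", rounds=4, pool_size=3)
print(pool)
```

In a real setting, `generate` would also need to validate the returned SMILES string, and the prompt would typically be adjusted between rounds rather than kept at fixed target values.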
22 |
+
|
23 |
+
A preprint with the details of the model and an optimization algorithm built on top of it, which sets a new state of the art on the Practical Molecular Optimization benchmark
|
24 |
+
and other benchmarks will be released soon.
|
25 |
+
|
26 |
+
All numbers are rounded to two decimal places. Available tags: `[CLOGP]`, `[WEIGHT]`, `[QED]`, `[SAS]`, `[TPSA]`, `[RINGCOUNT]`...
|
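Because property values are wrapped in matching tags such as `[SAS]...[/SAS]`, they can be read back from text with a small parser. The sketch below is an assumption-laden illustration: `read_tag_value` and `complete` are hypothetical helper names, and it is assumed (not confirmed here) that the checkpoint loads with the standard `transformers` causal-LM classes and that the model closes a predicted value with the matching `[/TAG]`.

```python
import re
from typing import Optional


def read_tag_value(text: str, tag: str) -> Optional[float]:
    """Return the first number found between [TAG] and [/TAG], or None."""
    name = re.escape(tag)
    match = re.search(rf"\[{name}\]\s*(-?\d+(?:\.\d+)?)\s*\[/{name}\]", text)
    return float(match.group(1)) if match else None


def complete(prompt: str, max_new_tokens: int = 16) -> str:
    """Generate a continuation for `prompt` (downloads the model on first use)."""
    # Imported lazily so the parser above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = "yerevann/chemlactica-125m"  # repository id taken from the model link
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(repo)
    # Pass only input_ids; other tokenizer outputs may not be accepted by generate().
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0])


example = "[SAS]2.25[/SAS][SIMILAR]0.62 CC(=O)OC1=CC=CC=C1C(=O)O[/SIMILAR][START_SMILES]"
print(read_tag_value(example, "SAS"))  # 2.25
print(read_tag_value(example, "QED"))  # None
```

A property prediction could then be read with `read_tag_value(complete(prompt), "SAS")`, where `prompt` ends in `[SAS]`; the generation settings shown are placeholders, not recommended values.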
27 |
+
|
28 |
+
The model is part of a three-model family: [Chemlactica-125M](https://huggingface.co/yerevann/chemlactica-125m),
|
29 |
+
[Chemlactica-1.3B](https://huggingface.co/yerevann/chemlactica-1.3b) and [Chemma-2B](https://huggingface.co/yerevann/chemma-2b).
|
30 |
+
|
31 |
+
We look forward to seeing the community use the model in new applications and contexts.
|