Update README.md
README.md
CHANGED
@@ -8,7 +8,7 @@ tags:
 - biology
 ---
 Chemlactica-125m is a continually pretrained [galactica-125m](https://huggingface.co/facebook/galactica-125m) model for organic molecules.
-It is pretrained on
+It is pretrained on [40B tokens covering 110M+ molecules from PubChem](https://huggingface.co/datasets/yerevann/PubChemForLM) as well as their chemical properties
 (molecular weight, synthetic accessibility score, drug-likeness etc.)
 and similarities (Tanimoto distance between ECFP fingerprints).
 
@@ -19,10 +19,10 @@ Example prompts:
 `</s>[SAS]2.25[/SAS][SIMILAR]0.62 CC(=O)OC1=CC=CC=C1C(=O)O[/SIMILAR][START_SMILES]` will attempt to generate a molecule that has 2.25 SAS score and
 has a 0.62 similarity score to the given molecule.
 
-The model can be wrapped into an optimization loop to traverse the chemical space with evolving prompts.
+The model can be wrapped into an optimization loop to traverse the chemical space with evolving prompts. See the [code on GitHub](https://github.com/YerevaNN/ChemLactica).
 
-A preprint with the details of the model and an optimization algorithm built on top of this model that sets state-of-the-art on
-and other benchmarks
+A preprint with the details of the model and an optimization algorithm built on top of this model that sets state-of-the-art on
+Practical Molecular Optimization and other benchmarks is [available on arxiv](https://arxiv.org/abs/2407.18897).
 
 Few notes:
 * All queries should start with `</s>` symbol.
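
The example prompt in the README follows a regular pattern: a leading `</s>`, one `[TAG]value[/TAG]` block per requested property, an optional `[SIMILAR]score SMILES[/SIMILAR]` block, and a closing `[START_SMILES]`. A minimal sketch of assembling such a prompt is below. The helper name `build_prompt` and its arguments are hypothetical, and the tag order is assumed to mirror the README example; only the `SAS` and `SIMILAR` tags appear in this diff, so any other tag name would need to be checked against the model documentation.

```python
def build_prompt(properties, similar=None):
    """Assemble a prompt string in the format shown in the README example.

    properties: mapping of tag name to target value, e.g. {"SAS": 2.25}.
    similar: optional (score, smiles) pair for the [SIMILAR] block.
    Hypothetical helper; the tag order is an assumption based on the example.
    """
    parts = ["</s>"]  # the README notes that all queries start with </s>
    for tag, value in properties.items():
        parts.append(f"[{tag}]{value}[/{tag}]")
    if similar is not None:
        score, smiles = similar
        parts.append(f"[SIMILAR]{score} {smiles}[/SIMILAR]")
    parts.append("[START_SMILES]")  # the model continues from here with a SMILES string
    return "".join(parts)


# Reproduce the README example: SAS of 2.25, similarity 0.62 to the given molecule.
prompt = build_prompt({"SAS": 2.25}, similar=(0.62, "CC(=O)OC1=CC=CC=C1C(=O)O"))
print(prompt)
```

The resulting string can then be tokenized and passed to the model for generation. An optimization loop of the kind the README mentions would rebuild the prompt on each iteration with updated targets or reference molecules.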