Commit ca17666 by yerevann
1 Parent(s): fec96a1

Update README.md

  1. README.md +22 -1
README.md CHANGED
@@ -7,4 +7,25 @@ tags:
  - chemistry
  - biology
  ---
- This is the model card for chemlactica-125m, a continually pretrained galactica-125m model for chemistry.
+ Chemlactica-125m is a continually pretrained galactica-125m model for organic molecules. It is pretrained on 40B tokens (dataset to be released soon) covering
+ 110M+ molecules from PubChem, along with their chemical properties (molecular weight, synthetic accessibility score, drug-likeness, etc.)
+ and pairwise similarities (Tanimoto similarity between ECFP fingerprints).
+
+ Example prompts:
+
+ `[START_SMILES]CC(=O)OC1=CC=CC=C1C(=O)O[END_SMILES][SAS]` will attempt to predict the synthetic accessibility score of the given molecule.
+
+ `[SAS]2.25[/SAS][SIMILAR]0.62 CC(=O)OC1=CC=CC=C1C(=O)O[/SIMILAR][START_SMILES]` will attempt to generate a molecule with an SAS of 2.25 and
+ a similarity of 0.62 to the given molecule.
20
+
21
+ The model can be wrapped into an optimization loop to traverse the chemical space with evolving prompts.
22
+
23
+ A preprint with the details of the model and an optimization algorithm built on top of this model that sets state-of-the-art on Practical Molecular Optimization
24
+ and other benchmarks will be released soon.
+
+ All numbers are rounded to two decimal places. Available tags: `[CLOGP]`, `[WEIGHT]`, `[QED]`, `[SAS]`, `[TPSA]`, `[RINGCOUNT]`...
+
+ The model is part of a three-model family: [Chemlactica-125M](https://huggingface.co/yerevann/chemlactica-125m),
+ [Chemlactica-1.3B](https://huggingface.co/yerevann/chemlactica-1.3b) and [Chemma-2B](https://huggingface.co/yerevann/chemma-2b).
30
+
31
+ We are looking forward to see the community using the model in new applications and contexts.
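
The two prompt shapes described in the README can be assembled programmatically. What follows is a minimal Python sketch and not the authors' code: the helper names (`fmt`, `property_prompt`, `generation_prompt`, `optimize`) and the `generate` and `score` callables are hypothetical, the ordering of multiple tags is an assumption generalized from the single example in the README, and the loop is a generic greedy search for illustration only, not the optimization algorithm from the forthcoming preprint.

```python
from typing import Callable, Dict, List, Tuple


def fmt(value: float) -> str:
    # The model card states that all numbers are rounded to two decimals.
    return f"{value:.2f}"


def property_prompt(smiles: str, tag: str) -> str:
    # End the prompt with an open property tag (e.g. [SAS]) for the model to complete.
    return f"[START_SMILES]{smiles}[END_SMILES][{tag}]"


def generation_prompt(targets: Dict[str, float],
                      similar: List[Tuple[str, float]]) -> str:
    # Assumption: property tags come first, then [SIMILAR] blocks, as in the
    # single example on the card; other orderings are not covered there.
    parts = [f"[{tag}]{fmt(value)}[/{tag}]" for tag, value in targets.items()]
    parts += [f"[SIMILAR]{fmt(sim)} {smi}[/SIMILAR]" for smi, sim in similar]
    return "".join(parts) + "[START_SMILES]"


def optimize(generate: Callable[[str, int], List[str]],
             score: Callable[[str], float],
             seed: str, target_sas: float, similarity: float = 0.62,
             rounds: int = 3, samples: int = 4) -> Tuple[str, float]:
    # Hypothetical greedy loop: each round conditions the prompt on the best
    # molecule so far and keeps any sampled candidate that scores higher.
    best, best_score = seed, score(seed)
    for _ in range(rounds):
        prompt = generation_prompt({"SAS": target_sas}, [(best, similarity)])
        for candidate in generate(prompt, samples):
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
    return best, best_score


molecule = "CC(=O)OC1=CC=CC=C1C(=O)O"
print(property_prompt(molecule, "SAS"))
print(generation_prompt({"SAS": 2.25}, [(molecule, 0.62)]))
```

The two `print` calls reproduce the example prompts from the README. In practice `generate` would wrap the model's sampling call and extract the SMILES string that precedes `[END_SMILES]`; that wrapper is left out here because it depends on tokenizer and generation settings the README does not specify.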