unikei committed on
Commit 756af86
Parent: 0bdc322

Update README.md

Files changed (1): README.md +33 -1
---
license: bigscience-openrail-m
widget:
- text: CC(Sc1nn(-c2ccc(Cl)cc2)c([MASK])s1)C(=O)NCC1CCCO1
datasets:
- ChEMBL
pipeline_tag: fill-mask
---

# BERT base for SMILES
This is a bidirectional transformer pretrained on SMILES (simplified molecular-input line-entry system) strings.

Example: Amoxicillin
```
O=C([C@@H](c1ccc(cc1)O)N)N[C@@H]1C(=O)N2[C@@H]1SC([C@@H]2C(=O)O)(C)C
```

Two training objectives were used:
1. masked language modeling
2. molecular-formula validity prediction
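The first objective can be illustrated with a toy masking step. The sketch below character-tokenizes a SMILES string (aspirin) and hides roughly 15% of positions behind `[MASK]`; note that the real model uses a learned subword vocabulary, so this is only an illustration of the training setup, not the actual tokenizer:

```python
import random

def mask_smiles(smiles, mask_rate=0.15, seed=0):
    """Character-tokenize a SMILES string and replace ~mask_rate of tokens with [MASK]."""
    rng = random.Random(seed)
    tokens = list(smiles)
    n_mask = max(1, int(len(tokens) * mask_rate))
    positions = rng.sample(range(len(tokens)), n_mask)
    labels = {i: tokens[i] for i in positions}  # targets the model must recover
    for i in positions:
        tokens[i] = '[MASK]'
    return ''.join(tokens), labels

masked, labels = mask_smiles('CC(=O)Oc1ccccc1C(=O)O')  # aspirin
print(masked)
print(labels)
```

During pretraining, the model sees the masked string and is trained to predict the hidden characters from the surrounding chemical context.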
## Intended uses
This model is primarily aimed at being fine-tuned on the following tasks:
- molecule classification
- molecule-to-gene-expression mapping
- cell targeting
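For the first of these tasks, fine-tuning follows the standard `transformers` sequence-classification pattern. The sketch below uses a tiny randomly initialized config so it runs without downloading any weights; in practice you would instead load the pretrained checkpoint with `BertForSequenceClassification.from_pretrained('unikei/bert-base-smiles', num_labels=2)` and train on real tokenized SMILES with real labels:

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Tiny stand-in config (placeholder sizes, NOT the real checkpoint's dimensions)
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64, num_labels=2)
model = BertForSequenceClassification(config)

input_ids = torch.randint(0, 100, (4, 16))  # 4 dummy token sequences
labels = torch.tensor([0, 1, 0, 1])         # hypothetical binary activity labels
out = model(input_ids=input_ids, labels=labels)
print(out.loss.item(), out.logits.shape)    # cross-entropy loss and (4, 2) logits
```

From here, `out.loss` can be backpropagated with any standard optimizer loop or handed to the `Trainer` API.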
## How to use in your code
```python
from transformers import BertTokenizerFast, BertModel

checkpoint = 'unikei/bert-base-smiles'
tokenizer = BertTokenizerFast.from_pretrained(checkpoint)
model = BertModel.from_pretrained(checkpoint)

# Encode one SMILES string and run it through the encoder
example = 'O=C([C@@H](c1ccc(cc1)O)N)N[C@@H]1C(=O)N2[C@@H]1SC([C@@H]2C(=O)O)(C)C'
tokens = tokenizer(example, return_tensors='pt')
outputs = model(**tokens)  # outputs.last_hidden_state holds per-token embeddings
```
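The model call above returns, among other things, a `last_hidden_state` tensor of shape `(batch, sequence_length, hidden_size)`. A common way to turn that into a single fixed-size molecule embedding is masked mean pooling over the non-padding positions. The sketch below uses NumPy stand-in arrays so it runs without the model; the shapes mirror a BERT-base output, and `hidden_size = 768` plus the 7-token attention mask are illustrative assumptions:

```python
import numpy as np

# Stand-in for outputs.last_hidden_state: (batch=1, seq_len=10, hidden=768)
rng = np.random.default_rng(0)
last_hidden_state = rng.normal(size=(1, 10, 768))

# Stand-in for tokens['attention_mask']: first 7 positions are real tokens
attention_mask = np.array([[1, 1, 1, 1, 1, 1, 1, 0, 0, 0]])

# Masked mean pooling: average only over non-padding positions
mask = attention_mask[..., None]                 # (1, 10, 1)
summed = (last_hidden_state * mask).sum(axis=1)  # (1, 768)
counts = mask.sum(axis=1)                        # (1, 1)
molecule_embedding = summed / counts             # (1, 768)

print(molecule_embedding.shape)  # (1, 768)
```

With the real model, the same arithmetic applies to `outputs.last_hidden_state` and `tokens['attention_mask']` (converted to the same dtype), yielding one 768-dimensional vector per molecule for downstream classifiers.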