unikei committed on
Commit 756af86
Parent: 0bdc322

Update README.md

Files changed (1): README.md +33 -1
---
license: bigscience-openrail-m
widget:
- text: CC(Sc1nn(-c2ccc(Cl)cc2)c([MASK])s1)C(=O)NCC1CCCO1
datasets:
- ChEMBL
pipeline_tag: fill-mask
---

# BERT base for SMILES
This is a bidirectional transformer pretrained on SMILES (simplified molecular-input line-entry system) strings.

Example: Amoxicillin
```
O=C([C@@H](c1ccc(cc1)O)N)N[C@@H]1C(=O)N2[C@@H]1SC([C@@H]2C(=O)O)(C)C
```

Two training objectives were used:
1. masked language modeling
2. molecular-formula validity prediction
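The first objective can be illustrated with a toy masking step. The sketch below character-tokenizes a SMILES string (aspirin) and hides roughly 15% of positions behind `[MASK]`; note that the real model uses a learned subword vocabulary, so this is only an illustration of the training setup, not the actual tokenizer:

```python
import random

def mask_smiles(smiles, mask_rate=0.15, seed=0):
    """Character-tokenize a SMILES string and replace ~mask_rate of tokens with [MASK]."""
    rng = random.Random(seed)
    tokens = list(smiles)
    n_mask = max(1, int(len(tokens) * mask_rate))
    positions = rng.sample(range(len(tokens)), n_mask)
    labels = {i: tokens[i] for i in positions}  # targets the model must recover
    for i in positions:
        tokens[i] = '[MASK]'
    return ''.join(tokens), labels

masked, labels = mask_smiles('CC(=O)Oc1ccccc1C(=O)O')  # aspirin
print(masked)
print(labels)
```

During pretraining, the model sees the masked string and is trained to predict the hidden characters from the surrounding chemical context.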
## Intended uses
This model is primarily aimed at being fine-tuned on the following tasks:
- molecule classification
- molecule-to-gene-expression mapping
- cell targeting
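For the first of these tasks, fine-tuning follows the standard `transformers` sequence-classification pattern. The sketch below uses a tiny randomly initialized config so it runs without downloading any weights; in practice you would instead load the pretrained checkpoint with `BertForSequenceClassification.from_pretrained('unikei/bert-base-smiles', num_labels=2)` and train on real tokenized SMILES with real labels:

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Tiny stand-in config (placeholder sizes, NOT the real checkpoint's dimensions)
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64, num_labels=2)
model = BertForSequenceClassification(config)

input_ids = torch.randint(0, 100, (4, 16))  # 4 dummy token sequences
labels = torch.tensor([0, 1, 0, 1])         # hypothetical binary activity labels
out = model(input_ids=input_ids, labels=labels)
print(out.loss.item(), out.logits.shape)    # cross-entropy loss and (4, 2) logits
```

From here, `out.loss` can be backpropagated with any standard optimizer loop or handed to the `Trainer` API.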
## How to use in your code
```python
from transformers import BertTokenizerFast, BertModel

checkpoint = 'unikei/bert-base-smiles'
tokenizer = BertTokenizerFast.from_pretrained(checkpoint)
model = BertModel.from_pretrained(checkpoint)

# Encode one SMILES string and run it through the encoder
example = 'O=C([C@@H](c1ccc(cc1)O)N)N[C@@H]1C(=O)N2[C@@H]1SC([C@@H]2C(=O)O)(C)C'
tokens = tokenizer(example, return_tensors='pt')
outputs = model(**tokens)  # outputs.last_hidden_state holds per-token embeddings
```
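The model call above returns, among other things, a `last_hidden_state` tensor of shape `(batch, sequence_length, hidden_size)`. A common way to turn that into a single fixed-size molecule embedding is masked mean pooling over the non-padding positions. The sketch below uses NumPy stand-in arrays so it runs without the model; the shapes mirror a BERT-base output, and `hidden_size = 768` plus the 7-token attention mask are illustrative assumptions:

```python
import numpy as np

# Stand-in for outputs.last_hidden_state: (batch=1, seq_len=10, hidden=768)
rng = np.random.default_rng(0)
last_hidden_state = rng.normal(size=(1, 10, 768))

# Stand-in for tokens['attention_mask']: first 7 positions are real tokens
attention_mask = np.array([[1, 1, 1, 1, 1, 1, 1, 0, 0, 0]])

# Masked mean pooling: average only over non-padding positions
mask = attention_mask[..., None]                 # (1, 10, 1)
summed = (last_hidden_state * mask).sum(axis=1)  # (1, 768)
counts = mask.sum(axis=1)                        # (1, 1)
molecule_embedding = summed / counts             # (1, 768)

print(molecule_embedding.shape)  # (1, 768)
```

With the real model, the same arithmetic applies to `outputs.last_hidden_state` and `tokens['attention_mask']` (converted to the same dtype), yielding one 768-dimensional vector per molecule for downstream classifiers.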