File size: 1,149 Bytes
4b2303c
 
 
97af73d
 
 
 
 
 
 
5c4d1d5
97af73d
 
5c4d1d5
97af73d
 
5c4d1d5
97af73d
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
---
license: apache-2.0
---


This model can be used to generate a SMILES string from an input caption.

## Example Usage
```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("laituan245/molt5-small-caption2smiles", model_max_length=512)
model = T5ForConditionalGeneration.from_pretrained('laituan245/molt5-small-caption2smiles')

input_text = 'The molecule is a monomethoxybenzene that is 2-methoxyphenol substituted by a hydroxymethyl group at position 4. It has a role as a plant metabolite. It is a member of guaiacols and a member of benzyl alcohols.'
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids, num_beams=5, max_length=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# The model will generate "COC1=C(C=CC(=C1)CCCO)O". The ground-truth is "COC1=C(C=CC(=C1)CO)O".
```

## Paper

For more information, please take a look at our paper.

Paper: [Translation between Molecules and Natural Language](https://arxiv.org/abs/2204.11817)

Authors: *Carl Edwards\*, Tuan Lai\*, Kevin Ros, Garrett Honke, Heng Ji*