---
license: apache-2.0
---

## Model description
This repo contains MolGen-7b, a large molecular generative model built on the SELFIES molecular language.

## Intended uses
You can use the model to generate molecules from scratch (i.e., by inputting only the bos_token), or provide a partial SELFIES structure for the model to complete.

## How to use
We provide two examples below. You can modify the input, the generation parameters, and so on, according to your needs.

- De novo molecule generation example:

```python
from transformers import AutoTokenizer, LlamaForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("zjunlp/MolGen-7b")
model = LlamaForCausalLM.from_pretrained(
    "zjunlp/MolGen-7b",
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

# De novo generation: condition only on the beginning-of-sequence token.
sf_input = tokenizer(tokenizer.bos_token, return_tensors="pt").to(model.device)

molecules = model.generate(
    input_ids=sf_input["input_ids"],
    attention_mask=sf_input["attention_mask"],
    do_sample=True,
    max_new_tokens=10,
    top_p=0.75,
    top_k=30,
    return_dict_in_generate=False,
    num_return_sequences=5,
)
# Decode and remove the spaces the tokenizer inserts between SELFIES tokens.
sf_output = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True).replace(" ", "") for g in molecules]
print(sf_output)
# Example output (sampling is stochastic):
# ['[C][C][=C][C][=C][Branch2][Ring1][=Branch2][C][=Branch1]',
#  '[C][N][C][C][C][Branch2][Ring2][Ring2][N][C]',
#  '[C][O][C][=C][C][=C][C][Branch2][Ring1][Branch1]',
#  '[C][N][C][C][C@H1][Branch2][Ring1][Branch2][N][Branch1]',
#  '[C][=C][C][Branch2][Ring1][#C][C][=Branch1][C][=O]']
```
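
The generated strings are SELFIES. If you want SMILES for downstream tools, one option is the `selfies` Python package (the reference implementation of the SELFIES language; it is not bundled with this repo, so the following is a sketch assuming `pip install selfies`):

```python
# Sketch: convert the SELFIES strings generated above to SMILES.
# Assumes the `selfies` package is installed and `sf_output` comes from
# the de novo generation example.
import selfies

smiles_output = [selfies.decoder(s) for s in sf_output]
print(smiles_output)
```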

- Molecular completion example:

```python
from transformers import AutoTokenizer, LlamaForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("zjunlp/MolGen-7b")
model = LlamaForCausalLM.from_pretrained(
    "zjunlp/MolGen-7b",
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Completion: condition on a partial SELFIES string.
sf_input = tokenizer("[C][N][O]", return_tensors="pt").to(model.device)

molecules = model.generate(
    input_ids=sf_input["input_ids"],
    attention_mask=sf_input["attention_mask"],
    do_sample=True,
    max_new_tokens=10,
    top_p=0.75,
    top_k=30,
    return_dict_in_generate=False,
    num_return_sequences=5,
)
sf_output = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True).replace(" ", "") for g in molecules]
print(sf_output)
# Example output (sampling is stochastic):
# ['[C][N][O][C][=Branch1][C][=O][/C][Ring1][=Branch1][=C][/C][=C]',
#  '[C][N][O][/C][=Branch1][#Branch1][=C][/N][Branch1][C][C][C][C]',
#  '[C][N][O][/C][=C][/C][=C][C][=Branch1][C][=O][C][=C]',
#  '[C][N][O][C][=Branch1][C][=O][N][Branch1][C][C][C][=Branch1]',
#  '[C][N][O][Ring1][Branch1][C][C][C][C][C][C][C][C]']
```
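
Because SELFIES decoding is robust by design, a quick round-trip through SMILES with RDKit is a useful sanity check on the completions. A minimal sketch, assuming both `selfies` and `rdkit` are installed (neither is a dependency of this repo):

```python
# Sketch: decode each completion to SMILES and confirm RDKit can parse it.
# Assumes `sf_output` comes from the molecular completion example above.
import selfies
from rdkit import Chem

for s in sf_output:
    smiles = selfies.decoder(s)            # SELFIES -> SMILES
    mol = Chem.MolFromSmiles(smiles)       # returns None if parsing fails
    print(smiles, "(valid)" if mol is not None else "(invalid)")
```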