mjbuehler committed · Commit 7756bc9 · verified · 1 Parent(s): 80cbd0d

Update README.md

  1. README.md +102 -0
README.md CHANGED
@@ -1,3 +1,105 @@
  ---
  license: apache-2.0
  ---
+ ---
+ license: apache-2.0
+ ---
+ # ProteinForceGPT: Generative strategies for modeling, design and analysis of protein mechanics
+
+
+ ### Load model
+
+ ProteinForceGPT is a GPT-style autoregressive transformer trained to analyze and predict the mechanical properties of protein sequences.
+
+ The pretraining task is defined as "Sequence<...>", where ... is an amino acid sequence.
+
+ The mechanics-related tasks are:
+
+ CalculateForce<GEECDCGSPSNPCCDAATCKLRPGAQCADGLC...> [0.262]
+ CalculateEnergy<GEECDCGSPSNPCCDAATCKLRPGAQCADGLC...> [0.220]
+ CalculateForceEnergy<GEECDCGSPSNPCCDAATCKLRPGAQCADGLC...> [0.262,0.220]
+ CalculateForceHistory<GEECDCGSPSNPCCDAATCKLRPGAQCADGLC...> [0.004,0.034,0.125,0.142,0.159,0.102,0.079,0.073,0.131,0.105,0.071,0.058,0.072,0.060,0.049,0.114,0.122,0.108,0.173,0.192,0.208,0.153,0.212,0.222,0.244]
+ GenerateForce<0.262> [GEECDCGSPSNPCCDAATCKLRPGAQCADGLC...]
+ GenerateEnergy<0.220> [GEECDCGSPSNPCCDAATCKLRPGAQCADGLC...]
+ GenerateForceEnergy<0.262,0.220> [GEECDCGSPSNPCCDAATCKLRPGAQCADGLC...]
+ GenerateForceHistory<0.004,0.034,0.125,0.142,0.159,0.102,0.079,0.073,0.131,0.105,0.071,0.058,0.072,0.060,0.049,0.114,0.122,0.108,0.173,0.192,0.208,0.153,0.212,0.222,0.244> [GEECDCGSPSNPCCDAATCKLRPGAQCADGLCCDQCRFKKKRTICRIARGDFPDDRCTGQSADCPRWN]
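The task strings listed here follow one pattern: a task name, an argument in angle brackets, and the answer in square brackets. A minimal sketch of helpers that assemble such prompts is given below. The function names `calculate_prompt` and `generate_prompt` are hypothetical and not part of the model repository, and the three-decimal formatting is an assumption taken from the examples.

```python
from typing import Iterable

# Prediction tasks taken from the list of examples
CALCULATE_TASKS = {
    "CalculateForce",
    "CalculateEnergy",
    "CalculateForceEnergy",
    "CalculateForceHistory",
}


def calculate_prompt(task: str, sequence: str) -> str:
    """Build a property-prediction prompt such as 'CalculateForce<SEQ>'."""
    if task not in CALCULATE_TASKS:
        raise ValueError(f"unknown task: {task}")
    return f"{task}<{sequence}>"


def generate_prompt(task: str, values: Iterable[float]) -> str:
    """Build a sequence-design prompt such as 'GenerateForce<0.262>'."""
    # Assumption: targets are written with three decimals, as in the examples
    joined = ",".join(f"{v:.3f}" for v in values)
    return f"{task}<{joined}>"


print(calculate_prompt("CalculateForce", "GEECDC"))
print(generate_prompt("GenerateForceEnergy", [0.262, 0.220]))
```

The resulting strings can be passed to `tokenizer.encode` in the same way as the hand-written prompts.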
+
+ Load pretrained model:
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Run on the GPU when one is available, otherwise fall back to the CPU
+ device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+
+ pretrained_model_name = 'lamm-mit/ProteinForceGPT'
+
+ tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name, trust_remote_code=True)
+ # Reuse the end-of-sequence token for padding
+ tokenizer.pad_token = tokenizer.eos_token
+
+ model = AutoModelForCausalLM.from_pretrained(
+     pretrained_model_name,
+     trust_remote_code=True
+ ).to(device)
+
+ model.config.use_cache = False
+ ```
+
+ Sample inference using the "Sequence<...>" task; here the model simply autocompletes a sequence that starts with "GEECDC":
+
+ ```python
+ import torch
+
+ # Reuses tokenizer, model and device from the loading step
+ prompt = "Sequence<GEECDC"
+ # Encode the prompt and add a batch dimension
+ generated = torch.tensor(tokenizer.encode(prompt, add_special_tokens=False)).unsqueeze(0).to(device)
+ print(generated.shape, generated)
+
+ sample_outputs = model.generate(
+     inputs=generated,
+     eos_token_id=tokenizer.eos_token_id,
+     do_sample=True,
+     top_k=500,
+     max_length=300,
+     top_p=0.9,
+     num_return_sequences=1,
+     temperature=1,
+ )
+
+ for i, sample_output in enumerate(sample_outputs):
+     print("{}: {}\n\n".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))
+ ```
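The `generate` call combines `temperature`, `top_k` and `top_p`. As a rough illustration of how these three settings narrow the candidate tokens at each step, the sketch below applies them to a toy list of logits in plain Python. It is a simplified illustration, not the `transformers` implementation, and the function name `filter_top_k_top_p` is hypothetical.

```python
import math


def filter_top_k_top_p(logits, top_k=500, top_p=0.9, temperature=1.0):
    """Return {token_index: probability} for the tokens that survive filtering."""
    # Temperature rescales the logits before the softmax
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # top-k: keep only the k most probable tokens
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    mass_k = sum(probs[i] for i in ranked)

    # top-p: keep the shortest prefix whose renormalised cumulative
    # probability reaches top_p
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / mass_k
        if cumulative >= top_p:
            break

    # Renormalise the surviving tokens so they sum to one
    mass_p = sum(probs[i] for i in kept)
    return {i: probs[i] / mass_p for i in kept}


# Toy vocabulary of four tokens
print(filter_top_k_top_p([2.0, 1.0, 0.0, -1.0], top_k=3, top_p=0.9))
```

With `top_k=500` the top-k cut only matters if the tokenizer vocabulary has more than 500 entries; otherwise `top_p=0.9` does the effective filtering.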
+ Sample inference using the "CalculateForce<...>" task; here the model predicts the maximum unfolding force of the given sequence:
+
+ ```python
+ # Reuses torch, tokenizer, model and device from the snippets above
+ prompt = "CalculateForce<GEECDCGSPSNPCCDAATCKLRPGAQCADGLCCDQCRFKKKRTICRIARGDFPDDRCTGQSADCPRWN>"
+ # Encode the prompt and add a batch dimension
+ generated = torch.tensor(tokenizer.encode(prompt, add_special_tokens=False)).unsqueeze(0).to(device)
+
+ sample_outputs = model.generate(
+     inputs=generated,
+     eos_token_id=tokenizer.eos_token_id,
+     do_sample=True,
+     top_k=500,
+     max_length=300,
+     top_p=0.9,
+     num_return_sequences=3,
+     temperature=1,
+ )
+
+ for i, sample_output in enumerate(sample_outputs):
+     print("{}: {}\n\n".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))
+ ```
+ Output:
+ ```raw
+ 0: CalculateForce<GEECDCGSPSNPCCDAATCKLRPGAQCADGLCCDQCRFKKKRTICRIARGDFPDDRCTGQSADCPRWN> [0.262]
+ ```
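The decoded output is a plain string, so the predicted value still has to be extracted from the square brackets. One possible way to do that is sketched below with a regular expression. The helper name `parse_prediction` is hypothetical, and the sketch assumes the bracket holds only comma-separated numbers, as in the examples; anything else raises a `ValueError` from `float`.

```python
import re


def parse_prediction(decoded: str) -> list:
    """Return the numbers inside the first [...] group, or [] if there is none."""
    match = re.search(r"\[([^\]]*)\]", decoded)
    if match is None:
        return []
    # Split on commas and skip empty fragments such as a trailing comma
    return [float(x) for x in match.group(1).split(",") if x.strip()]


print(parse_prediction("CalculateForce<GEECDC> [0.262]"))
print(parse_prediction("CalculateForceEnergy<GEECDC> [0.262,0.220]"))
```

Because sampling is stochastic (`do_sample=True`), the returned sequences may carry different values; parsing each one makes them easy to compare or average.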
+
+ ## Citation
+ To cite this work:
+ ```
+ @article{GhafarollahiBuehler_2024,
+     title   = {ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning},
+     author  = {A. Ghafarollahi and M.J. Buehler},
+     journal = {},
+     year    = {2024},
+     volume  = {},
+     pages   = {},
+     url     = {}
+ }
+ ```