 ---
 license: apache-2.0
 ---
+# ProteinForceGPT: Generative strategies for modeling, design and analysis of protein mechanics
+This model is a GPT-style autoregressive transformer, trained to analyze and predict the mechanical properties of a large number of protein sequences.
+The pretraining task is defined as "Sequence<...>" where ... is an amino acid sequence.
+Mechanics-related tasks are:
+- `CalculateForce<GEECDCGSPSNPCCDAATCKLRPGAQCADGLC...> [0.262]`
+- `CalculateEnergy<GEECDCGSPSNPCCDAATCKLRPGAQCADGLC...> [0.220]`
+- `CalculateForceEnergy<GEECDCGSPSNPCCDAATCKLRPGAQCADGLC...> [0.262,0.220]`
+- `CalculateForceHistory<GEECDCGSPSNPCCDAATCKLRPGAQCADGLC...> [0.004,0.034,0.125,0.142,0.159,0.102,0.079,0.073,0.131,0.105,0.071,0.058,0.072,0.060,0.049,0.114,0.122,0.108,0.173,0.192,0.208,0.153,0.212,0.222,0.244]`
+- `GenerateForce<0.262> [GEECDCGSPSNPCCDAATCKLRPGAQCADGLC...]`
+- `GenerateEnergy<0.220> [GEECDCGSPSNPCCDAATCKLRPGAQCADGLC...]`
+- `GenerateForceEnergy<0.262,0.220> [GEECDCGSPSNPCCDAATCKLRPGAQCADGLC...]`
+- `GenerateForceHistory<0.004,0.034,0.125,0.142,0.159,0.102,0.079,0.073,0.131,0.105,0.071,0.058,0.072,0.060,0.049,0.114,0.122,0.108,0.173,0.192,0.208,0.153,0.212,0.222,0.244> [GEECDCGSPSNPCCDAATCKLRPGAQCADGLCCDQCRFKKKRTICRIARGDFPDDRCTGQSADCPRWN]`
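
The task formats listed above all follow a `Task<payload>` pattern, so prompts can be assembled programmatically. The following is a minimal sketch, not part of the model's own tooling: the helper names `calculate_prompt` and `generate_prompt` are hypothetical, and the three-decimal number formatting is an assumption based on the examples shown here.

```python
from typing import Sequence, Union


def calculate_prompt(task: str, sequence: str) -> str:
    """Build a prompt such as 'CalculateForce<SEQ>' (hypothetical helper)."""
    return f"{task}<{sequence}>"


def generate_prompt(task: str, values: Union[float, Sequence[float]]) -> str:
    """Build a prompt such as 'GenerateForce<0.262>' (hypothetical helper)."""
    # Accept a single number as well as a list of numbers.
    if isinstance(values, (int, float)):
        values = [values]
    # Assumption: three decimals and no spaces, matching the examples above.
    joined = ",".join(f"{v:.3f}" for v in values)
    return f"{task}<{joined}>"


print(calculate_prompt("CalculateForce", "GEECDC"))
print(generate_prompt("GenerateForceEnergy", [0.262, 0.220]))
```

Whether the model is sensitive to the exact number formatting is not stated in this card, so it seems safest to stay close to the format of the examples.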
+Load the pretrained model:
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+# Use a GPU when one is available; otherwise fall back to the CPU.
+device = "cuda" if torch.cuda.is_available() else "cpu"
+
+pretrained_model_name = 'lamm-mit/ProteinForceGPT'
+tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name, trust_remote_code=True)
+# Reuse the end-of-sequence token for padding.
+tokenizer.pad_token = tokenizer.eos_token
+model = AutoModelForCausalLM.from_pretrained(
+    pretrained_model_name,
+    trust_remote_code=True
+).to(device)
+model.config.use_cache = False
+```
+Sample inference using the "Sequence<...>" task. Here the model simply autocompletes a sequence that starts with "GEECDC":
+```python
+prompt = "Sequence<GEECDC"
+generated = torch.tensor(tokenizer.encode(prompt, add_special_tokens=False)).unsqueeze(0).to(device)
+print(generated.shape, generated)
+sample_outputs = model.generate(
+    inputs=generated,
+    eos_token_id=tokenizer.eos_token_id,
+    do_sample=True,
+    top_k=500,
+    max_length=300,
+    top_p=0.9,
+    num_return_sequences=1,
+    temperature=1,
+).to(device)
+for i, sample_output in enumerate(sample_outputs):
+    print("{}: {}\n\n".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))
+```
+Sample inference using the "CalculateForce<...>" task. Here the model predicts the maximum unfolding force of a given sequence:
+```python
+# The prompt must start with the task name; no leading quote character.
+prompt = "CalculateForce<GEECDCGSPSNPCCDAATCKLRPGAQCADGLCCDQCRFKKKRTICRIARGDFPDDRCTGQSADCPRWN>"
+generated = torch.tensor(tokenizer.encode(prompt, add_special_tokens=False)).unsqueeze(0).to(device)
+sample_outputs = model.generate(
+    inputs=generated,
+    eos_token_id=tokenizer.eos_token_id,
+    do_sample=True,
+    top_k=500,
+    max_length=300,
+    top_p=0.9,
+    num_return_sequences=3,
+    temperature=1,
+).to(device)
+for i, sample_output in enumerate(sample_outputs):
+    print("{}: {}\n\n".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))
+```
+Output:
+```raw
+0: CalculateForce<GEECDCGSPSNPCCDAATCKLRPGAQCADGLCCDQCRFKKKRTICRIARGDFPDDRCTGQSADCPRWN> [0.262]
+```
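
The decoded output places the predicted value(s) in square brackets after the prompt, so they can be extracted with a small parser. This is a minimal sketch under that assumption: the function name `parse_values` is hypothetical, and because sampling can produce malformed text, entries that do not parse as numbers are skipped.

```python
import re
from typing import List


def parse_values(decoded: str) -> List[float]:
    """Return the numbers inside the first [...] of a decoded output string."""
    match = re.search(r"\[([^\]]*)\]", decoded)
    if match is None:
        return []  # No bracketed result found.
    values = []
    for token in match.group(1).split(","):
        try:
            values.append(float(token.strip()))
        except ValueError:
            # Skip entries that are not valid numbers.
            continue
    return values


print(parse_values("CalculateForce<GEECDC> [0.262]"))
```

Since the examples above sample with `do_sample=True`, repeated runs may return different values. One option is to parse several returned sequences and compare or average them.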
+## Citation
+To cite this work:
+```
+@article{GhafarollahiBuehler_2024,
+    title   = {ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning},
+    author  = {A. Ghafarollahi and M.J. Buehler},
+    journal = {},
+    year    = {2024},
+    volume  = {},
+    pages   = {},
+    url     = {}
+}
+```