Pretrained model
This model is a GPT-style pretrained autoregressive transformer trained on a large set of protein sequences. The training task is "Sequence<...>", in which the model learns to complete an amino acid sequence from a partial sequence.
Dataset: https://huggingface.co/datasets/lamm-mit/GPTProteinPretrained
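The pretraining data can be inspected directly with the Hugging Face datasets library (a minimal sketch; splits and column names are whatever the dataset repository defines):

from datasets import load_dataset

# Download the pretraining dataset from the Hugging Face Hub
dataset = load_dataset('lamm-mit/GPTProteinPretrained')
print(dataset)  # inspect available splits and columns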
Load pretrained model:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = 'cuda'  # use 'cpu' if no GPU is available

pretrained_model_name = 'lamm-mit/GPTProteinPretrained'

# Load the tokenizer; pad with the end-of-sequence token
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

# Load the pretrained model and move it to the target device
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name,
    trust_remote_code=True,
).to(device)
model.config.use_cache = False
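Optionally, a quick sanity check that the model and tokenizer loaded as expected (a minimal sketch using standard PyTorch/transformers attributes):

# Report model size and tokenizer vocabulary size
num_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {num_params/1e6:.1f}M, vocabulary size: {len(tokenizer)}")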
Sample inference using the "Sequence<...>" task, where the model autocompletes the sequence starting with "ETAVPKLLQAL":
# Encode the prompt and add a batch dimension
prompt = "Sequence<ETAVPKLLQAL"
generated = torch.tensor(tokenizer.encode(prompt, add_special_tokens=False)).unsqueeze(0).to(device)
print(generated.shape, generated)

# Sample a completion of the sequence
sample_outputs = model.generate(
    inputs=generated,
    eos_token_id=tokenizer.eos_token_id,
    do_sample=True,
    top_k=500,
    top_p=0.9,
    temperature=1.0,
    max_length=1024,
    num_return_sequences=1,
)

for i, sample_output in enumerate(sample_outputs):
    print("{}: {}\n\n".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))
Output (the encoded prompt tensor followed by one generated candidate sequence):
torch.Size([1, 57]) tensor([[ 86, 104, 116, 120, 104, 113, 102, 104, 63, 80, 74, 84, 72, 73,
89, 81, 84, 87, 90, 89, 81, 72, 73, 76, 79, 79, 74, 79,
86, 86, 71, 84, 81, 87, 84, 89, 73, 79, 73, 89, 79, 76,
79, 89, 80, 92, 76, 76, 87, 89, 89, 74, 81, 86, 79, 76,
79]], device='cuda:0')
0: Sequence<MGQEFVNQTWVNEFILLGLSSDQNTQVFLFVLILVMYIITVVGNSLILLLIRLDSRLHTPMYFFLSNLSFVDLCFSTTTVPQLLANFLSVHKSISFLGCVAQLYIFLTLGGTEFFLLGAMAYDRYVAVCYPLHYTVIMNWRVCTSLAVASWVSGFLNSLVHTVITFRLPFCGPNEIDHFFCEVPALLKLACADTSLNEMAMNACCVLILLIPFSLILISYTRILITILRMPSATGRRKAFSTCASHIIVVILFYGTAISTYIQPSSDPVADQDKLMALFYAILTPMLNPIIYSLRNKDVKGAWQKLLNKLRVTQKRKFMAVTLH>
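Because the generated text wraps the protein in the "Sequence<...>" task format, the amino acid sequence itself can be recovered by stripping the task delimiters. A minimal sketch, where the helper extract_sequence is illustrative and assumes the output closes with ">":

def extract_sequence(generated_text):
    # Strip the "Sequence<" prefix and the closing ">" to recover the amino acids
    inner = generated_text.split("Sequence<", 1)[-1]
    return inner.rstrip(">").strip()

sequence = extract_sequence(tokenizer.decode(sample_outputs[0], skip_special_tokens=True))
print(len(sequence), sequence)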
Citation
To cite this work:
@article{WeiKaplanBuehler_2023,
    title   = {Generative pretrained autoregressive transformer graph neural network applied to the analysis and discovery of novel proteins},
    author  = {M.J. Buehler},
    journal = {J. Appl. Phys.},
    year    = {2023},
    volume  = {},
    pages   = {},
    url     = {https://doi.org/10.1063/5.0157367}
}