avuhong/PiccoviralesGPT

avuhong committed on Mar 16, 2023

Commit 83386b9 • 1 Parent(s): 63b1277

Update README.md

Files changed (1)

README.md +39 -1

@@ -7,6 +7,9 @@ metrics:
 model-index:
 - name: output_v3
   results: []
+widget:
+  - text: >-
+      <|endoftext|>MAADGYLPDWLEDNLSEGIREWWALKPGAPQPKANQQHQDNARGLVLPGYKYLGPGNGL
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -25,7 +28,42 @@ More information needed
 ## Intended uses & limitations
-More information needed
+### Generate novel sequences for viral capsid proteins
+### Calculate the perplexity of a protein sequence
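+```python
+import math
+
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+# Assumed setup: the original snippet does not show how `device`, `tokenizer`
+# and `model` are created; loading them from this repository is one option.
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+tokenizer = AutoTokenizer.from_pretrained("avuhong/PiccoviralesGPT")
+model = AutoModelForCausalLM.from_pretrained("avuhong/PiccoviralesGPT").to(device)
+model.eval()
+
+def calculatePerplexity(sequence, model, tokenizer):
+    # Perplexity is exp() of the mean token-level cross-entropy loss.
+    input_ids = torch.tensor(tokenizer.encode(sequence)).unsqueeze(0)
+    input_ids = input_ids.to(device)
+    with torch.no_grad():
+        outputs = model(input_ids, labels=input_ids)
+    loss = outputs[0]  # outputs[1] holds the logits, unused here
+    return math.exp(loss.item())
+
+def split_sequence(sequence):
+    # Wrap the sequence into 60-character lines bracketed by '<|endoftext|>'.
+    chunks = []
+    max_i = 0
+    for i in range(0, len(sequence), 60):
+        chunk = sequence[i:i+60]
+        if i == 0:
+            # Note: [:-1] drops the 60th residue of the first line (kept as in the original).
+            chunk = '<|endoftext|>' + chunk[:-1]
+        chunks.append(chunk)
+        max_i = i
+    chunks = '\n'.join(chunks)
+    # With 60-character chunks this condition never holds, so the else branch always runs.
+    if max_i+61==len(sequence):
+        chunks = chunks+"\n<|endoftext|>"
+    else:
+        chunks = chunks+"<|endoftext|>"
+    return chunks
+
+seq = "MAADGYLPDWLEDNLSEGIREWWALKPGAPQPKANQQHQDNARGLVLPGYKYLGPGNGLDKGEPVNAADAAALEHDKAYDQQLKAGDNPYLKYNHADAEFQERLKEDTSFGGNLGRAVFQAKKRLLEPLGLVEEAAKTAPGKKRPVEQSPQEPDSSAGIGKSGAQPAKKRLNFGQTGDTESVPDPQPIGEPPAAPSGVGSLTMASGGGAPVADNNEGADGVGSSSGNWHCDSQWLGDRVITTSTRTWALPTYNNHLYKQISNSTSGGSSNDNAYFGYSTPWGYFDFNRFHCHFSPRDWQRLINNNWGFRPKRLNFKLFNIQVKEVTDNNGVKTIANNLTSTVQVFTDSDYQLPYVLGSAHEGCLPPFPADVFMIPQYGYLTLNDGSQAVGRSSFYCLEYFPSQMLRTGNNFQFSYEFENVPFHSSYAHSQSLDRLMNPLIDQYLYYLSKTINGSGQNQQTLKFSVAGPSNMAVQGRNYIPGPSYRQQRVSTTVTQNNNSEFAWPGASSWALNGRNSLMNPGPAMASHKEGEDRFFPLSGSLIFGKQGTGRDNVDADKVMITNEEEIKTTNPVATESYGQVATNHQSAQAQAQTGWVQNQGILPGMVWQDRDVYLQGPIWAKIPHTDGNFHPSPLMGGFGMKHPPPQILIKNTPVPADPPTAFNKDKLNSFITQYSTGQVSVEIEWELQKENSKRWNPEIQYTSNYYKSNNVEFAVNTEGVYSEPRPIGTRYLTRNL"
+seq = split_sequence(seq)
+print(f"{calculatePerplexity(seq, model, tokenizer):.2f}")
+```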
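The perplexity reported by the snippet above is the exponential of the mean per-token negative log-likelihood, which is what a Hugging Face causal language model returns as `loss` when `labels` are passed. The following is a minimal, model-free sketch of that arithmetic. The helper name `perplexity_from_log_probs` and the toy probabilities are hypothetical and not part of this repository.

```python
import math


def perplexity_from_log_probs(token_log_probs):
    """Return exp(mean negative log-likelihood) for a list of per-token log-probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Toy inputs (illustrative only): a model that assigns probability 0.25 to
# every observed token versus one that assigns probability 0.9.
uncertain = [math.log(0.25)] * 10
confident = [math.log(0.9)] * 10

print(f"{perplexity_from_log_probs(uncertain):.2f}")
print(f"{perplexity_from_log_probs(confident):.2f}")
```

Lower perplexity means the model finds the sequence more predictable. A model that spreads its probability evenly over k choices at every position has a perplexity of k.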
 ## Training and evaluation data