learn2therm / README.md
metadata
license: mit
tags:
  - protein
  - thermostability

Purpose: classifies a protein sequence as thermophilic (> 60 °C) or mesophilic (< 40 °C), based on the growth temperature of its host organism.
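The labeling rule above can be sketched as a small function. This is only an illustration of the stated thresholds, not the code used to build the dataset: the name `label_from_growth_temp` is hypothetical, and leaving organisms in the 40-60 °C gap (and exactly at either boundary) unlabeled is an assumption. The integer encoding (1 for thermophilic, 0 for mesophilic) follows the output convention given at the end of this README.

```python
from typing import Optional


def label_from_growth_temp(temp_c: float) -> Optional[int]:
    """Map a host organism growth temperature (deg C) to a class label.

    Hypothetical helper: returns 1 (thermophilic) above 60 C,
    0 (mesophilic) below 40 C, and None otherwise.
    Returning None for the 40-60 C gap is an assumption.
    """
    if temp_c > 60.0:
        return 1
    if temp_c < 40.0:
        return 0
    return None
```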

Training: ProtBERT (Rostlab/prot_bert) was fine-tuned on a class-balanced version of learn2therm (see here), about 250k protein amino acid sequences.

Training parameters below: TODO

See the training repository for code.

Usage: Prepare sequences exactly as you would for the original pretrained model:

from transformers import BertForSequenceClassification, BertTokenizer
import torch
import re
# Residues are uppercase letters, so the tokenizer must not lowercase them
tokenizer = BertTokenizer.from_pretrained("evankomp/learn2therm", do_lower_case=False)
model = BertForSequenceClassification.from_pretrained("evankomp/learn2therm")
# Residues must be separated by spaces
sequence_Example = "A E T C Z A O"
# Map rare or ambiguous amino acids (U, Z, O, B) to X
sequence_Example = re.sub(r"[UZOB]", "X", sequence_Example)
encoded_input = tokenizer(sequence_Example, return_tensors='pt')
# The model returns an output object; take the argmax over its logits
output = torch.argmax(model(**encoded_input).logits, dim=1)

1 indicates thermophilic, 0 mesophilic.
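
Sequences are often stored without spaces between residues, so a small helper can bring them into the form used above. The following is a minimal sketch, not part of the model repository: `prepare_sequence` and `LABELS` are hypothetical names, and stripping whitespace and uppercasing the input are assumptions about what a raw sequence may look like.

```python
import re

# Hypothetical mapping from predicted class index to a readable name,
# following the convention stated above (1 thermophilic, 0 mesophilic).
LABELS = {0: "mesophilic", 1: "thermophilic"}


def prepare_sequence(raw: str) -> str:
    """Return a space-separated, uppercase sequence with U, Z, O, B mapped to X."""
    # Drop any existing whitespace and normalize case (assumption).
    cleaned = re.sub(r"\s+", "", raw).upper()
    # Replace rare or ambiguous amino acids, as in the usage example.
    cleaned = re.sub(r"[UZOB]", "X", cleaned)
    # Insert a single space between residues.
    return " ".join(cleaned)


print(prepare_sequence("AETCZAO"))  # A E T C X A X
```

The returned string can be passed to the tokenizer in place of `sequence_Example`, and `LABELS[int(output)]` turns a single prediction into a class name.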