---
license: mit
tags:
- protein
- thermostability
---

__Purpose__: Classifies a protein sequence as Thermophilic (>60 °C) or Mesophilic (<40 °C), based on the growth temperature of its host organism.

__Training__: ProtBert (Rostlab/prot_bert) was fine-tuned on a class-balanced version of learn2therm (see [here]()), about 250k protein amino acid sequences.

Training parameters: TODO

See the [training repository](https://github.com/BeckResearchLab/learn2thermML) for code.

__Usage__: Prepare sequences exactly as for the original pretrained model. Residues must be uppercase and separated by spaces, and the rare amino acids U, Z, O and B are mapped to X:

```python
import re

import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("evankomp/learn2therm", do_lower_case=False)
model = BertForSequenceClassification.from_pretrained("evankomp/learn2therm")
model.eval()

# Space-separated residues; rare amino acids U, Z, O, B become X
sequence_Example = "A E T C Z A O"
sequence_Example = re.sub(r"[UZOB]", "X", sequence_Example)

encoded_input = tokenizer(sequence_Example, return_tensors="pt")
with torch.no_grad():
    output = model(**encoded_input)
# The model returns a SequenceClassifierOutput; take the argmax over its logits
prediction = torch.argmax(output.logits, dim=1)
```

`1` indicates thermophilic, `0` mesophilic.
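The usage example works on a single hard-coded, already-spaced sequence. Below is a minimal sketch of two helpers for real inputs. `prepare_sequence` and `decode_logits` are hypothetical names and are not part of the model repository. The sketch assumes the label convention stated in this card (1 = thermophilic, 0 = mesophilic) and that the model emits two logits per sequence. It is plain Python, so it runs without `torch` or `transformers`.

```python
import math
import re
from typing import Dict, Sequence

# Assumed label mapping, taken from this card: 1 = thermophilic, 0 = mesophilic
ID2LABEL: Dict[int, str] = {0: "mesophilic", 1: "thermophilic"}


def prepare_sequence(raw: str) -> str:
    """Format a raw amino-acid string the way the ProtBert tokenizer expects.

    Strips all whitespace, uppercases, maps the rare residues U, Z, O and B
    to X, then separates every residue with a single space.
    """
    residues = re.sub(r"\s+", "", raw).upper()
    residues = re.sub(r"[UZOB]", "X", residues)
    return " ".join(residues)


def decode_logits(logits: Sequence[float]) -> Dict[str, object]:
    """Turn one row of two-class logits into a label plus softmax probabilities."""
    shift = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "probabilities": probs}
```

With the model loaded as in the usage section, you would pass `prepare_sequence(raw)` to the tokenizer and then call `decode_logits(output.logits[0].tolist())`. For example, `prepare_sequence("aetczao")` yields `"A E T C X A X"`. Inside a PyTorch pipeline, `torch.softmax(output.logits, dim=-1)` computes the same probabilities directly.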