evankomp committed 2a53964 (parent: 407f706): Create README.md

__Purpose__: Classifies a protein sequence as thermophilic (> 60 °C) or mesophilic (< 40 °C), with labels derived from the growth temperature of the protein's host organism.
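The labeling rule above can be sketched as a small function. This is a minimal illustration, assuming the host organism's growth temperature is known in °C and that proteins from hosts growing between 40 °C and 60 °C are left unlabeled rather than assigned to either class; the function name `label_from_growth_temp` is hypothetical and not taken from the training repository.

```python
from typing import Optional

# Thresholds taken from the class definitions above (degrees Celsius).
THERMOPHILE_MIN_C = 60.0
MESOPHILE_MAX_C = 40.0


def label_from_growth_temp(growth_temp_c: float) -> Optional[int]:
    """Return 1 (thermophilic), 0 (mesophilic), or None if the host falls in between.

    Hypothetical helper: assumes hosts between 40 and 60 C are excluded
    rather than assigned to either class.
    """
    if growth_temp_c > THERMOPHILE_MIN_C:
        return 1
    if growth_temp_c < MESOPHILE_MAX_C:
        return 0
    return None


# Illustrative host temperatures (not values from learn2therm).
for temp in (37.0, 50.0, 75.0):
    print(temp, label_from_growth_temp(temp))
```

Returning `None` for the intermediate range keeps ambiguous hosts out of the training set instead of forcing them into a class.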

__Training__:
ProtBert (Rostlab/prot_bert) was fine-tuned on a class-balanced version of learn2therm (see [here]()), comprising about 250k protein amino acid sequences.
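The README does not say how the classes were balanced. One common approach, shown here only as a hypothetical sketch, is to randomly downsample each class to the size of the smallest class. The function `balance_by_downsampling` and the `(sequence, label)` record format are assumptions for illustration, not the method used in the training repository.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, int]  # (amino acid sequence, label): hypothetical format


def balance_by_downsampling(records: List[Record], seed: int = 0) -> List[Record]:
    """Randomly downsample every class to the size of the smallest class."""
    by_label: Dict[int, List[Record]] = defaultdict(list)
    for seq, label in records:
        by_label[label].append((seq, label))
    n_min = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # seeded so the subsample is reproducible
    balanced: List[Record] = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n_min))
    rng.shuffle(balanced)
    return balanced


data = [("MKV", 0), ("MKL", 0), ("MKT", 0), ("MAA", 1)]
print(balance_by_downsampling(data))
```

Downsampling discards majority-class data; class weighting in the loss is an alternative that keeps every sequence.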

Training parameters below:
TODO

See the [training repository](https://github.com/BeckResearchLab/learn2thermML) for code.

__Usage__:
Prepare sequences the same way as for the original pretrained model: residues separated by single spaces, with U, Z, O and B replaced by X.

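```python
import re

import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("evankomp/learn2therm", do_lower_case=False)
model = BertForSequenceClassification.from_pretrained("evankomp/learn2therm")
model.eval()  # disable dropout for inference

# ProtBert expects residues separated by single spaces
sequence_Example = "A E T C Z A O"
# Map the rare/ambiguous amino acids U, Z, O, B to X, as for the original model
sequence_Example = re.sub(r"[UZOB]", "X", sequence_Example)
encoded_input = tokenizer(sequence_Example, return_tensors="pt")

with torch.no_grad():
    logits = model(**encoded_input).logits  # shape: (batch_size, 2)
output = torch.argmax(logits, dim=1)  # predicted class index per sequence
```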
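The example above starts from a sequence that is already space-separated. For a raw, contiguous sequence string, a minimal preparation helper might look like the following. `prepare_sequence` is a hypothetical name, and upper-casing the input is an assumption based on the tokenizer being loaded with `do_lower_case=False`.

```python
import re


def prepare_sequence(raw: str) -> str:
    """Convert a raw protein sequence into ProtBert-style input.

    Hypothetical helper: strips whitespace, upper-cases residues, replaces
    U, Z, O and B with X, and separates residues by single spaces.
    """
    seq = re.sub(r"\s+", "", raw).upper()
    seq = re.sub(r"[UZOB]", "X", seq)
    return " ".join(seq)


print(prepare_sequence("aetczao"))  # A E T C X A X
```

The result can be passed directly to `tokenizer(...)` in place of `sequence_Example`.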

A predicted class of 1 indicates thermophilic; 0 indicates mesophilic.
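To report a class name and a confidence alongside the predicted index, the two logits can be passed through a softmax. The sketch below does this in plain Python on a list of two logits for one sequence (for example, `model(**encoded_input).logits[0].tolist()`). The `LABELS` mapping simply restates the convention above, and the function name `interpret_logits` is hypothetical.

```python
import math
from typing import List, Tuple

# Label convention stated above: 0 = mesophilic, 1 = thermophilic.
LABELS = {0: "mesophilic", 1: "thermophilic"}


def interpret_logits(logits: List[float]) -> Tuple[str, float]:
    """Return (class name, softmax probability of that class) for two logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[idx], probs[idx]


print(interpret_logits([-1.2, 2.3]))
```

The softmax probability is only a rough confidence signal; it is not calibrated unless the model was explicitly calibrated.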

Files changed (1): `README.md` (added, +6 -0). The added file contains only YAML front matter declaring `license: mit` and the tags `protein` and `thermostability`.