rampasek/prot_bert_bfd_rosetta20aa

Text Classification

protein language model


rampasek committed on Mar 29, 2022

Commit 8aa5d9f • 1 Parent(s): d1f6c52

Update README.md

Files changed (1)

README.md +27 -1


@@ -1,8 +1,34 @@
 # ProtBert-BFD finetuned on Rosetta 20AA dataset
 This model is finetuned to predict Rosetta fold energy using a dataset of 100k 20AA sequences.
-The starting pretrained model is from ProtTrans.
 It was trained on protein sequences using a masked language modeling (MLM) objective. It was introduced in
 [this paper](https://doi.org/10.1101/2020.07.12.199554) and first released in
 [this repository](https://github.com/agemagician/ProtTrans).

+---
+language: protein
+tags:
+- protein language model
+datasets:
+- BFD
+- Custom Rosetta
+---
 # ProtBert-BFD finetuned on Rosetta 20AA dataset
 This model is finetuned to predict Rosetta fold energy using a dataset of 100k 20AA sequences.
+Current model in this repo: `prot_bert_bfd-finetuned-032722_1752`
+## Performance
+- 20AA sequences (1k eval set):\
+Metrics: 'mae': 0.090115, 'r2': 0.991208, 'mse': 0.013034, 'rmse': 0.114165
+- 40AA sequences (10k eval set):\
+Metrics: 'mae': 0.537456, 'r2': 0.659122, 'mse': 0.448607, 'rmse': 0.669781
+- 60AA sequences (10k eval set):\
+Metrics: 'mae': 0.629267, 'r2': 0.506747, 'mse': 0.622476, 'rmse': 0.788972
+## `prot_bert_bfd` from ProtTrans
+The starting pretrained model is from ProtTrans, trained on 2.1 billion proteins from BFD.
 It was trained on protein sequences using a masked language modeling (MLM) objective. It was introduced in
 [this paper](https://doi.org/10.1101/2020.07.12.199554) and first released in
 [this repository](https://github.com/agemagician/ProtTrans).
+> Created by [Ladislav Rampasek](https://rampasek.github.io)
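
The Performance section of the updated README reports four regression metrics (`mae`, `r2`, `mse`, `rmse`) without saying how they were computed. The sketch below shows one conventional way to obtain them from true and predicted fold energies. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the repository, so the author's evaluation code may differ in detail.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, R^2, MSE and RMSE for two equal-length lists of floats.

    Hypothetical helper for illustration; not the repository's evaluation code.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    errors = [pred - true for true, pred in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error
    rmse = math.sqrt(mse)                   # root mean squared error

    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mae": mae, "r2": r2, "mse": mse, "rmse": rmse}


# Made-up energies, only to show the call shape.
metrics = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(metrics)
```

Under these definitions `rmse` is the square root of `mse`, which agrees with the reported pairs to within rounding (for example, 0.114165 squared is approximately 0.013034).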