rampasek committed
Commit 8aa5d9f (parent d1f6c52)

Update README.md

Files changed (1): README.md (+27, -1)
README.md CHANGED

---
language: protein
tags:
- protein language model
datasets:
- BFD
- Custom Rosetta
---

# ProtBert-BFD finetuned on Rosetta 20AA dataset

This model is finetuned to predict Rosetta fold energy using a dataset of 100k 20AA sequences.

Current model in this repo: `prot_bert_bfd-finetuned-032722_1752`
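
Below is a minimal usage sketch. It assumes the checkpoint loads as a single-output `BertForSequenceClassification` regression head and follows ProtBert's space-separated amino-acid input convention; the local path and example sequence are illustrative, not taken from this repo.

```python
# Minimal sketch (assumptions: one-output BertForSequenceClassification
# regression head; ProtBert-style space-separated amino acids).
import torch
from transformers import BertTokenizer, BertForSequenceClassification

model_dir = "prot_bert_bfd-finetuned-032722_1752"  # hypothetical local checkpoint path

tokenizer = BertTokenizer.from_pretrained(model_dir, do_lower_case=False)
model = BertForSequenceClassification.from_pretrained(model_dir, num_labels=1)
model.eval()

sequence = " ".join("MKTAYIAKQRQISFVKSHFS")  # illustrative 20AA sequence
inputs = tokenizer(sequence, return_tensors="pt")

with torch.no_grad():
    energy = model(**inputs).logits.squeeze().item()  # predicted Rosetta fold energy
print(f"Predicted fold energy: {energy:.4f}")
```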

## Performance

| Eval set | MAE | R² | MSE | RMSE |
|---|---|---|---|---|
| 20AA sequences (1k eval set) | 0.090115 | 0.991208 | 0.013034 | 0.114165 |
| 40AA sequences (10k eval set) | 0.537456 | 0.659122 | 0.448607 | 0.669781 |
| 60AA sequences (10k eval set) | 0.629267 | 0.506747 | 0.622476 | 0.788972 |
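
These are standard regression metrics; as a reference, a sketch of how they can be computed with scikit-learn (the tooling is an assumption, not stated in this repo):

```python
# Sketch: regression metrics as reported above, via scikit-learn (assumed
# tooling; y_true = reference Rosetta energies, y_pred = model outputs).
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def regression_metrics(y_true, y_pred):
    mse = mean_squared_error(y_true, y_pred)
    return {
        "mae": mean_absolute_error(y_true, y_pred),
        "r2": r2_score(y_true, y_pred),
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
    }
```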

## `prot_bert_bfd` from ProtTrans

The starting pretrained model is `prot_bert_bfd` from ProtTrans, trained on 2.1 billion protein sequences from BFD with a masked language modeling (MLM) objective. It was introduced in
[this paper](https://doi.org/10.1101/2020.07.12.199554) and first released in
[this repository](https://github.com/agemagician/ProtTrans).
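
The base checkpoint is published on the Hugging Face Hub as `Rostlab/prot_bert_bfd`; below is a fill-mask sketch in the style of the ProtTrans examples (the input sequence is illustrative, not taken from this repo):

```python
# Sketch of MLM inference with the base ProtBert-BFD model
# (model id Rostlab/prot_bert_bfd from the ProtTrans release).
from transformers import BertForMaskedLM, BertTokenizer, pipeline

tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert_bfd", do_lower_case=False)
model = BertForMaskedLM.from_pretrained("Rostlab/prot_bert_bfd")

unmasker = pipeline("fill-mask", model=model, tokenizer=tokenizer)
# Amino acids are space-separated; [MASK] marks the residue to predict.
print(unmasker("D L I P T S S K L V V [MASK] D T S L Q V K K A F F A L V T"))
```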

> Created by [Ladislav Rampasek](https://rampasek.github.io)