---
language: protein
tags:
- protein language model
datasets:
- BFD
- Custom Rosetta
---

# ProtBert-BFD finetuned on Rosetta 20AA dataset

This model is finetuned to predict Rosetta fold energy. It was trained on a dataset of 100k sequences of length 20 amino acids (20AA).

Current model in this repo: `prot_bert_bfd-finetuned-032722_1752`
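
A minimal usage sketch is shown below. It assumes the checkpoint uses the standard Hugging Face `BertForSequenceClassification` single-output regression head and that the checkpoint path points at this repo; the actual regression head used for finetuning is not documented here, so treat this as an illustration rather than the definitive interface:

```python
import re
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Placeholder path: substitute the actual location of this checkpoint.
checkpoint = "prot_bert_bfd-finetuned-032722_1752"

# ProtBert expects uppercase, space-separated amino acids,
# with uncommon residues (U, Z, O, B) mapped to X.
tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert_bfd", do_lower_case=False)
model = BertForSequenceClassification.from_pretrained(checkpoint, num_labels=1)
model.eval()

seq = "MKTAYIAKQRQISFVKSHFS"  # example 20AA sequence
seq = " ".join(re.sub(r"[UZOB]", "X", seq))

inputs = tokenizer(seq, return_tensors="pt")
with torch.no_grad():
    energy = model(**inputs).logits.item()  # predicted Rosetta fold energy
print(energy)
```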

## Performance

| Eval set             | MAE      | R²       | MSE      | RMSE     |
|----------------------|----------|----------|----------|----------|
| 20AA sequences (1k)  | 0.090115 | 0.991208 | 0.013034 | 0.114165 |
| 40AA sequences (10k) | 0.537456 | 0.659122 | 0.448607 | 0.669781 |
| 60AA sequences (10k) | 0.629267 | 0.506747 | 0.622476 | 0.788972 |


## `prot_bert_bfd` from ProtTrans
The starting pretrained model is from ProtTrans: it was pretrained on 2.1 billion protein sequences from the BFD database with a masked language modeling (MLM) objective. It was introduced in
[this paper](https://doi.org/10.1101/2020.07.12.199554) and first released in
[this repository](https://github.com/agemagician/ProtTrans).
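
For reference, the base model can be queried directly with the standard Hugging Face `fill-mask` pipeline, using the published `Rostlab/prot_bert_bfd` checkpoint (the masked sequence below is just an example):

```python
from transformers import BertForMaskedLM, BertTokenizer, pipeline

tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert_bfd", do_lower_case=False)
model = BertForMaskedLM.from_pretrained("Rostlab/prot_bert_bfd")

# Amino acids are given as uppercase, space-separated tokens.
unmasker = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(unmasker("D L I P T S S K L V V [MASK] D T S L Q V K K A F F A L V T"))
```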

> Created by [Ladislav Rampasek](https://rampasek.github.io)