
---
language: protein
tags:
- protein language model
datasets:
- BFD
- Custom Rosetta
---

# ProtBert-BFD fine-tuned on the Rosetta 20AA dataset

This model was fine-tuned on a dataset of 100k 20-residue (20AA) sequences to predict their Rosetta fold energy.

The checkpoint currently in this repository is `prot_bert_bfd-finetuned-032722_1752`.
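A minimal inference sketch follows. It assumes the checkpoint is published on the Hugging Face Hub as `rampasek/prot_bert_bfd_rosetta20aa`, ships its tokenizer files, and has a single-output regression head loadable through `AutoModelForSequenceClassification`; this card does not confirm any of these. The input formatting (uppercase residues separated by spaces, rare residues U, Z, O and B mapped to X) follows the convention used in ProtTrans examples for ProtBert models. `prepare_sequence` and `predict_energy` are hypothetical helper names.

```python
import re


def prepare_sequence(seq: str) -> str:
    """Format a raw amino-acid string in the ProtBert style: uppercase,
    rare residues (U, Z, O, B) replaced by X, residues separated by spaces."""
    seq = re.sub(r"[UZOB]", "X", seq.upper())
    return " ".join(seq)


def predict_energy(sequences, model_id="rampasek/prot_bert_bfd_rosetta20aa"):
    """Hypothetical helper: score sequences with the fine-tuned checkpoint.

    Assumes the repo provides a tokenizer and a one-output regression head.
    """
    # Imported lazily so prepare_sequence works without torch/transformers.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    batch = tokenizer(
        [prepare_sequence(s) for s in sequences],
        padding=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**batch).logits  # shape (batch, 1) for a regression head
    return logits.squeeze(-1).tolist()


if __name__ == "__main__":
    print(prepare_sequence("mktayiakqr"))
    # Uncomment to download the checkpoint and score a 20AA sequence:
    # print(predict_energy(["MKTAYIAKQRQISFVKSHFS"]))
```

Calling `prepare_sequence("mktayiakqr")` returns `"M K T A Y I A K Q R"`. The `torch` and `transformers` imports sit inside `predict_energy` so the formatting step can be reused without those packages installed.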

## Performance

The model was fine-tuned only on 20AA sequences, so the 40AA and 60AA results measure how well it generalizes to longer sequences.

- 20AA sequences (1k eval set):\
Metrics: MAE 0.090115, R² 0.991208, MSE 0.013034, RMSE 0.114165

- 40AA sequences (10k eval set):\
Metrics: MAE 0.537456, R² 0.659122, MSE 0.448607, RMSE 0.669781

- 60AA sequences (10k eval set):\
Metrics: MAE 0.629267, R² 0.506747, MSE 0.622476, RMSE 0.788972


## `prot_bert_bfd` from ProtTrans

The pretrained starting point is `prot_bert_bfd` from ProtTrans, trained with a masked language modeling (MLM) objective on 2.1 billion protein sequences from BFD. It was introduced in [this paper](https://doi.org/10.1101/2020.07.12.199554) and first released in [this repository](https://github.com/agemagician/ProtTrans).
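To illustrate what the MLM objective asks of the model, here is a generic BERT-style masking sketch: about 15% of residues are selected, and of those 80% become `[MASK]`, 10% become a random residue and 10% stay unchanged; the model is trained to recover the original residue at each selected position. These are the standard BERT settings, assumed here for illustration; this is not the ProtTrans training code, and `mask_for_mlm` is a hypothetical name.

```python
import random


def mask_for_mlm(tokens, mask_prob=0.15, vocab=None, mask_token="[MASK]", rng=None):
    """Generic BERT-style masking of a residue list (illustrative only).

    Returns (inputs, labels): labels hold the original residue at selected
    positions and None elsewhere (positions the MLM loss ignores).
    """
    rng = rng or random.Random()
    vocab = vocab or list("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must predict this residue
            r = rng.random()
            if r < 0.8:
                inputs.append(mask_token)  # 80%: replace with [MASK]
            elif r < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: random residue
            else:
                inputs.append(tok)  # 10%: keep the original residue
        else:
            labels.append(None)
            inputs.append(tok)
    return inputs, labels
```

Passing a seeded `random.Random` as `rng` makes the masking reproducible, which is convenient for inspecting examples.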

> Created by [Ladislav Rampasek](https://rampasek.github.io)