---
base_model: westlake-repl/SaProt_35M_AF2
library_name: peft
---
# Base model: [westlake-repl/SaProt_35M_AF2](https://huggingface.co/westlake-repl/SaProt_35M_AF2)

# Model Card for Model ID

This model predicts the fitness of GB1 protein variants.

### Task type
Protein-level regression

### Dataset description
The dataset is from:

Nicholas C Wu, Lei Dai, C Anders Olson, James O Lloyd-Smith, Ren Sun (2016). Adaptation in protein fitness landscapes is facilitated by indirect paths. eLife 5:e16965.
https://doi.org/10.7554/eLife.16965

The label is the fitness of the mutant protein. Each variant's fitness is expressed relative to the wild type, whose fitness is 1.
All labels are greater than 0, and a label greater than 1 indicates higher fitness than the wild type.

### Model input type
Amino acid sequence
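
A minimal sketch of preparing a plain amino acid sequence for a SaProt tokenizer. It rests on an assumption that this card does not confirm: that, as described in the upstream SaProt repository, each residue is paired with a structure token and `#` stands in when no structure is available. The helper name `to_saprot_tokens` is hypothetical.

```python
def to_saprot_tokens(aa_sequence: str) -> str:
    """Interleave each residue with '#', the assumed placeholder for a missing structure token."""
    seq = aa_sequence.strip().upper()
    return "".join(residue + "#" for residue in seq)


# Example: a three-residue fragment becomes three residue/placeholder pairs.
example = to_saprot_tokens("MEV")
```

If the model was trained with a different preprocessing step, that step should be used instead.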

### Performance
test_spearman: 0.54

test_pearson: 0.98
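
For readers comparing the two metrics, here is a dependency-free sketch of how Pearson and Spearman correlations are computed. The data is made up for illustration and is not drawn from the GB1 dataset; it only shows that the two metrics can diverge when a few large values dominate the linear fit while the ordering of the remaining points is poor.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical labels and predictions, for illustration only.
labels = [0.0, 0.1, 0.2, 0.3, 8.0]
preds = [0.2, 0.1, 0.3, 0.0, 7.5]
toy_pearson = pearson(labels, preds)
toy_spearman = spearman(labels, preds)
```

On this toy data the Pearson value is much higher than the Spearman value, because one high-fitness point is predicted well while the low-fitness points are ranked poorly.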

### LoRA config
lora_dropout: 0.0

lora_alpha: 16

target_modules: ["query", "key", "value", "intermediate.dense", "output.dense"]

modules_to_save: ["classifier"]
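
The values above map onto `peft.LoraConfig` roughly as follows. This is a sketch, not the configuration file used for training: the card does not state the LoRA rank, so `rank` is left as a required argument rather than guessed, and the helper name `build_lora_config` is hypothetical.

```python
# Values copied from the "LoRA config" section of this card.
LORA_KWARGS = {
    "lora_alpha": 16,
    "lora_dropout": 0.0,
    "target_modules": ["query", "key", "value", "intermediate.dense", "output.dense"],
    "modules_to_save": ["classifier"],
}


def build_lora_config(rank):
    """Build a peft.LoraConfig; `rank` must be supplied because the card does not list it."""
    from peft import LoraConfig  # imported lazily so the dict is usable without peft installed

    return LoraConfig(r=rank, **LORA_KWARGS)
```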

### Training config
class: AdamW

betas: (0.9, 0.98)

weight_decay: 0.01

learning rate: 1e-3

epoch: 20

batch size: 1000

precision: 16-mixed
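
A sketch of how these settings could be expressed in PyTorch. The optimizer arguments come directly from the list above; the constant names and the `build_optimizer` helper are hypothetical. The string `"16-mixed"` matches the precision format accepted by PyTorch Lightning's `Trainer`, but the card does not say which training framework was used, so that part is an assumption.

```python
# Values copied from the "Training config" section of this card.
OPTIMIZER_KWARGS = {"lr": 1e-3, "betas": (0.9, 0.98), "weight_decay": 0.01}
MAX_EPOCHS = 20
BATCH_SIZE = 1000
PRECISION = "16-mixed"  # assumed to be a PyTorch Lightning precision string


def build_optimizer(parameters):
    """Create an AdamW optimizer over the given trainable parameters."""
    import torch  # imported lazily so the constants are usable without torch installed

    return torch.optim.AdamW(parameters, **OPTIMIZER_KWARGS)
```

With a PEFT-wrapped model, `parameters` would typically be the parameters that still require gradients, for example `(p for p in model.parameters() if p.requires_grad)`.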