---
license: mit
---

**Base model:** [westlake-repl/SaProt_650M_AF2](https://huggingface.co/westlake-repl/SaProt_650M_AF2)

**Task type:** protein-level regression

**Dataset:** This dataset contains over 100K mutants derived from the wild-type EYFP protein. The training, validation, and test sets contain 100,317, 5,969, and 5,968 samples. Validation and test data were drawn from 10% of the double-site mutants and 10% of the triple-site mutants, respectively; all remaining mutants were used for training. This model was trained by Jia Zheng's lab at Westlake University. The dataset will be released by this team at a later date.
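The split sizes can be sanity-checked in a few lines. The sketch below also derives the number of optimizer steps per epoch. It assumes that the batch size of 64 and the 20 epochs from the training config are global values (no gradient accumulation, no per-device split) and that the last partial batch is kept. Both assumptions are not stated in this card, and the helper names are hypothetical.

```python
import math

# Split sizes as reported in this card.
SPLITS = {"train": 100_317, "valid": 5_969, "test": 5_968}
BATCH_SIZE = 64  # from the training config (assumed to be the global batch size)
EPOCHS = 20      # from the training config


def split_summary(splits):
    """Return the total sample count and each split's fraction of it."""
    total = sum(splits.values())
    return total, {name: n / total for name, n in splits.items()}


def optimizer_steps(n_train, batch_size, epochs, drop_last=False):
    """Steps per epoch and total steps; a partial final batch counts unless drop_last."""
    per_epoch = n_train // batch_size if drop_last else math.ceil(n_train / batch_size)
    return per_epoch, per_epoch * epochs


total, fractions = split_summary(SPLITS)
per_epoch, total_steps = optimizer_steps(SPLITS["train"], BATCH_SIZE, EPOCHS)
print(total, per_epoch, total_steps)  # → 112254 1568 31360
```

If the last partial batch were dropped instead, the count would be 1,567 steps per epoch.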

**Model input type:** Amino acid sequence
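SaProt's vocabulary pairs each residue with a Foldseek 3Di structure token. When only the amino acid sequence is available, the structure slot is commonly filled with the `#` placeholder. The sketch below assumes that convention. The helper `to_sa_sequence` is hypothetical, and this card does not state whether this exact preprocessing was used for this model.

```python
# Standard 20 amino acids (uppercase one-letter codes).
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def to_sa_sequence(aa_seq: str, placeholder: str = "#") -> str:
    """Hypothetical helper: pair every residue with a placeholder structure token.

    Assumes the SaProt convention that '#' marks a missing/masked structure token.
    """
    seq = aa_seq.strip().upper()
    unknown = sorted(set(seq) - STANDARD_AA)
    if unknown:
        raise ValueError(f"non-standard residues: {''.join(unknown)}")
    return "".join(res + placeholder for res in seq)


print(to_sa_sequence("mvskge"))  # → M#V#S#K#G#E#
```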

**Performance (on test set):** 0.94 Spearman's ρ
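Spearman's ρ is the Pearson correlation between the ranks of the predicted and measured values, so it rewards getting the ordering of mutants right regardless of scale. The following dependency-free sketch computes it with average ranks for ties, which matches the standard definition and `scipy.stats.spearmanr`. The card does not say how ties were handled in the reported evaluation, and the function names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the rank vectors of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


print(round(spearman_rho([1, 2, 3, 4], [1, 3, 2, 4]), 3))  # → 0.8
```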

**LoRA config:**
- **r:** 8
- **lora_dropout:** 0.0
- **lora_alpha:** 16
- **target_modules:** ["query", "key", "value", "intermediate.dense", "output.dense"]
- **modules_to_save:** ["classifier"]
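The LoRA settings can be illustrated with a dependency-free sketch of a single LoRA-adapted linear layer. The frozen weight `W` is augmented with a rank-`r` update `B·A` scaled by `lora_alpha / r`, which is 16 / 8 = 2 here. Because `lora_dropout` is 0.0, no dropout is applied to the adapter input. This is a didactic sketch of the general LoRA technique, not the PEFT code used for training. The `LoRALinear` class and its Gaussian initialisation scale are hypothetical. Each module matched by `target_modules` would receive such an adapter, while `classifier` (listed in `modules_to_save`) is trained in full.

```python
import random

LORA_R = 8       # r from the config above
LORA_ALPHA = 16  # lora_alpha from the config above
SCALING = LORA_ALPHA / LORA_R  # the low-rank update is scaled by alpha / r (= 2.0)


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """y = W x + (alpha / r) * B (A x); W is frozen, only A and B would be trained."""

    def __init__(self, weight, r=LORA_R, alpha=LORA_ALPHA, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen base weight, shape (d_out, d_in)
        d_out, d_in = len(weight), len(weight[0])
        # A gets small random values (hypothetical scale); B starts at zero,
        # so the adapted layer initially behaves exactly like the base layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scaling = alpha / r

    def __call__(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scaling * u for b, u in zip(base, update)]

    def num_trainable(self):
        # Only the low-rank factors are trainable: r * (d_in + d_out) values.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]])
print(layer([1.0, 1.0]))  # B is zero at init, so this equals W x: [3.0, 7.0]
```

The zero-initialised `B` means fine-tuning starts from the base model's behaviour. The trainable parameter count grows with `r * (d_in + d_out)` rather than `d_in * d_out`.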

**Training config:**

- **optimizer:**
  - **class:** AdamW
  - **betas:** (0.9, 0.98)
  - **weight_decay:** 0.01
- **learning rate:** 1e-4
- **epochs:** 20
- **batch size:** 64
- **precision:** 16-mixed
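The optimizer settings correspond to the AdamW update rule, which is Adam with decoupled weight decay. The following minimal pure-Python sketch applies it to a flat list of scalar parameters, mirroring the order of operations used by PyTorch's `torch.optim.AdamW`. It is not that implementation, and mixed precision and batching are not modelled. `eps = 1e-8` is an assumption (PyTorch's default), since the card does not state it.

```python
import math


class AdamW:
    """Illustrative AdamW over a list of floats (not the PyTorch implementation)."""

    def __init__(self, params, lr=1e-4, betas=(0.9, 0.98), eps=1e-8, weight_decay=0.01):
        self.params = list(params)
        self.lr = lr
        self.beta1, self.beta2 = betas
        self.eps = eps  # assumed; not given in the card
        self.weight_decay = weight_decay
        self.m = [0.0] * len(self.params)  # first-moment estimates
        self.v = [0.0] * len(self.params)  # second-moment estimates
        self.t = 0

    def step(self, grads):
        self.t += 1
        bc1 = 1 - self.beta1 ** self.t  # bias corrections for the zero-initialised moments
        bc2 = 1 - self.beta2 ** self.t
        for i, (p, g) in enumerate(zip(self.params, grads)):
            # Decoupled weight decay: shrink the weight directly, not via the gradient.
            p -= self.lr * self.weight_decay * p
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / bc1
            v_hat = self.v[i] / bc2
            p -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)
            self.params[i] = p
        return self.params


opt = AdamW([1.0])
print(opt.step([0.5]))  # ≈ [0.999899]: decay of 1e-6 plus an Adam step of ≈ lr
```

After bias correction, the first Adam step has magnitude close to `lr` whatever the gradient's scale. That makes the learning rate of 1e-4 a direct bound on early per-step weight changes.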