---
base_model: westlake-repl/SaProt_35M_AF2
library_name: peft
---
# Base model: [westlake-repl/SaProt_35M_AF2](https://huggingface.co/westlake-repl/SaProt_35M_AF2)
# Model Card for Model ID
This model predicts the fitness of mutants of the β-subunit of tryptophan synthase (TrpB).
TrpB synthesizes L-tryptophan (Trp) from indole and L-serine (Ser).
TrpB variant Tm9D8*, derived from the hyperthermophile Thermotoga maritima, was selected as the parent enzyme.
Tm9D8* differs from wildtype TmTrpB by ten amino acid substitutions (P19G, E30G, I69V, K96L, P140L, N167D, I184F, L213P, G228S, and T292S).
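
The substitution codes above follow the usual `<wild-type residue><position><mutant residue>` notation with 1-based positions. As a minimal sketch of how such codes can be parsed and applied to a sequence, the snippet below uses hypothetical helper names (`parse_substitution`, `apply_substitutions`) and a made-up five-residue toy sequence; the real TmTrpB sequence is not given in this card and is not reproduced here.

```python
import re

# The ten substitutions listed above, relative to wild-type TmTrpB.
TM9D8_STAR_SUBSTITUTIONS = [
    "P19G", "E30G", "I69V", "K96L", "P140L",
    "N167D", "I184F", "L213P", "G228S", "T292S",
]

_SUBSTITUTION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(code):
    """Split a code such as 'P19G' into (wild_type, position, mutant)."""
    match = _SUBSTITUTION_RE.match(code)
    if match is None:
        raise ValueError(f"Not a substitution code: {code!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def apply_substitutions(sequence, codes):
    """Return a copy of `sequence` with every substitution applied.

    Positions are 1-based. The wild-type residue is checked first so that
    a code written for a different parent sequence fails loudly.
    """
    residues = list(sequence)
    for code in codes:
        wild_type, position, mutant = parse_substitution(code)
        index = position - 1
        if index >= len(residues):
            raise ValueError(f"{code}: position is beyond the sequence end")
        if residues[index] != wild_type:
            raise ValueError(
                f"{code}: expected {wild_type} at position {position}, "
                f"found {residues[index]}"
            )
        residues[index] = mutant
    return "".join(residues)


parsed = [parse_substitution(code) for code in TM9D8_STAR_SUBSTITUTIONS]

# Toy demonstration on an invented sequence (NOT TmTrpB).
toy_variant = apply_substitutions("MKPLE", ["P3G", "E5D"])
print(toy_variant)
```

Checking the wild-type residue before substituting is a deliberate choice: it catches off-by-one numbering and mismatched parent sequences early.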
### Task type
Protein-level regression
### Dataset description
The dataset is from [A combinatorially complete epistatic fitness landscape in an enzyme active site](https://www.biorxiv.org/content/10.1101/2024.06.23.600144v1).
The dataset can also be found at [SaProtHub dataset](https://huggingface.co/datasets/SaProtHub/TrpB_fitness_landsacpe_dataset).
Each label is the fitness of a mutant, measured here as the growth rate of an E. coli strain. The maximum fitness is 1; the closer a value is to 1, the fitter the variant.
### Model input type
Amino acid sequence
### Performance
test_pearson: 0.93
test_spearman: 0.38
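
For readers comparing the two metrics, here is a minimal pure-Python sketch of how Pearson and Spearman correlations are computed. Pearson measures linear agreement between values, while Spearman is the Pearson correlation of the ranks. The function names and the toy numbers are invented for illustration and are not this model's test data; the example only shows that the two metrics can diverge when a few points dominate the value range, and it makes no claim about why the scores above differ.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented toy data: six near-zero labels ranked in reverse by the
# predictions, plus one high-fitness point that both agree on.
toy_labels = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05, 1.00]
toy_preds = [0.05, 0.04, 0.03, 0.02, 0.01, 0.00, 0.95]

toy_pearson = pearson(toy_labels, toy_preds)    # close to 1
toy_spearman = spearman(toy_labels, toy_preds)  # negative
print(f"pearson={toy_pearson:.3f} spearman={toy_spearman:.3f}")
```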
### LoRA config
lora_dropout: 0.0
lora_alpha: 16
target_modules: ["query", "key", "value", "intermediate.dense", "output.dense"]
modules_to_save: ["classifier"]
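
As a rough sketch, the settings above map onto a PEFT `LoraConfig` as shown below. The LoRA rank `r` is not stated in this card, so the value used here is only a placeholder and should be replaced with the rank stored in the adapter's own configuration.

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,  # ASSUMPTION: rank is not listed in this card; placeholder value
    lora_alpha=16,
    lora_dropout=0.0,
    # Attention projections and feed-forward dense layers receive adapters.
    target_modules=["query", "key", "value", "intermediate.dense", "output.dense"],
    # The regression head is trained and saved in full alongside the adapter.
    modules_to_save=["classifier"],
)
```

Such a config would typically be passed to `peft.get_peft_model` together with the base model to wrap it with trainable adapters.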
### Training config
class: AdamW
betas: (0.9, 0.98)
weight_decay: 0.01
learning rate: 5e-4
epoch: 100
batch size: 100
precision: 16-mixed
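
The optimizer settings above could be expressed in PyTorch roughly as follows. This is a sketch under assumptions: the `torch.nn.Linear` layer is a hypothetical stand-in for the trainable parameters, and the constant names are illustrative; the actual training script is not part of this card.

```python
import torch

# Hypothetical stand-in for the model's trainable parameters.
model = torch.nn.Linear(4, 1)

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=5e-4,
    betas=(0.9, 0.98),
    weight_decay=0.01,
)

# Remaining values from the training config, kept as plain constants.
EPOCHS = 100
BATCH_SIZE = 100
PRECISION = "16-mixed"  # mixed-precision setting as listed above
```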