---
base_model: westlake-repl/SaProt_35M_AF2
library_name: peft
---
# Base model: [westlake-repl/SaProt_35M_AF2](https://huggingface.co/westlake-repl/SaProt_35M_AF2)
# Model Card for tm9d8s_fitness
This model predicts the fitness of mutants of the β-subunit of tryptophan synthase (TrpB).
TrpB synthesizes L-tryptophan (Trp) from indole and L-serine (Ser).
TrpB variant Tm9D8*, derived from the hyperthermophile Thermotoga maritima, was selected as the parent enzyme.
Tm9D8* differs from wild-type TmTrpB by ten amino acid substitutions (P19G, E30G, I69V, K96L, P140L, N167D, I184F, L213P, G228S, and T292S).
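
The substitution codes above follow the usual `<wild-type residue><position><new residue>` notation. As a minimal sketch of how such codes can be parsed and applied to a sequence, the helper below assumes 1-based positions; the function names and the four-residue toy sequence are hypothetical and are not taken from the TrpB dataset or the SaProt code base.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    wild_type: str  # residue expected in the parent sequence
    position: int   # 1-based position (assumed numbering)
    mutant: str     # residue after substitution


_CODE = re.compile(r"^([A-Z])(\d+)([A-Z])$")

# The ten substitutions listed in this model card.
TM9D8_STAR_SUBSTITUTIONS = [
    "P19G", "E30G", "I69V", "K96L", "P140L",
    "N167D", "I184F", "L213P", "G228S", "T292S",
]


def parse_substitution(code: str) -> Substitution:
    """Split a code such as 'P19G' into (wild type, position, mutant)."""
    match = _CODE.match(code.strip())
    if match is None:
        raise ValueError(f"not a substitution code: {code!r}")
    wild_type, position, mutant = match.groups()
    return Substitution(wild_type, int(position), mutant)


def apply_substitutions(sequence: str, codes: list[str]) -> str:
    """Apply substitutions to a sequence, checking the wild-type residue first."""
    residues = list(sequence)
    for code in codes:
        sub = parse_substitution(code)
        index = sub.position - 1
        if not 0 <= index < len(residues):
            raise ValueError(f"{code}: position outside the sequence")
        if residues[index] != sub.wild_type:
            raise ValueError(
                f"{code}: expected {sub.wild_type}, found {residues[index]}"
            )
        residues[index] = sub.mutant
    return "".join(residues)


# Toy example on a made-up sequence (not TmTrpB).
toy_parent = "MPEK"
toy_variant = apply_substitutions(toy_parent, ["P2G", "K4L"])
```

Checking the wild-type residue before substituting catches off-by-one numbering mistakes early, which is useful when a sequence file and a mutation list come from different sources.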
### Task type
Protein-level regression
### Dataset description
The dataset is from [A combinatorially complete epistatic fitness landscape in an enzyme active site](https://www.biorxiv.org/content/10.1101/2024.06.23.600144v1).
The dataset is also available on the Hugging Face Hub as the [SaProtHub dataset](https://huggingface.co/datasets/SaProtHub/TrpB_fitness_landsacpe_dataset).
The label is the fitness of each mutant, measured here as the growth rate of the E. coli strain. The maximum fitness is 1; the closer a value is to 1, the higher the fitness.
### Model input type
Amino acid sequence
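
SaProt's vocabulary pairs each amino acid with a structure token. When only the amino acid sequence is available, the structure token is commonly replaced by a mask character. The sketch below assumes that mask character is `#` and that the base model's tokenizer expects one `#` after every residue; check both assumptions against the SaProt documentation before relying on them. The helper name `aa_to_masked_sa` is hypothetical.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def aa_to_masked_sa(sequence: str, mask_token: str = "#") -> str:
    """Interleave each residue with a structure-mask token.

    Assumption: the tokenizer accepts '<residue><mask>' pairs, e.g. 'M#K#T#'.
    """
    sequence = sequence.strip().upper()
    unknown = sorted(set(sequence) - VALID_RESIDUES)
    if unknown:
        raise ValueError(f"unexpected residue codes: {unknown}")
    return "".join(residue + mask_token for residue in sequence)


# Toy three-residue input (not a TrpB sequence).
toy_input = aa_to_masked_sa("mkt")
```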
### Performance
- test_pearson: 0.93
- test_spearman: 0.38
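
Pearson correlation measures linear agreement between predictions and labels, while Spearman correlation measures agreement between their ranks, so the two can differ substantially on the same test set. The sketch below implements both in plain Python and runs them on made-up numbers. It only illustrates how the metrics are defined and how they can diverge; it does not reproduce the evaluation above and is not a diagnosis of this model.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up labels and predictions: four near-zero variants and one fit variant.
toy_labels = [0.01, 0.02, 0.03, 0.04, 0.95]
toy_predictions = [0.04, 0.03, 0.02, 0.01, 0.90]

toy_pearson = pearson(toy_labels, toy_predictions)
toy_spearman = spearman(toy_labels, toy_predictions)
```

In this toy case the single high-fitness point dominates the Pearson value, while the misordered low-fitness points pull the Spearman value down.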
### LoRA config
- lora_dropout: 0.0
- lora_alpha: 16
- target_modules: ["query", "key", "value", "intermediate.dense", "output.dense"]
- modules_to_save: ["classifier"]
### Training config
- optimizer: AdamW
- betas: (0.9, 0.98)
- weight_decay: 0.01
- learning rate: 5e-4
- epochs: 100
- batch size: 100
- precision: 16-mixed
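
As a rough sketch, the optimizer settings above correspond to the following `torch.optim.AdamW` call. The `toy_model` is a hypothetical stand-in for the LoRA-wrapped SaProt model, and the actual training script may construct its optimizer differently.

```python
import torch
from torch import nn

# Hypothetical stand-in for the trainable parameters of the real model.
toy_model = nn.Linear(8, 1)

# Hyperparameters copied from the "Training config" section of this card.
optimizer = torch.optim.AdamW(
    toy_model.parameters(),
    lr=5e-4,
    betas=(0.9, 0.98),
    weight_decay=0.01,
)

# Not shown: the number of epochs (100), the batch size (100), and the
# "16-mixed" precision setting, which are configured in the training loop
# or trainer rather than in the optimizer.
```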