AmelieSchreiber committed 52a3820 (parent: 0a52d93)

Update README.md

README.md (changed)

@@ -12,7 +12,26 @@ tags:
- cafa 5
- protein function prediction
---

# ESM-2 for Protein Function Prediction

This is an experimental model fine-tuned from [esm2_t6_8M_UR50D](https://huggingface.co/facebook/esm2_t6_8M_UR50D)
for multi-label classification of protein function. It was fine-tuned on the CAFA-5 protein sequence dataset available
[here](). Specifically, `train_sequences.fasta` contains the protein sequences used for training, and `train_terms.tsv`
contains the Gene Ontology (GO) function labels for each sequence. For more details on using ESM-2 models for multi-label
sequence classification, see the [ESM documentation](https://huggingface.co/docs/transformers/model_doc/esm).
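
Because a protein can carry several GO terms at once, multi-label training usually encodes the labels as one multi-hot vector per protein. The sketch below builds such vectors from the rows of `train_terms.tsv` using only the standard library. It assumes the file is tab-separated with a header row containing `EntryID` and `term` columns; those column names, the function name `build_multi_hot`, and the protein IDs in the example are assumptions for illustration, not taken from this repository.

```python
import csv
from typing import Dict, Iterable, List, Set, Tuple


def build_multi_hot(lines: Iterable[str]) -> Tuple[List[str], List[str], List[List[int]]]:
    """Turn (protein, GO term) rows into one multi-hot label vector per protein.

    Assumes a tab-separated header row with `EntryID` and `term` columns.
    Returns (protein_ids, go_terms, label_matrix), where label_matrix[i][j] == 1
    if protein_ids[i] is annotated with go_terms[j].
    """
    terms_by_protein: Dict[str, Set[str]] = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        terms_by_protein.setdefault(row["EntryID"], set()).add(row["term"])

    # Fix a deterministic order for rows (proteins) and columns (GO terms).
    protein_ids = sorted(terms_by_protein)
    go_terms = sorted({t for terms in terms_by_protein.values() for t in terms})
    column = {term: j for j, term in enumerate(go_terms)}

    label_matrix = []
    for pid in protein_ids:
        vector = [0] * len(go_terms)
        for term in terms_by_protein[pid]:
            vector[column[term]] = 1
        label_matrix.append(vector)
    return protein_ids, go_terms, label_matrix


# Tiny hypothetical excerpt in the assumed train_terms.tsv layout.
sample = [
    "EntryID\tterm\taspect",
    "PROT_A\tGO:0000001\tBPO",
    "PROT_A\tGO:0005634\tCCO",
    "PROT_B\tGO:0000001\tBPO",
]
proteins, terms, labels = build_multi_hot(sample)
print(proteins)  # ['PROT_A', 'PROT_B']
print(terms)     # ['GO:0000001', 'GO:0005634']
print(labels)    # [[1, 1], [1, 0]]
```

For the real file, an open handle such as `open("train_terms.tsv", newline="")` can be passed instead of the list. With thousands of GO terms a dense list of lists grows large, so a sparse representation may be a better fit in practice.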

## Fine-Tuning

The model was fine-tuned for 7 epochs at a learning rate of `5e-5` and achieved the following validation metrics:
```
Validation Loss: 0.0027,
Validation Micro F1: 0.3672,
Validation Macro F1: 0.9967,
Validation Micro Precision: 0.6052,
Validation Macro Precision: 0.9996,
Validation Micro Recall: 0.2626,
Validation Macro Recall: 0.9966
```
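
The micro and macro averages above differ sharply (micro F1 0.3672 versus macro F1 0.9967). The standard-library sketch below, which is not the evaluation code used to produce these numbers, shows how the two averages are computed over 0/1 label matrices. Micro averaging pools true positives, false positives, and false negatives across all labels before computing a single score, so frequent labels dominate. Macro averaging computes a score per label and takes the unweighted mean. If a label that is never present and never predicted is scored as 1.0 (the `zero_division` convention here is an assumption, not something stated for this model), many such rare labels can pull the macro scores close to 1.

```python
from typing import Dict, Sequence


def _ratio(num: int, den: int, zero_division: float) -> float:
    # Return zero_division for 0/0, e.g. a label never present and never predicted.
    return num / den if den else zero_division


def multilabel_scores(
    y_true: Sequence[Sequence[int]],
    y_pred: Sequence[Sequence[int]],
    zero_division: float = 1.0,
) -> Dict[str, float]:
    """Micro- and macro-averaged precision, recall, and F1 for 0/1 label matrices."""
    n_labels = len(y_true[0])
    tp, fp, fn = [0] * n_labels, [0] * n_labels, [0] * n_labels
    for true_row, pred_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(true_row, pred_row)):
            if t and p:
                tp[j] += 1
            elif p:
                fp[j] += 1
            elif t:
                fn[j] += 1

    # Micro: pool the counts over all labels, then compute one score.
    total_tp, total_fp, total_fn = sum(tp), sum(fp), sum(fn)
    scores = {
        "micro_precision": _ratio(total_tp, total_tp + total_fp, zero_division),
        "micro_recall": _ratio(total_tp, total_tp + total_fn, zero_division),
        "micro_f1": _ratio(2 * total_tp, 2 * total_tp + total_fp + total_fn, zero_division),
    }

    # Macro: compute each score per label, then take the unweighted mean.
    labels = range(n_labels)
    scores["macro_precision"] = sum(_ratio(tp[j], tp[j] + fp[j], zero_division) for j in labels) / n_labels
    scores["macro_recall"] = sum(_ratio(tp[j], tp[j] + fn[j], zero_division) for j in labels) / n_labels
    scores["macro_f1"] = sum(
        _ratio(2 * tp[j], 2 * tp[j] + fp[j] + fn[j], zero_division) for j in labels
    ) / n_labels
    return scores


# One frequent label found only once in four samples; three labels never present or predicted.
y_true = [[1, 0, 0, 0]] * 4
y_pred = [[1, 0, 0, 0]] + [[0, 0, 0, 0]] * 3
scores = multilabel_scores(y_true, y_pred)
print(round(scores["micro_f1"], 4), round(scores["macro_f1"], 4))  # 0.4 0.85
```

Passing `zero_division=0.0` in this sketch scores the empty labels as 0 instead, which lowers the macro averages; the README does not state how the reported metrics handled that case.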

## Using the model

First, download the file `go-basic.obo` [from here](https://huggingface.co/datasets/AmelieSchreiber/cafa_5)
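
`go-basic.obo` is the Gene Ontology in OBO format, which is handy for turning predicted GO IDs back into readable term names. The sketch below is a minimal standard-library parser, assuming the usual OBO layout in which each term is a `[Term]` stanza made of `key: value` lines such as `id`, `name`, `namespace`, and `is_a`. It skips all other stanza types and fields and is not a full OBO implementation; the function name `parse_obo_terms` and the short excerpt used to exercise it are illustrative.

```python
from typing import Dict, Iterable, Optional


def parse_obo_terms(lines: Iterable[str]) -> Dict[str, dict]:
    """Map GO IDs to {'id', 'name', 'namespace', 'is_a'} from OBO-formatted lines.

    Only [Term] stanzas are read; header lines and other stanzas such as
    [Typedef] are skipped, as are fields other than the four listed above.
    """
    terms: Dict[str, dict] = {}
    current: Optional[dict] = None  # the [Term] stanza being read, if any
    for raw in lines:
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            current = {"is_a": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # outside a [Term] stanza, or a blank line
        key, _, value = line.partition(":")
        value = value.strip()
        if key == "id":
            current["id"] = value
            terms[value] = current
        elif key in ("name", "namespace"):
            current[key] = value
        elif key == "is_a":
            # "is_a: GO:0048308 ! organelle inheritance" -> "GO:0048308"
            current["is_a"].append(value.split("!", 1)[0].strip())
    return terms


# Short illustrative excerpt in the OBO layout.
sample = """format-version: 1.2

[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution

[Typedef]
id: part_of
name: part of
"""
go = parse_obo_terms(sample.splitlines())
print(go["GO:0000001"]["name"])  # mitochondrion inheritance
print(go["GO:0000001"]["is_a"])  # ['GO:0048308', 'GO:0048311']
print("part_of" in go)           # False
```

For the downloaded file, the same function can read the open handle directly: `with open("go-basic.obo", encoding="utf-8") as fh: go = parse_obo_terms(fh)`.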