AmelieSchreiber/cafa_5_protein_function_prediction

AmelieSchreiber committed on Aug 21, 2023

Commit 52a3820 • 1 Parent(s): 0a52d93

Update README.md

Files changed (1)

README.md +19 -0

@@ -12,7 +12,26 @@ tags:
 - cafa 5
 - protein function prediction
 ---
+# ESM-2 for Protein Function Prediction
+This is an experimental model fine-tuned from [esm2_t6_8M_UR50D](https://huggingface.co/facebook/esm2_t6_8M_UR50D)
+for multi-label classification. It was fine-tuned on the CAFA-5 protein sequence dataset available
+[here](). The `train_sequences.fasta` file contains the protein sequences used for training, and the
+`train_terms.tsv` file contains the Gene Ontology protein function labels for each sequence. For more details on using
+ESM-2 models for multi-label sequence classification, [see here](https://huggingface.co/docs/transformers/model_doc/esm).
+## Fine-Tuning
+The model was fine-tuned for 7 epochs at a learning rate of `5e-5` and achieved the following validation metrics:
+```
+Validation Loss: 0.0027,
+Validation Micro F1: 0.3672,
+Validation Macro F1: 0.9967,
+Validation Micro Precision: 0.6052,
+Validation Macro Precision: 0.9996,
+Validation Micro Recall: 0.2626,
+Validation Macro Recall: 0.9966
+```
 ## Using the model
 First, download the file `go-basic.obo` [from here](https://huggingface.co/datasets/AmelieSchreiber/cafa_5)
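
Once `go-basic.obo` is downloaded, one plausible next step is to map Gene Ontology term IDs to readable names. The README's own loading code is not visible in this diff, so the following is only a minimal sketch: the helper `parse_go_names` is hypothetical, the local file path in the final comment is an assumption, and the inline sample is an illustrative excerpt in OBO format rather than the real file.

```python
from typing import Dict, Iterable


def parse_go_names(lines: Iterable[str]) -> Dict[str, str]:
    """Map GO term IDs to names using the [Term] stanzas of an OBO file.

    Hypothetical helper for illustration; not taken from the README.
    """
    names: Dict[str, str] = {}
    in_term = False   # True while inside a [Term] stanza
    term_id = None    # ID of the term currently being read
    for raw in lines:
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new stanza starts; only [Term] stanzas describe GO terms.
            in_term = line == "[Term]"
            term_id = None
        elif in_term and line.startswith("id: "):
            term_id = line[len("id: "):]
        elif in_term and term_id and line.startswith("name: "):
            names[term_id] = line[len("name: "):]
    return names


# Small inline sample in OBO format (illustrative only).
SAMPLE = """\
format-version: 1.2

[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process

[Term]
id: GO:0000002
name: mitochondrial genome maintenance
namespace: biological_process

[Typedef]
id: part_of
name: part of
"""

go_names = parse_go_names(SAMPLE.splitlines())
print(go_names)

# With the real file saved next to the script (path is an assumption):
# with open("go-basic.obo", encoding="utf-8") as handle:
#     go_names = parse_go_names(handle)
```

The parser reads the file line by line and skips non-`[Term]` stanzas such as `[Typedef]`, so relationship definitions do not end up in the label map.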