AmelieSchreiber/esm2_t6_8M_finetuned_cafa5

Text Classification

protein language model

multilabel sequence classification

Inference Endpoints

AmelieSchreiber committed on Aug 27, 2023

Commit c05f762 • 1 Parent(s): 542ec7f

Update README.md

Files changed (1)

README.md +11 -0

@@ -17,6 +17,17 @@ metrics:
 ---
 # ESM-2 Fine-tuned CAFA-5
+## ESM-2 for Protein Function Prediction
+This is an experimental model fine-tuned from
+[esm2_t6_8M_UR50D](https://huggingface.co/facebook/esm2_t6_8M_UR50D)
+for multi-label classification. It is fine-tuned on the CAFA-5 protein sequence dataset available
+[here](https://huggingface.co/datasets/AmelieSchreiber/cafa_5). Specifically, the `train_sequences.fasta` file contains the
+protein sequences used for training, and the
+`train_terms.tsv` file contains the Gene Ontology (GO) protein function labels for each sequence. For more details on using
+ESM-2 models for multi-label sequence classification, [see here](https://huggingface.co/docs/transformers/model_doc/esm).
+Because the hierarchical ontology may require complicated class weighting, further fine-tuning will be necessary.
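
The label preparation described above can be sketched roughly as follows. This is a minimal illustration, not the author's training code: it assumes `train_terms.tsv` is tab-separated with `EntryID`, `term`, and `aspect` columns, and the sample rows, helper names (`load_labels`, `multi_hot`, `positive_weights`), and the simple negative-to-positive weighting are all hypothetical. The weighting ignores the ontology hierarchy entirely, so treat it only as a starting point.

```python
import csv
import io

# Hypothetical excerpt in the shape assumed for train_terms.tsv:
# tab-separated columns EntryID, term, aspect.
SAMPLE_TSV = (
    "EntryID\tterm\taspect\n"
    "P1\tGO:0000001\tBPO\n"
    "P1\tGO:0000002\tMFO\n"
    "P2\tGO:0000002\tMFO\n"
    "P3\tGO:0000003\tCCO\n"
)


def load_labels(handle):
    """Map each protein ID to the set of GO terms annotated to it."""
    labels = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        labels.setdefault(row["EntryID"], set()).add(row["term"])
    return labels


def multi_hot(labels):
    """Return the sorted term vocabulary and one multi-hot vector per protein."""
    terms = sorted({term for term_set in labels.values() for term in term_set})
    index = {term: i for i, term in enumerate(terms)}
    vectors = {}
    for protein_id, term_set in labels.items():
        vector = [0.0] * len(terms)
        for term in term_set:
            vector[index[term]] = 1.0
        vectors[protein_id] = vector
    return terms, vectors


def positive_weights(terms, vectors):
    """Per-class weight = (number of negatives) / (number of positives).

    Every term in the vocabulary has at least one positive by construction,
    so the division is safe.
    """
    n = len(vectors)
    counts = [sum(v[i] for v in vectors.values()) for i in range(len(terms))]
    return [(n - c) / c for c in counts]


labels = load_labels(io.StringIO(SAMPLE_TSV))
terms, vectors = multi_hot(labels)
weights = positive_weights(terms, vectors)
print(terms)
print(vectors["P1"])
print(weights)
```

Weights of this kind are one option for rebalancing a multi-label loss, for example as the `pos_weight` argument of `torch.nn.BCEWithLogitsLoss`; whether that is adequate for the GO hierarchy is an open question here.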
 ## Training
 Macro