AmelieSchreiber committed on
Commit
c05f762
1 Parent(s): 542ec7f

Update README.md

Files changed (1)
  1. README.md +11 -0
README.md CHANGED
@@ -17,6 +17,17 @@ metrics:
  ---
  # ESM-2 Fine-tuned CAFA-5
 
+ ## ESM-2 for Protein Function Prediction
+
+ This is an experimental model fine-tuned from the
+ [esm2_t6_8M_UR50D](https://huggingface.co/facebook/esm2_t6_8M_UR50D) model
+ for multi-label classification. It is fine-tuned on the CAFA-5 protein sequence dataset, available
+ [here](https://huggingface.co/datasets/AmelieSchreiber/cafa_5). Specifically, the `train_sequences.fasta` file contains the
+ protein sequences the model was trained on, and the
+ `train_terms.tsv` file contains the Gene Ontology (GO) protein function labels for each sequence. For more details on using
+ ESM-2 models for multi-label sequence classification, [see the Transformers ESM documentation](https://huggingface.co/docs/transformers/model_doc/esm).
+ Because the hierarchical ontology may require complicated class weighting, further fine-tuning will be necessary.
+
  ## Training
 
  Macro
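
The added README text describes multi-label classification over GO terms and mentions class weighting. As a rough illustration only (not the author's training code), the sketch below turns `train_terms.tsv`-style rows into multi-hot label vectors and derives a simple per-label positive weight as the ratio of negatives to positives. The column names `EntryID`, `term`, and `aspect`, the protein IDs `P1` to `P3`, and the helper names `build_multi_hot` and `positive_class_weights` are assumptions made for this example. This flat weighting ignores the ontology hierarchy that the README calls out.

```python
import csv
import io

# Hypothetical sample in the assumed train_terms.tsv layout (tab-separated).
# The protein IDs are made up; the GO terms are the three ontology roots.
SAMPLE_TSV = (
    "EntryID\tterm\taspect\n"
    "P1\tGO:0008150\tBPO\n"
    "P1\tGO:0003674\tMFO\n"
    "P2\tGO:0008150\tBPO\n"
    "P3\tGO:0008150\tBPO\n"
    "P3\tGO:0005575\tCCO\n"
)


def build_multi_hot(tsv_text):
    """Return (protein_ids, terms, matrix); matrix[i][j] is 1 if protein i has term j."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    labels = {}
    for row in reader:
        # Collect the set of GO terms annotated to each protein.
        labels.setdefault(row["EntryID"], set()).add(row["term"])

    protein_ids = sorted(labels)
    terms = sorted({t for term_set in labels.values() for t in term_set})
    column = {t: j for j, t in enumerate(terms)}

    matrix = []
    for pid in protein_ids:
        vector = [0] * len(terms)
        for t in labels[pid]:
            vector[column[t]] = 1
        matrix.append(vector)
    return protein_ids, terms, matrix


def positive_class_weights(matrix):
    """Per-label weight = (#negatives / #positives); rare labels get larger weights."""
    if not matrix:
        return []
    n = len(matrix)
    weights = []
    for j in range(len(matrix[0])):
        positives = sum(row[j] for row in matrix)
        # Every column has at least one positive, because terms come from the labels.
        weights.append((n - positives) / positives)
    return weights


protein_ids, terms, matrix = build_multi_hot(SAMPLE_TSV)
weights = positive_class_weights(matrix)
```

Weights of this kind could, for instance, be passed as the `pos_weight` tensor of PyTorch's `torch.nn.BCEWithLogitsLoss` when fine-tuning with a multi-label head; whether that is adequate for the GO hierarchy is exactly the open question the README raises.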