AmelieSchreiber committed
Commit
8301fe6
1 Parent(s): 4eb269f

Update README.md

Files changed (1)
  1. README.md +28 -0
README.md CHANGED
@@ -1,3 +1,31 @@
  ---
  license: mit
+ datasets:
+ - AmelieSchreiber/cafa5_pickle_split
+ language:
+ - en
+ metrics:
+ - accuracy
+ - f1
+ - precision
+ - recall
+ - roc_auc
+ library_name: transformers
+ tags:
+ - esm
+ - esm2
+ - protein language model
+ - biology
+ - cafa5
  ---
+
+ # ESM-2 Pre-finetuned on CAFA-5 for Protein Function Prediction
+ This model was pre-finetuned on the CAFA-5 protein function prediction task for four epochs.
+ It is intended to be finetuned in a second stage of training with a Low-Rank Adaptation (LoRA).
+ The training script for both the pre-finetuning and the second-stage LoRA finetuning is
+ [available here](https://huggingface.co/AmelieSchreiber/esm2_t6_8M_lora_cafa5/blob/main/cafa_5_finetune_v2.ipynb).
+ The notebook lets you pre-finetune the base model and then train a LoRA in the second stage.
+ Note that the second stage is a harder curriculum for the model: it uses class weights so that the
+ model better captures the hierarchical (weighted) structure of the gene ontology (GO) terms that serve
+ as the labels for the multilabel sequence classification task of predicting a protein's functions.
+
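The second-stage setup described above (a multilabel classification head on the pre-finetuned ESM-2 encoder, LoRA adapters, and a class-weighted loss) might be wired up roughly as in the sketch below. This is not the notebook's code: the checkpoint ID `AmelieSchreiber/esm2-prefinetuned-cafa5`, the label count `num_go_terms`, the uniform `class_weights` tensor, and the `WeightedTrainer` helper are all placeholders you would replace with the values and weighting scheme from the linked notebook and the CAFA-5 GO-term vocabulary.

```python
import torch
from torch import nn
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer
from peft import LoraConfig, TaskType, get_peft_model

# Placeholder repo ID -- substitute the actual pre-finetuned checkpoint.
model_id = "AmelieSchreiber/esm2-prefinetuned-cafa5"
num_go_terms = 1500  # placeholder: size of your GO-term label vocabulary

tokenizer = AutoTokenizer.from_pretrained(model_id)  # tokenizes amino-acid sequences
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=num_go_terms,
    problem_type="multi_label_classification",  # one protein can have many GO terms
)

# Second stage: freeze the pre-finetuned weights and train low-rank adapters
# on the ESM-2 self-attention projections.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query", "key", "value"],  # ESM-2 attention projection names
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Placeholder weights -- in practice these would be derived from GO-term
# frequencies / the GO hierarchy, as the model card describes.
class_weights = torch.ones(num_go_terms)

class WeightedTrainer(Trainer):
    """Trainer variant that applies per-label weights to a multilabel BCE loss."""

    def __init__(self, *args, class_weights=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.class_weights = class_weights  # shape: (num_go_terms,)

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss_fct = nn.BCEWithLogitsLoss(
            pos_weight=self.class_weights.to(outputs.logits.device)
        )
        loss = loss_fct(outputs.logits, labels.float())
        return (loss, outputs) if return_outputs else loss
```

Targeting only the attention projections keeps the adapter small, and with `TaskType.SEQ_CLS` PEFT keeps the classification head trainable alongside the adapters, which is what the second-stage curriculum needs.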