damlab committed on
Commit
314af26
1 Parent(s): 44f2895

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -20,7 +20,7 @@ license: mit
 
 ## Summary
 
- The HIV-BERT-Protease-Resistance model was trained as a refinement of the HIV-BERT model (insert link) and serves to better predict whether an HIV protease sequence will be resistant to certain protease inhibitors. HIV-BERT is a model refined from the ProtBert-BFD model (https://huggingface.co/Rostlab/prot_bert_bfd) to better fulfill HIV-centric tasks. This model was then trained using HIV protease sequences from the Stanford HIV Genotype-Phenotype Database (https://hivdb.stanford.edu/pages/genotype-phenotype.html), allowing even more precise prediction protease inhibitor resistance than the HIV-BERT model can provide.
+ The HIV-BERT-Protease-Resistance model was trained as a refinement of the HIV-BERT model (insert link) and serves to better predict whether an HIV protease sequence will be resistant to certain protease inhibitors. HIV-BERT is a model refined from the [ProtBert-BFD model](https://huggingface.co/Rostlab/prot_bert_bfd) to better fulfill HIV-centric tasks. This model was then trained using HIV protease sequences from the [Stanford HIV Genotype-Phenotype Database](https://hivdb.stanford.edu/pages/genotype-phenotype.html), allowing even more precise prediction protease inhibitor resistance than the HIV-BERT model can provide.
 
 ## Model Description
 
@@ -36,7 +36,7 @@ This tool can be used as a predictor of protease resistance mutations within an
 
 ## Training Data
 
- This model was trained using the damlab/HIV_PI dataset using the 0th fold. The dataset consists of 1959 sequences (approximately 99 tokens each) extracted from the Los Alamos HIV Sequence database.
+ This model was trained using the [damlab/HIV-PI dataset](https://huggingface.co/datasets/damlab/HIV_PI) using the 0th fold. The dataset consists of 1959 sequences (approximately 99 tokens each) extracted from the Stanford HIV Genotype-Phenotype Database.
 
 ## Training Procedure
 
@@ -46,7 +46,7 @@ As with the rostlab/Prot-bert-bfd model, the rare amino acids U, Z, O, and B wer
 
 ### Training
 
- The damlab/HIV-BERT model was used as the initial weights for an AutoModelforClassificiation. The model was trained with a learning rate of 1E-5, 50K warm-up steps, and a cosine_with_restarts learning rate schedule and continued until 3 consecutive epochs did not improve the loss on the held-out dataset. As this is a multiple classification task (a protein can be resistant to multiple drugs) the loss was calculated as the Binary Cross Entropy for each category. The BCE was weighted by the inverse of the class ratio to balance the weight across the class imbalance.
+ The [damlab/HIV-BERT model](https://huggingface.co/damlab/HIV_BERT) was used as the initial weights for an AutoModelforClassificiation. The model was trained with a learning rate of 1E-5, 50K warm-up steps, and a cosine_with_restarts learning rate schedule and continued until 3 consecutive epochs did not improve the loss on the held-out dataset. As this is a multiple classification task (a protein can be resistant to multiple drugs) the loss was calculated as the Binary Cross Entropy for each category. The BCE was weighted by the inverse of the class ratio to balance the weight across the class imbalance.
 
 ## Evaluation Results
 
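Note on the Training paragraph in this change: it describes a per-drug Binary Cross Entropy weighted by the inverse of the class ratio. The sketch below is one plausible reading of that sentence, not the authors' training code. It assumes the weight for each drug is (number of susceptible sequences) / (number of resistant sequences), applied to the positive term only, which matches the `pos_weight` convention of PyTorch's `BCEWithLogitsLoss`. The function names and the toy labels are hypothetical.

```python
import math


def positive_class_weights(labels):
    """Return one weight per drug: (# negatives) / (# positives).

    `labels` is a list of rows, each row holding one 0/1 flag per drug.
    This is an assumed interpretation of "inverse of the class ratio".
    """
    n_samples = len(labels)
    weights = []
    for column in zip(*labels):
        n_pos = sum(column)
        n_neg = n_samples - n_pos
        # Fall back to 1.0 when a drug has no resistant examples.
        weights.append(n_neg / n_pos if n_pos else 1.0)
    return weights


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    return min(z, 0.0) - math.log1p(math.exp(-abs(z)))


def weighted_bce(logits, targets, pos_weights):
    """Mean multi-label BCE on raw logits, up-weighting the positive term."""
    total, count = 0.0, 0
    for logit_row, target_row in zip(logits, targets):
        for z, y, w in zip(logit_row, target_row, pos_weights):
            # log(1 - sigmoid(z)) equals log_sigmoid(-z).
            total -= w * y * log_sigmoid(z) + (1 - y) * log_sigmoid(-z)
            count += 1
    return total / count


# Toy example: four sequences, two hypothetical drugs.
toy_labels = [[1, 0], [0, 0], [0, 1], [0, 0]]
toy_logits = [[0.0, 0.0] for _ in toy_labels]
toy_weights = positive_class_weights(toy_labels)
toy_loss = weighted_bce(toy_logits, toy_labels, toy_weights)
```

Because each drug is scored independently, a sequence can be flagged as resistant to several drugs at once. With this weighting, the rare resistant examples for a drug contribute as much total loss as the more common susceptible ones.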