damlab committed on
Commit
93acdc5
1 Parent(s): 9f8a750

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -41,13 +41,13 @@ As a masked language model this tool can be used to predict expected mutations u
 
 ## Training Data
 
-The dataset damlab/HIV_FLT was used to refine the original rostlab/Prot-bert-bfd. This dataset contains 1790 full HIV genomes from across the globe. When translated, these genomes contain approximately 3.9 million amino-acid tokens.
+The dataset [damlab/HIV_FLT](https://huggingface.co/datasets/damlab/HIV_FLT) was used to refine the original [rostlab/Prot-bert-bfd](https://huggingface.co/Rostlab/prot_bert_bfd). This dataset contains 1790 full HIV genomes from across the globe. When translated, these genomes contain approximately 3.9 million amino-acid tokens.
 
 ## Training Procedure
 
 ### Preprocessing
 
-As with the rostlab/Prot-bert-bfd model, the rare amino acids U, Z, O, and B were converted to X and spaces were added between each amino acid. All strings were concatenated and chunked into 256 token chunks for training. A random 20% of chunks were held for validation.
+As with the [rostlab/Prot-bert-bfd](https://huggingface.co/Rostlab/prot_bert_bfd) model, the rare amino acids U, Z, O, and B were converted to X and spaces were added between each amino acid. All strings were concatenated and chunked into 256 token chunks for training. A random 20% of chunks were held for validation.
 
 ### Training
 
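
The preprocessing steps described in the changed README lines can be sketched roughly as follows. This is an illustrative sketch, not the authors' actual pipeline: the function name `preprocess`, the `seed` parameter, the choice to count one token per amino acid (ignoring any special tokens a tokenizer might add), and the choice to keep the final short chunk are all assumptions.

```python
import random
import re


def preprocess(sequences, chunk_size=256, val_fraction=0.2, seed=0):
    """Hypothetical helper: clean, concatenate, chunk, and split protein strings."""
    # Map the rare amino acids U, Z, O, and B to X.
    cleaned = [re.sub(r"[UZOB]", "X", seq.upper()) for seq in sequences]

    # Concatenate all strings; each residue is treated as one token (assumption).
    tokens = list("".join(cleaned))

    # Cut into chunks of at most `chunk_size` tokens, with a space between residues.
    chunks = [
        " ".join(tokens[i:i + chunk_size])
        for i in range(0, len(tokens), chunk_size)
    ]

    # Hold out a random fraction of the chunks for validation.
    rng = random.Random(seed)
    rng.shuffle(chunks)
    n_val = int(len(chunks) * val_fraction)
    return chunks[n_val:], chunks[:n_val]


# Toy input standing in for translated genomes (not real data).
train_chunks, val_chunks = preprocess(["MKUZOB" * 1000])
```

Splitting after chunking, rather than per genome, matches the README's wording that a random 20% of chunks were held out.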