Update README.md
README.md
CHANGED
@@ -41,7 +41,7 @@ As a masked language model this tool can be used to predict expected mutations u
 
 ## Training Data
 
-The dataset damlab/
+The dataset damlab/HIV_FLT was used to refine the original rostlab/Prot-bert-bfd. This dataset contains 1790 full HIV genomes from across the globe. When translated, these genomes contain approximately 3.9 million amino-acid tokens.
 
 ## Training Procedure
 
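As a rough sanity check on the figures in the added README line, the sketch below works out the average number of amino-acid tokens per genome. It uses only the two numbers quoted in the commit (1790 genomes, approximately 3.9 million tokens). The helper name `mean_tokens_per_genome` is hypothetical and is not part of the repository.

```python
# Figures quoted in the README line added by this commit.
NUM_GENOMES = 1790
TOTAL_TOKENS = 3_900_000  # "approximately 3.9 million", so a rounded figure


def mean_tokens_per_genome(total_tokens: int, num_genomes: int) -> float:
    """Return the average number of tokens per genome (hypothetical helper)."""
    if num_genomes <= 0:
        raise ValueError("num_genomes must be positive")
    return total_tokens / num_genomes


avg = mean_tokens_per_genome(TOTAL_TOKENS, NUM_GENOMES)
print(f"about {avg:.0f} amino-acid tokens per genome on average")
```

Because the token total is itself an approximation, the result is only an order-of-magnitude estimate: roughly two thousand amino-acid tokens per translated genome.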