ai-forever committed • Commit 14ac632 • Parent(s): 80cd3eb

Update README.md
README.md CHANGED
@@ -102,13 +102,14 @@ Model includes 60 languages: (iso codes)
 
 ## Training Data Statistics
 
-- Tokens:
+- Tokens: 488 Billion BBPE tokens
+
 
 <img style="text-align:center; display:block;" src="https://huggingface.co/sberbank-ai/mGPT/resolve/main/stats.png">
 "General training corpus statistics"
 
 
 ## Details
-Model was trained with sequence length
+Model was trained with sequence length 512 using Megatron and Deepspeed libs by [SberDevices](https://sberdevices.ru/) team on a dataset of 600 GB of texts in 60 languages. The model has seen 440 billion BPE tokens in total.
 
-Total training time was around
+Total training time was around 12 days on 256 Nvidia V100 GPUs.
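As a back-of-the-envelope check on the figures added above (440 billion tokens, roughly 12 days, 256 GPUs), the implied throughput works out as follows; the derived rates are an approximation, not a number reported in the README:

```python
# Rough throughput implied by the README's figures: 440e9 BPE tokens
# processed in ~12 days on 256 V100 GPUs. Derived rates are estimates.
tokens = 440e9
seconds = 12 * 24 * 3600               # ~12 days of wall-clock training
gpus = 256

cluster_rate = tokens / seconds        # ~424,000 tokens/s cluster-wide
per_gpu_rate = cluster_rate / gpus     # ~1,700 tokens/s per V100
print(f"{cluster_rate:,.0f} tok/s total, {per_gpu_rate:,.0f} tok/s per GPU")
```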
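The 512-token training sequence length is also the natural cap on inputs at inference time. A minimal usage sketch, assuming the `sberbank-ai/mGPT` checkpoint (the repo the stats image above is served from) loads through the standard `transformers` causal-LM API:

```python
# Minimal sketch (assumption: mGPT follows the standard transformers
# causal-LM API; "sberbank-ai/mGPT" is the repo referenced above).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sberbank-ai/mGPT")
model = AutoModelForCausalLM.from_pretrained("sberbank-ai/mGPT")

text = "Машинное обучение - это"  # a prompt in any of the 60 covered languages
# Cap inputs at 512 tokens, the sequence length the model was trained with.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True)
print(tokenizer.decode(outputs[0]))
```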