ai-forever committed
Commit 14ac632
1 Parent(s): 80cd3eb

Update README.md

Files changed (1)
  1. README.md +4 -3
README.md CHANGED
@@ -102,13 +102,14 @@ Model includes 60 languages: (iso codes)
 
  ## Training Data Statistics
 
- - Tokens: 559B
+ - Tokens: 488 Billion BBPE tokens
+
 
  <img style="text-align:center; display:block;" src="https://huggingface.co/sberbank-ai/mGPT/resolve/main/stats.png">
  "General training corpus statistics"
 
 
  ## Details
- Model was trained with sequence length 1024 using transformers lib by [SberDevices](https://sberdevices.ru/) team on 80B tokens for 3 epochs. After that model was finetuned 1 epoch with sequence length 2048.
+ Model was trained with sequence length 512 using Megatron and Deepspeed libs by [SberDevices](https://sberdevices.ru/) team on a dataset of 600 GB of texts in 60 languages. The model has seen 440 billion BPE tokens in total.
 
- Total training time was around n days on n GPUs for n context and few days on n GPUs for n context.
+ Total training time was around 12 days on 256 Nvidia V100 GPUs.
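The README this commit edits documents a causal language model checkpoint. As a minimal sketch of how that checkpoint would be used (an assumption, not part of this commit: the model id `sberbank-ai/mGPT` is taken from the image URL in the diff, and compatibility with the standard `transformers` causal-LM classes is inferred from the pre-edit README's mention of the transformers lib; verify against the model card):

```python
# Hedged sketch: load the mGPT checkpoint referenced in this README
# and sample a short continuation. The model id "sberbank-ai/mGPT"
# comes from the stats.png URL above; standard AutoModel compatibility
# is assumed, not confirmed by this commit.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sberbank-ai/mGPT")
model = AutoModelForCausalLM.from_pretrained("sberbank-ai/mGPT")

prompt = "Artificial intelligence is"
inputs = tokenizer(prompt, return_tensors="pt")

# Keep prompt plus generated tokens within the training context
# window (sequence length 512 per the updated Details section).
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since the updated Details section states the model was trained at sequence length 512, keeping prompt plus generation under that length avoids running past the context the weights were trained on.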