ai-forever committed on
Commit c3c2888
1 Parent(s): 1c6d739

add model card

Files changed (1)
  1. README.md +16 -1
README.md CHANGED
@@ -1 +1,16 @@
- ruGPT3xl language model with sparse attention
+ ---
+ language:
+ - ru
+ tags:
+ - PyTorch
+ - Transformers
+ thumbnail: "https://github.com/sberbank-ai/ru-gpts"
+ ---
+
+ # rugpt3xl
+ The model was trained with a sequence length of 512 using [DeepSpeed](https://github.com/microsoft/DeepSpeed) and [Megatron](https://github.com/NVIDIA/Megatron-LM) code by the [SberDevices](https://sberdevices.ru/) team, on an 80B-token dataset for 4 epochs. It was then fine-tuned for 1 epoch with a sequence length of 2048.
+ *Note: the model has sparse attention blocks.*
+
+ Total training time was around 10 days on 256 GPUs.
+ Final perplexity on the test set is `12.05`.
+ Model parameters: 1.3B.
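
The figures quoted in the added model card can be put in context with a little arithmetic. The sketch below is not part of the commit. It assumes that the reported perplexity is the exponential of the mean per-token cross-entropy measured in nats (the usual definition, though the card does not say which one was used) and that the "around 10 days" covers all 256 GPUs for the whole run. The constant and function names are hypothetical and introduced only for illustration.

```python
import math

# Figures quoted in the model card.
PERPLEXITY = 12.05          # final test-set perplexity
DATASET_TOKENS = 80e9       # "80B tokens dataset"
PRETRAIN_EPOCHS = 4         # epochs at sequence length 512
TRAINING_DAYS = 10          # "around 10 days"
NUM_GPUS = 256


def perplexity_to_loss(perplexity: float) -> float:
    """Mean per-token cross-entropy in nats, assuming perplexity = exp(loss)."""
    return math.log(perplexity)


def loss_to_perplexity(loss: float) -> float:
    """Inverse of perplexity_to_loss under the same assumption."""
    return math.exp(loss)


def gpu_days(days: float, gpus: int) -> float:
    """Rough compute budget, assuming every GPU was busy for the full run."""
    return days * gpus


loss_nats = perplexity_to_loss(PERPLEXITY)
loss_bits = loss_nats / math.log(2)          # same quantity in bits per token
pretrain_tokens = DATASET_TOKENS * PRETRAIN_EPOCHS
budget = gpu_days(TRAINING_DAYS, NUM_GPUS)

print(f"cross-entropy: {loss_nats:.3f} nats/token ({loss_bits:.3f} bits/token)")
print(f"tokens seen at sequence length 512: {pretrain_tokens:.2e}")
print(f"compute budget: about {budget:.0f} GPU-days")
```

The fine-tuning epoch at sequence length 2048 is left out of the token count because the card does not state how many tokens that stage covered.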