ai-forever vmkhlv committed on
Commit
9489709
1 Parent(s): a65111a

Update README.md (#1)


- Update README.md (c21ff05169ab6cb1c8bf423b15112e1b6243bfc0)


Co-authored-by: Vladislav Mikhailov <vmkhlv@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +16 -2
README.md CHANGED
@@ -8,12 +8,26 @@ thumbnail: "https://github.com/sberbank-ai/ru-gpts"
  ---

  # rugpt3medium\_based\_on\_gpt2
+ The model architecture design, pretraining, and evaluation are documented in our preprint: [**A Family of Pretrained Transformer Language Models for Russian**](https://arxiv.org/abs/2309.10931).

- Model was trained with sequence length 1024 using transformers lib by [SberDevices](https://sberdevices.ru/) team on 80B tokens for 3 epoch. After that model was finetuned on 2048 context.
+
+ The model was pretrained with sequence length 1024 using the Transformers library by the [SberDevices](https://sberdevices.ru/) team on 80B tokens for 3 epochs. After that, the model was finetuned with the context size of 2048 tokens.

  Total training time was around 16 days on 64 GPUs.
- Final perplexity on test set is `17.4`.
+ The final perplexity on the test set is `17.4`.

  # Authors
  + NLP core team RnD [Telegram channel](https://t.me/nlpcoreteam):
  + Dmitry Zmitrovich
+
+ # Cite us
+ ```
+ @misc{zmitrovich2023family,
+ title={A Family of Pretrained Transformer Language Models for Russian},
+ author={Dmitry Zmitrovich and Alexander Abramov and Andrey Kalmykov and Maria Tikhonova and Ekaterina Taktasheva and Danil Astafurov and Mark Baushenko and Artem Snegirev and Tatiana Shavrina and Sergey Markov and Vladislav Mikhailov and Alena Fenogenova},
+ year={2023},
+ eprint={2309.10931},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
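
Note on the perplexity figure in the README: perplexity is conventionally the exponential of the mean per-token negative log-likelihood. The sketch below illustrates that definition only. The function name `perplexity` and the sample log-probabilities are hypothetical, and nothing here is taken from the authors' evaluation code or test set.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Return exp(-mean(log p)) for natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # Hypothetical values: a model that assigns probability 0.25 to every
    # token is exactly as uncertain as a uniform choice among 4 options.
    sample = [math.log(0.25)] * 8
    print(f"perplexity = {perplexity(sample):.3f}")
```

Under this definition, a reported value of `17.4` corresponds to an average per-token negative log-likelihood of about `ln(17.4) ≈ 2.86` nats. How the test set was tokenized and windowed is not stated in this diff.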