The model was trained by the SberDevices team using the transformers library, with a sequence length of 1024, on 80B tokens for 3 epochs. It was then fine-tuned for 1 epoch with a sequence length of 2048.

Total training time was around 14 days on 128 GPUs for the 1024 context and a few days on 16 GPUs for the 2048 context.
Final perplexity on the test set is 13.6.
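
For readers unfamiliar with the metric, perplexity is commonly defined as the exponential of the mean negative log-likelihood per token. The card does not say exactly how the 13.6 figure was computed (for example, which tokenization or evaluation window was used), so the following is only a minimal sketch of the usual definition; the `perplexity` helper and the sample log-probabilities are hypothetical and are not taken from the model's evaluation code.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for a list of
    natural-log probabilities, one per target token.

    Hypothetical helper for illustration only.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Made-up example: a model that assigns probability 1/13.6 to every
# target token has a perplexity of 13.6 (up to floating-point error).
sample_log_probs = [math.log(1 / 13.6)] * 5
sample_ppl = perplexity(sample_log_probs)
```

Read this way, a perplexity of 13.6 means the model is, on average, about as uncertain as if it were choosing uniformly among 13.6 equally likely tokens at each step.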
