The model was trained by the SberDevices team with a sequence length of 1024 using the Transformers library, on 80B tokens over 3 epochs. After that, the model was fine-tuned with a context length of 2048.
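Since the model was trained with the Transformers library, it can be loaded the usual way. A minimal usage sketch, assuming the standard GPT-2 classes apply to this checkpoint (the model ID is taken from this page; the prompt is only an illustration):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the tokenizer and model weights from the Hugging Face Hub.
model_id = "sberbank-ai/rugpt3medium_based_on_gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_id)
model = GPT2LMHeadModel.from_pretrained(model_id)

# Encode a Russian prompt and sample a continuation.
prompt = "Александр Сергеевич Пушкин родился в"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
output = model.generate(
    input_ids,
    max_length=50,
    do_sample=True,
    top_k=50,
    top_p=0.95,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```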

Total training time was around 16 days on 64 GPUs.
The final perplexity on the test set is 17.4.
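For reference, perplexity is the exponential of the mean token-level cross-entropy. A sketch of how it can be computed for this model with the Transformers API; the sample text here is hypothetical, so this does not reproduce the 17.4 test-set figure:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_id = "sberbank-ai/rugpt3medium_based_on_gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_id)
model = GPT2LMHeadModel.from_pretrained(model_id)
model.eval()

# Hypothetical sample text; the published figure uses the authors' test set.
text = "Пример текста для оценки перплексии."
input_ids = tokenizer.encode(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy loss
    # over the shifted token sequence.
    loss = model(input_ids, labels=input_ids).loss

# Perplexity is exp of the mean cross-entropy.
print(torch.exp(loss).item())
```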
