rugpt3medium_based_on_gpt2

The model was trained by the SberDevices team with the transformers library, using a sequence length of 1024, on 80B tokens for 3 epochs. It was then fine-tuned with a 2048-token context.

Total training time was around 16 days on 64 GPUs.
Final perplexity on the test set is 17.4.
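
Since the model follows the standard GPT-2 architecture, it can be loaded with the usual transformers classes. Below is a minimal sketch; the repository id "sberbank-ai/rugpt3medium_based_on_gpt2" and the Russian prompt are assumptions for illustration, so adjust them to the actual hub path and your use case.

```python
# Minimal usage sketch, assuming the model is published on the Hugging Face Hub
# under "sberbank-ai/rugpt3medium_based_on_gpt2" (adjust the path if it differs).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_name = "sberbank-ai/rugpt3medium_based_on_gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

# Generate a continuation for a short Russian prompt.
prompt = "Александр Сергеевич Пушкин родился в"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
output = model.generate(
    input_ids,
    max_length=50,
    do_sample=True,
    top_k=50,
    top_p=0.95,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```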
