---
license: cc-by-4.0
datasets:
- clarin-knext/msmarco-pl
- clarin-knext/nq-pl
- clarin-knext/hotpotqa-pl
- clarin-knext/scidocs-pl
- clarin-knext/nfcorpus-pl
- clarin-knext/dbpedia-pl
- clarin-knext/trec-covid-pl
- clarin-knext/quora-pl
- clarin-knext/arguana-pl
- clarin-knext/fiqa-pl
language:
- pl
library_name: transformers
tags:
- gpt2
- from-scratch
- polish-gpt2
---

## Description

This is a Polish GPT-2 model with the small architecture. The model was released on 11.08.2023 and is now **deprecated**. A new version of this model (`radlab/polish-gpt2-small-v2`) is available at https://huggingface.co/radlab/polish-gpt2-small-v2

## Datasets

Data used to train this model:
- clarin-knext/msmarco-pl
- clarin-knext/nq-pl
- clarin-knext/hotpotqa-pl
- clarin-knext/scidocs-pl
- clarin-knext/nfcorpus-pl
- clarin-knext/dbpedia-pl
- clarin-knext/trec-covid-pl
- clarin-knext/quora-pl
- clarin-knext/arguana-pl
- clarin-knext/fiqa-pl
- own corpora, not published yet

In total, this is about 10.5 GB of data.

## Metrics from W&B

- train/loss: 2.9569
- train/train_samples_per_second: 31.797
- train/epoch: 20
- train/train_steps_per_second: 3.18
- train/total_flos: 16645483478384640000
- train/train_loss: 3.106043342053213
- train/learning_rate: 2.2070550413783577e-8
- train/global_step: 3185240
- train/train_runtime: 1001735.8967
- eval/samples_per_second: 57.896
- eval/runtime: 1447.4458
- eval/steps_per_second: 5.79
- eval/loss: 2.890829086303711
- eval/accuracy: 0.4637797431547294

## Changelog

- _11.08.2023_ publishing the first release of the model.
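## Usage

Since `library_name` is `transformers`, the model can presumably be loaded with the standard `pipeline` API. A minimal sketch, assuming this repository's id is `radlab/polish-gpt2-small` (inferred by analogy with the linked v2 repository, not confirmed by this card):

```python
from transformers import pipeline

# Hypothetical model id, inferred from the v2 repository name.
generator = pipeline("text-generation", model="radlab/polish-gpt2-small")

# Generate a Polish continuation ("Stolicą Polski jest" = "The capital of Poland is").
output = generator("Stolicą Polski jest", max_new_tokens=20)
print(output[0]["generated_text"])
```

For the non-deprecated weights, substitute `radlab/polish-gpt2-small-v2` as the model id.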
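As a rough quality indicator, the reported `eval/loss` (a mean cross-entropy in nats per token) can be converted to token-level perplexity via `exp(loss)`:

```python
import math

# eval/loss reported in the metrics above
eval_loss = 2.890829086303711

# Perplexity is the exponential of the per-token cross-entropy.
perplexity = math.exp(eval_loss)
print(round(perplexity, 1))  # ≈ 18.0
```

Note that perplexity values are only comparable across models that share the same tokenizer.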