Text Generation
Transformers
Safetensors
Polish
gpt2
from-scratch
polish-gpt2
Inference Endpoints
text-generation-inference

Description

This is a Polish GPT-2 model with the small architecture.

This model was released on 11.08.2023 and is now deprecated.

A new version of this model, radlab/polish-gpt2-small-v2, is available at https://huggingface.co/radlab/polish-gpt2-small-v2
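A minimal usage sketch with the Hugging Face transformers library is below; the model id is taken from this card, while the sampling settings and the wrapper function are illustrative assumptions, not part of an official API:

```python
from transformers import pipeline

def generate(prompt: str, max_new_tokens: int = 50) -> str:
    """Generate a Polish continuation for `prompt` with this checkpoint."""
    # Downloads the (deprecated) small checkpoint from the Hub; for new
    # work, prefer radlab/polish-gpt2-small-v2 as noted above.
    generator = pipeline("text-generation", model="radlab/polish-gpt2-small")
    output = generator(prompt, max_new_tokens=max_new_tokens, do_sample=True)
    return output[0]["generated_text"]
```

Note that the pipeline is created inside the function for simplicity; in a long-running service you would create it once and reuse it.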

Datasets

Data used to train this model:

  • clarin-knext/msmarco-pl
  • clarin-knext/nq-pl
  • clarin-knext/hotpotqa-pl
  • clarin-knext/scidocs-pl
  • clarin-knext/nfcorpus-pl
  • clarin-knext/dbpedia-pl
  • clarin-knext/trec-covid-pl
  • clarin-knext/quora-pl
  • clarin-knext/arguana-pl
  • clarin-knext/fiqa-pl
  • own corpora not published yet

In total, this is about 10.5 GB of data.

Metrics from W&B

  • train/loss: 2.9569
  • train/train_samples_per_second: 31.797
  • train/epoch: 20
  • train/train_steps_per_second: 3.18
  • train/total_flos: 16645483478384640000
  • train/train_loss: 3.106043342053213
  • train/learning_rate: 2.2070550413783577e-8
  • train/global_step: 3185240
  • train/train_runtime: 1001735.8967
  • eval/samples_per_second: 57.896
  • eval/runtime: 1447.4458
  • eval/steps_per_second: 5.79
  • eval/loss: 2.890829086303711
  • eval/accuracy: 0.4637797431547294
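Since the model is trained with a causal language-modeling (cross-entropy) loss, the loss values above can be converted to perplexity by exponentiating them; a quick check on the reported numbers:

```python
import math

# Perplexity = exp(cross-entropy loss), using the W&B values from this card.
train_ppl = math.exp(2.9569)            # final train/loss
eval_ppl = math.exp(2.890829086303711)  # eval/loss

print(f"train perplexity: {train_ppl:.1f}")  # ~19.2
print(f"eval perplexity:  {eval_ppl:.1f}")   # ~18.0
```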

Changelog

  • 11.08.2023: published the first release of the model.
Model details

  • Model size: 126M params
  • Tensor type: F32 (Safetensors)
