gpt-neo-pl-125m / README.md
metadata
language: pl
tags:
  - generated_from_trainer
  - text-generation
widget:
  - text: Bolesław Leśmian - polski poeta
datasets:
  - wikipedia
metrics:
  - accuracy
model-index:
  - name: gpt_neo_pl_125M
    results:
      - task:
          name: Causal Language Modeling
          type: text-generation
        dataset:
          name: wikipedia 20220720.pl
          type: wikipedia
          args: 20220720.pl
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.4312838299951148

gpt_neo_pl_125M_v2

This is a GPT-Neo 125M causal language model for Polish, trained from scratch on the Polish Wikipedia dump 20220720.pl from the wikipedia dataset. It achieves the following results on the evaluation set:

  • Loss: 3.3862
  • Accuracy: 0.4313
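The loss above can be turned into a perplexity figure, which is often easier to compare across language models. A minimal sketch, assuming the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal language modeling, but not stated on this card):

```python
import math

# Evaluation loss reported above; assumed to be mean cross-entropy in nats per token.
eval_loss = 3.3862

# Perplexity is the exponential of the mean per-token cross-entropy.
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.2f}")
```

The accuracy figure is presumably next-token prediction accuracy, so roughly 43% of evaluation tokens were predicted exactly.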

Model description

More information needed

Intended uses & limitations

More information needed
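As a causal language model, the checkpoint can be tried for Polish text generation. The sketch below is only an illustration: the repo ID `mbien/gpt-neo-pl-125m` is an assumption inferred from the page path, the sampling settings are arbitrary, and the prompt is the widget example from the card metadata.

```python
# Assumption: repo ID inferred from the page path, not confirmed by the card.
MODEL_ID = "mbien/gpt-neo-pl-125m"


def generate(prompt, model_id=MODEL_ID, max_new_tokens=40):
    """Return one sampled continuation of `prompt`."""
    # Imported lazily so that defining this function does not require transformers.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_id)
    outputs = generator(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,  # sample instead of greedy decoding
        top_p=0.95,      # illustrative nucleus-sampling value
    )
    return outputs[0]["generated_text"]


# Example call (downloads the checkpoint on first use):
# print(generate("Bolesław Leśmian - polski poeta"))
```

Because the model was trained only on Wikipedia, output is likely to be encyclopedic in style and should not be treated as factual.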

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 1.0
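Two of the values above are derived rather than set directly: the total train batch size and the learning rate at a given step. The sketch below reproduces both in plain Python. It assumes a single device, linear warmup followed by a half-cosine decay to zero (the shape the Transformers cosine scheduler is understood to use), and a total of about 40,400 optimizer steps, which is only an estimate from the training log.

```python
import math

LEARNING_RATE = 2e-4
WARMUP_STEPS = 1000
TOTAL_STEPS = 40_400  # assumption: estimated from ~40,000 steps at epoch 0.99

TRAIN_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 8
NUM_DEVICES = 1  # assumption: single-device training


def effective_batch_size():
    """Sequences contributing to each optimizer update."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def lr_at(step):
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size())
for step in (0, 500, 1000, 20_000, 40_000):
    print(step, f"{lr_at(step):.2e}")
```

With these settings the learning rate peaks at 0.0002 at step 1000 and is close to zero by the end of the epoch, which is consistent with the flattening validation loss in the last rows of the log.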

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---|---|---|---|---|
| 5.9469 | 0.02 | 1000 | 6.5843 | 0.1435 |
| 4.9953 | 0.05 | 2000 | 5.7709 | 0.1911 |
| 4.3754 | 0.07 | 3000 | 5.2624 | 0.2331 |
| 3.9795 | 0.1 | 4000 | 4.8752 | 0.2731 |
| 3.7099 | 0.12 | 5000 | 4.5927 | 0.3039 |
| 3.4747 | 0.15 | 6000 | 4.3942 | 0.3230 |
| 3.343 | 0.17 | 7000 | 4.2879 | 0.3349 |
| 3.2767 | 0.2 | 8000 | 4.1698 | 0.3459 |
| 3.1852 | 0.22 | 9000 | 4.0925 | 0.3534 |
| 3.0871 | 0.25 | 10000 | 4.0239 | 0.3608 |
| 3.0746 | 0.27 | 11000 | 3.9646 | 0.3664 |
| 2.9473 | 0.3 | 12000 | 3.9245 | 0.3706 |
| 2.9737 | 0.32 | 13000 | 3.8742 | 0.3754 |
| 2.9193 | 0.35 | 14000 | 3.8285 | 0.3796 |
| 2.8833 | 0.37 | 15000 | 3.7952 | 0.3837 |
| 2.8533 | 0.4 | 16000 | 3.7616 | 0.3873 |
| 2.8654 | 0.42 | 17000 | 3.7296 | 0.3907 |
| 2.8196 | 0.44 | 18000 | 3.7049 | 0.3936 |
| 2.7883 | 0.47 | 19000 | 3.6786 | 0.3966 |
| 2.747 | 0.49 | 20000 | 3.6488 | 0.3990 |
| 2.7355 | 0.52 | 21000 | 3.6243 | 0.4021 |
| 2.7355 | 0.54 | 22000 | 3.5982 | 0.4053 |
| 2.6999 | 0.57 | 23000 | 3.5765 | 0.4075 |
| 2.7243 | 0.59 | 24000 | 3.5558 | 0.4101 |
| 2.6526 | 0.62 | 25000 | 3.5371 | 0.4125 |
| 2.641 | 0.64 | 26000 | 3.5150 | 0.4146 |
| 2.6602 | 0.67 | 27000 | 3.4971 | 0.4168 |
| 2.644 | 0.69 | 28000 | 3.4812 | 0.4192 |
| 2.6558 | 0.72 | 29000 | 3.4622 | 0.4215 |
| 2.5664 | 0.74 | 30000 | 3.4504 | 0.4229 |
| 2.5669 | 0.77 | 31000 | 3.4376 | 0.4245 |
| 2.5498 | 0.79 | 32000 | 3.4263 | 0.4263 |
| 2.5874 | 0.82 | 33000 | 3.4169 | 0.4274 |
| 2.5555 | 0.84 | 34000 | 3.4067 | 0.4286 |
| 2.5502 | 0.86 | 35000 | 3.3997 | 0.4298 |
| 2.5232 | 0.89 | 36000 | 3.3946 | 0.4302 |
| 2.5369 | 0.91 | 37000 | 3.3898 | 0.4309 |
| 2.5335 | 0.94 | 38000 | 3.3869 | 0.4313 |
| 2.6032 | 0.96 | 39000 | 3.3853 | 0.4315 |
| 2.5244 | 0.99 | 40000 | 3.3850 | 0.4314 |
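The step and epoch columns, combined with the total train batch size of 8, give a rough idea of how much data the run covered. This is back-of-the-envelope arithmetic only: it assumes each batch element is one training sequence, and because the logged epoch is rounded to two decimals the per-epoch figure is approximate. The sequence length is not stated on the card, so no token count is derived.

```python
TOTAL_TRAIN_BATCH_SIZE = 8   # from the hyperparameter list
LAST_LOGGED_STEP = 40_000    # final row of the training log
LAST_LOGGED_EPOCH = 0.99     # rounded to two decimals in the log

# Sequences processed up to the last logged optimizer step.
sequences_seen = LAST_LOGGED_STEP * TOTAL_TRAIN_BATCH_SIZE

# Approximate number of training sequences in one full epoch.
approx_sequences_per_epoch = sequences_seen / LAST_LOGGED_EPOCH

print(sequences_seen)
print(round(approx_sequences_per_epoch))
```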

Framework versions

  • Transformers 4.22.0.dev0
  • PyTorch 1.12.0
  • Datasets 2.4.0
  • Tokenizers 0.12.1
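To see how a local environment compares with the versions listed above, a small check along these lines may help. It uses only the standard library; the PyPI package names (`torch` for PyTorch in particular) are assumptions about how the libraries were installed, and a locally installed PyTorch may report a build suffix such as `+cu113`.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed on the card, keyed by assumed PyPI package name.
EXPECTED = {
    "transformers": "4.22.0.dev0",
    "torch": "1.12.0",
    "datasets": "2.4.0",
    "tokenizers": "0.12.1",
}


def compare_versions(expected=EXPECTED):
    """Map each package to (version on the card, installed version or None)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed)
    return report


for package, (wanted, installed) in compare_versions().items():
    print(f"{package}: card lists {wanted}, installed {installed or 'not installed'}")
```

Transformers 4.22.0.dev0 is a development build, so an exact match is unlikely from a regular release install; a nearby stable release is the practical substitute.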