
Quantization by Richard Erkhov.

  • Github
  • Discord
  • Request more models

my_rugpt3medium_finetune - bnb 4bits
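
A minimal loading sketch with transformers and bitsandbytes follows; the repo id is a placeholder assumption, and the Russian prompt is only an example (the base model is a Russian GPT-2).

```python
# Minimal sketch (assumptions flagged): loading the 4-bit bitsandbytes
# quantization with transformers. Requires `bitsandbytes` and `accelerate`,
# plus a transformers release recent enough to load serialized 4-bit weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id -- substitute the actual id of this quantization repo.
repo_id = "RichardErkhov/my_rugpt3medium_finetune-4bits"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
# The quantization_config stored in the checkpoint is picked up automatically;
# no explicit BitsAndBytesConfig is needed for already-quantized weights.
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

# The base model is a Russian GPT-2, so prompt in Russian.
prompt = "Однажды утром"  # "One morning"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```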

Original model description:

base_model: ai-forever/rugpt3medium_based_on_gpt2
tags:
- generated_from_trainer
model-index:
- name: my_rugpt3medium_finetune
  results: []

my_rugpt3medium_finetune

This model is a fine-tuned version of ai-forever/rugpt3medium_based_on_gpt2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9955
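
For reference, if this is the standard per-token cross-entropy loss, a validation loss of 0.9955 corresponds to a perplexity of exp(0.9955) ≈ 2.71.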

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent TrainingArguments sketch follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 3
  • total_train_batch_size: 24
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 35
  • mixed_precision_training: Native AMP
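
As referenced above, here is a hedged sketch of transformers.TrainingArguments mirroring these settings; the output directory is an assumption, and the Adam betas/epsilon match the library defaults, so they need no explicit flag.

```python
# Hedged sketch: TrainingArguments reproducing the reported hyperparameters.
# Only the values listed in the card are set; output_dir is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="my_rugpt3medium_finetune",  # assumed, not from the card
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=3,  # 8 * 3 = 24 total train batch size
    lr_scheduler_type="cosine",
    warmup_steps=1000,
    num_train_epochs=35,
    fp16=True,  # "Native AMP" mixed precision
    # Default optimizer already uses betas=(0.9, 0.999), epsilon=1e-08.
)
```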

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 3.5373 | 0.46 | 25 | 3.4828 |
| 3.5265 | 0.93 | 50 | 3.4708 |
| 3.478 | 1.39 | 75 | 3.4398 |
| 3.4851 | 1.85 | 100 | 3.3995 |
| 3.4407 | 2.31 | 125 | 3.3609 |
| 3.3731 | 2.78 | 150 | 3.3241 |
| 3.3584 | 3.24 | 175 | 3.2886 |
| 3.3267 | 3.7 | 200 | 3.2540 |
| 3.3043 | 4.17 | 225 | 3.2200 |
| 3.229 | 4.63 | 250 | 3.1853 |
| 3.2618 | 5.09 | 275 | 3.1508 |
| 3.1823 | 5.56 | 300 | 3.1164 |
| 3.172 | 6.02 | 325 | 3.0779 |
| 3.1354 | 6.48 | 350 | 3.0395 |
| 3.0899 | 6.94 | 375 | 2.9987 |
| 3.0741 | 7.41 | 400 | 2.9577 |
| 3.009 | 7.87 | 425 | 2.9140 |
| 2.9598 | 8.33 | 450 | 2.8737 |
| 2.9187 | 8.8 | 475 | 2.8294 |
| 2.9378 | 9.26 | 500 | 2.7842 |
| 2.8396 | 9.72 | 525 | 2.7374 |
| 2.8608 | 10.19 | 550 | 2.6889 |
| 2.7296 | 10.65 | 575 | 2.6405 |
| 2.7452 | 11.11 | 600 | 2.5926 |
| 2.6882 | 11.57 | 625 | 2.5389 |
| 2.6463 | 12.04 | 650 | 2.4893 |
| 2.572 | 12.5 | 675 | 2.4356 |
| 2.5384 | 12.96 | 700 | 2.3788 |
| 2.5246 | 13.43 | 725 | 2.3296 |
| 2.4055 | 13.89 | 750 | 2.2747 |
| 2.3759 | 14.35 | 775 | 2.2155 |
| 2.3351 | 14.81 | 800 | 2.1606 |
| 2.286 | 15.28 | 825 | 2.1061 |
| 2.2694 | 15.74 | 850 | 2.0504 |
| 2.1745 | 16.2 | 875 | 1.9967 |
| 2.1053 | 16.67 | 900 | 1.9411 |
| 2.1184 | 17.13 | 925 | 1.8878 |
| 2.0107 | 17.59 | 950 | 1.8362 |
| 2.027 | 18.06 | 975 | 1.7854 |
| 1.9153 | 18.52 | 1000 | 1.7304 |
| 1.9267 | 18.98 | 1025 | 1.6854 |
| 1.8131 | 19.44 | 1050 | 1.6331 |
| 1.8405 | 19.91 | 1075 | 1.5839 |
| 1.7294 | 20.37 | 1100 | 1.5370 |
| 1.7154 | 20.83 | 1125 | 1.4971 |
| 1.6573 | 21.3 | 1150 | 1.4476 |
| 1.6391 | 21.76 | 1175 | 1.4130 |
| 1.5497 | 22.22 | 1200 | 1.3727 |
| 1.5194 | 22.69 | 1225 | 1.3378 |
| 1.535 | 23.15 | 1250 | 1.3000 |
| 1.4514 | 23.61 | 1275 | 1.2714 |
| 1.4711 | 24.07 | 1300 | 1.2388 |
| 1.4105 | 24.54 | 1325 | 1.2136 |
| 1.4202 | 25.0 | 1350 | 1.1890 |
| 1.3351 | 25.46 | 1375 | 1.1679 |
| 1.3575 | 25.93 | 1400 | 1.1440 |
| 1.2882 | 26.39 | 1425 | 1.1202 |
| 1.3378 | 26.85 | 1450 | 1.1074 |
| 1.3094 | 27.31 | 1475 | 1.0864 |
| 1.2793 | 27.78 | 1500 | 1.0743 |
| 1.2377 | 28.24 | 1525 | 1.0626 |
| 1.2693 | 28.7 | 1550 | 1.0468 |
| 1.2157 | 29.17 | 1575 | 1.0368 |
| 1.2007 | 29.63 | 1600 | 1.0263 |
| 1.2376 | 30.09 | 1625 | 1.0221 |
| 1.2216 | 30.56 | 1650 | 1.0136 |
| 1.1923 | 31.02 | 1675 | 1.0102 |
| 1.2143 | 31.48 | 1700 | 1.0039 |
| 1.1764 | 31.94 | 1725 | 1.0014 |
| 1.1654 | 32.41 | 1750 | 0.9990 |
| 1.2031 | 32.87 | 1775 | 0.9976 |
| 1.1952 | 33.33 | 1800 | 0.9965 |
| 1.1852 | 33.8 | 1825 | 0.9961 |
| 1.1737 | 34.26 | 1850 | 0.9959 |
| 1.1609 | 34.72 | 1875 | 0.9955 |
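
As the table shows, validation loss falls steadily from 3.48 to 0.9955 and has nearly plateaued over the final epochs.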

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.0+cu121
  • Datasets 2.16.0
  • Tokenizers 0.15.0
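
Loading the 4-bit checkpoint additionally requires bitsandbytes and accelerate, which are not part of the training stack listed above.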
Safetensors model size: 210M params (tensor types: F32, FP16, U8)