---
library_name: transformers
license: mit
base_model: gpt2
tags:
- generated_from_trainer
model-index:
- name: codeparrot-ds
  results: []
---
# codeparrot-ds
This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 5.0287
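As a minimal usage sketch (assuming the model has been pushed to the Hub; the repo id `your-username/codeparrot-ds` below is a placeholder for the actual path), generation works through the standard `transformers` text-generation pipeline:

```python
from transformers import pipeline

# Placeholder repo id -- replace with the actual Hub path of this model.
generator = pipeline("text-generation", model="your-username/codeparrot-ds")

prompt = "def fibonacci(n):"
print(generator(prompt, max_new_tokens=40)[0]["generated_text"])
```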
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 0.0005
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 5
- mixed_precision_training: Native AMP
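As referenced above, these values map onto `transformers.TrainingArguments` roughly as follows. This is a sketch, not the exact training script: `output_dir` is a placeholder, and dataset preparation and the `Trainer` call are omitted.

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed above; output_dir is a placeholder.
args = TrainingArguments(
    output_dir="codeparrot-ds",
    learning_rate=5e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    gradient_accumulation_steps=8,  # 32 * 8 = 256 total train batch size
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=5,
    fp16=True,  # "Native AMP" mixed-precision training
)
```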
### Training results
| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 9.5172        | 0.1699 | 10   | 8.0210          |
| 7.1368        | 0.3397 | 20   | 7.3939          |
| 6.7858        | 0.5096 | 30   | 6.8556          |
| 6.3872        | 0.6794 | 40   | 6.6165          |
| 6.0964        | 0.8493 | 50   | 6.3605          |
| 5.8534        | 1.0191 | 60   | 6.1202          |
| 5.6031        | 1.1890 | 70   | 5.9613          |
| 5.4271        | 1.3588 | 80   | 5.8534          |
| 5.3319        | 1.5287 | 90   | 5.7526          |
| 5.1911        | 1.6985 | 100  | 5.6603          |
| 5.1143        | 1.8684 | 110  | 5.5964          |
| 5.024         | 2.0382 | 120  | 5.5203          |
| 4.8772        | 2.2081 | 130  | 5.4652          |
| 4.8455        | 2.3779 | 140  | 5.4071          |
| 4.7629        | 2.5478 | 150  | 5.3446          |
| 4.6666        | 2.7176 | 160  | 5.2905          |
| 4.6672        | 2.8875 | 170  | 5.2415          |
| 4.5738        | 3.0573 | 180  | 5.2033          |
| 4.4949        | 3.2272 | 190  | 5.1688          |
| 4.4406        | 3.3970 | 200  | 5.1329          |
| 4.4166        | 3.5669 | 210  | 5.1085          |
| 4.3886        | 3.7367 | 220  | 5.0823          |
| 4.3302        | 3.9066 | 230  | 5.0652          |
| 4.3089        | 4.0764 | 240  | 5.0498          |
| 4.2768        | 4.2463 | 250  | 5.0409          |
| 4.2667        | 4.4161 | 260  | 5.0344          |
| 4.2604        | 4.5860 | 270  | 5.0300          |
| 4.2389        | 4.7558 | 280  | 5.0290          |
| 4.2726        | 4.9257 | 290  | 5.0287          |
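Since the reported loss is the mean cross-entropy (in nats) standard for causal language modeling, perplexity is its exponential; for the final validation loss of 5.0287 that works out to roughly 152.7, as the quick check below shows:

```python
import math

# Perplexity is exp(mean cross-entropy loss).
print(math.exp(5.0287))  # ~152.73
```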
### Framework versions
- Transformers 4.44.2
- Pytorch 2.4.1+cu121
- Datasets 3.0.0
- Tokenizers 0.19.1