---
license: mit
base_model: gpt2
tags:
- generated_from_trainer
model-index:
- name: '130000'
  results: []
---

# 130000

This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 5.9987
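
For a causal language model such as GPT-2, the loss the Trainer reports is the mean per-token cross-entropy in nats, so it can be converted to perplexity by exponentiation. A minimal sketch, assuming the 5.9987 figure above is that mean cross-entropy:

```python
import math

# Final evaluation loss reported above (assumed: mean per-token cross-entropy, natural log).
eval_loss = 5.9987

# Perplexity is the exponential of the mean per-token cross-entropy.
perplexity = math.exp(eval_loss)
print(f"perplexity = {perplexity:.1f}")
```

This gives a perplexity of roughly 403. Because the evaluation dataset is unknown, the figure is not directly comparable to perplexities reported for GPT-2 on standard benchmarks.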

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 50
- mixed_precision_training: Native AMP
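
Two of these values follow from the others: `total_train_batch_size` is `train_batch_size` times `gradient_accumulation_steps` (8 × 8 = 64, which suggests a single device), and the `cosine` scheduler decays the learning rate from 0.0005 toward zero over training. The sketch below reproduces both in plain Python. It assumes no warmup (none is listed) and 150 optimizer steps in total (the last step in the results table). The schedule follows the half-cosine shape of Transformers' `get_cosine_schedule_with_warmup` with zero warmup steps, but it is an illustration, not the code that ran during training.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 0.0005
train_batch_size = 8             # examples per device per forward/backward pass
gradient_accumulation_steps = 8  # passes accumulated before each optimizer step
num_devices = 1                  # assumption: 8 * 8 == 64 suggests a single device

# Number of examples contributing to each optimizer update.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Assumption: 150 optimizer steps in total (last step in the results table), no warmup.
num_training_steps = 150


def cosine_lr(step: int) -> float:
    """Half-cosine decay from learning_rate at step 0 to 0 at num_training_steps."""
    progress = min(step, num_training_steps) / num_training_steps
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size)  # 64
for step in (0, 50, 100, 150):
    print(f"step {step:3d}: lr = {cosine_lr(step):.6f}")
```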

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log        | 0.92  | 3    | 7.0396          |
| No log        | 1.85  | 6    | 6.5398          |
| No log        | 2.77  | 9    | 6.3337          |
| 6.6916        | 4.0   | 13   | 6.3694          |
| 6.6916        | 4.92  | 16   | 6.2945          |
| 6.6916        | 5.85  | 19   | 6.3184          |
| 6.1092        | 6.77  | 22   | 6.3726          |
| 6.1092        | 8.0   | 26   | 6.2948          |
| 6.1092        | 8.92  | 29   | 6.3374          |
| 6.5151        | 9.85  | 32   | 6.3641          |
| 6.5151        | 10.77 | 35   | 6.2335          |
| 6.5151        | 12.0  | 39   | 6.1965          |
| 5.998         | 12.92 | 42   | 6.0595          |
| 5.998         | 13.85 | 45   | 6.0374          |
| 5.998         | 14.77 | 48   | 6.0562          |
| 5.6623        | 16.0  | 52   | 6.0128          |
| 5.6623        | 16.92 | 55   | 5.9999          |
| 5.6623        | 17.85 | 58   | 6.0008          |
| 5.611         | 18.77 | 61   | 5.9992          |
| 5.611         | 20.0  | 65   | 6.0017          |
| 5.611         | 20.92 | 68   | 6.0005          |
| 5.5519        | 21.85 | 71   | 5.9962          |
| 5.5519        | 22.77 | 74   | 5.9964          |
| 5.5519        | 24.0  | 78   | 5.9975          |
| 5.5841        | 24.92 | 81   | 5.9974          |
| 5.5841        | 25.85 | 84   | 6.0000          |
| 5.5841        | 26.77 | 87   | 6.0019          |
| 5.5582        | 28.0  | 91   | 6.0014          |
| 5.5582        | 28.92 | 94   | 6.0016          |
| 5.5582        | 29.85 | 97   | 5.9987          |
| 5.591         | 30.77 | 100  | 5.9992          |
| 5.591         | 32.0  | 104  | 5.9986          |
| 5.591         | 32.92 | 107  | 5.9982          |
| 5.5638        | 33.85 | 110  | 5.9983          |
| 5.5638        | 34.77 | 113  | 5.9987          |
| 5.5638        | 36.0  | 117  | 5.9989          |
| 5.5683        | 36.92 | 120  | 5.9992          |
| 5.5683        | 37.85 | 123  | 5.9995          |
| 5.5683        | 38.77 | 126  | 5.9991          |
| 5.5628        | 40.0  | 130  | 5.9992          |
| 5.5628        | 40.92 | 133  | 5.9992          |
| 5.5628        | 41.85 | 136  | 5.9991          |
| 5.5628        | 42.77 | 139  | 5.9989          |
| 5.5683        | 44.0  | 143  | 5.9987          |
| 5.5683        | 44.92 | 146  | 5.9987          |
| 5.5683        | 45.85 | 149  | 5.9987          |
| 5.5534        | 46.15 | 150  | 5.9987          |
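
In this table the Training Loss column repeats the most recently logged training loss, and "No log" marks evaluations that ran before any training loss had been logged. The validation loss is nearly flat from roughly step 55 onward. A small sketch, using the step and validation-loss columns copied from the table, locates the lowest validation loss and compares it with the final value reported at the top of this card:

```python
# (step -> validation loss) copied from the results table above.
val_loss_by_step = {
    3: 7.0396, 6: 6.5398, 9: 6.3337, 13: 6.3694, 16: 6.2945, 19: 6.3184,
    22: 6.3726, 26: 6.2948, 29: 6.3374, 32: 6.3641, 35: 6.2335, 39: 6.1965,
    42: 6.0595, 45: 6.0374, 48: 6.0562, 52: 6.0128, 55: 5.9999, 58: 6.0008,
    61: 5.9992, 65: 6.0017, 68: 6.0005, 71: 5.9962, 74: 5.9964, 78: 5.9975,
    81: 5.9974, 84: 6.0000, 87: 6.0019, 91: 6.0014, 94: 6.0016, 97: 5.9987,
    100: 5.9992, 104: 5.9986, 107: 5.9982, 110: 5.9983, 113: 5.9987, 117: 5.9989,
    120: 5.9992, 123: 5.9995, 126: 5.9991, 130: 5.9992, 133: 5.9992, 136: 5.9991,
    139: 5.9989, 143: 5.9987, 146: 5.9987, 149: 5.9987, 150: 5.9987,
}

# Step with the lowest validation loss, and the last evaluated step.
best_step = min(val_loss_by_step, key=val_loss_by_step.get)
final_step = max(val_loss_by_step)
gap = val_loss_by_step[final_step] - val_loss_by_step[best_step]

print(f"best:  step {best_step}, validation loss {val_loss_by_step[best_step]:.4f}")
print(f"final: step {final_step}, validation loss {val_loss_by_step[final_step]:.4f}")
print(f"final minus best: {gap:.4f}")
```

The lowest validation loss (5.9962 at step 71) is only about 0.0025 below the final 5.9987 at step 150, so on this evaluation set the last checkpoint is essentially as good as the best one.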


### Framework versions

- Transformers 4.38.2
- PyTorch 2.1.0+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2
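
To check whether a local environment matches these versions before trying to reproduce the results, the installed distributions can be compared with the listed ones using only the standard library. A minimal sketch, assuming the PyPI distribution names `transformers`, `torch`, `datasets` and `tokenizers`. The comparison is an exact string match, so a PyTorch build without the `+cu121` suffix is reported as a mismatch.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Versions listed above, keyed by (assumed) PyPI distribution name.
EXPECTED = {
    "transformers": "4.38.2",
    "torch": "2.1.0+cu121",
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}


def compare_versions(expected: dict[str, str]) -> dict[str, tuple[str, str | None]]:
    """Return {package: (expected, installed)} for every package whose version differs."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


for package, (wanted, installed) in compare_versions(EXPECTED).items():
    print(f"{package}: card lists {wanted}, found {installed or 'not installed'}")
```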