---
license: mit
base_model: cuba6112/orion
tags:
  - generated_from_trainer
model-index:
  - name: orion
    results: []
---

# orion

This model is a fine-tuned version of cuba6112/orion (a GPT-2 variant) on the Wikitext-2 dataset, per the commit title. It achieves the following results on the evaluation set:

- Loss: 3.5251 (for a causal language model this corresponds to a perplexity of exp(3.5251) ≈ 34)
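
As a quick usage sketch (assuming the checkpoint is published at `cuba6112/orion` on the Hub, as the base-model field suggests; the prompt is illustrative only), the model loads through the standard `transformers` causal-LM API:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id inferred from the card metadata; adjust if the checkpoint lives elsewhere.
model_id = "cuba6112/orion"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

prompt = "The history of natural language processing"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```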

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

Per the commit title, training used the Wikitext-2 dataset; preprocessing and split details are not documented. A loading sketch follows.
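
A minimal sketch of loading that dataset with 🤗 `datasets` (the raw Wikitext-2 variant is an assumption; the card does not state which configuration was used):

```python
from datasets import load_dataset

# Wikitext-2 ships in "raw" and pre-tokenized variants on the Hub;
# the raw variant is assumed here.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")
print(dataset)  # DatasetDict with train/validation/test splits
```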

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 3
- mixed_precision_training: Native AMP
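
For reference, a minimal sketch of how these settings map onto the `transformers` `TrainingArguments` (the output directory and the 400-step eval cadence implied by the results table below are assumptions; the Adam betas and epsilon listed above are the library defaults, so they need no explicit arguments):

```python
from transformers import TrainingArguments

# Hypothetical reconstruction from the hyperparameters above;
# train/eval dataset wiring and the Trainer call are elided.
training_args = TrainingArguments(
    output_dir="orion",        # assumed
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=3,
    fp16=True,                 # "Native AMP" mixed precision
    eval_strategy="steps",
    eval_steps=400,            # matches the 400-step cadence in the results table
    logging_steps=400,
)
```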

### Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| No log        | 0.0871 | 400   | 3.5515          |
| 1.4611        | 0.1743 | 800   | 3.6262          |
| 1.174         | 0.2614 | 1200  | 3.6164          |
| 1.2082        | 0.3486 | 1600  | 3.6236          |
| 1.4301        | 0.4357 | 2000  | 3.5137          |
| 1.4301        | 0.5229 | 2400  | 3.5171          |
| 1.4987        | 0.6100 | 2800  | 3.5004          |
| 1.518         | 0.6972 | 3200  | 3.4667          |
| 1.5859        | 0.7843 | 3600  | 3.4521          |
| 1.6333        | 0.8715 | 4000  | 3.4452          |
| 1.6333        | 0.9586 | 4400  | 3.4300          |
| 1.6698        | 1.0458 | 4800  | 3.5143          |
| 1.4993        | 1.1329 | 5200  | 3.5234          |
| 1.4858        | 1.2200 | 5600  | 3.5240          |
| 1.4804        | 1.3072 | 6000  | 3.4979          |
| 1.4804        | 1.3943 | 6400  | 3.5131          |
| 1.4814        | 1.4815 | 6800  | 3.5177          |
| 1.478         | 1.5686 | 7200  | 3.4989          |
| 1.5073        | 1.6558 | 7600  | 3.5158          |
| 1.4952        | 1.7429 | 8000  | 3.5145          |
| 1.4952        | 1.8301 | 8400  | 3.4975          |
| 1.5367        | 1.9172 | 8800  | 3.5075          |
| 1.5085        | 2.0044 | 9200  | 3.5058          |
| 1.462         | 2.0915 | 9600  | 3.5352          |
| 1.4378        | 2.1786 | 10000 | 3.5335          |
| 1.4378        | 2.2658 | 10400 | 3.5378          |
| 1.4514        | 2.3529 | 10800 | 3.5383          |
| 1.448         | 2.4401 | 11200 | 3.5369          |
| 1.46          | 2.5272 | 11600 | 3.5361          |
| 1.4722        | 2.6144 | 12000 | 3.5337          |
| 1.4722        | 2.7015 | 12400 | 3.5277          |
| 1.4726        | 2.7887 | 12800 | 3.5284          |
| 1.4829        | 2.8758 | 13200 | 3.5257          |
| 1.4963        | 2.9630 | 13600 | 3.5253          |

### Framework versions

- Transformers 4.41.2
- Pytorch 2.3.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1