
fresh-8-layer-swag-distill-of-fresh-8-layer-gpqa

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 22.9598
  • Accuracy: 0.4040

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged sketch of the equivalent `TrainingArguments` follows the list):

  • learning_rate: 0.0005
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 321
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 20
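
A minimal sketch of how the settings above map onto 🤗 Transformers `TrainingArguments`. The `output_dir` is a placeholder, the Adam betas/epsilon shown are the library defaults (which match the values reported here), and the per-epoch evaluation strategy is an assumption based on the results table below:

```python
from transformers import TrainingArguments

# Hedged sketch reproducing the hyperparameters listed above.
# output_dir is a placeholder; betas/epsilon are the Adam defaults,
# matching the values reported in this card.
training_args = TrainingArguments(
    output_dir="fresh-8-layer-swag-distill-of-fresh-8-layer-gpqa",
    learning_rate=5e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=321,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=20,
    evaluation_strategy="epoch",  # assumption: the table below reports one eval per epoch
)
```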

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log        | 1.0   | 63   | 22.7715         | 0.2677   |
| No log        | 2.0   | 126  | 24.4035         | 0.2879   |
| No log        | 3.0   | 189  | 21.6171         | 0.3131   |
| No log        | 4.0   | 252  | 22.9241         | 0.3333   |
| No log        | 5.0   | 315  | 36.3034         | 0.3788   |
| No log        | 6.0   | 378  | 22.9598         | 0.4040   |
| No log        | 7.0   | 441  | 25.2469         | 0.3485   |
| 5.5235        | 8.0   | 504  | 29.2667         | 0.3687   |
| 5.5235        | 9.0   | 567  | 24.0718         | 0.3687   |
| 5.5235        | 10.0  | 630  | 25.5240         | 0.3030   |
| 5.5235        | 11.0  | 693  | 28.6147         | 0.3283   |
| 5.5235        | 12.0  | 756  | 33.3811         | 0.3434   |
| 5.5235        | 13.0  | 819  | 28.3026         | 0.3232   |
| 5.5235        | 14.0  | 882  | 27.7010         | 0.2677   |
| 5.5235        | 15.0  | 945  | 26.9798         | 0.3182   |
| 3.9997        | 16.0  | 1008 | 26.8561         | 0.3232   |
| 3.9997        | 17.0  | 1071 | 25.9683         | 0.3687   |
| 3.9997        | 18.0  | 1134 | 23.6478         | 0.3333   |
| 3.9997        | 19.0  | 1197 | 24.1695         | 0.3232   |
| 3.9997        | 20.0  | 1260 | 24.7100         | 0.3485   |

Framework versions

  • Transformers 4.34.0.dev0
  • Pytorch 2.0.1+cu117
  • Datasets 2.14.5
  • Tokenizers 0.14.0
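
Because the serverless Inference API does not support this pipeline type, the model has to be loaded locally. A minimal sketch, assuming the checkpoint exposes a multiple-choice head (consistent with the "swag" in the name) and using a hypothetical repo id:

```python
import torch
from transformers import AutoModelForMultipleChoice, AutoTokenizer

# Hedged sketch: the repo id and the multiple-choice head are assumptions
# based on the model name ("swag" implies a SWAG-style 4-way choice task).
model_id = "fresh-8-layer-swag-distill-of-fresh-8-layer-gpqa"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMultipleChoice.from_pretrained(model_id)

context = "She opens the fridge and takes out a carton of eggs."
endings = [
    "She cracks the eggs into a bowl.",
    "She throws the fridge out the window.",
    "She drives the eggs to work.",
    "She plants the eggs in the garden.",
]

# Pair the context with each candidate ending, then batch as (1, num_choices, seq_len).
enc = tokenizer([context] * len(endings), endings,
                return_tensors="pt", padding=True, truncation=True)
inputs = {k: v.unsqueeze(0) for k, v in enc.items()}

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, num_choices)
print("Predicted ending:", endings[logits.argmax(-1).item()])
```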