
fresh-2-layer-swag-distill-of-fresh-2-layer-gpqa

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 24.6057
  • Accuracy: 0.3737

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 321
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 20
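The hyperparameters above can be collected into a single configuration dict. This is a minimal sketch: the key names follow the `transformers.TrainingArguments` fields that these values would typically map to, but the actual training script is not included in this card, so the mapping is an assumption.

```python
# Hyperparameters as listed in this card, keyed by the TrainingArguments
# field names they most likely correspond to (an assumption, since the
# original training script is not provided).
hyperparameters = {
    "learning_rate": 5e-4,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "seed": 321,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 500,
    "num_train_epochs": 20,
}

# A dict like this could be unpacked into TrainingArguments, e.g.:
#   args = TrainingArguments(output_dir="out", **hyperparameters)
print(hyperparameters["learning_rate"])  # 0.0005
```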

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log        | 1.0   | 63   | 15.4510         | 0.2576   |
| No log        | 2.0   | 126  | 17.9625         | 0.3232   |
| No log        | 3.0   | 189  | 15.3798         | 0.3434   |
| No log        | 4.0   | 252  | 15.4925         | 0.2929   |
| No log        | 5.0   | 315  | 18.1665         | 0.3283   |
| No log        | 6.0   | 378  | 19.0829         | 0.3384   |
| No log        | 7.0   | 441  | 24.6057         | 0.3737   |
| 2.1946        | 8.0   | 504  | 20.6331         | 0.3333   |
| 2.1946        | 9.0   | 567  | 18.3985         | 0.3283   |
| 2.1946        | 10.0  | 630  | 19.1103         | 0.3535   |
| 2.1946        | 11.0  | 693  | 18.6291         | 0.3636   |
| 2.1946        | 12.0  | 756  | 22.3409         | 0.3333   |
| 2.1946        | 13.0  | 819  | 18.9510         | 0.3434   |
| 2.1946        | 14.0  | 882  | 20.9000         | 0.3485   |
| 2.1946        | 15.0  | 945  | 18.1215         | 0.3384   |
| 0.284         | 16.0  | 1008 | 19.2466         | 0.3434   |
| 0.284         | 17.0  | 1071 | 18.9343         | 0.3384   |
| 0.284         | 18.0  | 1134 | 19.4002         | 0.3586   |
| 0.284         | 19.0  | 1197 | 18.9731         | 0.3535   |
| 0.284         | 20.0  | 1260 | 19.2574         | 0.3636   |

Framework versions

  • Transformers 4.34.0.dev0
  • Pytorch 2.0.1+cu117
  • Datasets 2.14.5
  • Tokenizers 0.14.0