fresh-4-layer-swag-distill-of-fresh-4-layer-gpqa

This model is a fine-tuned version of a base model that the card does not name, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 11.8632
  • Accuracy: 0.4293

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
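
Although the card gives no dataset details, the accuracy column in the training results constrains the size of the evaluation set. The sketch below searches for the smallest number of examples for which every reported accuracy could be a whole number of correct answers. It assumes that each accuracy is (correct / total) rounded to four decimals and that the same evaluation set was used at every epoch; the function names are hypothetical and not part of any library.

```python
# Accuracy column copied from the training results table, epochs 1 to 20.
ACCURACIES = [
    0.2778, 0.3535, 0.3838, 0.3535, 0.3737, 0.3889, 0.3838, 0.4141,
    0.4091, 0.4192, 0.4293, 0.4141, 0.4141, 0.4040, 0.4141, 0.4091,
    0.4040, 0.3990, 0.3939, 0.4040,
]


def is_consistent(n, accuracies, tol=5e-5):
    """Return True if every accuracy is within rounding error of k / n for some integer k."""
    for acc in accuracies:
        correct = round(acc * n)  # nearest whole number of correct answers
        if abs(correct / n - acc) > tol:
            return False
    return True


def smallest_consistent_size(accuracies, max_size=10_000):
    """Return the smallest evaluation-set size compatible with all accuracies, or None."""
    for n in range(1, max_size + 1):
        if is_consistent(n, accuracies):
            return n
    return None


print(smallest_consistent_size(ACCURACIES))
```

Under these assumptions the smallest consistent size is 198 examples. Any multiple of 198 would fit equally well, so this is a lower bound on the evaluation set size and not a confirmed figure.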

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 321
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 20
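
To make the scheduler settings concrete, here is a minimal sketch of the learning rate they imply at a given optimizer step. It assumes the usual shape of a linear schedule with warmup (a linear ramp from 0 to the peak rate, then a linear decay to 0) and a total of 2500 steps, taken from the last row of the training results table. `linear_lr` is a hypothetical helper written for illustration, not a library function.

```python
def linear_lr(step, base_lr=5e-4, warmup_steps=500, total_steps=2500):
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay phase: fall linearly from base_lr at warmup_steps to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


for step in (0, 250, 500, 1500, 2500):
    print(f"step {step:>4}: lr = {linear_lr(step):.6f}")
```

With these values the rate peaks at 0.0005 at step 500 (the end of epoch 4), falls to half of that by step 1500, and reaches zero at step 2500.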

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---------------|-------|------|-----------------|----------|
| No log        | 1.0   | 125  | 13.8015         | 0.2778   |
| No log        | 2.0   | 250  | 14.0268         | 0.3535   |
| No log        | 3.0   | 375  | 13.0123         | 0.3838   |
| 1.8616        | 4.0   | 500  | 12.3288         | 0.3535   |
| 1.8616        | 5.0   | 625  | 12.1718         | 0.3737   |
| 1.8616        | 6.0   | 750  | 12.7654         | 0.3889   |
| 1.8616        | 7.0   | 875  | 12.6711         | 0.3838   |
| 0.4769        | 8.0   | 1000 | 12.0719         | 0.4141   |
| 0.4769        | 9.0   | 1125 | 11.8960         | 0.4091   |
| 0.4769        | 10.0  | 1250 | 12.0726         | 0.4192   |
| 0.4769        | 11.0  | 1375 | 11.8632         | 0.4293   |
| 0.1853        | 12.0  | 1500 | 11.6135         | 0.4141   |
| 0.1853        | 13.0  | 1625 | 12.2307         | 0.4141   |
| 0.1853        | 14.0  | 1750 | 11.7646         | 0.4040   |
| 0.1853        | 15.0  | 1875 | 11.6897         | 0.4141   |
| 0.0913        | 16.0  | 2000 | 12.0394         | 0.4091   |
| 0.0913        | 17.0  | 2125 | 11.7915         | 0.4040   |
| 0.0913        | 18.0  | 2250 | 12.0047         | 0.3990   |
| 0.0913        | 19.0  | 2375 | 11.9798         | 0.3939   |
| 0.0436        | 20.0  | 2500 | 12.0208         | 0.4040   |
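
The headline loss and accuracy reported at the top of the card (11.8632 and 0.4293) match the epoch 11 row, not the final epoch. The sketch below picks the best rows from the table by each metric. The rows are copied from the table above; the variable names are illustrative only.

```python
# (epoch, step, validation_loss, accuracy) rows copied from the training results table.
RESULTS = [
    (1, 125, 13.8015, 0.2778), (2, 250, 14.0268, 0.3535),
    (3, 375, 13.0123, 0.3838), (4, 500, 12.3288, 0.3535),
    (5, 625, 12.1718, 0.3737), (6, 750, 12.7654, 0.3889),
    (7, 875, 12.6711, 0.3838), (8, 1000, 12.0719, 0.4141),
    (9, 1125, 11.8960, 0.4091), (10, 1250, 12.0726, 0.4192),
    (11, 1375, 11.8632, 0.4293), (12, 1500, 11.6135, 0.4141),
    (13, 1625, 12.2307, 0.4141), (14, 1750, 11.7646, 0.4040),
    (15, 1875, 11.6897, 0.4141), (16, 2000, 12.0394, 0.4091),
    (17, 2125, 11.7915, 0.4040), (18, 2250, 12.0047, 0.3990),
    (19, 2375, 11.9798, 0.3939), (20, 2500, 12.0208, 0.4040),
]

best_by_accuracy = max(RESULTS, key=lambda row: row[3])  # highest accuracy
best_by_loss = min(RESULTS, key=lambda row: row[2])      # lowest validation loss
final = RESULTS[-1]                                      # last epoch

print("best accuracy:       ", best_by_accuracy)
print("lowest validation loss:", best_by_loss)
print("final epoch:         ", final)
```

Epoch 11 has the highest accuracy and epoch 12 the lowest validation loss, while the final epoch scores 0.4040. This is consistent with the best checkpoint by accuracy having been kept as the published model, although the card does not confirm that. The training loss falls to 0.0436 while the validation loss stays near 12, which suggests substantial overfitting in the later epochs.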

Framework versions

  • Transformers 4.34.0.dev0
  • PyTorch 2.0.1+cu117
  • Datasets 2.14.5
  • Tokenizers 0.14.0