
hbertv1-massive-logit_KD-tiny_ffn_1

This model is a fine-tuned version of gokuls/model_v1_complete_training_wt_init_48_tiny_freeze_new_ffn_1 on the massive dataset. It achieves the following results on the evaluation set (these match the epoch 48 row of the training results):

  • Loss: 0.6331
  • Accuracy: 0.8342

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 33
  • distributed_type: multi-GPU
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 50
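To make the linear schedule concrete, the sketch below computes the learning rate at a given optimizer step. It assumes zero warmup steps (the card does not list a warmup setting) and takes 180 steps per epoch from the training results, giving 50 × 180 = 9000 steps in total. The names `linear_lr`, `STEPS_PER_EPOCH`, and `TOTAL_STEPS` are illustrative and are not taken from the training script.

```python
# Values from the card: learning_rate 5e-05, num_epochs 50, and 180 steps
# per epoch (the results table reports step 180 at epoch 1.0).
BASE_LR = 5e-5
STEPS_PER_EPOCH = 180
NUM_EPOCHS = 50
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 9000


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Learning rate at `step`: linear warmup, then linear decay to zero.

    warmup_steps=0 is an assumption; the card does not state a warmup value.
    """
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for epoch in (0, 10, 25, 40, 50):
        step = epoch * STEPS_PER_EPOCH
        print(f"epoch {epoch:>2}  step {step:>4}  lr {linear_lr(step):.2e}")
```

Under these assumptions the rate starts at 5e-05, is halved by the midpoint of training (step 4500), and reaches zero at step 9000.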

Training results

Training Loss  Epoch  Step  Validation Loss  Accuracy
4.296          1.0    180   3.7345           0.2081
3.5386         2.0    360   3.0526           0.2730
2.9946         3.0    540   2.6051           0.3360
2.6126         4.0    720   2.2810           0.4215
2.3148         5.0    900   2.0377           0.4683
2.0838         6.0    1080  1.8401           0.5371
1.9016         7.0    1260  1.6686           0.6080
1.7431         8.0    1440  1.5358           0.6439
1.613          9.0    1620  1.4238           0.6886
1.4952         10.0   1800  1.3339           0.7127
1.4            11.0   1980  1.2511           0.7162
1.3069         12.0   2160  1.1877           0.7285
1.2288         13.0   2340  1.1277           0.7329
1.1684         14.0   2520  1.0877           0.7418
1.0971         15.0   2700  1.0285           0.7570
1.0424         16.0   2880  0.9811           0.7619
0.9865         17.0   3060  0.9552           0.7629
0.943          18.0   3240  0.9216           0.7742
0.9047         19.0   3420  0.8812           0.7762
0.857          20.0   3600  0.8619           0.7821
0.8274         21.0   3780  0.8326           0.7914
0.7955         22.0   3960  0.8086           0.7919
0.7618         23.0   4140  0.7861           0.7973
0.7356         24.0   4320  0.7750           0.7993
0.7109         25.0   4500  0.7580           0.8028
0.6872         26.0   4680  0.7430           0.8077
0.6683         27.0   4860  0.7417           0.8101
0.6503         28.0   5040  0.7132           0.8155
0.6279         29.0   5220  0.7100           0.8106
0.6168         30.0   5400  0.6991           0.8165
0.5981         31.0   5580  0.6935           0.8185
0.5816         32.0   5760  0.6843           0.8200
0.5746         33.0   5940  0.6795           0.8155
0.5602         34.0   6120  0.6775           0.8210
0.5525         35.0   6300  0.6683           0.8244
0.5403         36.0   6480  0.6641           0.8219
0.5289         37.0   6660  0.6598           0.8278
0.5245         38.0   6840  0.6546           0.8278
0.518          39.0   7020  0.6523           0.8259
0.5105         40.0   7200  0.6488           0.8283
0.4988         41.0   7380  0.6463           0.8278
0.4971         42.0   7560  0.6414           0.8308
0.491          43.0   7740  0.6376           0.8318
0.4901         44.0   7920  0.6395           0.8298
0.4846         45.0   8100  0.6348           0.8298
0.4805         46.0   8280  0.6357           0.8313
0.481          47.0   8460  0.6320           0.8313
0.4767         48.0   8640  0.6331           0.8342
0.474          49.0   8820  0.6319           0.8328
0.4765         50.0   9000  0.6318           0.8308
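
The headline metrics at the top of the card (loss 0.6331, accuracy 0.8342) coincide with epoch 48, not the final epoch. The sketch below, using only the last five rows of the results as an excerpt, shows how the best epoch by accuracy and by validation loss can be picked out. That the reported checkpoint was chosen by best accuracy is an inference from these numbers; the card does not state it. The names `ROWS`, `best_by_accuracy`, and `best_by_loss` are illustrative.

```python
# Excerpt of the training results (last five epochs only):
# (epoch, validation_loss, accuracy)
ROWS = [
    (46, 0.6357, 0.8313),
    (47, 0.6320, 0.8313),
    (48, 0.6331, 0.8342),
    (49, 0.6319, 0.8328),
    (50, 0.6318, 0.8308),
]

# Highest accuracy and lowest validation loss need not fall on the same epoch.
best_by_accuracy = max(ROWS, key=lambda row: row[2])
best_by_loss = min(ROWS, key=lambda row: row[1])

if __name__ == "__main__":
    print("best accuracy:", best_by_accuracy)
    print("lowest validation loss:", best_by_loss)
```

In this excerpt the highest accuracy falls on epoch 48 while the lowest validation loss falls on epoch 50, so the two selection criteria would pick different checkpoints.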

Framework versions

  • Transformers 4.35.2
  • Pytorch 1.14.0a0+410ce96
  • Datasets 2.15.0
  • Tokenizers 0.15.0
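
When trying to reproduce these results, it can help to compare the local environment against the versions listed above. The following is a minimal sketch using only the standard library. It assumes the pip distribution names `transformers`, `torch`, `datasets`, and `tokenizers`; the helper `installed_versions` is illustrative and not part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions as listed in the card, keyed by the assumed pip distribution names.
CARD_VERSIONS = {
    "transformers": "4.35.2",
    "torch": "1.14.0a0+410ce96",
    "datasets": "2.15.0",
    "tokenizers": "0.15.0",
}


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, installed in installed_versions(CARD_VERSIONS).items():
        expected = CARD_VERSIONS[name]
        status = "match" if installed == expected else "differs or not installed"
        print(f"{name}: card {expected}, local {installed} ({status})")
```

A version mismatch does not necessarily prevent the model from loading; the check only flags differences worth knowing about.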

Safetensors

  • Model size: 4.25M params
  • Tensor type: F32