---
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: bert_12_layer_model_v1_complete_training_new_48_KD
  results: []
---
# bert_12_layer_model_v1_complete_training_new_48_KD
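This model is a fine-tuned version of an unspecified base model on an unspecified dataset (neither was recorded when this card was generated, so the dataset appears as `None`). It achieves the following results on the evaluation set, which match the last logged checkpoint in the training results (step 130000, epoch 0.8):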
- Loss: 326.4413
- Accuracy: 0.3018
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 36
- eval_batch_size: 36
- seed: 10
- distributed_type: multi-GPU
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 10000
- num_epochs: 5
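For reproduction, the hyperparameters above can be expressed as keyword arguments to `transformers.TrainingArguments`. This is a minimal sketch, not the author's training script: it assumes the reported batch size of 36 is per device, and the evaluation/logging interval of 10,000 steps is inferred from the results table rather than stated in the card. `make_training_arguments` and the `output_dir` you pass are hypothetical; the multi-GPU launch (e.g. via `torchrun`) is not shown.

```python
# Hyperparameters from the list above, as keyword arguments for
# transformers.TrainingArguments (argument names as in Transformers 4.30.x).
TRAINING_KWARGS = {
    "learning_rate": 1e-5,
    # Assumption: the reported batch size of 36 is treated as per-device.
    "per_device_train_batch_size": 36,
    "per_device_eval_batch_size": 36,
    "seed": 10,
    # Adam betas and epsilon as listed on the card.
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 10_000,
    "num_train_epochs": 5,
    # Assumption: inferred from the results table, which logs training loss
    # and runs evaluation every 10,000 steps.
    "evaluation_strategy": "steps",
    "eval_steps": 10_000,
    "logging_steps": 10_000,
}


def make_training_arguments(output_dir):
    """Hypothetical helper: build TrainingArguments (requires transformers)."""
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Keeping the values in a plain dictionary makes them easy to diff against the card; `TrainingArguments` is only imported when the helper is called.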
### Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy |
---|---|---|---|---|
849.2694 | 0.06 | 10000 | 802.2138 | 0.1435 |
603.4255 | 0.12 | 20000 | 597.5114 | 0.1445 |
552.5588 | 0.18 | 30000 | 549.1310 | 0.1454 |
525.5738 | 0.25 | 40000 | 523.0781 | 0.1460 |
508.5192 | 0.31 | 50000 | 507.5772 | 0.1463 |
496.0482 | 0.37 | 60000 | 494.5385 | 0.1457 |
487.2105 | 0.43 | 70000 | 484.7273 | 0.1464 |
476.1281 | 0.49 | 80000 | 473.3444 | 0.1490 |
456.0017 | 0.55 | 90000 | 445.0464 | 0.1662 |
421.6633 | 0.61 | 100000 | 404.1071 | 0.2046 |
382.6604 | 0.68 | 110000 | 369.2148 | 0.2446 |
358.6727 | 0.74 | 120000 | 341.1114 | 0.2776 |
339.9395 | 0.8 | 130000 | 326.4413 | 0.3018 |
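The table stops at epoch 0.8 of the 5 planned epochs. The sketch below estimates how far through the `linear` schedule that is, assuming the schedule has the warmup-then-linear-decay shape used by `transformers.get_linear_schedule_with_warmup`. The total step count is an estimate derived from the last row (step 130000 at epoch 0.8); because epochs are rounded to two decimals, treat it as approximate. The function names are illustrative.

```python
def estimate_total_steps(step, epoch, num_epochs):
    """Estimate total optimizer steps from one (step, epoch) row of the log.

    Epochs in the table are rounded to two decimals, so this is approximate.
    """
    steps_per_epoch = round(step / epoch)
    return steps_per_epoch * num_epochs


def linear_warmup_lr(step, base_lr, warmup_steps, total_steps):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


BASE_LR = 1e-5
WARMUP_STEPS = 10_000
# Last logged row: step 130000 at epoch 0.8, with 5 epochs planned.
TOTAL_STEPS = estimate_total_steps(130_000, 0.8, 5)

for logged_step in range(10_000, 130_001, 10_000):
    lr = linear_warmup_lr(logged_step, BASE_LR, WARMUP_STEPS, TOTAL_STEPS)
    print(f"step {logged_step:>6}: lr ~ {lr:.2e}")
```

Under this estimate (about 812,500 total steps), the last logged checkpoint sits at 16% of the planned steps, so the learning rate there would still be roughly 85% of its peak.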
### Framework versions
- Transformers 4.30.1
- Pytorch 1.14.0a0+410ce96
- Datasets 2.12.0
- Tokenizers 0.13.3
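To check a local environment against the versions above, a small standard-library sketch such as the following can report mismatches. It is illustrative only; the package names are assumed to be the PyPI distribution names. Note that the PyTorch entry carries a local build label (`+410ce96`), so an exact match is unlikely outside the original training environment.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions", keyed by distribution name.
EXPECTED_VERSIONS = {
    "transformers": "4.30.1",
    "torch": "1.14.0a0+410ce96",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def compare_versions(expected=EXPECTED_VERSIONS):
    """Return {package: (expected, installed_or_None)} for each mismatch."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            # Package is not installed at all.
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, installed) in compare_versions().items():
        print(f"{package}: card lists {wanted}, installed {installed}")
```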