
bert-tiny-Massive-intent-KD-distilBERT

This model is a fine-tuned version of google/bert_uncased_L-2_H-128_A-2 on the MASSIVE dataset. It achieves the following results on the evaluation set (a brief inference sketch follows the list):

  • Loss: 1.6612
  • Accuracy: 0.8396
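
A minimal sketch of running the model for intent classification with the Transformers pipeline API. The Hub repo id below is a hypothetical placeholder; substitute the path where this checkpoint is actually hosted.

```python
# Minimal inference sketch; the model id is a hypothetical placeholder.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="<your-namespace>/bert-tiny-Massive-intent-KD-distilBERT",  # hypothetical id
)

# Prints the predicted intent label and its score for one utterance.
print(classifier("wake me up at seven tomorrow morning"))
```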

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure
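
The "KD" in the model name points to knowledge distillation, presumably with a distilBERT teacher, but the distillation setup is not documented in this card. For reference only, here is a minimal sketch of a standard soft-target distillation loss; the temperature `T` and mixing weight `alpha` are hypothetical placeholders, not values recorded for this run.

```python
# Generic soft-target distillation loss (sketch). T and alpha are
# hypothetical defaults, not hyperparameters documented for this model.
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # KL divergence between temperature-softened teacher and student
    # distributions, scaled by T^2 as in standard distillation.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Ordinary cross-entropy against the gold intent labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```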

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 33
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 50
  • mixed_precision_training: Native AMP
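
Expressed as Hugging Face `TrainingArguments` (Transformers 4.22), the list above corresponds roughly to the sketch below; the model, dataset, and any distillation wiring are omitted.

```python
# Sketch of the reported hyperparameters as TrainingArguments; this is not
# the original training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bert-tiny-Massive-intent-KD-distilBERT",
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=33,
    lr_scheduler_type="linear",
    num_train_epochs=50,
    fp16=True,  # Native AMP mixed-precision training
)
# Adam with betas=(0.9, 0.999) and epsilon=1e-08 matches the Trainer's
# default optimizer settings, so nothing extra is needed for it.
```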

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|--------------:|------:|------:|----------------:|---------:|
| 10.9795       | 1.0   | 720   | 9.3236          | 0.2917   |
| 9.4239        | 2.0   | 1440  | 7.9792          | 0.4092   |
| 8.2632        | 3.0   | 2160  | 6.9824          | 0.4811   |
| 7.3425        | 4.0   | 2880  | 6.1545          | 0.5514   |
| 6.56          | 5.0   | 3600  | 5.4829          | 0.6060   |
| 5.9032        | 6.0   | 4320  | 4.8994          | 0.6463   |
| 5.3078        | 7.0   | 5040  | 4.4129          | 0.6911   |
| 4.819         | 8.0   | 5760  | 4.0152          | 0.7073   |
| 4.3866        | 9.0   | 6480  | 3.6734          | 0.7324   |
| 3.9954        | 10.0  | 7200  | 3.3729          | 0.7516   |
| 3.6764        | 11.0  | 7920  | 3.1251          | 0.7600   |
| 3.3712        | 12.0  | 8640  | 2.9077          | 0.7752   |
| 3.1037        | 13.0  | 9360  | 2.7361          | 0.7787   |
| 2.8617        | 14.0  | 10080 | 2.5791          | 0.7860   |
| 2.6667        | 15.0  | 10800 | 2.4383          | 0.7944   |
| 2.476         | 16.0  | 11520 | 2.3301          | 0.7944   |
| 2.3203        | 17.0  | 12240 | 2.2099          | 0.8052   |
| 2.1698        | 18.0  | 12960 | 2.1351          | 0.8101   |
| 2.0563        | 19.0  | 13680 | 2.0554          | 0.8111   |
| 1.9294        | 20.0  | 14400 | 2.0100          | 0.8190   |
| 1.8304        | 21.0  | 15120 | 1.9566          | 0.8210   |
| 1.7315        | 22.0  | 15840 | 1.9076          | 0.8224   |
| 1.6587        | 23.0  | 16560 | 1.8511          | 0.8283   |
| 1.5876        | 24.0  | 17280 | 1.8230          | 0.8298   |
| 1.5173        | 25.0  | 18000 | 1.8002          | 0.8259   |
| 1.4676        | 26.0  | 18720 | 1.7667          | 0.8278   |
| 1.3956        | 27.0  | 19440 | 1.7512          | 0.8313   |
| 1.3436        | 28.0  | 20160 | 1.7233          | 0.8298   |
| 1.3031        | 29.0  | 20880 | 1.6802          | 0.8318   |
| 1.2584        | 30.0  | 21600 | 1.6768          | 0.8328   |
| 1.2233        | 31.0  | 22320 | 1.6612          | 0.8396   |
| 1.1884        | 32.0  | 23040 | 1.6608          | 0.8352   |
| 1.1374        | 33.0  | 23760 | 1.6195          | 0.8387   |
| 1.1299        | 34.0  | 24480 | 1.5969          | 0.8377   |
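
The accuracy column could be re-checked with a loop like the one below. This sketch assumes the checkpoint was trained on the en-US split of AmazonScience/massive and that its label ids line up with the dataset's `intent` feature; neither assumption is documented above, and the repo id is again a placeholder.

```python
# Evaluation sketch; the model id is a hypothetical placeholder, and the
# label-id alignment with the dataset's "intent" feature is assumed.
import torch
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "<your-namespace>/bert-tiny-Massive-intent-KD-distilBERT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

ds = load_dataset("AmazonScience/massive", "en-US", split="validation")

correct = 0
for ex in ds:
    inputs = tokenizer(ex["utt"], return_tensors="pt", truncation=True)
    with torch.no_grad():
        pred = model(**inputs).logits.argmax(dim=-1).item()
    correct += int(pred == ex["intent"])
print(f"accuracy: {correct / len(ds):.4f}")
```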

Framework versions

  • Transformers 4.22.1
  • Pytorch 1.12.1+cu113
  • Datasets 2.5.1
  • Tokenizers 0.12.1