bert-tiny-Massive-intent-KD-BERT

This model is a fine-tuned version of google/bert_uncased_L-2_H-128_A-2 on the massive dataset. It achieves the following results on the evaluation set (these match the epoch 40 row, step 28800, of the training results table):

Loss: 0.8380
Accuracy: 0.8534
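
For readers who want to try the classifier, the following is a minimal inference sketch using the transformers text-classification pipeline. The repository ID and the helper name `classify_intents` are assumptions made for illustration, and the label names returned depend on the checkpoint's config, which this card does not document.

```python
# Assumed repository ID (owner/model); adjust if the checkpoint lives elsewhere.
MODEL_ID = "gokuls/bert-tiny-Massive-intent-KD-BERT"


def classify_intents(texts, model_id=MODEL_ID):
    """Return the top predicted label and score for each input text.

    Hypothetical helper: it wraps the standard text-classification pipeline.
    """
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    # For a list of strings the pipeline returns one {"label", "score"} dict per text.
    return classifier(list(texts))


# Example call (downloads the model on first use):
# classify_intents(["wake me up at seven tomorrow"])
```

The labels may appear as generic names such as LABEL_0 if the checkpoint's config does not carry an id2label mapping.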

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 16
eval_batch_size: 16
seed: 33
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 50
mixed_precision_training: Native AMP
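
As a rough sketch, the settings above can be written as keyword arguments for transformers.TrainingArguments, together with the learning rate that a linear schedule would produce at a given step. The mapping from the card's names to keyword names, the single-device reading of the batch size, the zero warmup (no warmup is listed), and the 720 steps per epoch (read off the training results table) are all assumptions; `build_training_arguments` and `linear_lr` are hypothetical helpers.

```python
# Hyperparameters from the list above, keyed by the assumed
# transformers.TrainingArguments keyword names.
TRAINING_KWARGS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 16,  # assumes a single device
    "per_device_eval_batch_size": 16,
    "seed": 33,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 50,
    "fp16": True,  # "Native AMP"; requires a CUDA device
}

STEPS_PER_EPOCH = 720  # assumption: the results table shows step 720 at epoch 1.0
TOTAL_STEPS = STEPS_PER_EPOCH * TRAINING_KWARGS["num_train_epochs"]


def build_training_arguments(output_dir):
    """Build TrainingArguments from the dict above (hypothetical helper)."""
    from transformers import TrainingArguments  # lazy import

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)


def linear_lr(step, warmup_steps=0):
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    base_lr = TRAINING_KWARGS["learning_rate"]
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, TOTAL_STEPS - step)
    return base_lr * remaining / max(1, TOTAL_STEPS - warmup_steps)
```

Under these assumptions the schedule spans 36000 steps, so the rate halves to 2.5e-05 at step 18000 and reaches zero at the end of epoch 50.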

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
5.83	1.0	720	4.8826	0.3050
4.7602	2.0	1440	3.9904	0.4191
4.0301	3.0	2160	3.3806	0.5032
3.4797	4.0	2880	2.9065	0.5967
3.0352	5.0	3600	2.5389	0.6596
2.6787	6.0	4320	2.2342	0.7044
2.3644	7.0	5040	1.9873	0.7354
2.1145	8.0	5760	1.7928	0.7462
1.896	9.0	6480	1.6293	0.7644
1.7138	10.0	7200	1.5062	0.7752
1.5625	11.0	7920	1.3923	0.7885
1.4229	12.0	8640	1.3092	0.7978
1.308	13.0	9360	1.2364	0.8018
1.201	14.0	10080	1.1759	0.8155
1.1187	15.0	10800	1.1322	0.8214
1.0384	16.0	11520	1.0990	0.8234
0.976	17.0	12240	1.0615	0.8308
0.9163	18.0	12960	1.0377	0.8328
0.8611	19.0	13680	1.0054	0.8337
0.812	20.0	14400	0.9926	0.8367
0.7721	21.0	15120	0.9712	0.8382
0.7393	22.0	15840	0.9586	0.8357
0.7059	23.0	16560	0.9428	0.8372
0.6741	24.0	17280	0.9377	0.8396
0.6552	25.0	18000	0.9229	0.8377
0.627	26.0	18720	0.9100	0.8416
0.5972	27.0	19440	0.9028	0.8416
0.5784	28.0	20160	0.8996	0.8406
0.5595	29.0	20880	0.8833	0.8451
0.5438	30.0	21600	0.8772	0.8475
0.5218	31.0	22320	0.8758	0.8451
0.509	32.0	23040	0.8728	0.8480
0.4893	33.0	23760	0.8640	0.8480
0.4948	34.0	24480	0.8541	0.8475
0.4722	35.0	25200	0.8595	0.8495
0.468	36.0	25920	0.8488	0.8495
0.4517	37.0	26640	0.8460	0.8505
0.4462	38.0	27360	0.8450	0.8485
0.4396	39.0	28080	0.8422	0.8490
0.427	40.0	28800	0.8380	0.8534
0.4287	41.0	29520	0.8385	0.8480
0.4222	42.0	30240	0.8319	0.8510
0.421	43.0	30960	0.8296	0.8510
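
The table ends at epoch 43, although num_epochs is listed as 50. The card does not say why. The short sketch below checks a few things that can be read off the last six rows: which epoch has the highest accuracy, which has the lowest validation loss, and how many steps make up an epoch. The bound on examples per epoch assumes a single device and no gradient accumulation, neither of which the card confirms.

```python
# (epoch, step, validation_loss, accuracy) copied from the last six table rows.
TAIL_ROWS = [
    (38, 27360, 0.8450, 0.8485),
    (39, 28080, 0.8422, 0.8490),
    (40, 28800, 0.8380, 0.8534),
    (41, 29520, 0.8385, 0.8480),
    (42, 30240, 0.8319, 0.8510),
    (43, 30960, 0.8296, 0.8510),
]
TRAIN_BATCH_SIZE = 16  # from the hyperparameter list

# Row with the highest accuracy and row with the lowest validation loss.
best_by_accuracy = max(TAIL_ROWS, key=lambda row: row[3])
best_by_loss = min(TAIL_ROWS, key=lambda row: row[2])

# Steps per epoch implied by each row; a single value means a constant rate.
steps_per_epoch = {step // epoch for epoch, step, _, _ in TAIL_ROWS}

# Upper bound on training examples seen per epoch (the last batch may be partial).
max_examples_per_epoch = max(steps_per_epoch) * TRAIN_BATCH_SIZE

print("best accuracy at epoch", best_by_accuracy[0])
print("lowest validation loss at epoch", best_by_loss[0])
print("steps per epoch:", sorted(steps_per_epoch))
print("at most", max_examples_per_epoch, "examples per epoch")
```

The headline loss and accuracy at the top of the card coincide with the highest-accuracy row (epoch 40), not with the final row, which has the lowest validation loss.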

Framework versions

Transformers 4.22.1
PyTorch 1.12.1+cu113
Datasets 2.5.1
Tokenizers 0.12.1

Model repository: gokuls/bert-tiny-Massive-intent-KD-BERT
