dit-tiny_tobacco3482_kd_MSE

This model is a fine-tuned version of microsoft/dit-base. The training dataset is not recorded in the card metadata (it is listed as None). It achieves the following results on the evaluation set:

Loss: 6.8328
Accuracy: 0.19
Brier Loss: 0.8942
Nll: 7.0296
F1 Micro: 0.19
F1 Macro: 0.0703
Ece: 0.2429
Aurc: 0.8146
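
The card does not say how the calibration metrics above were computed. As a rough illustration only, the sketch below uses common textbook definitions of Brier loss, negative log-likelihood (NLL) and expected calibration error (ECE) in plain Python. The function names, the 10-bin equal-width ECE binning and the toy inputs are assumptions made for this example, not the evaluation code behind the reported numbers, so the values it produces may not be directly comparable to them.

```python
import math


def brier_loss(probs, labels):
    """Multi-class Brier score: mean squared distance between each
    predicted probability vector and the one-hot encoding of the label."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(labels)


def nll(probs, labels, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)


def ece(probs, labels, n_bins=10):
    """Expected calibration error with equal-width confidence bins
    (an assumed binning scheme; other variants exist)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        conf = max(p)                      # confidence of the top prediction
        pred = p.index(conf)               # predicted class
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1.0 if pred == y else 0.0))
    n = len(labels)
    total = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            acc = sum(a for _, a in b) / len(b)
            total += len(b) / n * abs(acc - avg_conf)
    return total


# Hypothetical toy input: a uniform predictor over 10 classes, 4 samples.
toy_probs = [[0.1] * 10 for _ in range(4)]
toy_labels = [0, 1, 2, 3]
print(brier_loss(toy_probs, toy_labels), nll(toy_probs, toy_labels), ece(toy_probs, toy_labels))
```

Under these definitions, a uniform prediction over K classes always has a Brier loss of 1 - 1/K, which is a convenient sanity check for any implementation.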

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
gradient_accumulation_steps: 16
total_train_batch_size: 256
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 25
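
As a sanity check on how these values relate, the sketch below recomputes the effective batch size and traces a linear schedule with warmup in plain Python. It assumes single-device training (so the total batch size is the per-device size times the accumulation steps), takes the 75 optimizer steps from the last row of the training results table, and rounds the warmup length up. The rounding rule and the `linear_lr` helper are assumptions for illustration, not the training script itself.

```python
import math

# Values copied from the hyperparameter list in this card.
LEARNING_RATE = 2e-05
TRAIN_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 16
WARMUP_RATIO = 0.1
TOTAL_STEPS = 75  # final optimizer step reported in the training results

# Assumes one device: 16 samples per step * 16 accumulated steps = 256.
effective_batch = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS

# Assumed rounding: warmup length is the ratio times total steps, rounded up.
warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def linear_lr(step):
    """Hypothetical helper: linear warmup from 0 to the peak rate,
    then linear decay back to 0 at TOTAL_STEPS."""
    if step < warmup_steps:
        return LEARNING_RATE * step / warmup_steps
    return LEARNING_RATE * max(0.0, (TOTAL_STEPS - step) / (TOTAL_STEPS - warmup_steps))


print(effective_batch, warmup_steps)
for step in (0, 4, 8, 40, 75):
    print(step, linear_lr(step))
```

With only 75 optimizer steps in total, the peak learning rate is reached after a handful of updates and the schedule spends nearly all of training in the decay phase.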

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	Brier Loss	Nll	F1 Micro	F1 Macro	Ece	Aurc
No log	0.96	3	7.1188	0.145	0.9003	10.1627	0.145	0.0253	0.2218	0.8463
No log	1.96	6	7.0608	0.145	0.8969	9.8809	0.145	0.0253	0.2197	0.8454
No log	2.96	9	6.9777	0.145	0.8929	8.9712	0.145	0.0442	0.2065	0.7921
No log	3.96	12	6.9144	0.17	0.8908	4.9924	0.17	0.0413	0.2325	0.7807
No log	4.96	15	6.8797	0.145	0.8912	6.8983	0.145	0.0399	0.2089	0.7932
No log	5.96	18	6.8636	0.085	0.8926	6.9917	0.085	0.0299	0.1822	0.8755
No log	6.96	21	6.8545	0.075	0.8946	7.0604	0.075	0.0307	0.1849	0.8758
No log	7.96	24	6.8486	0.06	0.8958	7.1035	0.06	0.0230	0.1801	0.8891
No log	8.96	27	6.8455	0.165	0.8967	7.1315	0.165	0.0604	0.2414	0.8438
No log	9.96	30	6.8450	0.185	0.8973	7.1546	0.185	0.0468	0.2477	0.8436
No log	10.96	33	6.8438	0.18	0.8969	7.1569	0.18	0.0308	0.2406	0.8504
No log	11.96	36	6.8414	0.18	0.8962	7.1492	0.18	0.0306	0.2510	0.8501
No log	12.96	39	6.8390	0.18	0.8958	7.1455	0.18	0.0306	0.2374	0.8494
No log	13.96	42	6.8365	0.18	0.8950	7.0793	0.18	0.0306	0.2436	0.8488
No log	14.96	45	6.8349	0.18	0.8944	7.0591	0.18	0.0306	0.2369	0.8486
No log	15.96	48	6.8338	0.18	0.8942	7.0493	0.18	0.0306	0.2396	0.8482
No log	16.96	51	6.8335	0.18	0.8940	7.0429	0.18	0.0309	0.2390	0.8486
No log	17.96	54	6.8341	0.18	0.8943	7.0410	0.18	0.0314	0.2351	0.8514
No log	18.96	57	6.8338	0.19	0.8943	7.0391	0.19	0.0495	0.2480	0.8471
No log	19.96	60	6.8335	0.205	0.8943	7.0342	0.205	0.0722	0.2562	0.8204
No log	20.96	63	6.8334	0.2	0.8942	7.0308	0.2	0.0683	0.2541	0.8199
No log	21.96	66	6.8332	0.195	0.8942	7.0296	0.195	0.0714	0.2511	0.8099
No log	22.96	69	6.8330	0.195	0.8942	7.0297	0.195	0.0717	0.2572	0.8123
No log	23.96	72	6.8329	0.19	0.8942	7.0294	0.19	0.0703	0.2459	0.8148
No log	24.96	75	6.8328	0.19	0.8942	7.0296	0.19	0.0703	0.2429	0.8146
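
When reading a table like this, it can help to check whether the checkpoint with the lowest validation loss is also the most accurate one. The sketch below does this for a subset of rows copied from the table; the tuple layout and variable names are chosen for this example only.

```python
# (step, validation_loss, accuracy) for a subset of rows from the table above.
rows = [
    (3, 7.1188, 0.145),
    (12, 6.9144, 0.17),
    (24, 6.8486, 0.06),
    (60, 6.8335, 0.205),
    (75, 6.8328, 0.19),
]

# Checkpoint with the lowest validation loss among the listed rows.
best_by_loss = min(rows, key=lambda r: r[1])

# Checkpoint with the highest accuracy among the listed rows.
best_by_accuracy = max(rows, key=lambda r: r[2])

print("lowest validation loss at step", best_by_loss[0])
print("highest accuracy at step", best_by_accuracy[0])
```

For these rows the two criteria disagree: validation loss is lowest at step 75, while accuracy peaks at step 60.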

Framework versions

Transformers 4.26.1
PyTorch 1.13.1.post200
Datasets 2.9.0
Tokenizers 0.13.2

jordyvl/dit-tiny_tobacco3482_kd_MSE
