
dit-tiny_tobacco3482_simkd_CEKD_t1_aNone

This model is a fine-tuned version of microsoft/dit-base on the Tobacco3482 dataset. It achieves the following results on the evaluation set (a sketch of the less common calibration metrics follows the list):

  • Loss: 0.9983
  • Accuracy: 0.18
  • Brier Loss: 0.8965
  • NLL: 6.7849
  • F1 Micro: 0.18
  • F1 Macro: 0.0305
  • ECE: 0.2195
  • AURC: 0.8182
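Brier loss, NLL, ECE, and AURC are calibration and uncertainty metrics rather than standard Trainer outputs. As a rough illustration of what two of them measure, the sketch below computes a multi-class Brier score and an equal-width-bin ECE from predicted class probabilities; the function names and the 15-bin default are illustrative assumptions, not the exact evaluation code behind this card.

```python
import numpy as np

def brier_score(probs: np.ndarray, labels: np.ndarray) -> float:
    """Multi-class Brier score: mean squared error between the predicted
    probability vector and the one-hot encoding of the true label."""
    onehot = np.eye(probs.shape[1])[labels]
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))

def expected_calibration_error(probs: np.ndarray, labels: np.ndarray,
                               n_bins: int = 15) -> float:
    """ECE with equal-width confidence bins: the bin-weighted average gap
    between mean confidence and accuracy within each bin."""
    confidences = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return float(ece)
```

Here `probs` would be the softmax of the model's logits over the evaluation set and `labels` the integer class ids.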

Model description

More information needed

Intended uses & limitations

More information needed
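Note that the reported evaluation accuracy (0.18, with F1 macro 0.0305) is close to what predicting a single frequent class would yield on Tobacco3482, so this checkpoint does not appear to have converged to a useful classifier. Until usage is documented, the checkpoint should load like any transformers image-classification model (DiT is a BEiT-style architecture). A minimal sketch, assuming a placeholder repository id:

```python
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Placeholder repo id; substitute the actual hub path of this checkpoint.
repo_id = "your-org/dit-tiny_tobacco3482_simkd_CEKD_t1_aNone"
processor = AutoImageProcessor.from_pretrained(repo_id)
model = AutoModelForImageClassification.from_pretrained(repo_id)

image = Image.open("document.png").convert("RGB")  # a scanned document page
inputs = processor(images=image, return_tensors="pt")
logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```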

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 25
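These settings correspond to a standard transformers Trainer run. Below is a minimal sketch of the equivalent TrainingArguments; the output directory is a placeholder, and the distillation-specific loss suggested by the simkd/CEKD model name is not reproduced here.

```python
from transformers import TrainingArguments

# Minimal reconstruction of the reported hyperparameters. The effective
# total_train_batch_size = 4 (per device) * 16 (accumulation steps) = 64.
training_args = TrainingArguments(
    output_dir="dit-tiny_tobacco3482_simkd_CEKD_t1_aNone",  # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=16,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=25,
    # Adam betas/epsilon as reported in the card (the Trainer defaults).
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```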

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Brier Loss | NLL    | F1 Micro | F1 Macro | ECE    | AURC   |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:----------:|:------:|:--------:|:--------:|:------:|:------:|
| No log        | 0.96  | 12   | 1.0062          | 0.18     | 0.8980     | 6.1518 | 0.18     | 0.0309   | 0.2213 | 0.7838 |
| No log        | 1.96  | 24   | 1.0034          | 0.18     | 0.8987     | 5.7795 | 0.18     | 0.0305   | 0.2273 | 0.8165 |
| No log        | 2.96  | 36   | 1.0025          | 0.18     | 0.8984     | 6.4819 | 0.18     | 0.0305   | 0.2249 | 0.8306 |
| No log        | 3.96  | 48   | 1.0018          | 0.18     | 0.8982     | 6.8521 | 0.18     | 0.0306   | 0.2205 | 0.8505 |
| No log        | 4.96  | 60   | 1.0015          | 0.16     | 0.8980     | 6.6853 | 0.16     | 0.0324   | 0.2089 | 0.8798 |
| No log        | 5.96  | 72   | 1.0011          | 0.175    | 0.8979     | 6.8349 | 0.175    | 0.0314   | 0.2134 | 0.8345 |
| No log        | 6.96  | 84   | 1.0008          | 0.18     | 0.8976     | 6.8293 | 0.18     | 0.0313   | 0.2249 | 0.8208 |
| No log        | 7.96  | 96   | 1.0005          | 0.18     | 0.8975     | 6.9400 | 0.18     | 0.0305   | 0.2230 | 0.8140 |
| No log        | 8.96  | 108  | 1.0003          | 0.18     | 0.8974     | 6.5877 | 0.18     | 0.0306   | 0.2230 | 0.8246 |
| No log        | 9.96  | 120  | 1.0000          | 0.18     | 0.8973     | 6.5454 | 0.18     | 0.0306   | 0.2188 | 0.8188 |
| No log        | 10.96 | 132  | 0.9998          | 0.18     | 0.8972     | 6.5555 | 0.18     | 0.0306   | 0.2274 | 0.8151 |
| No log        | 11.96 | 144  | 0.9996          | 0.18     | 0.8971     | 6.5819 | 0.18     | 0.0306   | 0.2254 | 0.8131 |
| No log        | 12.96 | 156  | 0.9994          | 0.18     | 0.8970     | 6.7150 | 0.18     | 0.0305   | 0.2255 | 0.8162 |
| No log        | 13.96 | 168  | 0.9993          | 0.18     | 0.8969     | 6.6542 | 0.18     | 0.0305   | 0.2213 | 0.8220 |
| No log        | 14.96 | 180  | 0.9991          | 0.18     | 0.8968     | 6.6025 | 0.18     | 0.0305   | 0.2213 | 0.8125 |
| No log        | 15.96 | 192  | 0.9990          | 0.18     | 0.8968     | 7.0424 | 0.18     | 0.0305   | 0.2301 | 0.8201 |
| No log        | 16.96 | 204  | 0.9988          | 0.18     | 0.8967     | 6.6676 | 0.18     | 0.0305   | 0.2258 | 0.8153 |
| No log        | 17.96 | 216  | 0.9987          | 0.18     | 0.8967     | 6.6621 | 0.18     | 0.0305   | 0.2270 | 0.8145 |
| No log        | 18.96 | 228  | 0.9986          | 0.18     | 0.8967     | 7.0058 | 0.18     | 0.0305   | 0.2259 | 0.8214 |
| No log        | 19.96 | 240  | 0.9985          | 0.18     | 0.8966     | 6.8777 | 0.18     | 0.0305   | 0.2194 | 0.8183 |
| No log        | 20.96 | 252  | 0.9984          | 0.18     | 0.8966     | 6.7612 | 0.18     | 0.0305   | 0.2282 | 0.8131 |
| No log        | 21.96 | 264  | 0.9984          | 0.18     | 0.8966     | 6.7811 | 0.18     | 0.0305   | 0.2282 | 0.8145 |
| No log        | 22.96 | 276  | 0.9983          | 0.18     | 0.8965     | 6.7044 | 0.18     | 0.0305   | 0.2239 | 0.8167 |
| No log        | 23.96 | 288  | 0.9983          | 0.18     | 0.8965     | 6.7813 | 0.18     | 0.0305   | 0.2217 | 0.8183 |
| No log        | 24.96 | 300  | 0.9983          | 0.18     | 0.8965     | 6.7849 | 0.18     | 0.0305   | 0.2195 | 0.8182 |

Framework versions

  • Transformers 4.26.1
  • PyTorch 1.13.1.post200
  • Datasets 2.9.0
  • Tokenizers 0.13.2