---
license: mit
base_model: microsoft/xtremedistil-l12-h384-uncased
tags:
  - generated_from_trainer
metrics:
  - accuracy
model-index:
  - name: xtremedistil-l12-h384-uncased-zeroshot-v1.1-none
    results: []
pipeline_tag: zero-shot-classification
---

# xtremedistil-l12-h384-uncased-zeroshot-v1.1-none

A slightly larger sibling of [MoritzLaurer/xtremedistil-l6-h256-zeroshot-v1.1-all-33](https://hf.co/MoritzLaurer/xtremedistil-l6-h256-zeroshot-v1.1-all-33).

## Model description

This model is a fine-tuned version of [microsoft/xtremedistil-l12-h384-uncased](https://huggingface.co/microsoft/xtremedistil-l12-h384-uncased) on the dataset mix described under "Training and evaluation data" below. It achieves the following results on the evaluation set:

- Loss: 0.2063
- F1 Macro: 0.5570
- F1 Micro: 0.6385
- Accuracy Balanced: 0.6104
- Accuracy: 0.6385
- Precision Macro: 0.5705
- Recall Macro: 0.6104
- Precision Micro: 0.6385
- Recall Micro: 0.6385
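
The quickest way to try the model is the `zero-shot-classification` pipeline. A minimal sketch, assuming the checkpoint is published under the repo id `pszemraj/xtremedistil-l12-h384-uncased-zeroshot-v1.1-none`; the example text and candidate labels are purely illustrative:

```python
from transformers import pipeline

# assumed repo id; adjust if the checkpoint lives elsewhere
classifier = pipeline(
    "zero-shot-classification",
    model="pszemraj/xtremedistil-l12-h384-uncased-zeroshot-v1.1-none",
)

text = "The new update drains my battery twice as fast as before."
labels = ["bug report", "feature request", "praise"]

result = classifier(text, candidate_labels=labels)
print(result["labels"][0], round(result["scores"][0], 3))
```

The pipeline scores each candidate label with an entailment-style forward pass, so inference cost grows linearly with the number of labels.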

## Training and evaluation data

See the [datasets overview CSV](https://github.com/MoritzLaurer/zeroshot-classifier/blob/main/datasets_overview.csv) in the zeroshot-classifier repo.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):

- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 80085
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.04
- num_epochs: 3
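
For reference, here is a sketch of how these settings map onto `transformers.TrainingArguments` (argument names follow the Transformers 4.36 API; `output_dir` is a placeholder, and this is not the exact training script used):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",               # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=2,  # 32 x 2 = effective batch size 64
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.04,
    adam_beta1=0.9,                 # Adam betas (0.9, 0.98)
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    seed=80085,
)
```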

### Training results

| Training Loss | Epoch | Step  | Validation Loss | F1 Macro | F1 Micro | Accuracy Balanced | Accuracy | Precision Macro | Recall Macro | Precision Micro | Recall Micro |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|:--------:|:-----------------:|:--------:|:---------------:|:------------:|:---------------:|:------------:|
| 0.2756        | 0.32  | 5000  | 0.4155          | 0.8146   | 0.8255   | 0.8215            | 0.8255   | 0.8101          | 0.8215       | 0.8255          | 0.8255       |
| 0.2395        | 0.65  | 10000 | 0.4166          | 0.8182   | 0.8303   | 0.8222            | 0.8303   | 0.8151          | 0.8222       | 0.8303          | 0.8303       |
| 0.2464        | 0.97  | 15000 | 0.4114          | 0.8204   | 0.8325   | 0.8239            | 0.8325   | 0.8175          | 0.8239       | 0.8325          | 0.8325       |
| 0.2105        | 1.3   | 20000 | 0.4051          | 0.8236   | 0.8363   | 0.8254            | 0.8363   | 0.8219          | 0.8254       | 0.8363          | 0.8363       |
| 0.2267        | 1.62  | 25000 | 0.4030          | 0.8244   | 0.8373   | 0.8257            | 0.8373   | 0.8231          | 0.8257       | 0.8373          | 0.8373       |
| 0.2312        | 1.95  | 30000 | 0.4088          | 0.8233   | 0.8360   | 0.8250            | 0.8360   | 0.8217          | 0.8250       | 0.8360          | 0.8360       |
| 0.2241        | 2.27  | 35000 | 0.4061          | 0.8257   | 0.8375   | 0.8291            | 0.8375   | 0.8229          | 0.8291       | 0.8375          | 0.8375       |
| 0.2183        | 2.6   | 40000 | 0.4043          | 0.8259   | 0.8380   | 0.8285            | 0.8380   | 0.8235          | 0.8285       | 0.8380          | 0.8380       |
| 0.2285        | 2.92  | 45000 | 0.4041          | 0.8241   | 0.8365   | 0.8263            | 0.8365   | 0.8220          | 0.8263       | 0.8365          | 0.8365       |

### Framework versions

- Transformers 4.36.2
- Pytorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0