Week 6 Distilled Student (ModernBERT-base, KD from Qwen3-32B)

Course material for ECBS5200 Applied Deep Learning at CEU Vienna. See earino/applied-deep-learning.

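A ModernBERT-base student knowledge-distilled from a Qwen3-32B teacher. The training loss blends a KL-divergence term on temperature-softened softmax outputs (weight α) with cross-entropy on the hard labels (weight 1−α).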

Training recipe

Base: answerdotai/ModernBERT-base (149M params, full fine-tune, fresh classifier head)
Data: train+test combined from determined-ai/consumer_complaints_medium with the canonical course merge map + MIN_CLASS_COUNT=5 filter (79,278 examples, 113 classes)
Loss: KD (see below)
Optimizer: AdamW, lr=5e-05, weight_decay=0.01, linear schedule, warmup ratio 0.06
Batch size: 32, max sequence length: 128, epochs: 3
Hardware: NVIDIA T4 (compute capability 7.5), fp16 mixed precision with GradScaler
Seed: 42
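
The step counts implied by the recipe above can be worked out with a short sketch. It assumes that all 79,278 examples are used for training, that there is no gradient accumulation, and that the last partial batch of each epoch is kept; none of these is stated in the card. The schedule function reproduces the usual linear-warmup, linear-decay shape (the same shape as `get_linear_schedule_with_warmup` in transformers), which may or may not be the exact helper used here.

```python
import math

# Values taken from the recipe above.
NUM_EXAMPLES = 79_278   # assumption: every example is used for training
BATCH_SIZE = 32
EPOCHS = 3
PEAK_LR = 5e-5
WARMUP_RATIO = 0.06

# Assumption: the last partial batch is kept and there is no gradient accumulation.
steps_per_epoch = math.ceil(NUM_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = int(WARMUP_RATIO * total_steps)


def lr_at(step: int) -> float:
    """Linear warmup from 0 to PEAK_LR, then linear decay back to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    return PEAK_LR * max(0, total_steps - step) / max(1, total_steps - warmup_steps)


print(steps_per_epoch, total_steps, warmup_steps)
```

Under these assumptions the run has 2,478 steps per epoch, 7,434 steps in total, and 446 warmup steps.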

Distillation recipe

Teacher: earino/ecbs5200-qwen3-32b-phase1-v4-teacher-canonical
Teacher logits dataset: earino/ecbs5200-week6-teacher-logits (file: train_test_logits_qwen3_32b_canonical_final.npz)
KD loss: α·KL(s/T_d || t/T_d)·T_d² + (1−α)·CE(s, hard_labels)
T_d = 4.0, α = 0.7
Effective softening of the raw teacher logits: T_total = canonical T (1.2538) × T_d (4.0) = 5.0152 ≈ 5.0
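
As an illustration of the loss above, here is a framework-free sketch for a single example. Two assumptions are made. First, the teacher logits passed in have already been divided by the canonical temperature (1.2538), so only T_d is applied here. Second, the KL term takes the teacher distribution as the reference, which is the conventional distillation direction and what `torch.nn.functional.kl_div` computes when given student log-probabilities as input and teacher probabilities as target. The argument order written in the formula may mirror that call signature, but the card does not say. The function names are illustrative, not taken from the course code.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def kd_loss(student_logits, teacher_logits, hard_label, t_d=4.0, alpha=0.7):
    """alpha * KL * T_d^2 + (1 - alpha) * CE for one example.

    Assumption: teacher_logits are already calibrated by the canonical T.
    Assumption: KL uses the teacher distribution as the reference.
    """
    log_p_student = log_softmax(student_logits, t_d)
    log_p_teacher = log_softmax(teacher_logits, t_d)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    # Hard-label cross-entropy uses the unsoftened student logits.
    ce = -log_softmax(student_logits)[hard_label]
    return alpha * kl * t_d ** 2 + (1 - alpha) * ce


# When the student matches the teacher, the KL term vanishes
# and only the (1 - alpha)-weighted cross-entropy remains.
print(kd_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], hard_label=2))
```

The T_d² factor keeps the gradient magnitude of the softened term comparable to that of the hard-label term as T_d changes.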

Verified eval (val_ds, 6,430 examples)

Metric	Value
Macro F1	0.2789
Accuracy	0.6300
NLL	1.3178
ECE	0.0445
Head F1 (top-20 classes, n=5155)	0.6314
Mid F1 (rank 20–60, n=1065)	0.3990
Tail F1 (rank 60–113, n=210)	0.1298
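
For readers re-deriving the calibration number, the following is a minimal sketch of expected calibration error (ECE) with equal-width confidence bins. The bin count (15) and the binning scheme are assumptions; the card does not state how its ECE was computed, so this sketch is not guaranteed to reproduce 0.0445 exactly.

```python
def expected_calibration_error(confidences, correct, n_bins=15):
    """ECE with equal-width confidence bins over [0, 1].

    confidences: top-class probability per example.
    correct: 1 if the top-class prediction was right, else 0.
    n_bins=15 is an assumption; the card does not state the binning.
    """
    conf_sum = [0.0] * n_bins
    hit_sum = [0.0] * n_bins
    for conf, hit in zip(confidences, correct):
        b = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        conf_sum[b] += conf
        hit_sum[b] += hit
    # sum_b (n_b / n) * |acc_b - conf_b| simplifies to the expression below.
    return sum(abs(h - c) for h, c in zip(hit_sum, conf_sum)) / len(confidences)


# Toy example: two confident correct predictions, two less confident ones.
print(expected_calibration_error([0.9, 0.9, 0.6, 0.6], [1, 1, 1, 0]))
```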

Files in this repo

model.safetensors + config.json — model weights
tokenizer*.json, special_tokens_map.json — tokenizer
val_predictions.npz — raw fp16 val logits + per-example predictions + tier assignments. Useful for re-doing per-tier analysis without re-running inference.
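
The tier assignments stored in val_predictions.npz should make the following unnecessary, but when re-deriving tiers from scratch, a sketch of the head/mid/tail split might look like this. It assumes classes are ranked by descending training frequency and that the rank boundaries are 0-based and half-open (head 0–19, mid 20–59, tail 60–112). The card's "rank 20–60" and "rank 60–113" labels are consistent with that reading but do not confirm it. The function names are hypothetical.

```python
from collections import Counter

# Assumed 0-based, half-open rank boundaries: [0, 20), [20, 60), [60, 113).
HEAD_END, MID_END = 20, 60


def tier_for_rank(rank: int) -> str:
    """Map a class-frequency rank (0 = most frequent) to a tier name."""
    if rank < HEAD_END:
        return "head"
    if rank < MID_END:
        return "mid"
    return "tail"


def class_tiers(train_labels):
    """Map each class id to a tier by descending training frequency."""
    ranked = [cls for cls, _ in Counter(train_labels).most_common()]
    return {cls: tier_for_rank(rank) for rank, cls in enumerate(ranked)}


# Toy example with 113 classes where class i appears (113 - i) times.
toy_labels = [c for c in range(113) for _ in range(113 - c)]
tiers = class_tiers(toy_labels)
print(Counter(tiers.values()))
```

With 113 classes this split yields 20 head, 40 mid, and 53 tail classes.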

Usage

from transformers import AutoModelForSequenceClassification, AutoTokenizer
model = AutoModelForSequenceClassification.from_pretrained("earino/ecbs5200-week6-distilled-student")
tokenizer = AutoTokenizer.from_pretrained("earino/ecbs5200-week6-distilled-student")

Safetensors: model size 0.1B params, tensor type F32
