Week 6 Distilled Student (ModernBERT-base, KD from Qwen3-32B)

Course material for ECBS5200 Applied Deep Learning at CEU Vienna. See earino/applied-deep-learning.

A ModernBERT-base student distilled from a Qwen3-32B teacher. The training loss blends a KL term on temperature-softened softmax outputs (weight α) with hard-label cross-entropy (weight 1−α).

Training recipe

  • Base: answerdotai/ModernBERT-base (149M params, full fine-tune, fresh classifier head)
  • Data: train+test combined from determined-ai/consumer_complaints_medium with the canonical course merge map + MIN_CLASS_COUNT=5 filter (79,278 examples, 113 classes)
  • Loss: KD (see below)
  • Optimizer: AdamW, lr=5e-05, weight_decay=0.01, linear schedule, warmup ratio 0.06
  • Batch size: 32, max sequence length: 128, epochs: 3
  • Hardware: NVIDIA T4 (compute capability 7.5), fp16 mixed precision with GradScaler
  • Seed: 42
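
The schedule in the recipe above (linear, warmup ratio 0.06, peak lr 5e-05) can be sketched without any dependencies. This is only an illustration: the function name `linear_warmup_lr` is hypothetical, the shape follows the usual "linear warmup, then linear decay to zero" convention (the same shape as `transformers.get_linear_schedule_with_warmup`), and the step count assumes that all 79,278 examples are seen once per epoch with the last partial batch kept. The card does not confirm any of these details.

```python
import math

def linear_warmup_lr(step, total_steps, base_lr=5e-5, warmup_ratio=0.06):
    """Learning rate at `step`: linear warmup to base_lr, then linear decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: ramp from base_lr down to 0 at total_steps.
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))

# Assumption: 79,278 training examples per epoch, batch size 32, 3 epochs.
steps_per_epoch = math.ceil(79_278 / 32)   # 2478
total_steps = steps_per_epoch * 3          # 7434

for step in (0, 223, 446, total_steps // 2, total_steps):
    print(step, linear_warmup_lr(step, total_steps))
```

Under these assumptions the warmup covers the first 446 optimizer steps, after which the learning rate decays linearly for the rest of training.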

Distillation recipe

  • Teacher: earino/ecbs5200-qwen3-32b-phase1-v4-teacher-canonical
  • Teacher logits dataset: earino/ecbs5200-week6-teacher-logits (file: train_test_logits_qwen3_32b_canonical_final.npz)
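  • KD loss: α·KL(s/T_d || t/T_d)·T_d² + (1−α)·CE(s, hard_labels), where s and t are the student and teacher logits and the KL term is taken between their temperature-softened softmax distributions
  • T_d = 4.0, α = 0.7
  • Effective softening relative to raw teacher logits: T_total = canonical T (1.2538) × T_d = 5.0152 ≈ 5.0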
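
A minimal, dependency-free sketch of the loss above for a single example. It is not the course training code. The function names are hypothetical, the teacher logits are assumed to be already scaled by the canonical temperature, and the KL direction is assumed to be the conventional KL(teacher ‖ student), which is what `torch.nn.functional.kl_div` computes when given student log-probabilities and teacher probabilities. The card's notation lists the student first, so treat the direction as an assumption.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]

def kd_loss(student_logits, teacher_logits, hard_label, t_d=4.0, alpha=0.7):
    """alpha * T_d^2 * KL(soft targets) + (1 - alpha) * CE(hard label)."""
    log_p_student = log_softmax(student_logits, t_d)
    log_p_teacher = log_softmax(teacher_logits, t_d)
    # Assumed direction: KL(teacher || student) on temperature-softened distributions.
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))
    # Hard-label cross-entropy uses the student logits at temperature 1.
    ce = -log_softmax(student_logits)[hard_label]
    return alpha * kl * t_d ** 2 + (1 - alpha) * ce

# Effective temperature applied to the raw teacher logits.
CANONICAL_T = 1.2538
T_D = 4.0
t_total = CANONICAL_T * T_D  # 5.0152, i.e. roughly 5.0

# Toy 3-class example with made-up logits.
print(kd_loss([0.5, 1.5, 3.0], [0.0, 1.0, 4.0], hard_label=2))
```

The T_d² factor keeps the gradient magnitude of the soft-target term comparable to the hard-label term as the temperature changes. When student and teacher logits coincide, the KL term vanishes and only the (1−α)-weighted cross-entropy remains.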

Verified eval (val_ds, 6,430 examples)

Metric                                Value
Macro F1                              0.2789
Accuracy                              0.6300
NLL                                   1.3178
ECE                                   0.0445
Head F1 (top-20 classes, n=5155)      0.6314
Mid F1 (rank 20–60, n=1065)           0.3990
Tail F1 (rank 60–113, n=210)          0.1298
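
For readers who want to recompute numbers like these from predictions, here is a hedged sketch of two of the metrics in plain Python. It is not the evaluation code behind the table. The function names are hypothetical, the ECE binning (15 equal-width confidence bins) is an assumption because the card does not state the bin count, and macro F1 here averages over all classes, scoring a class with no true or predicted examples as 0.

```python
def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # Classes that never appear and are never predicted score 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes

def expected_calibration_error(confidences, correct, n_bins=15):
    """Sample-weighted mean of |accuracy - confidence| over equal-width bins."""
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    hit_sums = [0] * n_bins
    for conf, ok in zip(confidences, correct):
        # Bin k covers [k / n_bins, (k + 1) / n_bins); conf == 1.0 goes in the last bin.
        k = min(int(conf * n_bins), n_bins - 1)
        counts[k] += 1
        conf_sums[k] += conf
        hit_sums[k] += int(ok)
    n = len(confidences)
    return sum(
        (counts[k] / n) * abs(hit_sums[k] / counts[k] - conf_sums[k] / counts[k])
        for k in range(n_bins) if counts[k]
    )

# Toy data, not taken from the model's predictions.
print(macro_f1([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
```

Because macro F1 weights every class equally, the low tail-tier F1 pulls it well below accuracy, which is dominated by the head classes that make up most of the validation examples.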

Files in this repo

  • model.safetensors + config.json — model weights
  • tokenizer*.json, special_tokens_map.json — tokenizer
  • val_predictions.npz — raw fp16 val logits + per-example predictions + tier assignments. Useful for re-doing per-tier analysis without re-running inference.

Usage

from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "earino/ecbs5200-week6-distilled-student"
# Load the fine-tuned classifier weights and the matching tokenizer from the Hub.
model = AutoModelForSequenceClassification.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)