Week 6 Distilled Student (ModernBERT-base, KD from Qwen3-32B)
Course material for ECBS5200 Applied Deep Learning at CEU Vienna. See earino/applied-deep-learning.
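A ModernBERT-base student, knowledge-distilled from a Qwen3-32B teacher. It is trained with a KL-divergence term on temperature-softened softmax outputs, blended via α with a hard-label cross-entropy (CE) term.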
Training recipe
- Base: `answerdotai/ModernBERT-base` (149M params, full fine-tune, fresh classifier head)
- Data: train+test combined from `determined-ai/consumer_complaints_medium` with the canonical course merge map + MIN_CLASS_COUNT=5 filter (79,278 examples, 113 classes)
- Loss: KD (see Distillation recipe below)
- Optimizer: AdamW, lr=5e-05, weight_decay=0.01, linear schedule, warmup ratio 0.06
- Batch size: 32, max sequence length: 128, epochs: 3
- Hardware: T4 fp16 + GradScaler (compute capability 7.5)
- Seed: 42
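The learning-rate schedule in the recipe above can be sketched in plain Python. This is an illustration only, not the course's training code: the function names, the rounding of the warmup length, the absence of gradient accumulation, and the training-set size used in the demo are all assumptions.

```python
import math

# Values taken from the training recipe above.
PEAK_LR = 5e-5
WARMUP_RATIO = 0.06
BATCH_SIZE = 32
EPOCHS = 3


def total_steps(num_train_examples, batch_size=BATCH_SIZE, epochs=EPOCHS):
    """Optimizer steps for the whole run, assuming no gradient accumulation."""
    return math.ceil(num_train_examples / batch_size) * epochs


def linear_lr(step, num_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    # ASSUMPTION: warmup length is rounded down; a trainer may round differently.
    warmup_steps = int(num_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (num_steps - step) / max(1, num_steps - warmup_steps))
    return peak_lr * remaining


# HYPOTHETICAL training-set size: the card reports 79,278 examples in total
# but does not state the size of the training split.
steps = total_steps(num_train_examples=70_000)
for s in (0, int(steps * WARMUP_RATIO), steps // 2, steps):
    print(f"step {s:>5}: lr = {linear_lr(s, steps):.2e}")
```

With a warmup ratio of 0.06, the peak learning rate is reached after roughly the first 6% of optimizer steps, and the rate then decays linearly to zero by the final step.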
Distillation recipe
- Teacher: `earino/ecbs5200-qwen3-32b-phase1-v4-teacher-canonical`
- Teacher logits dataset: `earino/ecbs5200-week6-teacher-logits` (file: `train_test_logits_qwen3_32b_canonical_final.npz`)
- KD loss: α·KL(s/T_d || t/T_d)·T_d² + (1−α)·CE(s, hard_labels)
- T_d = 4.0, α = 0.7
- Effective softening from raw teacher logits: T_total = canonical T (1.2538) × T_d (4.0) = 5.0152 ≈ 5.0
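The KD loss above can be sketched for a single example in plain Python. This is a hedged illustration, not the course implementation. It assumes that `s` and `t` are student and teacher logits, that the teacher logits passed in have already been divided by the canonical temperature, and that the KL term follows the common KL(teacher || student) direction. The card's notation lists the student first, so the direction in particular is an assumption. The function names are hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def kd_loss(student_logits, teacher_logits, hard_label, t_d=4.0, alpha=0.7):
    """alpha * KL * T_d^2 + (1 - alpha) * CE for one example."""
    log_p_s = log_softmax(student_logits, t_d)
    log_p_t = log_softmax(teacher_logits, t_d)
    # ASSUMED direction: KL(teacher || student), as in standard distillation.
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    # Hard-label cross-entropy uses the unsoftened student distribution.
    ce = -log_softmax(student_logits)[hard_label]
    return alpha * kl * t_d ** 2 + (1.0 - alpha) * ce


# Toy 3-class example with made-up logits.
student = [2.0, 0.5, -1.0]
teacher = [3.0, 1.0, -2.0]
print(kd_loss(student, teacher, hard_label=0))
```

The T_d² factor compensates for the 1/T_d² scaling of the soft-target gradients, so the two terms stay on comparable scales when T_d changes. In a real training loop the loss would be averaged over the batch.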
Verified eval (val_ds, 6,430 examples)
| Metric | Value |
|---|---|
| Macro F1 | 0.2789 |
| Accuracy | 0.6300 |
| NLL | 1.3178 |
| ECE | 0.0445 |
| Head F1 (top-20 classes, n=5155) | 0.6314 |
| Mid F1 (rank 20–60, n=1065) | 0.3990 |
| Tail F1 (rank 60–113, n=210) | 0.1298 |
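For readers who want to recompute the calibration metrics, the sketch below shows one common way to compute NLL and ECE from predicted probabilities. The card does not state the binning scheme behind the reported ECE, so the 15 equal-width confidence bins here are an assumption, as are the function names.

```python
import math


def nll(probs, labels):
    """Mean negative log-likelihood of the true labels."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def expected_calibration_error(probs, labels, n_bins=15):
    """ECE with equal-width bins over the top-1 confidence (ASSUMED binning)."""
    # Per bin: [count, sum of confidences, number of correct predictions].
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = max(range(len(p)), key=p.__getitem__)
        conf = p[pred]
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b][0] += 1
        bins[b][1] += conf
        bins[b][2] += 1.0 if pred == y else 0.0
    n = len(labels)
    # sum_b (n_b / n) * |acc_b - conf_b| simplifies to |correct_b - conf_sum_b| / n.
    return sum(abs(correct - conf_sum) / n
               for count, conf_sum, correct in bins if count)


# Toy 2-class example with made-up probabilities.
probs = [[0.92, 0.08], [0.63, 0.37], [0.19, 0.81], [0.55, 0.45]]
labels = [0, 1, 1, 0]
print(nll(probs, labels), expected_calibration_error(probs, labels))
```

A low ECE means that top-1 confidence tracks accuracy closely, a different property from the accuracy and F1 values in the table.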
Files in this repo
- `model.safetensors` + `config.json`: model weights
- `tokenizer*.json`, `special_tokens_map.json`: tokenizer
- `val_predictions.npz`: raw fp16 val logits + per-example predictions + tier assignments. Useful for redoing per-tier analysis without re-running inference.
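A per-tier analysis of the kind mentioned above could look like the following sketch. It is illustrative only. The array names inside `val_predictions.npz` are not documented here, so the functions take plain Python lists. The tier boundaries are read as 0-based frequency ranks 0–19, 20–59, and 60 and above. Treating the per-tier score as a macro F1 over the classes in each tier is also an assumption.

```python
def f1_for_class(labels, preds, cls):
    """Binary F1 for one class, 0.0 when the class is never seen or predicted."""
    tp = sum(1 for y, p in zip(labels, preds) if y == cls and p == cls)
    fp = sum(1 for y, p in zip(labels, preds) if y != cls and p == cls)
    fn = sum(1 for y, p in zip(labels, preds) if y == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def per_tier_macro_f1(labels, preds, class_counts, head_size=20, mid_end=60):
    """Rank classes by frequency, split into tiers, average F1 within each tier."""
    # ASSUMPTION: class_counts holds the frequencies used to define the ranks.
    ranked = sorted(class_counts, key=class_counts.get, reverse=True)
    scores = {}
    for rank, cls in enumerate(ranked):
        tier = "head" if rank < head_size else "mid" if rank < mid_end else "tail"
        scores.setdefault(tier, []).append(f1_for_class(labels, preds, cls))
    return {tier: sum(v) / len(v) for tier, v in scores.items()}


# Toy example with 3 classes and shrunken tier cutoffs.
labels = [0, 0, 0, 1, 1, 2]
preds = [0, 0, 1, 1, 1, 0]
counts = {0: 3, 1: 2, 2: 1}
print(per_tier_macro_f1(labels, preds, counts, head_size=1, mid_end=2))
```

Since the file already stores tier assignments, the ranking step can be skipped when reusing them. It is included here only to make the tier definition explicit.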
Usage
from transformers import AutoModelForSequenceClassification, AutoTokenizer
model = AutoModelForSequenceClassification.from_pretrained("earino/ecbs5200-week6-distilled-student")
tokenizer = AutoTokenizer.from_pretrained("earino/ecbs5200-week6-distilled-student")
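Once the model and tokenizer are loaded as shown, a prediction can be obtained roughly as follows. This is a minimal sketch, not an official usage recipe. The helper names `top_k_labels` and `classify` are hypothetical, and the sketch assumes the repository's `config.json` carries a usable `id2label` mapping. The `max_length=128` default mirrors the training sequence length.

```python
import math


def top_k_labels(logits, id2label, k=3):
    """Softmax one row of logits and return the k most probable (label, prob) pairs."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    return [(id2label[i], probs[i]) for i in order]


def classify(text, model, tokenizer, k=3, max_length=128):
    """Run one text through the model (hypothetical helper)."""
    import torch  # imported lazily so top_k_labels works without PyTorch

    inputs = tokenizer(text, truncation=True, max_length=max_length,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].float().tolist()
    # ASSUMPTION: config.id2label maps class indices to readable names.
    return top_k_labels(logits, model.config.id2label, k)
```

For example, `classify("I was charged twice for the same transaction.", model, tokenizer)` would return the three most probable classes with their softmax probabilities.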