Week 6 Vanilla Baseline (ModernBERT-base, plain CE on train+test)

Course material for ECBS5200 Applied Deep Learning at CEU Vienna. See earino/applied-deep-learning.

Vanilla cross-entropy baseline: the same model and the same training data as the distilled student, but with no teacher signal. It exists to isolate the marginal contribution of distillation.

Training recipe

Base: answerdotai/ModernBERT-base (149M params, full fine-tune, fresh classifier head)
Data: train+test combined from determined-ai/consumer_complaints_medium with the canonical course merge map + MIN_CLASS_COUNT=5 filter (79,278 examples, 113 classes)
Loss: plain cross-entropy on hard labels
Optimizer: AdamW, lr=5e-05, weight_decay=0.01, linear schedule, warmup ratio 0.06
Batch size: 32, max sequence length: 128, epochs: 3
Hardware: T4 fp16 + GradScaler (compute capability 7.5)
Seed: 42
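
The schedule implied by the recipe above can be sketched in a few lines. This is an illustration, not the course training script: it assumes that all 79,278 examples are used for training, that there is no gradient accumulation, that the last partial batch is kept, and that the warmup step count is rounded up. The names lr_at, total_steps, and warmup_steps are made up for this sketch.

```python
import math

# Values stated in the training recipe.
NUM_EXAMPLES = 79_278
BATCH_SIZE = 32
EPOCHS = 3
PEAK_LR = 5e-5
WARMUP_RATIO = 0.06

# Assumptions (not stated in the card): no gradient accumulation,
# last partial batch kept, warmup step count rounded up.
steps_per_epoch = math.ceil(NUM_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Linear warmup from 0 to PEAK_LR, then linear decay back to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / max(1, total_steps - warmup_steps)


print(steps_per_epoch, total_steps, warmup_steps)
print(lr_at(0), lr_at(warmup_steps), lr_at(total_steps))
```

Under these assumptions the learning rate peaks at the end of warmup and reaches zero on the final optimizer step.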

Verified eval (val_ds, 6,430 examples)

Metric	Value
Macro F1	0.2638
Accuracy	0.6106
NLL	1.5416
ECE	0.1300
Head F1 (top-20 classes, n=5155)	0.6105
Mid F1 (ranks 21–60, n=1065)	0.3797
Tail F1 (ranks 61–113, n=210)	0.1249
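
For readers who want to recompute the headline metrics, the sketch below shows one common way to derive accuracy, macro F1, NLL, and ECE from logits and hard labels. It is a minimal pure-Python illustration on toy data, not the course evaluation code. The 15-bin equal-width ECE and the choice to average F1 over every class index are assumptions, since the card does not state how its numbers were binned or averaged.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def macro_f1(preds, labels, n_classes):
    """Unweighted mean of per-class F1; a class with no support or predictions scores 0."""
    scores = []
    for c in range(n_classes):
        tp = sum(1 for p, y in zip(preds, labels) if p == c and y == c)
        fp = sum(1 for p, y in zip(preds, labels) if p == c and y != c)
        fn = sum(1 for p, y in zip(preds, labels) if p != c and y == c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes


def nll(probs, labels):
    """Mean negative log-likelihood of the true class."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def ece(probs, labels, n_bins=15):
    """Expected calibration error with equal-width confidence bins (bin count is an assumption)."""
    conf_sum = [0.0] * n_bins
    hit_sum = [0.0] * n_bins
    for p, y in zip(probs, labels):
        conf = max(p)
        b = min(int(conf * n_bins), n_bins - 1)
        conf_sum[b] += conf
        hit_sum[b] += 1.0 if p.index(conf) == y else 0.0
    # Sum over bins of (bin size / n) * |accuracy - mean confidence|.
    return sum(abs(h - c) for h, c in zip(hit_sum, conf_sum)) / len(labels)


# Toy data: three examples, two classes.
toy_logits = [[2.0, 0.0], [0.0, 2.0], [2.0, 0.0]]
toy_labels = [0, 1, 1]

toy_probs = [softmax(row) for row in toy_logits]
toy_preds = [p.index(max(p)) for p in toy_probs]
toy_accuracy = sum(p == y for p, y in zip(toy_preds, toy_labels)) / len(toy_labels)
toy_macro_f1 = macro_f1(toy_preds, toy_labels, n_classes=2)
toy_nll = nll(toy_probs, toy_labels)
toy_ece = ece(toy_probs, toy_labels)

print(toy_accuracy, toy_macro_f1, toy_nll, toy_ece)
```

The gap between accuracy (0.6106) and macro F1 (0.2638) in the table is what this kind of averaging produces when many low-frequency classes score near zero: macro F1 weights each of the 113 classes equally, while accuracy is dominated by the head classes.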

Files in this repo

model.safetensors + config.json — model weights
tokenizer*.json, special_tokens_map.json — tokenizer
val_predictions.npz — raw fp16 val logits + per-example predictions + tier assignments. Useful for re-doing per-tier analysis without re-running inference.
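
The array names stored inside val_predictions.npz are not listed here, so a reasonable first step is to inspect the archive before doing any per-tier analysis. The sketch below uses a small stand-in archive so that it runs without downloading anything. The key names logits and preds and the helper summarize_npz are hypothetical and chosen only for this demo.

```python
import os
import tempfile

import numpy as np

# Stand-in archive for demonstration only. The key names are hypothetical;
# the real file may use different ones.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "val_predictions_demo.npz")
np.savez(
    demo_path,
    logits=np.zeros((4, 113), dtype=np.float16),
    preds=np.zeros(4, dtype=np.int64),
)


def summarize_npz(path):
    """Return {array name: (shape, dtype string)} for every array in an .npz archive."""
    with np.load(path) as archive:
        return {
            name: (archive[name].shape, str(archive[name].dtype))
            for name in archive.files
        }


summary = summarize_npz(demo_path)
for name, (shape, dtype) in summary.items():
    print(name, shape, dtype)
```

Against the real file, the same helper would be called as summarize_npz("val_predictions.npz") after downloading it from this repo. Because the logits are stored in fp16, casting them to float32 before applying softmax or computing NLL avoids needless precision loss.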

Usage

from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "earino/ecbs5200-week6-vanilla-baseline"
# Load the fine-tuned 113-class classifier and its tokenizer from the Hub.
model = AutoModelForSequenceClassification.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)
