Week 6 Vanilla Baseline (ModernBERT-base, plain CE on train+test)

Course material for ECBS5200 Applied Deep Learning at CEU Vienna. See earino/applied-deep-learning.

Vanilla cross-entropy baseline. Same model and same training data as the distilled student, but no teacher signal. It exists to isolate the marginal contribution of distillation.
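For reference, "plain cross-entropy on hard labels" means the loss for one example is the negative log-probability the model assigns to the true class. The following is a minimal pure-Python sketch of that definition, not the course's training code; the function names are illustrative.

```python
import math


def cross_entropy(logits, label):
    """Negative log-probability of the true class under softmax(logits)."""
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mean_cross_entropy(batch_logits, labels):
    """Average the per-example loss over a batch (hypothetical helper)."""
    losses = [cross_entropy(z, y) for z, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)
```

As a sanity reference, a model that outputs identical logits for all 113 classes would score ln(113), about 4.73, on this loss.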

Training recipe

  • Base: answerdotai/ModernBERT-base (149M params, full fine-tune, fresh classifier head)
  • Data: train+test combined from determined-ai/consumer_complaints_medium with the canonical course merge map + MIN_CLASS_COUNT=5 filter (79,278 examples, 113 classes)
  • Loss: plain cross-entropy on hard labels
  • Optimizer: AdamW, lr=5e-05, weight_decay=0.01, linear schedule, warmup ratio 0.06
  • Batch size: 32, max sequence length: 128, epochs: 3
  • Hardware: T4 fp16 + GradScaler (compute capability 7.5)
  • Seed: 42
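The recipe above implies a step budget and a learning-rate curve, which can be worked out as a rough sketch. This assumes all 79,278 examples are used for training, no gradient accumulation, the last partial batch is kept, and warmup steps are rounded down; none of these details are stated in the recipe, so the exact step counts are illustrative.

```python
import math

NUM_EXAMPLES = 79_278
BATCH_SIZE = 32
EPOCHS = 3
BASE_LR = 5e-5
WARMUP_RATIO = 0.06

# Assumption: the final partial batch of each epoch is kept (no drop_last).
steps_per_epoch = math.ceil(NUM_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
# Assumption: warmup length is rounded down; some trainers round up instead.
warmup_steps = int(WARMUP_RATIO * total_steps)


def lr_at(step):
    """Linear warmup from 0 to BASE_LR, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    remaining = (total_steps - step) / (total_steps - warmup_steps)
    return BASE_LR * max(0.0, remaining)
```

Under these assumptions the run is 2,478 steps per epoch and 7,434 steps in total, with the learning rate peaking at 5e-05 after 446 warmup steps.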

Verified eval (val_ds, 6,430 examples)

| Metric | Value |
|---|---|
| Macro F1 | 0.2638 |
| Accuracy | 0.6106 |
| NLL | 1.5416 |
| ECE | 0.1300 |
| Head F1 (top-20 classes, n=5155) | 0.6105 |
| Mid F1 (rank 20–60, n=1065) | 0.3797 |
| Tail F1 (rank 60–113, n=210) | 0.1249 |
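ECE (expected calibration error) compares the model's top-class confidence with its actual accuracy inside confidence bins. The sketch below shows one common formulation with equal-width bins. The bin count of 15 and the half-open bin edges are assumptions, since the card does not say how its ECE was computed, so this may not reproduce 0.1300 exactly.

```python
def expected_calibration_error(confidences, correct, n_bins=15):
    """Weighted mean of |accuracy - mean confidence| over equal-width bins.

    confidences: top-class softmax probability per example, in [0, 1]
    correct:     1/0 (or True/False) per example, whether the top class was right
    """
    # Each bin tracks [example count, sum of confidences, sum of correct flags].
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx][0] += 1
        bins[idx][1] += conf
        bins[idx][2] += float(hit)

    n = len(confidences)
    ece = 0.0
    for count, conf_sum, hit_sum in bins:
        if count:  # empty bins contribute nothing
            ece += (count / n) * abs(hit_sum / count - conf_sum / count)
    return ece
```

For example, four predictions all made at confidence 0.9 with three of them correct give an ECE of 0.15: the bin's accuracy is 0.75 against a mean confidence of 0.9.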

Files in this repo

  • model.safetensors + config.json: model weights
  • tokenizer*.json, special_tokens_map.json: tokenizer
  • val_predictions.npz: raw fp16 val logits + per-example predictions + tier assignments. Useful for re-doing per-tier analysis without re-running inference.
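A per-tier analysis could look roughly like the sketch below. It assumes tiers are defined by ranking classes by training frequency (0-based ranks 0 to 19 as head, 20 to 59 as mid, the rest as tail) and that each tier score is a macro F1 over the classes in that tier. Both are guesses at the course's definitions, and the function names are illustrative. The sketch works on plain label lists because the array names inside val_predictions.npz are not documented here.

```python
from collections import Counter


def assign_tiers(train_labels, head=20, mid=60):
    """Map each class to 'head', 'mid' or 'tail' by training-frequency rank."""
    ranked = [cls for cls, _ in Counter(train_labels).most_common()]
    tiers = {}
    for rank, cls in enumerate(ranked):
        tiers[cls] = "head" if rank < head else "mid" if rank < mid else "tail"
    return tiers


def f1_for_class(y_true, y_pred, cls):
    """One-vs-rest F1 for a single class; 0.0 if the class never appears."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tier_macro_f1(y_true, y_pred, tiers, tier):
    """Unweighted mean F1 over the classes of one tier that occur in y_true."""
    classes = sorted({t for t in y_true if tiers[t] == tier})
    if not classes:
        return 0.0
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)
```

With the stored predictions and tier assignments, only the last two functions would be needed; `assign_tiers` is included to show how the tiers could be rebuilt from training label counts.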

Usage

from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned classifier and its tokenizer from the Hub.
repo_id = "earino/ecbs5200-week6-vanilla-baseline"
model = AutoModelForSequenceClassification.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)