MAP — DeBERTa-v3-large 5-Fold Classifier

Kaggle Competition: MAP — Charting Student Math Misunderstandings

Final Score: Public LB 0.91924 / Private LB 0.91107 (DeBERTa-v3-large, 5-fold ensemble)

Model Description

This repository contains 5-fold checkpoints of a DeBERTa-v3-large classifier trained for the MAP Kaggle competition. The task is to predict the Category:Misconception label for each student response, given the question text, the student's selected answer, and the student's written explanation.

The label space has 65 classes combining:

Category (6 types): True_Correct, True_Neither, True_Misconception, False_Correct, False_Neither, False_Misconception
Misconception name (or NA if no misconception)
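
Each label string can be split back into its two parts. The following is a minimal sketch, assuming the `Category:Misconception` layout described above with `NA` marking "no misconception"; the helper name `parse_label` and the misconception name `Incomplete` in the example are hypothetical and not part of this repository.

```python
from typing import NamedTuple, Optional


class ParsedLabel(NamedTuple):
    category: str                 # e.g. "True_Correct" or "False_Misconception"
    misconception: Optional[str]  # None when the label carries "NA"


def parse_label(label: str) -> ParsedLabel:
    """Split a 'Category:Misconception' string at the first colon."""
    category, _, misconception = label.partition(":")
    return ParsedLabel(
        category=category,
        misconception=None if misconception == "NA" else misconception,
    )


# Hypothetical label strings in the same layout as deberta_label_list.txt
print(parse_label("True_Correct:NA"))
print(parse_label("False_Misconception:Incomplete"))
```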

Repository Structure

├── deberta_fold0/
│   ├── config.json
│   ├── model.safetensors      # DeBERTa-v3-large backbone weights
│   ├── head_weights.pt        # Custom pooler + classifier head weights
│   ├── tokenizer.json
│   └── tokenizer_config.json
├── deberta_fold1/  ... (same structure)
├── deberta_fold2/  ... (same structure)
├── deberta_fold3/  ... (same structure)
├── deberta_fold4/  ... (same structure)
└── deberta_label_list.txt     # All 65 label strings, one per line

Model Architecture

A custom DebertaClassifier wrapping microsoft/deberta-v3-large:

import torch
from torch import nn
from transformers.modeling_outputs import SequenceClassifierOutput

class DebertaClassifier(nn.Module):
    def __init__(self, backbone, num_labels):
        super().__init__()
        self.backbone   = backbone
        hidden_size     = backbone.config.hidden_size  # 1024 for large
        self.pooler     = nn.Linear(hidden_size, hidden_size)
        self.classifier = nn.Linear(hidden_size, num_labels)
        self.dropout    = nn.Dropout(0.1)

    def forward(self, input_ids, attention_mask, token_type_ids=None, labels=None, **kwargs):
        # token_type_ids is accepted for tokenizer compatibility but not passed to the backbone
        out    = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        # Use the first ([CLS]) token and cast to FP32 before the head
        cls    = out.last_hidden_state[:, 0, :].float()
        pooled = torch.tanh(self.pooler(self.dropout(cls)))
        logits = self.classifier(self.dropout(pooled))
        loss   = None
        if labels is not None:
            loss = nn.CrossEntropyLoss()(logits, labels)
        return SequenceClassifierOutput(loss=loss, logits=logits)

The backbone weights are stored as model.safetensors (HuggingFace standard format). The custom pooler and classifier head weights are stored separately in head_weights.pt.

Training Details

Hyperparameter	Value
Base model	microsoft/deberta-v3-large
Max length	256
Batch size	16
Learning rate	2e-5
Warmup ratio	0.1
Weight decay	0.01
Epochs	3 (with early stopping, patience=1)
LR scheduler	Cosine
Optimizer	AdamW
Mixed precision	BF16
Cross-validation	GroupKFold (n=5, grouped by QuestionId)
Loss function	CrossEntropyLoss with inverse-frequency class weights, clipped to [0.5, 10.0]
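
The class weighting in the last row can be sketched as follows. This is an illustration only: the table gives the clipping range but not the exact weighting formula, so the balanced form `total / (num_classes * count)` and the function name `inverse_frequency_weights` are assumptions.

```python
from collections import Counter


def inverse_frequency_weights(labels, num_classes, low=0.5, high=10.0):
    """Return one weight per class, inversely proportional to its frequency.

    Assumed formula: total / (num_classes * count), then clipped to [low, high].
    Classes that never occur receive the upper bound.
    """
    counts = Counter(labels)
    total = len(labels)
    weights = []
    for class_id in range(num_classes):
        count = counts.get(class_id, 0)
        raw = total / (num_classes * count) if count > 0 else high
        weights.append(min(max(raw, low), high))
    return weights


# Toy label distribution: one frequent class, one medium, one rare
toy_labels = [0] * 90 + [1] * 9 + [2] * 1
print(inverse_frequency_weights(toy_labels, num_classes=3))
# The frequent class is clipped up to 0.5 and the rare class is clipped down to 10.0
```

A list like this can be converted to a tensor and passed as the `weight` argument of `nn.CrossEntropyLoss`.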

Input format:

Question: {QuestionText}
Student selected: {MC_Answer}
Student explanation: {StudentExplanation}
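
A small helper that fills this template might look like the sketch below. The function name `build_input` is hypothetical; the newline separators follow the example string used in the Inference section.

```python
def build_input(question_text: str, mc_answer: str, explanation: str) -> str:
    """Join the three fields into the three-line prompt the model was trained on."""
    return (
        f"Question: {question_text}\n"
        f"Student selected: {mc_answer}\n"
        f"Student explanation: {explanation}"
    )


text = build_input(
    "Which fraction is equivalent to 0.5?",
    "1/2",
    "Because 1 divided by 2 equals 0.5",
)
print(text)
```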

Inference

import torch
import numpy as np
from transformers import AutoTokenizer, AutoModel
from torch import nn
from transformers.modeling_outputs import SequenceClassifierOutput

class DebertaClassifier(nn.Module):
    def __init__(self, backbone, num_labels):
        super().__init__()
        self.backbone   = backbone
        hidden_size     = backbone.config.hidden_size
        self.pooler     = nn.Linear(hidden_size, hidden_size)
        self.classifier = nn.Linear(hidden_size, num_labels)
        self.dropout    = nn.Dropout(0.1)

    def forward(self, input_ids, attention_mask, **kwargs):
        out    = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        cls    = out.last_hidden_state[:, 0, :].float()
        pooled = torch.tanh(self.pooler(self.dropout(cls)))
        logits = self.classifier(self.dropout(pooled))
        return SequenceClassifierOutput(logits=logits)


# Load label list
with open("deberta_label_list.txt") as f:
    LABEL_LIST = [line.strip() for line in f if line.strip()]

device = "cuda" if torch.cuda.is_available() else "cpu"

# Example input in the training format (defined once, shared by all folds)
texts = [
    "Question: Which fraction is equivalent to 0.5?\nStudent selected: 1/2\nStudent explanation: Because 1 divided by 2 equals 0.5"
]

fold_logits = []
for fold in range(5):
    ckpt_path = f"deberta_fold{fold}"

    tok      = AutoTokenizer.from_pretrained(ckpt_path)
    backbone = AutoModel.from_pretrained(ckpt_path)
    model    = DebertaClassifier(backbone, num_labels=len(LABEL_LIST))

    # Load the custom pooler and classifier head stored next to the backbone
    head = torch.load(f"{ckpt_path}/head_weights.pt", map_location="cpu", weights_only=True)
    model.pooler.load_state_dict(head["pooler"])
    model.classifier.load_state_dict(head["classifier"])
    model.eval().to(device)

    with torch.no_grad():
        enc    = tok(texts, padding=True, truncation=True, max_length=256, return_tensors="pt")
        enc    = {k: v.to(device) for k, v in enc.items()}
        logits = model(**enc).logits.float().cpu().numpy()
        fold_logits.append(logits)

# Logit ensemble (average logits, then softmax)
mean_logits = np.mean(fold_logits, axis=0)
probs       = torch.softmax(torch.tensor(mean_logits), dim=-1).numpy()
top3_idx    = np.argsort(-probs, axis=1)[:, :3]
top3_labels = [[LABEL_LIST[j] for j in row] for row in top3_idx]

print(top3_labels)
# e.g. [['True_Correct:NA', 'True_Neither:NA', 'False_Correct:NA']]

Results

Version	CV MAP@3	Simulated LB	Public LB	Private LB
deberta-v3-base (GroupKFold)	0.3213	0.9051	0.89397	0.89433
deberta-v3-large (this repo)	0.2925	0.9351	0.91924	0.91107
deberta-v3-large + Logit Ensemble	0.2925	0.9351	0.93081	0.92442
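
For reference, the MAP@3 column can be computed as in the sketch below. It assumes the usual definition for a single true label per row: a correct prediction at rank 1, 2, or 3 scores 1, 1/2, or 1/3, and anything else scores 0. The function name `map_at_3` and the toy labels are illustrative.

```python
def map_at_3(true_labels, top3_predictions):
    """Mean average precision at 3 with exactly one true label per row."""
    total = 0.0
    for truth, preds in zip(true_labels, top3_predictions):
        for rank, pred in enumerate(preds[:3], start=1):
            if pred == truth:
                total += 1.0 / rank  # credit shrinks with the rank of the hit
                break
    return total / len(true_labels)


# Toy example: hit at rank 1, hit at rank 2, and a miss
truths = ["A", "B", "C"]
preds = [["A", "B", "C"], ["A", "B", "C"], ["A", "B", "D"]]
print(map_at_3(truths, preds))  # (1 + 0.5 + 0) / 3 = 0.5
```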

Note on CV vs LB: GroupKFold CV is low (about 0.29) because each fold validates on questions it never saw during training, and the training data contains only 15 unique questions. The Kaggle test set uses the same question IDs as the training data, so each fold model was trained on about 4/5 of the test questions, and every test question was seen by four of the five models. This is why the LB scores are much higher than the CV scores suggest.
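
The effect of grouping by question can be illustrated with a small stdlib-only sketch. This is a simplified stand-in for scikit-learn's GroupKFold, not the code used for training: it assigns groups to folds round-robin, whereas GroupKFold balances folds by size. The function name `grouped_folds` and the toy question IDs are hypothetical.

```python
from collections import defaultdict


def grouped_folds(groups, n_splits=5):
    """Yield (train_idx, val_idx) pairs so that no group appears on both sides."""
    unique_groups = sorted(set(groups))
    fold_of = {g: i % n_splits for i, g in enumerate(unique_groups)}
    rows_in_fold = defaultdict(list)
    for idx, g in enumerate(groups):
        rows_in_fold[fold_of[g]].append(idx)
    for k in range(n_splits):
        val_idx = rows_in_fold[k]
        val_set = set(val_idx)
        train_idx = [i for i in range(len(groups)) if i not in val_set]
        yield train_idx, val_idx


# 15 hypothetical question IDs with 4 student responses each
question_ids = [q for q in range(15) for _ in range(4)]

for train_idx, val_idx in grouped_folds(question_ids):
    train_q = {question_ids[i] for i in train_idx}
    val_q = {question_ids[i] for i in val_idx}
    # Validation questions are never seen in training for this fold
    print(len(train_q), "train questions,", len(val_q), "validation questions")
```

With 15 questions and 5 folds, every fold trains on 12 questions and validates on 3 that the model has never seen, which is the setting the CV score measures.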

Citation

@misc{lyixuan2026map,
  author    = {Li, Yi-Shiuan},
  title     = {MAP Charting Student Math Misunderstandings — DeBERTa-v3-large 5-Fold},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/lyixuan0718/map-deberta-v3-large-5fold}
}
