MAP - DeBERTa-v3-large 5-Fold Classifier

Kaggle Competition: MAP - Charting Student Math Misunderstandings

Final Score: Public LB 0.91924 / Private LB 0.91107 (DeBERTa-v3-large, 5-fold ensemble)


Model Description

This repository contains 5-fold checkpoints of a DeBERTa-v3-large classifier trained for the MAP Kaggle competition. The task is to predict the Category:Misconception label for each student response, given the question text, the student's selected answer, and the student's written explanation.

The label space has 65 classes combining:

  • Category (6 types): True_Correct, True_Neither, True_Misconception, False_Correct, False_Neither, False_Misconception
  • Misconception name (or NA if no misconception)
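A minimal sketch of how the combined label string might be assembled from its two parts. The helper name `make_label` is hypothetical; the `Category:Misconception` layout and the `NA` placeholder follow the description above.

```python
from typing import Optional

# The six category values listed above.
CATEGORIES = (
    "True_Correct", "True_Neither", "True_Misconception",
    "False_Correct", "False_Neither", "False_Misconception",
)

def make_label(category: str, misconception: Optional[str] = None) -> str:
    """Join a category and a misconception name into one class label.

    Hypothetical helper, not taken from the training code.
    """
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    # Responses without a misconception use the literal placeholder "NA".
    return f"{category}:{misconception or 'NA'}"

print(make_label("True_Correct"))  # True_Correct:NA
```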

Repository Structure

├── deberta_fold0/
│   ├── config.json
│   ├── model.safetensors      # DeBERTa-v3-large backbone weights
│   ├── head_weights.pt        # Custom pooler + classifier head weights
│   ├── tokenizer.json
│   └── tokenizer_config.json
├── deberta_fold1/  ... (same structure)
├── deberta_fold2/  ... (same structure)
├── deberta_fold3/  ... (same structure)
├── deberta_fold4/  ... (same structure)
└── deberta_label_list.txt     # All 65 label strings, one per line
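Since the label file holds one label string per line, the index of a line is presumably the class index used by the classifier head. A small sketch of building both lookup tables follows; `load_label_maps` is a hypothetical helper, and the three sample lines stand in for the real 65-line file, whose actual order is not shown here.

```python
import io

def load_label_maps(lines):
    """Build label<->index maps from an iterable of lines, skipping blank lines."""
    labels = [line.strip() for line in lines if line.strip()]
    label2id = {label: i for i, label in enumerate(labels)}
    id2label = dict(enumerate(labels))
    return label2id, id2label

# Stand-in for open("deberta_label_list.txt"); contents are illustrative only.
sample = io.StringIO("True_Correct:NA\nTrue_Neither:NA\n\nFalse_Neither:NA\n")
label2id, id2label = load_label_maps(sample)
print(label2id["False_Neither:NA"])  # 2
```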

Model Architecture

A custom DebertaClassifier wrapping microsoft/deberta-v3-large:

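import torch
from torch import nn
from transformers.modeling_outputs import SequenceClassifierOutput

class DebertaClassifier(nn.Module):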
    def __init__(self, backbone, num_labels):
        super().__init__()
        self.backbone   = backbone
        hidden_size     = backbone.config.hidden_size  # 1024 for large
        self.pooler     = nn.Linear(hidden_size, hidden_size)
        self.classifier = nn.Linear(hidden_size, num_labels)
        self.dropout    = nn.Dropout(0.1)

    def forward(self, input_ids, attention_mask, token_type_ids=None, labels=None, **kwargs):
        out    = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        cls    = out.last_hidden_state[:, 0, :].float()
        pooled = torch.tanh(self.pooler(self.dropout(cls)))
        logits = self.classifier(self.dropout(pooled))
        loss   = None
        if labels is not None:
            loss = nn.CrossEntropyLoss()(logits, labels)
        return SequenceClassifierOutput(loss=loss, logits=logits)

The backbone weights are stored as model.safetensors (HuggingFace standard format). The custom pooler and classifier head weights are stored separately in head_weights.pt.


Training Details

Hyperparameter      Value
Base model          microsoft/deberta-v3-large
Max length          256
Batch size          16
Learning rate       2e-5
Warmup ratio        0.1
Weight decay        0.01
Epochs              3 (with early stopping, patience=1)
LR scheduler        Cosine
Optimizer           AdamW
Mixed precision     BF16
Cross-validation    GroupKFold (n=5, grouped by QuestionId)
Loss function       CrossEntropyLoss with inverse-frequency class weights, clipped to [0.5, 10.0]
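The clipped inverse-frequency weights in the last row could be computed roughly as below. This is a sketch under an assumption: the exact normalization used in training is not stated, so the common "balanced" form N / (num_classes * count) is used here, and classes with no samples receive the upper bound. The function name `class_weights` is hypothetical.

```python
from collections import Counter

def class_weights(labels, num_classes, lo=0.5, hi=10.0):
    """Inverse-frequency class weights clipped to [lo, hi].

    Assumption: weight_c = N / (num_classes * count_c); the training code
    may normalize differently. Classes that never occur get `hi`.
    """
    counts = Counter(labels)
    n = len(labels)
    weights = []
    for c in range(num_classes):
        if counts[c] == 0:
            weights.append(hi)
        else:
            raw = n / (num_classes * counts[c])
            weights.append(min(hi, max(lo, raw)))
    return weights

# Toy example with 3 classes: class 0 dominates, class 2 is rare.
toy = [0] * 90 + [1] * 9 + [2] * 1
weights = class_weights(toy, num_classes=3)
print(weights)
```

In this toy case the dominant class is clipped up to 0.5 and the rare class is clipped down to 10.0. In PyTorch, such a list would typically be converted to a tensor and passed as the `weight` argument of `nn.CrossEntropyLoss`.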

Input format:

Question: {QuestionText}
Student selected: {MC_Answer}
Student explanation: {StudentExplanation}
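A small sketch of filling this template for one row. The function name `build_input` is hypothetical, and joining the three lines with "\n" follows the example string in the inference code; any extra cleaning of the fields during training is not known and is not applied here.

```python
def build_input(question_text: str, mc_answer: str, explanation: str) -> str:
    """Format one student response using the three-line template above."""
    return (
        f"Question: {question_text}\n"
        f"Student selected: {mc_answer}\n"
        f"Student explanation: {explanation}"
    )

text = build_input(
    "Which fraction is equivalent to 0.5?",
    "1/2",
    "Because 1 divided by 2 equals 0.5",
)
print(text)
```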

Inference

import torch
import numpy as np
from transformers import AutoTokenizer, AutoModel
from torch import nn
from transformers.modeling_outputs import SequenceClassifierOutput

class DebertaClassifier(nn.Module):
    def __init__(self, backbone, num_labels):
        super().__init__()
        self.backbone   = backbone
        hidden_size     = backbone.config.hidden_size
        self.pooler     = nn.Linear(hidden_size, hidden_size)
        self.classifier = nn.Linear(hidden_size, num_labels)
        self.dropout    = nn.Dropout(0.1)

    def forward(self, input_ids, attention_mask, **kwargs):
        out    = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        cls    = out.last_hidden_state[:, 0, :].float()
        pooled = torch.tanh(self.pooler(self.dropout(cls)))
        logits = self.classifier(self.dropout(pooled))
        return SequenceClassifierOutput(logits=logits)


# Load label list
with open("deberta_label_list.txt") as f:
    LABEL_LIST = [l.strip() for l in f if l.strip()]

device = "cuda" if torch.cuda.is_available() else "cpu"

# Example input in the format shown under "Input format"; defined once, outside the fold loop
texts = [
    "Question: Which fraction is equivalent to 0.5?\nStudent selected: 1/2\nStudent explanation: Because 1 divided by 2 equals 0.5"
]

fold_logits = []

for fold in range(5):
    ckpt_path = f"deberta_fold{fold}"

    # Tokenizer and backbone load from the fold directory in the standard HuggingFace way
    tok      = AutoTokenizer.from_pretrained(ckpt_path)
    backbone = AutoModel.from_pretrained(ckpt_path)
    model    = DebertaClassifier(backbone, num_labels=len(LABEL_LIST))

    # The custom pooler and classifier head are stored separately from the backbone
    head = torch.load(f"{ckpt_path}/head_weights.pt", map_location="cpu", weights_only=True)
    model.pooler.load_state_dict(head["pooler"])
    model.classifier.load_state_dict(head["classifier"])
    model.eval().to(device)  # eval() disables dropout

    with torch.no_grad():
        enc    = tok(texts, padding=True, truncation=True, max_length=256, return_tensors="pt")
        enc    = {k: v.to(device) for k, v in enc.items()}
        logits = model(**enc).logits.float().cpu().numpy()
        fold_logits.append(logits)

# Logit ensemble (average logits, then softmax)
mean_logits = np.mean(fold_logits, axis=0)
probs       = torch.softmax(torch.tensor(mean_logits), dim=-1).numpy()
top3_idx    = np.argsort(-probs, axis=1)[:, :3]
top3_labels = [[LABEL_LIST[j] for j in row] for row in top3_idx]

print(top3_labels)
# e.g. [['True_Correct:NA', 'True_Neither:NA', 'False_Correct:NA']]

Results

Version                              CV MAP@3   Simulated LB   Public LB   Private LB
deberta-v3-base (GroupKFold)         0.3213     0.9051         0.89397     0.89433
deberta-v3-large (this repo)         0.2925     0.9351         0.91924     0.91107
deberta-v3-large + Logit Ensemble    0.2925     0.9351         0.93081     0.92442
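For reading the MAP@3 figures: with exactly one correct label per response, MAP@3 is usually computed as the mean of 1/rank of the correct label within the top three predictions, and 0 when it is missing. The sketch below assumes that standard definition; it is not the competition's own scorer, and `map_at_3` is a hypothetical name.

```python
def map_at_3(true_labels, top3_predictions):
    """Mean average precision at 3 for single-label targets.

    Each response scores 1, 1/2 or 1/3 when the true label is ranked
    first, second or third, and 0 when it is not in the top three.
    """
    total = 0.0
    for truth, preds in zip(true_labels, top3_predictions):
        for rank, pred in enumerate(preds[:3], start=1):
            if pred == truth:
                total += 1.0 / rank
                break
    return total / len(true_labels)

truths = ["A", "B", "C"]
preds = [["A", "B", "C"], ["A", "B", "C"], ["A", "B", "D"]]
score = map_at_3(truths, preds)
print(score)  # 0.5 -> (1 + 1/2 + 0) / 3
```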

Note on CV vs LB: GroupKFold CV is low (about 0.29 for the large model) because each fold validates on questions it never saw during training, and the training data contains only 15 unique questions. The Kaggle test set uses the same question IDs as the training data, so each fold model has already seen roughly 4/5 of the test questions during training, which makes the LB much higher than the CV score suggests.
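The effect of grouping by question can be illustrated with a simplified splitter. This is a stand-in for scikit-learn's `GroupKFold`, not the training code: it deals question IDs to folds round-robin rather than balancing fold sizes, and the 15 question IDs with two responses each are made-up data.

```python
def group_folds(question_ids, n_splits=5):
    """Yield (train_idx, val_idx) pairs so that no question ID is on both sides.

    Simplified stand-in for GroupKFold: sorted unique IDs are assigned
    to folds round-robin.
    """
    unique = sorted(set(question_ids))
    fold_of = {q: i % n_splits for i, q in enumerate(unique)}
    for fold in range(n_splits):
        val = [i for i, q in enumerate(question_ids) if fold_of[q] == fold]
        train = [i for i, q in enumerate(question_ids) if fold_of[q] != fold]
        yield train, val

# 15 hypothetical question IDs with 2 responses each.
qids = [q for q in range(15) for _ in range(2)]
folds = list(group_folds(qids))
for train, val in folds:
    train_q = {qids[i] for i in train}
    val_q = {qids[i] for i in val}
    # Validation questions never overlap with training questions.
    assert not train_q & val_q
    print(len(train_q), "training questions,", len(val_q), "held-out questions")
```

With 15 questions and 5 folds, each model trains on 12 questions and is validated on 3 it has never seen, which is the setting the CV score measures.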


Citation

@misc{lyixuan2026map,
  author    = {Li, Yi-Shiuan},
  title     = {MAP Charting Student Math Misunderstandings - DeBERTa-v3-large 5-Fold},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/lyixuan0718/map-deberta-v3-large-5fold}
}