Argument Quality Ranking - RoBERTa v3 (Best Model)
Fine-tuned RoBERTa-base for pairwise argument quality ranking, trained with margin ranking loss and evaluated with test-time pair flipping. It reaches 65.7% accuracy on both the in-domain and cross-topic test sets, within 0.8 percentage points of GPT-5.5 zero-shot (66.5%) with a 50x smaller model.
Model Details
- Base model: roberta-base
- Task: Pairwise argument quality classification (A wins / B wins)
- Training data: IBM ArgQ corpus (3,587 pairs, 60 topics)
- Input format: [CLS] topic [SEP] arg_a [SEP] arg_b
- Inference: Test-time pair flipping (predict both orderings, average scores)
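The test-time pair flipping listed above can be illustrated without the model. The following is a minimal sketch, assuming a pairwise score can be written as a quality signal plus a constant bias toward whichever argument comes first. `biased_score` is a hypothetical stand-in scorer invented for this illustration, not the fine-tuned model.

```python
from typing import Callable


def flip_average(score: Callable[[str, str], float], arg_a: str, arg_b: str) -> float:
    """Combine both orderings; a positive result means arg_a is preferred.

    score(x, y) > 0 is taken to mean "the argument in first position wins".
    """
    return (score(arg_a, arg_b) - score(arg_b, arg_a)) / 2


def biased_score(first: str, second: str) -> float:
    """Hypothetical scorer: word-count gap plus a fixed first-position bias."""
    quality_gap = len(first.split()) - len(second.split())
    position_bias = 6.0  # assumed constant bias, for illustration only
    return quality_gap + position_bias


weak = "Ban it."
strong = "Ban it because it spreads misinformation widely."

# A single pass is fooled by the positional bias and prefers the first argument.
single_pass = biased_score(weak, strong)

# Scoring both orderings cancels the constant bias and leaves only the quality gap.
flipped = flip_average(biased_score, weak, strong)

print("single pass prefers first:", single_pass > 0)
print("flipped preference for weak over strong:", flipped)
```

Under this additive-bias assumption the bias term cancels exactly. For a real model the bias need not be constant, so flipping reduces it rather than being guaranteed to remove it.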
Key Improvements over v2
- Margin ranking loss (margin=0.3) replaces cross-entropy, directly optimising the score gap between winner and loser
- Test-time pair flipping eliminates positional bias at inference
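To make the margin ranking objective concrete, here is a minimal pure-Python sketch of the per-pair loss with margin=0.3. It follows the standard hinge form (the same form as PyTorch's `MarginRankingLoss` with target 1). How the training code derives the winner and loser scores from the model is not stated here, so `winner_score` and `loser_score` are assumed to be plain scalars.

```python
def margin_ranking_loss(winner_score: float, loser_score: float, margin: float = 0.3) -> float:
    """Hinge loss on the score gap.

    The loss is zero once the winner leads the loser by at least `margin`,
    and grows linearly as the gap shrinks or reverses.
    """
    return max(0.0, margin - (winner_score - loser_score))


# Winner leads by 0.5 (more than the margin): no penalty.
print(margin_ranking_loss(1.0, 0.5))

# Winner leads by only 0.1: penalised for the missing part of the margin.
print(margin_ranking_loss(0.6, 0.5))

# Loser is scored higher: penalised for the reversed gap plus the margin.
print(margin_ranking_loss(0.2, 0.5))
```

Unlike cross-entropy, this loss stops pushing once a pair is separated by the margin, so training effort goes to pairs that are still close or misordered.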
Performance
| Split | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|
| In-domain | 65.7% | 0.644 | 0.673 | 0.616 |
| Cross-topic | 65.7% | 0.673 | 0.669 | 0.677 |
Accuracy is identical on the in-domain and cross-topic splits, a zero generalization gap. This is the only model in our experiments to achieve this.
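As a quick sanity check on the table, the sketch below recomputes F1 from the reported precision and recall and computes the accuracy gap between splits. It uses only the numbers in the table; the `reported` dictionary layout is just an illustrative choice.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values copied from the performance table above.
reported = {
    "In-domain": {"accuracy": 65.7, "f1": 0.644, "precision": 0.673, "recall": 0.616},
    "Cross-topic": {"accuracy": 65.7, "f1": 0.673, "precision": 0.669, "recall": 0.677},
}

for split, m in reported.items():
    recomputed = f1(m["precision"], m["recall"])
    print(f"{split}: reported F1={m['f1']:.3f}, recomputed F1={recomputed:.3f}")

# Generalization gap in accuracy, in percentage points.
accuracy_gap = reported["In-domain"]["accuracy"] - reported["Cross-topic"]["accuracy"]
print(f"Accuracy gap: {accuracy_gap:.1f} points")
```

The recomputed values agree with the reported F1 to within the rounding of the three-decimal precision and recall. Note that the zero gap refers to accuracy; F1, precision and recall differ slightly between the two splits.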
Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("SambhavSBU/argument-quality-roberta-v3")
model = AutoModelForSequenceClassification.from_pretrained("SambhavSBU/argument-quality-roberta-v3")

def predict(topic, arg_a, arg_b):
    def score(a, b):
        # Encode the topic and first argument as one segment, the second argument as the other
        inp = tokenizer(topic + " [SEP] " + a, b,
                        return_tensors="pt", truncation=True, max_length=256)
        with torch.no_grad():
            logits = model(**inp).logits
        # Positive when the first argument of the pair is preferred
        return (logits[0, 1] - logits[0, 0]).item()

    # test-time pair flipping: score both orderings and combine them
    margin = (score(arg_a, arg_b) - score(arg_b, arg_a)) / 2
    return "A" if margin > 0 else "B"

topic = "We should ban social media"
arg_a = "Social media spreads misinformation at an unprecedented scale."
arg_b = "Social media connects people across the world."
print(f"Higher quality argument: {predict(topic, arg_a, arg_b)}")
Citation
Code and full experiments: https://github.com/Sambhav101/Argument-Quality-Ranking