SciBERT Fine-tuned for SciCite Intent Classification

This model is a fine-tuned version of allenai/scibert_scivocab_uncased on the SciCite dataset.

Model Description

Base Model: AllenAI's SciBERT pre-trained on scientific text with a domain-specific vocabulary.

Task: Citation Intent Classification, i.e. predicting why an author is citing another work (background, method, or result).

Labels:

  • background (0): Citations providing prior work context
  • method (1): Citations of techniques or methodologies
  • result (2): Citations comparing or contrasting experimental results

Results

Achieved on the SciCite test set:

Metric | Score
-------|------
Accuracy | 85.60%
Macro F1 | 0.8431
Weighted F1 | 0.8566

Per-class performance:

Class | Precision | Recall | F1-Score | Support
------|-----------|--------|----------|--------
background | 0.88 | 0.87 | 0.88 | 997
method | 0.88 | 0.81 | 0.84 | 605
result | 0.74 | 0.89 | 0.81 | 259
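The aggregate scores can be related to the per-class table with a little arithmetic. The sketch below recomputes macro and weighted F1 from the rounded per-class F1 values and supports shown above; because the inputs are rounded to two decimals, the results only approximate the reported figures.

```python
# Per-class F1 and support, copied from the per-class table (rounded to 2 decimals).
per_class = {
    "background": {"f1": 0.88, "support": 997},
    "method": {"f1": 0.84, "support": 605},
    "result": {"f1": 0.81, "support": 259},
}

total = sum(c["support"] for c in per_class.values())

# Macro F1: unweighted mean over classes, so the small "result" class counts equally.
macro_f1 = sum(c["f1"] for c in per_class.values()) / len(per_class)

# Weighted F1: mean weighted by each class's share of the test set.
weighted_f1 = sum(c["f1"] * c["support"] for c in per_class.values()) / total

print(f"test examples: {total}")
print(f"macro F1    ~ {macro_f1:.3f}")
print(f"weighted F1 ~ {weighted_f1:.3f}")
```

Macro F1 comes out lower than weighted F1 because the smallest class (result) also has the lowest F1, and macro averaging gives it the same weight as the larger classes.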

Intended Uses & Limitations

Intended Use: Automatically classify citation intents in academic papers to improve literature mining, knowledge graph construction, and semantic search applications.

Limitations: The model was trained on citation contexts from arXiv papers and may not generalize to other domains (biomedical, legal, etc.). It performs best on the background and method classes; the result class has lower precision (0.74 versus 0.88 for the other two classes) due to class imbalance.

Training and Evaluation Data

Dataset: SciCite, with 8,243 training examples, 916 validation examples, and 1,861 test examples (citation contexts from arXiv papers).

Format: Citation sentence + class label. Max length: 256 tokens. Split: approximately 75% train, 8% validation, 17% test (computed from the example counts above).
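The share of each split follows directly from the example counts listed for the dataset. A small check, using only those three counts:

```python
# Example counts as listed for the SciCite splits.
counts = {"train": 8243, "validation": 916, "test": 1861}

total = sum(counts.values())

# Fraction of all examples that falls into each split.
shares = {name: n / total for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name:<10} {counts[name]:>5}  {share:.1%}")
```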

How to Use

Installation

pip install transformers torch

Inference

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("lostelf/scibert_scicite_finetuned")
model = AutoModelForSequenceClassification.from_pretrained("lostelf/scibert_scicite_finetuned")

text = "We use the BERT architecture as in Devlin et al."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    predicted_class = logits.argmax(dim=-1).item()

labels = {0: "background", 1: "method", 2: "result"}
print(f"Predicted: {labels[predicted_class]}")
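The snippet above reports only the arg-max label. When a confidence score is also useful, the logits can be converted to probabilities with a softmax. The following is a minimal sketch in plain Python, assuming the logits for one example have been extracted with something like `outputs.logits[0].tolist()`; the numeric values are made up for illustration, and `softmax` and `describe` are hypothetical helper names, not part of the model or of transformers.

```python
import math

LABELS = {0: "background", 1: "method", 2: "result"}

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def describe(logits):
    """Return the predicted label and the full probability vector."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs

# Made-up logits standing in for outputs.logits[0].tolist()
example_logits = [-0.4, 2.1, -1.3]
label, probs = describe(example_logits)

print(label)
print({LABELS[i]: round(p, 3) for i, p in enumerate(probs)})
```

Softmax probabilities from a fine-tuned classifier are not necessarily well calibrated, so they are better read as a relative ranking than as exact confidence levels.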

Batch Prediction

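# Reuses tokenizer, model, and labels from the Inference snippet above
texts = [
    "We build on the transformer framework introduced by Vaswani et al.",
    "Our implementation follows the optimization procedure in Kingma & Ba.",
    "These results exceed prior work by Devlin et al. (BERT)."
]

# Pad to the longest text in the batch so the inputs form one tensor
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=256)

# Disable gradient tracking for inference
with torch.no_grad():
    outputs = model(**inputs)

# Convert the tensor of class ids to plain ints before the dict lookup
predictions = outputs.logits.argmax(dim=-1).tolist()
print([labels[p] for p in predictions])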
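For more texts than fit comfortably in one forward pass, the list can be split into fixed-size chunks, with each chunk tokenized and passed through the model as in the batch example. The sketch below shows only the chunking step; `chunked` is a hypothetical helper, and the chunk size of 32 is an assumption rather than a recommendation from the model author.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Placeholder inputs; in practice these would be citation sentences.
texts = [f"Citation sentence {i}" for i in range(70)]
chunks = list(chunked(texts, 32))

for i, chunk in enumerate(chunks):
    # Each chunk would be tokenized and classified here, as in the batch example.
    print(f"chunk {i}: {len(chunk)} texts")
```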

Training Hyperparameters

Parameter | Value
----------|------
Model | allenai/scibert_scivocab_uncased
Epochs | 8
Batch Size | 32
Learning Rate | 1e-05
Warmup Steps | 25 (~10% of training)
Weight Decay | 0.01
Optimizer | AdamW
LR Scheduler | linear
Gradient Accumulation | 2 steps
FP16 | Enabled
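To see how these settings interact, the sketch below derives the effective batch size and an approximate number of optimizer steps. Several things here are assumptions: the dictionary keys follow the naming of Hugging Face `TrainingArguments`, but the source does not say that `Trainer` was used; "Batch Size 32" is treated as a per-device value; and the device count is a parameter because it is not stated unambiguously. `schedule_summary` is a hypothetical helper.

```python
import math

# Values from the hyperparameter table; key names are an assumption.
hparams = {
    "num_train_epochs": 8,
    "per_device_train_batch_size": 32,
    "gradient_accumulation_steps": 2,
    "learning_rate": 1e-5,
    "warmup_steps": 25,
    "weight_decay": 0.01,
    "lr_scheduler_type": "linear",
    "fp16": True,
}

def schedule_summary(num_examples, hp, num_devices=1):
    """Approximate optimizer-step counts for a given device count."""
    effective_batch = (
        hp["per_device_train_batch_size"]
        * hp["gradient_accumulation_steps"]
        * num_devices
    )
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * hp["num_train_epochs"]
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_fraction": hp["warmup_steps"] / total_steps,
    }

# 8,243 is the training-set size; num_devices=1 is an assumption.
summary = schedule_summary(8243, hparams, num_devices=1)
print(summary)
```

The warmup fraction depends on the assumed device count and on whether the batch size is per device, so the output is an estimate under those assumptions only.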

Training Results

Best checkpoint: Epoch 8 (macro F1 = 0.8431 on the test set; validation macro F1 peaks at 0.8351 in epoch 7). Early stopping patience = 4 epochs.

Training curve (eval epochs):

Epoch | Train Loss | Val Loss | Macro F1 | Micro F1
------|------------|----------|----------|----------
1     | 0.7104     | 0.4155   | 0.8143   | 0.8395
2     | 0.5514     | 0.4304   | 0.8176   | 0.8428
3     | 0.4605     | 0.4256   | 0.8288   | 0.8504
4     | 0.3829     | 0.4514   | 0.8310   | 0.8515
5     | 0.3176     | 0.4908   | 0.8311   | 0.8537
6     | 0.2678     | 0.5162   | 0.8334   | 0.8548
7     | 0.2288     | 0.5507   | 0.8351   | 0.8548
8     | N/A        | 0.5648   | 0.8275   | 0.8493
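The table can be used to illustrate how checkpoint selection and early stopping depend on the monitored metric. The sketch below copies the validation loss and macro F1 columns and simulates patience-based early stopping with a patience of 4. Which metric was actually monitored is not stated, so both options are shown; `stop_epoch` is a hypothetical helper, not the training code used for this model.

```python
# (epoch, val_loss, val_macro_f1) copied from the training-curve table
history = [
    (1, 0.4155, 0.8143),
    (2, 0.4304, 0.8176),
    (3, 0.4256, 0.8288),
    (4, 0.4514, 0.8310),
    (5, 0.4908, 0.8311),
    (6, 0.5162, 0.8334),
    (7, 0.5507, 0.8351),
    (8, 0.5648, 0.8275),
]

def stop_epoch(values, patience, higher_is_better):
    """Return the 1-based epoch at which early stopping triggers, or None."""
    best = None
    bad_epochs = 0
    for epoch, value in enumerate(values, start=1):
        improved = best is None or (
            value > best if higher_is_better else value < best
        )
        if improved:
            best, bad_epochs = value, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return None

best_by_f1 = max(history, key=lambda row: row[2])[0]
best_by_loss = min(history, key=lambda row: row[1])[0]
stop_on_f1 = stop_epoch([r[2] for r in history], patience=4, higher_is_better=True)
stop_on_loss = stop_epoch([r[1] for r in history], patience=4, higher_is_better=False)

print(best_by_f1, best_by_loss, stop_on_f1, stop_on_loss)
```

With these numbers, monitoring validation macro F1 never exhausts the patience within 8 epochs, whereas monitoring validation loss would stop at epoch 5. A full 8-epoch run is therefore consistent with macro F1 being the monitored metric, although the source does not confirm this.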

Framework Versions

  • Python: 3.11
  • PyTorch: 2.0+
  • Transformers: 4.38+
  • Datasets: 2.14+
  • Scikit-learn: 1.3+
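To compare a local environment against the versions listed above, the installed package versions can be read with the standard library. A minimal sketch, assuming the PyPI distribution names `torch`, `transformers`, `datasets`, and `scikit-learn`; `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution names for the libraries listed above.
PACKAGES = ["torch", "transformers", "datasets", "scikit-learn"]

def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

report = {name: installed_version(name) for name in PACKAGES}

for name, found in report.items():
    print(f"{name:<14} {found or 'not installed'}")
```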

Generated: 2026-04-14 22:24:37
Training GPU: 2
