BioBERT for Adverse Drug Effect (ADE) Classification

This model is a fine-tuned version of dmis-lab/biobert-base-cased-v1.2 for binary sentence classification: does a sentence describe an adverse drug effect (ADE)? It was fine-tuned on the ADE Corpus V2 dataset and compared against a TF-IDF + Logistic Regression baseline, as part of a broader project benchmarking classical and transformer-based approaches on imbalanced biomedical text.

Project Repo: GitHub

Results (Test Set: N=3,528)

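Model                        | Weighted F1 | ADE Class F1 | Accuracy | Total Errors
-----------------------------|-------------|--------------|----------|-------------
TF-IDF + Logistic Regression | 0.90        | 0.84         | 90%      | 349
BioBERT (this model)         | 0.96        | 0.93         | 96%      | 145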

BioBERT reduced misclassifications by 58% (349 → 145 errors) compared to the classical baseline.
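
The 58% figure and the accuracy column both follow directly from the error counts and the test set size. The short check below recomputes them; the helper names are illustrative and not taken from the project code.

```python
# Figures taken from the results table above.
TEST_SET_SIZE = 3528
BASELINE_ERRORS = 349   # TF-IDF + Logistic Regression
BIOBERT_ERRORS = 145    # BioBERT (this model)


def accuracy(errors: int, total: int) -> float:
    """Fraction of test sentences classified correctly."""
    return (total - errors) / total


def relative_error_reduction(baseline_errors: int, new_errors: int) -> float:
    """Share of the baseline's errors that the new model eliminates."""
    return (baseline_errors - new_errors) / baseline_errors


baseline_acc = accuracy(BASELINE_ERRORS, TEST_SET_SIZE)
biobert_acc = accuracy(BIOBERT_ERRORS, TEST_SET_SIZE)
reduction = relative_error_reduction(BASELINE_ERRORS, BIOBERT_ERRORS)

print(f"Baseline accuracy: {baseline_acc:.1%}")
print(f"BioBERT accuracy:  {biobert_acc:.1%}")
print(f"Error reduction:   {reduction:.1%}")
```

Rounded to whole percentages, these values match the 90%, 96%, and 58% reported above.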

Training Details

  • Base model: dmis-lab/biobert-base-cased-v1.2 (110M parameters)
  • Epochs: 3 (best checkpoint selected by validation F1)
  • Learning rate: 2e-5
  • Batch size: 16
  • Max sequence length: 128
  • Precision: fp16
  • Data split: stratified 70/15/15 train/val/test (seed=42)

Epoch | Train Loss | Val F1 | Val Accuracy
------|------------|--------|-------------
1     | 0.175      | 0.943  | 0.943
2     | 0.114      | 0.952  | 0.952
3     | 0.043      | 0.952  | 0.952
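
To illustrate what a stratified 70/15/15 split with a fixed seed means, here is a minimal standard-library sketch. It is not the project's actual splitting code, which is not shown in this card; the function name `stratified_split` and the toy label counts are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists that preserve class proportions."""
    rng = random.Random(seed)

    # Group example indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        # Shuffle within each class, then cut at the 70% and 85% marks.
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]
    return train, val, test


# Toy labels (hypothetical, not the real corpus): 300 ADE, 700 non-ADE.
toy_labels = [1] * 300 + [0] * 700
train_idx, val_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Because each class is split separately, the minority ADE class keeps the same share in train, validation, and test, which matters when evaluating on imbalanced data.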

Usage

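import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "scheun/biobert-ade-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # make sure dropout is disabled for inference

text = "Patient developed severe nausea after taking the medication."
# Truncate to the 128-token limit used during fine-tuning
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():  # gradients are not needed for inference
    outputs = model(**inputs)

prediction = outputs.logits.argmax(-1).item()
print(prediction)  # 0 = not ADE, 1 = ADE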
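
The snippet above returns only the predicted class. If a confidence score is also useful, the two logits can be turned into probabilities with a softmax. The sketch below does this in plain Python on hypothetical logit values, so it runs without downloading the model; with PyTorch, calling `outputs.logits.softmax(-1)` should give the same probabilities.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    peak = max(logits)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one sentence, ordered as [not ADE, ADE].
example_logits = [-1.2, 2.3]

probs = softmax(example_logits)
ade_probability = probs[1]
# Index of the largest probability, equivalent to argmax over the logits.
predicted_label = max(range(len(probs)), key=probs.__getitem__)

print(f"P(ADE) = {ade_probability:.3f}, label = {predicted_label}")
```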

Limitations

  • Trained on MEDLINE case report sentences. Performance may vary on other text domains.
  • Binary classification only. It does not extract which drug or which effect is mentioned.

References

  • Gurulingappa et al. (2012), Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports
  • Lee et al. (2020), BioBERT: a pre-trained biomedical language representation model for biomedical text mining