PubMedBERT for Biomedical Relation Extraction
Fine-tuned PubMedBERT for multi-class relation extraction in biomedical text.
Model Description
This model extracts semantic relations between biomedical entities (chemicals, diseases, genes, proteins) from scientific literature.
Base Model: microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract
Training Data: chemprot, bc5cdr, gad, biored, ddi
Relation Types (9):
- activates
- inhibits
- converts
- causes
- treats
- associated_with
- interacts_with
- located_in
- NO_RELATION
Performance
| Metric | Value |
|---|---|
| F1 Macro | 0.7347 |
| Accuracy | 75.3% |
Per-Class F1 Scores
| Relation | F1 | Support |
|---|---|---|
| interacts_with | 0.85 | 1,304 |
| inhibits | 0.84 | 2,704 |
| activates | 0.83 | 3,412 |
| converts | 0.82 | 884 |
| associated_with | 0.81 | 1,769 |
| causes | 0.81 | 6,760 |
| NO_RELATION | 0.63 | 6,760 |
| treats | 0.28 | 678 |
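Macro F1 is the unweighted mean of the per-class scores, so a single weak class such as treats pulls it down even though that class has little support. The sketch below recomputes macro F1 from the rounded per-class values in the table, along with a support-weighted average for contrast. Two caveats: the inputs are the two-decimal table values, so the results only approximate the reported 0.7347, and the table covers 8 of the 9 relation types (located_in is not listed), so the average is taken over those 8.

```python
# Per-class F1 and support copied from the per-class table (F1 rounded to 2 decimals).
PER_CLASS = {
    "interacts_with": (0.85, 1304),
    "inhibits": (0.84, 2704),
    "activates": (0.83, 3412),
    "converts": (0.82, 884),
    "associated_with": (0.81, 1769),
    "causes": (0.81, 6760),
    "NO_RELATION": (0.63, 6760),
    "treats": (0.28, 678),
}


def macro_f1(scores):
    """Unweighted mean: every class counts equally, regardless of support."""
    return sum(f1 for f1, _ in scores.values()) / len(scores)


def weighted_f1(scores):
    """Support-weighted mean: frequent classes dominate the average."""
    total = sum(n for _, n in scores.values())
    return sum(f1 * n for f1, n in scores.values()) / total


print(f"macro F1:    {macro_f1(PER_CLASS):.3f}")
print(f"weighted F1: {weighted_f1(PER_CLASS):.3f}")
```

This prints about 0.734 for the macro average and about 0.754 for the weighted one. The weighted figure sits close to the 75.3% accuracy because it is dominated by the two 6,760-example classes, while the macro figure is held down mostly by treats.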
Usage
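```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "your-username/pubmedbert-relation-extraction"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout

# Entity markers. If the saved tokenizer already contains them, add_special_tokens
# adds nothing and resize_token_embeddings keeps the fine-tuned embeddings.
# If they are missing, the new marker embeddings are untrained and predictions
# will not be reliable.
markers = ["[E1]", "[/E1]", "[E2]", "[/E2]"]
tokenizer.add_special_tokens({"additional_special_tokens": markers})
model.resize_token_embeddings(len(tokenizer))

# Example: extract the relation between aspirin and pain
text = "[E1]Aspirin[/E1] reduces [E2]pain[/E2] in patients."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

# Truncation to 256 tokens can cut off a marker in long passages
input_ids = inputs["input_ids"][0].tolist()
assert all(i in input_ids for i in tokenizer.convert_tokens_to_ids(markers)), "entity marker truncated"

with torch.no_grad():  # gradients are not needed for prediction
    outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(probs, dim=-1).item()
print(f"Predicted relation: {model.config.id2label[predicted_class]}")
print(f"Confidence: {probs[0][predicted_class].item():.3f}")
```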
Input Format
Text must contain the entity markers [E1], [/E1], [E2], [/E2] around the two entities:
```
[E1]Entity1[/E1] ... context ... [E2]Entity2[/E2]
```
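If entity mentions come from a separate NER step as character offsets, the markers can be inserted programmatically. The sketch below is a hypothetical helper, not part of this model's release: `mark_entities` and `candidate_inputs` are illustrative names, spans are assumed to be `(start, end)` offsets with `end` exclusive, and each unordered pair is marked with the earlier mention as [E1], following the format shown above. Whether the model is sensitive to which entity gets [E1] for directional relations (e.g. activates, inhibits) is not documented here.

```python
from itertools import combinations


def mark_entities(text, e1_span, e2_span):
    """Wrap two non-overlapping character spans in [E1]...[/E1] and [E2]...[/E2].

    Spans are (start, end) offsets with end exclusive. Markers are inserted
    right-to-left so the earlier span's offsets stay valid.
    """
    (s1, t1), (s2, t2) = e1_span, e2_span
    if not (0 <= s1 < t1 <= len(text) and 0 <= s2 < t2 <= len(text)):
        raise ValueError("span is empty or out of range")
    if s1 < t2 and s2 < t1:
        raise ValueError("entity spans overlap")
    edits = [(s1, t1, "[E1]", "[/E1]"), (s2, t2, "[E2]", "[/E2]")]
    # Handle the later span first so its insertions do not shift the earlier one.
    for start, end, open_tag, close_tag in sorted(edits, key=lambda e: e[0], reverse=True):
        text = text[:start] + open_tag + text[start:end] + close_tag + text[end:]
    return text


def candidate_inputs(text, entities):
    """Yield one marked input per unordered entity pair, earlier mention as [E1]."""
    ordered = sorted(entities, key=lambda span: span[0])
    for a, b in combinations(ordered, 2):
        yield mark_entities(text, a, b)


sentence = "Aspirin reduces pain in patients."
print(mark_entities(sentence, (0, 7), (16, 20)))
# [E1]Aspirin[/E1] reduces [E2]pain[/E2] in patients.
```

A sentence with n entity mentions yields n(n-1)/2 candidate inputs, each of which is classified separately; pairs with no relation should come back as NO_RELATION.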
Training Details
- Optimizer: AdamW
- Learning Rate: 2e-5
- Batch Size: 16
- Epochs: 15 (early stopping)
- Max Length: 256 tokens
- Loss: Weighted CrossEntropy
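The card does not say how the cross-entropy class weights were chosen. One common scheme, shown here as an assumption rather than the author's method, is "balanced" inverse-frequency weighting, w_c = N / (K · n_c), which is the same formula scikit-learn uses for `class_weight="balanced"`. The counts below are the evaluation supports from the per-class table, used purely for illustration; the actual training label counts are not reported, and a real setup would include all 9 labels.

```python
def balanced_class_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * count_c).

    Rare classes get weights above 1 and frequent classes below 1, so every
    class contributes the same total weight (N / K) to the loss.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Illustration only: evaluation supports from the per-class table, not training counts.
counts = {
    "interacts_with": 1304,
    "inhibits": 2704,
    "activates": 3412,
    "converts": 884,
    "associated_with": 1769,
    "causes": 6760,
    "NO_RELATION": 6760,
    "treats": 678,
}

weights = balanced_class_weights(counts)
for label, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{label:16s} {w:.2f}")
```

In PyTorch, such weights would be passed as `torch.nn.CrossEntropyLoss(weight=torch.tensor([...]))`, listed in the model's label-id order (see `model.config.label2id`).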
Limitations
- The treats relation has low F1 (0.28) due to limited training data
- Best performance on Chemical↔Gene/Protein and Disease relations
- Requires entity markers in input text
- Trained on English biomedical abstracts
Citation
```bibtex
@misc{pubmedbert-relation-extraction,
  author = {Your Name},
  title = {PubMedBERT for Biomedical Relation Extraction},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/your-username/pubmedbert-relation-extraction}}
}
```
Acknowledgments
- Base model: PubMedBERT
- Datasets: ChemProt, BC5CDR, GAD, BioRED, DDI Corpus