medclassify-ai

DistilBERT fine-tuned on the PubMed 200k RCT dataset for structural classification of medical abstract sentences.

Given a sentence from a clinical abstract, the model predicts one of five structural roles: BACKGROUND, OBJECTIVE, METHODS, RESULTS, or CONCLUSIONS.


Model details

Base model: distilbert-base-uncased
Task: 5-class text classification
Parameters: 67M
Max input length: 128 tokens
Dataset: PubMed 200k RCT
Training framework: HuggingFace Transformers Trainer API
Author: Mohammed Suhail Ahmed Khan (GitHub)

Label mapping

ID Label
0 BACKGROUND
1 CONCLUSIONS
2 METHODS
3 OBJECTIVE
4 RESULTS
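
If the checkpoint's config does not carry these names, Transformers falls back to generic labels of the form LABEL_<id>. The following is a minimal sketch of mapping either form back to the table above. The helper name to_role is hypothetical, and whether this checkpoint emits generic or named labels is not stated in this card.

```python
# Label table from this model card, as plain dictionaries.
ID2LABEL = {0: "BACKGROUND", 1: "CONCLUSIONS", 2: "METHODS", 3: "OBJECTIVE", 4: "RESULTS"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def to_role(label):
    """Return the structural role for an integer ID, a 'LABEL_<id>' string, or a role name."""
    if isinstance(label, int):
        return ID2LABEL[label]
    if label.startswith("LABEL_"):
        # Generic fallback naming: take the integer after the underscore.
        return ID2LABEL[int(label.split("_", 1)[1])]
    if label in LABEL2ID:
        return label
    raise KeyError(f"unknown label: {label!r}")
```

Passing each prediction's "label" field through to_role gives the same role names regardless of how the checkpoint's config was saved.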

Quick start

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="SuhailKhan06/medclassify-ai"
)

sentences = [
    "Patients were randomly assigned to two treatment groups.",
    "The aim of this study was to evaluate the safety of drug X.",
    "These findings suggest the intervention is effective.",
    "Cardiovascular disease is a leading cause of death.",
    "The treatment significantly improved 30-day survival rates."
]

for s in sentences:
    print(classifier(s))
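
The call above relies on default tokenizer settings. The following is a hedged sketch of classifying a whole batch while enforcing the 128-token limit listed under Model details. It assumes the text-classification pipeline forwards truncation and max_length to the tokenizer at call time. The helper names load_classifier and classify are hypothetical.

```python
def load_classifier(model_id="SuhailKhan06/medclassify-ai"):
    """Build the text-classification pipeline (downloads the checkpoint on first use)."""
    # Imported here so that classify() below has no hard dependency on transformers.
    from transformers import pipeline
    return pipeline("text-classification", model=model_id)


def classify(classifier, sentences, max_length=128):
    """Classify sentences in one call; return (sentence, label, score) tuples."""
    sentences = list(sentences)
    # Tokenizer keyword arguments are passed at call time, so inputs longer
    # than max_length tokens are cut off instead of being fed in full.
    results = classifier(sentences, truncation=True, max_length=max_length)
    return [(s, r["label"], round(r["score"], 4)) for s, r in zip(sentences, results)]
```

For example, classify(load_classifier(), sentences) with the list from the quick start returns one tuple per input sentence.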

Training data

PubMed 200k RCT: sentences extracted from PubMed abstracts of randomized controlled trials, each labeled with its structural role.

Split Sentences
Train 176,642
Validation 29,672
Test 29,578
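
A minimal sketch of reading one split into labeled sentences. It assumes the plain-text layout of the dataset's public release: a header line starting with ### followed by the abstract ID, then one tab-separated label and sentence per line, with a blank line between abstracts. This card does not describe that layout, so check it against your copy. The function name parse_rct_file and the sample ID are made up for illustration.

```python
from collections import Counter


def parse_rct_file(lines):
    """Yield (abstract_id, label, sentence) tuples from PubMed RCT formatted lines."""
    abstract_id = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("###"):
            # Header line: everything after the marker is the abstract ID.
            abstract_id = line[3:].strip()
        elif line.strip():
            # Content line: label and sentence separated by the first tab.
            label, _, sentence = line.partition("\t")
            yield abstract_id, label, sentence


# Tiny in-memory sample (hypothetical ID) standing in for an open file handle.
sample = [
    "###12345678\n",
    "BACKGROUND\tCardiovascular disease is a leading cause of death .\n",
    "METHODS\tPatients were randomly assigned to two treatment groups .\n",
    "\n",
]
rows = list(parse_rct_file(sample))
print(Counter(label for _, label, _ in rows))
```

Running the same function over each real split gives per-label counts that can be compared with the sentence totals in the table above.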

Baseline comparison

Before fine-tuning, a TF-IDF (50k features, unigram + bigram) + Logistic Regression baseline was trained and evaluated on the same splits.

Model                          Test accuracy   Weighted F1
TF-IDF + Logistic Regression   77.55%          77.10%
DistilBERT (this model)        checkpoint saved; full evaluation pending

DistilBERT training was interrupted before full convergence. The saved checkpoint is available and full evaluation metrics will be added once training completes.
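
For reference, the baseline configuration and the two reported metrics can be sketched as follows. The 50k feature cap and the unigram plus bigram range come from this card. The scikit-learn classes, the max_iter value, and the function names are assumptions, not the author's exact script.

```python
from collections import Counter


def build_baseline():
    """TF-IDF (50k features, unigrams + bigrams) feeding Logistic Regression."""
    # Imported lazily so the metric helpers below run without scikit-learn.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    return make_pipeline(
        TfidfVectorizer(max_features=50_000, ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),  # max_iter is an assumption
    )


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = n - tp
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        total += f1 * n
    return total / len(y_true)
```

A typical use would be to fit build_baseline() on the training sentences and labels, call predict on the test sentences, and pass the true and predicted labels to accuracy and weighted_f1.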


Limitations

  • Trained on PubMed abstracts from randomized controlled trials. Performance on other abstract types (observational studies, case reports, reviews) is untested and likely lower.
  • English-only.
  • Intended for short sentences (under 128 tokens). Longer inputs are truncated to the 128-token maximum.
  • BACKGROUND and OBJECTIVE are expected to be the classes most often confused with each other: they are structurally and lexically similar, and the TF-IDF baseline reflects this (F1 of 0.56 and 0.55, respectively).
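
To check on your own labeled sentences which roles get mixed up, a small tally of misclassified pairs is usually enough. The sketch below uses made-up labels purely for illustration; they are not results from this model, and the function names are hypothetical.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (true_role, predicted_role) pair occurs."""
    return Counter(zip(y_true, y_pred))


def most_confused(y_true, y_pred, top=3):
    """Return the most frequent misclassified (true, predicted) pairs."""
    errors = Counter({
        pair: n
        for pair, n in confusion_counts(y_true, y_pred).items()
        if pair[0] != pair[1]
    })
    return errors.most_common(top)


# Made-up labels for illustration only; not results from this model.
y_true = ["BACKGROUND", "BACKGROUND", "OBJECTIVE", "OBJECTIVE", "METHODS"]
y_pred = ["OBJECTIVE", "OBJECTIVE", "BACKGROUND", "OBJECTIVE", "METHODS"]
print(most_confused(y_true, y_pred))
```

Each entry pairs a (true, predicted) combination with its count, so a dominant BACKGROUND/OBJECTIVE pair would show up at the top of the list.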

Citation

If you use this model or the PubMed 200k RCT dataset, please cite the original dataset paper:

Dernoncourt, F., & Lee, J. Y. (2017).
PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts.
arXiv:1710.06071