Text Classification
Transformers
Safetensors
English
distilbert
nlp
healthcare
medical
drug-reviews
text-embeddings-inference
How to use Talip7/distilbert-drug-cls with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="Talip7/distilbert-drug-cls")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("Talip7/distilbert-drug-cls")
model = AutoModelForSequenceClassification.from_pretrained("Talip7/distilbert-drug-cls")
Drug Review Condition Classifier (DistilBERT)
This is a multi-class text classification model that predicts a patient's medical condition from the text of a drug review. It was trained on the Drugs.com reviews dataset.
Model Overview
- Base model: distilbert-base-uncased
- Task: Text Classification
- Number of labels: more than 770 medical conditions (approximate)
- Max sequence length: 256
- Training epochs: 3
- Optimizer: AdamW
- Weight decay: 0.01
Evaluation Results
Validation
- Accuracy: ~0.74
- Macro F1: ~0.15
- Loss: ~1.15
Test
- Accuracy: ~0.74
- Macro F1: ~0.15
- Loss: ~1.13
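Macro F1 averages the per-class F1 scores with equal weight, so it is low here because of strong class imbalance and the large number of rare condition labels, many of which the model predicts poorly. Accuracy is dominated by the frequent condition classes, where the model performs well.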
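The gap between accuracy and macro F1 can be reproduced on toy data. The sketch below is not the evaluation code used for this model. It uses made-up labels (one frequent class and ten rare ones) and a hypothetical classifier that always predicts the frequent class, to show how high accuracy and low macro F1 can coexist.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (illustrative only): 90 reviews of one frequent condition,
# plus 10 rare conditions with a single review each.
y_true = ["depression"] * 90 + [f"rare_{i}" for i in range(10)]
# Hypothetical classifier that always predicts the frequent class.
y_pred = ["depression"] * 100

print(f"accuracy: {accuracy(y_true, y_pred):.3f}")  # accuracy: 0.900
print(f"macro F1: {macro_f1(y_true, y_pred):.3f}")  # macro F1: 0.086
```

Every rare class contributes an F1 of zero to the average, so eleven classes share the single good score of the frequent class.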
Training Details
- Hugging Face Trainer API
- Dynamic padding with DataCollatorWithPadding
- Automatic acceleration via Accelerate
- Train-only label space (no label leakage)
- Evaluation on held-out validation and test splits
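Two of the points above, the train-only label space and dynamic padding, can be sketched in plain Python. This is an illustration under stated assumptions, not the training script: the helper names (`build_label_space`, `encode_split`, `pad_batch`) are hypothetical, dropping validation and test rows whose label never appears in training is an assumed policy, and the padding function only imitates what `DataCollatorWithPadding` does for each batch.

```python
def build_label_space(train_labels):
    """Derive the label <-> id mapping from the training split only."""
    names = sorted(set(train_labels))
    label2id = {name: idx for idx, name in enumerate(names)}
    id2label = {idx: name for name, idx in label2id.items()}
    return label2id, id2label


def encode_split(rows, label2id):
    """Map (text, label) rows to (text, id).

    Assumption: rows whose label was never seen in training are dropped.
    """
    return [(text, label2id[label]) for text, label in rows if label in label2id]


def pad_batch(batch_ids, pad_id=0):
    """Pad each sequence to the longest one in this batch, not to a global maximum.

    pad_id=0 is an assumed padding token id.
    """
    longest = max(len(ids) for ids in batch_ids)
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in batch_ids]
    attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in batch_ids]
    return input_ids, attention_mask


# Made-up rows for illustration.
train_rows = [
    ("helped my headaches", "Migraine"),
    ("mood improved", "Depression"),
    ("less pain", "Migraine"),
]
val_rows = [("slept better", "Insomnia"), ("fewer attacks", "Migraine")]

label2id, id2label = build_label_space([label for _, label in train_rows])
encoded_val = encode_split(val_rows, label2id)

# Made-up token ids: the shorter sequence is padded to length 5 only.
input_ids, attention_mask = pad_batch([[101, 7, 102], [101, 7, 8, 9, 102]])
```

Building the mapping from the training split alone means validation and test data cannot add classes to the model's output layer.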
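Example Usage
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
classifier = pipeline(
    "text-classification",
    model="Talip7/distilbert-drug-cls",
)

# Returns the highest-scoring condition label and its score
classifier(
    "This medication significantly reduced my migraine but caused nausea and dizziness."
)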
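By default, the pipeline call returns a list of dictionaries with `label` and `score` keys. For a single-label classifier like this one, the score is a softmax probability over all condition labels. The sketch below imitates that post-processing in plain Python so the output is easier to interpret. The logits and the four label names are hypothetical and do not come from the model.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_labels(logits, id2label, k=3):
    """Return the k most probable labels in a pipeline-like format."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [{"label": id2label[i], "score": probs[i]} for i in ranked[:k]]


# Hypothetical label mapping and logits, for illustration only.
id2label = {0: "Depression", 1: "Migraine", 2: "Insomnia", 3: "Pain"}
logits = [0.2, 3.1, -1.0, 1.4]

predictions = top_k_labels(logits, id2label, k=2)
for item in predictions:
    print(item["label"], round(item["score"], 3))
```

With the real pipeline, passing `top_k` in the call, for example `classifier(text, top_k=5)`, returns several ranked candidates. Given the low macro F1, that can be more informative than a single label for reviews about rarer conditions.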
Limitations
- Highly imbalanced class distribution
- User-generated reviews may contain noise
- Not intended for medical advice or diagnosis
Dataset
- Source: Drugs.com reviews dataset
- Preprocessing:
  - lowercasing
  - HTML cleanup
  - minimum review length filtering
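The three preprocessing steps can be sketched with the standard library alone. The exact rules used for this model are not documented here, so the details below are assumptions: HTML cleanup is taken to mean unescaping entities and stripping tags, and the length filter uses a hypothetical threshold of three words (`min_words`).

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")  # matches simple HTML tags such as <b> or </p>
WS_RE = re.compile(r"\s+")       # matches runs of whitespace


def clean_review(text):
    """Unescape HTML entities, strip tags, collapse whitespace, lowercase."""
    text = html.unescape(text)   # e.g. "&#039;" becomes "'"
    text = TAG_RE.sub(" ", text)
    text = WS_RE.sub(" ", text).strip()
    return text.lower()


def preprocess(reviews, min_words=3):
    """Clean every review and drop those shorter than min_words (assumed threshold)."""
    cleaned = (clean_review(review) for review in reviews)
    return [review for review in cleaned if len(review.split()) >= min_words]


# Made-up reviews for illustration.
raw_reviews = ["Good.", "It&#039;s <b>GREAT</b>  for  migraines"]
print(preprocess(raw_reviews))  # ["it's great for migraines"]
```

Because the model was trained on lowercased, cleaned text, applying similar cleaning at inference time keeps inputs closer to the training distribution.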
Author
Trained and published as part of hands-on NLP / LLM learning with Hugging Face.