Model Card for drug-BERT

drug-BERT is a multiclass classification model fine-tuned from google-bert/bert-base-uncased on the Drug Review Dataset (Drugs.com). Given a person's review of a drug, it makes a best-attempt prediction of the condition that person has.

Model Details

Model Description

This model was created as a learning exercise covering:

  • Colab
  • Transformer architecture
  • Fine-tuning/training on top of existing NLP models
  • Hugging Face libraries

Developed by: lloydmeta of beachape.com
License: Apache 2.0
Finetuned from model: google-bert/bert-base-uncased

Uses

Classifying (identifying) the condition someone has, based on their review of a drug.

Out-of-Scope Use

Actual clinical diagnosis.

Bias, Risks, and Limitations

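  • Biases from the base bert-base-uncased model apply here
  • Only the drugs and conditions present in the Drug Review Dataset are covered; a review about any other condition will still be assigned one of the known condition labels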

How to Get Started with the Model

from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub.
condition_from_drug_review_classifier = pipeline("text-classification", model="lloydmeta/drug-bert")

# A drug review; the model predicts the condition the reviewer has.
review_text = "I have been taking ambien or zolphidem for almost 15 years."
print(condition_from_drug_review_classifier(review_text))
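
A text-classification pipeline returns a list of dictionaries with `label` and `score` keys. Because the prediction is only a best attempt, a caller may want to ignore low-confidence results. The following is a minimal sketch under that assumption; the helper name `confident_label`, the 0.5 threshold, and the sample output are illustrative and not part of the model.

```python
from typing import Optional


def confident_label(predictions: list[dict], threshold: float = 0.5) -> Optional[str]:
    """Return the highest-scoring label, or None if its score is below the threshold."""
    if not predictions:
        return None
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= threshold else None


# Hypothetical output in the shape a text-classification pipeline returns.
sample_output = [{"label": "Insomnia", "score": 0.93}]
print(confident_label(sample_output))                  # Insomnia
print(confident_label(sample_output, threshold=0.99))  # None
```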

Training Details

Training Data

  • Trained on the Drug Review Dataset (Drugs.com), cleaned by stripping HTML tags from the reviews and dropping samples with no condition.
  • 60% of the dataset was used for training.
  • Irrelevant columns such as patient_id, drugName, rating and date were removed.
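
The cleaning steps above could look roughly like the sketch below. It is not the author's actual preprocessing code: the function name `clean_rows`, the regular expression, and the assumption that rows are dictionaries with `review` and `condition` keys are all illustrative.

```python
import re

# Naive pattern for HTML tags such as <b> or </span>; an assumption, not the original code.
TAG_PATTERN = re.compile(r"<[^>]+>")


def clean_rows(rows):
    """Drop rows without a condition, strip HTML tags, and keep only review and condition."""
    cleaned = []
    for row in rows:
        condition = (row.get("condition") or "").strip()
        if not condition:
            continue  # samples that lack a condition are removed
        review = TAG_PATTERN.sub(" ", row.get("review") or "")
        review = " ".join(review.split())  # collapse whitespace left behind by removed tags
        cleaned.append({"review": review, "condition": condition})
    return cleaned


# Hypothetical rows for illustration only.
rows = [
    {"patient_id": 1, "drugName": "ExampleDrug", "condition": "Insomnia",
     "review": "<b>Works</b> well", "rating": 9},
    {"patient_id": 2, "drugName": "ExampleDrug", "condition": None,
     "review": "No condition given", "rating": 5},
]
print(clean_rows(rows))  # [{'review': 'Works well', 'condition': 'Insomnia'}]
```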

Training Procedure

  • Reviews were tokenised with a maximum length of 512 tokens
  • Learning rate: 2e-5
  • Epochs: 3
  • Weight decay: 0.01
  • Per device train batch size: 4
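
The settings above can be collected in one place, as in the sketch below. The numeric values come from this card; the dictionary key names mirror keyword arguments of `transformers.TrainingArguments`, on the assumption (not stated in the card) that the Hugging Face `Trainer` API was used. `truncation=True`, the helper `steps_per_epoch`, and the example count of 1000 are likewise assumptions made for illustration.

```python
import math

# Values stated in this card; key names mirror transformers.TrainingArguments keywords.
training_hyperparameters = {
    "learning_rate": 2e-5,
    "num_train_epochs": 3,
    "weight_decay": 0.01,
    "per_device_train_batch_size": 4,
}

# max_length comes from the card; truncation=True is an assumption.
tokenizer_kwargs = {"truncation": True, "max_length": 512}


def steps_per_epoch(num_examples: int, num_devices: int = 1) -> int:
    """Optimizer steps per epoch, assuming no gradient accumulation."""
    batch = training_hyperparameters["per_device_train_batch_size"] * num_devices
    return math.ceil(num_examples / batch)


# Hypothetical training-set size, for illustration only.
print(steps_per_epoch(1000))  # 250
```

If `Trainer` is used, the dictionary could be unpacked into `TrainingArguments(output_dir=..., **training_hyperparameters)`, with the output directory chosen by the caller.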

Evaluation

15% of the dataset was held out for evaluation.

Testing Data, Factors & Metrics

Testing Data

25% of the dataset was held out for testing.
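
Together with the training and evaluation portions, this gives a 60/15/25 split. The sketch below shows one way such a split could be produced. The card does not say whether the data was shuffled or which seed was used, so the shuffle, the seed value, and the function name `split_dataset` are assumptions.

```python
import random


def split_dataset(rows, train_frac=0.60, eval_frac=0.15, seed=42):
    """Shuffle rows and split them into train, evaluation and test portions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility (assumed)
    n_train = round(len(shuffled) * train_frac)
    n_eval = round(len(shuffled) * eval_frac)
    train = shuffled[:n_train]
    evaluation = shuffled[n_train:n_train + n_eval]
    test = shuffled[n_train + n_eval:]  # whatever remains, about 25%
    return train, evaluation, test


train, evaluation, test = split_dataset(range(100))
print(len(train), len(evaluation), len(test))  # 60 15 25
```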

Model Card Authors

lloydmeta

Model Card Contact

lloydmeta

Model size: 109M params (Safetensors, tensor type F32)