metadata
language:
  - en
license: apache-2.0
library_name: transformers
tags:
  - drugs
  - classification
  - bert
datasets:
  - lloydmeta/drug_dataset_cleaned
widget:
  - text: 'I have been taking ambien or zolphidem for almost 15 years. '
    example_title: Drugs for insomnia

Model Card for drug-BERT

drug-BERT is a multiclass classification model fine-tuned from google-bert/bert-base-uncased on the Drug Review Dataset (Drugs.com). Given a person's review of a drug, it makes a best-effort prediction of the condition that person has.

Model Details

Model Description

It was created as a learning exercise covering:

  • Colab
  • Transformer architecture
  • Fine-tuning/training on top of existing NLP models
  • Hugging Face libraries

Developed by: lloydmeta of beachape.com
License: Apache 2.0
Finetuned from model: google-bert/bert-base-uncased

Uses

Classifying (identifying) the condition someone has, based on their review of a drug.

Out-of-Scope Use

Actual clinical diagnosis.

Bias, Risks, and Limitations

  • Biases from the base bert-base-uncased model apply here
  • Only drugs and conditions present in the Drug Review Dataset are covered, so the model cannot predict a condition outside that set

How to Get Started with the Model

from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
condition_from_drug_review_classifier = pipeline("text-classification", model="lloydmeta/drug-bert")

# A drug review; the model predicts the reviewer's condition from it
drug_review = "I have been taking ambien or zolphidem for almost 15 years."
print(condition_from_drug_review_classifier(drug_review))
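
A text-classification pipeline returns a list of dictionaries, each holding a `label` and a `score`. As a minimal sketch of consuming that output, the helper below keeps only predictions at or above a confidence threshold. The function name `confident_predictions`, the 0.5 threshold, and the sample labels and scores are illustrative assumptions, not actual output from this model.

```python
def confident_predictions(predictions, threshold=0.5):
    """Return predictions scoring at least `threshold`, highest score first."""
    kept = [p for p in predictions if p["score"] >= threshold]
    return sorted(kept, key=lambda p: p["score"], reverse=True)


# Hypothetical output, in the shape a text-classification pipeline returns
sample_output = [
    {"label": "Anxiety", "score": 0.06},
    {"label": "Insomnia", "score": 0.91},
]

print(confident_predictions(sample_output))
```

In recent versions of transformers, passing `top_k=None` when calling the pipeline should return a score for every label instead of only the top one, which is when a filter like this becomes useful.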

Training Details

Training Data

  • Trained on the Drug Review Dataset (Drugs.com), cleaned by stripping HTML tags from reviews and dropping samples that lacked a condition.
  • 60% of the dataset was used for training.
  • Irrelevant columns (patient_id, drugName, rating, date, etc.) were removed.
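
The cleaning steps above could be sketched roughly as follows. This is not the author's actual preprocessing code: the column names `review` and `condition`, the regular expression, and the extra HTML-entity unescaping are assumptions made for illustration, and the sample rows are invented.

```python
import html
import re

TAG_PATTERN = re.compile(r"<[^>]+>")


def clean_review(text):
    """Strip HTML tags, unescape entities, and collapse whitespace."""
    without_tags = TAG_PATTERN.sub(" ", text)
    unescaped = html.unescape(without_tags)  # assumption: entities are also decoded
    return " ".join(unescaped.split())


def prepare_rows(rows):
    """Drop rows lacking a condition; keep only the review and condition columns."""
    prepared = []
    for row in rows:
        condition = (row.get("condition") or "").strip()
        if not condition:
            continue  # sample has no condition, so it cannot serve as a labelled example
        prepared.append({"review": clean_review(row["review"]), "condition": condition})
    return prepared


# Invented rows for illustration only
rows = [
    {"patient_id": 1, "drugName": "Ambien", "condition": "Insomnia",
     "review": "<b>Works</b> well, I&#039;m sleeping again.", "rating": 9},
    {"patient_id": 2, "drugName": "Example", "condition": None,
     "review": "No condition recorded.", "rating": 5},
]

print(prepare_rows(rows))
```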

Training Procedure

  • Review text was tokenised with a maximum length of 512 tokens
  • Learning rate: 2e-5
  • Epochs: 3
  • Weight decay: 0.01
  • Per device train batch size: 4
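
For reference, the settings above can be collected under the parameter names that `transformers.TrainingArguments` uses, which also makes it easy to estimate how many optimizer steps a run takes. This is a sketch only: the helper `total_optimizer_steps`, the single-device default, the no-gradient-accumulation assumption, and the 10,000-example training-set size are all hypothetical.

```python
import math

# Keys mirror parameter names of transformers.TrainingArguments
hyperparameters = {
    "learning_rate": 2e-5,
    "num_train_epochs": 3,
    "weight_decay": 0.01,
    "per_device_train_batch_size": 4,
}

MAX_TOKENS = 512  # maximum tokenised review length


def total_optimizer_steps(num_train_examples, num_devices=1):
    """Estimate optimizer steps, assuming no gradient accumulation."""
    batch = hyperparameters["per_device_train_batch_size"] * num_devices
    steps_per_epoch = math.ceil(num_train_examples / batch)
    return steps_per_epoch * hyperparameters["num_train_epochs"]


# Hypothetical training-set size, for illustration only
print(total_optimizer_steps(10_000))
```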

Evaluation

15% of the dataset was held out for evaluation.

Testing Data, Factors & Metrics

Testing Data

25% of the dataset was held out for testing.
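
Taken together, the card describes a 60% / 15% / 25% split into training, evaluation, and test data. One way such a split might be produced is sketched below. The shuffling, the seed, and the function name `split_dataset` are assumptions, since the card does not say how the split was made.

```python
import random


def split_dataset(rows, train_frac=0.60, eval_frac=0.15, seed=42):
    """Shuffle rows and split them into train, evaluation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * train_frac)
    n_eval = round(len(shuffled) * eval_frac)
    train = shuffled[:n_train]
    evaluation = shuffled[n_train:n_train + n_eval]
    test = shuffled[n_train + n_eval:]  # the remainder, roughly 25%
    return train, evaluation, test


# Stand-in data: 1000 integer ids in place of real reviews
train, evaluation, test = split_dataset(range(1000))
print(len(train), len(evaluation), len(test))
```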

Model Card Authors

lloydmeta

Model Card Contact

lloydmeta