---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- drugs
- classification
- bert
datasets:
- lloydmeta/drug_dataset_cleaned
widget:
- text: 'I have been taking ambien or zolphidem for almost 15 years. '
example_title: Drugs for insomnia
---
# Model Card for drug-BERT
drug-BERT is a multiclass classification model, fine-tuned from [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased) on
the [Drug Review Dataset (Drugs.com)](https://archive.ics.uci.edu/dataset/462/drug+review+dataset+drugs+com).
Given a person's review of a drug, it makes a best-effort prediction of the _condition_ that person has.
## Model Details
### Model Description
It was created as a learning exercise covering:
* Colab
* Transformer architecture
* Fine-tuning/training on top of existing NLP models
* Hugging Face libraries
**Developed by:** [lloydmeta](http://github.com/lloydmeta) of [beachape.com](https://beachape.com)
**License:** Apache 2.0
**Finetuned from model:** [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased)
## Uses
Classifying (identifying) the _condition_ someone has, based on their review of a drug.
### Out-of-Scope Use
Actual, clinical diagnosis.
## Bias, Risks, and Limitations
* Biases from the base `bert-base-uncased` model apply here.
* Only the drugs and conditions that appear in the Drug Review Dataset are covered.
## How to Get Started with the Model
```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
condition_from_drug_review_classifier = pipeline("text-classification", model="lloydmeta/drug-bert")
drug_review = "I have been taking ambien or zolphidem for almost 15 years."
# Returns the predicted label (the condition) and its score
condition_from_drug_review_classifier(drug_review)
```
## Training Details
### Training Data
* Trained on the [Drug Review Dataset (Drugs.com)](https://archive.ics.uci.edu/dataset/462/drug+review+dataset+drugs+com), cleaned by removing HTML tags from reviews and
dropping samples that lacked a `condition`.
* 60% of the dataset was used as training data.
* Irrelevant columns such as `patient_id`, `drugName`, `rating`, and `date` were removed.
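The cleaning steps above could look roughly like the following. This is a minimal sketch using only the standard library, not the script actually used to build the cleaned dataset: the helper names `clean_review` and `clean_rows` are hypothetical, the rows are assumed to be plain dicts, and decoding HTML entities is an extra assumption beyond the tag removal described above.

```python
import html
import re

# Matches anything that looks like an HTML tag, e.g. <b> or </span>.
TAG_PATTERN = re.compile(r"<[^>]+>")


def clean_review(text):
    """Strip HTML tags, decode entities (an assumption), and collapse whitespace."""
    without_tags = TAG_PATTERN.sub(" ", text)
    unescaped = html.unescape(without_tags)
    return " ".join(unescaped.split())


def clean_rows(rows):
    """Drop rows lacking a condition and keep only `review` and `condition`."""
    cleaned = []
    for row in rows:
        condition = (row.get("condition") or "").strip()
        if not condition:
            continue  # samples without a condition are removed
        cleaned.append(
            {"review": clean_review(row.get("review", "")), "condition": condition}
        )
    return cleaned
```

Keeping only `review` and `condition` is one reading of "irrelevant columns were removed"; the published dataset may retain other fields.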
### Training Procedure
* `review` text was tokenised with a maximum length of 512 tokens
* Learning rate: 2e-5
* Epochs: 3
* Weight decay: 0.01
* Per device train batch size: 4
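As a rough illustration, the settings above map onto `transformers` keyword arguments as shown below. This is a sketch rather than the original training script: `tokenize_batch`, `build_training_args`, and the `output_dir` value are hypothetical, and `truncation=True` is an assumption about how the 512-token maximum was enforced.

```python
# Hyperparameters from the list above, keyed by the matching
# `transformers.TrainingArguments` keyword names.
TRAINING_KWARGS = {
    "learning_rate": 2e-5,
    "num_train_epochs": 3,
    "weight_decay": 0.01,
    "per_device_train_batch_size": 4,
}
MAX_LENGTH = 512


def tokenize_batch(tokenizer, batch):
    """Tokenise the `review` column, cutting sequences off at MAX_LENGTH tokens."""
    # truncation=True is an assumption; the card only states the maximum of 512.
    return tokenizer(batch["review"], truncation=True, max_length=MAX_LENGTH)


def build_training_args(output_dir="drug-bert-out"):
    """Build TrainingArguments; imported lazily so the constants stay usable alone."""
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

`tokenize_batch` is shaped so it can be passed to a batched `map` over a dataset with a `review` column, and the result of `build_training_args()` can be handed to a `Trainer`.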
## Evaluation
15% of the dataset was held out for evaluation.
### Testing Data, Factors & Metrics
#### Testing Data
25% of the dataset was held out for testing.
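Together with the 60% training and 15% evaluation shares stated earlier, this gives a 60/15/25 split. A minimal sketch of such a split follows; the function name `split_60_15_25`, the shuffling, and the seed are assumptions, since the card does not say how rows were assigned to each part.

```python
import random


def split_60_15_25(rows, seed=42):
    """Shuffle rows and split them into 60% train, 15% evaluation, 25% test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility (assumption)
    n = len(shuffled)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = n * 60 // 100
    n_eval = n * 15 // 100
    train = shuffled[:n_train]
    evaluation = shuffled[n_train:n_train + n_eval]
    test = shuffled[n_train + n_eval:]  # remainder, about 25%
    return train, evaluation, test
```

Any rows left over from integer rounding fall into the test part, so the three parts always cover the whole dataset exactly once.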
## Model Card Authors
[lloydmeta](http://github.com/lloydmeta)
## Model Card Contact
[lloydmeta](http://github.com/lloydmeta)