---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- drugs
- classification
- bert
datasets:
- lloydmeta/drug_dataset_cleaned
widget:
- text: 'I have been taking ambien or zolphidem for almost 15 years. '
  example_title: Drugs for insomnia
---

# Model Card for drug-BERT

This is a multiclass classification model, built on top of [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased), trained on
the [Drug Review Dataset (Drugs.com)](https://archive.ics.uci.edu/dataset/462/drug+review+dataset+drugs+com), and
is useful for making a best-attempt classification of the _condition_ someone has, based on their review of a drug.

## Model Details

### Model Description

This is a multiclass classification model, built on top of [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased), trained on
the [Drug Review Dataset (Drugs.com)](https://archive.ics.uci.edu/dataset/462/drug+review+dataset+drugs+com), and
is useful for making a best-attempt classification of the _condition_ someone has, based on their review of a drug.

It was created as a learning exercise covering:

* Colab
* The Transformer architecture
* Fine-tuning/training on top of existing NLP models
* Hugging Face libraries

**Developed by:** [lloydmeta](http://github.com/lloydmeta) of [beachape.com](https://beachape.com)

**License:** Apache 2.0

**Finetuned from model:** [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased)

## Uses

Classifying (identifying) the _condition_ someone has, based on their review of a drug.

### Out-of-Scope Use

Actual clinical diagnosis.

## Bias, Risks, and Limitations

* Biases from the base `bert-base-uncased` model apply here.
* Only the drugs and conditions present in the drug review dataset are covered.

## How to Get Started with the Model

```python
from transformers import pipeline

condition_from_drug_review_classifier = pipeline("text-classification", model="lloydmeta/drug-bert")
review_text = "I have been taking ambien or zolphidem for almost 15 years."
condition_from_drug_review_classifier(review_text)
```
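
By default the pipeline returns only the single highest-scoring condition. If you want a ranked list of candidate conditions with their scores, you can ask for the top few; a minimal sketch (the `top_k` value of 5 is just an example):

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="lloydmeta/drug-bert")

# Return the five highest-scoring conditions instead of only the best one.
candidates = classifier(
    "I have been taking ambien or zolphidem for almost 15 years.",
    top_k=5,
)
for candidate in candidates:
    print(f"{candidate['label']}: {candidate['score']:.3f}")
```

Each entry is a dict with a `label` (the predicted condition) and a `score` (the model's probability for that condition).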

## Training Details

### Training Data

* Trained on the [Drug Review Dataset (Drugs.com)](https://archive.ics.uci.edu/dataset/462/drug+review+dataset+drugs+com), cleaned up by removing HTML tags from reviews and dropping samples that lacked a `condition`.
* 60% of the dataset was used for training.
* Columns not needed for the task, such as `patient_id`, `drugName`, `rating`, and `date`, were removed (see the illustrative sketch after this list).
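
The original preprocessing code is not published on this card; the following is a minimal sketch of the kind of clean-up and split described above, assuming the raw UCI TSV layout (the file name, column names, tag handling, and random seed are all illustrative):

```python
import pandas as pd

# Load the raw Drugs.com review data (assumes the UCI TSV layout).
df = pd.read_csv("drugsComTrain_raw.tsv", sep="\t")

# Drop rows without a condition label and strip HTML tags from the review text.
df = df.dropna(subset=["condition"])
df["review"] = df["review"].str.replace(r"<[^>]+>", " ", regex=True)

# Keep only the columns needed for condition classification.
df = df[["review", "condition"]]

# 60% train / 15% evaluation / 25% test split.
train_df = df.sample(frac=0.60, random_state=42)
rest_df = df.drop(train_df.index)
eval_df = rest_df.sample(frac=0.15 / 0.40, random_state=42)
test_df = rest_df.drop(eval_df.index)
```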

### Training Procedure

* `review` text was tokenised with a maximum sequence length of 512
* Learning rate: 2e-5
* Epochs: 3
* Weight decay: 0.01
* Per-device train batch size: 4 (a `Trainer` sketch using these settings follows this list)
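
The original training script is not included here, but the hyperparameters above map onto the Hugging Face `Trainer` API roughly as follows; `train_ds`, `eval_ds`, and `num_labels` are assumed to come from the preprocessing described in the previous section:

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# `train_ds` / `eval_ds` are assumed to be `datasets.Dataset` objects with a
# `review` text column and an integer `label` per condition.
tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")

def tokenize(batch):
    # Truncate reviews to the 512-token maximum noted above.
    return tokenizer(batch["review"], truncation=True, max_length=512)

tokenized_train = train_ds.map(tokenize, batched=True)
tokenized_eval = eval_ds.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "google-bert/bert-base-uncased",
    num_labels=num_labels,  # one label per condition in the cleaned dataset
)

training_args = TrainingArguments(
    output_dir="drug-bert",
    learning_rate=2e-5,
    num_train_epochs=3,
    weight_decay=0.01,
    per_device_train_batch_size=4,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    tokenizer=tokenizer,
)
trainer.train()
```

Passing the tokenizer to `Trainer` makes it pad each batch dynamically, so the 512-token limit only affects unusually long reviews.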

## Evaluation

15% of the dataset was split off for evaluation.

### Testing Data, Factors & Metrics

#### Testing Data

25% of the dataset was split off for testing.
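
To measure the model yourself on a held-out split, a minimal sketch (assuming `test_reviews` and `test_conditions` are parallel lists of review texts and condition labels from the test split, and that the model's `id2label` mapping stores the condition names):

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="lloydmeta/drug-bert")

# Classify the held-out reviews in batches, truncating long reviews to the
# model's maximum input length.
results = classifier(test_reviews, truncation=True, batch_size=16)
predicted = [result["label"] for result in results]

# Simple accuracy against the true condition labels.
accuracy = sum(p == c for p, c in zip(predicted, test_conditions)) / len(test_conditions)
print(f"Accuracy on the test split: {accuracy:.3f}")
```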

## Model Card Authors

[lloydmeta](http://github.com/lloydmeta)

## Model Card Contact

[lloydmeta](http://github.com/lloydmeta)