|
--- |
|
base_model: |
|
- FacebookAI/roberta-base |
|
datasets: |
|
- MarioBarbeque/UCI_drug_reviews |
|
language: |
|
- en |
|
library_name: transformers |
|
metrics: |
|
- accuracy |
|
- f1 |
|
- precision |
|
- recall |
|
--- |
|
|
|
# Model Card for RoBERTa-base-DReiFT
|
|
|
We fine-tune the RoBERTa base model [FacebookAI/roberta-base](https://huggingface.co/FacebookAI/roberta-base) for multi-label classification of medical conditions. |
|
|
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
|
|
The RoBERTa base model is fine-tuned in quick fashion as an introduction to the breadth of the 🤗 ecosystem. We supervise the training of RoBERTa for multi-label classification on [MarioBarbeque/UCI_drug_reviews](https://huggingface.co/datasets/MarioBarbeque/UCI_drug_reviews), an open source dataset available through the [UC Irvine ML Repository](https://archive.ics.uci.edu) that we downloaded and preprocessed. The model is trained to classify a patient's medical condition based on that patient's review of the drugs they took as part of treatment.
|
|
|
Subsequently, we evaluate our model by introducing a new set of metrics to address bugs found in the 🤗 Evaluate package. We construct the `FixedF1`, `FixedPrecision`, and `FixedRecall` evaluation metrics, available [here](https://github.com/johngrahamreynolds/FixedMetricsForHF), as a simple workaround for a longstanding issue with 🤗 Evaluate's ability to `combine` various metrics for collective evaluation. These metrics subclass the `Metric` class from 🤗 Evaluate, generalizing the `F1`, `Precision`, and `Recall` classes to support `combine`d multi-label classification. Without this generalization, the built-in classes raise an error whenever more than two labels are classified.
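
The following is a minimal sketch of how the fixed metrics can be evaluated together. It assumes each metric loads from its Hugging Face Space via `evaluate.load` and accepts its `average` strategy at construction; neither detail is verified against the linked repo.

``` python
# Sketch: combined multi-label evaluation with the Fixed* metrics.
# Loading each metric from its Space and passing `average` at load time
# are assumptions, not verified API.
import evaluate

f1 = evaluate.load("MarioBarbeque/FixedF1", average="weighted")
precision = evaluate.load("MarioBarbeque/FixedPrecision", average="weighted")
recall = evaluate.load("MarioBarbeque/FixedRecall", average="weighted")

combined = evaluate.combine([f1, precision, recall])
combined.add_batch(predictions=[0, 2, 1, 2], references=[0, 1, 1, 2])
print(combined.compute())
```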
|
|
|
While running into and debugging these errors, we researched the underlying issue(s) and proposed a [plausible solution](https://github.com/huggingface/evaluate/issues/462#issuecomment-2448686687), awaiting repo owner review, that would close a set of longstanding open issues on the 🤗 Evaluate GitHub repo.
|
|
|
|
|
|
|
- **Developed by:** John Graham Reynolds |
|
- **Funded by:** Vanderbilt University |
|
- **Model type:** Multi-label Text Classification |
|
- **Language(s) (NLP):** English |
|
- **Finetuned from model:** [FacebookAI/roberta-base](https://huggingface.co/FacebookAI/roberta-base)
|
|
|
### Model Sources
|
|
|
|
|
|
- **Repository:** https://github.com/johngrahamreynolds/RoBERTa-base-DReiFT |
|
|
|
## Uses |
|
|
|
|
|
|
### Direct Use |
|
|
|
To query the model effectively, one must pass it a string detailing the review of a drug taken to address an underlying medical condition. The model will attempt to classify the medical condition based on its fine-tuned knowledge of hundreds of thousands of drug reviews spanning 805 medical conditions.
|
|
|
## How to Use and Query the Model |
|
|
|
Use the code below to get started with the model. Users pass into the `drug_review` list a string detailing the review of some drug, and the model will attempt to classify the condition for which the drug was taken. Users are free to pass any string they like (relevant to a drug review or not), but the model has been trained specifically on drug reviews for multi-label classification, so it will output, to the best of its ability, the medical condition to which the string most closely relates. See the example below:
|
|
|
``` python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "MarioBarbeque/RoBERTa-base-DReiFT"
tokenizer_name = "FacebookAI/roberta-base"

model = AutoModelForSequenceClassification.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)

# Pass a unique drug review to classify the underlying condition among the 805 medical conditions seen in training
drug_review = [
    "My tonsils were swollen and I had a hard time swallowing. "
    "I had a minimal fever to accompany the pain in my throat. "
    "Taking Aleve at regular intervals throughout the day improved my swallowing. "
    "I am now taking Aleve every 4 hours."
]

# Tokenize the review and move the inputs to the GPU where the model was placed
tokenized_review = tokenizer(drug_review, return_tensors="pt").to("cuda")

output = model(**tokenized_review)
label_id = torch.argmax(output.logits, dim=-1).item()
predicted_label = model.config.id2label[label_id]
print(f"The model predicted the underlying condition to be: {predicted_label}")
```
|
|
|
This code outputs the following: |
|
|
|
``` text
The model predicted the underlying condition to be: tonsillitis/pharyngitis
```
|
|
|
|
|
## Training Details |
|
|
|
### Training Data / Preprocessing |
|
|
|
The data comes from the UC Irvine Machine Learning Repository and has been preprocessed to contain only reviews at least 13 words in length. The dataset card can be found [here](https://huggingface.co/datasets/MarioBarbeque/UCI_drug_reviews).
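
As an illustration, a filter of this kind can be reproduced with 🤗 Datasets roughly as follows; the `review` column name is an assumption for the example, not taken from the dataset card.

``` python
# Sketch: keep only reviews at least 13 words long.
# The `review` column name is assumed for illustration.
from datasets import load_dataset

ds = load_dataset("MarioBarbeque/UCI_drug_reviews")
ds = ds.filter(lambda row: len(row["review"].split()) >= 13)
```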
|
|
|
### Training Procedure |
|
|
|
The model was trained in a distributed fashion on a single node with four 16GB NVIDIA V100s using 🤗 Transformers, 🤗 Tokenizers, the 🤗 Trainer, and the Apache (Py)Spark `TorchDistributor` class.
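
A minimal sketch of this launch pattern follows; the body of `train_fn` is elided and its name is illustrative.

``` python
# Sketch: single-node, 4-GPU launch of a 🤗 Trainer loop via PySpark's TorchDistributor.
from pyspark.ml.torch.distributor import TorchDistributor

def train_fn():
    # Build the tokenizer, model, TrainingArguments, and Trainer here,
    # then call trainer.train(); the 🤗 Trainer handles per-process setup.
    ...

distributor = TorchDistributor(num_processes=4, local_mode=True, use_gpu=True)
distributor.run(train_fn)
```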
|
|
|
|
|
#### Training Hyperparameters |
|
|
|
- **Training regime:** We use FP32 precision, inherited directly from the original "FacebookAI/roberta-base" model (see the sketch below).
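
Concretely, FP32 is the 🤗 Trainer default, so no mixed-precision flags are enabled. A minimal sketch, with a hypothetical output path:

``` python
# Sketch: FP32 training is the Trainer default; mixed-precision flags stay off.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="roberta-base-dreift",  # hypothetical path
    fp16=False,  # shown explicitly; both default to False
    bf16=False,
)
```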
|
|
|
|
|
## Evaluation / Metrics |
|
|
|
We evaluated this model using the `combine`d metrics of the 🤗 Evaluate library, which surfaced a bug that required the [workaround](https://github.com/johngrahamreynolds/FixedMetricsForHF) described above for expedited evaluation.
|
|
|
|
|
### Testing Data, Factors & Metrics |
|
|
|
#### Testing Data |
|
|
|
We configured a train/test split using the standard 80/20 rule of thumb on the shuffled UC Irvine dataset. The [dataset card](https://huggingface.co/datasets/MarioBarbeque/UCI_drug_reviews) contains in its base form a `DatasetDict` with train, validation, and test splits; the data used for testing can be found in the test split.
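
To evaluate against the same data, the held-out test split can be loaded directly:

``` python
# Load the held-out test split used for evaluation.
from datasets import load_dataset

test_ds = load_dataset("MarioBarbeque/UCI_drug_reviews", split="test")
```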
|
|
|
|
|
### Results |
|
|
|
We find the following modest metrics: |
|
|
|
| metric | value | |
|
|--------|--------| |
|
|f1 | 0.714 | |
|
|accuracy | 0.745 | |
|
|recall | 0.746 | |
|
|precision | 0.749 | |
|
|
|
#### Summary |
|
|
|
As discussed at the outset, this model was trained mainly to introduce ourselves to the 🤗 ecosystem. The model's results have not been rigorously improved beyond the initial training, as would be standard for a production-grade model. We look forward to introducing rigorously trained models in the near future with this foundation under our feet.
|
|
|
## Environmental Impact |
|
|
|
- **Hardware Type:** Nvidia Tesla V100-SXM2-16GB |
|
- **Hours used:** 0.5
|
- **Cloud Provider:** Microsoft Azure |
|
- **Compute Region:** EastUS |
|
- **Carbon Emitted:** 0.05 kgCO2 |
|
|
|
|
|
Experiments were conducted using Azure in region eastus, which has a carbon efficiency of 0.37 kgCO2/kWh. A cumulative 0.5 hours of computation was performed on hardware of type Tesla V100-SXM2-16GB (TDP of 250W).
|
|
|
Total emissions are estimated to be 0.05 kgCO2, of which 100 percent was directly offset by the cloud provider.
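
This figure follows directly from the numbers above: 0.5 h × 0.25 kW = 0.125 kWh of energy, and 0.125 kWh × 0.37 kgCO2/kWh ≈ 0.046 kgCO2, which rounds to the reported 0.05 kgCO2.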
|
|
|
Estimations were conducted using the Machine Learning Impact calculator presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
|
|
|
#### Hardware |
|
|
|
The model was trained in a distributed fashion on a single node with four 16GB NVIDIA V100 GPUs for a little more than 2 GPU-hours.
|
|
|
#### Software |
|
|
|
As discussed above, we propose a solution to a set of longstanding issues in the 🤗 Evaluate library. While awaiting review of our proposal, we temporarily define a new set of evaluation metrics by subclassing the 🤗 Evaluate `Metric` class to introduce more general multi-label classification accuracy, precision, f1, and recall metrics.
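
A minimal sketch of this subclassing pattern is below. It is illustrative rather than a copy of the FixedMetricsForHF source; the `_info` and `_compute` details are assumptions modeled on the typical 🤗 Evaluate `Metric` structure.

``` python
# Sketch of the workaround pattern: fix `average` at construction time so the
# metric can participate in `evaluate.combine` for multi-label classification.
# Illustrative only; not a verbatim copy of the FixedMetricsForHF source.
import datasets
import evaluate
from sklearn.metrics import f1_score

class FixedF1(evaluate.Metric):
    def __init__(self, average="weighted", **kwargs):
        super().__init__(**kwargs)
        self.average = average  # frozen here instead of passed to compute()

    def _info(self):
        return evaluate.MetricInfo(
            description="F1 score with a fixed `average` strategy",
            citation="",
            inputs_description="predicted and reference label ids",
            features=datasets.Features(
                {
                    "predictions": datasets.Value("int64"),
                    "references": datasets.Value("int64"),
                }
            ),
        )

    def _compute(self, predictions, references):
        return {"f1": f1_score(references, predictions, average=self.average)}
```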
|
|
|
Training utilized PyTorch, Apache Spark, 🤗 Transformers, 🤗 Tokenizers, 🤗 Evaluate, 🤗 Datasets, and more in an Azure Databricks execution environment.
|
|
|
#### Citations |
|
|
|
``` bibtex
@online{MarioBbqF1,
  author  = {John Graham Reynolds aka @MarioBarbeque},
  title   = {{Fixed F1 Hugging Face Metric}},
  year    = 2024,
  url     = {https://huggingface.co/spaces/MarioBarbeque/FixedF1},
  urldate = {2024-11-5}
}

@online{MarioBbqPrec,
  author  = {John Graham Reynolds aka @MarioBarbeque},
  title   = {{Fixed Precision Hugging Face Metric}},
  year    = 2024,
  url     = {https://huggingface.co/spaces/MarioBarbeque/FixedPrecision},
  urldate = {2024-11-6}
}

@online{MarioBbqRec,
  author  = {John Graham Reynolds aka @MarioBarbeque},
  title   = {{Fixed Recall Hugging Face Metric}},
  year    = 2024,
  url     = {https://huggingface.co/spaces/MarioBarbeque/FixedRecall},
  urldate = {2024-11-6}
}

@article{lacoste2019quantifying,
  title   = {Quantifying the Carbon Emissions of Machine Learning},
  author  = {Lacoste, Alexandre and Luccioni, Alexandra and Schmidt, Victor and Dandres, Thomas},
  journal = {arXiv preprint arXiv:1910.09700},
  year    = {2019}
}
```
|
|