|
--- |
|
base_model: |
|
- FacebookAI/roberta-base |
|
datasets: |
|
- MarioBarbeque/UCI_drug_reviews |
|
language: |
|
- en |
|
library_name: transformers |
|
metrics: |
|
- accuracy |
|
- f1 |
|
- precision |
|
- recall |
|
--- |
|
|
|
# Model Card for RoBERTa-base-DReiFT
|
|
|
We fine-tune the RoBERTa base model [FacebookAI/roberta-base](https://huggingface.co/FacebookAI/roberta-base) for multi-label classification of medical conditions. |
|
|
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
|
|
The RoBERTa base model is fine-tuned in quick fashion as an introduction to the breadth of the 🤗 ecosystem. We supervise the training of RoBERTa for multi-label classification on [MarioBarbeque/UCI_drug_reviews](https://huggingface.co/datasets/MarioBarbeque/UCI_drug_reviews), an open source dataset available through the [UC Irvine ML Repository](https://archive.ics.uci.edu) that we downloaded and preprocessed. The model is trained to classify a patient's medical condition based on that patient's review of the drugs they took as part of treatment.
|
|
|
Subsequently, we evaluate our model by introducing a new set of metrics to address bugs found in the 🤗 Evaluate package. We construct the `FixedF1`, `FixedPrecision`, and `FixedRecall` evaluation metrics, available [here](https://github.com/johngrahamreynolds/FixedMetricsForHF), as a simple workaround for a longstanding issue with 🤗 Evaluate's ability to `combine` various metrics for collective evaluation. These metrics subclass the `Metric` class from 🤗 Evaluate, generalizing the `F1`, `Precision`, and `Recall` classes to support `combine`d multi-label classification. Without this generalization, the built-in classes raise an error whenever more than two labels are classified.
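
The following is a minimal sketch of how the fixed metrics can be evaluated together. It assumes each metric loads from its Hugging Face Space via `evaluate.load` and accepts its `average` strategy at construction; neither detail is verified against the linked repo.

``` python
# Sketch: combined multi-label evaluation with the Fixed* metrics.
# Loading each metric from its Space and passing `average` at load time
# are assumptions, not verified API.
import evaluate

f1 = evaluate.load("MarioBarbeque/FixedF1", average="weighted")
precision = evaluate.load("MarioBarbeque/FixedPrecision", average="weighted")
recall = evaluate.load("MarioBarbeque/FixedRecall", average="weighted")

combined = evaluate.combine([f1, precision, recall])
combined.add_batch(predictions=[0, 2, 1, 2], references=[0, 1, 1, 2])
print(combined.compute())
```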
|
|
|
While running into and debugging these errors, we researched the underlying issue(s) and proposed a [plausible solution](https://github.com/huggingface/evaluate/issues/462#issuecomment-2448686687), awaiting repo owner review, that would close a set of longstanding open issues on the 🤗 Evaluate GitHub repo.
|
|
|
|
|
|
|
- **Developed by:** John Graham Reynolds |
|
- **Funded by:** Vanderbilt University |
|
- **Model type:** Multi-label Text Classification |
|
- **Language(s) (NLP):** English |
|
- **Finetuned from model:** [FacebookAI/roberta-base](https://huggingface.co/FacebookAI/roberta-base)
|
|
|
### Model Sources
|
|
|
|
|
|
- **Repository:** https://github.com/johngrahamreynolds/RoBERTa-base-DReiFT |
|
|
|
## Uses |
|
|
|
|
|
|
### Direct Use |
|
|
|
To query the model effectively, one must pass it a string detailing the review of a drug taken to address an underlying medical condition. The model will attempt to classify the medical condition based on its fine-tuned knowledge of hundreds of thousands of drug reviews spanning 805 medical conditions.
|
|
|
## How to Use and Query the Model |
|
|
|
Use the code below to get started with the model. Users pass into the `drug_review` list a string detailing the review of some drug, and the model will attempt to classify the condition for which the drug was taken. Users are free to pass any string they like (relevant to a drug review or not), but the model has been trained specifically on drug reviews for multi-label classification, so it will output, to the best of its ability, the medical condition to which the string most closely relates. See the example below:
|
|
|
``` python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "MarioBarbeque/RoBERTa-base-DReiFT"
tokenizer_name = "FacebookAI/roberta-base"

model = AutoModelForSequenceClassification.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)

# Pass a unique drug review to classify the underlying condition among the 805 medical conditions seen in training
drug_review = [
    "My tonsils were swollen and I had a hard time swallowing. "
    "I had a minimal fever to accompany the pain in my throat. "
    "Taking Aleve at regular intervals throughout the day improved my swallowing. "
    "I am now taking Aleve every 4 hours."
]

# Tokenize the review and move the inputs to the GPU where the model was placed
tokenized_review = tokenizer(drug_review, return_tensors="pt").to("cuda")

output = model(**tokenized_review)
label_id = torch.argmax(output.logits, dim=-1).item()
predicted_label = model.config.id2label[label_id]
print(f"The model predicted the underlying condition to be: {predicted_label}")
```
|
|
|
This code outputs the following: |
|
|
|
``` text
The model predicted the underlying condition to be: tonsillitis/pharyngitis
```
|
|
|
|
|
## Training Details |
|
|
|
### Training Data / Preprocessing |
|
|
|
The data comes from the UC Irvine Machine Learning Repository and has been preprocessed to contain only reviews at least 13 words in length. The dataset card can be found [here](https://huggingface.co/datasets/MarioBarbeque/UCI_drug_reviews).
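
As an illustration, a filter of this kind can be reproduced with 🤗 Datasets roughly as follows; the `review` column name is an assumption for the example, not taken from the dataset card.

``` python
# Sketch: keep only reviews at least 13 words long.
# The `review` column name is assumed for illustration.
from datasets import load_dataset

ds = load_dataset("MarioBarbeque/UCI_drug_reviews")
ds = ds.filter(lambda row: len(row["review"].split()) >= 13)
```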
|
|
|
### Training Procedure |
|
|
|
The model was trained in a distributed fashion on a single node with four 16GB NVIDIA V100s using 🤗 Transformers, 🤗 Tokenizers, the 🤗 Trainer, and the Apache (Py)Spark `TorchDistributor` class.
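
A minimal sketch of this launch pattern follows; the body of `train_fn` is elided and its name is illustrative.

``` python
# Sketch: single-node, 4-GPU launch of a 🤗 Trainer loop via PySpark's TorchDistributor.
from pyspark.ml.torch.distributor import TorchDistributor

def train_fn():
    # Build the tokenizer, model, TrainingArguments, and Trainer here,
    # then call trainer.train(); the 🤗 Trainer handles per-process setup.
    ...

distributor = TorchDistributor(num_processes=4, local_mode=True, use_gpu=True)
distributor.run(train_fn)
```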
|
|
|
|
|
#### Training Hyperparameters |
|
|
|
- **Training regime:** We use FP32 precision, inherited directly from the original "FacebookAI/roberta-base" model (see the sketch below).
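
Concretely, FP32 is the 🤗 Trainer default, so no mixed-precision flags are enabled. A minimal sketch, with a hypothetical output path:

``` python
# Sketch: FP32 training is the Trainer default; mixed-precision flags stay off.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="roberta-base-dreift",  # hypothetical path
    fp16=False,  # shown explicitly; both default to False
    bf16=False,
)
```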
|
|
|
|
|
## Evaluation / Metrics |
|
|
|
We evaluated this model using the `combine`d metrics of the 🤗 Evaluate library, which surfaced a bug that required the [workaround](https://github.com/johngrahamreynolds/FixedMetricsForHF) described above for expedited evaluation.
|
|
|
|
|
### Testing Data, Factors & Metrics |
|
|
|
#### Testing Data |
|
|
|
We configured a train/test split using the standard 80/20 rule of thumb on the shuffled UC Irvine dataset. The [dataset card](https://huggingface.co/datasets/MarioBarbeque/UCI_drug_reviews) contains in its base form a `DatasetDict` with train, validation, and test splits; the data used for testing can be found in the test split.
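
To evaluate against the same data, the held-out test split can be loaded directly:

``` python
# Load the held-out test split used for evaluation.
from datasets import load_dataset

test_ds = load_dataset("MarioBarbeque/UCI_drug_reviews", split="test")
```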
|
|
|
|
|
### Results |
|
|
|
We find the following modest metrics: |
|
|
|
| metric | value | |
|
|--------|--------| |
|
|f1 | 0.714 | |
|
|accuracy | 0.745 | |
|
|recall | 0.746 | |
|
|precision | 0.749 | |
|
|
|
#### Summary |
|
|
|
As discussed at the outset, this model was trained mainly to introduce ourselves to the 🤗 ecosystem. The model's results have not been rigorously improved beyond the initial training, as would be standard for a production-grade model. We look forward to introducing rigorously trained models in the near future with this foundation under our feet.
|
|
|
## Environmental Impact |
|
|
|
- **Hardware Type:** Nvidia Tesla V100-SXM2-16GB |
|
- **Hours used:** 0.5
|
- **Cloud Provider:** Microsoft Azure |
|
- **Compute Region:** EastUS |
|
- **Carbon Emitted:** 0.05 kgCO2 |
|
|
|
|
|
Experiments were conducted using Azure in region eastus, which has a carbon efficiency of 0.37 kgCO2/kWh. A cumulative 0.5 hours of computation was performed on hardware of type Tesla V100-SXM2-16GB (TDP of 250W).
|
|
|
Total emissions are estimated to be 0.05 kgCO2, of which 100 percent was directly offset by the cloud provider.
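
This figure follows directly from the numbers above: 0.5 h × 0.25 kW = 0.125 kWh of energy, and 0.125 kWh × 0.37 kgCO2/kWh ≈ 0.046 kgCO2, which rounds to the reported 0.05 kgCO2.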
|
|
|
Estimations were conducted using the Machine Learning Impact calculator presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
|
|
|
#### Hardware |
|
|
|
The model was trained in a distributed fashion on a single node with four 16GB NVIDIA V100 GPUs for a little more than 2 GPU-hours.
|
|
|
#### Software |
|
|
|
As discussed above, we propose a solution to a set of longstanding issues in the 🤗 Evaluate library. While awaiting review of our proposal, we temporarily define a new set of evaluation metrics by subclassing the 🤗 Evaluate `Metric` class to introduce more general multi-label classification accuracy, precision, f1, and recall metrics.
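
A minimal sketch of this subclassing pattern is below. It is illustrative rather than a copy of the FixedMetricsForHF source; the `_info` and `_compute` details are assumptions modeled on the typical 🤗 Evaluate `Metric` structure.

``` python
# Sketch of the workaround pattern: fix `average` at construction time so the
# metric can participate in `evaluate.combine` for multi-label classification.
# Illustrative only; not a verbatim copy of the FixedMetricsForHF source.
import datasets
import evaluate
from sklearn.metrics import f1_score

class FixedF1(evaluate.Metric):
    def __init__(self, average="weighted", **kwargs):
        super().__init__(**kwargs)
        self.average = average  # frozen here instead of passed to compute()

    def _info(self):
        return evaluate.MetricInfo(
            description="F1 score with a fixed `average` strategy",
            citation="",
            inputs_description="predicted and reference label ids",
            features=datasets.Features(
                {
                    "predictions": datasets.Value("int64"),
                    "references": datasets.Value("int64"),
                }
            ),
        )

    def _compute(self, predictions, references):
        return {"f1": f1_score(references, predictions, average=self.average)}
```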
|
|
|
Training utilized PyTorch, Apache Spark, 🤗 Transformers, 🤗 Tokenizers, 🤗 Evaluate, 🤗 Datasets, and more in an Azure Databricks execution environment.
|
|
|
#### Citations |
|
|
|
``` bibtex
@online{MarioBbqF1,
  author  = {John Graham Reynolds aka @MarioBarbeque},
  title   = {{Fixed F1 Hugging Face Metric}},
  year    = 2024,
  url     = {https://huggingface.co/spaces/MarioBarbeque/FixedF1},
  urldate = {2024-11-5}
}

@online{MarioBbqPrec,
  author  = {John Graham Reynolds aka @MarioBarbeque},
  title   = {{Fixed Precision Hugging Face Metric}},
  year    = 2024,
  url     = {https://huggingface.co/spaces/MarioBarbeque/FixedPrecision},
  urldate = {2024-11-6}
}

@online{MarioBbqRec,
  author  = {John Graham Reynolds aka @MarioBarbeque},
  title   = {{Fixed Recall Hugging Face Metric}},
  year    = 2024,
  url     = {https://huggingface.co/spaces/MarioBarbeque/FixedRecall},
  urldate = {2024-11-6}
}

@article{lacoste2019quantifying,
  title   = {Quantifying the Carbon Emissions of Machine Learning},
  author  = {Lacoste, Alexandre and Luccioni, Alexandra and Schmidt, Victor and Dandres, Thomas},
  journal = {arXiv preprint arXiv:1910.09700},
  year    = {2019}
}
```
|
|