---
license: apache-2.0
datasets:
- Zakia/drugscom_reviews
language:
- en
metrics:
- accuracy
library_name: transformers
pipeline_tag: text-classification
tags:
- health
- medicine
- patient reviews
- drug reviews
- depression
- text classification
widget:
- text: "This medication has changed my life for the better. I've experienced no side effects and my symptoms of depression have significantly decreased."
- text: "I've had a terrible experience with this medication. It made me feel nauseous and I didn't notice any improvement in my condition."
---

# Model Card for Zakia/distilbert-drugscom_depression_reviews

This model is a DistilBERT-based classifier fine-tuned on drug reviews for the medical condition of depression from Drugs.com. The fine-tuning data comes from the [Zakia/drugscom_reviews](https://huggingface.co/datasets/Zakia/drugscom_reviews) dataset, filtered for the condition 'Depression'. The base model is [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased).

## Model Details

### Model Description

- Developed by: Zakia
- Model type: Text Classification
- Language(s) (NLP): English
- License: Apache 2.0
- Finetuned from model: distilbert-base-uncased

## Uses

### Direct Use

This model is intended to classify drug reviews as high or low quality, aiding in the analysis of patient feedback on depression medications.

### Out-of-Scope Use

This model is not designed to diagnose or treat depression or to replace professional medical advice.

## Bias, Risks, and Limitations

The model may inherit biases present in the dataset and should not be used as the sole decision-maker for healthcare or treatment options.

### Recommendations

Use the model as a tool to support, not replace, professional judgment.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch.nn.functional as F

model_name = "Zakia/distilbert-drugscom_depression_reviews"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Define a function to print predictions with labels
def print_predictions(review_text, model, tokenizer):
    # Tokenize the review (truncated to the model's maximum input length)
    inputs = tokenizer(review_text, return_tensors="pt", truncation=True)
    outputs = model(**inputs)
    predictions = F.softmax(outputs.logits, dim=-1)
    # LABEL_0 is for low quality and LABEL_1 for high quality
    print(f"Review: \"{review_text}\"")
    print(f"Prediction: {{'LABEL_0 (Low quality)': {predictions[0][0].item():.4f}, 'LABEL_1 (High quality)': {predictions[0][1].item():.4f}}}\n")

# High quality review example
high_quality_review = "This medication has changed my life for the better. I've experienced no side effects and my symptoms of depression have significantly decreased."
print_predictions(high_quality_review, model, tokenizer)

# Low quality review example
low_quality_review = "I've had a terrible experience with this medication. It made me feel nauseous and I didn't notice any improvement in my condition."
print_predictions(low_quality_review, model, tokenizer)
```

## Training Details

### Training Data

The model was fine-tuned on drug reviews related to depression, filtered from Drugs.com. These reviews come from the 'train' split of [Zakia/drugscom_reviews](https://huggingface.co/datasets/Zakia/drugscom_reviews) on Hugging Face datasets, filtered for condition = 'Depression'.

Number of records in the train dataset: 9069 rows.
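For reference, below is a minimal sketch of how this subset can be loaded with the `datasets` library. It assumes the split name ('train') and the column names ('condition', 'rating', 'usefulCount') match the Drugs.com dataset referenced above and that no dataset config name is required; the preprocessing described in the next section is applied on top of this.

```python
from datasets import load_dataset

# Load the 'train' split of the Drugs.com reviews dataset from the Hugging Face Hub
# (assumes the default config; a config name may be needed depending on the dataset layout)
train_data = load_dataset("Zakia/drugscom_reviews", split="train")

# Keep only reviews written for the 'Depression' condition
train_depression = train_data.filter(lambda row: row["condition"] == "Depression")

print(train_depression.num_rows)  # expected: 9069
```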
### Training Procedure

#### Preprocessing

The reviews were cleaned and preprocessed to remove quotes, strip HTML tags and decode HTML entities. A new column called 'high_quality_review' was then added: 'high_quality_review' was set to 1 if rating > 5 (positive rating) and usefulCount > the 75th percentile of usefulCount (65), and 0 otherwise.

Train dataset high_quality_review counts: Counter({0: 6949, 1: 2120})

The training data was then balanced by downsampling low-quality reviews (high_quality_review = 0). The final training data had 4240 rows of reviews:

Train dataset high_quality_review counts after downsampling: Counter({0: 2120, 1: 2120})

#### Training Hyperparameters

- Learning Rate: 3e-5
- Batch Size: 16
- Epochs: 1

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The model was tested on drug reviews related to depression, filtered from Drugs.com. These reviews come from the 'test' split of [Zakia/drugscom_reviews](https://huggingface.co/datasets/Zakia/drugscom_reviews) on Hugging Face datasets, filtered for condition = 'Depression'.

Number of records in the test dataset: 3095 rows.

#### Preprocessing

The test reviews were cleaned and preprocessed in the same way: quotes were removed, HTML tags stripped and HTML entities decoded. The 'high_quality_review' column was computed as 1 if rating > 5 (positive rating) and usefulCount > the 75th percentile of usefulCount (65), and 0 otherwise. Note: the 75th percentile of usefulCount is based on the train dataset.

Test dataset high_quality_review counts: Counter({0: 2365, 1: 730})

#### Metrics

The model's performance was evaluated based on accuracy.

### Results

The fine-tuning process yielded the following results:

| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | 0.38          | 0.80            | 0.77     |

The model classifies drug reviews as high or low quality with an accuracy of 77%.

- Low Quality: high_quality_review=0
- High Quality: high_quality_review=1

## Technical Specifications

### Model Architecture and Objective

The DistilBERT architecture was used with a binary classification head for distinguishing high-quality from low-quality reviews.

### Compute Infrastructure

The model was trained using a T4 GPU on Google Colab.

#### Hardware

T4 GPU via Google Colab.

## Citation

If you use this model, please cite the original DistilBERT paper:

**BibTeX:**

```bibtex
@article{sanh2019distilbert,
  title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
  journal={arXiv preprint arXiv:1910.01108},
  year={2019}
}
```

**APA:**

Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.

## Glossary

- Low Quality Review: high_quality_review=0
- High Quality Review: high_quality_review=1

## More Information

For further queries or issues with the model, please use the [discussions section on this model's Hugging Face page](https://huggingface.co/Zakia/distilbert-drugscom_depression_reviews/discussions).

## Model Card Authors

- Zakia

## Model Card Contact

For more information or inquiries regarding this model, please use the [discussions section on this model's Hugging Face page](https://huggingface.co/Zakia/distilbert-drugscom_depression_reviews/discussions).