---
license: apache-2.0
datasets:
- Zakia/drugscom_reviews
language:
- en
metrics:
- accuracy
library_name: transformers
pipeline_tag: text-classification
tags:
- health
- medicine
- patient reviews
- drug reviews
- depression
- text classification
---

# Model Card for Zakia/distilbert-drugscom_depression_reviews

This model is a DistilBERT-based classifier fine-tuned on drug reviews for the depression medical condition from Drugs.com.
The dataset used for fine-tuning is the [Zakia/drugscom_reviews](https://huggingface.co/datasets/Zakia/drugscom_reviews) dataset, filtered for the condition 'Depression'.
The base model for fine-tuning was [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased).

## Model Details

### Model Description

- **Developed by:** Zakia
- **Model type:** Text Classification
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** distilbert-base-uncased

## Uses

### Direct Use

This model is intended to classify drug reviews as high or low quality, aiding in the analysis of patient feedback on depression medications.

### Out-of-Scope Use

This model is not designed to diagnose or treat depression or to replace professional medical advice.

## Bias, Risks, and Limitations

The model may inherit biases present in the dataset and should not be used as the sole decision-maker for healthcare or treatment options.

### Recommendations

Use the model as a tool to support, not replace, professional judgment.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch.nn.functional as F

model_name = "Zakia/distilbert-drugscom_depression_reviews"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Define a function to print predictions with labels
def print_predictions(review_text, model, tokenizer):
    inputs = tokenizer(review_text, return_tensors="pt")
    outputs = model(**inputs)
    predictions = F.softmax(outputs.logits, dim=-1)
    # LABEL_0 is for low quality and LABEL_1 for high quality
    print(f"Review: \"{review_text}\"")
    print(f"Prediction: {{'LABEL_0 (Low quality)': {predictions[0][0].item():.4f}, 'LABEL_1 (High quality)': {predictions[0][1].item():.4f}}}\n")

# High quality review example
high_quality_review = "This medication has changed my life for the better. I've experienced no side effects and my symptoms of depression have significantly decreased."
print_predictions(high_quality_review, model, tokenizer)

# Low quality review example
low_quality_review = "I've had a terrible experience with this medication. It made me feel nauseous and I didn't notice any improvement in my condition."
print_predictions(low_quality_review, model, tokenizer)
```
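
The same classification can also be run through the `pipeline` API. A minimal sketch, assuming the default top-1 output of a `text-classification` pipeline (labels follow the mapping above: `LABEL_0` is low quality, `LABEL_1` is high quality):

```python
from transformers import pipeline

# Load the fine-tuned model as a text-classification pipeline
classifier = pipeline(
    "text-classification",
    model="Zakia/distilbert-drugscom_depression_reviews",
)

# Returns the top label and its softmax score,
# e.g. [{'label': 'LABEL_1', 'score': ...}]
print(classifier("This medication has changed my life for the better."))
```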

## Training Details

### Training Data

The model was fine-tuned on a dataset of drug reviews specifically related to depression, filtered from Drugs.com.
This dataset is accessible from [Zakia/drugscom_reviews](https://huggingface.co/datasets/Zakia/drugscom_reviews) on Hugging Face datasets (condition = 'Depression') for the 'train' split.
The train split contains 9,069 reviews.

### Training Procedure

#### Preprocessing

The reviews were cleaned and preprocessed to remove quotes, strip HTML tags and decode HTML entities.
A new column called 'high_quality_review' was then added to the reviews:
'high_quality_review' was set to 1 if rating > 5 (a positive rating) and usefulCount > 65 (the 75th percentile of usefulCount), and 0 otherwise.
Train dataset high_quality_review counts: Counter({0: 6949, 1: 2120})
The training data was then balanced by downsampling low-quality reviews (high_quality_review = 0), leaving 4,240 reviews:
Train dataset high_quality_review counts: Counter({0: 2120, 1: 2120})
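
For reference, a minimal sketch of this preprocessing, assuming the dataset exposes `condition`, `review`, `rating` and `usefulCount` columns (the cleaning regex and random seed here are illustrative assumptions, not the exact code used):

```python
import html
import re

import pandas as pd
from datasets import load_dataset

# Load the train split and keep only the 'Depression' condition
train = load_dataset("Zakia/drugscom_reviews", split="train")
df = train.filter(lambda row: row["condition"] == "Depression").to_pandas()

def clean_review(text: str) -> str:
    # Strip surrounding quotes, remove HTML tags, decode HTML entities
    text = text.strip().strip('"')
    text = re.sub(r"<[^>]+>", "", text)
    return html.unescape(text)

df["review"] = df["review"].apply(clean_review)

# Label 1 = positive rating (> 5) with usefulCount above the train 75th percentile (65)
useful_threshold = df["usefulCount"].quantile(0.75)
df["high_quality_review"] = (
    (df["rating"] > 5) & (df["usefulCount"] > useful_threshold)
).astype(int)

# Balance by downsampling low-quality reviews to the high-quality count
n_high = int((df["high_quality_review"] == 1).sum())
balanced = pd.concat(
    [
        df[df["high_quality_review"] == 0].sample(n=n_high, random_state=42),
        df[df["high_quality_review"] == 1],
    ]
).sample(frac=1, random_state=42)  # shuffle
```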

#### Training Hyperparameters

- **Learning Rate:** 3e-5
- **Batch Size:** 16
- **Epochs:** 1
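
A minimal fine-tuning sketch with these hyperparameters, using the `Trainer` API and the `balanced` DataFrame from the preprocessing sketch above (tokenization settings and output directory are assumptions):

```python
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

base_model = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=2)

# Tokenize the balanced training data; 'labels' is the column name Trainer expects
train_ds = Dataset.from_pandas(
    balanced[["review", "high_quality_review"]].rename(
        columns={"high_quality_review": "labels"}
    )
)
train_ds = train_ds.map(
    lambda batch: tokenizer(batch["review"], truncation=True, padding="max_length"),
    batched=True,
)

args = TrainingArguments(
    output_dir="distilbert-drugscom_depression_reviews",
    learning_rate=3e-5,
    per_device_train_batch_size=16,
    num_train_epochs=1,
)

Trainer(model=model, args=args, train_dataset=train_ds).train()
```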

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The model was tested on a dataset of drug reviews specifically related to depression, filtered from Drugs.com.
This dataset is accessible from [Zakia/drugscom_reviews](https://huggingface.co/datasets/Zakia/drugscom_reviews) on Hugging Face datasets (condition = 'Depression') for the 'test' split.
The test split contains 3,095 reviews.

#### Preprocessing

The reviews were cleaned and preprocessed to remove quotes, strip HTML tags and decode HTML entities.
A new column called 'high_quality_review' was then added to the reviews:
'high_quality_review' was set to 1 if rating > 5 (a positive rating) and usefulCount > 65 (the 75th percentile of usefulCount), and 0 otherwise.
Note: the 75th percentile of usefulCount is based on the train dataset.
Test dataset high_quality_review counts: Counter({0: 2365, 1: 730})

#### Metrics

The model's performance was evaluated using accuracy.
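
A sketch of the accuracy computation, assuming the test split has been preprocessed as above into a list of cleaned review `texts` and integer `labels` (the batch size is an arbitrary choice):

```python
import numpy as np
import torch

# Predict a label for each preprocessed test review in small batches
preds = []
model.eval()
with torch.no_grad():
    for i in range(0, len(texts), 32):
        batch = tokenizer(
            texts[i : i + 32], truncation=True, padding=True, return_tensors="pt"
        )
        preds.extend(model(**batch).logits.argmax(dim=-1).tolist())

accuracy = float(np.mean(np.array(preds) == np.array(labels)))
print(f"Accuracy: {accuracy:.2f}")
```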

### Results

The fine-tuning process yielded the following results:

| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | 0.38          | 0.80            | 0.77     |

The model classifies drug reviews as high quality (high_quality_review = 1) or low quality (high_quality_review = 0) with an accuracy of 77%.

## Technical Specifications

### Model Architecture and Objective

The DistilBERT architecture was used, with a binary classification head for high- and low-quality review classification.

### Compute Infrastructure

The model was trained using a T4 GPU on Google Colab.

#### Hardware

T4 GPU via Google Colab.

## Citation

If you use this model, please cite the original DistilBERT paper:

**BibTeX:**

```bibtex
@article{sanh2019distilbert,
  title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
  journal={arXiv preprint arXiv:1910.01108},
  year={2019}
}
```

**APA:**

Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.

## Glossary

- **Low Quality Review:** high_quality_review = 0
- **High Quality Review:** high_quality_review = 1

## More Information

For further queries or issues with the model, please use the [discussions section on this model's Hugging Face page](https://huggingface.co/Zakia/distilbert-drugscom_depression_reviews/discussions).

## Model Card Authors

- Zakia

## Model Card Contact

For more information or inquiries regarding this model, please use the [discussions section on this model's Hugging Face page](https://huggingface.co/Zakia/distilbert-drugscom_depression_reviews/discussions).