---
license: apache-2.0
base_model: bert-base-uncased
tags:
- generated_from_trainer
- medical
- biology
- text-classification
- multiclass classification
- pathologies
- illness
- diagnose
metrics:
- accuracy
- precision
- recall
- f1
model-index:
- name: bert-drug-review-to-condition
  results: []
datasets:
- Zakia/drugscom_reviews
language:
- en
---

# bert-drug-review-to-condition

This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on this dataset: Kallumadi, Surya and Gräßer, Felix. (2018). Drug Reviews (Drugs.com). UCI Machine Learning Repository. https://doi.org/10.24432/C5SK5S.

It achieves the following results on the evaluation set:
- Loss: 0.6678
- Accuracy: 0.8376
- Precision: 0.8325
- Recall: 0.8376
- F1: 0.8317

## Model description

"bert-base-uncased" fine-tuned for multiclass text classification: given an input text, the model outputs the most likely medical condition of the person who wrote it.

The model was trained to predict the 'condition' feature from the 'review' feature (i.e., each person reviews the drug they are taking for their condition).

## Intended uses & limitations

Personal project

## Training and evaluation data

The 100 most frequent conditions in the dataset are selected:

{0: 'multiple sclerosis', 1: 'overactive bladde', 2: 'hyperhidrosis', 3: 'ibromyalgia', 4: 'menstrual disorders', 5: 'hypogonadism, male', 6: 'rosacea', 7: 'muscle spasm', 8: 'high blood pressure', 9: 'epilepsy', 10: 'psoriatic arthritis', 11: 'post traumatic stress disorde', 12: 'smoking cessation', 13: 'not listed / othe', 14: 'herpes simplex', 15: 'opiate dependence', 16: 'social anxiety disorde', 17: 'urticaria', 18: 'allergic rhinitis', 19: 'polycystic ovary syndrome', 20: 'obsessive compulsive disorde', 21: 'depression', 22: 'migraine prevention', 23: 'neuropathic pain', 24: 'ankylosing spondylitis', 25: 'skin or soft tissue infection', 26: 'constipation, drug induced', 27: 'obesity', 28: 'vaginal yeast infection', 29: 'osteoarthritis', 30: 
'restless legs syndrome', 31: 'plaque psoriasis', 32: 'panic disorde', 33: 'abnormal uterine bleeding', 34: 'adhd', 35: 'high cholesterol', 36: 'diabetes, type 2', 37: 'anxiety and stress', 38: 'asthma, maintenance', 39: 'pneumonia', 40: 'schizophrenia', 41: 'opiate withdrawal', 42: 'osteoporosis', 43: 'influenza', 44: 'weight loss', 45: 'cough and nasal congestion', 46: 'birth control', 47: 'benign prostatic hyperplasia', 48: 'helicobacter pylori infection', 49: 'anxiety', 50: 'bronchitis', 51: 'rheumatoid arthritis', 52: 'narcolepsy', 53: 'generalized anxiety disorde', 54: 'insomnia', 55: 'nasal congestion', 56: 'major depressive disorde', 57: 'schizoaffective disorde', 58: 'psoriasis', 59: 'premenstrual dysphoric disorde', 60: 'bacterial vaginitis', 61: 'motion sickness', 62: 'erectile dysfunction', 63: 'constipation, chronic', 64: 'copd, maintenance', 65: 'back pain', 66: 'alcohol dependence', 67: 'migraine', 68: 'bladder infection', 69: 'underactive thyroid', 70: 'ulcerative colitis', 71: 'chronic pain', 72: 'hiv infection', 73: 'cold sores', 74: 'breast cance', 75: 'bipolar disorde', 76: 'irritable bowel syndrome', 77: 'anesthesia', 78: 'onychomycosis, toenail', 79: 'chlamydia infection', 80: 'gerd', 81: 'endometriosis', 82: 'seizures', 83: 'alcohol withdrawal', 84: 'bowel preparation', 85: 'hot flashes', 86: 'bacterial infection', 87: 'inflammatory conditions', 88: 'constipation', 89: 'headache', 90: 'urinary tract infection', 91: 'sinusitis', 92: 'emergency contraception', 93: 'cough', 94: 'acne', 95: 'atrial fibrillation', 96: 'pain', 97: 'nausea/vomiting', 98: 'hepatitis c', 99: 'postmenopausal symptoms'}

The 'review' feature is lowercased, and only examples with more than 16 characters are kept.
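
The selection described in this section can be sketched in plain Python. This is a minimal illustration, not the code from the training notebook: the function name `prepare_examples`, the toy rows, and the order of the two filters are assumptions, and the label ids it assigns (by frequency rank) are illustrative and do not reproduce the mapping listed in this section.

```python
from collections import Counter


def prepare_examples(rows, top_k=100, min_chars=16):
    """Filter rows with 'review' and 'condition' keys (hypothetical helper).

    Returns the kept examples and the condition-to-id mapping.
    """
    # Count how often each condition occurs and keep the top_k most frequent.
    counts = Counter(row["condition"] for row in rows if row["condition"])
    kept_conditions = [name for name, _ in counts.most_common(top_k)]
    # Illustrative id assignment: frequency rank, not the model's real mapping.
    label2id = {name: idx for idx, name in enumerate(kept_conditions)}

    examples = []
    for row in rows:
        review = row["review"].lower()  # the 'review' feature is lowercased
        # Keep only selected conditions and reviews longer than min_chars.
        if row["condition"] in label2id and len(review) > min_chars:
            examples.append({"text": review, "label": label2id[row["condition"]]})
    return examples, label2id


# Toy stand-in for the dataset (made-up rows, for illustration only).
toy_rows = [
    {"review": "This medication cleared my skin within a month.", "condition": "acne"},
    {"review": "Too short", "condition": "acne"},
    {"review": "Helped me fall asleep much faster.", "condition": "insomnia"},
    {"review": "Works, but leaves me groggy in the morning.", "condition": "insomnia"},
    {"review": "No effect at all after two weeks.", "condition": "rare condition"},
]

examples, label2id = prepare_examples(toy_rows, top_k=2)
for example in examples:
    print(example)
```

With `top_k=2`, the single "rare condition" row is dropped because its condition is not among the two most frequent, and the "Too short" review is dropped by the length filter.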
## Training procedure

See code available at: https://github.com/mlafuentem/Marcuswas-bert-drug-review-to-condition/blob/main/Exercise_classification_conditions_code.ipynb

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy | Precision | Recall | F1     |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|:---------:|:------:|:------:|
| 0.8469        | 1.0   | 13390 | 0.8275          | 0.7673   | 0.7686    | 0.7673 | 0.7551 |
| 0.6319        | 2.0   | 26780 | 0.6895          | 0.8094   | 0.8090    | 0.8094 | 0.7978 |
| 0.4116        | 3.0   | 40170 | 0.6678          | 0.8376   | 0.8325    | 0.8376 | 0.8317 |

### Framework versions

- Transformers 4.40.0
- Pytorch 2.2.1+cu121
- Datasets 2.19.0
- Tokenizers 0.19.1
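
As a rough reading of the hyperparameters and step counts above, the sketch below works out what a linear schedule implies at each logged step and how many training examples the step count allows. It rests on assumptions not stated in this card: zero warmup steps, a single device, and no gradient accumulation. The helper `linear_lr` is written for illustration and is not taken from the training notebook.

```python
LEARNING_RATE = 5e-05      # learning_rate from the list above
TRAIN_BATCH_SIZE = 8       # train_batch_size from the list above
STEPS_PER_EPOCH = 13390    # step count logged at epoch 1.0
NUM_EPOCHS = 3
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # matches the final logged step


def linear_lr(step, base_lr=LEARNING_RATE, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Linear warmup followed by linear decay to zero.

    warmup_steps=0 is an assumption; the card does not list a warmup setting.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Upper bound on the training set size, assuming one device and no
# gradient accumulation (the last batch of an epoch may be partial).
max_train_examples = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE

print("total steps:", TOTAL_STEPS)
print("max training examples:", max_train_examples)
for step in (0, 13390, 26780, 40170):
    print(f"step {step}: lr = {linear_lr(step):.2e}")
```

Under those assumptions the learning rate falls to two thirds of its initial value after the first epoch, one third after the second, and zero at the final step.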