---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- drugs
- classification
- bert
datasets:
- lloydmeta/drug_dataset_cleaned
widget:
- text: 'I have been taking ambien or zolphidem for almost 15 years. '
  example_title: Drugs for insomnia
---

# Model Card for drug-BERT

This is a multiclass classification model, fine-tuned from [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased) on
the [Drug Review Dataset (Drugs.com)](https://archive.ics.uci.edu/dataset/462/drug+review+dataset+drugs+com). Given a person's review of a drug,
it makes a best-attempt classification of the _condition_ they have.


## Model Details

### Model Description

This model was created as a learning exercise covering:
* Google Colab
* The Transformer architecture
* Fine-tuning existing NLP models
* Hugging Face libraries


* Developed by: [lloydmeta](http://github.com/lloydmeta) of [beachape.com](https://beachape.com)
* License: Apache 2.0
* Fine-tuned from model: [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased)

## Uses

Identifying the likely _condition_ a person has, based on their review of a drug.

### Out-of-Scope Use

Actual clinical diagnosis.

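## Bias, Risks, and Limitations

* Biases present in the base `bert-base-uncased` model carry over to this model.
* Only drugs and conditions present in the Drug Review Dataset are covered; the model can only output conditions that appear as labels in its training data.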

## How to Get Started with the Model

```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
condition_from_drug_review_classifier = pipeline("text-classification", model="lloydmeta/drug-bert")

review = "I have been taking ambien or zolphidem for almost 15 years."
# Prints a list holding the top predicted condition label and its score
print(condition_from_drug_review_classifier(review))
```
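Because the model only makes a best-attempt guess, it can help to look at the few most likely conditions rather than a single label. The `text-classification` pipeline can return a score for every label when called with `top_k=None`; the sketch below assumes that output shape (a list of `{"label": ..., "score": ...}` dicts for one review) and ranks it in plain Python. The helper `top_conditions`, its `min_score` cut-off, and the example labels and scores are hypothetical and do not come from this model.

```python
from typing import Dict, List, Union

# One pipeline prediction: {"label": <condition name>, "score": <probability>}
Prediction = Dict[str, Union[str, float]]


def top_conditions(predictions: List[Prediction], k: int = 3, min_score: float = 0.0) -> List[Prediction]:
    """Return up to k predictions, highest score first, dropping any below min_score.

    `predictions` is assumed to be the list returned for a single review when the
    classifier is called with top_k=None. The min_score cut-off is a hypothetical
    convenience, not something the model card specifies.
    """
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [p for p in ranked if p["score"] >= min_score][:k]


# Hypothetical pipeline output, for illustration only (not real model scores)
example_output = [
    {"label": "Depression", "score": 0.05},
    {"label": "Insomnia", "score": 0.90},
    {"label": "Anxiety", "score": 0.03},
    {"label": "Pain", "score": 0.02},
]

# Prints "Insomnia: 0.90" then "Depression: 0.05"
for prediction in top_conditions(example_output, k=2, min_score=0.04):
    print(f"{prediction['label']}: {prediction['score']:.2f}")
```

With the real model, `example_output` would come from calling `condition_from_drug_review_classifier` on a review with `top_k=None`. The helper sorts anyway, so it does not rely on the pipeline's ordering.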

## Training Details

### Training Data

* Trained on the [Drug Review Dataset (Drugs.com)](https://archive.ics.uci.edu/dataset/462/drug+review+dataset+drugs+com), cleaned by removing HTML tags from reviews
and dropping samples that lacked a `condition`.
* 60% of the dataset was used for training.
* Columns irrelevant to the task, such as `patient_id`, `drugName`, `rating`, and `date`, were removed.

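The cleaning steps above can be sketched in plain Python. This is an illustration under assumptions, not the author's actual preprocessing code: it assumes each row is a dict keyed by the dataset's column names, strips tags with a simple regular expression, additionally decodes HTML entities (such as `&quot;`) with `html.unescape`, drops rows whose `condition` is missing or empty, and keeps only the `review` and `condition` columns. The helper names and sample rows are hypothetical.

```python
import html
import re
from typing import Dict, Iterable, List, Optional

# Matches anything that looks like an HTML tag, such as <br /> or <b>
TAG_PATTERN = re.compile(r"<[^>]+>")


def clean_review(text: str) -> str:
    """Strip HTML tags, then decode HTML entities (entity decoding is an assumption)."""
    without_tags = TAG_PATTERN.sub(" ", text)
    decoded = html.unescape(without_tags)
    # Collapse the runs of whitespace left behind by removed tags
    return " ".join(decoded.split())


def clean_rows(rows: Iterable[Dict[str, Optional[str]]]) -> List[Dict[str, str]]:
    """Drop rows without a condition and keep only the `review` and `condition` columns."""
    cleaned: List[Dict[str, str]] = []
    for row in rows:
        condition = row.get("condition")
        if not condition:
            continue  # samples lacking a condition are removed
        review = row.get("review") or ""
        cleaned.append({"review": clean_review(review), "condition": condition})
    return cleaned


# Hypothetical rows: only the column names mirror the ones listed in the card
sample_rows = [
    {"patient_id": "1", "drugName": "Ambien", "condition": "Insomnia",
     "review": "&quot;It works&quot;<br />for sleep", "rating": "9", "date": "2015-01-01"},
    {"patient_id": "2", "drugName": "DrugB", "condition": None,
     "review": "No condition given", "rating": "5", "date": "2016-01-01"},
]

print(clean_rows(sample_rows))
```

Tags are stripped before entities are decoded, so an escaped tag such as `&lt;b&gt;` survives as literal text instead of being removed.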

### Training Procedure

* `review` text was tokenised with a maximum length of 512 tokens
* Learning rate: 2e-5
* Epochs: 3
* Weight decay: 0.01
* Per-device train batch size: 4

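For reference, the hyperparameters above map onto parameter names of the `transformers` `TrainingArguments` class, and a 512-token limit corresponds to calling a tokenizer with `truncation=True, max_length=512`. The sketch below is an assumption about how these settings could be expressed, not the author's training script: it keeps the values in a plain dict (so it runs without `transformers` installed), uses a hypothetical `output_dir` value, and estimates the total number of optimizer steps for a given training-set size, assuming a single device, no gradient accumulation, and that the last partial batch of each epoch is kept.

```python
import math

# Hyperparameters from the model card, keyed by TrainingArguments parameter names.
# "output_dir" is a hypothetical value; the card does not state it.
training_kwargs = {
    "output_dir": "drug-bert",
    "learning_rate": 2e-5,
    "num_train_epochs": 3,
    "weight_decay": 0.01,
    "per_device_train_batch_size": 4,
}

# Maximum sequence length used when tokenising the `review` column
MAX_LENGTH = 512

# With transformers installed, these would typically be used as (not executed here):
#   tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
#   tokenizer(batch["review"], truncation=True, max_length=MAX_LENGTH)
#   args = TrainingArguments(**training_kwargs)


def total_optimizer_steps(num_train_examples: int) -> int:
    """Optimizer steps over all epochs on one device, with no gradient accumulation."""
    batch_size = training_kwargs["per_device_train_batch_size"]
    steps_per_epoch = math.ceil(num_train_examples / batch_size)
    return steps_per_epoch * training_kwargs["num_train_epochs"]


# Hypothetical training-set size, for illustration only; prints 750
print(total_optimizer_steps(1_000))
```

Keeping the values in one dict makes it easy to pass the same settings to `TrainingArguments` and to reuse them for quick estimates like the step count above.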

## Evaluation

15% of the dataset was held out for evaluation.

### Testing Data, Factors & Metrics

#### Testing Data

25% of the dataset was held out for testing.

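Taken together, the card describes a 60% / 15% / 25% train / evaluation / test split. The exact splitting method and random seed are not stated, so the sketch below simply assumes one seeded shuffle of the row indices, sliced by those fractions. The helper name `split_indices` and the seed value are hypothetical.

```python
import random
from typing import List, Tuple


def split_indices(n: int, train_frac: float = 0.60, eval_frac: float = 0.15,
                  seed: int = 42) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle range(n) and split it into train / evaluation / test index lists.

    The test split receives whatever remains (25% with the default fractions).
    The seed is a hypothetical choice; the card does not state one.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    train_end = round(n * train_frac)
    eval_end = train_end + round(n * eval_frac)
    return indices[:train_end], indices[train_end:eval_end], indices[eval_end:]


# Prints 600 150 250
train, evaluation, test = split_indices(1_000)
print(len(train), len(evaluation), len(test))
```

Giving the test split the remainder guarantees that every row lands in exactly one split, even when the fractions do not divide the row count evenly.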
## Model Card Authors

[lloydmeta](http://github.com/lloydmeta)

## Model Card Contact

[lloydmeta](http://github.com/lloydmeta)