HamSpamBERT

This model is a fine-tuned version of bert-base-uncased on a Spam-Ham dataset. It achieves the following results on the evaluation set:

Loss: 0.0072
Accuracy: 0.9991
Precision: 1.0
Recall: 0.9933
F1: 0.9966
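As a quick consistency check, F1 is the harmonic mean of precision (P) and recall (R), so the reported F1 follows directly from the two numbers above:

```latex
F_1 = \frac{2PR}{P + R} = \frac{2 \times 1.0 \times 0.9933}{1.0 + 0.9933} = \frac{1.9866}{1.9933} \approx 0.9966
```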

Example usage:

```python
from transformers import pipeline, BertTokenizer, BertForSequenceClassification

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = BertTokenizer.from_pretrained("udit-k/HamSpamBERT")
model = BertForSequenceClassification.from_pretrained("udit-k/HamSpamBERT")

# "text-classification" is the canonical task name ("sentiment-analysis" is an alias for it)
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Call this number to win FREE IPL FINAL tickets!!!"))
print(classifier("Call me when you reach home :)"))
```

Output:

```
[{'label': 'LABEL_1', 'score': 0.9999189376831055}]
[{'label': 'LABEL_0', 'score': 0.9999370574951172}]
```

Model description

This model is a fine-tuned version of BERT (bert-base-uncased) trained on a Spam-Ham dataset for spam detection, framed as binary text classification (ham vs. spam).

LABEL_0 = Ham (Not spam)
LABEL_1 = Spam
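The model returns the generic labels LABEL_0 and LABEL_1, so a small post-processing step can translate pipeline output into the names above. This is a minimal sketch; `LABEL_NAMES` and `to_readable` are hypothetical helpers, not part of the model or of the transformers library.

```python
# Hypothetical mapping from the model's generic labels to readable class names
LABEL_NAMES = {"LABEL_0": "Ham", "LABEL_1": "Spam"}


def to_readable(predictions):
    """Replace generic labels in pipeline-style output with Ham/Spam names.

    `predictions` is a list of dicts shaped like the pipeline output,
    e.g. [{'label': 'LABEL_1', 'score': 0.99}]. Unknown labels pass through unchanged.
    """
    return [
        {"label": LABEL_NAMES.get(p["label"], p["label"]), "score": p["score"]}
        for p in predictions
    ]


# Sample outputs copied from the usage example
spam_out = [{"label": "LABEL_1", "score": 0.9999189376831055}]
ham_out = [{"label": "LABEL_0", "score": 0.9999370574951172}]
print(to_readable(spam_out))  # [{'label': 'Spam', 'score': 0.9999189376831055}]
print(to_readable(ham_out))
```

An alternative is to set `model.config.id2label = {0: "Ham", 1: "Spam"}` before building the pipeline, which should make the pipeline emit the readable names directly.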

Intended uses & limitations

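This model can be used to classify texts as spam or ham. Its main limitation is the small size of its data: it was trained on about 4,700 rows and evaluated on about 1,200 rows, so the reported metrics rest on a small evaluation set and may not hold for text that differs from the training data.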

Training and evaluation data

Training corpus = 80%
Evaluation corpus = 20%
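The card does not say how the 80/20 split was produced. Below is a minimal sketch of a reproducible random split, assuming a plain shuffled split; the seed value 42 simply reuses the training seed and is an assumption, and `split_80_20` is a hypothetical helper.

```python
import random


def split_80_20(rows, seed=42):
    """Shuffle rows with a fixed seed and split them 80% / 20%.

    Assumption: a plain random split; the original split procedure is not documented.
    """
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded RNG keeps the split reproducible
    cut = int(0.8 * len(rows))
    train = [rows[i] for i in indices[:cut]]
    evaluation = [rows[i] for i in indices[cut:]]
    return train, evaluation


# Toy example with 10 (text, label) pairs; 1 = spam, 0 = ham
toy = [(f"message {i}", i % 2) for i in range(10)]
train, evaluation = split_80_20(toy)
print(len(train), len(evaluation))  # 8 2
```

With the datasets library, `Dataset.train_test_split(test_size=0.2, seed=42)` produces the same kind of seeded split.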

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 7
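With `lr_scheduler_type: linear` and 7 epochs of 279 steps (1953 steps in total, per the training results), the learning rate decays linearly from 5e-05 to zero. A minimal sketch, assuming no warmup since none is listed; `linear_lr` is a hypothetical helper that mirrors the shape of a linear warmup-then-decay schedule rather than calling transformers.

```python
def linear_lr(step, base_lr=5e-05, total_steps=1953, warmup_steps=0):
    """Learning rate at a given optimizer step under linear warmup + linear decay.

    Assumption: warmup_steps=0, since the card lists no warmup.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 279, 1116, 1953):  # start, end of epoch 1, end of epoch 4, end of training
    print(step, f"{linear_lr(step):.3e}")
```

Under these assumptions the epoch 4 checkpoint (step 1116) was reached with the learning rate at 3/7 of its starting value.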

Training results

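Training loss appears to have been logged every 500 steps (the Trainer default), so "No log" at epoch 1 means no training loss had been recorded yet, and each logged value repeats until the next log. The evaluation results at the top of this card match the epoch 4 row, which has the lowest validation loss.

Training Loss	Epoch	Step	Validation Loss	Accuracy	Precision	Recall	F1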
No log	1.0	279	0.0492	0.9901	1.0	0.9262	0.9617
0.0635	2.0	558	0.0117	0.9982	1.0	0.9866	0.9932
0.0635	3.0	837	0.0120	0.9982	0.9933	0.9933	0.9933
0.0138	4.0	1116	0.0072	0.9991	1.0	0.9933	0.9966
0.0138	5.0	1395	0.0086	0.9982	0.9933	0.9933	0.9933
0.0007	6.0	1674	0.0090	0.9982	0.9933	0.9933	0.9933
0.0007	7.0	1953	0.0091	0.9982	0.9933	0.9933	0.9933
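The card does not include the metric code. The sketch below shows one way to compute the Accuracy, Precision, Recall and F1 columns from evaluation logits, treating Spam (label 1) as the positive class. `compute_metrics` follows the `(predictions, labels)` unpacking pattern commonly used with the Trainer's metrics callback, but the body is an assumption written in plain Python, not the author's implementation.

```python
def compute_metrics(eval_pred):
    """Accuracy, precision, recall and F1 for binary spam detection.

    Assumption: label 1 (Spam) is the positive class; `eval_pred` unpacks into
    (logits, labels), where each logits row holds one score per class.
    """
    logits, labels = eval_pred
    # Predicted class = index of the highest logit in each row
    preds = [max(range(len(row)), key=lambda j: row[j]) for row in logits]
    pairs = list(zip(preds, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)
    correct = sum(1 for p, y in pairs if p == y)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy batch: 4 messages, one spam message missed (predicted ham)
toy_logits = [[0.1, 2.0], [3.0, -1.0], [2.5, 0.3], [1.2, 0.4]]
toy_labels = [1, 0, 0, 1]
print(compute_metrics((toy_logits, toy_labels)))
```

The toy batch reproduces the pattern seen in the table: no false positives gives a precision of 1.0, while a single missed spam message lowers recall and therefore F1.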

Framework versions

Transformers 4.30.0
Pytorch 2.1.2
Datasets 2.18.0
Tokenizers 0.13.3
