---
license: apache-2.0
base_model: albert-base-v2
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: training_dir
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# training_dir

This model is a fine-tuned version of [albert-base-v2](https://huggingface.co/albert-base-v2) on the [Spam Data Collection](https://www.kaggle.com/datasets/abhishek14398/sms-spam-collection) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0393
- Accuracy: 0.9946
- F1 Score: 0.9946
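
Accuracy and F1 Score are identical in every row reported in this card. The card does not say how F1 was averaged; one averaging mode that always equals accuracy for single-label classification is micro-averaged F1, so the sketch below assumes that mode purely for illustration. The helper names and the toy labels are hypothetical and are not taken from the training notebook.

```python
# Minimal sketch: micro-averaged F1 equals accuracy for single-label tasks.
# ASSUMPTION: micro averaging; the card does not state the averaging mode.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def micro_f1(y_true, y_pred, labels=(0, 1)):
    """Pool true/false positives and false negatives over all labels."""
    tp = fp = fn = 0
    for label in labels:
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1
            elif p == label and t != label:
                fp += 1
            elif p != label and t == label:
                fn += 1
    return 2 * tp / (2 * tp + fp + fn)

# Toy labels (0 = ham, 1 = spam), invented for this example only.
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0, 1, 0]

acc = accuracy(y_true, y_pred)
f1 = micro_f1(y_true, y_pred)
print(f"accuracy={acc:.4f} micro_f1={f1:.4f}")
```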

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

SMS 1:

Message: Hey, I'll be there in 10 minutes. See you soon!

Label: label_0 (ham)

SMS 2:

Message: Congratulations! You've won a $1000 gift card. Claim it now by clicking the link.

Label: label_1 (spam)

In this example, the first message is labeled `label_0` (ham) because it is an ordinary personal text: the sender is saying they will arrive shortly.
The second message is labeled `label_1` (spam) because it offers a prize and urges the recipient to click a link, both common characteristics of spam.
The model predicts these labels so that spam SMS messages can be filtered out while legitimate (ham) messages still reach the user's inbox.
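
The label convention above can be sketched as a small post-processing helper. This is an illustration only: `ID2LABEL`, `to_readable`, and `filter_ham` are hypothetical names, and the code assumes the classifier returns label strings of the form `LABEL_0` / `LABEL_1` (or the lowercase spelling used in this card).

```python
# Hypothetical helper: map raw label strings to the ham/spam names used here.
# ASSUMPTION: raw labels end in "_0" (ham) or "_1" (spam).
ID2LABEL = {0: "ham", 1: "spam"}

def to_readable(raw_label: str) -> str:
    """Convert e.g. 'LABEL_1' or 'label_1' to 'spam'."""
    index = int(raw_label.rsplit("_", 1)[1])
    return ID2LABEL[index]

def filter_ham(predictions):
    """Keep only the messages whose predicted label is ham.

    `predictions` is a list of (message, raw_label) pairs.
    """
    return [msg for msg, raw in predictions if to_readable(raw) == "ham"]

# The two example messages from this section with their labels.
examples = [
    ("Hey, I'll be there in 10 minutes. See you soon!", "label_0"),
    ("Congratulations! You've won a $1000 gift card. "
     "Claim it now by clicking the link.", "label_1"),
]

inbox = filter_ham(examples)
print(inbox)
```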

## Training procedure

[Colab](https://colab.research.google.com/drive/1aCE5jBRlqN7KKBIuEjQ40mx3eOzPEfBd?usp=sharing)

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 64
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10
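
To make the `linear` scheduler setting concrete, here is a minimal sketch of how the learning rate would decay over the run. It assumes zero warmup steps (the list above names none) and takes the total of 2440 steps from the training results table; `linear_lr` is a hypothetical helper written for this card, not a Transformers function.

```python
BASE_LR = 5e-05      # learning_rate from the list above
STEPS_PER_EPOCH = 244  # step count after epoch 1 in the results table
TOTAL_STEPS = 2440   # final step in the results table (10 epochs)

def linear_lr(step: int, base_lr: float = BASE_LR,
              total_steps: int = TOTAL_STEPS, warmup_steps: int = 0) -> float:
    """Linear warmup (if any) followed by linear decay to zero.

    ASSUMPTION: warmup_steps defaults to 0 because the card lists no warmup.
    """
    if warmup_steps and step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Learning rate at the start, the midpoint, and the end of training.
for epoch in (0, 5, 10):
    step = epoch * STEPS_PER_EPOCH
    print(f"epoch {epoch:>2} (step {step:>4}): lr = {linear_lr(step):.2e}")
```

Under these assumptions the rate starts at 5e-05, is halved by the end of epoch 5, and reaches zero at step 2440.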

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 Score |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:--------:|
| No log        | 1.0   | 244  | 0.1070          | 0.9785   | 0.9785   |
| No log        | 2.0   | 488  | 0.0673          | 0.9880   | 0.9880   |
| 0.0885        | 3.0   | 732  | 0.0293          | 0.9946   | 0.9946   |
| 0.0885        | 4.0   | 976  | 0.0280          | 0.9964   | 0.9964   |
| 0.0306        | 5.0   | 1220 | 0.0355          | 0.9952   | 0.9952   |
| 0.0306        | 6.0   | 1464 | 0.0364          | 0.9952   | 0.9952   |
| 0.0087        | 7.0   | 1708 | 0.0448          | 0.9946   | 0.9946   |
| 0.0087        | 8.0   | 1952 | 0.0618          | 0.9922   | 0.9922   |
| 0.0047        | 9.0   | 2196 | 0.0420          | 0.9946   | 0.9946   |
| 0.0047        | 10.0  | 2440 | 0.0393          | 0.9946   | 0.9946   |
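
As a reading aid for the table, the following sketch picks the epoch with the lowest validation loss. The tuples are copied from the table; `best_epoch` is a hypothetical helper, and nothing in this card states which checkpoint was actually saved or published.

```python
# (epoch, validation_loss, accuracy) copied from the table above.
RESULTS = [
    (1, 0.1070, 0.9785),
    (2, 0.0673, 0.9880),
    (3, 0.0293, 0.9946),
    (4, 0.0280, 0.9964),
    (5, 0.0355, 0.9952),
    (6, 0.0364, 0.9952),
    (7, 0.0448, 0.9946),
    (8, 0.0618, 0.9922),
    (9, 0.0420, 0.9946),
    (10, 0.0393, 0.9946),
]

def best_epoch(results):
    """Return the row with the smallest validation loss."""
    return min(results, key=lambda row: row[1])

epoch, val_loss, acc = best_epoch(RESULTS)
print(f"lowest validation loss: epoch {epoch}, loss {val_loss}, accuracy {acc}")
```

On these numbers the minimum falls at epoch 4, while the headline figures at the top of the card match the final epoch 10 row. Training loss keeps falling after epoch 4 while validation loss trends upward, which is consistent with mild overfitting in the later epochs.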


### Framework versions

- Transformers 4.33.2
- Pytorch 2.0.1+cu118
- Datasets 2.14.5
- Tokenizers 0.13.3