burkelive committed on
Commit 306c46e (1 parent: 508ccbe)

Model save

README.md CHANGED
@@ -1,71 +1,91 @@
1
- ---
2
- license: apache-2.0
3
- base_model: distilbert-base-uncased
4
- tags:
5
- - generated_from_trainer
6
- model-index:
7
- - name: distilbert-base-uncased-pii-200
8
- results: []
9
- ---
10
-
11
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
12
- should probably proofread and complete it, then remove this comment. -->
13
-
14
- # distilbert-base-uncased-pii-200
15
-
16
- This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the None dataset.
17
- It achieves the following results on the evaluation set:
18
- - Loss: 1.5384
19
- - Overall Precision: 0.0
20
- - Overall Recall: 0.0
21
- - Overall F1: 0.0
22
- - Overall Accuracy: 0.8065
23
- - 0 F1: 0.0
24
- - 100 F1: 0.0
25
- - F1: 0.0
26
-
27
- ## Model description
28
-
29
- More information needed
30
-
31
- ## Intended uses & limitations
32
-
33
- More information needed
34
-
35
- ## Training and evaluation data
36
-
37
- More information needed
38
-
39
- ## Training procedure
40
-
41
- ### Training hyperparameters
42
-
43
- The following hyperparameters were used during training:
44
- - learning_rate: 5e-05
45
- - train_batch_size: 32
46
- - eval_batch_size: 32
47
- - seed: 42
48
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
49
- - lr_scheduler_type: linear
50
- - lr_scheduler_warmup_ratio: 0.2
51
- - num_epochs: 7
52
-
53
- ### Training results
54
-
55
- | Training Loss | Epoch | Step | Validation Loss | Overall Precision | Overall Recall | Overall F1 | Overall Accuracy | 0 F1 | 1 F1 | 100 F1 | 2 F1 | 3 F1 | 5 F1 | 6 F1 | F1 |
56
- |:-------------:|:-----:|:----:|:---------------:|:-----------------:|:--------------:|:----------:|:----------------:|:----:|:----:|:------:|:----:|:----:|:----:|:----:|:----:|
57
- | No log | 1.0 | 1 | 2.8031 | 0.0 | 0.0 | 0.0 | 0.1935 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
58
- | No log | 2.0 | 2 | 2.6237 | 0.0 | 0.0 | 0.0 | 0.6129 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
59
- | No log | 3.0 | 3 | 2.2814 | 0.0 | 0.0 | 0.0 | 0.7742 | 0.0 | 0.0 | 0.0 | 0.0 |
60
- | No log | 4.0 | 4 | 2.0014 | 0.0 | 0.0 | 0.0 | 0.7903 | 0.0 | 0.0 | 0.0 | 0.0 |
61
- | No log | 5.0 | 5 | 1.7758 | 0.0 | 0.0 | 0.0 | 0.8065 | 0.0 | 0.0 | 0.0 |
62
- | No log | 6.0 | 6 | 1.6176 | 0.0 | 0.0 | 0.0 | 0.8065 | 0.0 | 0.0 | 0.0 |
63
- | No log | 7.0 | 7 | 1.5384 | 0.0 | 0.0 | 0.0 | 0.8065 | 0.0 | 0.0 | 0.0 |
64
-
65
-
66
- ### Framework versions
67
-
68
- - Transformers 4.40.1
69
- - Pytorch 2.2.2+cpu
70
- - Datasets 2.19.0
71
- - Tokenizers 0.19.1
 
1
+ ---
2
+ license: apache-2.0
3
+ base_model: distilbert/distilbert-base-uncased
4
+ tags:
5
+ - generated_from_trainer
6
+ model-index:
7
+ - name: distilbert-base-uncased-pii-200
8
+ results: []
9
+ ---
10
+
11
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
12
+ should probably proofread and complete it, then remove this comment. -->
13
+
14
+ # distilbert-base-uncased-pii-200
15
+
16
+ This model is a fine-tuned version of [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased) on the None dataset.
17
+ It achieves the following results on the evaluation set:
18
+ - Loss: 0.0786
19
+ - Overall Precision: 0.9472
20
+ - Overall Recall: 0.9567
21
+ - Overall F1: 0.9519
22
+ - Overall Accuracy: 0.9678
23
+ - 0 F1: 0.8918
24
+ - 00 F1: 0.9351
25
+ - 01 F1: 0.2727
26
+ - 02 F1: 0.3439
27
+ - 03 F1: 0.9481
28
+ - 04 F1: 0.8169
29
+ - 05 F1: 0.8037
30
+ - 06 F1: 0.8732
31
+ - 07 F1: 0.8910
32
+ - 08 F1: 0.9636
33
+ - 09 F1: 0.9077
34
+ - 1 F1: 0.9461
35
+ - 10 F1: 0.0
36
+ - 100 F1: 0.9788
37
+ - 2 F1: 0.9052
38
+ - 3 F1: 0.9488
39
+ - 4 F1: 0.9129
40
+ - 5 F1: 0.9431
41
+ - 6 F1: 0.9765
42
+ - 7 F1: 0.9618
43
+ - 8 F1: 0.9574
44
+ - 9 F1: 0.9131
45
+ - F1: 0.9659
46
+
47
+ ## Model description
48
+
49
+ More information needed
50
+
51
+ ## Intended uses & limitations
52
+
53
+ More information needed
54
+
55
+ ## Training and evaluation data
56
+
57
+ More information needed
58
+
59
+ ## Training procedure
60
+
61
+ ### Training hyperparameters
62
+
63
+ The following hyperparameters were used during training:
64
+ - learning_rate: 5e-05
65
+ - train_batch_size: 32
66
+ - eval_batch_size: 32
67
+ - seed: 42
68
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
69
+ - lr_scheduler_type: linear
70
+ - lr_scheduler_warmup_ratio: 0.2
71
+ - num_epochs: 7
72
+
73
+ ### Training results
74
+
75
+ | Training Loss | Epoch | Step | Validation Loss | Overall Precision | Overall Recall | Overall F1 | Overall Accuracy | 0 F1 | 00 F1 | 01 F1 | 02 F1 | 03 F1 | 04 F1 | 05 F1 | 06 F1 | 07 F1 | 08 F1 | 09 F1 | 1 F1 | 10 F1 | 100 F1 | 2 F1 | 3 F1 | 4 F1 | 5 F1 | 6 F1 | 7 F1 | 8 F1 | 9 F1 | F1 |
76
+ |:-------------:|:-----:|:----:|:---------------:|:-----------------:|:--------------:|:----------:|:----------------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:-----:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
77
+ | 0.2545 | 1.0 | 1088 | 0.1255 | 0.9224 | 0.9142 | 0.9182 | 0.9575 | 0.8578 | 0.9054 | 0.0 | 0.0 | 0.7402 | 0.6939 | 0.6694 | 0.3099 | 0.1647 | 0.0 | 0.9048 | 0.9171 | 0.0 | 0.9609 | 0.9003 | 0.9280 | 0.8847 | 0.9121 | 0.9371 | 0.9085 | 0.8524 | 0.8536 | 0.9117 |
78
+ | 0.092 | 2.0 | 2176 | 0.0819 | 0.9439 | 0.9521 | 0.9480 | 0.9657 | 0.8955 | 0.9548 | 0.4305 | 0.4601 | 0.9635 | 0.7525 | 0.5925 | 0.8138 | 0.8468 | 0.9455 | 0.9291 | 0.9426 | 0.0 | 0.9756 | 0.9291 | 0.9466 | 0.9122 | 0.9362 | 0.9687 | 0.9532 | 0.9446 | 0.9067 | 0.9623 |
79
+ | 0.0716 | 3.0 | 3264 | 0.0786 | 0.9472 | 0.9567 | 0.9519 | 0.9678 | 0.8918 | 0.9351 | 0.2727 | 0.3439 | 0.9481 | 0.8169 | 0.8037 | 0.8732 | 0.8910 | 0.9636 | 0.9077 | 0.9461 | 0.0 | 0.9788 | 0.9052 | 0.9488 | 0.9129 | 0.9431 | 0.9765 | 0.9618 | 0.9574 | 0.9131 | 0.9659 |
80
+ | 0.0575 | 4.0 | 4352 | 0.0808 | 0.9501 | 0.9577 | 0.9539 | 0.9673 | 0.8882 | 0.9751 | 0.4669 | 0.3951 | 0.9781 | 0.8206 | 0.8034 | 0.8941 | 0.9196 | 0.9550 | 0.9508 | 0.9438 | 0.0 | 0.9800 | 0.9068 | 0.9545 | 0.9235 | 0.9503 | 0.9744 | 0.9626 | 0.9624 | 0.9086 | 0.9674 |
81
+ | 0.0463 | 5.0 | 5440 | 0.0801 | 0.9559 | 0.9604 | 0.9581 | 0.9693 | 0.9050 | 0.9634 | 0.4693 | 0.4950 | 0.9781 | 0.8 | 0.7726 | 0.9006 | 0.9211 | 0.9636 | 0.9291 | 0.9506 | 0.0 | 0.9814 | 0.9328 | 0.9549 | 0.9278 | 0.9548 | 0.9766 | 0.9647 | 0.9624 | 0.9176 | 0.9707 |
82
+ | 0.0325 | 6.0 | 6528 | 0.1021 | 0.9559 | 0.9611 | 0.9585 | 0.9690 | 0.9019 | 0.9667 | 0.4477 | 0.4275 | 0.9781 | 0.7926 | 0.7870 | 0.9080 | 0.9457 | 0.9541 | 0.9431 | 0.9516 | 0.0 | 0.9820 | 0.9276 | 0.9583 | 0.9298 | 0.9577 | 0.9769 | 0.9654 | 0.9642 | 0.9196 | 0.9695 |
83
+ | 0.0159 | 7.0 | 7616 | 0.1300 | 0.9543 | 0.9601 | 0.9572 | 0.9673 | 0.8968 | 0.9642 | 0.4610 | 0.4408 | 0.9781 | 0.7788 | 0.7702 | 0.9096 | 0.9236 | 0.9550 | 0.9516 | 0.9484 | 0.0 | 0.9823 | 0.9185 | 0.9569 | 0.9273 | 0.9573 | 0.9774 | 0.9652 | 0.9667 | 0.9157 | 0.9706 |
84
+
85
+
86
+ ### Framework versions
87
+
88
+ - Transformers 4.42.4
89
+ - Pytorch 2.3.1+cu121
90
+ - Datasets 2.20.0
91
+ - Tokenizers 0.19.1
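The scheduler settings in the updated README.md (`learning_rate: 5e-05`, `lr_scheduler_type: linear`, `lr_scheduler_warmup_ratio: 0.2`) can be cross-checked against the `learning_rate` values logged in trainer_state.json. The sketch below is a minimal re-implementation of a linear warmup and decay schedule, not the Trainer's own code. The function name `linear_warmup_lr` is hypothetical, and the sketch assumes the warmup length is the ceiling of `warmup_ratio * max_steps`, with `max_steps` = 7616 as recorded in this commit.

```python
import math


def linear_warmup_lr(step, base_lr, total_steps, warmup_ratio):
    """Linear warmup to base_lr, then linear decay to zero (illustrative sketch)."""
    # Assumption: warmup length is the ceiling of ratio * total steps.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


BASE_LR = 5e-05      # learning_rate from the README
MAX_STEPS = 7616     # max_steps from trainer_state.json
WARMUP_RATIO = 0.2   # lr_scheduler_warmup_ratio from the README

lr_at_500 = linear_warmup_lr(500, BASE_LR, MAX_STEPS, WARMUP_RATIO)
lr_at_2000 = linear_warmup_lr(2000, BASE_LR, MAX_STEPS, WARMUP_RATIO)
print(lr_at_500, lr_at_2000)
```

Under these assumptions the computed values agree with the logged entries at step 500 (about 1.6404e-05, still in warmup) and step 2000 (about 4.6093e-05, already decaying).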
all_results.json CHANGED
@@ -1,20 +1,41 @@
1
  {
2
  "epoch": 7.0,
3
- "eval_100_f1": 0.0,
4
- "eval___f1": 0.0,
5
- "eval_loss": 1.1882282495498657,
6
- "eval_overall_accuracy": 0.8378378378378378,
7
- "eval_overall_f1": 0.0,
8
- "eval_overall_precision": 0.0,
9
- "eval_overall_recall": 0.0,
10
- "eval_runtime": 0.266,
11
- "eval_samples": 1,
12
- "eval_samples_per_second": 3.759,
13
- "eval_steps_per_second": 3.759,
14
- "total_flos": 228688400640.0,
15
- "train_loss": 1.8831114087785994,
16
- "train_runtime": 26.9803,
17
- "train_samples": 2,
18
- "train_samples_per_second": 0.519,
19
- "train_steps_per_second": 0.259
20
  }
 
1
  {
2
  "epoch": 7.0,
3
+ "eval_00_f1": 0.9351351351351351,
4
+ "eval_01_f1": 0.2727272727272727,
5
+ "eval_02_f1": 0.3438914027149321,
6
+ "eval_03_f1": 0.9481481481481482,
7
+ "eval_04_f1": 0.8168701442841289,
8
+ "eval_05_f1": 0.8036951501154733,
9
+ "eval_06_f1": 0.8732394366197183,
10
+ "eval_07_f1": 0.8909657320872275,
11
+ "eval_08_f1": 0.9636363636363636,
12
+ "eval_09_f1": 0.9076923076923077,
13
+ "eval_0_f1": 0.8918362091166624,
14
+ "eval_100_f1": 0.9787716689913094,
15
+ "eval_10_f1": 0.0,
16
+ "eval_1_f1": 0.9460573633891765,
17
+ "eval_2_f1": 0.9052096569250319,
18
+ "eval_3_f1": 0.9488174195970466,
19
+ "eval_4_f1": 0.9129169464965301,
20
+ "eval_5_f1": 0.9431066419687748,
21
+ "eval_6_f1": 0.9764898851831602,
22
+ "eval_7_f1": 0.9617969579059075,
23
+ "eval_8_f1": 0.9573971403559964,
24
+ "eval_9_f1": 0.9131222981453074,
25
+ "eval___f1": 0.965883121123082,
26
+ "eval_loss": 0.07855656743049622,
27
+ "eval_overall_accuracy": 0.9678325102233014,
28
+ "eval_overall_f1": 0.9519101855680437,
29
+ "eval_overall_precision": 0.9471512280264306,
30
+ "eval_overall_recall": 0.9567172073342737,
31
+ "eval_runtime": 31.7972,
32
+ "eval_samples": 8700,
33
+ "eval_samples_per_second": 273.609,
34
+ "eval_steps_per_second": 8.554,
35
+ "total_flos": 6493939017778920.0,
36
+ "train_loss": 0.18183875429843152,
37
+ "train_runtime": 1332.3436,
38
+ "train_samples": 34796,
39
+ "train_samples_per_second": 182.815,
40
+ "train_steps_per_second": 5.716
41
  }
eval_results.json CHANGED
@@ -1,14 +1,35 @@
1
  {
2
  "epoch": 7.0,
3
- "eval_100_f1": 0.0,
4
- "eval___f1": 0.0,
5
- "eval_loss": 1.1882282495498657,
6
- "eval_overall_accuracy": 0.8378378378378378,
7
- "eval_overall_f1": 0.0,
8
- "eval_overall_precision": 0.0,
9
- "eval_overall_recall": 0.0,
10
- "eval_runtime": 0.266,
11
- "eval_samples": 1,
12
- "eval_samples_per_second": 3.759,
13
- "eval_steps_per_second": 3.759
14
  }
 
1
  {
2
  "epoch": 7.0,
3
+ "eval_00_f1": 0.9351351351351351,
4
+ "eval_01_f1": 0.2727272727272727,
5
+ "eval_02_f1": 0.3438914027149321,
6
+ "eval_03_f1": 0.9481481481481482,
7
+ "eval_04_f1": 0.8168701442841289,
8
+ "eval_05_f1": 0.8036951501154733,
9
+ "eval_06_f1": 0.8732394366197183,
10
+ "eval_07_f1": 0.8909657320872275,
11
+ "eval_08_f1": 0.9636363636363636,
12
+ "eval_09_f1": 0.9076923076923077,
13
+ "eval_0_f1": 0.8918362091166624,
14
+ "eval_100_f1": 0.9787716689913094,
15
+ "eval_10_f1": 0.0,
16
+ "eval_1_f1": 0.9460573633891765,
17
+ "eval_2_f1": 0.9052096569250319,
18
+ "eval_3_f1": 0.9488174195970466,
19
+ "eval_4_f1": 0.9129169464965301,
20
+ "eval_5_f1": 0.9431066419687748,
21
+ "eval_6_f1": 0.9764898851831602,
22
+ "eval_7_f1": 0.9617969579059075,
23
+ "eval_8_f1": 0.9573971403559964,
24
+ "eval_9_f1": 0.9131222981453074,
25
+ "eval___f1": 0.965883121123082,
26
+ "eval_loss": 0.07855656743049622,
27
+ "eval_overall_accuracy": 0.9678325102233014,
28
+ "eval_overall_f1": 0.9519101855680437,
29
+ "eval_overall_precision": 0.9471512280264306,
30
+ "eval_overall_recall": 0.9567172073342737,
31
+ "eval_runtime": 31.7972,
32
+ "eval_samples": 8700,
33
+ "eval_samples_per_second": 273.609,
34
+ "eval_steps_per_second": 8.554
35
  }
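As a sanity check on the new eval_results.json, the overall F1 should be the harmonic mean of the overall precision and recall. The sketch below recomputes it from the unrounded values in this commit. The helper name `f1_from_precision_recall` is hypothetical and is not part of the training code.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Values copied from the new eval_results.json in this commit.
overall_precision = 0.9471512280264306
overall_recall = 0.9567172073342737
reported_f1 = 0.9519101855680437

computed_f1 = f1_from_precision_recall(overall_precision, overall_recall)
print(f"computed={computed_f1:.6f} reported={reported_f1:.6f}")
```

The same helper returns 0.0 for the previous run, where precision and recall were both 0.0.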
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:2da5dc563652c6aeb7625314cab988dfbea7f6b9e3457d47f1b41acc4bb4ba37
3
  size 265811460
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6374688a8ff2e3bc6d4ee78fc7b6058429092d7ea3d1296f9bb1bd5d8e85b184
3
  size 265811460
runs/Jul26_04-39-00_66bdfda16dc0/events.out.tfevents.1721970107.66bdfda16dc0.1806.3 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:13d13aebe76c04d8f2085983386d9f15afca04ca9d168c085d4e987cfd22813b
3
+ size 1709
train_results.json CHANGED
@@ -1,9 +1,9 @@
1
  {
2
  "epoch": 7.0,
3
- "total_flos": 228688400640.0,
4
- "train_loss": 1.8831114087785994,
5
- "train_runtime": 26.9803,
6
- "train_samples": 2,
7
- "train_samples_per_second": 0.519,
8
- "train_steps_per_second": 0.259
9
  }
 
1
  {
2
  "epoch": 7.0,
3
+ "total_flos": 6493939017778920.0,
4
+ "train_loss": 0.18183875429843152,
5
+ "train_runtime": 1332.3436,
6
+ "train_samples": 34796,
7
+ "train_samples_per_second": 182.815,
8
+ "train_steps_per_second": 5.716
9
  }
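The step counts and throughput figures in the new train_results.json and eval_results.json follow from the sample counts and the batch size of 32. The sketch below reproduces that arithmetic. It assumes a single device, no gradient accumulation, and that the last partial batch is kept, none of which this commit states explicitly.

```python
import math

# Figures copied from this commit.
train_samples = 34796
eval_samples = 8700
batch_size = 32          # train_batch_size and eval_batch_size
num_epochs = 7
train_runtime = 1332.3436
eval_runtime = 31.7972

# Assumption: the final partial batch counts as a step (no drop_last).
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * num_epochs
eval_batches = math.ceil(eval_samples / batch_size)

train_steps_per_second = round(total_steps / train_runtime, 3)
eval_steps_per_second = round(eval_batches / eval_runtime, 3)
print(steps_per_epoch, total_steps, train_steps_per_second, eval_steps_per_second)
```

With these inputs the totals line up with the `global_step` of 7616 and the per-epoch evaluation steps (1088, 2176, and so on) recorded in trainer_state.json.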
trainer_state.json CHANGED
@@ -1,139 +1,410 @@
1
  {
2
- "best_metric": 1.1882282495498657,
3
- "best_model_checkpoint": "data/outputs/distilbert-base-uncased-pii-200/checkpoint-7",
4
  "epoch": 7.0,
5
  "eval_steps": 500,
6
- "global_step": 7,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
  "epoch": 1.0,
13
- "eval_0_f1": 0.0,
14
- "eval_100_f1": 0.0,
15
- "eval___f1": 0.0,
16
- "eval_loss": 2.613192081451416,
17
- "eval_overall_accuracy": 0.02702702702702703,
18
- "eval_overall_f1": 0.0,
19
- "eval_overall_precision": 0.0,
20
- "eval_overall_recall": 0.0,
21
- "eval_runtime": 0.5664,
22
- "eval_samples_per_second": 1.766,
23
- "eval_steps_per_second": 1.766,
24
- "step": 1
25
  },
26
  {
27
  "epoch": 2.0,
28
- "eval_0_f1": 0.0,
29
- "eval_100_f1": 0.18181818181818182,
30
- "eval___f1": 0.0,
31
- "eval_loss": 2.405207633972168,
32
- "eval_overall_accuracy": 0.2972972972972973,
33
- "eval_overall_f1": 0.09523809523809523,
34
- "eval_overall_precision": 0.0625,
35
- "eval_overall_recall": 0.2,
36
- "eval_runtime": 0.2767,
37
- "eval_samples_per_second": 3.614,
38
- "eval_steps_per_second": 3.614,
39
- "step": 2
40
  },
41
  {
42
  "epoch": 3.0,
43
- "eval_100_f1": 0.0,
44
- "eval___f1": 0.0,
45
- "eval_loss": 1.9990681409835815,
46
- "eval_overall_accuracy": 0.8108108108108109,
47
- "eval_overall_f1": 0.0,
48
- "eval_overall_precision": 0.0,
49
- "eval_overall_recall": 0.0,
50
- "eval_runtime": 0.2831,
51
- "eval_samples_per_second": 3.532,
52
- "eval_steps_per_second": 3.532,
53
- "step": 3
54
  },
55
  {
56
  "epoch": 4.0,
57
- "eval_100_f1": 0.0,
58
- "eval___f1": 0.0,
59
- "eval_loss": 1.671056866645813,
60
- "eval_overall_accuracy": 0.8378378378378378,
61
- "eval_overall_f1": 0.0,
62
- "eval_overall_precision": 0.0,
63
- "eval_overall_recall": 0.0,
64
- "eval_runtime": 0.2912,
65
- "eval_samples_per_second": 3.434,
66
- "eval_steps_per_second": 3.434,
67
- "step": 4
68
  },
69
  {
70
  "epoch": 5.0,
71
- "eval_100_f1": 0.0,
72
- "eval___f1": 0.0,
73
- "eval_loss": 1.4230600595474243,
74
- "eval_overall_accuracy": 0.8378378378378378,
75
- "eval_overall_f1": 0.0,
76
- "eval_overall_precision": 0.0,
77
- "eval_overall_recall": 0.0,
78
- "eval_runtime": 0.2824,
79
- "eval_samples_per_second": 3.541,
80
- "eval_steps_per_second": 3.541,
81
- "step": 5
82
  },
83
  {
84
  "epoch": 6.0,
85
- "eval_100_f1": 0.0,
86
- "eval___f1": 0.0,
87
- "eval_loss": 1.2626835107803345,
88
- "eval_overall_accuracy": 0.8378378378378378,
89
- "eval_overall_f1": 0.0,
90
- "eval_overall_precision": 0.0,
91
- "eval_overall_recall": 0.0,
92
- "eval_runtime": 0.2997,
93
- "eval_samples_per_second": 3.336,
94
- "eval_steps_per_second": 3.336,
95
- "step": 6
96
  },
97
  {
98
  "epoch": 7.0,
99
- "eval_100_f1": 0.0,
100
- "eval___f1": 0.0,
101
- "eval_loss": 1.1882282495498657,
102
- "eval_overall_accuracy": 0.8378378378378378,
103
- "eval_overall_f1": 0.0,
104
- "eval_overall_precision": 0.0,
105
- "eval_overall_recall": 0.0,
106
- "eval_runtime": 0.2721,
107
- "eval_samples_per_second": 3.675,
108
- "eval_steps_per_second": 3.675,
109
- "step": 7
110
  },
111
  {
112
  "epoch": 7.0,
113
- "step": 7,
114
- "total_flos": 228688400640.0,
115
- "train_loss": 1.8831114087785994,
116
- "train_runtime": 26.9803,
117
- "train_samples_per_second": 0.519,
118
- "train_steps_per_second": 0.259
119
  },
120
  {
121
  "epoch": 7.0,
122
- "eval_100_f1": 0.0,
123
- "eval___f1": 0.0,
124
- "eval_loss": 1.1882282495498657,
125
- "eval_overall_accuracy": 0.8378378378378378,
126
- "eval_overall_f1": 0.0,
127
- "eval_overall_precision": 0.0,
128
- "eval_overall_recall": 0.0,
129
- "eval_runtime": 0.266,
130
- "eval_samples_per_second": 3.759,
131
- "eval_steps_per_second": 3.759,
132
- "step": 7
133
  }
134
  ],
135
  "logging_steps": 500,
136
- "max_steps": 7,
137
  "num_input_tokens_seen": 0,
138
  "num_train_epochs": 7,
139
  "save_steps": 500,
@@ -149,7 +420,7 @@
149
  "attributes": {}
150
  }
151
  },
152
- "total_flos": 228688400640.0,
153
  "train_batch_size": 32,
154
  "trial_name": null,
155
  "trial_params": null
 
1
  {
2
+ "best_metric": 0.07855656743049622,
3
+ "best_model_checkpoint": "data/outputs/distilbert-base-uncased-pii-200/checkpoint-3264",
4
  "epoch": 7.0,
5
  "eval_steps": 500,
6
+ "global_step": 7616,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
+ {
12
+ "epoch": 0.45955882352941174,
13
+ "grad_norm": 1.3934886455535889,
14
+ "learning_rate": 1.6404199475065617e-05,
15
+ "loss": 1.7881,
16
+ "step": 500
17
+ },
18
+ {
19
+ "epoch": 0.9191176470588235,
20
+ "grad_norm": 0.784841775894165,
21
+ "learning_rate": 3.280839895013123e-05,
22
+ "loss": 0.2545,
23
+ "step": 1000
24
+ },
25
  {
26
  "epoch": 1.0,
27
+ "eval_00_f1": 0.9054441260744985,
28
+ "eval_01_f1": 0.0,
29
+ "eval_02_f1": 0.0,
30
+ "eval_03_f1": 0.7401574803149606,
31
+ "eval_04_f1": 0.6938775510204082,
32
+ "eval_05_f1": 0.6694045174537987,
33
+ "eval_06_f1": 0.30985915492957744,
34
+ "eval_07_f1": 0.16470588235294117,
35
+ "eval_08_f1": 0.0,
36
+ "eval_09_f1": 0.9047619047619048,
37
+ "eval_0_f1": 0.8577833823735463,
38
+ "eval_100_f1": 0.9608561749307052,
39
+ "eval_10_f1": 0.0,
40
+ "eval_1_f1": 0.9170798427081553,
41
+ "eval_2_f1": 0.9002621395581076,
42
+ "eval_3_f1": 0.9280289330922241,
43
+ "eval_4_f1": 0.8847179031541538,
44
+ "eval_5_f1": 0.9121113162004828,
45
+ "eval_6_f1": 0.9370610480821178,
46
+ "eval_7_f1": 0.9085239085239085,
47
+ "eval_8_f1": 0.8524124881740777,
48
+ "eval_9_f1": 0.8535798122065729,
49
+ "eval___f1": 0.9117056318791676,
50
+ "eval_loss": 0.12546201050281525,
51
+ "eval_overall_accuracy": 0.9575080375008431,
52
+ "eval_overall_f1": 0.918246790209348,
53
+ "eval_overall_precision": 0.9223653782623059,
54
+ "eval_overall_recall": 0.9141648196655249,
55
+ "eval_runtime": 33.5549,
56
+ "eval_samples_per_second": 259.277,
57
+ "eval_steps_per_second": 8.106,
58
+ "step": 1088
59
+ },
60
+ {
61
+ "epoch": 1.3786764705882353,
62
+ "grad_norm": 0.5563145279884338,
63
+ "learning_rate": 4.9212598425196856e-05,
64
+ "loss": 0.1199,
65
+ "step": 1500
66
+ },
67
+ {
68
+ "epoch": 1.8382352941176472,
69
+ "grad_norm": 0.48911386728286743,
70
+ "learning_rate": 4.609323703217334e-05,
71
+ "loss": 0.092,
72
+ "step": 2000
73
  },
74
  {
75
  "epoch": 2.0,
76
+ "eval_00_f1": 0.9548022598870056,
77
+ "eval_01_f1": 0.4304635761589404,
78
+ "eval_02_f1": 0.46009389671361506,
79
+ "eval_03_f1": 0.9635036496350365,
80
+ "eval_04_f1": 0.7525252525252526,
81
+ "eval_05_f1": 0.5924812030075188,
82
+ "eval_06_f1": 0.8138297872340425,
83
+ "eval_07_f1": 0.8468468468468469,
84
+ "eval_08_f1": 0.9454545454545454,
85
+ "eval_09_f1": 0.9291338582677166,
86
+ "eval_0_f1": 0.8954570333880678,
87
+ "eval_100_f1": 0.975552255713914,
88
+ "eval_10_f1": 0.0,
89
+ "eval_1_f1": 0.9426179998274226,
90
+ "eval_2_f1": 0.9290612143124298,
91
+ "eval_3_f1": 0.9465685683271017,
92
+ "eval_4_f1": 0.9122260540660807,
93
+ "eval_5_f1": 0.9362116991643455,
94
+ "eval_6_f1": 0.9686864579097194,
95
+ "eval_7_f1": 0.953225525995479,
96
+ "eval_8_f1": 0.9446140427387701,
97
+ "eval_9_f1": 0.9067169592340055,
98
+ "eval___f1": 0.9622872037142625,
99
+ "eval_loss": 0.08186369389295578,
100
+ "eval_overall_accuracy": 0.9657341417048245,
101
+ "eval_overall_f1": 0.9479689782509201,
102
+ "eval_overall_precision": 0.9439151985816489,
103
+ "eval_overall_recall": 0.9520577271811405,
104
+ "eval_runtime": 31.5004,
105
+ "eval_samples_per_second": 276.187,
106
+ "eval_steps_per_second": 8.635,
107
+ "step": 2176
108
+ },
109
+ {
110
+ "epoch": 2.297794117647059,
111
+ "grad_norm": 1.011751413345337,
112
+ "learning_rate": 4.198949441891005e-05,
113
+ "loss": 0.0772,
114
+ "step": 2500
115
+ },
116
+ {
117
+ "epoch": 2.7573529411764706,
118
+ "grad_norm": 0.7153874635696411,
119
+ "learning_rate": 3.788575180564675e-05,
120
+ "loss": 0.0716,
121
+ "step": 3000
122
  },
123
  {
124
  "epoch": 3.0,
125
+ "eval_00_f1": 0.9351351351351351,
126
+ "eval_01_f1": 0.2727272727272727,
127
+ "eval_02_f1": 0.3438914027149321,
128
+ "eval_03_f1": 0.9481481481481482,
129
+ "eval_04_f1": 0.8168701442841289,
130
+ "eval_05_f1": 0.8036951501154733,
131
+ "eval_06_f1": 0.8732394366197183,
132
+ "eval_07_f1": 0.8909657320872275,
133
+ "eval_08_f1": 0.9636363636363636,
134
+ "eval_09_f1": 0.9076923076923077,
135
+ "eval_0_f1": 0.8918362091166624,
136
+ "eval_100_f1": 0.9787716689913094,
137
+ "eval_10_f1": 0.0,
138
+ "eval_1_f1": 0.9460573633891765,
139
+ "eval_2_f1": 0.9052096569250319,
140
+ "eval_3_f1": 0.9488174195970466,
141
+ "eval_4_f1": 0.9129169464965301,
142
+ "eval_5_f1": 0.9431066419687748,
143
+ "eval_6_f1": 0.9764898851831602,
144
+ "eval_7_f1": 0.9617969579059075,
145
+ "eval_8_f1": 0.9573971403559964,
146
+ "eval_9_f1": 0.9131222981453074,
147
+ "eval___f1": 0.965883121123082,
148
+ "eval_loss": 0.07855656743049622,
149
+ "eval_overall_accuracy": 0.9678325102233014,
150
+ "eval_overall_f1": 0.9519101855680437,
151
+ "eval_overall_precision": 0.9471512280264306,
152
+ "eval_overall_recall": 0.9567172073342737,
153
+ "eval_runtime": 31.5113,
154
+ "eval_samples_per_second": 276.091,
155
+ "eval_steps_per_second": 8.632,
156
+ "step": 3264
157
+ },
158
+ {
159
+ "epoch": 3.2169117647058822,
160
+ "grad_norm": 0.566089928150177,
161
+ "learning_rate": 3.378200919238346e-05,
162
+ "loss": 0.064,
163
+ "step": 3500
164
+ },
165
+ {
166
+ "epoch": 3.6764705882352944,
167
+ "grad_norm": 0.5392144322395325,
168
+ "learning_rate": 2.9678266579120157e-05,
169
+ "loss": 0.0575,
170
+ "step": 4000
171
  },
172
  {
173
  "epoch": 4.0,
174
+ "eval_00_f1": 0.9750692520775622,
175
+ "eval_01_f1": 0.46692607003891046,
176
+ "eval_02_f1": 0.39506172839506176,
177
+ "eval_03_f1": 0.9781021897810219,
178
+ "eval_04_f1": 0.8205741626794257,
179
+ "eval_05_f1": 0.8033573141486811,
180
+ "eval_06_f1": 0.8941176470588235,
181
+ "eval_07_f1": 0.9196141479099678,
182
+ "eval_08_f1": 0.9549549549549549,
183
+ "eval_09_f1": 0.9508196721311476,
184
+ "eval_0_f1": 0.8882090503505418,
185
+ "eval_100_f1": 0.9800042935565983,
186
+ "eval_10_f1": 0.0,
187
+ "eval_1_f1": 0.9437605172261093,
188
+ "eval_2_f1": 0.9068274144935132,
189
+ "eval_3_f1": 0.9544898458527037,
190
+ "eval_4_f1": 0.9235419232060899,
191
+ "eval_5_f1": 0.9502912095354191,
192
+ "eval_6_f1": 0.974421768707483,
193
+ "eval_7_f1": 0.9625987708516243,
194
+ "eval_8_f1": 0.9623994147768837,
195
+ "eval_9_f1": 0.9085659287776707,
196
+ "eval___f1": 0.9674214041374817,
197
+ "eval_loss": 0.08075448125600815,
198
+ "eval_overall_accuracy": 0.9673478870178436,
199
+ "eval_overall_f1": 0.9538741337681478,
200
+ "eval_overall_precision": 0.9500668357340063,
201
+ "eval_overall_recall": 0.9577120693129155,
202
+ "eval_runtime": 31.7082,
203
+ "eval_samples_per_second": 274.377,
204
+ "eval_steps_per_second": 8.578,
205
+ "step": 4352
206
+ },
207
+ {
208
+ "epoch": 4.136029411764706,
209
+ "grad_norm": 0.993366003036499,
210
+ "learning_rate": 2.557452396585686e-05,
211
+ "loss": 0.052,
212
+ "step": 4500
213
+ },
214
+ {
215
+ "epoch": 4.595588235294118,
216
+ "grad_norm": 0.6269740462303162,
217
+ "learning_rate": 2.1470781352593567e-05,
218
+ "loss": 0.0463,
219
+ "step": 5000
220
  },
221
  {
222
  "epoch": 5.0,
223
+ "eval_00_f1": 0.9633802816901408,
224
+ "eval_01_f1": 0.4693140794223827,
225
+ "eval_02_f1": 0.49504950495049505,
226
+ "eval_03_f1": 0.9781021897810219,
227
+ "eval_04_f1": 0.8,
228
+ "eval_05_f1": 0.772609819121447,
229
+ "eval_06_f1": 0.9005847953216374,
230
+ "eval_07_f1": 0.9211356466876972,
231
+ "eval_08_f1": 0.9636363636363636,
232
+ "eval_09_f1": 0.9291338582677166,
233
+ "eval_0_f1": 0.9050072664817017,
234
+ "eval_100_f1": 0.9814164045116244,
235
+ "eval_10_f1": 0.0,
236
+ "eval_1_f1": 0.9505915100904663,
237
+ "eval_2_f1": 0.932791259052217,
238
+ "eval_3_f1": 0.9548890514508657,
239
+ "eval_4_f1": 0.9277958132766148,
240
+ "eval_5_f1": 0.954789061426412,
241
+ "eval_6_f1": 0.9766105867870333,
242
+ "eval_7_f1": 0.964676792652773,
243
+ "eval_8_f1": 0.9624082232011747,
244
+ "eval_9_f1": 0.9176204606471668,
245
+ "eval___f1": 0.9707106143428246,
246
+ "eval_loss": 0.08005847036838531,
247
+ "eval_overall_accuracy": 0.9692788856663944,
248
+ "eval_overall_f1": 0.9581124205342111,
249
+ "eval_overall_precision": 0.9558786663324141,
250
+ "eval_overall_recall": 0.9603566391295587,
251
+ "eval_runtime": 31.5692,
252
+ "eval_samples_per_second": 275.585,
253
+ "eval_steps_per_second": 8.616,
254
+ "step": 5440
255
+ },
256
+ {
257
+ "epoch": 5.055147058823529,
258
+ "grad_norm": 0.6535865068435669,
259
+ "learning_rate": 1.736703873933027e-05,
260
+ "loss": 0.0429,
261
+ "step": 5500
262
+ },
263
+ {
264
+ "epoch": 5.514705882352941,
265
+ "grad_norm": 0.42540472745895386,
266
+ "learning_rate": 1.3263296126066974e-05,
267
+ "loss": 0.0321,
268
+ "step": 6000
269
+ },
270
+ {
271
+ "epoch": 5.974264705882353,
272
+ "grad_norm": 0.3176397979259491,
273
+ "learning_rate": 9.159553512803678e-06,
274
+ "loss": 0.0325,
275
+ "step": 6500
276
  },
277
  {
278
  "epoch": 6.0,
279
+ "eval_00_f1": 0.9666666666666666,
280
+ "eval_01_f1": 0.4476534296028881,
281
+ "eval_02_f1": 0.4275362318840579,
282
+ "eval_03_f1": 0.9781021897810219,
283
+ "eval_04_f1": 0.7926380368098158,
284
+ "eval_05_f1": 0.7869674185463661,
285
+ "eval_06_f1": 0.9080459770114941,
286
+ "eval_07_f1": 0.9456869009584665,
287
+ "eval_08_f1": 0.9541284403669724,
288
+ "eval_09_f1": 0.943089430894309,
289
+ "eval_0_f1": 0.9018794556059624,
290
+ "eval_100_f1": 0.9819615302021658,
291
+ "eval_10_f1": 0.0,
292
+ "eval_1_f1": 0.9515930293962331,
293
+ "eval_2_f1": 0.9276000518067608,
294
+ "eval_3_f1": 0.9582514734774068,
295
+ "eval_4_f1": 0.9297636384003585,
296
+ "eval_5_f1": 0.9576626538617611,
297
+ "eval_6_f1": 0.9769031023643571,
298
+ "eval_7_f1": 0.9654320987654321,
299
+ "eval_8_f1": 0.9641913707073672,
300
+ "eval_9_f1": 0.9196490739451331,
301
+ "eval___f1": 0.9694843342036553,
302
+ "eval_loss": 0.10214365273714066,
303
+ "eval_overall_accuracy": 0.9690265818326251,
304
+ "eval_overall_f1": 0.9584994756704824,
305
+ "eval_overall_precision": 0.9558884310459276,
306
+ "eval_overall_recall": 0.9611248236953456,
307
+ "eval_runtime": 31.5742,
308
+ "eval_samples_per_second": 275.541,
309
+ "eval_steps_per_second": 8.615,
310
+ "step": 6528
311
+ },
312
+ {
313
+ "epoch": 6.4338235294117645,
314
+ "grad_norm": 1.4613324403762817,
315
+ "learning_rate": 5.0558108995403805e-06,
316
+ "loss": 0.0193,
317
+ "step": 7000
318
+ },
319
+ {
320
+ "epoch": 6.893382352941177,
321
+ "grad_norm": 0.6644862294197083,
322
+ "learning_rate": 9.520682862770847e-07,
323
+ "loss": 0.0159,
324
+ "step": 7500
325
  },
326
  {
327
  "epoch": 7.0,
328
+ "eval_00_f1": 0.9641873278236915,
329
+ "eval_01_f1": 0.461038961038961,
330
+ "eval_02_f1": 0.4407894736842105,
331
+ "eval_03_f1": 0.9781021897810219,
332
+ "eval_04_f1": 0.7787610619469025,
333
+ "eval_05_f1": 0.7702182284980743,
334
+ "eval_06_f1": 0.9096209912536442,
335
+ "eval_07_f1": 0.9235668789808917,
336
+ "eval_08_f1": 0.9549549549549549,
337
+ "eval_09_f1": 0.9516129032258064,
338
+ "eval_0_f1": 0.8967626816212082,
339
+ "eval_100_f1": 0.9823135095335246,
340
+ "eval_10_f1": 0.0,
341
+ "eval_1_f1": 0.94840600301927,
342
+ "eval_2_f1": 0.9184932405827536,
343
+ "eval_3_f1": 0.9568531038721574,
344
+ "eval_4_f1": 0.927282913165266,
345
+ "eval_5_f1": 0.9573369565217391,
346
+ "eval_6_f1": 0.9773841961852859,
347
+ "eval_7_f1": 0.9652387640449438,
348
+ "eval_8_f1": 0.966681344488478,
349
+ "eval_9_f1": 0.9156825048063718,
350
+ "eval___f1": 0.9706314243759178,
351
+ "eval_loss": 0.1300029456615448,
352
+ "eval_overall_accuracy": 0.9673478870178436,
353
+ "eval_overall_f1": 0.9572067971952467,
354
+ "eval_overall_precision": 0.9543013780931997,
355
+ "eval_overall_recall": 0.9601299617167036,
356
+ "eval_runtime": 31.698,
357
+ "eval_samples_per_second": 274.465,
358
+ "eval_steps_per_second": 8.581,
359
+ "step": 7616
360
  },
361
  {
362
  "epoch": 7.0,
363
+ "step": 7616,
364
+ "total_flos": 6493939017778920.0,
365
+ "train_loss": 0.18183875429843152,
366
+ "train_runtime": 1332.3436,
367
+ "train_samples_per_second": 182.815,
368
+ "train_steps_per_second": 5.716
369
  },
370
  {
371
  "epoch": 7.0,
372
+ "eval_00_f1": 0.9351351351351351,
373
+ "eval_01_f1": 0.2727272727272727,
374
+ "eval_02_f1": 0.3438914027149321,
375
+ "eval_03_f1": 0.9481481481481482,
376
+ "eval_04_f1": 0.8168701442841289,
377
+ "eval_05_f1": 0.8036951501154733,
378
+ "eval_06_f1": 0.8732394366197183,
379
+ "eval_07_f1": 0.8909657320872275,
380
+ "eval_08_f1": 0.9636363636363636,
381
+ "eval_09_f1": 0.9076923076923077,
382
+ "eval_0_f1": 0.8918362091166624,
383
+ "eval_100_f1": 0.9787716689913094,
384
+ "eval_10_f1": 0.0,
385
+ "eval_1_f1": 0.9460573633891765,
386
+ "eval_2_f1": 0.9052096569250319,
387
+ "eval_3_f1": 0.9488174195970466,
388
+ "eval_4_f1": 0.9129169464965301,
389
+ "eval_5_f1": 0.9431066419687748,
390
+ "eval_6_f1": 0.9764898851831602,
391
+ "eval_7_f1": 0.9617969579059075,
392
+ "eval_8_f1": 0.9573971403559964,
393
+ "eval_9_f1": 0.9131222981453074,
394
+ "eval___f1": 0.965883121123082,
395
+ "eval_loss": 0.07855656743049622,
396
+ "eval_overall_accuracy": 0.9678325102233014,
397
+ "eval_overall_f1": 0.9519101855680437,
398
+ "eval_overall_precision": 0.9471512280264306,
399
+ "eval_overall_recall": 0.9567172073342737,
400
+ "eval_runtime": 31.7972,
401
+ "eval_samples_per_second": 273.609,
402
+ "eval_steps_per_second": 8.554,
403
+ "step": 7616
404
  }
405
  ],
406
  "logging_steps": 500,
407
+ "max_steps": 7616,
408
  "num_input_tokens_seen": 0,
409
  "num_train_epochs": 7,
410
  "save_steps": 500,
 
420
  "attributes": {}
421
  }
422
  },
423
+ "total_flos": 6493939017778920.0,
424
  "train_batch_size": 32,
425
  "trial_name": null,
426
  "trial_params": null
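In the new trainer_state.json, `best_model_checkpoint` points at `checkpoint-3264` (epoch 3) even though training ran to step 7616, and the final evaluation entry repeats the epoch 3 metrics. That pattern is consistent with keeping the checkpoint that has the lowest `eval_loss`. The commit does not show the setting that controls this, so the sketch below is an assumption about the selection rule, applied to the per-epoch losses from `log_history`.

```python
# Per-epoch (step, eval_loss) pairs copied from log_history in this commit.
eval_log = [
    {"step": 1088, "eval_loss": 0.12546201050281525},
    {"step": 2176, "eval_loss": 0.08186369389295578},
    {"step": 3264, "eval_loss": 0.07855656743049622},
    {"step": 4352, "eval_loss": 0.08075448125600815},
    {"step": 5440, "eval_loss": 0.08005847036838531},
    {"step": 6528, "eval_loss": 0.10214365273714066},
    {"step": 7616, "eval_loss": 0.1300029456615448},
]

# Assumed rule: lower eval_loss is better, so take the minimum.
best_entry = min(eval_log, key=lambda entry: entry["eval_loss"])
best_checkpoint = f"checkpoint-{best_entry['step']}"
print(best_checkpoint, best_entry["eval_loss"])
```

Validation loss rises after epoch 5 while training loss keeps falling, so selecting by `eval_loss` picks an earlier checkpoint than selecting by overall F1 would (overall F1 peaks at epoch 6 in the same log).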