napsternxg committed on
Commit 5582a81
1 Parent(s): eb5a86b

Added LP FT model

README.md CHANGED
@@ -5,7 +5,7 @@ tags:
  datasets:
  - wnut_17
  model-index:
- - name: fine_tune_bert_output
+ - name: fine_tune_bert_output_LP_FP
  results: []
  ---
 
@@ -17,17 +17,17 @@ should probably proofread and complete it, then remove this comment. -->
  This model is a fine-tuned version of [vinai/bertweet-base](https://huggingface.co/vinai/bertweet-base) on the [wnut_17](https://huggingface.co/datasets/wnut_17) dataset.
 
  It achieves the following results on the evaluation set:
- - Loss: 0.3239
- - Overall Precision: 0.6913
- - Overall Recall: 0.5914
- - Overall F1: 0.6374
- - Overall Accuracy: 0.9499
- - Corporation F1: 0.2703
- - Creative-work F1: 0.3636
- - Group F1: 0.4030
- - Location F1: 0.7500
- - Person F1: 0.7733
- - Product F1: 0.4152
+ - Loss: 0.3376
+ - Overall Precision: 0.6803
+ - Overall Recall: 0.6096
+ - Overall F1: 0.6430
+ - Overall Accuracy: 0.9509
+ - Corporation F1: 0.2975
+ - Creative-work F1: 0.4436
+ - Group F1: 0.3624
+ - Location F1: 0.6834
+ - Person F1: 0.7902
+ - Product F1: 0.3887
 
  ## Model description
 
@@ -46,7 +46,7 @@ More information needed
  ### Training hyperparameters
 
  The following hyperparameters were used during training:
- - learning_rate: 2e-05
+ - learning_rate: 1e-05
  - train_batch_size: 16
  - eval_batch_size: 16
  - seed: 42
@@ -58,21 +58,20 @@ The following hyperparameters were used during training:
 
  | Training Loss | Epoch | Step | Validation Loss | Overall Precision | Overall Recall | Overall F1 | Overall Accuracy | Corporation F1 | Creative-work F1 | Group F1 | Location F1 | Person F1 | Product F1 |
  |:-------------:|:-----:|:----:|:---------------:|:-----------------:|:--------------:|:----------:|:----------------:|:--------------:|:----------------:|:--------:|:-----------:|:---------:|:----------:|
- | 0.2691 | 1.0 | 213 | 0.4035 | 0.0 | 0.0 | 0.0 | 0.8979 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
- | 0.1604 | 2.0 | 426 | 0.3054 | 0.6255 | 0.4161 | 0.4998 | 0.9324 | 0.0 | 0.0 | 0.0 | 0.3534 | 0.6877 | 0.0 |
- | 0.1118 | 3.0 | 639 | 0.2864 | 0.6655 | 0.4643 | 0.5470 | 0.9404 | 0.1961 | 0.1164 | 0.1538 | 0.5803 | 0.7221 | 0.1865 |
- | 0.0524 | 4.0 | 852 | 0.2891 | 0.6945 | 0.5042 | 0.5842 | 0.9442 | 0.2017 | 0.3273 | 0.2472 | 0.6522 | 0.7366 | 0.2581 |
- | 0.0446 | 5.0 | 1065 | 0.2691 | 0.6815 | 0.5847 | 0.6294 | 0.9486 | 0.2737 | 0.3415 | 0.3007 | 0.6703 | 0.7768 | 0.3243 |
- | 0.0296 | 6.0 | 1278 | 0.2739 | 0.6740 | 0.5615 | 0.6126 | 0.9479 | 0.3065 | 0.3766 | 0.3333 | 0.7 | 0.7582 | 0.3472 |
- | 0.0261 | 7.0 | 1491 | 0.3150 | 0.6907 | 0.5415 | 0.6071 | 0.9457 | 0.2292 | 0.3350 | 0.304 | 0.6369 | 0.7547 | 0.2982 |
- | 0.0193 | 8.0 | 1704 | 0.2922 | 0.6957 | 0.5772 | 0.6310 | 0.9496 | 0.2887 | 0.3621 | 0.3676 | 0.7475 | 0.7645 | 0.4158 |
- | 0.0173 | 9.0 | 1917 | 0.2823 | 0.6845 | 0.5963 | 0.6374 | 0.9501 | 0.25 | 0.3863 | 0.3660 | 0.6729 | 0.7810 | 0.4064 |
- | 0.0227 | 10.0 | 2130 | 0.2912 | 0.6719 | 0.5681 | 0.6157 | 0.9482 | 0.2268 | 0.3797 | 0.3625 | 0.7045 | 0.7572 | 0.4286 |
- | 0.0185 | 11.0 | 2343 | 0.3140 | 0.6941 | 0.5598 | 0.6198 | 0.9482 | 0.2532 | 0.3896 | 0.3382 | 0.7059 | 0.7601 | 0.3961 |
- | 0.0221 | 12.0 | 2556 | 0.3527 | 0.6937 | 0.5473 | 0.6119 | 0.9470 | 0.3220 | 0.3687 | 0.35 | 0.7245 | 0.7502 | 0.3308 |
- | 0.0099 | 13.0 | 2769 | 0.3332 | 0.6872 | 0.5748 | 0.6260 | 0.9493 | 0.3168 | 0.3782 | 0.3597 | 0.7391 | 0.7627 | 0.4027 |
- | 0.0062 | 14.0 | 2982 | 0.3637 | 0.7287 | 0.5465 | 0.6246 | 0.9479 | 0.25 | 0.3700 | 0.4065 | 0.7340 | 0.7526 | 0.3468 |
- | 0.0075 | 15.0 | 3195 | 0.3239 | 0.6913 | 0.5914 | 0.6374 | 0.9499 | 0.2703 | 0.3636 | 0.4030 | 0.7500 | 0.7733 | 0.4152 |
+ | 0.0215 | 1.0 | 213 | 0.2913 | 0.7026 | 0.5905 | 0.6417 | 0.9507 | 0.2832 | 0.4444 | 0.2975 | 0.6854 | 0.7788 | 0.4015 |
+ | 0.0213 | 2.0 | 426 | 0.3052 | 0.6774 | 0.5772 | 0.6233 | 0.9495 | 0.2830 | 0.3483 | 0.3231 | 0.6857 | 0.7728 | 0.3794 |
+ | 0.0288 | 3.0 | 639 | 0.3378 | 0.7061 | 0.5507 | 0.6188 | 0.9467 | 0.3077 | 0.4184 | 0.3529 | 0.6222 | 0.7532 | 0.3910 |
+ | 0.0124 | 4.0 | 852 | 0.2712 | 0.6574 | 0.6121 | 0.6340 | 0.9502 | 0.3077 | 0.4842 | 0.3167 | 0.6809 | 0.7735 | 0.3986 |
+ | 0.0208 | 5.0 | 1065 | 0.2905 | 0.7108 | 0.6063 | 0.6544 | 0.9518 | 0.3063 | 0.4286 | 0.3419 | 0.7052 | 0.7913 | 0.4223 |
+ | 0.0071 | 6.0 | 1278 | 0.3189 | 0.6756 | 0.5847 | 0.6269 | 0.9494 | 0.2759 | 0.4380 | 0.3256 | 0.6744 | 0.7781 | 0.3779 |
+ | 0.0073 | 7.0 | 1491 | 0.3593 | 0.7330 | 0.5540 | 0.6310 | 0.9476 | 0.3061 | 0.4388 | 0.3784 | 0.6946 | 0.7631 | 0.3374 |
+ | 0.0135 | 8.0 | 1704 | 0.3564 | 0.6875 | 0.5482 | 0.6100 | 0.9471 | 0.34 | 0.4179 | 0.3088 | 0.6632 | 0.7486 | 0.3695 |
+ | 0.0097 | 9.0 | 1917 | 0.3085 | 0.6598 | 0.6395 | 0.6495 | 0.9516 | 0.3111 | 0.4609 | 0.3836 | 0.7090 | 0.7906 | 0.4083 |
+ | 0.0108 | 10.0 | 2130 | 0.3045 | 0.6605 | 0.6478 | 0.6541 | 0.9509 | 0.3529 | 0.4580 | 0.3649 | 0.6897 | 0.7843 | 0.4387 |
+ | 0.013 | 11.0 | 2343 | 0.3383 | 0.6788 | 0.6179 | 0.6470 | 0.9507 | 0.2783 | 0.4248 | 0.3358 | 0.7368 | 0.7958 | 0.3655 |
+ | 0.0076 | 12.0 | 2556 | 0.3617 | 0.6920 | 0.5523 | 0.6143 | 0.9474 | 0.2708 | 0.3985 | 0.3333 | 0.6740 | 0.7566 | 0.3525 |
+ | 0.0042 | 13.0 | 2769 | 0.3747 | 0.6896 | 0.5664 | 0.6220 | 0.9473 | 0.2478 | 0.3915 | 0.3521 | 0.6561 | 0.7742 | 0.3539 |
+ | 0.0049 | 14.0 | 2982 | 0.3376 | 0.6803 | 0.6096 | 0.6430 | 0.9509 | 0.2975 | 0.4436 | 0.3624 | 0.6834 | 0.7902 | 0.3887 |
 
 
  ### Framework versions
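
Review note: the README reports Overall F1 next to Overall Precision and Overall Recall. A quick consistency check, assuming the usual definition of F1 as the harmonic mean of precision and recall, can be sketched as follows. The helper name `f1_score` is illustrative and is not taken from the training script; the inputs are the four-decimal values from the README, so the recomputed figure can differ from the reported one in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        # Matches the all-zero rows in the old training table (epoch 1).
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the README metrics in this commit.
runs = {
    "old (fine_tune_bert_output)": (0.6913, 0.5914, 0.6374),
    "new (fine_tune_bert_output_LP_FP)": (0.6803, 0.6096, 0.6430),
}

for name, (precision, recall, reported) in runs.items():
    recomputed = f1_score(precision, recall)
    print(f"{name}: recomputed F1 = {recomputed:.4f}, reported F1 = {reported:.4f}")
```

Both reported values agree with the recomputed ones to within rounding of the inputs.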
all_results.json CHANGED
@@ -1,46 +1,46 @@
  {
- "epoch": 15.0,
- "test_corporation_f1": 0.22680412371134023,
- "test_creative-work_f1": 0.3375796178343949,
- "test_group_f1": 0.3180722891566265,
- "test_location_f1": 0.5961538461538461,
- "test_loss": 0.27450963854789734,
- "test_overall_accuracy": 0.9475309541150765,
- "test_overall_f1": 0.5243938940436995,
- "test_overall_precision": 0.6208362863217576,
- "test_overall_recall": 0.4538860103626943,
- "test_person_f1": 0.6735751295336786,
- "test_product_f1": 0.1962264150943396,
- "test_runtime": 7.5404,
- "test_samples_per_second": 170.68,
- "test_steps_per_second": 10.742,
- "total_flos": 1172435714212020.0,
- "train_corporation_f1": 0.8200972447325771,
- "train_creative-work_f1": 0.8151898734177214,
- "train_group_f1": 0.8691358024691358,
- "train_location_f1": 0.9257142857142856,
- "train_loss": 0.03221079334616661,
- "train_overall_accuracy": 0.9946731734139803,
- "train_overall_f1": 0.8949512843224092,
- "train_overall_precision": 0.884453781512605,
- "train_overall_recall": 0.9057009680889208,
- "train_person_f1": 0.9489291598023064,
- "train_product_f1": 0.7822014051522247,
- "train_runtime": 13.2179,
- "train_samples_per_second": 256.773,
- "train_steps_per_second": 16.114,
- "validation_corporation_f1": 0.2736842105263158,
- "validation_creative-work_f1": 0.34146341463414637,
- "validation_group_f1": 0.3006535947712418,
- "validation_location_f1": 0.6703296703296703,
- "validation_loss": 0.2690572142601013,
- "validation_overall_accuracy": 0.9486386874331233,
- "validation_overall_f1": 0.629414394278051,
- "validation_overall_precision": 0.6815101645692159,
- "validation_overall_recall": 0.584717607973422,
- "validation_person_f1": 0.7768115942028985,
- "validation_product_f1": 0.32432432432432434,
- "validation_runtime": 6.5668,
- "validation_samples_per_second": 153.651,
- "validation_steps_per_second": 9.746
+ "epoch": 14.0,
+ "test_corporation_f1": 0.3686274509803922,
+ "test_creative-work_f1": 0.41527446300715987,
+ "test_group_f1": 0.4113475177304965,
+ "test_location_f1": 0.6386946386946387,
+ "test_loss": 0.2739429473876953,
+ "test_overall_accuracy": 0.9499198834668608,
+ "test_overall_f1": 0.5508159175493844,
+ "test_overall_precision": 0.6154830454254638,
+ "test_overall_recall": 0.49844559585492226,
+ "test_person_f1": 0.6898263027295285,
+ "test_product_f1": 0.27042253521126763,
+ "test_runtime": 8.5968,
+ "test_samples_per_second": 149.707,
+ "test_steps_per_second": 9.422,
+ "total_flos": 1093917417133104.0,
+ "train_corporation_f1": 0.9368770764119602,
+ "train_creative-work_f1": 0.8805620608899297,
+ "train_group_f1": 0.9755469755469754,
+ "train_location_f1": 0.9788867562380038,
+ "train_loss": 0.012029974721372128,
+ "train_overall_accuracy": 0.9977088917909593,
+ "train_overall_f1": 0.9611130931145201,
+ "train_overall_precision": 0.9563365282215123,
+ "train_overall_recall": 0.9659376120473288,
+ "train_person_f1": 0.9842519685039369,
+ "train_product_f1": 0.8932461873638344,
+ "train_runtime": 16.2924,
+ "train_samples_per_second": 208.318,
+ "train_steps_per_second": 13.074,
+ "validation_corporation_f1": 0.30769230769230765,
+ "validation_creative-work_f1": 0.48421052631578954,
+ "validation_group_f1": 0.31666666666666665,
+ "validation_location_f1": 0.6808510638297872,
+ "validation_loss": 0.27115458250045776,
+ "validation_overall_accuracy": 0.9502437284508382,
+ "validation_overall_f1": 0.6339784946236559,
+ "validation_overall_precision": 0.6574487065120428,
+ "validation_overall_recall": 0.6121262458471761,
+ "validation_person_f1": 0.7734553775743707,
+ "validation_product_f1": 0.3986254295532646,
+ "validation_runtime": 5.0688,
+ "validation_samples_per_second": 199.06,
+ "validation_steps_per_second": 12.626
  }
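
Review note: to see at a glance how the test-split headline metrics moved between the two versions of all_results.json, a small comparison can be sketched as below. The values are copied from the diff above; the helper `metric_deltas` and the dictionary names are illustrative only and are not part of the repository.

```python
# Test-split headline metrics copied from the old and new all_results.json.
old_test = {
    "test_overall_precision": 0.6208362863217576,
    "test_overall_recall": 0.4538860103626943,
    "test_overall_f1": 0.5243938940436995,
}
new_test = {
    "test_overall_precision": 0.6154830454254638,
    "test_overall_recall": 0.49844559585492226,
    "test_overall_f1": 0.5508159175493844,
}


def metric_deltas(before: dict, after: dict) -> dict:
    """Return after - before for every metric present in both dicts."""
    return {key: after[key] - before[key] for key in before if key in after}


for key, delta in sorted(metric_deltas(old_test, new_test).items()):
    print(f"{key}: {old_test[key]:.4f} -> {new_test[key]:.4f} ({delta:+.4f})")
```

On these numbers, test recall and F1 rise while test precision drops slightly.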
config.json CHANGED
@@ -1,5 +1,5 @@
  {
- "_name_or_path": "/home/jupyter/bertweet-base",
+ "_name_or_path": "./bertweet-base_wnut17_ner_LP",
  "architectures": [
  "RobertaForTokenClassification"
  ],
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:4f317fe36e0066121ed30bbfacd0922a679b5025542899fd1d264d7ee3755a9f
+ oid sha256:1b9c36319539aecd20f2abd9ca4cc349453802d1a83aa9b6cf08ff2cd537a51f
  size 537360049
test_results.json CHANGED
@@ -1,17 +1,17 @@
  {
- "epoch": 15.0,
- "test_corporation_f1": 0.22680412371134023,
- "test_creative-work_f1": 0.3375796178343949,
- "test_group_f1": 0.3180722891566265,
- "test_location_f1": 0.5961538461538461,
- "test_loss": 0.27450963854789734,
- "test_overall_accuracy": 0.9475309541150765,
- "test_overall_f1": 0.5243938940436995,
- "test_overall_precision": 0.6208362863217576,
- "test_overall_recall": 0.4538860103626943,
- "test_person_f1": 0.6735751295336786,
- "test_product_f1": 0.1962264150943396,
- "test_runtime": 7.5404,
- "test_samples_per_second": 170.68,
- "test_steps_per_second": 10.742
+ "epoch": 14.0,
+ "test_corporation_f1": 0.3686274509803922,
+ "test_creative-work_f1": 0.41527446300715987,
+ "test_group_f1": 0.4113475177304965,
+ "test_location_f1": 0.6386946386946387,
+ "test_loss": 0.2739429473876953,
+ "test_overall_accuracy": 0.9499198834668608,
+ "test_overall_f1": 0.5508159175493844,
+ "test_overall_precision": 0.6154830454254638,
+ "test_overall_recall": 0.49844559585492226,
+ "test_person_f1": 0.6898263027295285,
+ "test_product_f1": 0.27042253521126763,
+ "test_runtime": 8.5968,
+ "test_samples_per_second": 149.707,
+ "test_steps_per_second": 9.422
  }
tokenizer_config.json CHANGED
@@ -1 +1 @@
- {"normalization": false, "bos_token": "<s>", "eos_token": "</s>", "sep_token": "</s>", "cls_token": "<s>", "unk_token": "<unk>", "pad_token": "<pad>", "mask_token": "<mask>", "special_tokens_map_file": null, "name_or_path": "/home/jupyter/bertweet-base", "tokenizer_class": "BertweetTokenizer"}
+ {"normalization": false, "bos_token": "<s>", "eos_token": "</s>", "sep_token": "</s>", "cls_token": "<s>", "unk_token": "<unk>", "pad_token": "<pad>", "mask_token": "<mask>", "special_tokens_map_file": null, "name_or_path": "./bertweet-base_wnut17_ner_LP", "tokenizer_class": "BertweetTokenizer"}
train_results.json CHANGED
@@ -1,17 +1,17 @@
  {
- "epoch": 15.0,
- "train_corporation_f1": 0.8200972447325771,
- "train_creative-work_f1": 0.8151898734177214,
- "train_group_f1": 0.8691358024691358,
- "train_location_f1": 0.9257142857142856,
- "train_loss": 0.03221079334616661,
- "train_overall_accuracy": 0.9946731734139803,
- "train_overall_f1": 0.8949512843224092,
- "train_overall_precision": 0.884453781512605,
- "train_overall_recall": 0.9057009680889208,
- "train_person_f1": 0.9489291598023064,
- "train_product_f1": 0.7822014051522247,
- "train_runtime": 13.2179,
- "train_samples_per_second": 256.773,
- "train_steps_per_second": 16.114
+ "epoch": 14.0,
+ "train_corporation_f1": 0.9368770764119602,
+ "train_creative-work_f1": 0.8805620608899297,
+ "train_group_f1": 0.9755469755469754,
+ "train_location_f1": 0.9788867562380038,
+ "train_loss": 0.012029974721372128,
+ "train_overall_accuracy": 0.9977088917909593,
+ "train_overall_f1": 0.9611130931145201,
+ "train_overall_precision": 0.9563365282215123,
+ "train_overall_recall": 0.9659376120473288,
+ "train_person_f1": 0.9842519685039369,
+ "train_product_f1": 0.8932461873638344,
+ "train_runtime": 16.2924,
+ "train_samples_per_second": 208.318,
+ "train_steps_per_second": 13.074
  }
trainer_state.json CHANGED
@@ -1,2215 +1,2071 @@
1
  {
2
- "best_metric": 0.2690572142601013,
3
- "best_model_checkpoint": "./fine_tune_bert_output/checkpoint-1065",
4
- "epoch": 15.0,
5
- "global_step": 3195,
6
  "is_hyper_param_search": false,
7
  "is_local_process_zero": true,
8
  "is_world_process_zero": true,
9
  "log_history": [
10
  {
11
  "epoch": 0.0,
12
- "learning_rate": 1.999906103286385e-05,
13
- "loss": 2.6192,
14
  "step": 1
15
  },
16
  {
17
  "epoch": 0.05,
18
- "learning_rate": 1.99906103286385e-05,
19
- "loss": 1.6847,
20
  "step": 10
21
  },
22
  {
23
  "epoch": 0.09,
24
- "learning_rate": 1.9981220657276997e-05,
25
- "loss": 0.6569,
26
  "step": 20
27
  },
28
  {
29
  "epoch": 0.14,
30
- "learning_rate": 1.9971830985915494e-05,
31
- "loss": 0.3745,
32
  "step": 30
33
  },
34
  {
35
  "epoch": 0.19,
36
- "learning_rate": 1.9962441314553992e-05,
37
- "loss": 0.3366,
38
  "step": 40
39
  },
40
  {
41
  "epoch": 0.23,
42
- "learning_rate": 1.995305164319249e-05,
43
- "loss": 0.3634,
44
  "step": 50
45
  },
46
  {
47
  "epoch": 0.28,
48
- "learning_rate": 1.9943661971830987e-05,
49
- "loss": 0.331,
50
  "step": 60
51
  },
52
  {
53
  "epoch": 0.33,
54
- "learning_rate": 1.9934272300469485e-05,
55
- "loss": 0.3005,
56
  "step": 70
57
  },
58
  {
59
  "epoch": 0.38,
60
- "learning_rate": 1.9924882629107982e-05,
61
- "loss": 0.3371,
62
  "step": 80
63
  },
64
  {
65
  "epoch": 0.42,
66
- "learning_rate": 1.991549295774648e-05,
67
- "loss": 0.3082,
68
  "step": 90
69
  },
70
  {
71
  "epoch": 0.47,
72
- "learning_rate": 1.9906103286384977e-05,
73
- "loss": 0.3887,
74
  "step": 100
75
  },
76
  {
77
  "epoch": 0.52,
78
- "learning_rate": 1.9896713615023475e-05,
79
- "loss": 0.2901,
80
  "step": 110
81
  },
82
  {
83
  "epoch": 0.56,
84
- "learning_rate": 1.9887323943661973e-05,
85
- "loss": 0.3117,
86
  "step": 120
87
  },
88
  {
89
  "epoch": 0.61,
90
- "learning_rate": 1.9877934272300474e-05,
91
- "loss": 0.3219,
92
  "step": 130
93
  },
94
  {
95
  "epoch": 0.66,
96
- "learning_rate": 1.9868544600938968e-05,
97
- "loss": 0.3424,
98
  "step": 140
99
  },
100
  {
101
  "epoch": 0.7,
102
- "learning_rate": 1.9859154929577465e-05,
103
- "loss": 0.3169,
104
  "step": 150
105
  },
106
  {
107
  "epoch": 0.75,
108
- "learning_rate": 1.9849765258215966e-05,
109
- "loss": 0.3056,
110
  "step": 160
111
  },
112
  {
113
  "epoch": 0.8,
114
- "learning_rate": 1.984037558685446e-05,
115
- "loss": 0.298,
116
  "step": 170
117
  },
118
  {
119
  "epoch": 0.85,
120
- "learning_rate": 1.9830985915492958e-05,
121
- "loss": 0.2622,
122
  "step": 180
123
  },
124
  {
125
  "epoch": 0.89,
126
- "learning_rate": 1.982159624413146e-05,
127
- "loss": 0.2267,
128
  "step": 190
129
  },
130
  {
131
  "epoch": 0.94,
132
- "learning_rate": 1.9812206572769953e-05,
133
- "loss": 0.2567,
134
  "step": 200
135
  },
136
  {
137
  "epoch": 0.99,
138
- "learning_rate": 1.9802816901408454e-05,
139
- "loss": 0.2691,
140
  "step": 210
141
  },
142
  {
143
  "epoch": 1.0,
144
- "eval_corporation_f1": 0.0,
145
- "eval_creative-work_f1": 0.0,
146
- "eval_group_f1": 0.0,
147
- "eval_location_f1": 0.0,
148
- "eval_loss": 0.4035243093967438,
149
- "eval_overall_accuracy": 0.8979312804660563,
150
- "eval_overall_f1": 0.0,
151
- "eval_overall_precision": 0.0,
152
- "eval_overall_recall": 0.0,
153
- "eval_person_f1": 0.0,
154
- "eval_product_f1": 0.0,
155
- "eval_runtime": 5.2489,
156
- "eval_samples_per_second": 192.229,
157
- "eval_steps_per_second": 12.193,
158
  "step": 213
159
  },
160
  {
161
  "epoch": 1.03,
162
- "learning_rate": 1.9793427230046952e-05,
163
- "loss": 0.2481,
164
  "step": 220
165
  },
166
  {
167
  "epoch": 1.08,
168
- "learning_rate": 1.9784037558685446e-05,
169
- "loss": 0.2141,
170
  "step": 230
171
  },
172
  {
173
  "epoch": 1.13,
174
- "learning_rate": 1.9774647887323947e-05,
175
- "loss": 0.201,
176
  "step": 240
177
  },
178
  {
179
  "epoch": 1.17,
180
- "learning_rate": 1.9765258215962445e-05,
181
- "loss": 0.211,
182
  "step": 250
183
  },
184
  {
185
  "epoch": 1.22,
186
- "learning_rate": 1.975586854460094e-05,
187
- "loss": 0.2825,
188
  "step": 260
189
  },
190
  {
191
  "epoch": 1.27,
192
- "learning_rate": 1.974647887323944e-05,
193
- "loss": 0.1779,
194
  "step": 270
195
  },
196
  {
197
  "epoch": 1.31,
198
- "learning_rate": 1.9737089201877937e-05,
199
- "loss": 0.1794,
200
  "step": 280
201
  },
202
  {
203
  "epoch": 1.36,
204
- "learning_rate": 1.972769953051643e-05,
205
- "loss": 0.1838,
206
  "step": 290
207
  },
208
  {
209
  "epoch": 1.41,
210
- "learning_rate": 1.9718309859154933e-05,
211
- "loss": 0.1716,
212
  "step": 300
213
  },
214
  {
215
  "epoch": 1.46,
216
- "learning_rate": 1.9708920187793427e-05,
217
- "loss": 0.215,
218
  "step": 310
219
  },
220
  {
221
  "epoch": 1.5,
222
- "learning_rate": 1.9699530516431928e-05,
223
- "loss": 0.1581,
224
  "step": 320
225
  },
226
  {
227
  "epoch": 1.55,
228
- "learning_rate": 1.9690140845070425e-05,
229
- "loss": 0.2208,
230
  "step": 330
231
  },
232
  {
233
  "epoch": 1.6,
234
- "learning_rate": 1.968075117370892e-05,
235
- "loss": 0.176,
236
  "step": 340
237
  },
238
  {
239
  "epoch": 1.64,
240
- "learning_rate": 1.967136150234742e-05,
241
- "loss": 0.1615,
242
  "step": 350
243
  },
244
  {
245
  "epoch": 1.69,
246
- "learning_rate": 1.9661971830985918e-05,
247
- "loss": 0.1776,
248
  "step": 360
249
  },
250
  {
251
  "epoch": 1.74,
252
- "learning_rate": 1.9652582159624412e-05,
253
- "loss": 0.181,
254
  "step": 370
255
  },
256
  {
257
  "epoch": 1.78,
258
- "learning_rate": 1.9643192488262913e-05,
259
- "loss": 0.1619,
260
  "step": 380
261
  },
262
  {
263
  "epoch": 1.83,
264
- "learning_rate": 1.963380281690141e-05,
265
- "loss": 0.1635,
266
  "step": 390
267
  },
268
  {
269
  "epoch": 1.88,
270
- "learning_rate": 1.962441314553991e-05,
271
- "loss": 0.1613,
272
  "step": 400
273
  },
274
  {
275
  "epoch": 1.92,
276
- "learning_rate": 1.9615023474178406e-05,
277
- "loss": 0.1767,
278
  "step": 410
279
  },
280
  {
281
  "epoch": 1.97,
282
- "learning_rate": 1.9605633802816904e-05,
283
- "loss": 0.1604,
284
  "step": 420
285
  },
286
  {
287
  "epoch": 2.0,
288
- "eval_corporation_f1": 0.0,
289
- "eval_creative-work_f1": 0.0,
290
- "eval_group_f1": 0.0,
291
- "eval_location_f1": 0.35341365461847396,
292
- "eval_loss": 0.3053900897502899,
293
- "eval_overall_accuracy": 0.9324099393651171,
294
- "eval_overall_f1": 0.4997506234413965,
295
- "eval_overall_precision": 0.6254681647940075,
296
- "eval_overall_recall": 0.4161129568106312,
297
- "eval_person_f1": 0.6877351392024078,
298
- "eval_product_f1": 0.0,
299
- "eval_runtime": 5.2536,
300
- "eval_samples_per_second": 192.06,
301
- "eval_steps_per_second": 12.182,
302
  "step": 426
303
  },
304
  {
305
  "epoch": 2.02,
306
- "learning_rate": 1.95962441314554e-05,
307
- "loss": 0.1743,
308
  "step": 430
309
  },
310
  {
311
  "epoch": 2.07,
312
- "learning_rate": 1.95868544600939e-05,
313
- "loss": 0.1375,
314
  "step": 440
315
  },
316
  {
317
  "epoch": 2.11,
318
- "learning_rate": 1.9577464788732396e-05,
319
- "loss": 0.1216,
320
  "step": 450
321
  },
322
  {
323
  "epoch": 2.16,
324
- "learning_rate": 1.9568075117370894e-05,
325
- "loss": 0.1184,
326
  "step": 460
327
  },
328
  {
329
  "epoch": 2.21,
330
- "learning_rate": 1.955868544600939e-05,
331
- "loss": 0.1318,
332
  "step": 470
333
  },
334
  {
335
  "epoch": 2.25,
336
- "learning_rate": 1.954929577464789e-05,
337
- "loss": 0.1388,
338
  "step": 480
339
  },
340
  {
341
  "epoch": 2.3,
342
- "learning_rate": 1.9539906103286387e-05,
343
- "loss": 0.0979,
344
  "step": 490
345
  },
346
  {
347
  "epoch": 2.35,
348
- "learning_rate": 1.9530516431924884e-05,
349
- "loss": 0.1566,
350
  "step": 500
351
  },
352
  {
353
  "epoch": 2.39,
354
- "learning_rate": 1.9521126760563382e-05,
355
- "loss": 0.112,
356
  "step": 510
357
  },
358
  {
359
  "epoch": 2.44,
360
- "learning_rate": 1.951173708920188e-05,
361
- "loss": 0.1084,
362
  "step": 520
363
  },
364
  {
365
  "epoch": 2.49,
366
- "learning_rate": 1.9502347417840377e-05,
367
- "loss": 0.1319,
368
  "step": 530
369
  },
370
  {
371
  "epoch": 2.54,
372
- "learning_rate": 1.9492957746478875e-05,
373
- "loss": 0.0861,
374
  "step": 540
375
  },
376
  {
377
  "epoch": 2.58,
378
- "learning_rate": 1.9483568075117372e-05,
379
- "loss": 0.078,
380
  "step": 550
381
  },
382
  {
383
  "epoch": 2.63,
384
- "learning_rate": 1.947417840375587e-05,
385
- "loss": 0.1164,
386
  "step": 560
387
  },
388
  {
389
  "epoch": 2.68,
390
- "learning_rate": 1.9464788732394367e-05,
391
- "loss": 0.1113,
392
  "step": 570
393
  },
394
  {
395
  "epoch": 2.72,
396
- "learning_rate": 1.9455399061032865e-05,
397
- "loss": 0.1429,
398
  "step": 580
399
  },
400
  {
401
  "epoch": 2.77,
402
- "learning_rate": 1.9446009389671362e-05,
403
- "loss": 0.0821,
404
  "step": 590
405
  },
406
  {
407
  "epoch": 2.82,
408
- "learning_rate": 1.943661971830986e-05,
409
- "loss": 0.1092,
410
  "step": 600
411
  },
412
  {
413
  "epoch": 2.86,
414
- "learning_rate": 1.9427230046948358e-05,
415
- "loss": 0.088,
416
  "step": 610
417
  },
418
  {
419
  "epoch": 2.91,
420
- "learning_rate": 1.9417840375586855e-05,
421
- "loss": 0.0994,
422
  "step": 620
423
  },
424
  {
425
  "epoch": 2.96,
426
- "learning_rate": 1.9408450704225356e-05,
427
- "loss": 0.1118,
428
  "step": 630
429
  },
430
  {
431
  "epoch": 3.0,
432
- "eval_corporation_f1": 0.19607843137254902,
433
- "eval_creative-work_f1": 0.11640211640211641,
434
- "eval_group_f1": 0.15384615384615383,
435
- "eval_location_f1": 0.5803108808290156,
436
- "eval_loss": 0.2864352762699127,
437
- "eval_overall_accuracy": 0.9404351444536916,
438
- "eval_overall_f1": 0.5469667318982387,
439
- "eval_overall_precision": 0.6654761904761904,
440
- "eval_overall_recall": 0.4642857142857143,
441
- "eval_person_f1": 0.7220902612826603,
442
- "eval_product_f1": 0.18652849740932645,
443
- "eval_runtime": 6.5005,
444
- "eval_samples_per_second": 155.22,
445
- "eval_steps_per_second": 9.845,
446
  "step": 639
447
  },
448
  {
449
  "epoch": 3.0,
450
- "learning_rate": 1.939906103286385e-05,
451
- "loss": 0.0867,
452
  "step": 640
453
  },
454
  {
455
  "epoch": 3.05,
456
- "learning_rate": 1.9389671361502348e-05,
457
- "loss": 0.0856,
458
  "step": 650
459
  },
460
  {
461
  "epoch": 3.1,
462
- "learning_rate": 1.938028169014085e-05,
463
- "loss": 0.0787,
464
  "step": 660
465
  },
466
  {
467
  "epoch": 3.15,
468
- "learning_rate": 1.9370892018779343e-05,
469
- "loss": 0.1159,
470
  "step": 670
471
  },
472
  {
473
  "epoch": 3.19,
474
- "learning_rate": 1.936150234741784e-05,
475
- "loss": 0.0728,
476
  "step": 680
477
  },
478
  {
479
  "epoch": 3.24,
480
- "learning_rate": 1.935211267605634e-05,
481
- "loss": 0.0641,
482
  "step": 690
483
  },
484
  {
485
  "epoch": 3.29,
486
- "learning_rate": 1.9342723004694836e-05,
487
- "loss": 0.0787,
488
  "step": 700
489
  },
490
  {
491
  "epoch": 3.33,
492
- "learning_rate": 1.9333333333333333e-05,
493
- "loss": 0.0639,
494
  "step": 710
495
  },
496
  {
497
  "epoch": 3.38,
498
- "learning_rate": 1.9323943661971834e-05,
499
- "loss": 0.0604,
500
  "step": 720
501
  },
502
  {
503
  "epoch": 3.43,
504
- "learning_rate": 1.931455399061033e-05,
505
- "loss": 0.069,
506
  "step": 730
507
  },
508
  {
509
  "epoch": 3.47,
510
- "learning_rate": 1.930516431924883e-05,
511
- "loss": 0.066,
512
  "step": 740
513
  },
514
  {
515
  "epoch": 3.52,
516
- "learning_rate": 1.9295774647887327e-05,
517
- "loss": 0.0645,
518
  "step": 750
519
  },
520
  {
521
  "epoch": 3.57,
522
- "learning_rate": 1.928638497652582e-05,
523
- "loss": 0.0656,
524
  "step": 760
525
  },
526
  {
527
  "epoch": 3.62,
528
- "learning_rate": 1.9276995305164322e-05,
529
- "loss": 0.0598,
530
  "step": 770
531
  },
532
  {
533
  "epoch": 3.66,
534
- "learning_rate": 1.926760563380282e-05,
535
- "loss": 0.0791,
536
  "step": 780
537
  },
538
  {
539
  "epoch": 3.71,
540
- "learning_rate": 1.9258215962441314e-05,
541
- "loss": 0.0723,
542
  "step": 790
543
  },
544
  {
545
  "epoch": 3.76,
546
- "learning_rate": 1.9248826291079815e-05,
547
- "loss": 0.0898,
548
  "step": 800
549
  },
550
  {
551
  "epoch": 3.8,
552
- "learning_rate": 1.9239436619718313e-05,
553
- "loss": 0.0543,
554
  "step": 810
555
  },
556
  {
557
  "epoch": 3.85,
558
- "learning_rate": 1.923004694835681e-05,
559
- "loss": 0.0797,
560
  "step": 820
561
  },
562
  {
563
  "epoch": 3.9,
564
- "learning_rate": 1.9220657276995308e-05,
565
- "loss": 0.0749,
566
  "step": 830
567
  },
568
  {
569
  "epoch": 3.94,
570
- "learning_rate": 1.9211267605633805e-05,
571
- "loss": 0.0547,
572
  "step": 840
573
  },
574
  {
575
  "epoch": 3.99,
576
- "learning_rate": 1.9201877934272303e-05,
577
- "loss": 0.0524,
578
  "step": 850
579
  },
580
  {
581
  "epoch": 4.0,
582
- "eval_corporation_f1": 0.20168067226890757,
583
- "eval_creative-work_f1": 0.32727272727272727,
584
- "eval_group_f1": 0.24719101123595505,
585
- "eval_location_f1": 0.6521739130434783,
586
- "eval_loss": 0.289055734872818,
587
- "eval_overall_accuracy": 0.9442396861253121,
588
- "eval_overall_f1": 0.5842155919153033,
589
- "eval_overall_precision": 0.6945080091533181,
590
- "eval_overall_recall": 0.5041528239202658,
591
- "eval_person_f1": 0.7365892714171338,
592
- "eval_product_f1": 0.25806451612903225,
593
- "eval_runtime": 5.2651,
594
- "eval_samples_per_second": 191.641,
595
- "eval_steps_per_second": 12.156,
596
  "step": 852
597
  },
598
  {
599
  "epoch": 4.04,
600
- "learning_rate": 1.91924882629108e-05,
601
- "loss": 0.0596,
602
  "step": 860
603
  },
604
  {
605
  "epoch": 4.08,
606
- "learning_rate": 1.9183098591549298e-05,
607
- "loss": 0.0523,
608
  "step": 870
609
  },
610
  {
611
  "epoch": 4.13,
612
- "learning_rate": 1.9173708920187796e-05,
613
- "loss": 0.0608,
614
  "step": 880
615
  },
616
  {
617
  "epoch": 4.18,
618
- "learning_rate": 1.9164319248826293e-05,
619
- "loss": 0.0512,
620
  "step": 890
621
  },
622
  {
623
  "epoch": 4.23,
624
- "learning_rate": 1.9154929577464788e-05,
625
- "loss": 0.0434,
626
  "step": 900
627
  },
628
  {
629
  "epoch": 4.27,
630
- "learning_rate": 1.914553990610329e-05,
631
- "loss": 0.046,
632
  "step": 910
633
  },
634
  {
635
  "epoch": 4.32,
636
- "learning_rate": 1.9136150234741786e-05,
637
- "loss": 0.0677,
638
  "step": 920
639
  },
640
  {
641
  "epoch": 4.37,
642
- "learning_rate": 1.9126760563380284e-05,
643
- "loss": 0.052,
644
  "step": 930
645
  },
646
  {
647
  "epoch": 4.41,
648
- "learning_rate": 1.911737089201878e-05,
649
- "loss": 0.04,
650
  "step": 940
651
  },
652
  {
653
  "epoch": 4.46,
654
- "learning_rate": 1.910798122065728e-05,
655
- "loss": 0.0299,
656
  "step": 950
657
  },
658
  {
659
  "epoch": 4.51,
660
- "learning_rate": 1.9098591549295776e-05,
661
- "loss": 0.0736,
662
  "step": 960
663
  },
664
  {
665
  "epoch": 4.55,
666
- "learning_rate": 1.9089201877934274e-05,
667
- "loss": 0.0628,
668
  "step": 970
669
  },
670
  {
671
  "epoch": 4.6,
672
- "learning_rate": 1.907981220657277e-05,
673
- "loss": 0.0373,
674
  "step": 980
675
  },
676
  {
677
  "epoch": 4.65,
678
- "learning_rate": 1.907042253521127e-05,
679
- "loss": 0.0488,
680
  "step": 990
681
  },
682
  {
683
  "epoch": 4.69,
684
- "learning_rate": 1.9061032863849767e-05,
685
- "loss": 0.0612,
686
  "step": 1000
687
  },
688
  {
689
  "epoch": 4.74,
690
- "learning_rate": 1.9051643192488264e-05,
691
- "loss": 0.0329,
692
  "step": 1010
693
  },
694
  {
695
  "epoch": 4.79,
696
- "learning_rate": 1.9042253521126762e-05,
697
- "loss": 0.0408,
698
  "step": 1020
699
  },
700
  {
701
  "epoch": 4.84,
702
- "learning_rate": 1.903286384976526e-05,
703
- "loss": 0.0398,
704
  "step": 1030
705
  },
706
  {
707
  "epoch": 4.88,
708
- "learning_rate": 1.9023474178403757e-05,
709
- "loss": 0.0518,
710
  "step": 1040
711
  },
712
  {
713
  "epoch": 4.93,
714
- "learning_rate": 1.9014084507042255e-05,
715
- "loss": 0.0479,
716
  "step": 1050
717
  },
718
  {
719
  "epoch": 4.98,
720
- "learning_rate": 1.9004694835680752e-05,
721
- "loss": 0.0446,
722
  "step": 1060
723
  },
724
  {
725
  "epoch": 5.0,
726
- "eval_corporation_f1": 0.2736842105263158,
727
- "eval_creative-work_f1": 0.34146341463414637,
728
- "eval_group_f1": 0.3006535947712418,
729
- "eval_location_f1": 0.6703296703296703,
730
- "eval_loss": 0.2690572142601013,
731
- "eval_overall_accuracy": 0.9486386874331233,
732
- "eval_overall_f1": 0.629414394278051,
733
- "eval_overall_precision": 0.6815101645692159,
734
- "eval_overall_recall": 0.584717607973422,
735
- "eval_person_f1": 0.7768115942028985,
736
- "eval_product_f1": 0.32432432432432434,
737
- "eval_runtime": 6.4238,
738
- "eval_samples_per_second": 157.073,
739
- "eval_steps_per_second": 9.963,
740
  "step": 1065
741
  },
742
  {
743
  "epoch": 5.02,
744
- "learning_rate": 1.899530516431925e-05,
745
- "loss": 0.0301,
746
  "step": 1070
747
  },
748
  {
749
  "epoch": 5.07,
750
- "learning_rate": 1.8985915492957747e-05,
751
- "loss": 0.0341,
752
  "step": 1080
753
  },
754
  {
755
  "epoch": 5.12,
756
- "learning_rate": 1.8976525821596245e-05,
757
- "loss": 0.032,
758
  "step": 1090
759
  },
760
  {
761
  "epoch": 5.16,
762
- "learning_rate": 1.8967136150234743e-05,
763
- "loss": 0.0359,
764
  "step": 1100
765
  },
766
  {
767
  "epoch": 5.21,
768
- "learning_rate": 1.895774647887324e-05,
769
- "loss": 0.046,
770
  "step": 1110
771
  },
772
  {
773
  "epoch": 5.26,
774
- "learning_rate": 1.8948356807511738e-05,
775
- "loss": 0.0413,
776
  "step": 1120
777
  },
778
  {
779
  "epoch": 5.31,
780
- "learning_rate": 1.8938967136150235e-05,
781
- "loss": 0.0235,
782
  "step": 1130
783
  },
784
  {
785
  "epoch": 5.35,
786
- "learning_rate": 1.8929577464788733e-05,
787
- "loss": 0.0388,
788
  "step": 1140
789
  },
790
  {
791
  "epoch": 5.4,
792
- "learning_rate": 1.892018779342723e-05,
793
- "loss": 0.0409,
794
  "step": 1150
795
  },
796
  {
797
  "epoch": 5.45,
798
- "learning_rate": 1.891079812206573e-05,
799
- "loss": 0.0517,
800
  "step": 1160
801
  },
802
  {
803
  "epoch": 5.49,
804
- "learning_rate": 1.8901408450704226e-05,
805
- "loss": 0.029,
806
  "step": 1170
807
  },
808
  {
809
  "epoch": 5.54,
810
- "learning_rate": 1.8892018779342723e-05,
811
- "loss": 0.0248,
812
  "step": 1180
813
  },
814
  {
815
  "epoch": 5.59,
816
- "learning_rate": 1.8882629107981224e-05,
817
- "loss": 0.0347,
818
  "step": 1190
819
  },
820
  {
821
  "epoch": 5.63,
822
- "learning_rate": 1.887323943661972e-05,
823
- "loss": 0.0392,
824
  "step": 1200
825
  },
826
  {
827
  "epoch": 5.68,
828
- "learning_rate": 1.8863849765258216e-05,
829
- "loss": 0.0308,
830
  "step": 1210
831
  },
832
  {
833
  "epoch": 5.73,
834
- "learning_rate": 1.8854460093896717e-05,
835
- "loss": 0.0534,
836
  "step": 1220
837
  },
838
  {
839
  "epoch": 5.77,
840
- "learning_rate": 1.884507042253521e-05,
841
- "loss": 0.0555,
842
  "step": 1230
843
  },
844
  {
845
  "epoch": 5.82,
846
- "learning_rate": 1.8835680751173712e-05,
847
- "loss": 0.0461,
848
  "step": 1240
849
  },
850
  {
851
  "epoch": 5.87,
852
- "learning_rate": 1.882629107981221e-05,
853
- "loss": 0.0424,
854
  "step": 1250
855
  },
856
  {
857
  "epoch": 5.92,
858
- "learning_rate": 1.8816901408450704e-05,
859
- "loss": 0.0286,
860
  "step": 1260
861
  },
862
  {
863
  "epoch": 5.96,
864
- "learning_rate": 1.8807511737089205e-05,
865
- "loss": 0.0296,
866
  "step": 1270
867
  },
868
  {
869
  "epoch": 6.0,
870
- "eval_corporation_f1": 0.3064516129032258,
871
- "eval_creative-work_f1": 0.37656903765690375,
872
- "eval_group_f1": 0.33333333333333337,
873
- "eval_location_f1": 0.7,
874
- "eval_loss": 0.27394580841064453,
875
- "eval_overall_accuracy": 0.9479253358696944,
876
- "eval_overall_f1": 0.6125962845491617,
877
- "eval_overall_precision": 0.6739780658025922,
878
- "eval_overall_recall": 0.5614617940199336,
879
- "eval_person_f1": 0.7581803671189147,
880
- "eval_product_f1": 0.3471698113207547,
881
- "eval_runtime": 6.7404,
882
- "eval_samples_per_second": 149.695,
883
- "eval_steps_per_second": 9.495,
884
  "step": 1278
885
  },
886
  {
887
  "epoch": 6.01,
888
- "learning_rate": 1.8798122065727702e-05,
889
- "loss": 0.0288,
890
  "step": 1280
891
  },
892
  {
893
  "epoch": 6.06,
894
- "learning_rate": 1.8788732394366197e-05,
895
- "loss": 0.0235,
896
  "step": 1290
897
  },
898
  {
899
  "epoch": 6.1,
900
- "learning_rate": 1.8779342723004698e-05,
901
- "loss": 0.0211,
902
  "step": 1300
903
  },
904
  {
905
  "epoch": 6.15,
906
- "learning_rate": 1.8769953051643195e-05,
907
- "loss": 0.0359,
908
  "step": 1310
909
  },
910
  {
911
  "epoch": 6.2,
912
- "learning_rate": 1.876056338028169e-05,
913
- "loss": 0.0264,
914
  "step": 1320
915
  },
916
  {
917
  "epoch": 6.24,
918
- "learning_rate": 1.875117370892019e-05,
919
- "loss": 0.0288,
920
  "step": 1330
921
  },
922
  {
923
  "epoch": 6.29,
924
- "learning_rate": 1.8741784037558688e-05,
925
- "loss": 0.0328,
926
  "step": 1340
927
  },
928
  {
929
  "epoch": 6.34,
930
- "learning_rate": 1.8732394366197186e-05,
931
- "loss": 0.0402,
932
  "step": 1350
933
  },
934
  {
935
  "epoch": 6.38,
936
- "learning_rate": 1.8723004694835683e-05,
937
- "loss": 0.0216,
938
  "step": 1360
939
  },
940
  {
941
  "epoch": 6.43,
942
- "learning_rate": 1.871361502347418e-05,
943
- "loss": 0.0288,
944
  "step": 1370
945
  },
946
  {
947
  "epoch": 6.48,
948
- "learning_rate": 1.870422535211268e-05,
949
- "loss": 0.0204,
950
  "step": 1380
951
  },
952
  {
953
  "epoch": 6.53,
954
- "learning_rate": 1.8694835680751176e-05,
955
- "loss": 0.0402,
956
  "step": 1390
957
  },
958
  {
959
  "epoch": 6.57,
960
- "learning_rate": 1.8685446009389673e-05,
961
- "loss": 0.0274,
962
  "step": 1400
963
  },
964
  {
965
  "epoch": 6.62,
966
- "learning_rate": 1.867605633802817e-05,
967
- "loss": 0.0393,
968
  "step": 1410
969
  },
970
  {
971
  "epoch": 6.67,
972
- "learning_rate": 1.866666666666667e-05,
973
- "loss": 0.0237,
974
  "step": 1420
975
  },
976
  {
977
  "epoch": 6.71,
978
- "learning_rate": 1.8657276995305166e-05,
979
- "loss": 0.0262,
980
  "step": 1430
981
  },
982
  {
983
  "epoch": 6.76,
984
- "learning_rate": 1.8647887323943664e-05,
985
- "loss": 0.0226,
986
  "step": 1440
987
  },
988
  {
989
  "epoch": 6.81,
990
- "learning_rate": 1.863849765258216e-05,
991
- "loss": 0.0334,
992
  "step": 1450
993
  },
994
  {
995
  "epoch": 6.85,
996
- "learning_rate": 1.862910798122066e-05,
997
- "loss": 0.0451,
998
  "step": 1460
999
  },
1000
  {
1001
  "epoch": 6.9,
1002
- "learning_rate": 1.8619718309859157e-05,
1003
- "loss": 0.0134,
1004
  "step": 1470
1005
  },
1006
  {
1007
  "epoch": 6.95,
1008
- "learning_rate": 1.8610328638497654e-05,
1009
- "loss": 0.0252,
1010
  "step": 1480
1011
  },
1012
  {
1013
  "epoch": 7.0,
1014
- "learning_rate": 1.8600938967136152e-05,
1015
- "loss": 0.0261,
1016
  "step": 1490
1017
  },
1018
  {
1019
  "epoch": 7.0,
1020
- "eval_corporation_f1": 0.22916666666666666,
1021
- "eval_creative-work_f1": 0.33497536945812806,
1022
- "eval_group_f1": 0.304,
1023
- "eval_location_f1": 0.6368715083798884,
1024
- "eval_loss": 0.3150090277194977,
1025
- "eval_overall_accuracy": 0.9456663892521697,
1026
- "eval_overall_f1": 0.6070763500931099,
1027
- "eval_overall_precision": 0.690677966101695,
1028
- "eval_overall_recall": 0.5415282392026578,
1029
- "eval_person_f1": 0.7547456340167047,
1030
- "eval_product_f1": 0.29824561403508776,
1031
- "eval_runtime": 6.8397,
1032
- "eval_samples_per_second": 147.521,
1033
- "eval_steps_per_second": 9.357,
1034
  "step": 1491
1035
  },
1036
  {
1037
  "epoch": 7.04,
1038
- "learning_rate": 1.859154929577465e-05,
1039
- "loss": 0.0539,
1040
  "step": 1500
1041
  },
1042
  {
1043
  "epoch": 7.09,
1044
- "learning_rate": 1.8582159624413147e-05,
1045
- "loss": 0.0173,
1046
  "step": 1510
1047
  },
1048
  {
1049
  "epoch": 7.14,
1050
- "learning_rate": 1.8572769953051644e-05,
1051
- "loss": 0.0261,
1052
  "step": 1520
1053
  },
1054
  {
1055
  "epoch": 7.18,
1056
- "learning_rate": 1.8563380281690142e-05,
1057
- "loss": 0.0199,
1058
  "step": 1530
1059
  },
1060
  {
1061
  "epoch": 7.23,
1062
- "learning_rate": 1.855399061032864e-05,
1063
- "loss": 0.0233,
1064
  "step": 1540
1065
  },
1066
  {
1067
  "epoch": 7.28,
1068
- "learning_rate": 1.8544600938967137e-05,
1069
- "loss": 0.024,
1070
  "step": 1550
1071
  },
1072
  {
1073
  "epoch": 7.32,
1074
- "learning_rate": 1.8535211267605635e-05,
1075
- "loss": 0.0141,
1076
  "step": 1560
1077
  },
1078
  {
1079
  "epoch": 7.37,
1080
- "learning_rate": 1.8525821596244132e-05,
1081
- "loss": 0.0257,
1082
  "step": 1570
1083
  },
1084
  {
1085
  "epoch": 7.42,
1086
- "learning_rate": 1.851643192488263e-05,
1087
- "loss": 0.0249,
1088
  "step": 1580
1089
  },
1090
  {
1091
  "epoch": 7.46,
1092
- "learning_rate": 1.8507042253521128e-05,
1093
- "loss": 0.0244,
1094
  "step": 1590
1095
  },
1096
  {
1097
  "epoch": 7.51,
1098
- "learning_rate": 1.8497652582159625e-05,
1099
- "loss": 0.0164,
1100
  "step": 1600
1101
  },
1102
  {
1103
  "epoch": 7.56,
1104
- "learning_rate": 1.8488262910798123e-05,
1105
- "loss": 0.0161,
1106
  "step": 1610
1107
  },
1108
  {
1109
  "epoch": 7.61,
1110
- "learning_rate": 1.847887323943662e-05,
1111
- "loss": 0.0387,
1112
  "step": 1620
1113
  },
1114
  {
1115
  "epoch": 7.65,
1116
- "learning_rate": 1.8469483568075118e-05,
1117
- "loss": 0.0315,
1118
  "step": 1630
1119
  },
1120
  {
1121
  "epoch": 7.7,
1122
- "learning_rate": 1.8460093896713615e-05,
1123
- "loss": 0.0232,
1124
  "step": 1640
1125
  },
1126
  {
1127
  "epoch": 7.75,
1128
- "learning_rate": 1.8450704225352113e-05,
1129
- "loss": 0.0337,
1130
  "step": 1650
1131
  },
1132
  {
1133
  "epoch": 7.79,
1134
- "learning_rate": 1.8441314553990614e-05,
1135
- "loss": 0.0195,
1136
  "step": 1660
1137
  },
1138
  {
1139
  "epoch": 7.84,
1140
- "learning_rate": 1.8431924882629108e-05,
1141
- "loss": 0.0155,
1142
  "step": 1670
1143
  },
1144
  {
1145
  "epoch": 7.89,
1146
- "learning_rate": 1.8422535211267606e-05,
1147
- "loss": 0.0338,
1148
  "step": 1680
1149
  },
1150
  {
1151
  "epoch": 7.93,
1152
- "learning_rate": 1.8413145539906107e-05,
1153
- "loss": 0.0211,
1154
  "step": 1690
1155
  },
1156
  {
1157
  "epoch": 7.98,
1158
- "learning_rate": 1.84037558685446e-05,
1159
- "loss": 0.0193,
1160
  "step": 1700
1161
  },
1162
  {
1163
  "epoch": 8.0,
1164
- "eval_corporation_f1": 0.288659793814433,
1165
- "eval_creative-work_f1": 0.36206896551724144,
1166
- "eval_group_f1": 0.3676470588235294,
1167
- "eval_location_f1": 0.7474747474747475,
1168
- "eval_loss": 0.292193740606308,
1169
- "eval_overall_accuracy": 0.9496492688146475,
1170
- "eval_overall_f1": 0.6309577848388561,
1171
- "eval_overall_precision": 0.6956956956956957,
1172
- "eval_overall_recall": 0.5772425249169435,
1173
- "eval_person_f1": 0.7644726407613005,
1174
- "eval_product_f1": 0.4157706093189964,
1175
- "eval_runtime": 6.4302,
1176
- "eval_samples_per_second": 156.916,
1177
- "eval_steps_per_second": 9.953,
1178
  "step": 1704
1179
  },
1180
  {
1181
  "epoch": 8.03,
1182
- "learning_rate": 1.83943661971831e-05,
1183
- "loss": 0.014,
1184
  "step": 1710
1185
  },
1186
  {
1187
  "epoch": 8.08,
1188
- "learning_rate": 1.83849765258216e-05,
1189
- "loss": 0.011,
1190
  "step": 1720
1191
  },
1192
  {
1193
  "epoch": 8.12,
1194
- "learning_rate": 1.8375586854460094e-05,
1195
- "loss": 0.0326,
1196
  "step": 1730
1197
  },
1198
  {
1199
  "epoch": 8.17,
1200
- "learning_rate": 1.836619718309859e-05,
1201
- "loss": 0.0183,
1202
  "step": 1740
1203
  },
1204
  {
1205
  "epoch": 8.22,
1206
- "learning_rate": 1.8356807511737092e-05,
1207
- "loss": 0.0117,
1208
  "step": 1750
1209
  },
1210
  {
1211
  "epoch": 8.26,
1212
- "learning_rate": 1.8347417840375586e-05,
1213
- "loss": 0.0159,
1214
  "step": 1760
1215
  },
1216
  {
1217
  "epoch": 8.31,
1218
- "learning_rate": 1.8338028169014087e-05,
1219
- "loss": 0.0347,
1220
  "step": 1770
1221
  },
1222
  {
1223
  "epoch": 8.36,
1224
- "learning_rate": 1.8328638497652585e-05,
1225
- "loss": 0.0177,
1226
  "step": 1780
1227
  },
1228
  {
1229
  "epoch": 8.4,
1230
- "learning_rate": 1.831924882629108e-05,
1231
- "loss": 0.0176,
1232
  "step": 1790
1233
  },
1234
  {
1235
  "epoch": 8.45,
1236
- "learning_rate": 1.830985915492958e-05,
1237
- "loss": 0.0088,
1238
  "step": 1800
1239
  },
1240
  {
1241
  "epoch": 8.5,
1242
- "learning_rate": 1.8300469483568078e-05,
1243
- "loss": 0.0185,
1244
  "step": 1810
1245
  },
1246
  {
1247
  "epoch": 8.54,
1248
- "learning_rate": 1.8291079812206572e-05,
1249
- "loss": 0.0313,
1250
  "step": 1820
1251
  },
1252
  {
1253
  "epoch": 8.59,
1254
- "learning_rate": 1.8281690140845073e-05,
1255
- "loss": 0.0252,
1256
  "step": 1830
1257
  },
1258
  {
1259
  "epoch": 8.64,
1260
- "learning_rate": 1.827230046948357e-05,
1261
- "loss": 0.0212,
1262
  "step": 1840
1263
  },
1264
  {
1265
  "epoch": 8.69,
1266
- "learning_rate": 1.8262910798122068e-05,
1267
- "loss": 0.0309,
1268
  "step": 1850
1269
  },
1270
  {
1271
  "epoch": 8.73,
1272
- "learning_rate": 1.8253521126760566e-05,
1273
- "loss": 0.0165,
1274
  "step": 1860
1275
  },
1276
  {
1277
  "epoch": 8.78,
1278
- "learning_rate": 1.8244131455399063e-05,
1279
- "loss": 0.0151,
1280
  "step": 1870
1281
  },
1282
  {
1283
  "epoch": 8.83,
1284
- "learning_rate": 1.823474178403756e-05,
1285
- "loss": 0.0107,
1286
  "step": 1880
1287
  },
1288
  {
1289
  "epoch": 8.87,
1290
- "learning_rate": 1.822535211267606e-05,
1291
- "loss": 0.0144,
1292
  "step": 1890
1293
  },
1294
  {
1295
  "epoch": 8.92,
1296
- "learning_rate": 1.8215962441314556e-05,
1297
- "loss": 0.0163,
1298
  "step": 1900
1299
  },
1300
  {
1301
  "epoch": 8.97,
1302
- "learning_rate": 1.8206572769953054e-05,
1303
- "loss": 0.0173,
1304
  "step": 1910
1305
  },
1306
  {
1307
  "epoch": 9.0,
1308
- "eval_corporation_f1": 0.25,
1309
- "eval_creative-work_f1": 0.38626609442060084,
1310
- "eval_group_f1": 0.3660130718954248,
1311
- "eval_location_f1": 0.6728971962616823,
1312
- "eval_loss": 0.28229841589927673,
1313
- "eval_overall_accuracy": 0.9501248365236,
1314
- "eval_overall_f1": 0.6373723923657345,
1315
- "eval_overall_precision": 0.684461391801716,
1316
- "eval_overall_recall": 0.5963455149501661,
1317
- "eval_person_f1": 0.7810107197549772,
1318
- "eval_product_f1": 0.40637450199203184,
1319
- "eval_runtime": 6.479,
1320
- "eval_samples_per_second": 155.733,
1321
- "eval_steps_per_second": 9.878,
1322
  "step": 1917
1323
  },
1324
  {
1325
  "epoch": 9.01,
1326
- "learning_rate": 1.819718309859155e-05,
1327
- "loss": 0.014,
1328
  "step": 1920
1329
  },
1330
  {
1331
  "epoch": 9.06,
1332
- "learning_rate": 1.818779342723005e-05,
1333
- "loss": 0.0112,
1334
  "step": 1930
1335
  },
1336
  {
1337
  "epoch": 9.11,
1338
- "learning_rate": 1.8178403755868546e-05,
1339
- "loss": 0.0232,
1340
  "step": 1940
1341
  },
1342
  {
1343
  "epoch": 9.15,
1344
- "learning_rate": 1.8169014084507044e-05,
1345
- "loss": 0.0143,
1346
  "step": 1950
1347
  },
1348
  {
1349
  "epoch": 9.2,
1350
- "learning_rate": 1.815962441314554e-05,
1351
- "loss": 0.0248,
1352
  "step": 1960
1353
  },
1354
  {
1355
  "epoch": 9.25,
1356
- "learning_rate": 1.815023474178404e-05,
1357
- "loss": 0.0198,
1358
  "step": 1970
1359
  },
1360
  {
1361
  "epoch": 9.3,
1362
- "learning_rate": 1.8140845070422537e-05,
1363
- "loss": 0.0095,
1364
  "step": 1980
1365
  },
1366
  {
1367
  "epoch": 9.34,
1368
- "learning_rate": 1.8131455399061034e-05,
1369
- "loss": 0.0142,
1370
  "step": 1990
1371
  },
1372
  {
1373
  "epoch": 9.39,
1374
- "learning_rate": 1.8122065727699532e-05,
1375
- "loss": 0.008,
1376
  "step": 2000
1377
  },
1378
  {
1379
  "epoch": 9.44,
1380
- "learning_rate": 1.811267605633803e-05,
1381
- "loss": 0.0103,
1382
  "step": 2010
1383
  },
1384
  {
1385
  "epoch": 9.48,
1386
- "learning_rate": 1.8103286384976527e-05,
1387
- "loss": 0.0215,
1388
  "step": 2020
1389
  },
1390
  {
1391
  "epoch": 9.53,
1392
- "learning_rate": 1.8093896713615025e-05,
1393
- "loss": 0.0332,
1394
  "step": 2030
1395
  },
1396
  {
1397
  "epoch": 9.58,
1398
- "learning_rate": 1.8084507042253522e-05,
1399
- "loss": 0.0146,
1400
  "step": 2040
1401
  },
1402
  {
1403
  "epoch": 9.62,
1404
- "learning_rate": 1.807511737089202e-05,
1405
- "loss": 0.0179,
1406
  "step": 2050
1407
  },
1408
  {
1409
  "epoch": 9.67,
1410
- "learning_rate": 1.8065727699530517e-05,
1411
- "loss": 0.0162,
1412
  "step": 2060
1413
  },
1414
  {
1415
  "epoch": 9.72,
1416
- "learning_rate": 1.8056338028169015e-05,
1417
- "loss": 0.0182,
1418
  "step": 2070
1419
  },
1420
  {
1421
  "epoch": 9.77,
1422
- "learning_rate": 1.8046948356807513e-05,
1423
- "loss": 0.0166,
1424
  "step": 2080
1425
  },
1426
  {
1427
  "epoch": 9.81,
1428
- "learning_rate": 1.803755868544601e-05,
1429
- "loss": 0.0077,
1430
  "step": 2090
1431
  },
1432
  {
1433
  "epoch": 9.86,
1434
- "learning_rate": 1.8028169014084508e-05,
1435
- "loss": 0.0141,
1436
  "step": 2100
1437
  },
1438
  {
1439
  "epoch": 9.91,
1440
- "learning_rate": 1.8018779342723005e-05,
1441
- "loss": 0.0163,
1442
  "step": 2110
1443
  },
1444
  {
1445
  "epoch": 9.95,
1446
- "learning_rate": 1.8009389671361503e-05,
1447
- "loss": 0.0141,
1448
  "step": 2120
1449
  },
1450
  {
1451
  "epoch": 10.0,
1452
- "learning_rate": 1.8e-05,
1453
- "loss": 0.0227,
1454
  "step": 2130
1455
  },
1456
  {
1457
  "epoch": 10.0,
1458
- "eval_corporation_f1": 0.2268041237113402,
1459
- "eval_creative-work_f1": 0.379746835443038,
1460
- "eval_group_f1": 0.36249999999999993,
1461
- "eval_location_f1": 0.7045454545454545,
1462
- "eval_loss": 0.29116329550743103,
1463
- "eval_overall_accuracy": 0.9481631197241708,
1464
- "eval_overall_f1": 0.6156615661566157,
1465
- "eval_overall_precision": 0.6719056974459725,
1466
- "eval_overall_recall": 0.5681063122923588,
1467
- "eval_person_f1": 0.757234726688103,
1468
- "eval_product_f1": 0.4285714285714286,
1469
- "eval_runtime": 3.3974,
1470
- "eval_samples_per_second": 296.989,
1471
- "eval_steps_per_second": 18.838,
1472
  "step": 2130
1473
  },
1474
  {
1475
  "epoch": 10.05,
1476
- "learning_rate": 1.7990610328638498e-05,
1477
- "loss": 0.0116,
1478
  "step": 2140
1479
  },
1480
  {
1481
  "epoch": 10.09,
1482
- "learning_rate": 1.7981220657276996e-05,
1483
- "loss": 0.01,
1484
  "step": 2150
1485
  },
1486
  {
1487
  "epoch": 10.14,
1488
- "learning_rate": 1.7971830985915497e-05,
1489
- "loss": 0.021,
1490
  "step": 2160
1491
  },
1492
  {
1493
  "epoch": 10.19,
1494
- "learning_rate": 1.796244131455399e-05,
1495
- "loss": 0.0075,
1496
  "step": 2170
1497
  },
1498
  {
1499
  "epoch": 10.23,
1500
- "learning_rate": 1.795305164319249e-05,
1501
- "loss": 0.0281,
1502
  "step": 2180
1503
  },
1504
  {
1505
  "epoch": 10.28,
1506
- "learning_rate": 1.794366197183099e-05,
1507
- "loss": 0.0119,
1508
  "step": 2190
1509
  },
1510
  {
1511
  "epoch": 10.33,
1512
- "learning_rate": 1.7934272300469484e-05,
1513
- "loss": 0.009,
1514
  "step": 2200
1515
  },
1516
  {
1517
  "epoch": 10.38,
1518
- "learning_rate": 1.792488262910798e-05,
1519
- "loss": 0.0097,
1520
  "step": 2210
1521
  },
1522
  {
1523
  "epoch": 10.42,
1524
- "learning_rate": 1.7915492957746482e-05,
1525
- "loss": 0.0054,
1526
  "step": 2220
1527
  },
1528
  {
1529
  "epoch": 10.47,
1530
- "learning_rate": 1.7906103286384976e-05,
1531
- "loss": 0.0078,
1532
  "step": 2230
1533
  },
1534
  {
1535
  "epoch": 10.52,
1536
- "learning_rate": 1.7896713615023474e-05,
1537
- "loss": 0.0095,
1538
  "step": 2240
1539
  },
1540
  {
1541
  "epoch": 10.56,
1542
- "learning_rate": 1.7887323943661975e-05,
1543
- "loss": 0.0158,
1544
  "step": 2250
1545
  },
1546
  {
1547
  "epoch": 10.61,
1548
- "learning_rate": 1.787793427230047e-05,
1549
- "loss": 0.0199,
1550
  "step": 2260
1551
  },
1552
  {
1553
  "epoch": 10.66,
1554
- "learning_rate": 1.786854460093897e-05,
1555
- "loss": 0.0128,
1556
  "step": 2270
1557
  },
1558
  {
1559
  "epoch": 10.7,
1560
- "learning_rate": 1.7859154929577468e-05,
1561
- "loss": 0.019,
1562
  "step": 2280
1563
  },
1564
  {
1565
  "epoch": 10.75,
1566
- "learning_rate": 1.7849765258215962e-05,
1567
- "loss": 0.0096,
1568
  "step": 2290
1569
  },
1570
  {
1571
  "epoch": 10.8,
1572
- "learning_rate": 1.7840375586854463e-05,
1573
- "loss": 0.0155,
1574
  "step": 2300
1575
  },
1576
  {
1577
  "epoch": 10.85,
1578
- "learning_rate": 1.783098591549296e-05,
1579
- "loss": 0.008,
1580
  "step": 2310
1581
  },
1582
  {
1583
  "epoch": 10.89,
1584
- "learning_rate": 1.7821596244131455e-05,
1585
- "loss": 0.017,
1586
  "step": 2320
1587
  },
1588
  {
1589
  "epoch": 10.94,
1590
- "learning_rate": 1.7812206572769956e-05,
1591
- "loss": 0.0162,
1592
  "step": 2330
1593
  },
1594
  {
1595
  "epoch": 10.99,
1596
- "learning_rate": 1.7802816901408453e-05,
1597
- "loss": 0.0185,
1598
  "step": 2340
1599
  },
1600
  {
1601
  "epoch": 11.0,
1602
- "eval_corporation_f1": 0.2531645569620253,
1603
- "eval_creative-work_f1": 0.3896103896103896,
1604
- "eval_group_f1": 0.338235294117647,
1605
- "eval_location_f1": 0.7058823529411765,
1606
- "eval_loss": 0.3139878213405609,
1607
- "eval_overall_accuracy": 0.9482225656877898,
1608
- "eval_overall_f1": 0.6197701149425288,
1609
- "eval_overall_precision": 0.694129763130793,
1610
- "eval_overall_recall": 0.5598006644518272,
1611
- "eval_person_f1": 0.7601296596434362,
1612
- "eval_product_f1": 0.3961038961038961,
1613
- "eval_runtime": 6.297,
1614
- "eval_samples_per_second": 160.235,
1615
- "eval_steps_per_second": 10.164,
1616
  "step": 2343
1617
  },
1618
  {
1619
  "epoch": 11.03,
1620
- "learning_rate": 1.779342723004695e-05,
1621
- "loss": 0.0189,
1622
  "step": 2350
1623
  },
1624
  {
1625
  "epoch": 11.08,
1626
- "learning_rate": 1.7784037558685448e-05,
1627
- "loss": 0.0132,
1628
  "step": 2360
1629
  },
1630
  {
1631
  "epoch": 11.13,
1632
- "learning_rate": 1.7774647887323946e-05,
1633
- "loss": 0.0059,
1634
  "step": 2370
1635
  },
1636
  {
1637
  "epoch": 11.17,
1638
- "learning_rate": 1.7765258215962443e-05,
1639
- "loss": 0.0108,
1640
  "step": 2380
1641
  },
1642
  {
1643
  "epoch": 11.22,
1644
- "learning_rate": 1.775586854460094e-05,
1645
- "loss": 0.0101,
1646
  "step": 2390
1647
  },
1648
  {
1649
  "epoch": 11.27,
1650
- "learning_rate": 1.774647887323944e-05,
1651
- "loss": 0.0102,
1652
  "step": 2400
1653
  },
1654
  {
1655
  "epoch": 11.31,
1656
- "learning_rate": 1.7737089201877936e-05,
1657
- "loss": 0.0094,
1658
  "step": 2410
1659
  },
1660
  {
1661
  "epoch": 11.36,
1662
- "learning_rate": 1.7727699530516434e-05,
1663
- "loss": 0.0099,
1664
  "step": 2420
1665
  },
1666
  {
1667
  "epoch": 11.41,
1668
- "learning_rate": 1.771830985915493e-05,
1669
- "loss": 0.0068,
1670
  "step": 2430
1671
  },
1672
  {
1673
  "epoch": 11.46,
1674
- "learning_rate": 1.770892018779343e-05,
1675
- "loss": 0.0089,
1676
  "step": 2440
1677
  },
1678
  {
1679
  "epoch": 11.5,
1680
- "learning_rate": 1.7699530516431927e-05,
1681
- "loss": 0.0054,
1682
  "step": 2450
1683
  },
1684
  {
1685
  "epoch": 11.55,
1686
- "learning_rate": 1.7690140845070424e-05,
1687
- "loss": 0.0089,
1688
  "step": 2460
1689
  },
1690
  {
1691
  "epoch": 11.6,
1692
- "learning_rate": 1.768075117370892e-05,
1693
- "loss": 0.0076,
1694
  "step": 2470
1695
  },
1696
  {
1697
  "epoch": 11.64,
1698
- "learning_rate": 1.767136150234742e-05,
1699
- "loss": 0.0183,
1700
  "step": 2480
1701
  },
1702
  {
1703
  "epoch": 11.69,
1704
- "learning_rate": 1.7661971830985917e-05,
1705
- "loss": 0.0178,
1706
  "step": 2490
1707
  },
1708
  {
1709
  "epoch": 11.74,
1710
- "learning_rate": 1.7652582159624414e-05,
1711
- "loss": 0.0285,
1712
  "step": 2500
1713
  },
1714
  {
1715
  "epoch": 11.78,
1716
- "learning_rate": 1.7643192488262912e-05,
1717
- "loss": 0.0232,
1718
  "step": 2510
1719
  },
1720
  {
1721
  "epoch": 11.83,
1722
- "learning_rate": 1.763380281690141e-05,
1723
- "loss": 0.0071,
1724
  "step": 2520
1725
  },
1726
  {
1727
  "epoch": 11.88,
1728
- "learning_rate": 1.7624413145539907e-05,
1729
- "loss": 0.0132,
1730
  "step": 2530
1731
  },
1732
  {
1733
  "epoch": 11.92,
1734
- "learning_rate": 1.7615023474178405e-05,
1735
- "loss": 0.0046,
1736
  "step": 2540
1737
  },
1738
  {
1739
  "epoch": 11.97,
1740
- "learning_rate": 1.7605633802816902e-05,
1741
- "loss": 0.0221,
1742
  "step": 2550
1743
  },
1744
  {
1745
  "epoch": 12.0,
1746
- "eval_corporation_f1": 0.3220338983050847,
1747
- "eval_creative-work_f1": 0.3686635944700461,
1748
- "eval_group_f1": 0.35,
1749
- "eval_location_f1": 0.7244897959183673,
1750
- "eval_loss": 0.3527165353298187,
1751
- "eval_overall_accuracy": 0.9470336464154084,
1752
- "eval_overall_f1": 0.6118848653667595,
1753
- "eval_overall_precision": 0.6936842105263158,
1754
- "eval_overall_recall": 0.5473421926910299,
1755
- "eval_person_f1": 0.750202101859337,
1756
- "eval_product_f1": 0.33082706766917297,
1757
- "eval_runtime": 4.561,
1758
- "eval_samples_per_second": 221.226,
1759
- "eval_steps_per_second": 14.032,
1760
  "step": 2556
1761
  },
1762
  {
1763
  "epoch": 12.02,
1764
- "learning_rate": 1.75962441314554e-05,
1765
- "loss": 0.0095,
1766
  "step": 2560
1767
  },
1768
  {
1769
  "epoch": 12.07,
1770
- "learning_rate": 1.7586854460093898e-05,
1771
- "loss": 0.0089,
1772
  "step": 2570
1773
  },
1774
  {
1775
  "epoch": 12.11,
1776
- "learning_rate": 1.75774647887324e-05,
1777
- "loss": 0.0103,
1778
  "step": 2580
1779
  },
1780
  {
1781
  "epoch": 12.16,
1782
- "learning_rate": 1.7568075117370893e-05,
1783
- "loss": 0.02,
1784
  "step": 2590
1785
  },
1786
  {
1787
  "epoch": 12.21,
1788
- "learning_rate": 1.755868544600939e-05,
1789
- "loss": 0.0084,
1790
  "step": 2600
1791
  },
1792
  {
1793
  "epoch": 12.25,
1794
- "learning_rate": 1.754929577464789e-05,
1795
- "loss": 0.0129,
1796
  "step": 2610
1797
  },
1798
  {
1799
  "epoch": 12.3,
1800
- "learning_rate": 1.7539906103286385e-05,
1801
- "loss": 0.0094,
1802
  "step": 2620
1803
  },
1804
  {
1805
  "epoch": 12.35,
1806
- "learning_rate": 1.7530516431924883e-05,
1807
- "loss": 0.0054,
1808
  "step": 2630
1809
  },
1810
  {
1811
  "epoch": 12.39,
1812
- "learning_rate": 1.7521126760563384e-05,
1813
- "loss": 0.0132,
1814
  "step": 2640
1815
  },
1816
  {
1817
  "epoch": 12.44,
1818
- "learning_rate": 1.7511737089201878e-05,
1819
- "loss": 0.0064,
1820
  "step": 2650
1821
  },
1822
  {
1823
  "epoch": 12.49,
1824
- "learning_rate": 1.7502347417840376e-05,
1825
- "loss": 0.0152,
1826
  "step": 2660
1827
  },
1828
  {
1829
  "epoch": 12.54,
1830
- "learning_rate": 1.7492957746478873e-05,
1831
- "loss": 0.0104,
1832
  "step": 2670
1833
  },
1834
  {
1835
  "epoch": 12.58,
1836
- "learning_rate": 1.748356807511737e-05,
1837
- "loss": 0.0137,
1838
  "step": 2680
1839
  },
1840
  {
1841
  "epoch": 12.63,
1842
- "learning_rate": 1.7474178403755872e-05,
1843
- "loss": 0.0077,
1844
  "step": 2690
1845
  },
1846
  {
1847
  "epoch": 12.68,
1848
- "learning_rate": 1.7464788732394366e-05,
1849
- "loss": 0.0064,
1850
  "step": 2700
1851
  },
1852
  {
1853
  "epoch": 12.72,
1854
- "learning_rate": 1.7455399061032864e-05,
1855
- "loss": 0.0036,
1856
  "step": 2710
1857
  },
1858
  {
1859
  "epoch": 12.77,
1860
- "learning_rate": 1.7446009389671365e-05,
1861
- "loss": 0.0078,
1862
  "step": 2720
1863
  },
1864
  {
1865
  "epoch": 12.82,
1866
- "learning_rate": 1.743661971830986e-05,
1867
- "loss": 0.0077,
1868
  "step": 2730
1869
  },
1870
  {
1871
  "epoch": 12.86,
1872
- "learning_rate": 1.7427230046948356e-05,
1873
- "loss": 0.0203,
1874
  "step": 2740
1875
  },
1876
  {
1877
  "epoch": 12.91,
1878
- "learning_rate": 1.7417840375586857e-05,
1879
- "loss": 0.0089,
1880
  "step": 2750
1881
  },
1882
  {
1883
  "epoch": 12.96,
1884
- "learning_rate": 1.740845070422535e-05,
1885
- "loss": 0.0099,
1886
  "step": 2760
1887
  },
1888
  {
1889
  "epoch": 13.0,
1890
- "eval_corporation_f1": 0.3168316831683168,
1891
- "eval_creative-work_f1": 0.37815126050420167,
1892
- "eval_group_f1": 0.3597122302158273,
1893
- "eval_location_f1": 0.7391304347826086,
1894
- "eval_loss": 0.33322253823280334,
1895
- "eval_overall_accuracy": 0.9492925930329331,
1896
- "eval_overall_f1": 0.6259611035730439,
1897
- "eval_overall_precision": 0.6871896722939425,
1898
- "eval_overall_recall": 0.574750830564784,
1899
- "eval_person_f1": 0.7627388535031848,
1900
- "eval_product_f1": 0.40273037542662116,
1901
- "eval_runtime": 5.2031,
1902
- "eval_samples_per_second": 193.924,
1903
- "eval_steps_per_second": 12.3,
1904
  "step": 2769
1905
  },
1906
  {
1907
  "epoch": 13.0,
1908
- "learning_rate": 1.7399061032863853e-05,
1909
- "loss": 0.0221,
1910
  "step": 2770
1911
  },
1912
  {
1913
  "epoch": 13.05,
1914
- "learning_rate": 1.738967136150235e-05,
1915
- "loss": 0.0036,
1916
  "step": 2780
1917
  },
1918
  {
1919
  "epoch": 13.1,
1920
- "learning_rate": 1.7380281690140844e-05,
1921
- "loss": 0.01,
1922
  "step": 2790
1923
  },
1924
  {
1925
  "epoch": 13.15,
1926
- "learning_rate": 1.7370892018779345e-05,
1927
- "loss": 0.0064,
1928
  "step": 2800
1929
  },
1930
  {
1931
  "epoch": 13.19,
1932
- "learning_rate": 1.7361502347417843e-05,
1933
- "loss": 0.0045,
1934
  "step": 2810
1935
  },
1936
  {
1937
  "epoch": 13.24,
1938
- "learning_rate": 1.7352112676056337e-05,
1939
- "loss": 0.018,
1940
  "step": 2820
1941
  },
1942
  {
1943
  "epoch": 13.29,
1944
- "learning_rate": 1.7342723004694838e-05,
1945
- "loss": 0.0113,
1946
  "step": 2830
1947
  },
1948
  {
1949
  "epoch": 13.33,
1950
- "learning_rate": 1.7333333333333336e-05,
1951
- "loss": 0.0301,
1952
  "step": 2840
1953
  },
1954
  {
1955
  "epoch": 13.38,
1956
- "learning_rate": 1.732394366197183e-05,
1957
- "loss": 0.0114,
1958
  "step": 2850
1959
  },
1960
  {
1961
  "epoch": 13.43,
1962
- "learning_rate": 1.731455399061033e-05,
1963
- "loss": 0.006,
1964
  "step": 2860
1965
  },
1966
  {
1967
  "epoch": 13.47,
1968
- "learning_rate": 1.730516431924883e-05,
1969
- "loss": 0.0127,
1970
  "step": 2870
1971
  },
1972
  {
1973
  "epoch": 13.52,
1974
- "learning_rate": 1.7295774647887326e-05,
1975
- "loss": 0.0039,
1976
  "step": 2880
1977
  },
1978
  {
1979
  "epoch": 13.57,
1980
- "learning_rate": 1.7286384976525824e-05,
1981
- "loss": 0.008,
1982
  "step": 2890
1983
  },
1984
  {
1985
  "epoch": 13.62,
1986
- "learning_rate": 1.727699530516432e-05,
1987
- "loss": 0.011,
1988
  "step": 2900
1989
  },
1990
  {
1991
  "epoch": 13.66,
1992
- "learning_rate": 1.726760563380282e-05,
1993
- "loss": 0.0039,
1994
  "step": 2910
1995
  },
1996
  {
1997
  "epoch": 13.71,
1998
- "learning_rate": 1.7258215962441316e-05,
1999
- "loss": 0.0096,
2000
  "step": 2920
2001
  },
2002
  {
2003
  "epoch": 13.76,
2004
- "learning_rate": 1.7248826291079814e-05,
2005
- "loss": 0.0035,
2006
  "step": 2930
2007
  },
2008
  {
2009
  "epoch": 13.8,
2010
- "learning_rate": 1.723943661971831e-05,
2011
- "loss": 0.009,
2012
  "step": 2940
2013
  },
2014
  {
2015
  "epoch": 13.85,
2016
- "learning_rate": 1.723004694835681e-05,
2017
- "loss": 0.0124,
2018
  "step": 2950
2019
  },
2020
  {
2021
  "epoch": 13.9,
2022
- "learning_rate": 1.7220657276995307e-05,
2023
- "loss": 0.0082,
2024
  "step": 2960
2025
  },
2026
  {
2027
  "epoch": 13.94,
2028
- "learning_rate": 1.7211267605633804e-05,
2029
- "loss": 0.0186,
2030
  "step": 2970
2031
  },
2032
  {
2033
  "epoch": 13.99,
2034
- "learning_rate": 1.7201877934272302e-05,
2035
- "loss": 0.0062,
2036
  "step": 2980
2037
  },
2038
  {
2039
  "epoch": 14.0,
2040
- "eval_corporation_f1": 0.25,
2041
- "eval_creative-work_f1": 0.3700440528634361,
2042
- "eval_group_f1": 0.4065040650406504,
2043
- "eval_location_f1": 0.7340425531914893,
2044
- "eval_loss": 0.36366626620292664,
2045
- "eval_overall_accuracy": 0.9479253358696944,
2046
- "eval_overall_f1": 0.6245847176079733,
2047
- "eval_overall_precision": 0.7286821705426356,
2048
- "eval_overall_recall": 0.5465116279069767,
2049
- "eval_person_f1": 0.7526020816653322,
2050
- "eval_product_f1": 0.346774193548387,
2051
- "eval_runtime": 5.1869,
2052
- "eval_samples_per_second": 194.527,
2053
- "eval_steps_per_second": 12.339,
2054
  "step": 2982
2055
  },
2056
  {
2057
- "epoch": 14.04,
2058
- "learning_rate": 1.71924882629108e-05,
2059
- "loss": 0.0045,
2060
- "step": 2990
2061
- },
2062
- {
2063
- "epoch": 14.08,
2064
- "learning_rate": 1.7183098591549297e-05,
2065
- "loss": 0.0072,
2066
- "step": 3000
2067
- },
2068
- {
2069
- "epoch": 14.13,
2070
- "learning_rate": 1.7173708920187795e-05,
2071
- "loss": 0.0081,
2072
- "step": 3010
2073
- },
2074
- {
2075
- "epoch": 14.18,
2076
- "learning_rate": 1.7164319248826292e-05,
2077
- "loss": 0.0046,
2078
- "step": 3020
2079
- },
2080
- {
2081
- "epoch": 14.23,
2082
- "learning_rate": 1.715492957746479e-05,
2083
- "loss": 0.0088,
2084
- "step": 3030
2085
- },
2086
- {
2087
- "epoch": 14.27,
2088
- "learning_rate": 1.7145539906103287e-05,
2089
- "loss": 0.0047,
2090
- "step": 3040
2091
- },
2092
- {
2093
- "epoch": 14.32,
2094
- "learning_rate": 1.7136150234741785e-05,
2095
- "loss": 0.019,
2096
- "step": 3050
2097
- },
2098
- {
2099
- "epoch": 14.37,
2100
- "learning_rate": 1.7126760563380282e-05,
2101
- "loss": 0.0099,
2102
- "step": 3060
2103
- },
2104
- {
2105
- "epoch": 14.41,
2106
- "learning_rate": 1.711737089201878e-05,
2107
- "loss": 0.0038,
2108
- "step": 3070
2109
- },
2110
- {
2111
- "epoch": 14.46,
2112
- "learning_rate": 1.7107981220657278e-05,
2113
- "loss": 0.0114,
2114
- "step": 3080
2115
- },
2116
- {
2117
- "epoch": 14.51,
2118
- "learning_rate": 1.7098591549295775e-05,
2119
- "loss": 0.0146,
2120
- "step": 3090
2121
- },
2122
- {
2123
- "epoch": 14.55,
2124
- "learning_rate": 1.7089201877934273e-05,
2125
- "loss": 0.0117,
2126
- "step": 3100
2127
- },
2128
- {
2129
- "epoch": 14.6,
2130
- "learning_rate": 1.7079812206572774e-05,
2131
- "loss": 0.0056,
2132
- "step": 3110
2133
- },
2134
- {
2135
- "epoch": 14.65,
2136
- "learning_rate": 1.7070422535211268e-05,
2137
- "loss": 0.0059,
2138
- "step": 3120
2139
- },
2140
- {
2141
- "epoch": 14.69,
2142
- "learning_rate": 1.7061032863849766e-05,
2143
- "loss": 0.0122,
2144
- "step": 3130
2145
- },
2146
- {
2147
- "epoch": 14.74,
2148
- "learning_rate": 1.7051643192488267e-05,
2149
- "loss": 0.006,
2150
- "step": 3140
2151
- },
2152
- {
2153
- "epoch": 14.79,
2154
- "learning_rate": 1.704225352112676e-05,
2155
- "loss": 0.0115,
2156
- "step": 3150
2157
- },
2158
- {
2159
- "epoch": 14.84,
2160
- "learning_rate": 1.703286384976526e-05,
2161
- "loss": 0.0037,
2162
- "step": 3160
2163
- },
2164
- {
2165
- "epoch": 14.88,
2166
- "learning_rate": 1.702347417840376e-05,
2167
- "loss": 0.0207,
2168
- "step": 3170
2169
- },
2170
- {
2171
- "epoch": 14.93,
2172
- "learning_rate": 1.7014084507042253e-05,
2173
- "loss": 0.0102,
2174
- "step": 3180
2175
- },
2176
- {
2177
- "epoch": 14.98,
2178
- "learning_rate": 1.7004694835680754e-05,
2179
- "loss": 0.0075,
2180
- "step": 3190
2181
- },
2182
- {
2183
- "epoch": 15.0,
2184
- "eval_corporation_f1": 0.27027027027027023,
2185
- "eval_creative-work_f1": 0.36363636363636365,
2186
- "eval_group_f1": 0.4029850746268656,
2187
- "eval_location_f1": 0.7499999999999999,
2188
- "eval_loss": 0.3238934278488159,
2189
- "eval_overall_accuracy": 0.9498870526691238,
2190
- "eval_overall_f1": 0.6374216651745748,
2191
- "eval_overall_precision": 0.6912621359223301,
2192
- "eval_overall_recall": 0.5913621262458472,
2193
- "eval_person_f1": 0.7732919254658385,
2194
- "eval_product_f1": 0.4152249134948097,
2195
- "eval_runtime": 2.1734,
2196
- "eval_samples_per_second": 464.252,
2197
- "eval_steps_per_second": 29.447,
2198
- "step": 3195
2199
- },
2200
- {
2201
- "epoch": 15.0,
2202
- "step": 3195,
2203
- "total_flos": 1172435714212020.0,
2204
- "train_loss": 0.06678202886758183,
2205
- "train_runtime": 827.6844,
2206
- "train_samples_per_second": 410.06,
2207
- "train_steps_per_second": 25.734
2208
  }
2209
  ],
2210
  "max_steps": 21300,
2211
  "num_train_epochs": 100,
2212
- "total_flos": 1172435714212020.0,
2213
  "trial_name": null,
2214
  "trial_params": null
2215
  }
1
  {
2
+ "best_metric": 0.27115458250045776,
3
+ "best_model_checkpoint": "./fine_tune_bert_output_LP_FP/checkpoint-852",
4
+ "epoch": 14.0,
5
+ "global_step": 2982,
6
  "is_hyper_param_search": false,
7
  "is_local_process_zero": true,
8
  "is_world_process_zero": true,
9
  "log_history": [
10
  {
11
  "epoch": 0.0,
12
+ "learning_rate": 9.999530516431926e-06,
13
+ "loss": 0.0227,
14
  "step": 1
15
  },
16
  {
17
  "epoch": 0.05,
18
+ "learning_rate": 9.99530516431925e-06,
19
+ "loss": 0.0194,
20
  "step": 10
21
  },
22
  {
23
  "epoch": 0.09,
24
+ "learning_rate": 9.990610328638498e-06,
25
+ "loss": 0.0265,
26
  "step": 20
27
  },
28
  {
29
  "epoch": 0.14,
30
+ "learning_rate": 9.985915492957747e-06,
31
+ "loss": 0.0265,
32
  "step": 30
33
  },
34
  {
35
  "epoch": 0.19,
36
+ "learning_rate": 9.981220657276996e-06,
37
+ "loss": 0.0188,
38
  "step": 40
39
  },
40
  {
41
  "epoch": 0.23,
42
+ "learning_rate": 9.976525821596245e-06,
43
+ "loss": 0.0188,
44
  "step": 50
45
  },
46
  {
47
  "epoch": 0.28,
48
+ "learning_rate": 9.971830985915494e-06,
49
+ "loss": 0.0313,
50
  "step": 60
51
  },
52
  {
53
  "epoch": 0.33,
54
+ "learning_rate": 9.967136150234742e-06,
55
+ "loss": 0.0259,
56
  "step": 70
57
  },
58
  {
59
  "epoch": 0.38,
60
+ "learning_rate": 9.962441314553991e-06,
61
+ "loss": 0.0249,
62
  "step": 80
63
  },
64
  {
65
  "epoch": 0.42,
66
+ "learning_rate": 9.95774647887324e-06,
67
+ "loss": 0.0333,
68
  "step": 90
69
  },
70
  {
71
  "epoch": 0.47,
72
+ "learning_rate": 9.953051643192489e-06,
73
+ "loss": 0.0337,
74
  "step": 100
75
  },
76
  {
77
  "epoch": 0.52,
78
+ "learning_rate": 9.948356807511738e-06,
79
+ "loss": 0.0229,
80
  "step": 110
81
  },
82
  {
83
  "epoch": 0.56,
84
+ "learning_rate": 9.943661971830986e-06,
85
+ "loss": 0.013,
86
  "step": 120
87
  },
88
  {
89
  "epoch": 0.61,
90
+ "learning_rate": 9.938967136150237e-06,
91
+ "loss": 0.0332,
92
  "step": 130
93
  },
94
  {
95
  "epoch": 0.66,
96
+ "learning_rate": 9.934272300469484e-06,
97
+ "loss": 0.0221,
98
  "step": 140
99
  },
100
  {
101
  "epoch": 0.7,
102
+ "learning_rate": 9.929577464788733e-06,
103
+ "loss": 0.0198,
104
  "step": 150
105
  },
106
  {
107
  "epoch": 0.75,
108
+ "learning_rate": 9.924882629107983e-06,
109
+ "loss": 0.0286,
110
  "step": 160
111
  },
112
  {
113
  "epoch": 0.8,
114
+ "learning_rate": 9.92018779342723e-06,
115
+ "loss": 0.0451,
116
  "step": 170
117
  },
118
  {
119
  "epoch": 0.85,
120
+ "learning_rate": 9.915492957746479e-06,
121
+ "loss": 0.0257,
122
  "step": 180
123
  },
124
  {
125
  "epoch": 0.89,
126
+ "learning_rate": 9.91079812206573e-06,
127
+ "loss": 0.0185,
128
  "step": 190
129
  },
130
  {
131
  "epoch": 0.94,
132
+ "learning_rate": 9.906103286384977e-06,
133
+ "loss": 0.032,
134
  "step": 200
135
  },
136
  {
137
  "epoch": 0.99,
138
+ "learning_rate": 9.901408450704227e-06,
139
+ "loss": 0.0215,
140
  "step": 210
141
  },
142
  {
143
  "epoch": 1.0,
144
+ "eval_corporation_f1": 0.2831858407079646,
145
+ "eval_creative-work_f1": 0.4444444444444445,
146
+ "eval_group_f1": 0.2975206611570248,
147
+ "eval_location_f1": 0.6853932584269663,
148
+ "eval_loss": 0.2913361191749573,
149
+ "eval_overall_accuracy": 0.9507192961597908,
150
+ "eval_overall_f1": 0.641696750902527,
151
+ "eval_overall_precision": 0.7025691699604744,
152
+ "eval_overall_recall": 0.590531561461794,
153
+ "eval_person_f1": 0.7787878787878788,
154
+ "eval_product_f1": 0.4015444015444016,
155
+ "eval_runtime": 6.2036,
156
+ "eval_samples_per_second": 162.648,
157
+ "eval_steps_per_second": 10.317,
158
  "step": 213
159
  },
160
  {
161
  "epoch": 1.03,
162
+ "learning_rate": 9.896713615023476e-06,
163
+ "loss": 0.0237,
164
  "step": 220
165
  },
166
  {
167
  "epoch": 1.08,
168
+ "learning_rate": 9.892018779342723e-06,
169
+ "loss": 0.0166,
170
  "step": 230
171
  },
172
  {
173
  "epoch": 1.13,
174
+ "learning_rate": 9.887323943661974e-06,
175
+ "loss": 0.0144,
176
  "step": 240
177
  },
178
  {
179
  "epoch": 1.17,
180
+ "learning_rate": 9.882629107981222e-06,
181
+ "loss": 0.0198,
182
  "step": 250
183
  },
184
  {
185
  "epoch": 1.22,
186
+ "learning_rate": 9.87793427230047e-06,
187
+ "loss": 0.0299,
188
  "step": 260
189
  },
190
  {
191
  "epoch": 1.27,
192
+ "learning_rate": 9.87323943661972e-06,
193
+ "loss": 0.0123,
194
  "step": 270
195
  },
196
  {
197
  "epoch": 1.31,
198
+ "learning_rate": 9.868544600938969e-06,
199
+ "loss": 0.0143,
200
  "step": 280
201
  },
202
  {
203
  "epoch": 1.36,
204
+ "learning_rate": 9.863849765258216e-06,
205
+ "loss": 0.0227,
206
  "step": 290
207
  },
208
  {
209
  "epoch": 1.41,
210
+ "learning_rate": 9.859154929577466e-06,
211
+ "loss": 0.0125,
212
  "step": 300
213
  },
214
  {
215
  "epoch": 1.46,
216
+ "learning_rate": 9.854460093896713e-06,
217
+ "loss": 0.0374,
218
  "step": 310
219
  },
220
  {
221
  "epoch": 1.5,
222
+ "learning_rate": 9.849765258215964e-06,
223
+ "loss": 0.0126,
224
  "step": 320
225
  },
226
  {
227
  "epoch": 1.55,
228
+ "learning_rate": 9.845070422535213e-06,
229
+ "loss": 0.0234,
230
  "step": 330
231
  },
232
  {
233
  "epoch": 1.6,
234
+ "learning_rate": 9.84037558685446e-06,
235
+ "loss": 0.009,
236
  "step": 340
237
  },
238
  {
239
  "epoch": 1.64,
240
+ "learning_rate": 9.83568075117371e-06,
241
+ "loss": 0.0149,
242
  "step": 350
243
  },
244
  {
245
  "epoch": 1.69,
246
+ "learning_rate": 9.830985915492959e-06,
247
+ "loss": 0.0289,
248
  "step": 360
249
  },
250
  {
251
  "epoch": 1.74,
252
+ "learning_rate": 9.826291079812206e-06,
253
+ "loss": 0.03,
254
  "step": 370
255
  },
256
  {
257
  "epoch": 1.78,
258
+ "learning_rate": 9.821596244131457e-06,
259
+ "loss": 0.0278,
260
  "step": 380
261
  },
262
  {
263
  "epoch": 1.83,
264
+ "learning_rate": 9.816901408450705e-06,
265
+ "loss": 0.0192,
266
  "step": 390
267
  },
268
  {
269
  "epoch": 1.88,
270
+ "learning_rate": 9.812206572769954e-06,
271
+ "loss": 0.0187,
272
  "step": 400
273
  },
274
  {
275
  "epoch": 1.92,
276
+ "learning_rate": 9.807511737089203e-06,
277
+ "loss": 0.0184,
278
  "step": 410
279
  },
280
  {
281
  "epoch": 1.97,
282
+ "learning_rate": 9.802816901408452e-06,
283
+ "loss": 0.0213,
284
  "step": 420
285
  },
286
  {
287
  "epoch": 2.0,
288
+ "eval_corporation_f1": 0.2830188679245283,
289
+ "eval_creative-work_f1": 0.3482587064676617,
290
+ "eval_group_f1": 0.3230769230769231,
291
+ "eval_location_f1": 0.6857142857142858,
292
+ "eval_loss": 0.30522316694259644,
293
+ "eval_overall_accuracy": 0.9494709309237903,
294
+ "eval_overall_f1": 0.6233183856502242,
295
+ "eval_overall_precision": 0.6773879142300195,
296
+ "eval_overall_recall": 0.5772425249169435,
297
+ "eval_person_f1": 0.7727620504973222,
298
+ "eval_product_f1": 0.37942122186495186,
299
+ "eval_runtime": 6.2329,
300
+ "eval_samples_per_second": 161.883,
301
+ "eval_steps_per_second": 10.268,
302
  "step": 426
303
  },
304
  {
305
  "epoch": 2.02,
306
+ "learning_rate": 9.7981220657277e-06,
307
+ "loss": 0.0294,
308
  "step": 430
309
  },
310
  {
311
  "epoch": 2.07,
312
+ "learning_rate": 9.79342723004695e-06,
313
+ "loss": 0.0179,
314
  "step": 440
315
  },
316
  {
317
  "epoch": 2.11,
318
+ "learning_rate": 9.788732394366198e-06,
319
+ "loss": 0.021,
320
  "step": 450
321
  },
322
  {
323
  "epoch": 2.16,
324
+ "learning_rate": 9.784037558685447e-06,
325
+ "loss": 0.0128,
326
  "step": 460
327
  },
328
  {
329
  "epoch": 2.21,
330
+ "learning_rate": 9.779342723004696e-06,
331
+ "loss": 0.0138,
332
  "step": 470
333
  },
334
  {
335
  "epoch": 2.25,
336
+ "learning_rate": 9.774647887323945e-06,
337
+ "loss": 0.0293,
338
  "step": 480
339
  },
340
  {
341
  "epoch": 2.3,
342
+ "learning_rate": 9.769953051643193e-06,
343
+ "loss": 0.0099,
344
  "step": 490
345
  },
346
  {
347
  "epoch": 2.35,
348
+ "learning_rate": 9.765258215962442e-06,
349
+ "loss": 0.0088,
350
  "step": 500
351
  },
352
  {
353
  "epoch": 2.39,
354
+ "learning_rate": 9.760563380281691e-06,
355
+ "loss": 0.0137,
356
  "step": 510
357
  },
358
  {
359
  "epoch": 2.44,
360
+ "learning_rate": 9.75586854460094e-06,
361
+ "loss": 0.021,
362
  "step": 520
363
  },
364
  {
365
  "epoch": 2.49,
366
+ "learning_rate": 9.751173708920188e-06,
367
+ "loss": 0.0197,
368
  "step": 530
369
  },
370
  {
371
  "epoch": 2.54,
372
+ "learning_rate": 9.746478873239437e-06,
373
+ "loss": 0.0208,
374
  "step": 540
375
  },
376
  {
377
  "epoch": 2.58,
378
+ "learning_rate": 9.741784037558686e-06,
379
+ "loss": 0.0074,
380
  "step": 550
381
  },
382
  {
383
  "epoch": 2.63,
384
+ "learning_rate": 9.737089201877935e-06,
385
+ "loss": 0.017,
386
  "step": 560
387
  },
388
  {
389
  "epoch": 2.68,
390
+ "learning_rate": 9.732394366197184e-06,
391
+ "loss": 0.0205,
392
  "step": 570
393
  },
394
  {
395
  "epoch": 2.72,
396
+ "learning_rate": 9.727699530516432e-06,
397
+ "loss": 0.0159,
398
  "step": 580
399
  },
400
  {
401
  "epoch": 2.77,
402
+ "learning_rate": 9.723004694835681e-06,
403
+ "loss": 0.0139,
404
  "step": 590
405
  },
406
  {
407
  "epoch": 2.82,
408
+ "learning_rate": 9.71830985915493e-06,
409
+ "loss": 0.017,
410
  "step": 600
411
  },
412
  {
413
  "epoch": 2.86,
414
+ "learning_rate": 9.713615023474179e-06,
415
+ "loss": 0.0171,
416
  "step": 610
417
  },
418
  {
419
  "epoch": 2.91,
420
+ "learning_rate": 9.708920187793428e-06,
421
+ "loss": 0.0164,
422
  "step": 620
423
  },
424
  {
425
  "epoch": 2.96,
426
+ "learning_rate": 9.704225352112678e-06,
427
+ "loss": 0.0288,
428
  "step": 630
429
  },
430
  {
431
  "epoch": 3.0,
432
+ "eval_corporation_f1": 0.3076923076923077,
433
+ "eval_creative-work_f1": 0.41841004184100417,
434
+ "eval_group_f1": 0.3529411764705882,
435
+ "eval_location_f1": 0.6222222222222222,
436
+ "eval_loss": 0.3378466069698334,
437
+ "eval_overall_accuracy": 0.946736416597313,
438
+ "eval_overall_f1": 0.6187587494167056,
439
+ "eval_overall_precision": 0.7060702875399361,
440
+ "eval_overall_recall": 0.5506644518272426,
441
+ "eval_person_f1": 0.7532051282051282,
442
+ "eval_product_f1": 0.39097744360902253,
443
+ "eval_runtime": 2.031,
444
+ "eval_samples_per_second": 496.798,
445
+ "eval_steps_per_second": 31.511,
446
  "step": 639
447
  },
448
  {
449
  "epoch": 3.0,
450
+ "learning_rate": 9.699530516431925e-06,
451
+ "loss": 0.023,
452
  "step": 640
453
  },
454
  {
455
  "epoch": 3.05,
456
+ "learning_rate": 9.694835680751174e-06,
457
+ "loss": 0.0173,
458
  "step": 650
459
  },
460
  {
461
  "epoch": 3.1,
462
+ "learning_rate": 9.690140845070424e-06,
463
+ "loss": 0.0149,
464
  "step": 660
465
  },
466
  {
467
  "epoch": 3.15,
468
+ "learning_rate": 9.685446009389672e-06,
469
+ "loss": 0.0141,
470
  "step": 670
471
  },
472
  {
473
  "epoch": 3.19,
474
+ "learning_rate": 9.68075117370892e-06,
475
+ "loss": 0.0172,
476
  "step": 680
477
  },
478
  {
479
  "epoch": 3.24,
480
+ "learning_rate": 9.67605633802817e-06,
481
+ "loss": 0.0072,
482
  "step": 690
483
  },
484
  {
485
  "epoch": 3.29,
486
+ "learning_rate": 9.671361502347418e-06,
487
+ "loss": 0.015,
488
  "step": 700
489
  },
490
  {
491
  "epoch": 3.33,
492
+ "learning_rate": 9.666666666666667e-06,
493
+ "loss": 0.009,
494
  "step": 710
495
  },
496
  {
497
  "epoch": 3.38,
498
+ "learning_rate": 9.661971830985917e-06,
499
+ "loss": 0.0098,
500
  "step": 720
501
  },
502
  {
503
  "epoch": 3.43,
504
+ "learning_rate": 9.657276995305164e-06,
505
+ "loss": 0.018,
506
  "step": 730
507
  },
508
  {
509
  "epoch": 3.47,
510
+ "learning_rate": 9.652582159624415e-06,
511
+ "loss": 0.0092,
512
  "step": 740
513
  },
514
  {
515
  "epoch": 3.52,
516
+ "learning_rate": 9.647887323943664e-06,
517
+ "loss": 0.0166,
518
  "step": 750
519
  },
520
  {
521
  "epoch": 3.57,
522
+ "learning_rate": 9.64319248826291e-06,
523
+ "loss": 0.0085,
524
  "step": 760
525
  },
526
  {
527
  "epoch": 3.62,
528
+ "learning_rate": 9.638497652582161e-06,
529
+ "loss": 0.0168,
530
  "step": 770
531
  },
532
  {
533
  "epoch": 3.66,
534
+ "learning_rate": 9.63380281690141e-06,
535
+ "loss": 0.0201,
536
  "step": 780
537
  },
538
  {
539
  "epoch": 3.71,
540
+ "learning_rate": 9.629107981220657e-06,
541
+ "loss": 0.0143,
542
  "step": 790
543
  },
544
  {
545
  "epoch": 3.76,
546
+ "learning_rate": 9.624413145539908e-06,
547
+ "loss": 0.0189,
548
  "step": 800
549
  },
550
  {
551
  "epoch": 3.8,
552
+ "learning_rate": 9.619718309859156e-06,
553
+ "loss": 0.0099,
554
  "step": 810
555
  },
556
  {
557
  "epoch": 3.85,
558
+ "learning_rate": 9.615023474178405e-06,
559
+ "loss": 0.0259,
560
  "step": 820
561
  },
562
  {
563
  "epoch": 3.9,
564
+ "learning_rate": 9.610328638497654e-06,
565
+ "loss": 0.0187,
566
  "step": 830
567
  },
568
  {
569
  "epoch": 3.94,
570
+ "learning_rate": 9.605633802816903e-06,
571
+ "loss": 0.0102,
572
  "step": 840
573
  },
574
  {
575
  "epoch": 3.99,
576
+ "learning_rate": 9.600938967136152e-06,
577
+ "loss": 0.0124,
578
  "step": 850
579
  },
580
  {
581
  "epoch": 4.0,
582
+ "eval_corporation_f1": 0.30769230769230765,
583
+ "eval_creative-work_f1": 0.48421052631578954,
584
+ "eval_group_f1": 0.31666666666666665,
585
+ "eval_location_f1": 0.6808510638297872,
586
+ "eval_loss": 0.27115458250045776,
587
+ "eval_overall_accuracy": 0.9502437284508382,
588
+ "eval_overall_f1": 0.6339784946236559,
589
+ "eval_overall_precision": 0.6574487065120428,
590
+ "eval_overall_recall": 0.6121262458471761,
591
+ "eval_person_f1": 0.7734553775743707,
592
+ "eval_product_f1": 0.3986254295532646,
593
+ "eval_runtime": 5.0216,
594
+ "eval_samples_per_second": 200.934,
595
+ "eval_steps_per_second": 12.745,
596
  "step": 852
597
  },
598
  {
599
  "epoch": 4.04,
600
+ "learning_rate": 9.5962441314554e-06,
601
+ "loss": 0.017,
602
  "step": 860
603
  },
604
  {
605
  "epoch": 4.08,
606
+ "learning_rate": 9.591549295774649e-06,
607
+ "loss": 0.0138,
608
  "step": 870
609
  },
610
  {
611
  "epoch": 4.13,
612
+ "learning_rate": 9.586854460093898e-06,
613
+ "loss": 0.0122,
614
  "step": 880
615
  },
616
  {
617
  "epoch": 4.18,
618
+ "learning_rate": 9.582159624413147e-06,
619
+ "loss": 0.0116,
620
  "step": 890
621
  },
622
  {
623
  "epoch": 4.23,
624
+ "learning_rate": 9.577464788732394e-06,
625
+ "loss": 0.0066,
626
  "step": 900
627
  },
628
  {
629
  "epoch": 4.27,
630
+ "learning_rate": 9.572769953051644e-06,
631
+ "loss": 0.016,
632
  "step": 910
633
  },
634
  {
635
  "epoch": 4.32,
636
+ "learning_rate": 9.568075117370893e-06,
637
+ "loss": 0.0203,
638
  "step": 920
639
  },
640
  {
641
  "epoch": 4.37,
642
+ "learning_rate": 9.563380281690142e-06,
643
+ "loss": 0.0092,
644
  "step": 930
645
  },
646
  {
647
  "epoch": 4.41,
648
+ "learning_rate": 9.55868544600939e-06,
649
+ "loss": 0.0098,
650
  "step": 940
651
  },
652
  {
653
  "epoch": 4.46,
654
+ "learning_rate": 9.55399061032864e-06,
655
+ "loss": 0.0083,
656
  "step": 950
657
  },
658
  {
659
  "epoch": 4.51,
660
+ "learning_rate": 9.549295774647888e-06,
661
+ "loss": 0.0244,
662
  "step": 960
663
  },
664
  {
665
  "epoch": 4.55,
666
+ "learning_rate": 9.544600938967137e-06,
667
+ "loss": 0.0096,
668
  "step": 970
669
  },
670
  {
671
  "epoch": 4.6,
672
+ "learning_rate": 9.539906103286386e-06,
673
+ "loss": 0.0109,
674
  "step": 980
675
  },
676
  {
677
  "epoch": 4.65,
678
+ "learning_rate": 9.535211267605635e-06,
679
+ "loss": 0.0114,
680
  "step": 990
681
  },
682
  {
683
  "epoch": 4.69,
684
+ "learning_rate": 9.530516431924883e-06,
685
+ "loss": 0.0214,
686
  "step": 1000
687
  },
688
  {
689
  "epoch": 4.74,
690
+ "learning_rate": 9.525821596244132e-06,
691
+ "loss": 0.0061,
692
  "step": 1010
693
  },
694
  {
695
  "epoch": 4.79,
696
+ "learning_rate": 9.521126760563381e-06,
697
+ "loss": 0.011,
698
  "step": 1020
699
  },
700
  {
701
  "epoch": 4.84,
702
+ "learning_rate": 9.51643192488263e-06,
703
+ "loss": 0.0061,
704
  "step": 1030
705
  },
706
  {
707
  "epoch": 4.88,
708
+ "learning_rate": 9.511737089201879e-06,
709
+ "loss": 0.0202,
710
  "step": 1040
711
  },
712
  {
713
  "epoch": 4.93,
714
+ "learning_rate": 9.507042253521127e-06,
715
+ "loss": 0.0249,
716
  "step": 1050
717
  },
718
  {
719
  "epoch": 4.98,
720
+ "learning_rate": 9.502347417840376e-06,
721
+ "loss": 0.0208,
722
  "step": 1060
723
  },
724
  {
725
  "epoch": 5.0,
726
+ "eval_corporation_f1": 0.3063063063063063,
727
+ "eval_creative-work_f1": 0.42857142857142855,
728
+ "eval_group_f1": 0.3418803418803419,
729
+ "eval_location_f1": 0.7052023121387283,
730
+ "eval_loss": 0.29045674204826355,
731
+ "eval_overall_accuracy": 0.9518487694685531,
732
+ "eval_overall_f1": 0.6544150605109816,
733
+ "eval_overall_precision": 0.7108081791626095,
734
+ "eval_overall_recall": 0.606312292358804,
735
+ "eval_person_f1": 0.7912584777694045,
736
+ "eval_product_f1": 0.42231075697211157,
737
+ "eval_runtime": 4.8738,
738
+ "eval_samples_per_second": 207.026,
739
+ "eval_steps_per_second": 13.131,
740
  "step": 1065
741
  },
742
  {
743
  "epoch": 5.02,
744
+ "learning_rate": 9.497652582159625e-06,
745
+ "loss": 0.0067,
746
  "step": 1070
747
  },
748
  {
749
  "epoch": 5.07,
750
+ "learning_rate": 9.492957746478874e-06,
751
+ "loss": 0.017,
752
  "step": 1080
753
  },
754
  {
755
  "epoch": 5.12,
756
+ "learning_rate": 9.488262910798123e-06,
757
+ "loss": 0.0069,
758
  "step": 1090
759
  },
760
  {
761
  "epoch": 5.16,
762
+ "learning_rate": 9.483568075117371e-06,
763
+ "loss": 0.0109,
764
  "step": 1100
765
  },
766
  {
767
  "epoch": 5.21,
768
+ "learning_rate": 9.47887323943662e-06,
769
+ "loss": 0.0108,
770
  "step": 1110
771
  },
772
  {
773
  "epoch": 5.26,
774
+ "learning_rate": 9.474178403755869e-06,
775
+ "loss": 0.0112,
776
  "step": 1120
777
  },
778
  {
779
  "epoch": 5.31,
780
+ "learning_rate": 9.469483568075118e-06,
781
+ "loss": 0.0042,
782
  "step": 1130
783
  },
784
  {
785
  "epoch": 5.35,
786
+ "learning_rate": 9.464788732394366e-06,
787
+ "loss": 0.0086,
788
  "step": 1140
789
  },
790
  {
791
  "epoch": 5.4,
792
+ "learning_rate": 9.460093896713615e-06,
793
+ "loss": 0.0067,
794
  "step": 1150
795
  },
796
  {
797
  "epoch": 5.45,
798
+ "learning_rate": 9.455399061032866e-06,
799
+ "loss": 0.0099,
800
  "step": 1160
801
  },
802
  {
803
  "epoch": 5.49,
804
+ "learning_rate": 9.450704225352113e-06,
805
+ "loss": 0.0071,
806
  "step": 1170
807
  },
808
  {
809
  "epoch": 5.54,
810
+ "learning_rate": 9.446009389671362e-06,
811
+ "loss": 0.0061,
812
  "step": 1180
813
  },
814
  {
815
  "epoch": 5.59,
816
+ "learning_rate": 9.441314553990612e-06,
817
+ "loss": 0.0088,
818
  "step": 1190
819
  },
820
  {
821
  "epoch": 5.63,
822
+ "learning_rate": 9.43661971830986e-06,
823
+ "loss": 0.0078,
824
  "step": 1200
825
  },
826
  {
827
  "epoch": 5.68,
828
+ "learning_rate": 9.431924882629108e-06,
829
+ "loss": 0.0041,
830
  "step": 1210
831
  },
832
  {
833
  "epoch": 5.73,
834
+ "learning_rate": 9.427230046948358e-06,
835
+ "loss": 0.019,
836
  "step": 1220
837
  },
838
  {
839
  "epoch": 5.77,
840
+ "learning_rate": 9.422535211267606e-06,
841
+ "loss": 0.0113,
842
  "step": 1230
843
  },
844
  {
845
  "epoch": 5.82,
846
+ "learning_rate": 9.417840375586856e-06,
847
+ "loss": 0.0245,
848
  "step": 1240
849
  },
850
  {
851
  "epoch": 5.87,
852
+ "learning_rate": 9.413145539906105e-06,
853
+ "loss": 0.0195,
854
  "step": 1250
855
  },
856
  {
857
  "epoch": 5.92,
858
+ "learning_rate": 9.408450704225352e-06,
859
+ "loss": 0.007,
860
  "step": 1260
861
  },
862
  {
863
  "epoch": 5.96,
864
+ "learning_rate": 9.403755868544602e-06,
865
+ "loss": 0.0071,
866
  "step": 1270
867
  },
868
  {
869
  "epoch": 6.0,
870
+ "eval_corporation_f1": 0.27586206896551724,
871
+ "eval_creative-work_f1": 0.4380165289256198,
872
+ "eval_group_f1": 0.32558139534883723,
873
+ "eval_location_f1": 0.6744186046511628,
874
+ "eval_loss": 0.3188508152961731,
875
+ "eval_overall_accuracy": 0.9494114849601712,
876
+ "eval_overall_f1": 0.6268922528940339,
877
+ "eval_overall_precision": 0.6756238003838771,
878
+ "eval_overall_recall": 0.584717607973422,
879
+ "eval_person_f1": 0.7781250000000001,
880
+ "eval_product_f1": 0.37785016286644946,
881
+ "eval_runtime": 5.2125,
882
+ "eval_samples_per_second": 193.573,
883
+ "eval_steps_per_second": 12.278,
884
  "step": 1278
885
  },
886
  {
887
  "epoch": 6.01,
888
+ "learning_rate": 9.399061032863851e-06,
889
+ "loss": 0.0069,
890
  "step": 1280
891
  },
892
  {
893
  "epoch": 6.06,
894
+ "learning_rate": 9.394366197183098e-06,
895
+ "loss": 0.0062,
896
  "step": 1290
897
  },
898
  {
899
  "epoch": 6.1,
900
+ "learning_rate": 9.389671361502349e-06,
901
+ "loss": 0.0096,
902
  "step": 1300
903
  },
904
  {
905
  "epoch": 6.15,
906
+ "learning_rate": 9.384976525821598e-06,
907
+ "loss": 0.0184,
908
  "step": 1310
909
  },
910
  {
911
  "epoch": 6.2,
912
+ "learning_rate": 9.380281690140845e-06,
913
+ "loss": 0.0116,
914
  "step": 1320
915
  },
916
  {
917
  "epoch": 6.24,
918
+ "learning_rate": 9.375586854460095e-06,
919
+ "loss": 0.0092,
920
  "step": 1330
921
  },
922
  {
923
  "epoch": 6.29,
924
+ "learning_rate": 9.370892018779344e-06,
925
+ "loss": 0.0075,
926
  "step": 1340
927
  },
928
  {
929
  "epoch": 6.34,
930
+ "learning_rate": 9.366197183098593e-06,
931
+ "loss": 0.0101,
932
  "step": 1350
933
  },
934
  {
935
  "epoch": 6.38,
936
+ "learning_rate": 9.361502347417842e-06,
937
+ "loss": 0.0106,
938
  "step": 1360
939
  },
940
  {
941
  "epoch": 6.43,
942
+ "learning_rate": 9.35680751173709e-06,
943
+ "loss": 0.0069,
944
  "step": 1370
945
  },
946
  {
947
  "epoch": 6.48,
948
+ "learning_rate": 9.35211267605634e-06,
949
+ "loss": 0.0064,
950
  "step": 1380
951
  },
952
  {
953
  "epoch": 6.53,
954
+ "learning_rate": 9.347417840375588e-06,
955
+ "loss": 0.0112,
956
  "step": 1390
957
  },
958
  {
959
  "epoch": 6.57,
960
+ "learning_rate": 9.342723004694837e-06,
961
+ "loss": 0.0059,
962
  "step": 1400
963
  },
964
  {
965
  "epoch": 6.62,
966
+ "learning_rate": 9.338028169014086e-06,
967
+ "loss": 0.006,
968
  "step": 1410
969
  },
970
  {
971
  "epoch": 6.67,
972
+ "learning_rate": 9.333333333333334e-06,
973
+ "loss": 0.0089,
974
  "step": 1420
975
  },
976
  {
977
  "epoch": 6.71,
978
+ "learning_rate": 9.328638497652583e-06,
979
+ "loss": 0.0068,
980
  "step": 1430
981
  },
982
  {
983
  "epoch": 6.76,
984
+ "learning_rate": 9.323943661971832e-06,
985
+ "loss": 0.0041,
986
  "step": 1440
987
  },
988
  {
989
  "epoch": 6.81,
990
+ "learning_rate": 9.31924882629108e-06,
991
+ "loss": 0.0191,
992
  "step": 1450
993
  },
994
  {
995
  "epoch": 6.85,
996
+ "learning_rate": 9.31455399061033e-06,
997
+ "loss": 0.0123,
998
  "step": 1460
999
  },
1000
  {
1001
  "epoch": 6.9,
1002
+ "learning_rate": 9.309859154929578e-06,
1003
+ "loss": 0.0033,
1004
  "step": 1470
1005
  },
1006
  {
1007
  "epoch": 6.95,
1008
+ "learning_rate": 9.305164319248827e-06,
1009
+ "loss": 0.0085,
1010
  "step": 1480
1011
  },
1012
  {
1013
  "epoch": 7.0,
1014
+ "learning_rate": 9.300469483568076e-06,
1015
+ "loss": 0.0073,
1016
  "step": 1490
1017
  },
1018
  {
1019
  "epoch": 7.0,
1020
+ "eval_corporation_f1": 0.3061224489795918,
1021
+ "eval_creative-work_f1": 0.4388185654008439,
1022
+ "eval_group_f1": 0.37837837837837834,
1023
+ "eval_location_f1": 0.6946107784431138,
1024
+ "eval_loss": 0.35933107137680054,
1025
+ "eval_overall_accuracy": 0.94756866008798,
1026
+ "eval_overall_f1": 0.631031220435194,
1027
+ "eval_overall_precision": 0.7329670329670329,
1028
+ "eval_overall_recall": 0.5539867109634552,
1029
+ "eval_person_f1": 0.7631160572337042,
1030
+ "eval_product_f1": 0.3374485596707819,
1031
+ "eval_runtime": 4.8206,
1032
+ "eval_samples_per_second": 209.311,
1033
+ "eval_steps_per_second": 13.276,
1034
  "step": 1491
1035
  },
1036
  {
1037
  "epoch": 7.04,
1038
+ "learning_rate": 9.295774647887325e-06,
1039
+ "loss": 0.0109,
1040
  "step": 1500
1041
  },
1042
  {
1043
  "epoch": 7.09,
1044
+ "learning_rate": 9.291079812206573e-06,
1045
+ "loss": 0.0049,
1046
  "step": 1510
1047
  },
1048
  {
1049
  "epoch": 7.14,
1050
+ "learning_rate": 9.286384976525822e-06,
1051
+ "loss": 0.0098,
1052
  "step": 1520
1053
  },
1054
  {
1055
  "epoch": 7.18,
1056
+ "learning_rate": 9.281690140845071e-06,
1057
+ "loss": 0.0053,
1058
  "step": 1530
1059
  },
1060
  {
1061
  "epoch": 7.23,
1062
+ "learning_rate": 9.27699530516432e-06,
1063
+ "loss": 0.0054,
1064
  "step": 1540
1065
  },
1066
  {
1067
  "epoch": 7.28,
1068
+ "learning_rate": 9.272300469483569e-06,
1069
+ "loss": 0.0123,
1070
  "step": 1550
1071
  },
1072
  {
1073
  "epoch": 7.32,
1074
+ "learning_rate": 9.267605633802817e-06,
1075
+ "loss": 0.0037,
1076
  "step": 1560
1077
  },
1078
  {
1079
  "epoch": 7.37,
1080
+ "learning_rate": 9.262910798122066e-06,
1081
+ "loss": 0.0029,
1082
  "step": 1570
1083
  },
1084
  {
1085
  "epoch": 7.42,
1086
+ "learning_rate": 9.258215962441315e-06,
1087
+ "loss": 0.0186,
1088
  "step": 1580
1089
  },
1090
  {
1091
  "epoch": 7.46,
1092
+ "learning_rate": 9.253521126760564e-06,
1093
+ "loss": 0.0089,
1094
  "step": 1590
1095
  },
1096
  {
1097
  "epoch": 7.51,
1098
+ "learning_rate": 9.248826291079813e-06,
1099
+ "loss": 0.006,
1100
  "step": 1600
1101
  },
1102
  {
1103
  "epoch": 7.56,
1104
+ "learning_rate": 9.244131455399061e-06,
1105
+ "loss": 0.0087,
1106
  "step": 1610
1107
  },
1108
  {
1109
  "epoch": 7.61,
1110
+ "learning_rate": 9.23943661971831e-06,
1111
+ "loss": 0.0099,
1112
  "step": 1620
1113
  },
1114
  {
1115
  "epoch": 7.65,
1116
+ "learning_rate": 9.234741784037559e-06,
1117
+ "loss": 0.0293,
1118
  "step": 1630
1119
  },
1120
  {
1121
  "epoch": 7.7,
1122
+ "learning_rate": 9.230046948356808e-06,
1123
+ "loss": 0.0129,
1124
  "step": 1640
1125
  },
1126
  {
1127
  "epoch": 7.75,
1128
+ "learning_rate": 9.225352112676057e-06,
1129
+ "loss": 0.0207,
1130
  "step": 1650
1131
  },
1132
  {
1133
  "epoch": 7.79,
1134
+ "learning_rate": 9.220657276995307e-06,
1135
+ "loss": 0.0055,
1136
  "step": 1660
1137
  },
1138
  {
1139
  "epoch": 7.84,
1140
+ "learning_rate": 9.215962441314554e-06,
1141
+ "loss": 0.0059,
1142
  "step": 1670
1143
  },
1144
  {
1145
  "epoch": 7.89,
1146
+ "learning_rate": 9.211267605633803e-06,
1147
+ "loss": 0.0103,
1148
  "step": 1680
1149
  },
1150
  {
1151
  "epoch": 7.93,
1152
+ "learning_rate": 9.206572769953053e-06,
1153
+ "loss": 0.018,
1154
  "step": 1690
1155
  },
1156
  {
1157
  "epoch": 7.98,
1158
+ "learning_rate": 9.2018779342723e-06,
1159
+ "loss": 0.0135,
1160
  "step": 1700
1161
  },
1162
  {
1163
  "epoch": 8.0,
1164
+ "eval_corporation_f1": 0.34,
1165
+ "eval_creative-work_f1": 0.417910447761194,
1166
+ "eval_group_f1": 0.3088235294117647,
1167
+ "eval_location_f1": 0.6631578947368422,
1168
+ "eval_loss": 0.35640954971313477,
1169
+ "eval_overall_accuracy": 0.9470930923790275,
1170
+ "eval_overall_f1": 0.609981515711645,
1171
+ "eval_overall_precision": 0.6875,
1172
+ "eval_overall_recall": 0.5481727574750831,
1173
+ "eval_person_f1": 0.7485667485667485,
1174
+ "eval_product_f1": 0.3694779116465864,
1175
+ "eval_runtime": 6.3558,
1176
+ "eval_samples_per_second": 158.753,
1177
+ "eval_steps_per_second": 10.07,
1178
  "step": 1704
1179
  },
1180
  {
1181
  "epoch": 8.03,
1182
+ "learning_rate": 9.19718309859155e-06,
1183
+ "loss": 0.006,
1184
  "step": 1710
1185
  },
1186
  {
1187
  "epoch": 8.08,
1188
+ "learning_rate": 9.1924882629108e-06,
1189
+ "loss": 0.0065,
1190
  "step": 1720
1191
  },
1192
  {
1193
  "epoch": 8.12,
1194
+ "learning_rate": 9.187793427230047e-06,
1195
+ "loss": 0.014,
1196
  "step": 1730
1197
  },
1198
  {
1199
  "epoch": 8.17,
1200
+ "learning_rate": 9.183098591549296e-06,
1201
+ "loss": 0.0077,
1202
  "step": 1740
1203
  },
1204
  {
1205
  "epoch": 8.22,
1206
+ "learning_rate": 9.178403755868546e-06,
1207
+ "loss": 0.006,
1208
  "step": 1750
1209
  },
1210
  {
1211
  "epoch": 8.26,
1212
+ "learning_rate": 9.173708920187793e-06,
1213
+ "loss": 0.0062,
1214
  "step": 1760
1215
  },
1216
  {
1217
  "epoch": 8.31,
1218
+ "learning_rate": 9.169014084507044e-06,
1219
+ "loss": 0.0082,
1220
  "step": 1770
1221
  },
1222
  {
1223
  "epoch": 8.36,
1224
+ "learning_rate": 9.164319248826293e-06,
1225
+ "loss": 0.0064,
1226
  "step": 1780
1227
  },
1228
  {
1229
  "epoch": 8.4,
1230
+ "learning_rate": 9.15962441314554e-06,
1231
+ "loss": 0.0089,
1232
  "step": 1790
1233
  },
1234
  {
1235
  "epoch": 8.45,
1236
+ "learning_rate": 9.15492957746479e-06,
1237
+ "loss": 0.004,
1238
  "step": 1800
1239
  },
1240
  {
1241
  "epoch": 8.5,
1242
+ "learning_rate": 9.150234741784039e-06,
1243
+ "loss": 0.0082,
1244
  "step": 1810
1245
  },
1246
  {
1247
  "epoch": 8.54,
1248
+ "learning_rate": 9.145539906103286e-06,
1249
+ "loss": 0.0129,
1250
  "step": 1820
1251
  },
1252
  {
1253
  "epoch": 8.59,
1254
+ "learning_rate": 9.140845070422536e-06,
1255
+ "loss": 0.0141,
1256
  "step": 1830
1257
  },
1258
  {
1259
  "epoch": 8.64,
1260
+ "learning_rate": 9.136150234741785e-06,
1261
+ "loss": 0.0121,
1262
  "step": 1840
1263
  },
1264
  {
1265
  "epoch": 8.69,
1266
+ "learning_rate": 9.131455399061034e-06,
1267
+ "loss": 0.0054,
1268
  "step": 1850
1269
  },
1270
  {
1271
  "epoch": 8.73,
1272
+ "learning_rate": 9.126760563380283e-06,
1273
+ "loss": 0.0136,
1274
  "step": 1860
1275
  },
1276
  {
1277
  "epoch": 8.78,
1278
+ "learning_rate": 9.122065727699532e-06,
1279
+ "loss": 0.0036,
1280
  "step": 1870
1281
  },
1282
  {
1283
  "epoch": 8.83,
1284
+ "learning_rate": 9.11737089201878e-06,
1285
+ "loss": 0.0065,
1286
  "step": 1880
1287
  },
1288
  {
1289
  "epoch": 8.87,
1290
+ "learning_rate": 9.11267605633803e-06,
1291
+ "loss": 0.0079,
1292
  "step": 1890
1293
  },
1294
  {
1295
  "epoch": 8.92,
1296
+ "learning_rate": 9.107981220657278e-06,
1297
+ "loss": 0.0154,
1298
  "step": 1900
1299
  },
1300
  {
1301
  "epoch": 8.97,
1302
+ "learning_rate": 9.103286384976527e-06,
1303
+ "loss": 0.0097,
1304
  "step": 1910
1305
  },
1306
  {
1307
  "epoch": 9.0,
1308
+ "eval_corporation_f1": 0.3111111111111111,
1309
+ "eval_creative-work_f1": 0.46093750000000006,
1310
+ "eval_group_f1": 0.3835616438356164,
1311
+ "eval_location_f1": 0.7089947089947088,
1312
+ "eval_loss": 0.30845504999160767,
1313
+ "eval_overall_accuracy": 0.9515515396504577,
1314
+ "eval_overall_f1": 0.6495149725854069,
1315
+ "eval_overall_precision": 0.6598114824335904,
1316
+ "eval_overall_recall": 0.6395348837209303,
1317
+ "eval_person_f1": 0.7905604719764011,
1318
+ "eval_product_f1": 0.40830449826989623,
1319
+ "eval_runtime": 5.2314,
1320
+ "eval_samples_per_second": 192.875,
1321
+ "eval_steps_per_second": 12.234,
1322
  "step": 1917
1323
  },
1324
  {
1325
  "epoch": 9.01,
1326
+ "learning_rate": 9.098591549295776e-06,
1327
+ "loss": 0.005,
1328
  "step": 1920
1329
  },
1330
  {
1331
  "epoch": 9.06,
1332
+ "learning_rate": 9.093896713615024e-06,
1333
+ "loss": 0.0036,
1334
  "step": 1930
1335
  },
1336
  {
1337
  "epoch": 9.11,
1338
+ "learning_rate": 9.089201877934273e-06,
1339
+ "loss": 0.0157,
1340
  "step": 1940
1341
  },
1342
  {
1343
  "epoch": 9.15,
1344
+ "learning_rate": 9.084507042253522e-06,
1345
+ "loss": 0.0044,
1346
  "step": 1950
1347
  },
1348
  {
1349
  "epoch": 9.2,
1350
+ "learning_rate": 9.07981220657277e-06,
1351
+ "loss": 0.0139,
1352
  "step": 1960
1353
  },
1354
  {
1355
  "epoch": 9.25,
1356
+ "learning_rate": 9.07511737089202e-06,
1357
+ "loss": 0.0135,
1358
  "step": 1970
1359
  },
1360
  {
1361
  "epoch": 9.3,
1362
+ "learning_rate": 9.070422535211268e-06,
1363
+ "loss": 0.003,
1364
  "step": 1980
1365
  },
1366
  {
1367
  "epoch": 9.34,
1368
+ "learning_rate": 9.065727699530517e-06,
1369
+ "loss": 0.0078,
1370
  "step": 1990
1371
  },
1372
  {
1373
  "epoch": 9.39,
1374
+ "learning_rate": 9.061032863849766e-06,
1375
+ "loss": 0.0052,
1376
  "step": 2000
1377
  },
1378
  {
1379
  "epoch": 9.44,
1380
+ "learning_rate": 9.056338028169015e-06,
1381
+ "loss": 0.0043,
1382
  "step": 2010
1383
  },
1384
  {
1385
  "epoch": 9.48,
1386
+ "learning_rate": 9.051643192488264e-06,
1387
+ "loss": 0.0075,
1388
  "step": 2020
1389
  },
1390
  {
1391
  "epoch": 9.53,
1392
+ "learning_rate": 9.046948356807512e-06,
1393
+ "loss": 0.0041,
1394
  "step": 2030
1395
  },
1396
  {
1397
  "epoch": 9.58,
1398
+ "learning_rate": 9.042253521126761e-06,
1399
+ "loss": 0.0042,
1400
  "step": 2040
1401
  },
1402
  {
1403
  "epoch": 9.62,
1404
+ "learning_rate": 9.03755868544601e-06,
1405
+ "loss": 0.0064,
1406
  "step": 2050
1407
  },
1408
  {
1409
  "epoch": 9.67,
1410
+ "learning_rate": 9.032863849765259e-06,
1411
+ "loss": 0.0045,
1412
  "step": 2060
1413
  },
1414
  {
1415
  "epoch": 9.72,
1416
+ "learning_rate": 9.028169014084507e-06,
1417
+ "loss": 0.0074,
1418
  "step": 2070
1419
  },
1420
  {
1421
  "epoch": 9.77,
1422
+ "learning_rate": 9.023474178403756e-06,
1423
+ "loss": 0.0048,
1424
  "step": 2080
1425
  },
1426
  {
1427
  "epoch": 9.81,
1428
+ "learning_rate": 9.018779342723005e-06,
1429
+ "loss": 0.0027,
1430
  "step": 2090
1431
  },
1432
  {
1433
  "epoch": 9.86,
1434
+ "learning_rate": 9.014084507042254e-06,
1435
+ "loss": 0.0028,
1436
  "step": 2100
1437
  },
1438
  {
1439
  "epoch": 9.91,
1440
+ "learning_rate": 9.009389671361503e-06,
1441
+ "loss": 0.0096,
1442
  "step": 2110
1443
  },
1444
  {
1445
  "epoch": 9.95,
1446
+ "learning_rate": 9.004694835680751e-06,
1447
+ "loss": 0.0058,
1448
  "step": 2120
1449
  },
1450
  {
1451
  "epoch": 10.0,
1452
+ "learning_rate": 9e-06,
1453
+ "loss": 0.0108,
1454
  "step": 2130
1455
  },
1456
  {
1457
  "epoch": 10.0,
1458
+ "eval_corporation_f1": 0.35294117647058826,
1459
+ "eval_creative-work_f1": 0.45801526717557256,
1460
+ "eval_group_f1": 0.36486486486486486,
1461
+ "eval_location_f1": 0.689655172413793,
1462
+ "eval_loss": 0.30454060435295105,
1463
+ "eval_overall_accuracy": 0.950897634050648,
1464
+ "eval_overall_f1": 0.6540880503144654,
1465
+ "eval_overall_precision": 0.6604572396274344,
1466
+ "eval_overall_recall": 0.6478405315614618,
1467
+ "eval_person_f1": 0.784313725490196,
1468
+ "eval_product_f1": 0.43870967741935485,
1469
+ "eval_runtime": 6.5888,
1470
+ "eval_samples_per_second": 153.138,
1471
+ "eval_steps_per_second": 9.713,
1472
  "step": 2130
1473
  },
1474
  {
1475
  "epoch": 10.05,
1476
+ "learning_rate": 8.995305164319249e-06,
1477
+ "loss": 0.0035,
1478
  "step": 2140
1479
  },
1480
  {
1481
  "epoch": 10.09,
1482
+ "learning_rate": 8.990610328638498e-06,
1483
+ "loss": 0.003,
1484
  "step": 2150
1485
  },
1486
  {
1487
  "epoch": 10.14,
1488
+ "learning_rate": 8.985915492957748e-06,
1489
+ "loss": 0.0117,
1490
  "step": 2160
1491
  },
1492
  {
1493
  "epoch": 10.19,
1494
+ "learning_rate": 8.981220657276995e-06,
1495
+ "loss": 0.004,
1496
  "step": 2170
1497
  },
1498
  {
1499
  "epoch": 10.23,
1500
+ "learning_rate": 8.976525821596244e-06,
1501
+ "loss": 0.0038,
1502
  "step": 2180
1503
  },
1504
  {
1505
  "epoch": 10.28,
1506
+ "learning_rate": 8.971830985915495e-06,
1507
+ "loss": 0.0057,
1508
  "step": 2190
1509
  },
1510
  {
1511
  "epoch": 10.33,
1512
+ "learning_rate": 8.967136150234742e-06,
1513
+ "loss": 0.0025,
1514
  "step": 2200
1515
  },
1516
  {
1517
  "epoch": 10.38,
1518
+ "learning_rate": 8.96244131455399e-06,
1519
+ "loss": 0.0035,
1520
  "step": 2210
1521
  },
1522
  {
1523
  "epoch": 10.42,
1524
+ "learning_rate": 8.957746478873241e-06,
1525
+ "loss": 0.008,
1526
  "step": 2220
1527
  },
1528
  {
1529
  "epoch": 10.47,
1530
+ "learning_rate": 8.953051643192488e-06,
1531
+ "loss": 0.0025,
1532
  "step": 2230
1533
  },
1534
  {
1535
  "epoch": 10.52,
1536
+ "learning_rate": 8.948356807511737e-06,
1537
+ "loss": 0.0023,
1538
  "step": 2240
1539
  },
1540
  {
1541
  "epoch": 10.56,
1542
+ "learning_rate": 8.943661971830987e-06,
1543
+ "loss": 0.0086,
1544
  "step": 2250
1545
  },
1546
  {
1547
  "epoch": 10.61,
1548
+ "learning_rate": 8.938967136150235e-06,
1549
+ "loss": 0.0101,
1550
  "step": 2260
1551
  },
1552
  {
1553
  "epoch": 10.66,
1554
+ "learning_rate": 8.934272300469485e-06,
1555
+ "loss": 0.0072,
1556
  "step": 2270
1557
  },
1558
  {
1559
  "epoch": 10.7,
1560
+ "learning_rate": 8.929577464788734e-06,
1561
+ "loss": 0.004,
1562
  "step": 2280
1563
  },
1564
  {
1565
  "epoch": 10.75,
1566
+ "learning_rate": 8.924882629107981e-06,
1567
+ "loss": 0.0026,
1568
  "step": 2290
1569
  },
1570
  {
1571
  "epoch": 10.8,
1572
+ "learning_rate": 8.920187793427231e-06,
1573
+ "loss": 0.0047,
1574
  "step": 2300
1575
  },
1576
  {
1577
  "epoch": 10.85,
1578
+ "learning_rate": 8.91549295774648e-06,
1579
+ "loss": 0.002,
1580
  "step": 2310
1581
  },
1582
  {
1583
  "epoch": 10.89,
1584
+ "learning_rate": 8.910798122065727e-06,
1585
+ "loss": 0.0116,
1586
  "step": 2320
1587
  },
1588
  {
1589
  "epoch": 10.94,
1590
+ "learning_rate": 8.906103286384978e-06,
1591
+ "loss": 0.0057,
1592
  "step": 2330
1593
  },
1594
  {
1595
  "epoch": 10.99,
1596
+ "learning_rate": 8.901408450704227e-06,
1597
+ "loss": 0.013,
1598
  "step": 2340
1599
  },
1600
  {
1601
  "epoch": 11.0,
1602
+ "eval_corporation_f1": 0.2782608695652174,
1603
+ "eval_creative-work_f1": 0.4247787610619469,
1604
+ "eval_group_f1": 0.3357664233576642,
1605
+ "eval_location_f1": 0.736842105263158,
1606
+ "eval_loss": 0.3382622003555298,
1607
+ "eval_overall_accuracy": 0.9507192961597908,
1608
+ "eval_overall_f1": 0.6469565217391304,
1609
+ "eval_overall_precision": 0.6788321167883211,
1610
+ "eval_overall_recall": 0.6179401993355482,
1611
+ "eval_person_f1": 0.7958271236959762,
1612
+ "eval_product_f1": 0.3655172413793103,
1613
+ "eval_runtime": 5.3383,
1614
+ "eval_samples_per_second": 189.013,
1615
+ "eval_steps_per_second": 11.989,
1616
  "step": 2343
1617
  },
1618
  {
1619
  "epoch": 11.03,
1620
+ "learning_rate": 8.896713615023475e-06,
1621
+ "loss": 0.0059,
1622
  "step": 2350
1623
  },
1624
  {
1625
  "epoch": 11.08,
1626
+ "learning_rate": 8.892018779342724e-06,
1627
+ "loss": 0.0153,
1628
  "step": 2360
1629
  },
1630
  {
1631
  "epoch": 11.13,
1632
+ "learning_rate": 8.887323943661973e-06,
1633
+ "loss": 0.0057,
1634
  "step": 2370
1635
  },
1636
  {
1637
  "epoch": 11.17,
1638
+ "learning_rate": 8.882629107981222e-06,
1639
+ "loss": 0.0048,
1640
  "step": 2380
1641
  },
1642
  {
1643
  "epoch": 11.22,
1644
+ "learning_rate": 8.87793427230047e-06,
1645
+ "loss": 0.0118,
1646
  "step": 2390
1647
  },
1648
  {
1649
  "epoch": 11.27,
1650
+ "learning_rate": 8.87323943661972e-06,
1651
+ "loss": 0.0071,
1652
  "step": 2400
1653
  },
1654
  {
1655
  "epoch": 11.31,
1656
+ "learning_rate": 8.868544600938968e-06,
1657
+ "loss": 0.0098,
1658
  "step": 2410
1659
  },
1660
  {
1661
  "epoch": 11.36,
1662
+ "learning_rate": 8.863849765258217e-06,
1663
+ "loss": 0.0065,
1664
  "step": 2420
1665
  },
1666
  {
1667
  "epoch": 11.41,
1668
+ "learning_rate": 8.859154929577466e-06,
1669
+ "loss": 0.0051,
1670
  "step": 2430
1671
  },
1672
  {
1673
  "epoch": 11.46,
1674
+ "learning_rate": 8.854460093896714e-06,
1675
+ "loss": 0.0053,
1676
  "step": 2440
1677
  },
1678
  {
1679
  "epoch": 11.5,
1680
+ "learning_rate": 8.849765258215963e-06,
1681
+ "loss": 0.0024,
1682
  "step": 2450
1683
  },
1684
  {
1685
  "epoch": 11.55,
1686
+ "learning_rate": 8.845070422535212e-06,
1687
+ "loss": 0.0045,
1688
  "step": 2460
1689
  },
1690
  {
1691
  "epoch": 11.6,
1692
+ "learning_rate": 8.84037558685446e-06,
1693
+ "loss": 0.0172,
1694
  "step": 2470
1695
  },
1696
  {
1697
  "epoch": 11.64,
1698
+ "learning_rate": 8.83568075117371e-06,
1699
+ "loss": 0.0053,
1700
  "step": 2480
1701
  },
1702
  {
1703
  "epoch": 11.69,
1704
+ "learning_rate": 8.830985915492958e-06,
1705
+ "loss": 0.014,
1706
  "step": 2490
1707
  },
1708
  {
1709
  "epoch": 11.74,
1710
+ "learning_rate": 8.826291079812207e-06,
1711
+ "loss": 0.0178,
1712
  "step": 2500
1713
  },
1714
  {
1715
  "epoch": 11.78,
1716
+ "learning_rate": 8.821596244131456e-06,
1717
+ "loss": 0.0071,
1718
  "step": 2510
1719
  },
1720
  {
1721
  "epoch": 11.83,
1722
+ "learning_rate": 8.816901408450705e-06,
1723
+ "loss": 0.0086,
1724
  "step": 2520
1725
  },
1726
  {
1727
  "epoch": 11.88,
1728
+ "learning_rate": 8.812206572769954e-06,
1729
+ "loss": 0.0056,
1730
  "step": 2530
1731
  },
1732
  {
1733
  "epoch": 11.92,
1734
+ "learning_rate": 8.807511737089202e-06,
1735
+ "loss": 0.0029,
1736
  "step": 2540
1737
  },
1738
  {
1739
  "epoch": 11.97,
1740
+ "learning_rate": 8.802816901408451e-06,
1741
+ "loss": 0.0076,
1742
  "step": 2550
1743
  },
1744
  {
1745
  "epoch": 12.0,
1746
+ "eval_corporation_f1": 0.2708333333333333,
1747
+ "eval_creative-work_f1": 0.39852398523985244,
1748
+ "eval_group_f1": 0.33333333333333337,
1749
+ "eval_location_f1": 0.6740331491712707,
1750
+ "eval_loss": 0.36174312233924866,
1751
+ "eval_overall_accuracy": 0.9473903221971228,
1752
+ "eval_overall_f1": 0.6143187066974597,
1753
+ "eval_overall_precision": 0.6919875130072841,
1754
+ "eval_overall_recall": 0.5523255813953488,
1755
+ "eval_person_f1": 0.75658419792498,
1756
+ "eval_product_f1": 0.3524590163934426,
1757
+ "eval_runtime": 5.9165,
1758
+ "eval_samples_per_second": 170.539,
1759
+ "eval_steps_per_second": 10.817,
1760
  "step": 2556
1761
  },
1762
  {
1763
  "epoch": 12.02,
1764
+ "learning_rate": 8.7981220657277e-06,
1765
+ "loss": 0.0034,
1766
  "step": 2560
1767
  },
1768
  {
1769
  "epoch": 12.07,
1770
+ "learning_rate": 8.793427230046949e-06,
1771
+ "loss": 0.0029,
1772
  "step": 2570
1773
  },
1774
  {
1775
  "epoch": 12.11,
1776
+ "learning_rate": 8.7887323943662e-06,
1777
+ "loss": 0.0084,
1778
  "step": 2580
1779
  },
1780
  {
1781
  "epoch": 12.16,
1782
+ "learning_rate": 8.784037558685446e-06,
1783
+ "loss": 0.0058,
1784
  "step": 2590
1785
  },
1786
  {
1787
  "epoch": 12.21,
1788
+ "learning_rate": 8.779342723004695e-06,
1789
+ "loss": 0.0025,
1790
  "step": 2600
1791
  },
1792
  {
1793
  "epoch": 12.25,
1794
+ "learning_rate": 8.774647887323946e-06,
1795
+ "loss": 0.0094,
1796
  "step": 2610
1797
  },
1798
  {
1799
  "epoch": 12.3,
1800
+ "learning_rate": 8.769953051643193e-06,
1801
+ "loss": 0.0051,
1802
  "step": 2620
1803
  },
1804
  {
1805
  "epoch": 12.35,
1806
+ "learning_rate": 8.765258215962442e-06,
1807
+ "loss": 0.0064,
1808
  "step": 2630
1809
  },
1810
  {
1811
  "epoch": 12.39,
1812
+ "learning_rate": 8.760563380281692e-06,
1813
+ "loss": 0.0095,
1814
  "step": 2640
1815
  },
1816
  {
1817
  "epoch": 12.44,
1818
+ "learning_rate": 8.755868544600939e-06,
1819
+ "loss": 0.0049,
1820
  "step": 2650
1821
  },
1822
  {
1823
  "epoch": 12.49,
1824
+ "learning_rate": 8.751173708920188e-06,
1825
+ "loss": 0.0114,
1826
  "step": 2660
1827
  },
1828
  {
1829
  "epoch": 12.54,
1830
+ "learning_rate": 8.746478873239437e-06,
1831
+ "loss": 0.0054,
1832
  "step": 2670
1833
  },
1834
  {
1835
  "epoch": 12.58,
1836
+ "learning_rate": 8.741784037558685e-06,
1837
+ "loss": 0.0108,
1838
  "step": 2680
1839
  },
1840
  {
1841
  "epoch": 12.63,
1842
+ "learning_rate": 8.737089201877936e-06,
1843
+ "loss": 0.0052,
1844
  "step": 2690
1845
  },
1846
  {
1847
  "epoch": 12.68,
1848
+ "learning_rate": 8.732394366197183e-06,
1849
+ "loss": 0.0086,
1850
  "step": 2700
1851
  },
1852
  {
1853
  "epoch": 12.72,
1854
+ "learning_rate": 8.727699530516432e-06,
1855
+ "loss": 0.0019,
1856
  "step": 2710
1857
  },
1858
  {
1859
  "epoch": 12.77,
1860
+ "learning_rate": 8.723004694835682e-06,
1861
+ "loss": 0.0024,
1862
  "step": 2720
1863
  },
1864
  {
1865
  "epoch": 12.82,
1866
+ "learning_rate": 8.71830985915493e-06,
1867
+ "loss": 0.0038,
1868
  "step": 2730
1869
  },
1870
  {
1871
  "epoch": 12.86,
1872
+ "learning_rate": 8.713615023474178e-06,
1873
+ "loss": 0.0097,
1874
  "step": 2740
1875
  },
1876
  {
1877
  "epoch": 12.91,
1878
+ "learning_rate": 8.708920187793429e-06,
1879
+ "loss": 0.0054,
1880
  "step": 2750
1881
  },
1882
  {
1883
  "epoch": 12.96,
1884
+ "learning_rate": 8.704225352112676e-06,
1885
+ "loss": 0.0042,
1886
  "step": 2760
1887
  },
1888
  {
1889
  "epoch": 13.0,
1890
+ "eval_corporation_f1": 0.24778761061946902,
1891
+ "eval_creative-work_f1": 0.39148936170212767,
1892
+ "eval_group_f1": 0.35211267605633806,
1893
+ "eval_location_f1": 0.656084656084656,
1894
+ "eval_loss": 0.3746570348739624,
1895
+ "eval_overall_accuracy": 0.9472714302698847,
1896
+ "eval_overall_f1": 0.6219790241678067,
1897
+ "eval_overall_precision": 0.6895854398382204,
1898
+ "eval_overall_recall": 0.5664451827242525,
1899
+ "eval_person_f1": 0.7741935483870966,
1900
+ "eval_product_f1": 0.35390946502057613,
1901
+ "eval_runtime": 6.4418,
1902
+ "eval_samples_per_second": 156.633,
1903
+ "eval_steps_per_second": 9.935,
1904
  "step": 2769
1905
  },
1906
  {
1907
  "epoch": 13.0,
1908
+ "learning_rate": 8.699530516431926e-06,
1909
+ "loss": 0.0039,
1910
  "step": 2770
1911
  },
1912
  {
1913
  "epoch": 13.05,
1914
+ "learning_rate": 8.694835680751175e-06,
1915
+ "loss": 0.0017,
1916
  "step": 2780
1917
  },
1918
  {
1919
  "epoch": 13.1,
1920
+ "learning_rate": 8.690140845070422e-06,
1921
+ "loss": 0.0102,
1922
  "step": 2790
1923
  },
1924
  {
1925
  "epoch": 13.15,
1926
+ "learning_rate": 8.685446009389673e-06,
1927
+ "loss": 0.0013,
1928
  "step": 2800
1929
  },
1930
  {
1931
  "epoch": 13.19,
1932
+ "learning_rate": 8.680751173708921e-06,
1933
+ "loss": 0.0076,
1934
  "step": 2810
1935
  },
1936
  {
1937
  "epoch": 13.24,
1938
+ "learning_rate": 8.676056338028169e-06,
1939
+ "loss": 0.0108,
1940
  "step": 2820
1941
  },
1942
  {
1943
  "epoch": 13.29,
1944
+ "learning_rate": 8.671361502347419e-06,
1945
+ "loss": 0.0107,
1946
  "step": 2830
1947
  },
1948
  {
1949
  "epoch": 13.33,
1950
+ "learning_rate": 8.666666666666668e-06,
1951
+ "loss": 0.0058,
1952
  "step": 2840
1953
  },
1954
  {
1955
  "epoch": 13.38,
1956
+ "learning_rate": 8.661971830985915e-06,
1957
+ "loss": 0.0033,
1958
  "step": 2850
1959
  },
1960
  {
1961
  "epoch": 13.43,
1962
+ "learning_rate": 8.657276995305165e-06,
1963
+ "loss": 0.0028,
1964
  "step": 2860
1965
  },
1966
  {
1967
  "epoch": 13.47,
1968
+ "learning_rate": 8.652582159624414e-06,
1969
+ "loss": 0.0046,
1970
  "step": 2870
1971
  },
1972
  {
1973
  "epoch": 13.52,
1974
+ "learning_rate": 8.647887323943663e-06,
1975
+ "loss": 0.002,
1976
  "step": 2880
1977
  },
1978
  {
1979
  "epoch": 13.57,
1980
+ "learning_rate": 8.643192488262912e-06,
1981
+ "loss": 0.0042,
1982
  "step": 2890
1983
  },
1984
  {
1985
  "epoch": 13.62,
1986
+ "learning_rate": 8.63849765258216e-06,
1987
+ "loss": 0.0066,
1988
  "step": 2900
1989
  },
1990
  {
1991
  "epoch": 13.66,
1992
+ "learning_rate": 8.63380281690141e-06,
1993
+ "loss": 0.0022,
1994
  "step": 2910
1995
  },
1996
  {
1997
  "epoch": 13.71,
1998
+ "learning_rate": 8.629107981220658e-06,
1999
+ "loss": 0.008,
2000
  "step": 2920
2001
  },
2002
  {
2003
  "epoch": 13.76,
2004
+ "learning_rate": 8.624413145539907e-06,
2005
+ "loss": 0.0037,
2006
  "step": 2930
2007
  },
2008
  {
2009
  "epoch": 13.8,
2010
+ "learning_rate": 8.619718309859156e-06,
2011
+ "loss": 0.0106,
2012
  "step": 2940
2013
  },
2014
  {
2015
  "epoch": 13.85,
2016
+ "learning_rate": 8.615023474178405e-06,
2017
+ "loss": 0.0031,
2018
  "step": 2950
2019
  },
2020
  {
2021
  "epoch": 13.9,
2022
+ "learning_rate": 8.610328638497653e-06,
2023
+ "loss": 0.0023,
2024
  "step": 2960
2025
  },
2026
  {
2027
  "epoch": 13.94,
2028
+ "learning_rate": 8.605633802816902e-06,
2029
+ "loss": 0.0118,
2030
  "step": 2970
2031
  },
2032
  {
2033
  "epoch": 13.99,
2034
+ "learning_rate": 8.600938967136151e-06,
2035
+ "loss": 0.0049,
2036
  "step": 2980
2037
  },
2038
  {
2039
  "epoch": 14.0,
2040
+ "eval_corporation_f1": 0.29752066115702475,
2041
+ "eval_creative-work_f1": 0.44360902255639095,
2042
+ "eval_group_f1": 0.36241610738255037,
2043
+ "eval_location_f1": 0.6834170854271356,
2044
+ "eval_loss": 0.3375568985939026,
2045
+ "eval_overall_accuracy": 0.950897634050648,
2046
+ "eval_overall_f1": 0.6430135786246168,
2047
+ "eval_overall_precision": 0.680259499536608,
2048
+ "eval_overall_recall": 0.6096345514950167,
2049
+ "eval_person_f1": 0.7901614142966948,
2050
+ "eval_product_f1": 0.388663967611336,
2051
+ "eval_runtime": 6.509,
2052
+ "eval_samples_per_second": 155.016,
2053
+ "eval_steps_per_second": 9.833,
2054
  "step": 2982
2055
  },
2056
  {
2057
+ "epoch": 14.0,
2058
+ "step": 2982,
2059
+ "total_flos": 1093917417133104.0,
2060
+ "train_loss": 0.01162932096780006,
2061
+ "train_runtime": 800.644,
2062
+ "train_samples_per_second": 423.909,
2063
+ "train_steps_per_second": 26.604
2064
  }
2065
  ],
2066
  "max_steps": 21300,
2067
  "num_train_epochs": 100,
2068
+ "total_flos": 1093917417133104.0,
2069
  "trial_name": null,
2070
  "trial_params": null
2071
  }
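A note on reading the log above: the logged `learning_rate` values are consistent with a linear decay to zero over `max_steps` = 21300 with no warmup, and among the evaluations visible in this part of the diff (epochs 9 to 14) the highest `eval_overall_f1` is at epoch 10. The sketch below checks both. The base rate of 1e-05 is an assumption inferred from the logged values rather than stated in this file, and `linear_lr` and `best_eval` are hypothetical helper names, not part of any library.

```python
import math

# Assumption: base learning rate inferred from the logged values above.
BASE_LR = 1e-05
MAX_STEPS = 21300  # "max_steps" from trainer_state.json


def linear_lr(step, base_lr=BASE_LR, max_steps=MAX_STEPS):
    """Linear decay to zero with no warmup (hypothetical helper)."""
    return base_lr * (1.0 - step / max_steps)


# A few (step, learning_rate) pairs copied from the log entries above.
LOGGED_LR = {
    1740: 9.183098591549296e-06,
    2130: 9e-06,
    2980: 8.600938967136151e-06,
}

# (epoch, step, eval_overall_f1, eval_loss) copied from the evaluations above.
EVALS = [
    (9.0, 1917, 0.6495149725854069, 0.30845504999160767),
    (10.0, 2130, 0.6540880503144654, 0.30454060435295105),
    (11.0, 2343, 0.6469565217391304, 0.3382622003555298),
    (12.0, 2556, 0.6143187066974597, 0.36174312233924866),
    (13.0, 2769, 0.6219790241678067, 0.3746570348739624),
    (14.0, 2982, 0.6430135786246168, 0.3375568985939026),
]


def best_eval(evals):
    """Return the evaluation row with the highest overall F1."""
    return max(evals, key=lambda row: row[2])


if __name__ == "__main__":
    for step, logged in LOGGED_LR.items():
        ok = math.isclose(linear_lr(step), logged, rel_tol=1e-9)
        print(f"step {step}: logged={logged:.6e} matches linear decay: {ok}")
    epoch, step, f1, loss = best_eval(EVALS)
    print(f"best visible epoch: {epoch} (step {step}), F1={f1:.4f}, loss={loss:.4f}")
```

Only the evaluations shown in this part of the diff are included, so an earlier epoch may have scored higher. Training stopped at step 2982 (epoch 14) although `num_train_epochs` is 100, which suggests early stopping, but the file does not say so explicitly.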
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b253bcb1457bfa60ed176f85c9f9a3a1a1d8e1e3b36aac487b353765c04f06ec
+ oid sha256:a9b2b8b76170893b10db52fb47f8d28118527ff73615505009abf7d395b897f6
  size 3119
validation_results.json CHANGED
@@ -1,17 +1,17 @@
  {
- "epoch": 15.0,
- "validation_corporation_f1": 0.2736842105263158,
- "validation_creative-work_f1": 0.34146341463414637,
- "validation_group_f1": 0.3006535947712418,
- "validation_location_f1": 0.6703296703296703,
- "validation_loss": 0.2690572142601013,
- "validation_overall_accuracy": 0.9486386874331233,
- "validation_overall_f1": 0.629414394278051,
- "validation_overall_precision": 0.6815101645692159,
- "validation_overall_recall": 0.584717607973422,
- "validation_person_f1": 0.7768115942028985,
- "validation_product_f1": 0.32432432432432434,
- "validation_runtime": 6.5668,
- "validation_samples_per_second": 153.651,
- "validation_steps_per_second": 9.746
+ "epoch": 14.0,
+ "validation_corporation_f1": 0.30769230769230765,
+ "validation_creative-work_f1": 0.48421052631578954,
+ "validation_group_f1": 0.31666666666666665,
+ "validation_location_f1": 0.6808510638297872,
+ "validation_loss": 0.27115458250045776,
+ "validation_overall_accuracy": 0.9502437284508382,
+ "validation_overall_f1": 0.6339784946236559,
+ "validation_overall_precision": 0.6574487065120428,
+ "validation_overall_recall": 0.6121262458471761,
+ "validation_person_f1": 0.7734553775743707,
+ "validation_product_f1": 0.3986254295532646,
+ "validation_runtime": 5.0688,
+ "validation_samples_per_second": 199.06,
+ "validation_steps_per_second": 12.626
  }
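As a sanity check on the new validation_results.json values, the overall F1 should be the harmonic mean of the overall precision and recall. The sketch below recomputes it from the two reported numbers. `f1_score` here is a hypothetical local helper, not a library call, and the assumption is that the overall metrics are micro-averaged over entity spans, as is usual for this kind of NER evaluation.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (hypothetical local helper)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the new side of validation_results.json.
validation = {
    "validation_overall_precision": 0.6574487065120428,
    "validation_overall_recall": 0.6121262458471761,
    "validation_overall_f1": 0.6339784946236559,
}

recomputed = f1_score(
    validation["validation_overall_precision"],
    validation["validation_overall_recall"],
)

if __name__ == "__main__":
    reported = validation["validation_overall_f1"]
    print(f"reported F1:   {reported:.10f}")
    print(f"recomputed F1: {recomputed:.10f}")
    print(f"difference:    {abs(recomputed - reported):.2e}")
```

The same check applies to any `eval_overall_*` triple in trainer_state.json. A mismatch beyond floating-point noise would point to a different averaging scheme than the one assumed here.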