lole25 committed on
Commit d03d825 (1 parent: ac0b1b9)

Model save

README.md ADDED
@@ -0,0 +1,59 @@
+ ---
+ library_name: peft
+ tags:
+ - trl
+ - dpo
+ - generated_from_trainer
+ base_model: DUAL-GPO/phi-2-irepo-chatml-merged-i0
+ model-index:
+ - name: phi-2-irepo-chatml-v18-i1
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # phi-2-irepo-chatml-v18-i1
+
+ This model is a fine-tuned version of [DUAL-GPO/phi-2-irepo-chatml-merged-i0](https://huggingface.co/DUAL-GPO/phi-2-irepo-chatml-merged-i0) on the None dataset.
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-06
+ - train_batch_size: 4
+ - eval_batch_size: 4
+ - seed: 42
+ - distributed_type: multi-GPU
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 16
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 1
+
+ ### Training results
+
+
+
+ ### Framework versions
+
+ - PEFT 0.7.1
+ - Transformers 4.36.2
+ - Pytorch 2.1.2+cu121
+ - Datasets 2.14.6
+ - Tokenizers 0.15.2
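
Note on the hyperparameters above: they can be cross-checked against the other files in this commit. The sketch below is not taken from the training code. It assumes the common linear-warmup-then-cosine schedule, with a warmup length of `ceil(warmup_ratio * total_steps)`. The function names (`total_steps`, `warmup_steps`, `lr_at`) are illustrative. Under those assumptions, 30000 training samples at an effective batch size of 16 give 1875 optimizer steps, and the computed learning rates appear to match the `learning_rate` values logged in trainer_state.json.

```python
import math

BASE_LR = 5e-6         # learning_rate from the README
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio from the README
TOTAL_BATCH = 16       # total_train_batch_size from the README
TRAIN_SAMPLES = 30000  # train_samples from all_results.json


def total_steps(samples: int = TRAIN_SAMPLES, batch: int = TOTAL_BATCH) -> int:
    """Optimizer steps in one epoch (30000 divides evenly by 16)."""
    return samples // batch


def warmup_steps(steps: int, ratio: float = WARMUP_RATIO) -> int:
    """Assumed convention: round the warmup length up to a whole step."""
    return math.ceil(steps * ratio)


def lr_at(step: int, steps: int, base_lr: float = BASE_LR) -> float:
    """Linear warmup to base_lr, then half-cosine decay towards zero."""
    warm = warmup_steps(steps)
    if step < warm:
        return base_lr * step / warm
    progress = (step - warm) / (steps - warm)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    n = total_steps()
    print("steps:", n, "warmup:", warmup_steps(n))
    for s in (1, 10, 190, 200):
        print(s, lr_at(s, n))
```

Comparing the printed values with the logged `learning_rate` at steps 1, 10, 190 and 200 is a quick way to confirm that the README and the trainer state describe the same run.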
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:7f89d1206f4d7ac8669b352c059c71305a046d60f9caf2dd7b7c0fb9be7522d8
+ oid sha256:f156562ce6c6a9ddfe9a1b349818825aea7068533ba709bf040b08da5788e7c1
  size 335579632
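
Note on the pointer diff above: only the `oid` changes, so the adapter weights were replaced by a file of identical size. As a minimal sketch, a Git LFS pointer can be parsed and a downloaded file checked against it as follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local file path passed to `matches_pointer` is an assumption about wherever the file happens to be stored.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into version, hash algorithm, digest, size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict) -> bool:
    """Check a local file's size and SHA-256 digest against a parsed pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


# New pointer contents for adapter_model.safetensors, copied from the diff.
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:f156562ce6c6a9ddfe9a1b349818825aea7068533ba709bf040b08da5788e7c1
size 335579632
"""

if __name__ == "__main__":
    print(parse_lfs_pointer(NEW_POINTER))
```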
all_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 1.0,
+ "train_loss": 0.22404387894471486,
+ "train_runtime": 11451.6187,
+ "train_samples": 30000,
+ "train_samples_per_second": 2.62,
+ "train_steps_per_second": 0.164
+ }
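
Note on the metrics above: the two throughput figures follow from the runtime and the sample and step counts. The sketch below recomputes them, taking `global_step` (1875) from trainer_state.json in this same commit. Rounding to three decimals is an assumption that happens to reproduce the logged values, and the function name `throughput` is illustrative.

```python
import json

# Contents of all_results.json, copied from the diff above.
ALL_RESULTS = json.loads("""
{
  "epoch": 1.0,
  "train_loss": 0.22404387894471486,
  "train_runtime": 11451.6187,
  "train_samples": 30000,
  "train_samples_per_second": 2.62,
  "train_steps_per_second": 0.164
}
""")

GLOBAL_STEP = 1875  # "global_step" in trainer_state.json


def throughput(results: dict, global_step: int) -> tuple:
    """Return (samples/s, steps/s), each rounded to three decimals."""
    runtime = results["train_runtime"]  # seconds
    samples_per_s = round(results["train_samples"] / runtime, 3)
    steps_per_s = round(global_step / runtime, 3)
    return samples_per_s, steps_per_s


if __name__ == "__main__":
    print(throughput(ALL_RESULTS, GLOBAL_STEP))
    print("runtime in hours:", ALL_RESULTS["train_runtime"] / 3600)
```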
runs/May21_12-21-57_gpu4-119-5/events.out.tfevents.1716258281.gpu4-119-5.1644164.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f013387fa6691bd8a25b0ca4a81477ab586054c42c5faaa313959286f275e7c6
- size 113182
+ oid sha256:0ae5f567a6a46de356a6bb11ea057f82110d6ba947625311411abc7f0a5f856d
+ size 124314
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 1.0,
+ "train_loss": 0.22404387894471486,
+ "train_runtime": 11451.6187,
+ "train_samples": 30000,
+ "train_samples_per_second": 2.62,
+ "train_steps_per_second": 0.164
+ }
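
Note on the trainer_state.json diff that follows: each `log_history` entry records preference-training metrics (`rewards/chosen`, `rewards/rejected`, `rewards/margins`, `rewards/accuracies`). Assuming the usual TRL convention that `rewards/margins` is the mean of chosen minus rejected rewards, the logged margin should equal the difference of the two logged means up to floating-point noise. The sketch below checks this on three entries copied from the log. The helper name `margin_gap` and the tolerance are illustrative choices.

```python
# Three entries copied from log_history (steps 10, 20 and 30); only the
# reward fields needed for the check are kept.
ENTRIES = [
    {
        "step": 10,
        "rewards/chosen": -0.09647607803344727,
        "rewards/rejected": -0.08242733776569366,
        "rewards/margins": -0.014048744924366474,
    },
    {
        "step": 20,
        "rewards/chosen": -0.06698164343833923,
        "rewards/rejected": -0.05479179695248604,
        "rewards/margins": -0.012189853005111217,
    },
    {
        "step": 30,
        "rewards/chosen": -0.07758971303701401,
        "rewards/rejected": -0.07928583025932312,
        "rewards/margins": 0.0016961194342002273,
    },
]


def margin_gap(entry: dict) -> float:
    """Absolute gap between the logged margin and chosen minus rejected."""
    recomputed = entry["rewards/chosen"] - entry["rewards/rejected"]
    return abs(recomputed - entry["rewards/margins"])


if __name__ == "__main__":
    for e in ENTRIES:
        print(e["step"], margin_gap(e))
```

A gap well below 1e-6 on every entry is consistent with the assumed convention. A larger gap would suggest the margin is computed differently in this run.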
trainer_state.json ADDED
@@ -0,0 +1,2662 @@
+ {
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 1.0,
+ "eval_steps": 500,
+ "global_step": 1875,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.0,
+ "learning_rate": 2.6595744680851065e-08,
+ "logits/chosen": 0.4583740830421448,
+ "logits/rejected": 0.45381295680999756,
+ "logps/chosen": -403.16717529296875,
+ "logps/rejected": -354.3865051269531,
+ "loss": 0.1853,
+ "rewards/accuracies": 0.0,
+ "rewards/chosen": 0.0,
+ "rewards/margins": 0.0,
+ "rewards/rejected": 0.0,
+ "step": 1
+ },
+ {
+ "epoch": 0.01,
+ "learning_rate": 2.6595744680851066e-07,
+ "logits/chosen": 0.18826468288898468,
+ "logits/rejected": 0.16825373470783234,
+ "logps/chosen": -402.1011962890625,
+ "logps/rejected": -396.2879638671875,
+ "loss": 0.2347,
+ "rewards/accuracies": 0.25,
+ "rewards/chosen": -0.09647607803344727,
+ "rewards/margins": -0.014048744924366474,
+ "rewards/rejected": -0.08242733776569366,
+ "step": 10
+ },
+ {
+ "epoch": 0.01,
+ "learning_rate": 5.319148936170213e-07,
+ "logits/chosen": 0.16707709431648254,
+ "logits/rejected": 0.2701728343963623,
+ "logps/chosen": -453.8223571777344,
+ "logps/rejected": -442.9689025878906,
+ "loss": 0.2389,
+ "rewards/accuracies": 0.3812499940395355,
+ "rewards/chosen": -0.06698164343833923,
+ "rewards/margins": -0.012189853005111217,
+ "rewards/rejected": -0.05479179695248604,
+ "step": 20
+ },
+ {
+ "epoch": 0.02,
+ "learning_rate": 7.97872340425532e-07,
+ "logits/chosen": 0.17515605688095093,
+ "logits/rejected": 0.2583461403846741,
+ "logps/chosen": -361.522705078125,
+ "logps/rejected": -338.58416748046875,
+ "loss": 0.2243,
+ "rewards/accuracies": 0.3375000059604645,
+ "rewards/chosen": -0.07758971303701401,
+ "rewards/margins": 0.0016961194342002273,
+ "rewards/rejected": -0.07928583025932312,
+ "step": 30
+ },
+ {
+ "epoch": 0.02,
+ "learning_rate": 1.0638297872340427e-06,
+ "logits/chosen": 0.17781862616539001,
+ "logits/rejected": 0.23920877277851105,
+ "logps/chosen": -419.521728515625,
+ "logps/rejected": -420.13299560546875,
+ "loss": 0.216,
+ "rewards/accuracies": 0.3499999940395355,
+ "rewards/chosen": -0.14323017001152039,
+ "rewards/margins": -0.003557709977030754,
+ "rewards/rejected": -0.13967247307300568,
+ "step": 40
+ },
+ {
+ "epoch": 0.03,
+ "learning_rate": 1.3297872340425533e-06,
+ "logits/chosen": 0.20945556461811066,
+ "logits/rejected": 0.301052451133728,
+ "logps/chosen": -369.31622314453125,
+ "logps/rejected": -369.2641906738281,
+ "loss": 0.2338,
+ "rewards/accuracies": 0.34375,
+ "rewards/chosen": -0.10775469243526459,
+ "rewards/margins": 0.00019715214148163795,
+ "rewards/rejected": -0.10795185714960098,
+ "step": 50
+ },
+ {
+ "epoch": 0.03,
+ "learning_rate": 1.595744680851064e-06,
+ "logits/chosen": 0.21086159348487854,
+ "logits/rejected": 0.2350298911333084,
+ "logps/chosen": -401.9427490234375,
+ "logps/rejected": -401.28765869140625,
+ "loss": 0.2272,
+ "rewards/accuracies": 0.3375000059604645,
+ "rewards/chosen": -0.13143998384475708,
+ "rewards/margins": -0.003936966881155968,
+ "rewards/rejected": -0.12750300765037537,
+ "step": 60
+ },
+ {
+ "epoch": 0.04,
+ "learning_rate": 1.8617021276595745e-06,
+ "logits/chosen": 0.22450070083141327,
+ "logits/rejected": 0.24035272002220154,
+ "logps/chosen": -421.1205139160156,
+ "logps/rejected": -449.4365234375,
+ "loss": 0.2225,
+ "rewards/accuracies": 0.4000000059604645,
+ "rewards/chosen": -0.07039676606655121,
+ "rewards/margins": 0.013278981670737267,
+ "rewards/rejected": -0.08367574214935303,
+ "step": 70
+ },
+ {
+ "epoch": 0.04,
+ "learning_rate": 2.1276595744680853e-06,
+ "logits/chosen": 0.2308211624622345,
+ "logits/rejected": 0.2585153579711914,
+ "logps/chosen": -418.4386291503906,
+ "logps/rejected": -397.7836608886719,
+ "loss": 0.2146,
+ "rewards/accuracies": 0.38749998807907104,
+ "rewards/chosen": -0.13351131975650787,
+ "rewards/margins": 0.0027887646574527025,
+ "rewards/rejected": -0.13630008697509766,
+ "step": 80
+ },
+ {
+ "epoch": 0.05,
+ "learning_rate": 2.393617021276596e-06,
+ "logits/chosen": 0.17430104315280914,
+ "logits/rejected": 0.24182358384132385,
+ "logps/chosen": -393.2335510253906,
+ "logps/rejected": -386.56982421875,
+ "loss": 0.2165,
+ "rewards/accuracies": 0.39375001192092896,
+ "rewards/chosen": -0.12810206413269043,
+ "rewards/margins": 0.02029189094901085,
+ "rewards/rejected": -0.14839394390583038,
+ "step": 90
+ },
+ {
+ "epoch": 0.05,
+ "learning_rate": 2.6595744680851065e-06,
+ "logits/chosen": 0.2197931557893753,
+ "logits/rejected": 0.213352769613266,
+ "logps/chosen": -412.4981384277344,
+ "logps/rejected": -393.7420349121094,
+ "loss": 0.2092,
+ "rewards/accuracies": 0.41874998807907104,
+ "rewards/chosen": -0.13062937557697296,
+ "rewards/margins": 0.031539879739284515,
+ "rewards/rejected": -0.16216926276683807,
+ "step": 100
+ },
+ {
+ "epoch": 0.06,
+ "learning_rate": 2.9255319148936174e-06,
+ "logits/chosen": 0.1948339343070984,
+ "logits/rejected": 0.3156844973564148,
+ "logps/chosen": -412.1725158691406,
+ "logps/rejected": -388.34942626953125,
+ "loss": 0.1994,
+ "rewards/accuracies": 0.42500001192092896,
+ "rewards/chosen": -0.22077541053295135,
+ "rewards/margins": 0.02942301705479622,
+ "rewards/rejected": -0.2501984238624573,
+ "step": 110
+ },
+ {
+ "epoch": 0.06,
+ "learning_rate": 3.191489361702128e-06,
+ "logits/chosen": 0.15572600066661835,
+ "logits/rejected": 0.27886322140693665,
+ "logps/chosen": -405.5447998046875,
+ "logps/rejected": -378.16302490234375,
+ "loss": 0.2062,
+ "rewards/accuracies": 0.45625001192092896,
+ "rewards/chosen": -0.6035518646240234,
+ "rewards/margins": 0.06773124635219574,
+ "rewards/rejected": -0.6712831258773804,
+ "step": 120
+ },
+ {
+ "epoch": 0.07,
+ "learning_rate": 3.457446808510639e-06,
+ "logits/chosen": 0.21951737999916077,
+ "logits/rejected": 0.21863070130348206,
+ "logps/chosen": -439.3607482910156,
+ "logps/rejected": -423.91033935546875,
+ "loss": 0.2045,
+ "rewards/accuracies": 0.5249999761581421,
+ "rewards/chosen": -1.0598005056381226,
+ "rewards/margins": 0.10918780416250229,
+ "rewards/rejected": -1.1689883470535278,
+ "step": 130
+ },
+ {
+ "epoch": 0.07,
+ "learning_rate": 3.723404255319149e-06,
+ "logits/chosen": 0.12865397334098816,
+ "logits/rejected": 0.22429411113262177,
+ "logps/chosen": -425.0618591308594,
+ "logps/rejected": -411.78656005859375,
+ "loss": 0.2059,
+ "rewards/accuracies": 0.518750011920929,
+ "rewards/chosen": -1.486228346824646,
+ "rewards/margins": 0.10093434154987335,
+ "rewards/rejected": -1.5871626138687134,
+ "step": 140
+ },
+ {
+ "epoch": 0.08,
+ "learning_rate": 3.98936170212766e-06,
+ "logits/chosen": 0.1495121419429779,
+ "logits/rejected": 0.2696373760700226,
+ "logps/chosen": -455.36224365234375,
+ "logps/rejected": -423.36993408203125,
+ "loss": 0.2499,
+ "rewards/accuracies": 0.44999998807907104,
+ "rewards/chosen": -2.2363038063049316,
+ "rewards/margins": 0.07195089757442474,
+ "rewards/rejected": -2.3082547187805176,
+ "step": 150
+ },
+ {
+ "epoch": 0.09,
+ "learning_rate": 4.255319148936171e-06,
+ "logits/chosen": 0.17146244645118713,
+ "logits/rejected": 0.33353060483932495,
+ "logps/chosen": -477.47491455078125,
+ "logps/rejected": -447.4056701660156,
+ "loss": 0.2363,
+ "rewards/accuracies": 0.46875,
+ "rewards/chosen": -2.126039981842041,
+ "rewards/margins": 0.09269236773252487,
+ "rewards/rejected": -2.2187325954437256,
+ "step": 160
+ },
+ {
+ "epoch": 0.09,
+ "learning_rate": 4.521276595744681e-06,
+ "logits/chosen": 0.25921598076820374,
+ "logits/rejected": 0.21789881587028503,
+ "logps/chosen": -444.46038818359375,
+ "logps/rejected": -439.04541015625,
+ "loss": 0.2294,
+ "rewards/accuracies": 0.45625001192092896,
+ "rewards/chosen": -1.5025267601013184,
+ "rewards/margins": 0.05006232112646103,
+ "rewards/rejected": -1.5525890588760376,
+ "step": 170
+ },
+ {
+ "epoch": 0.1,
+ "learning_rate": 4.787234042553192e-06,
+ "logits/chosen": 0.16481132805347443,
+ "logits/rejected": 0.3523365259170532,
+ "logps/chosen": -417.71234130859375,
+ "logps/rejected": -409.2852478027344,
+ "loss": 0.2052,
+ "rewards/accuracies": 0.48124998807907104,
+ "rewards/chosen": -1.2100210189819336,
+ "rewards/margins": 0.08468395471572876,
+ "rewards/rejected": -1.2947049140930176,
+ "step": 180
+ },
+ {
+ "epoch": 0.1,
+ "learning_rate": 4.999982660399688e-06,
+ "logits/chosen": 0.21249070763587952,
+ "logits/rejected": 0.1487680971622467,
+ "logps/chosen": -381.04010009765625,
+ "logps/rejected": -370.06121826171875,
+ "loss": 0.2126,
+ "rewards/accuracies": 0.44999998807907104,
+ "rewards/chosen": -0.23919078707695007,
+ "rewards/margins": 0.046520624309778214,
+ "rewards/rejected": -0.285711407661438,
+ "step": 190
+ },
+ {
+ "epoch": 0.11,
+ "learning_rate": 4.99937579964398e-06,
+ "logits/chosen": 0.14420565962791443,
+ "logits/rejected": 0.2520049512386322,
+ "logps/chosen": -407.8086242675781,
+ "logps/rejected": -404.40960693359375,
+ "loss": 0.2139,
+ "rewards/accuracies": 0.4749999940395355,
+ "rewards/chosen": 0.31316059827804565,
+ "rewards/margins": 0.07112900167703629,
+ "rewards/rejected": 0.24203161895275116,
+ "step": 200
+ },
+ {
+ "epoch": 0.11,
+ "learning_rate": 4.9979021993870645e-06,
+ "logits/chosen": 0.20850297808647156,
+ "logits/rejected": 0.15908761322498322,
+ "logps/chosen": -393.0495300292969,
+ "logps/rejected": -385.91131591796875,
+ "loss": 0.1919,
+ "rewards/accuracies": 0.4625000059604645,
+ "rewards/chosen": 0.2354341298341751,
+ "rewards/margins": 0.09232009202241898,
+ "rewards/rejected": 0.14311401546001434,
+ "step": 210
+ },
+ {
+ "epoch": 0.12,
+ "learning_rate": 4.995562370647553e-06,
+ "logits/chosen": 0.18888719379901886,
+ "logits/rejected": 0.2506316602230072,
+ "logps/chosen": -400.5332946777344,
+ "logps/rejected": -396.8497009277344,
+ "loss": 0.213,
+ "rewards/accuracies": 0.45625001192092896,
+ "rewards/chosen": 0.3825472295284271,
+ "rewards/margins": 0.1113465204834938,
+ "rewards/rejected": 0.27120068669319153,
+ "step": 220
+ },
+ {
+ "epoch": 0.12,
+ "learning_rate": 4.992357124836838e-06,
+ "logits/chosen": 0.18391093611717224,
+ "logits/rejected": 0.2248292863368988,
+ "logps/chosen": -414.39495849609375,
+ "logps/rejected": -410.48773193359375,
+ "loss": 0.1955,
+ "rewards/accuracies": 0.48750001192092896,
+ "rewards/chosen": 0.43902724981307983,
+ "rewards/margins": 0.07995086908340454,
+ "rewards/rejected": 0.3590763509273529,
+ "step": 230
+ },
+ {
+ "epoch": 0.13,
+ "learning_rate": 4.9882875734777044e-06,
+ "logits/chosen": 0.17369148135185242,
+ "logits/rejected": 0.2072429358959198,
+ "logps/chosen": -379.5187683105469,
+ "logps/rejected": -356.64935302734375,
+ "loss": 0.2006,
+ "rewards/accuracies": 0.4749999940395355,
+ "rewards/chosen": 0.48646441102027893,
+ "rewards/margins": 0.07142899930477142,
+ "rewards/rejected": 0.4150354266166687,
+ "step": 240
+ },
+ {
+ "epoch": 0.13,
+ "learning_rate": 4.983355127818882e-06,
+ "logits/chosen": 0.22464045882225037,
+ "logits/rejected": 0.21831247210502625,
+ "logps/chosen": -360.67718505859375,
+ "logps/rejected": -375.7092590332031,
+ "loss": 0.2103,
+ "rewards/accuracies": 0.42500001192092896,
+ "rewards/chosen": 0.6559606790542603,
+ "rewards/margins": 0.08051694184541702,
+ "rewards/rejected": 0.5754436254501343,
+ "step": 250
+ },
+ {
+ "epoch": 0.14,
+ "learning_rate": 4.977561498345639e-06,
+ "logits/chosen": 0.27521076798439026,
+ "logits/rejected": 0.2506524324417114,
+ "logps/chosen": -416.90625,
+ "logps/rejected": -403.52020263671875,
+ "loss": 0.2265,
+ "rewards/accuracies": 0.4000000059604645,
+ "rewards/chosen": 0.5058245062828064,
+ "rewards/margins": 0.06458650529384613,
+ "rewards/rejected": 0.44123801589012146,
+ "step": 260
+ },
+ {
+ "epoch": 0.14,
+ "learning_rate": 4.970908694186624e-06,
+ "logits/chosen": 0.15694484114646912,
+ "logits/rejected": 0.22934310138225555,
+ "logps/chosen": -393.33367919921875,
+ "logps/rejected": -378.1922912597656,
+ "loss": 0.2514,
+ "rewards/accuracies": 0.48124998807907104,
+ "rewards/chosen": 0.3451576232910156,
+ "rewards/margins": 0.09955648332834244,
+ "rewards/rejected": 0.24560114741325378,
+ "step": 270
+ },
+ {
+ "epoch": 0.15,
+ "learning_rate": 4.9633990224171305e-06,
+ "logits/chosen": 0.22818629443645477,
+ "logits/rejected": 0.3253491222858429,
+ "logps/chosen": -439.2140197753906,
+ "logps/rejected": -449.018798828125,
+ "loss": 0.2199,
+ "rewards/accuracies": 0.574999988079071,
+ "rewards/chosen": 0.3162192404270172,
+ "rewards/margins": 0.17989328503608704,
+ "rewards/rejected": 0.13632595539093018,
+ "step": 280
+ },
+ {
+ "epoch": 0.15,
+ "learning_rate": 4.955035087259046e-06,
+ "logits/chosen": 0.13608984649181366,
+ "logits/rejected": 0.22955039143562317,
+ "logps/chosen": -382.9739074707031,
+ "logps/rejected": -382.9454040527344,
+ "loss": 0.2369,
+ "rewards/accuracies": 0.4749999940395355,
+ "rewards/chosen": -0.2025183141231537,
+ "rewards/margins": 0.10606689751148224,
+ "rewards/rejected": -0.3085852265357971,
+ "step": 290
+ },
+ {
+ "epoch": 0.16,
+ "learning_rate": 4.945819789177756e-06,
+ "logits/chosen": 0.15429742634296417,
+ "logits/rejected": 0.22013196349143982,
+ "logps/chosen": -417.97894287109375,
+ "logps/rejected": -397.3367614746094,
+ "loss": 0.222,
+ "rewards/accuracies": 0.5,
+ "rewards/chosen": -0.940466582775116,
+ "rewards/margins": 0.11995697021484375,
+ "rewards/rejected": -1.060423493385315,
+ "step": 300
+ },
+ {
+ "epoch": 0.17,
+ "learning_rate": 4.935756323876306e-06,
+ "logits/chosen": 0.19988210499286652,
+ "logits/rejected": 0.19450345635414124,
+ "logps/chosen": -406.30523681640625,
+ "logps/rejected": -398.8627014160156,
+ "loss": 0.2481,
+ "rewards/accuracies": 0.48124998807907104,
+ "rewards/chosen": -1.3156739473342896,
+ "rewards/margins": 0.10400591045618057,
+ "rewards/rejected": -1.4196797609329224,
+ "step": 310
+ },
+ {
+ "epoch": 0.17,
+ "learning_rate": 4.924848181187199e-06,
+ "logits/chosen": 0.2409110963344574,
+ "logits/rejected": 0.19854632019996643,
+ "logps/chosen": -412.65631103515625,
+ "logps/rejected": -411.14892578125,
+ "loss": 0.2431,
+ "rewards/accuracies": 0.48124998807907104,
+ "rewards/chosen": -1.3011319637298584,
+ "rewards/margins": 0.14340858161449432,
+ "rewards/rejected": -1.4445403814315796,
+ "step": 320
+ },
+ {
+ "epoch": 0.18,
+ "learning_rate": 4.913099143862173e-06,
+ "logits/chosen": 0.22796742618083954,
+ "logits/rejected": 0.1370314657688141,
+ "logps/chosen": -410.01593017578125,
+ "logps/rejected": -423.0164489746094,
+ "loss": 0.2662,
+ "rewards/accuracies": 0.550000011920929,
+ "rewards/chosen": -1.125431776046753,
+ "rewards/margins": 0.15198691189289093,
+ "rewards/rejected": -1.2774187326431274,
+ "step": 330
+ },
+ {
+ "epoch": 0.18,
+ "learning_rate": 4.900513286260416e-06,
+ "logits/chosen": 0.23252587020397186,
+ "logits/rejected": 0.22327935695648193,
+ "logps/chosen": -422.98388671875,
+ "logps/rejected": -409.4496765136719,
+ "loss": 0.4587,
+ "rewards/accuracies": 0.4437499940395355,
+ "rewards/chosen": -1.244390845298767,
+ "rewards/margins": 0.12163744866847992,
+ "rewards/rejected": -1.366028070449829,
+ "step": 340
+ },
+ {
+ "epoch": 0.19,
+ "learning_rate": 4.887094972935645e-06,
+ "logits/chosen": 0.19463476538658142,
+ "logits/rejected": 0.26694804430007935,
+ "logps/chosen": -458.14923095703125,
+ "logps/rejected": -462.924560546875,
+ "loss": 0.5137,
+ "rewards/accuracies": 0.5,
+ "rewards/chosen": -2.0200841426849365,
+ "rewards/margins": 0.041815903037786484,
+ "rewards/rejected": -2.0619003772735596,
+ "step": 350
+ },
+ {
+ "epoch": 0.19,
+ "learning_rate": 4.87284885712256e-06,
+ "logits/chosen": 0.18824467062950134,
+ "logits/rejected": 0.28755128383636475,
+ "logps/chosen": -443.279052734375,
+ "logps/rejected": -453.8575134277344,
+ "loss": 0.3629,
+ "rewards/accuracies": 0.46875,
+ "rewards/chosen": -2.572296619415283,
+ "rewards/margins": 0.1011347621679306,
+ "rewards/rejected": -2.673431396484375,
+ "step": 360
+ },
+ {
+ "epoch": 0.2,
+ "learning_rate": 4.857779879123181e-06,
+ "logits/chosen": 0.24380216002464294,
+ "logits/rejected": 0.185393288731575,
+ "logps/chosen": -400.14642333984375,
+ "logps/rejected": -419.3863220214844,
+ "loss": 0.3068,
+ "rewards/accuracies": 0.4375,
+ "rewards/chosen": -2.2855589389801025,
+ "rewards/margins": 0.08926688134670258,
+ "rewards/rejected": -2.3748257160186768,
+ "step": 370
+ },
+ {
+ "epoch": 0.2,
+ "learning_rate": 4.841893264593643e-06,
+ "logits/chosen": 0.15558484196662903,
+ "logits/rejected": 0.2775138020515442,
+ "logps/chosen": -436.8177795410156,
+ "logps/rejected": -401.4188232421875,
+ "loss": 0.2526,
+ "rewards/accuracies": 0.4937500059604645,
+ "rewards/chosen": -1.875754714012146,
+ "rewards/margins": 0.08787569403648376,
+ "rewards/rejected": -1.9636303186416626,
+ "step": 380
+ },
+ {
+ "epoch": 0.21,
+ "learning_rate": 4.825194522732023e-06,
+ "logits/chosen": 0.1963922530412674,
+ "logits/rejected": 0.20690886676311493,
+ "logps/chosen": -378.81109619140625,
+ "logps/rejected": -389.9710998535156,
+ "loss": 0.2406,
+ "rewards/accuracies": 0.4937500059604645,
+ "rewards/chosen": -1.7464733123779297,
+ "rewards/margins": 0.09245175123214722,
+ "rewards/rejected": -1.8389251232147217,
+ "step": 390
+ },
+ {
+ "epoch": 0.21,
+ "learning_rate": 4.807689444367853e-06,
+ "logits/chosen": 0.2619530260562897,
+ "logits/rejected": 0.25581902265548706,
+ "logps/chosen": -466.8785705566406,
+ "logps/rejected": -460.354248046875,
+ "loss": 0.3071,
+ "rewards/accuracies": 0.4124999940395355,
+ "rewards/chosen": -2.3323142528533936,
+ "rewards/margins": 0.07366655021905899,
+ "rewards/rejected": -2.4059810638427734,
+ "step": 400
+ },
585
+ {
586
+ "epoch": 0.22,
587
+ "learning_rate": 4.78938409995396e-06,
588
+ "logits/chosen": 0.1963699460029602,
589
+ "logits/rejected": 0.32746168971061707,
590
+ "logps/chosen": -433.58453369140625,
591
+ "logps/rejected": -417.2007751464844,
592
+ "loss": 0.2297,
593
+ "rewards/accuracies": 0.518750011920929,
594
+ "rewards/chosen": -2.2858786582946777,
595
+ "rewards/margins": 0.13152393698692322,
596
+ "rewards/rejected": -2.417402505874634,
597
+ "step": 410
598
+ },
599
+ {
600
+ "epoch": 0.22,
601
+ "learning_rate": 4.770284837461342e-06,
602
+ "logits/chosen": 0.2083076685667038,
603
+ "logits/rejected": 0.3537079393863678,
604
+ "logps/chosen": -425.3860778808594,
605
+ "logps/rejected": -437.66357421875,
606
+ "loss": 0.2828,
607
+ "rewards/accuracies": 0.4312500059604645,
608
+ "rewards/chosen": -1.9256458282470703,
609
+ "rewards/margins": 0.05349540710449219,
610
+ "rewards/rejected": -1.9791409969329834,
611
+ "step": 420
612
+ },
613
+ {
614
+ "epoch": 0.23,
615
+ "learning_rate": 4.7503982801778015e-06,
616
+ "logits/chosen": 0.26357799768447876,
617
+ "logits/rejected": 0.24686399102210999,
618
+ "logps/chosen": -413.22430419921875,
619
+ "logps/rejected": -388.62860107421875,
620
+ "loss": 0.2328,
621
+ "rewards/accuracies": 0.4375,
622
+ "rewards/chosen": -1.4066696166992188,
623
+ "rewards/margins": 0.0737091675400734,
624
+ "rewards/rejected": -1.4803787469863892,
625
+ "step": 430
626
+ },
627
+ {
628
+ "epoch": 0.23,
629
+ "learning_rate": 4.729731324411104e-06,
630
+ "logits/chosen": 0.20708036422729492,
631
+ "logits/rejected": 0.25638309121131897,
632
+ "logps/chosen": -433.58782958984375,
633
+ "logps/rejected": -406.76568603515625,
634
+ "loss": 0.2482,
635
+ "rewards/accuracies": 0.4749999940395355,
636
+ "rewards/chosen": -1.2260335683822632,
637
+ "rewards/margins": 0.10625864565372467,
638
+ "rewards/rejected": -1.3322923183441162,
639
+ "step": 440
640
+ },
641
+ {
642
+ "epoch": 0.24,
643
+ "learning_rate": 4.7082911370974645e-06,
644
+ "logits/chosen": 0.1832914650440216,
645
+ "logits/rejected": 0.19458040595054626,
646
+ "logps/chosen": -370.4656066894531,
647
+ "logps/rejected": -370.5750732421875,
648
+ "loss": 0.2305,
649
+ "rewards/accuracies": 0.41874998807907104,
650
+ "rewards/chosen": -1.1384856700897217,
651
+ "rewards/margins": 0.10250405967235565,
652
+ "rewards/rejected": -1.2409899234771729,
653
+ "step": 450
654
+ },
655
+ {
656
+ "epoch": 0.25,
657
+ "learning_rate": 4.68608515331618e-06,
658
+ "logits/chosen": 0.1749318540096283,
659
+ "logits/rejected": 0.20707306265830994,
660
+ "logps/chosen": -378.585693359375,
661
+ "logps/rejected": -360.33709716796875,
662
+ "loss": 0.2466,
663
+ "rewards/accuracies": 0.4437499940395355,
664
+ "rewards/chosen": -1.2511470317840576,
665
+ "rewards/margins": 0.10291622579097748,
666
+ "rewards/rejected": -1.3540633916854858,
667
+ "step": 460
668
+ },
669
+ {
670
+ "epoch": 0.25,
671
+ "learning_rate": 4.663121073711269e-06,
672
+ "logits/chosen": 0.2571523189544678,
673
+ "logits/rejected": 0.18344727158546448,
674
+ "logps/chosen": -420.41204833984375,
675
+ "logps/rejected": -417.1241149902344,
676
+ "loss": 0.2738,
677
+ "rewards/accuracies": 0.39375001192092896,
678
+ "rewards/chosen": -1.400221586227417,
679
+ "rewards/margins": 0.04325573518872261,
680
+ "rewards/rejected": -1.4434772729873657,
681
+ "step": 470
682
+ },
+ {
+ "epoch": 0.26,
+ "learning_rate": 4.63940686182103e-06,
+ "logits/chosen": 0.22877541184425354,
+ "logits/rejected": 0.330875039100647,
+ "logps/chosen": -389.0229797363281,
+ "logps/rejected": -399.605712890625,
+ "loss": 0.2411,
+ "rewards/accuracies": 0.42500001192092896,
+ "rewards/chosen": -1.2312822341918945,
+ "rewards/margins": 0.11593781411647797,
+ "rewards/rejected": -1.3472201824188232,
+ "step": 480
+ },
+ {
+ "epoch": 0.26,
+ "learning_rate": 4.614950741316425e-06,
+ "logits/chosen": 0.1676156222820282,
+ "logits/rejected": 0.26664647459983826,
+ "logps/chosen": -421.76019287109375,
+ "logps/rejected": -394.2919921875,
+ "loss": 0.2616,
+ "rewards/accuracies": 0.4124999940395355,
+ "rewards/chosen": -1.175879716873169,
+ "rewards/margins": 0.085006944835186,
+ "rewards/rejected": -1.260886788368225,
+ "step": 490
+ },
+ {
+ "epoch": 0.27,
+ "learning_rate": 4.589761193149254e-06,
+ "logits/chosen": 0.19543419778347015,
+ "logits/rejected": 0.25549161434173584,
+ "logps/chosen": -401.7788391113281,
+ "logps/rejected": -381.381103515625,
+ "loss": 0.2201,
+ "rewards/accuracies": 0.4749999940395355,
+ "rewards/chosen": -0.9598444104194641,
+ "rewards/margins": 0.10931962728500366,
+ "rewards/rejected": -1.0691639184951782,
+ "step": 500
+ },
+ {
+ "epoch": 0.27,
+ "learning_rate": 4.563846952611112e-06,
+ "logits/chosen": 0.19228951632976532,
+ "logits/rejected": 0.24320659041404724,
+ "logps/chosen": -398.7080993652344,
+ "logps/rejected": -402.6463317871094,
+ "loss": 0.292,
+ "rewards/accuracies": 0.4437499940395355,
+ "rewards/chosen": -0.6720465421676636,
+ "rewards/margins": 0.05984077975153923,
+ "rewards/rejected": -0.7318873405456543,
+ "step": 510
+ },
+ {
+ "epoch": 0.28,
+ "learning_rate": 4.537217006304141e-06,
+ "logits/chosen": 0.1390676200389862,
+ "logits/rejected": 0.30597618222236633,
+ "logps/chosen": -402.3499450683594,
+ "logps/rejected": -367.5543212890625,
+ "loss": 0.2573,
+ "rewards/accuracies": 0.4312500059604645,
+ "rewards/chosen": -0.2744489014148712,
+ "rewards/margins": 0.06540031731128693,
+ "rewards/rejected": -0.33984917402267456,
+ "step": 520
+ },
+ {
+ "epoch": 0.28,
+ "learning_rate": 4.50988058902464e-06,
+ "logits/chosen": 0.23019719123840332,
+ "logits/rejected": 0.2985975742340088,
+ "logps/chosen": -491.07464599609375,
+ "logps/rejected": -475.8605041503906,
+ "loss": 0.2694,
+ "rewards/accuracies": 0.53125,
+ "rewards/chosen": -0.052188098430633545,
+ "rewards/margins": 0.07880185544490814,
+ "rewards/rejected": -0.1309899389743805,
+ "step": 530
+ },
+ {
+ "epoch": 0.29,
+ "learning_rate": 4.481847180560593e-06,
+ "logits/chosen": 0.2160387486219406,
+ "logits/rejected": 0.1532837599515915,
+ "logps/chosen": -405.447265625,
+ "logps/rejected": -394.2893371582031,
+ "loss": 0.2777,
+ "rewards/accuracies": 0.4625000059604645,
+ "rewards/chosen": -0.043611008673906326,
+ "rewards/margins": 0.0625891238451004,
+ "rewards/rejected": -0.10620013624429703,
+ "step": 540
+ },
+ {
+ "epoch": 0.29,
+ "learning_rate": 4.453126502404253e-06,
+ "logits/chosen": 0.13828139007091522,
+ "logits/rejected": 0.2699658274650574,
+ "logps/chosen": -442.07635498046875,
+ "logps/rejected": -409.4737243652344,
+ "loss": 0.2209,
+ "rewards/accuracies": 0.46875,
+ "rewards/chosen": -0.09488610178232193,
+ "rewards/margins": 0.10609780251979828,
+ "rewards/rejected": -0.2009839117527008,
+ "step": 550
+ },
+ {
+ "epoch": 0.3,
+ "learning_rate": 4.423728514380892e-06,
+ "logits/chosen": 0.17167535424232483,
+ "logits/rejected": 0.3206785023212433,
+ "logps/chosen": -432.7518005371094,
+ "logps/rejected": -453.96417236328125,
+ "loss": 0.2193,
+ "rewards/accuracies": 0.5,
+ "rewards/chosen": -0.23770996928215027,
+ "rewards/margins": 0.08862265199422836,
+ "rewards/rejected": -0.32633259892463684,
+ "step": 560
+ },
+ {
+ "epoch": 0.3,
+ "learning_rate": 4.393663411194918e-06,
+ "logits/chosen": 0.2704005837440491,
+ "logits/rejected": 0.23753610253334045,
+ "logps/chosen": -449.93560791015625,
+ "logps/rejected": -445.89898681640625,
+ "loss": 0.2345,
+ "rewards/accuracies": 0.53125,
+ "rewards/chosen": -0.6195427775382996,
+ "rewards/margins": 0.09155760705471039,
+ "rewards/rejected": -0.7111002802848816,
+ "step": 570
+ },
+ {
+ "epoch": 0.31,
+ "learning_rate": 4.362941618894523e-06,
+ "logits/chosen": 0.20778074860572815,
+ "logits/rejected": 0.2294239103794098,
+ "logps/chosen": -382.36859130859375,
+ "logps/rejected": -390.04559326171875,
+ "loss": 0.2449,
+ "rewards/accuracies": 0.5,
+ "rewards/chosen": -0.5993614196777344,
+ "rewards/margins": 0.12903143465518951,
+ "rewards/rejected": -0.7283927798271179,
+ "step": 580
+ },
+ {
+ "epoch": 0.31,
+ "learning_rate": 4.331573791256116e-06,
+ "logits/chosen": 0.2194238007068634,
+ "logits/rejected": 0.27026933431625366,
+ "logps/chosen": -362.18963623046875,
+ "logps/rejected": -365.9380187988281,
+ "loss": 0.1998,
+ "rewards/accuracies": 0.4437499940395355,
+ "rewards/chosen": -0.7672492861747742,
+ "rewards/margins": 0.09667088091373444,
+ "rewards/rejected": -0.8639200925827026,
+ "step": 590
+ },
+ {
+ "epoch": 0.32,
+ "learning_rate": 4.299570806089786e-06,
+ "logits/chosen": 0.15812727808952332,
+ "logits/rejected": 0.21042820811271667,
+ "logps/chosen": -411.093994140625,
+ "logps/rejected": -413.24359130859375,
+ "loss": 0.2049,
+ "rewards/accuracies": 0.45625001192092896,
+ "rewards/chosen": -1.074034571647644,
+ "rewards/margins": 0.11771629750728607,
+ "rewards/rejected": -1.1917510032653809,
+ "step": 600
+ },
+ {
+ "epoch": 0.33,
+ "learning_rate": 4.266943761467057e-06,
+ "logits/chosen": 0.2622697055339813,
+ "logits/rejected": 0.19035783410072327,
+ "logps/chosen": -437.5669860839844,
+ "logps/rejected": -422.769287109375,
+ "loss": 0.1986,
+ "rewards/accuracies": 0.5249999761581421,
+ "rewards/chosen": -1.0040676593780518,
+ "rewards/margins": 0.1156647801399231,
+ "rewards/rejected": -1.1197324991226196,
+ "step": 610
+ },
+ {
+ "epoch": 0.33,
+ "learning_rate": 4.233703971872287e-06,
+ "logits/chosen": 0.15742294490337372,
+ "logits/rejected": 0.34303316473960876,
+ "logps/chosen": -441.45318603515625,
+ "logps/rejected": -427.8976135253906,
+ "loss": 0.1777,
+ "rewards/accuracies": 0.5625,
+ "rewards/chosen": -0.9649977684020996,
+ "rewards/margins": 0.13918432593345642,
+ "rewards/rejected": -1.1041821241378784,
+ "step": 620
+ },
+ {
+ "epoch": 0.34,
+ "learning_rate": 4.1998629642789925e-06,
+ "logits/chosen": 0.18381045758724213,
+ "logits/rejected": 0.2369348555803299,
+ "logps/chosen": -398.6405334472656,
+ "logps/rejected": -413.9356384277344,
+ "loss": 0.1679,
+ "rewards/accuracies": 0.53125,
+ "rewards/chosen": -1.1926090717315674,
+ "rewards/margins": 0.12783730030059814,
+ "rewards/rejected": -1.3204463720321655,
+ "step": 630
+ },
+ {
+ "epoch": 0.34,
+ "learning_rate": 4.165432474152505e-06,
+ "logits/chosen": 0.19464774429798126,
+ "logits/rejected": 0.27594244480133057,
+ "logps/chosen": -402.32025146484375,
+ "logps/rejected": -386.37506103515625,
+ "loss": 0.1886,
+ "rewards/accuracies": 0.5,
+ "rewards/chosen": -1.4002799987792969,
+ "rewards/margins": 0.15942224860191345,
+ "rewards/rejected": -1.5597022771835327,
+ "step": 640
+ },
+ {
+ "epoch": 0.35,
+ "learning_rate": 4.130424441380308e-06,
+ "logits/chosen": 0.19669343531131744,
+ "logits/rejected": 0.280894935131073,
+ "logps/chosen": -427.1275329589844,
+ "logps/rejected": -399.8666687011719,
+ "loss": 0.2517,
+ "rewards/accuracies": 0.5249999761581421,
+ "rewards/chosen": -1.7949119806289673,
+ "rewards/margins": 0.12687289714813232,
+ "rewards/rejected": -1.9217851161956787,
+ "step": 650
+ },
+ {
+ "epoch": 0.35,
+ "learning_rate": 4.09485100613151e-06,
+ "logits/chosen": 0.16887667775154114,
+ "logits/rejected": 0.25499215722084045,
+ "logps/chosen": -472.88970947265625,
+ "logps/rejected": -476.806640625,
+ "loss": 0.2186,
+ "rewards/accuracies": 0.5,
+ "rewards/chosen": -2.312155246734619,
+ "rewards/margins": 0.11931940168142319,
+ "rewards/rejected": -2.431474447250366,
+ "step": 660
+ },
+ {
+ "epoch": 0.36,
+ "learning_rate": 4.058724504646834e-06,
+ "logits/chosen": 0.22942841053009033,
+ "logits/rejected": 0.21801376342773438,
+ "logps/chosen": -404.1775817871094,
+ "logps/rejected": -412.6273498535156,
+ "loss": 0.211,
+ "rewards/accuracies": 0.4625000059604645,
+ "rewards/chosen": -2.2967944145202637,
+ "rewards/margins": 0.11083470284938812,
+ "rewards/rejected": -2.4076290130615234,
+ "step": 670
+ },
+ {
+ "epoch": 0.36,
+ "learning_rate": 4.022057464960632e-06,
+ "logits/chosen": 0.21228846907615662,
+ "logits/rejected": 0.22944676876068115,
+ "logps/chosen": -413.43914794921875,
+ "logps/rejected": -405.81744384765625,
+ "loss": 0.2846,
+ "rewards/accuracies": 0.48124998807907104,
+ "rewards/chosen": -2.1489250659942627,
+ "rewards/margins": 0.11012575775384903,
+ "rewards/rejected": -2.2590508460998535,
+ "step": 680
+ },
+ {
+ "epoch": 0.37,
+ "learning_rate": 3.984862602556383e-06,
+ "logits/chosen": 0.25439196825027466,
+ "logits/rejected": 0.2606196105480194,
+ "logps/chosen": -440.023681640625,
+ "logps/rejected": -442.59649658203125,
+ "loss": 0.2369,
+ "rewards/accuracies": 0.48124998807907104,
+ "rewards/chosen": -2.2106480598449707,
+ "rewards/margins": 0.12710650265216827,
+ "rewards/rejected": -2.337754487991333,
+ "step": 690
+ },
+ {
+ "epoch": 0.37,
+ "learning_rate": 3.947152815957187e-06,
+ "logits/chosen": 0.14478982985019684,
+ "logits/rejected": 0.1685037910938263,
+ "logps/chosen": -383.38726806640625,
+ "logps/rejected": -375.55682373046875,
+ "loss": 0.193,
+ "rewards/accuracies": 0.4937500059604645,
+ "rewards/chosen": -2.3591561317443848,
+ "rewards/margins": 0.12616077065467834,
+ "rewards/rejected": -2.4853172302246094,
+ "step": 700
+ },
+ {
+ "epoch": 0.38,
+ "learning_rate": 3.908941182252785e-06,
+ "logits/chosen": 0.17379993200302124,
+ "logits/rejected": 0.27680641412734985,
+ "logps/chosen": -435.744384765625,
+ "logps/rejected": -435.990966796875,
+ "loss": 0.2464,
+ "rewards/accuracies": 0.48124998807907104,
+ "rewards/chosen": -2.4536287784576416,
+ "rewards/margins": 0.10799276828765869,
+ "rewards/rejected": -2.56162166595459,
+ "step": 710
+ },
+ {
+ "epoch": 0.38,
+ "learning_rate": 3.8702409525646535e-06,
+ "logits/chosen": 0.17838087677955627,
+ "logits/rejected": 0.20998401939868927,
+ "logps/chosen": -420.6383361816406,
+ "logps/rejected": -408.6434020996094,
+ "loss": 0.2224,
+ "rewards/accuracies": 0.44999998807907104,
+ "rewards/chosen": -2.475475788116455,
+ "rewards/margins": 0.10919404029846191,
+ "rewards/rejected": -2.584669828414917,
+ "step": 720
+ },
+ {
+ "epoch": 0.39,
+ "learning_rate": 3.8310655474507495e-06,
+ "logits/chosen": 0.16479340195655823,
+ "logits/rejected": 0.2636985778808594,
+ "logps/chosen": -437.88427734375,
+ "logps/rejected": -414.7217712402344,
+ "loss": 0.2194,
+ "rewards/accuracies": 0.5375000238418579,
+ "rewards/chosen": -2.8833677768707275,
+ "rewards/margins": 0.13075178861618042,
+ "rewards/rejected": -3.0141196250915527,
+ "step": 730
+ },
+ {
+ "epoch": 0.39,
+ "learning_rate": 3.7914285522515002e-06,
+ "logits/chosen": 0.2210838496685028,
+ "logits/rejected": 0.23706713318824768,
+ "logps/chosen": -409.523193359375,
+ "logps/rejected": -432.919189453125,
+ "loss": 0.2153,
+ "rewards/accuracies": 0.48750001192092896,
+ "rewards/chosen": -2.8223652839660645,
+ "rewards/margins": 0.0898015946149826,
+ "rewards/rejected": -2.9121673107147217,
+ "step": 740
+ },
+ {
+ "epoch": 0.4,
+ "learning_rate": 3.751343712378639e-06,
+ "logits/chosen": 0.23539260029792786,
+ "logits/rejected": 0.2847990095615387,
+ "logps/chosen": -429.42828369140625,
+ "logps/rejected": -439.2691955566406,
+ "loss": 0.4669,
+ "rewards/accuracies": 0.48124998807907104,
+ "rewards/chosen": -3.1466052532196045,
+ "rewards/margins": 0.06740974634885788,
+ "rewards/rejected": -3.214015245437622,
+ "step": 750
+ },
+ {
+ "epoch": 0.41,
+ "learning_rate": 3.710824928548546e-06,
+ "logits/chosen": 0.18615157902240753,
+ "logits/rejected": 0.26414626836776733,
+ "logps/chosen": -444.6000061035156,
+ "logps/rejected": -440.26611328125,
+ "loss": 0.2051,
+ "rewards/accuracies": 0.4937500059604645,
+ "rewards/chosen": -2.4763169288635254,
+ "rewards/margins": 0.12143449485301971,
+ "rewards/rejected": -2.5977511405944824,
+ "step": 760
+ },
+ {
+ "epoch": 0.41,
+ "learning_rate": 3.6698862519617225e-06,
+ "logits/chosen": 0.22848829627037048,
+ "logits/rejected": 0.23971283435821533,
+ "logps/chosen": -435.1680603027344,
+ "logps/rejected": -451.59930419921875,
+ "loss": 0.2253,
+ "rewards/accuracies": 0.550000011920929,
+ "rewards/chosen": -1.9101835489273071,
+ "rewards/margins": 0.16930529475212097,
+ "rewards/rejected": -2.079489231109619,
+ "step": 770
+ },
+ {
+ "epoch": 0.42,
+ "learning_rate": 3.6285418794300793e-06,
+ "logits/chosen": 0.16407668590545654,
+ "logits/rejected": 0.19629423320293427,
+ "logps/chosen": -429.7174377441406,
+ "logps/rejected": -416.3013610839844,
+ "loss": 0.2255,
+ "rewards/accuracies": 0.48124998807907104,
+ "rewards/chosen": -1.8518050909042358,
+ "rewards/margins": 0.09626470506191254,
+ "rewards/rejected": -1.9480698108673096,
+ "step": 780
+ },
+ {
+ "epoch": 0.42,
+ "learning_rate": 3.5868061484537365e-06,
+ "logits/chosen": 0.2521757185459137,
+ "logits/rejected": 0.3432837724685669,
+ "logps/chosen": -433.750732421875,
+ "logps/rejected": -421.22540283203125,
+ "loss": 0.2085,
+ "rewards/accuracies": 0.48124998807907104,
+ "rewards/chosen": -1.9239532947540283,
+ "rewards/margins": 0.10734070837497711,
+ "rewards/rejected": -2.0312938690185547,
+ "step": 790
+ },
+ {
+ "epoch": 0.43,
+ "learning_rate": 3.5446935322490285e-06,
+ "logits/chosen": 0.2557790279388428,
+ "logits/rejected": 0.30611905455589294,
+ "logps/chosen": -436.35687255859375,
+ "logps/rejected": -427.75958251953125,
+ "loss": 0.2532,
+ "rewards/accuracies": 0.48750001192092896,
+ "rewards/chosen": -1.8817793130874634,
+ "rewards/margins": 0.10908164829015732,
+ "rewards/rejected": -1.990860939025879,
+ "step": 800
+ },
+ {
+ "epoch": 0.43,
+ "learning_rate": 3.502218634729447e-06,
+ "logits/chosen": 0.2341274917125702,
+ "logits/rejected": 0.18765440583229065,
+ "logps/chosen": -403.7873229980469,
+ "logps/rejected": -405.88909912109375,
+ "loss": 0.1786,
+ "rewards/accuracies": 0.53125,
+ "rewards/chosen": -1.8474352359771729,
+ "rewards/margins": 0.12625011801719666,
+ "rewards/rejected": -1.9736855030059814,
+ "step": 810
+ },
+ {
+ "epoch": 0.44,
+ "learning_rate": 3.459396185441265e-06,
+ "logits/chosen": 0.18527312576770782,
+ "logits/rejected": 0.2313808649778366,
+ "logps/chosen": -379.3541564941406,
+ "logps/rejected": -352.1042175292969,
+ "loss": 0.1985,
+ "rewards/accuracies": 0.512499988079071,
+ "rewards/chosen": -1.6922273635864258,
+ "rewards/margins": 0.11197100579738617,
+ "rewards/rejected": -1.8041980266571045,
+ "step": 820
+ },
+ {
+ "epoch": 0.44,
+ "learning_rate": 3.4162410344555834e-06,
+ "logits/chosen": 0.29786452651023865,
+ "logits/rejected": 0.3404124677181244,
+ "logps/chosen": -379.1360168457031,
+ "logps/rejected": -397.03662109375,
+ "loss": 0.1982,
+ "rewards/accuracies": 0.512499988079071,
+ "rewards/chosen": -1.7881301641464233,
+ "rewards/margins": 0.11922701448202133,
+ "rewards/rejected": -1.9073572158813477,
+ "step": 830
+ },
+ {
+ "epoch": 0.45,
+ "learning_rate": 3.3727681472185937e-06,
+ "logits/chosen": 0.2442360371351242,
+ "logits/rejected": 0.22556254267692566,
+ "logps/chosen": -422.0938415527344,
+ "logps/rejected": -419.09063720703125,
+ "loss": 0.2123,
+ "rewards/accuracies": 0.5062500238418579,
+ "rewards/chosen": -1.8410011529922485,
+ "rewards/margins": 0.13928382098674774,
+ "rewards/rejected": -1.9802849292755127,
+ "step": 840
+ },
+ {
+ "epoch": 0.45,
+ "learning_rate": 3.3289925993618217e-06,
+ "logits/chosen": 0.13701245188713074,
+ "logits/rejected": 0.2810230851173401,
+ "logps/chosen": -392.12860107421875,
+ "logps/rejected": -401.2889099121094,
+ "loss": 0.2089,
+ "rewards/accuracies": 0.46875,
+ "rewards/chosen": -1.8769737482070923,
+ "rewards/margins": 0.12001794576644897,
+ "rewards/rejected": -1.996991515159607,
+ "step": 850
+ },
+ {
+ "epoch": 0.46,
+ "learning_rate": 3.2849295714741643e-06,
+ "logits/chosen": 0.12385988235473633,
+ "logits/rejected": 0.3134171962738037,
+ "logps/chosen": -428.42242431640625,
+ "logps/rejected": -409.5142517089844,
+ "loss": 0.2026,
+ "rewards/accuracies": 0.518750011920929,
+ "rewards/chosen": -1.8102359771728516,
+ "rewards/margins": 0.08851593732833862,
+ "rewards/rejected": -1.898751974105835,
+ "step": 860
+ },
+ {
+ "epoch": 0.46,
+ "learning_rate": 3.2405943438375287e-06,
+ "logits/chosen": 0.12988132238388062,
+ "logits/rejected": 0.2288062870502472,
+ "logps/chosen": -376.00006103515625,
+ "logps/rejected": -390.5958557128906,
+ "loss": 0.2222,
+ "rewards/accuracies": 0.48124998807907104,
+ "rewards/chosen": -1.7014586925506592,
+ "rewards/margins": 0.11990761756896973,
+ "rewards/rejected": -1.8213660717010498,
+ "step": 870
+ },
+ {
+ "epoch": 0.47,
+ "learning_rate": 3.1960022911279036e-06,
+ "logits/chosen": 0.18507717549800873,
+ "logits/rejected": 0.1665429174900055,
+ "logps/chosen": -375.88677978515625,
+ "logps/rejected": -396.2275695800781,
+ "loss": 0.2264,
+ "rewards/accuracies": 0.4375,
+ "rewards/chosen": -1.7533376216888428,
+ "rewards/margins": 0.07123322784900665,
+ "rewards/rejected": -1.8245712518692017,
+ "step": 880
+ },
+ {
+ "epoch": 0.47,
+ "learning_rate": 3.1511688770836844e-06,
+ "logits/chosen": 0.31608790159225464,
+ "logits/rejected": 0.16106575727462769,
+ "logps/chosen": -444.12542724609375,
+ "logps/rejected": -439.402587890625,
+ "loss": 0.3027,
+ "rewards/accuracies": 0.48750001192092896,
+ "rewards/chosen": -1.820054292678833,
+ "rewards/margins": 0.10226750373840332,
+ "rewards/rejected": -1.9223216772079468,
+ "step": 890
+ },
+ {
+ "epoch": 0.48,
+ "learning_rate": 3.1061096491431307e-06,
+ "logits/chosen": 0.12085793167352676,
+ "logits/rejected": 0.2093731164932251,
+ "logps/chosen": -409.0577697753906,
+ "logps/rejected": -399.948974609375,
+ "loss": 0.248,
+ "rewards/accuracies": 0.39375001192092896,
+ "rewards/chosen": -1.4467195272445679,
+ "rewards/margins": 0.04833744466304779,
+ "rewards/rejected": -1.4950571060180664,
+ "step": 900
+ },
+ {
+ "epoch": 0.49,
+ "learning_rate": 3.0608402330527796e-06,
+ "logits/chosen": 0.18963545560836792,
+ "logits/rejected": 0.2660353183746338,
+ "logps/chosen": -424.49188232421875,
+ "logps/rejected": -422.8685607910156,
+ "loss": 0.1943,
+ "rewards/accuracies": 0.4625000059604645,
+ "rewards/chosen": -1.53090500831604,
+ "rewards/margins": 0.08851666748523712,
+ "rewards/rejected": -1.6194219589233398,
+ "step": 910
+ },
+ {
+ "epoch": 0.49,
+ "learning_rate": 3.0153763274487176e-06,
+ "logits/chosen": 0.20010657608509064,
+ "logits/rejected": 0.2609160840511322,
+ "logps/chosen": -418.9073181152344,
+ "logps/rejected": -398.8419494628906,
+ "loss": 0.2011,
+ "rewards/accuracies": 0.5062500238418579,
+ "rewards/chosen": -1.4970227479934692,
+ "rewards/margins": 0.10633371025323868,
+ "rewards/rejected": -1.6033565998077393,
+ "step": 920
+ },
+ {
+ "epoch": 0.5,
+ "learning_rate": 2.9697336984125683e-06,
+ "logits/chosen": 0.13921412825584412,
+ "logits/rejected": 0.23919770121574402,
+ "logps/chosen": -473.777099609375,
+ "logps/rejected": -455.8279724121094,
+ "loss": 0.2057,
+ "rewards/accuracies": 0.4749999940395355,
+ "rewards/chosen": -1.5798097848892212,
+ "rewards/margins": 0.08555875718593597,
+ "rewards/rejected": -1.6653684377670288,
+ "step": 930
+ },
+ {
+ "epoch": 0.5,
+ "learning_rate": 2.923928174004094e-06,
+ "logits/chosen": 0.29412585496902466,
+ "logits/rejected": 0.19027197360992432,
+ "logps/chosen": -455.6954650878906,
+ "logps/rejected": -465.9042053222656,
+ "loss": 0.2045,
+ "rewards/accuracies": 0.44999998807907104,
+ "rewards/chosen": -1.7673845291137695,
+ "rewards/margins": 0.09541743248701096,
+ "rewards/rejected": -1.8628019094467163,
+ "step": 940
+ },
+ {
+ "epoch": 0.51,
+ "learning_rate": 2.8779756387723036e-06,
+ "logits/chosen": 0.18643781542778015,
+ "logits/rejected": 0.26781877875328064,
+ "logps/chosen": -418.92578125,
+ "logps/rejected": -428.0494079589844,
+ "loss": 0.2276,
+ "rewards/accuracies": 0.53125,
+ "rewards/chosen": -1.4455392360687256,
+ "rewards/margins": 0.11924241483211517,
+ "rewards/rejected": -1.564781904220581,
+ "step": 950
+ },
+ {
+ "epoch": 0.51,
+ "learning_rate": 2.831892028246968e-06,
+ "logits/chosen": 0.23528459668159485,
+ "logits/rejected": 0.16012924909591675,
+ "logps/chosen": -420.2757263183594,
+ "logps/rejected": -425.47686767578125,
+ "loss": 0.2191,
+ "rewards/accuracies": 0.5062500238418579,
+ "rewards/chosen": -0.983513355255127,
+ "rewards/margins": 0.13725998997688293,
+ "rewards/rejected": -1.1207733154296875,
+ "step": 960
+ },
+ {
+ "epoch": 0.52,
+ "learning_rate": 2.7856933234124617e-06,
+ "logits/chosen": 0.2554696202278137,
+ "logits/rejected": 0.2754512429237366,
+ "logps/chosen": -383.97296142578125,
+ "logps/rejected": -409.36566162109375,
+ "loss": 0.2216,
+ "rewards/accuracies": 0.45625001192092896,
+ "rewards/chosen": -0.7281330227851868,
+ "rewards/margins": 0.08484755456447601,
+ "rewards/rejected": -0.8129806518554688,
+ "step": 970
+ },
+ {
+ "epoch": 0.52,
+ "learning_rate": 2.7393955451658387e-06,
+ "logits/chosen": 0.1969253122806549,
+ "logits/rejected": 0.30447250604629517,
+ "logps/chosen": -418.5328063964844,
+ "logps/rejected": -383.39825439453125,
+ "loss": 0.2054,
+ "rewards/accuracies": 0.45625001192092896,
+ "rewards/chosen": -0.7214577198028564,
+ "rewards/margins": 0.08516426384449005,
+ "rewards/rejected": -0.8066219091415405,
+ "step": 980
+ },
+ {
+ "epoch": 0.53,
+ "learning_rate": 2.6930147487610667e-06,
+ "logits/chosen": 0.14184299111366272,
+ "logits/rejected": 0.2329244166612625,
+ "logps/chosen": -436.410400390625,
+ "logps/rejected": -420.66357421875,
+ "loss": 0.217,
+ "rewards/accuracies": 0.5375000238418579,
+ "rewards/chosen": -0.7258288264274597,
+ "rewards/margins": 0.1092677116394043,
+ "rewards/rejected": -0.8350966572761536,
+ "step": 990
+ },
+ {
+ "epoch": 0.53,
+ "learning_rate": 2.6465670182413487e-06,
+ "logits/chosen": 0.26164674758911133,
+ "logits/rejected": 0.23967313766479492,
+ "logps/chosen": -431.079345703125,
+ "logps/rejected": -406.0982360839844,
+ "loss": 0.1821,
+ "rewards/accuracies": 0.48750001192092896,
+ "rewards/chosen": -0.6475759744644165,
+ "rewards/margins": 0.10677828639745712,
+ "rewards/rejected": -0.7543543577194214,
+ "step": 1000
+ },
+ {
+ "epoch": 0.54,
+ "learning_rate": 2.6000684608614594e-06,
+ "logits/chosen": 0.23683926463127136,
+ "logits/rejected": 0.1576264351606369,
+ "logps/chosen": -430.5428161621094,
+ "logps/rejected": -448.5796813964844,
+ "loss": 0.2441,
+ "rewards/accuracies": 0.5625,
+ "rewards/chosen": -0.7693523168563843,
+ "rewards/margins": 0.13786748051643372,
+ "rewards/rejected": -0.9072197675704956,
+ "step": 1010
+ },
+ {
+ "epoch": 0.54,
+ "learning_rate": 2.5535352015020338e-06,
+ "logits/chosen": 0.23495522141456604,
+ "logits/rejected": 0.21038460731506348,
+ "logps/chosen": -410.57904052734375,
+ "logps/rejected": -398.42193603515625,
+ "loss": 0.2056,
+ "rewards/accuracies": 0.5062500238418579,
+ "rewards/chosen": -0.8607665300369263,
+ "rewards/margins": 0.11764762550592422,
+ "rewards/rejected": -0.9784140586853027,
+ "step": 1020
+ },
+ {
+ "epoch": 0.55,
+ "learning_rate": 2.506983377077741e-06,
+ "logits/chosen": 0.24832454323768616,
+ "logits/rejected": 0.2945554852485657,
+ "logps/chosen": -438.53680419921875,
+ "logps/rejected": -407.5857849121094,
+ "loss": 0.2016,
+ "rewards/accuracies": 0.518750011920929,
+ "rewards/chosen": -1.1375848054885864,
+ "rewards/margins": 0.11563225090503693,
+ "rewards/rejected": -1.2532169818878174,
+ "step": 1030
+ },
+ {
+ "epoch": 0.55,
+ "learning_rate": 2.460429130941289e-06,
+ "logits/chosen": 0.17995227873325348,
+ "logits/rejected": 0.2567993402481079,
+ "logps/chosen": -371.36102294921875,
+ "logps/rejected": -355.2597351074219,
+ "loss": 0.2277,
+ "rewards/accuracies": 0.48124998807907104,
+ "rewards/chosen": -0.8424074053764343,
+ "rewards/margins": 0.11410113424062729,
+ "rewards/rejected": -0.9565085172653198,
+ "step": 1040
+ },
+ {
+ "epoch": 0.56,
+ "learning_rate": 2.413888607285192e-06,
+ "logits/chosen": 0.14955028891563416,
+ "logits/rejected": 0.20088477432727814,
+ "logps/chosen": -402.30767822265625,
+ "logps/rejected": -393.7333068847656,
+ "loss": 0.2652,
+ "rewards/accuracies": 0.45625001192092896,
+ "rewards/chosen": -1.1037840843200684,
+ "rewards/margins": 0.11586949974298477,
+ "rewards/rejected": -1.219653844833374,
+ "step": 1050
+ },
+ {
+ "epoch": 0.57,
+ "learning_rate": 2.367377945543249e-06,
+ "logits/chosen": 0.1821899712085724,
+ "logits/rejected": 0.211843803524971,
+ "logps/chosen": -395.53118896484375,
+ "logps/rejected": -402.3675842285156,
+ "loss": 0.1918,
+ "rewards/accuracies": 0.46875,
+ "rewards/chosen": -1.1062228679656982,
+ "rewards/margins": 0.10730864852666855,
+ "rewards/rejected": -1.2135313749313354,
+ "step": 1060
+ },
+ {
+ "epoch": 0.57,
+ "learning_rate": 2.320913274793676e-06,
+ "logits/chosen": 0.2403596192598343,
+ "logits/rejected": 0.2753956615924835,
+ "logps/chosen": -405.42047119140625,
+ "logps/rejected": -420.23199462890625,
+ "loss": 0.2069,
+ "rewards/accuracies": 0.5,
+ "rewards/chosen": -1.0861284732818604,
+ "rewards/margins": 0.09204941987991333,
+ "rewards/rejected": -1.178177833557129,
+ "step": 1070
+ },
+ {
+ "epoch": 0.58,
+ "learning_rate": 2.27451070816582e-06,
+ "logits/chosen": 0.21563324332237244,
+ "logits/rejected": 0.3084403872489929,
+ "logps/chosen": -478.88763427734375,
+ "logps/rejected": -456.4349670410156,
+ "loss": 0.21,
+ "rewards/accuracies": 0.48750001192092896,
+ "rewards/chosen": -1.1920944452285767,
+ "rewards/margins": 0.10027352720499039,
+ "rewards/rejected": -1.2923680543899536,
+ "step": 1080
+ },
+ {
+ "epoch": 0.58,
+ "learning_rate": 2.228186337252414e-06,
+ "logits/chosen": 0.13818100094795227,
+ "logits/rejected": 0.3011969029903412,
+ "logps/chosen": -417.41961669921875,
+ "logps/rejected": -380.45306396484375,
+ "loss": 0.1998,
+ "rewards/accuracies": 0.42500001192092896,
+ "rewards/chosen": -1.3052135705947876,
+ "rewards/margins": 0.08900976181030273,
+ "rewards/rejected": -1.3942232131958008,
+ "step": 1090
+ },
+ {
+ "epoch": 0.59,
+ "learning_rate": 2.1819562265292946e-06,
+ "logits/chosen": 0.16572540998458862,
+ "logits/rejected": 0.22458620369434357,
+ "logps/chosen": -473.16827392578125,
+ "logps/rejected": -460.0870056152344,
+ "loss": 0.1622,
+ "rewards/accuracies": 0.5375000238418579,
+ "rewards/chosen": -1.3903932571411133,
+ "rewards/margins": 0.15418145060539246,
+ "rewards/rejected": -1.5445747375488281,
+ "step": 1100
+ },
+ {
+ "epoch": 0.59,
+ "learning_rate": 2.1358364077845236e-06,
+ "logits/chosen": 0.1645340472459793,
+ "logits/rejected": 0.34099048376083374,
+ "logps/chosen": -454.2103576660156,
+ "logps/rejected": -452.72991943359375,
+ "loss": 0.2015,
+ "rewards/accuracies": 0.5,
+ "rewards/chosen": -1.5796992778778076,
+ "rewards/margins": 0.07802970707416534,
+ "rewards/rejected": -1.657728910446167,
+ "step": 1110
+ },
+ {
+ "epoch": 0.6,
+ "learning_rate": 2.089842874558849e-06,
+ "logits/chosen": 0.24456259608268738,
+ "logits/rejected": 0.1955045759677887,
+ "logps/chosen": -407.2257385253906,
+ "logps/rejected": -420.85015869140625,
+ "loss": 0.2256,
+ "rewards/accuracies": 0.5,
+ "rewards/chosen": -1.4515103101730347,
+ "rewards/margins": 0.06924048811197281,
+ "rewards/rejected": -1.520750880241394,
+ "step": 1120
+ },
+ {
+ "epoch": 0.6,
+ "learning_rate": 2.0439915765994242e-06,
+ "logits/chosen": 0.21577736735343933,
+ "logits/rejected": 0.18421390652656555,
+ "logps/chosen": -413.2845153808594,
+ "logps/rejected": -417.5594177246094,
+ "loss": 0.2097,
+ "rewards/accuracies": 0.5375000238418579,
+ "rewards/chosen": -1.4378559589385986,
+ "rewards/margins": 0.12303999811410904,
+ "rewards/rejected": -1.5608961582183838,
+ "step": 1130
+ },
1607
+ {
1608
+ "epoch": 0.61,
1609
+ "learning_rate": 1.9982984143287186e-06,
1610
+ "logits/chosen": 0.1870601922273636,
1611
+ "logits/rejected": 0.1403540074825287,
1612
+ "logps/chosen": -437.71875,
1613
+ "logps/rejected": -439.2840881347656,
1614
+ "loss": 0.2507,
1615
+ "rewards/accuracies": 0.4937500059604645,
1616
+ "rewards/chosen": -1.5519638061523438,
1617
+ "rewards/margins": 0.1572073996067047,
1618
+ "rewards/rejected": -1.7091710567474365,
1619
+ "step": 1140
1620
+ },
1621
+ {
1622
+ "epoch": 0.61,
1623
+ "learning_rate": 1.95277923333053e-06,
1624
+ "logits/chosen": 0.18075987696647644,
1625
+ "logits/rejected": 0.29444438219070435,
1626
+ "logps/chosen": -461.15533447265625,
1627
+ "logps/rejected": -489.20025634765625,
1628
+ "loss": 0.2153,
1629
+ "rewards/accuracies": 0.5062500238418579,
1630
+ "rewards/chosen": -1.6609361171722412,
1631
+ "rewards/margins": 0.12212083488702774,
1632
+ "rewards/rejected": -1.7830568552017212,
1633
+ "step": 1150
1634
+ },
1635
+ {
1636
+ "epoch": 0.62,
1637
+ "learning_rate": 1.9074498188550156e-06,
1638
+ "logits/chosen": 0.13356804847717285,
1639
+ "logits/rejected": 0.27090999484062195,
1640
+ "logps/chosen": -425.75665283203125,
1641
+ "logps/rejected": -410.96368408203125,
1642
+ "loss": 0.1966,
1643
+ "rewards/accuracies": 0.512499988079071,
1644
+ "rewards/chosen": -1.284343957901001,
1645
+ "rewards/margins": 0.11508820950984955,
1646
+ "rewards/rejected": -1.3994321823120117,
1647
+ "step": 1160
1648
+ },
1649
+ {
1650
+ "epoch": 0.62,
1651
+ "learning_rate": 1.862325890344643e-06,
1652
+ "logits/chosen": 0.18534113466739655,
1653
+ "logits/rejected": 0.2500341534614563,
1654
+ "logps/chosen": -387.5941467285156,
1655
+ "logps/rejected": -358.5582275390625,
1656
+ "loss": 0.2102,
1657
+ "rewards/accuracies": 0.4312500059604645,
1658
+ "rewards/chosen": -1.2002973556518555,
1659
+ "rewards/margins": 0.07647459208965302,
1660
+ "rewards/rejected": -1.276771903038025,
1661
+ "step": 1170
1662
+ },
1663
+ {
1664
+ "epoch": 0.63,
1665
+ "learning_rate": 1.817423095982972e-06,
1666
+ "logits/chosen": 0.2761315107345581,
1667
+ "logits/rejected": 0.1992635577917099,
1668
+ "logps/chosen": -410.59130859375,
1669
+ "logps/rejected": -429.77667236328125,
1670
+ "loss": 0.1798,
1671
+ "rewards/accuracies": 0.5062500238418579,
1672
+ "rewards/chosen": -1.3009172677993774,
1673
+ "rewards/margins": 0.12259545177221298,
1674
+ "rewards/rejected": -1.4235126972198486,
1675
+ "step": 1180
1676
+ },
1677
+ {
1678
+ "epoch": 0.63,
1679
+ "learning_rate": 1.7727570072681293e-06,
1680
+ "logits/chosen": 0.21099631488323212,
1681
+ "logits/rejected": 0.3009958863258362,
1682
+ "logps/chosen": -356.77490234375,
1683
+ "logps/rejected": -372.29718017578125,
1684
+ "loss": 0.1696,
1685
+ "rewards/accuracies": 0.53125,
1686
+ "rewards/chosen": -1.1670329570770264,
1687
+ "rewards/margins": 0.13822320103645325,
1688
+ "rewards/rejected": -1.3052562475204468,
1689
+ "step": 1190
1690
+ },
1691
+ {
1692
+ "epoch": 0.64,
1693
+ "learning_rate": 1.7283431136128961e-06,
1694
+ "logits/chosen": 0.16253753006458282,
1695
+ "logits/rejected": 0.29959315061569214,
1696
+ "logps/chosen": -471.822998046875,
1697
+ "logps/rejected": -446.8572692871094,
1698
+ "loss": 0.1791,
1699
+ "rewards/accuracies": 0.5,
1700
+ "rewards/chosen": -1.4546597003936768,
1701
+ "rewards/margins": 0.1095527857542038,
1702
+ "rewards/rejected": -1.5642122030258179,
1703
+ "step": 1200
1704
+ },
1705
+ {
1706
+ "epoch": 0.65,
1707
+ "learning_rate": 1.6841968169732478e-06,
1708
+ "logits/chosen": 0.21443840861320496,
1709
+ "logits/rejected": 0.2336159497499466,
1710
+ "logps/chosen": -424.06805419921875,
1711
+ "logps/rejected": -417.39141845703125,
1712
+ "loss": 0.2039,
1713
+ "rewards/accuracies": 0.5249999761581421,
1714
+ "rewards/chosen": -1.357987403869629,
1715
+ "rewards/margins": 0.11536715924739838,
1716
+ "rewards/rejected": -1.4733545780181885,
1717
+ "step": 1210
1718
+ },
1719
+ {
1720
+ "epoch": 0.65,
1721
+ "learning_rate": 1.6403334265072284e-06,
1722
+ "logits/chosen": 0.10831620544195175,
1723
+ "logits/rejected": 0.31975752115249634,
1724
+ "logps/chosen": -425.4805603027344,
1725
+ "logps/rejected": -405.4668273925781,
1726
+ "loss": 0.2061,
1727
+ "rewards/accuracies": 0.45625001192092896,
1728
+ "rewards/chosen": -1.2643564939498901,
1729
+ "rewards/margins": 0.12265012413263321,
1730
+ "rewards/rejected": -1.3870065212249756,
1731
+ "step": 1220
1732
+ },
1733
+ {
1734
+ "epoch": 0.66,
1735
+ "learning_rate": 1.5967681532660066e-06,
1736
+ "logits/chosen": 0.19904041290283203,
1737
+ "logits/rejected": 0.31774818897247314,
1738
+ "logps/chosen": -451.98895263671875,
1739
+ "logps/rejected": -464.37408447265625,
1740
+ "loss": 0.268,
1741
+ "rewards/accuracies": 0.4937500059604645,
1742
+ "rewards/chosen": -1.5613361597061157,
1743
+ "rewards/margins": 0.11387111991643906,
1744
+ "rewards/rejected": -1.6752071380615234,
1745
+ "step": 1230
1746
+ },
1747
+ {
1748
+ "epoch": 0.66,
1749
+ "learning_rate": 1.5535161049189463e-06,
1750
+ "logits/chosen": 0.1638219654560089,
1751
+ "logits/rejected": 0.24011917412281036,
1752
+ "logps/chosen": -435.06170654296875,
1753
+ "logps/rejected": -426.3074645996094,
1754
+ "loss": 0.2635,
1755
+ "rewards/accuracies": 0.48124998807907104,
1756
+ "rewards/chosen": -1.6264498233795166,
1757
+ "rewards/margins": 0.14013561606407166,
1758
+ "rewards/rejected": -1.766585350036621,
1759
+ "step": 1240
1760
+ },
1761
+ {
1762
+ "epoch": 0.67,
1763
+ "learning_rate": 1.5105922805145356e-06,
1764
+ "logits/chosen": 0.22532732784748077,
1765
+ "logits/rejected": 0.21466323733329773,
1766
+ "logps/chosen": -470.2998962402344,
1767
+ "logps/rejected": -450.73193359375,
1768
+ "loss": 0.1847,
1769
+ "rewards/accuracies": 0.5562499761581421,
1770
+ "rewards/chosen": -1.6772887706756592,
1771
+ "rewards/margins": 0.1449218988418579,
1772
+ "rewards/rejected": -1.822210669517517,
1773
+ "step": 1250
1774
+ },
1775
+ {
1776
+ "epoch": 0.67,
1777
+ "learning_rate": 1.4680115652789823e-06,
1778
+ "logits/chosen": 0.16722983121871948,
1779
+ "logits/rejected": 0.2960713505744934,
1780
+ "logps/chosen": -452.7391662597656,
1781
+ "logps/rejected": -429.21435546875,
1782
+ "loss": 0.187,
1783
+ "rewards/accuracies": 0.48750001192092896,
1784
+ "rewards/chosen": -1.6514288187026978,
1785
+ "rewards/margins": 0.11130253225564957,
1786
+ "rewards/rejected": -1.7627313137054443,
1787
+ "step": 1260
1788
+ },
1789
+ {
1790
+ "epoch": 0.68,
1791
+ "learning_rate": 1.4257887254542767e-06,
1792
+ "logits/chosen": 0.2482529580593109,
1793
+ "logits/rejected": 0.2227180004119873,
1794
+ "logps/chosen": -396.0767517089844,
1795
+ "logps/rejected": -405.37091064453125,
1796
+ "loss": 0.2139,
1797
+ "rewards/accuracies": 0.5062500238418579,
1798
+ "rewards/chosen": -1.2826321125030518,
1799
+ "rewards/margins": 0.13906559348106384,
1800
+ "rewards/rejected": -1.4216978549957275,
1801
+ "step": 1270
1802
+ },
1803
+ {
1804
+ "epoch": 0.68,
1805
+ "learning_rate": 1.3839384031775227e-06,
1806
+ "logits/chosen": 0.1646454632282257,
1807
+ "logits/rejected": 0.27516424655914307,
1808
+ "logps/chosen": -406.2199401855469,
1809
+ "logps/rejected": -418.5736389160156,
1810
+ "loss": 0.2454,
1811
+ "rewards/accuracies": 0.48124998807907104,
1812
+ "rewards/chosen": -1.3300670385360718,
1813
+ "rewards/margins": 0.11273592710494995,
1814
+ "rewards/rejected": -1.442803144454956,
1815
+ "step": 1280
1816
+ },
1817
+ {
1818
+ "epoch": 0.69,
1819
+ "learning_rate": 1.342475111403298e-06,
1820
+ "logits/chosen": 0.16665968298912048,
1821
+ "logits/rejected": 0.31475013494491577,
1822
+ "logps/chosen": -460.81640625,
1823
+ "logps/rejected": -437.21270751953125,
1824
+ "loss": 0.1881,
1825
+ "rewards/accuracies": 0.5375000238418579,
1826
+ "rewards/chosen": -1.377325415611267,
1827
+ "rewards/margins": 0.12428633123636246,
1828
+ "rewards/rejected": -1.5016119480133057,
1829
+ "step": 1290
1830
+ },
1831
+ {
1832
+ "epoch": 0.69,
1833
+ "learning_rate": 1.3014132288708209e-06,
1834
+ "logits/chosen": 0.22961941361427307,
1835
+ "logits/rejected": 0.27373284101486206,
1836
+ "logps/chosen": -383.945556640625,
1837
+ "logps/rejected": -404.753173828125,
1838
+ "loss": 0.2236,
1839
+ "rewards/accuracies": 0.4437499940395355,
1840
+ "rewards/chosen": -1.2235456705093384,
1841
+ "rewards/margins": 0.08865731954574585,
1842
+ "rewards/rejected": -1.312203049659729,
1843
+ "step": 1300
1844
+ },
1845
+ {
1846
+ "epoch": 0.7,
1847
+ "learning_rate": 1.2607669951176549e-06,
1848
+ "logits/chosen": 0.22940245270729065,
1849
+ "logits/rejected": 0.27775856852531433,
1850
+ "logps/chosen": -397.48736572265625,
1851
+ "logps/rejected": -366.0208740234375,
1852
+ "loss": 0.1997,
1853
+ "rewards/accuracies": 0.44999998807907104,
1854
+ "rewards/chosen": -1.242621660232544,
1855
+ "rewards/margins": 0.06689196825027466,
1856
+ "rewards/rejected": -1.3095134496688843,
1857
+ "step": 1310
1858
+ },
1859
+ {
1860
+ "epoch": 0.7,
1861
+ "learning_rate": 1.2205505055416891e-06,
1862
+ "logits/chosen": 0.18240857124328613,
1863
+ "logits/rejected": 0.3393421173095703,
1864
+ "logps/chosen": -417.5065002441406,
1865
+ "logps/rejected": -389.3405456542969,
1866
+ "loss": 0.1692,
1867
+ "rewards/accuracies": 0.518750011920929,
1868
+ "rewards/chosen": -1.4786723852157593,
1869
+ "rewards/margins": 0.12162399291992188,
1870
+ "rewards/rejected": -1.6002962589263916,
1871
+ "step": 1320
1872
+ },
1873
+ {
1874
+ "epoch": 0.71,
1875
+ "learning_rate": 1.1807777065131002e-06,
1876
+ "logits/chosen": 0.15853646397590637,
1877
+ "logits/rejected": 0.2295243740081787,
1878
+ "logps/chosen": -459.41912841796875,
1879
+ "logps/rejected": -431.26708984375,
1880
+ "loss": 0.2207,
1881
+ "rewards/accuracies": 0.5375000238418579,
1882
+ "rewards/chosen": -1.5732864141464233,
1883
+ "rewards/margins": 0.13572458922863007,
1884
+ "rewards/rejected": -1.7090113162994385,
1885
+ "step": 1330
1886
+ },
1887
+ {
1888
+ "epoch": 0.71,
1889
+ "learning_rate": 1.1414623905380012e-06,
1890
+ "logits/chosen": 0.1904124766588211,
1891
+ "logits/rejected": 0.2695844769477844,
1892
+ "logps/chosen": -402.2320251464844,
1893
+ "logps/rejected": -390.7575378417969,
1894
+ "loss": 0.1981,
1895
+ "rewards/accuracies": 0.5062500238418579,
1896
+ "rewards/chosen": -1.56125009059906,
1897
+ "rewards/margins": 0.12390075623989105,
1898
+ "rewards/rejected": -1.6851508617401123,
1899
+ "step": 1340
1900
+ },
1901
+ {
1902
+ "epoch": 0.72,
1903
+ "learning_rate": 1.1026181914754388e-06,
1904
+ "logits/chosen": 0.2069578617811203,
1905
+ "logits/rejected": 0.2751797139644623,
1906
+ "logps/chosen": -385.76312255859375,
1907
+ "logps/rejected": -406.4028625488281,
1908
+ "loss": 0.2131,
1909
+ "rewards/accuracies": 0.5062500238418579,
1910
+ "rewards/chosen": -1.6344740390777588,
1911
+ "rewards/margins": 0.0895460695028305,
1912
+ "rewards/rejected": -1.7240203619003296,
1913
+ "step": 1350
1914
+ },
1915
+ {
1916
+ "epoch": 0.73,
1917
+ "learning_rate": 1.0642585798094136e-06,
1918
+ "logits/chosen": 0.20105871558189392,
1919
+ "logits/rejected": 0.22045502066612244,
1920
+ "logps/chosen": -420.058349609375,
1921
+ "logps/rejected": -411.83612060546875,
1922
+ "loss": 0.167,
1923
+ "rewards/accuracies": 0.512499988079071,
1924
+ "rewards/chosen": -1.566814661026001,
1925
+ "rewards/margins": 0.12989051640033722,
1926
+ "rewards/rejected": -1.6967051029205322,
1927
+ "step": 1360
1928
+ },
1929
+ {
1930
+ "epoch": 0.73,
1931
+ "learning_rate": 1.0263968579775522e-06,
1932
+ "logits/chosen": 0.21636684238910675,
1933
+ "logits/rejected": 0.18984094262123108,
1934
+ "logps/chosen": -388.39642333984375,
1935
+ "logps/rejected": -390.4629211425781,
1936
+ "loss": 0.2022,
1937
+ "rewards/accuracies": 0.42500001192092896,
1938
+ "rewards/chosen": -1.5708179473876953,
1939
+ "rewards/margins": 0.07592545449733734,
1940
+ "rewards/rejected": -1.6467434167861938,
1941
+ "step": 1370
1942
+ },
1943
+ {
1944
+ "epoch": 0.74,
1945
+ "learning_rate": 9.89046155758058e-07,
1946
+ "logits/chosen": 0.24614374339580536,
1947
+ "logits/rejected": 0.3122064471244812,
1948
+ "logps/chosen": -384.8661193847656,
1949
+ "logps/rejected": -419.45513916015625,
1950
+ "loss": 0.2277,
1951
+ "rewards/accuracies": 0.45625001192092896,
1952
+ "rewards/chosen": -1.3731580972671509,
1953
+ "rewards/margins": 0.09444873034954071,
1954
+ "rewards/rejected": -1.467606782913208,
1955
+ "step": 1380
1956
+ },
1957
+ {
1958
+ "epoch": 0.74,
1959
+ "learning_rate": 9.52219425716534e-07,
1960
+ "logits/chosen": 0.17535480856895447,
1961
+ "logits/rejected": 0.2093926966190338,
1962
+ "logps/chosen": -457.5205078125,
1963
+ "logps/rejected": -446.33587646484375,
1964
+ "loss": 0.1623,
1965
+ "rewards/accuracies": 0.5687500238418579,
1966
+ "rewards/chosen": -1.650968313217163,
1967
+ "rewards/margins": 0.13586655259132385,
1968
+ "rewards/rejected": -1.7868350744247437,
1969
+ "step": 1390
1970
+ },
1971
+ {
1972
+ "epoch": 0.75,
1973
+ "learning_rate": 9.15929438714262e-07,
1974
+ "logits/chosen": 0.28407081961631775,
1975
+ "logits/rejected": 0.2300814837217331,
1976
+ "logps/chosen": -454.60650634765625,
1977
+ "logps/rejected": -465.85906982421875,
1978
+ "loss": 0.1815,
1979
+ "rewards/accuracies": 0.581250011920929,
1980
+ "rewards/chosen": -1.5347188711166382,
1981
+ "rewards/margins": 0.15691988170146942,
1982
+ "rewards/rejected": -1.691638708114624,
1983
+ "step": 1400
1984
+ },
1985
+ {
1986
+ "epoch": 0.75,
1987
+ "learning_rate": 8.801887794794911e-07,
1988
+ "logits/chosen": 0.2535427212715149,
1989
+ "logits/rejected": 0.211197167634964,
1990
+ "logps/chosen": -396.9275817871094,
1991
+ "logps/rejected": -406.4787902832031,
1992
+ "loss": 0.5444,
1993
+ "rewards/accuracies": 0.518750011920929,
1994
+ "rewards/chosen": -1.4354115724563599,
1995
+ "rewards/margins": 0.15605568885803223,
1996
+ "rewards/rejected": -1.591467261314392,
1997
+ "step": 1410
1998
+ },
1999
+ {
2000
+ "epoch": 0.76,
2001
+ "learning_rate": 8.450098422432787e-07,
2002
+ "logits/chosen": 0.24384653568267822,
2003
+ "logits/rejected": 0.2819363474845886,
2004
+ "logps/chosen": -444.5672912597656,
2005
+ "logps/rejected": -427.81134033203125,
2006
+ "loss": 0.1977,
2007
+ "rewards/accuracies": 0.4937500059604645,
2008
+ "rewards/chosen": -1.5881189107894897,
2009
+ "rewards/margins": 0.11528144031763077,
2010
+ "rewards/rejected": -1.7034003734588623,
2011
+ "step": 1420
2012
+ },
2013
+ {
2014
+ "epoch": 0.76,
2015
+ "learning_rate": 8.104048264413858e-07,
2016
+ "logits/chosen": 0.22661654651165009,
2017
+ "logits/rejected": 0.1742086112499237,
2018
+ "logps/chosen": -367.4610595703125,
2019
+ "logps/rejected": -356.89422607421875,
2020
+ "loss": 0.1993,
2021
+ "rewards/accuracies": 0.4625000059604645,
2022
+ "rewards/chosen": -1.2963422536849976,
2023
+ "rewards/margins": 0.09244900941848755,
2024
+ "rewards/rejected": -1.3887912034988403,
2025
+ "step": 1430
2026
+ },
2027
+ {
2028
+ "epoch": 0.77,
2029
+ "learning_rate": 7.763857324837321e-07,
2030
+ "logits/chosen": 0.24572880566120148,
2031
+ "logits/rejected": 0.21287715435028076,
2032
+ "logps/chosen": -424.8355407714844,
2033
+ "logps/rejected": -423.1205139160156,
2034
+ "loss": 0.2398,
2035
+ "rewards/accuracies": 0.42500001192092896,
2036
+ "rewards/chosen": -1.5864840745925903,
2037
+ "rewards/margins": 0.08626910299062729,
2038
+ "rewards/rejected": -1.672753095626831,
2039
+ "step": 1440
2040
+ },
2041
+ {
2042
+ "epoch": 0.77,
2043
+ "learning_rate": 7.429643575928605e-07,
2044
+ "logits/chosen": 0.16526217758655548,
2045
+ "logits/rejected": 0.2713525891304016,
2046
+ "logps/chosen": -448.96234130859375,
2047
+ "logps/rejected": -429.2787170410156,
2048
+ "loss": 0.2337,
2049
+ "rewards/accuracies": 0.518750011920929,
2050
+ "rewards/chosen": -1.5646464824676514,
2051
+ "rewards/margins": 0.11367311328649521,
2052
+ "rewards/rejected": -1.6783195734024048,
2053
+ "step": 1450
2054
+ },
2055
+ {
2056
+ "epoch": 0.78,
2057
+ "learning_rate": 7.101522917128709e-07,
2058
+ "logits/chosen": 0.26174360513687134,
2059
+ "logits/rejected": 0.17661890387535095,
2060
+ "logps/chosen": -409.60528564453125,
2061
+ "logps/rejected": -430.09033203125,
2062
+ "loss": 0.2186,
2063
+ "rewards/accuracies": 0.518750011920929,
2064
+ "rewards/chosen": -1.5871174335479736,
2065
+ "rewards/margins": 0.08051513135433197,
2066
+ "rewards/rejected": -1.6676326990127563,
2067
+ "step": 1460
2068
+ },
2069
+ {
2070
+ "epoch": 0.78,
2071
+ "learning_rate": 6.779609134902312e-07,
2072
+ "logits/chosen": 0.14115332067012787,
2073
+ "logits/rejected": 0.21945491433143616,
2074
+ "logps/chosen": -390.1023864746094,
2075
+ "logps/rejected": -373.8792419433594,
2076
+ "loss": 0.2011,
2077
+ "rewards/accuracies": 0.44999998807907104,
2078
+ "rewards/chosen": -1.3179562091827393,
2079
+ "rewards/margins": 0.11220277845859528,
2080
+ "rewards/rejected": -1.4301589727401733,
2081
+ "step": 1470
2082
+ },
2083
+ {
2084
+ "epoch": 0.79,
2085
+ "learning_rate": 6.464013863278629e-07,
2086
+ "logits/chosen": 0.105352483689785,
2087
+ "logits/rejected": 0.2724997103214264,
2088
+ "logps/chosen": -462.69989013671875,
2089
+ "logps/rejected": -417.04278564453125,
2090
+ "loss": 0.2099,
2091
+ "rewards/accuracies": 0.4937500059604645,
2092
+ "rewards/chosen": -1.6254713535308838,
2093
+ "rewards/margins": 0.10472464561462402,
2094
+ "rewards/rejected": -1.7301959991455078,
2095
+ "step": 1480
2096
+ },
2097
+ {
2098
+ "epoch": 0.79,
2099
+ "learning_rate": 6.154846545138696e-07,
2100
+ "logits/chosen": 0.2331116646528244,
2101
+ "logits/rejected": 0.163321390748024,
2102
+ "logps/chosen": -437.3831481933594,
2103
+ "logps/rejected": -436.51446533203125,
2104
+ "loss": 0.2165,
2105
+ "rewards/accuracies": 0.5375000238418579,
2106
+ "rewards/chosen": -1.76204514503479,
2107
+ "rewards/margins": 0.11055411398410797,
2108
+ "rewards/rejected": -1.8725992441177368,
2109
+ "step": 1490
2110
+ },
2111
+ {
2112
+ "epoch": 0.8,
2113
+ "learning_rate": 5.852214394262515e-07,
2114
+ "logits/chosen": 0.22001425921916962,
2115
+ "logits/rejected": 0.24056494235992432,
2116
+ "logps/chosen": -430.39471435546875,
2117
+ "logps/rejected": -417.5421447753906,
2118
+ "loss": 0.2663,
2119
+ "rewards/accuracies": 0.4749999940395355,
2120
+ "rewards/chosen": -1.5459119081497192,
2121
+ "rewards/margins": 0.06696294248104095,
2122
+ "rewards/rejected": -1.612874984741211,
2123
+ "step": 1500
2124
+ },
2125
+ {
2126
+ "epoch": 0.81,
2127
+ "learning_rate": 5.556222358149191e-07,
2128
+ "logits/chosen": 0.25534382462501526,
2129
+ "logits/rejected": 0.20129148662090302,
2130
+ "logps/chosen": -448.9977111816406,
2131
+ "logps/rejected": -448.968017578125,
2132
+ "loss": 0.2053,
2133
+ "rewards/accuracies": 0.512499988079071,
2134
+ "rewards/chosen": -1.7130622863769531,
2135
+ "rewards/margins": 0.10370539128780365,
2136
+ "rewards/rejected": -1.816767692565918,
2137
+ "step": 1510
2138
+ },
2139
+ {
2140
+ "epoch": 0.81,
2141
+ "learning_rate": 5.266973081622992e-07,
2142
+ "logits/chosen": 0.21835920214653015,
2143
+ "logits/rejected": 0.22384993731975555,
2144
+ "logps/chosen": -436.5664978027344,
2145
+ "logps/rejected": -419.2340393066406,
2146
+ "loss": 0.2178,
2147
+ "rewards/accuracies": 0.46875,
2148
+ "rewards/chosen": -1.622981071472168,
2149
+ "rewards/margins": 0.1084824651479721,
2150
+ "rewards/rejected": -1.7314636707305908,
2151
+ "step": 1520
2152
+ },
2153
+ {
2154
+ "epoch": 0.82,
2155
+ "learning_rate": 4.984566871237942e-07,
2156
+ "logits/chosen": 0.20313188433647156,
2157
+ "logits/rejected": 0.18856777250766754,
2158
+ "logps/chosen": -392.1644592285156,
2159
+ "logps/rejected": -397.46185302734375,
2160
+ "loss": 0.1923,
2161
+ "rewards/accuracies": 0.518750011920929,
2162
+ "rewards/chosen": -1.6037871837615967,
2163
+ "rewards/margins": 0.11552910506725311,
2164
+ "rewards/rejected": -1.7193162441253662,
2165
+ "step": 1530
2166
+ },
2167
+ {
2168
+ "epoch": 0.82,
2169
+ "learning_rate": 4.709101660493251e-07,
2170
+ "logits/chosen": 0.14777527749538422,
2171
+ "logits/rejected": 0.31510457396507263,
2172
+ "logps/chosen": -404.65362548828125,
2173
+ "logps/rejected": -411.5926818847656,
2174
+ "loss": 0.24,
2175
+ "rewards/accuracies": 0.5062500238418579,
2176
+ "rewards/chosen": -1.5332674980163574,
2177
+ "rewards/margins": 0.08915401995182037,
2178
+ "rewards/rejected": -1.6224216222763062,
2179
+ "step": 1540
2180
+ },
2181
+ {
2182
+ "epoch": 0.83,
2183
+ "learning_rate": 4.440672975871743e-07,
2184
+ "logits/chosen": 0.18385054171085358,
2185
+ "logits/rejected": 0.24470014870166779,
2186
+ "logps/chosen": -413.1792907714844,
2187
+ "logps/rejected": -391.4145202636719,
2188
+ "loss": 0.1848,
2189
+ "rewards/accuracies": 0.5062500238418579,
2190
+ "rewards/chosen": -1.3847424983978271,
2191
+ "rewards/margins": 0.12068633735179901,
2192
+ "rewards/rejected": -1.5054289102554321,
2193
+ "step": 1550
2194
+ },
2195
+ {
2196
+ "epoch": 0.83,
2197
+ "learning_rate": 4.1793739037129134e-07,
2198
+ "logits/chosen": 0.25826650857925415,
2199
+ "logits/rejected": 0.24141784012317657,
2200
+ "logps/chosen": -424.6695861816406,
2201
+ "logps/rejected": -403.15301513671875,
2202
+ "loss": 0.207,
2203
+ "rewards/accuracies": 0.5062500238418579,
2204
+ "rewards/chosen": -1.5093286037445068,
2205
+ "rewards/margins": 0.11515948921442032,
2206
+ "rewards/rejected": -1.624488115310669,
2207
+ "step": 1560
2208
+ },
2209
+ {
2210
+ "epoch": 0.84,
2211
+ "learning_rate": 3.9252950579322405e-07,
2212
+ "logits/chosen": 0.17827372252941132,
2213
+ "logits/rejected": 0.2383604496717453,
2214
+ "logps/chosen": -381.010498046875,
2215
+ "logps/rejected": -394.955322265625,
2216
+ "loss": 0.2787,
2217
+ "rewards/accuracies": 0.48750001192092896,
2218
+ "rewards/chosen": -1.4884707927703857,
2219
+ "rewards/margins": 0.07128787040710449,
2220
+ "rewards/rejected": -1.5597585439682007,
2221
+ "step": 1570
2222
+ },
2223
+ {
2224
+ "epoch": 0.84,
2225
+ "learning_rate": 3.6785245485978864e-07,
2226
+ "logits/chosen": 0.20974624156951904,
2227
+ "logits/rejected": 0.21026365458965302,
2228
+ "logps/chosen": -419.64788818359375,
2229
+ "logps/rejected": -432.1351013183594,
2230
+ "loss": 0.1838,
2231
+ "rewards/accuracies": 0.48750001192092896,
2232
+ "rewards/chosen": -1.6164900064468384,
2233
+ "rewards/margins": 0.08999641239643097,
2234
+ "rewards/rejected": -1.7064863443374634,
2235
+ "step": 1580
2236
+ },
2237
+ {
2238
+ "epoch": 0.85,
2239
+ "learning_rate": 3.43914795137566e-07,
2240
+ "logits/chosen": 0.16758783161640167,
2241
+ "logits/rejected": 0.19063523411750793,
2242
+ "logps/chosen": -434.1309509277344,
2243
+ "logps/rejected": -426.19061279296875,
2244
+ "loss": 0.1937,
2245
+ "rewards/accuracies": 0.512499988079071,
2246
+ "rewards/chosen": -1.5491647720336914,
2247
+ "rewards/margins": 0.10655365139245987,
2248
+ "rewards/rejected": -1.6557185649871826,
2249
+ "step": 1590
2250
+ },
2251
+ {
2252
+ "epoch": 0.85,
2253
+ "learning_rate": 3.207248277852901e-07,
2254
+ "logits/chosen": 0.29995113611221313,
2255
+ "logits/rejected": 0.23389451205730438,
2256
+ "logps/chosen": -431.3026428222656,
2257
+ "logps/rejected": -456.0216369628906,
2258
+ "loss": 0.2003,
2259
+ "rewards/accuracies": 0.5687500238418579,
2260
+ "rewards/chosen": -1.639478325843811,
2261
+ "rewards/margins": 0.15117011964321136,
2262
+ "rewards/rejected": -1.7906484603881836,
2263
+ "step": 1600
2264
+ },
2265
+ {
2266
+ "epoch": 0.86,
2267
+ "learning_rate": 2.9829059467515074e-07,
2268
+ "logits/chosen": 0.24911722540855408,
2269
+ "logits/rejected": 0.1267918050289154,
2270
+ "logps/chosen": -411.49237060546875,
2271
+ "logps/rejected": -393.44775390625,
2272
+ "loss": 0.1768,
2273
+ "rewards/accuracies": 0.512499988079071,
2274
+ "rewards/chosen": -1.5094853639602661,
2275
+ "rewards/margins": 0.12646006047725677,
2276
+ "rewards/rejected": -1.6359455585479736,
2277
+ "step": 1610
2278
+ },
2279
+ {
2280
+ "epoch": 0.86,
2281
+ "learning_rate": 2.766198756040153e-07,
2282
+ "logits/chosen": 0.18798711895942688,
2283
+ "logits/rejected": 0.22833366692066193,
2284
+ "logps/chosen": -468.82373046875,
2285
+ "logps/rejected": -442.8505859375,
2286
+ "loss": 0.1957,
2287
+ "rewards/accuracies": 0.5687500238418579,
2288
+ "rewards/chosen": -1.6605987548828125,
2289
+ "rewards/margins": 0.10856117308139801,
2290
+ "rewards/rejected": -1.7691596746444702,
2291
+ "step": 1620
2292
+ },
2293
+ {
2294
+ "epoch": 0.87,
2295
+ "learning_rate": 2.5572018559553155e-07,
2296
+ "logits/chosen": 0.1967056542634964,
2297
+ "logits/rejected": 0.3702329099178314,
2298
+ "logps/chosen": -437.220947265625,
2299
+ "logps/rejected": -398.580322265625,
2300
+ "loss": 0.2339,
2301
+ "rewards/accuracies": 0.45625001192092896,
2302
+ "rewards/chosen": -1.4941322803497314,
2303
+ "rewards/margins": 0.08855145424604416,
2304
+ "rewards/rejected": -1.582683801651001,
2305
+ "step": 1630
2306
+ },
2307
+ {
2308
+ "epoch": 0.87,
2309
+ "learning_rate": 2.3559877229404864e-07,
2310
+ "logits/chosen": 0.12015213072299957,
2311
+ "logits/rejected": 0.25555509328842163,
2312
+ "logps/chosen": -421.092529296875,
2313
+ "logps/rejected": -394.6724548339844,
2314
+ "loss": 0.1893,
2315
+ "rewards/accuracies": 0.5562499761581421,
2316
+ "rewards/chosen": -1.5294889211654663,
2317
+ "rewards/margins": 0.102900430560112,
2318
+ "rewards/rejected": -1.6323894262313843,
2319
+ "step": 1640
2320
+ },
2321
+ {
2322
+ "epoch": 0.88,
2323
+ "learning_rate": 2.1626261345126576e-07,
2324
+ "logits/chosen": 0.19819477200508118,
2325
+ "logits/rejected": 0.137999027967453,
2326
+ "logps/chosen": -409.06414794921875,
2327
+ "logps/rejected": -397.3619384765625,
2328
+ "loss": 0.2459,
2329
+ "rewards/accuracies": 0.4375,
2330
+ "rewards/chosen": -1.6599018573760986,
2331
+ "rewards/margins": 0.07884421199560165,
2332
+ "rewards/rejected": -1.738745927810669,
2333
+ "step": 1650
2334
+ },
2335
+ {
2336
+ "epoch": 0.89,
2337
+ "learning_rate": 1.9771841450646505e-07,
2338
+ "logits/chosen": 0.22391238808631897,
2339
+ "logits/rejected": 0.34880322217941284,
2340
+ "logps/chosen": -402.15704345703125,
2341
+ "logps/rejected": -385.95684814453125,
2342
+ "loss": 0.2246,
2343
+ "rewards/accuracies": 0.4749999940395355,
2344
+ "rewards/chosen": -1.4918997287750244,
2345
+ "rewards/margins": 0.1032886654138565,
2346
+ "rewards/rejected": -1.5951886177062988,
2347
+ "step": 1660
2348
+ },
2349
+ {
2350
+ "epoch": 0.89,
2351
+ "learning_rate": 1.7997260626118758e-07,
2352
+ "logits/chosen": 0.16321547329425812,
2353
+ "logits/rejected": 0.20088446140289307,
2354
+ "logps/chosen": -432.9225158691406,
2355
+ "logps/rejected": -423.73651123046875,
2356
+ "loss": 0.2085,
2357
+ "rewards/accuracies": 0.512499988079071,
2358
+ "rewards/chosen": -1.5294908285140991,
2359
+ "rewards/margins": 0.13079039752483368,
2360
+ "rewards/rejected": -1.6602811813354492,
2361
+ "step": 1670
2362
+ },
2363
+ {
2364
+ "epoch": 0.9,
2365
+ "learning_rate": 1.6303134264914365e-07,
2366
+ "logits/chosen": 0.2602766156196594,
2367
+ "logits/rejected": 0.26268571615219116,
2368
+ "logps/chosen": -391.55474853515625,
2369
+ "logps/rejected": -416.79736328125,
2370
+ "loss": 0.203,
2371
+ "rewards/accuracies": 0.512499988079071,
2372
+ "rewards/chosen": -1.5875943899154663,
2373
+ "rewards/margins": 0.08693927526473999,
2374
+ "rewards/rejected": -1.6745338439941406,
2375
+ "step": 1680
2376
+ },
2377
+ {
2378
+ "epoch": 0.9,
2379
+ "learning_rate": 1.469004986021355e-07,
2380
+ "logits/chosen": 0.18200412392616272,
2381
+ "logits/rejected": 0.23976688086986542,
2382
+ "logps/chosen": -433.8134765625,
2383
+ "logps/rejected": -419.9029235839844,
2384
+ "loss": 0.1764,
2385
+ "rewards/accuracies": 0.5,
2386
+ "rewards/chosen": -1.5231817960739136,
2387
+ "rewards/margins": 0.10605546087026596,
2388
+ "rewards/rejected": -1.629237413406372,
2389
+ "step": 1690
2390
+ },
2391
+ {
2392
+ "epoch": 0.91,
2393
+ "learning_rate": 1.315856680127367e-07,
2394
+ "logits/chosen": 0.22589726746082306,
2395
+ "logits/rejected": 0.2215690314769745,
2396
+ "logps/chosen": -458.8824768066406,
2397
+ "logps/rejected": -451.643310546875,
2398
+ "loss": 0.2159,
2399
+ "rewards/accuracies": 0.5249999761581421,
2400
+ "rewards/chosen": -1.6877247095108032,
2401
+ "rewards/margins": 0.08869151771068573,
2402
+ "rewards/rejected": -1.7764160633087158,
2403
+ "step": 1700
2404
+ },
2405
+ {
2406
+ "epoch": 0.91,
2407
+ "learning_rate": 1.1709216179442817e-07,
2408
+ "logits/chosen": 0.21676687896251678,
2409
+ "logits/rejected": 0.26381057500839233,
2410
+ "logps/chosen": -391.4685974121094,
2411
+ "logps/rejected": -410.5079040527344,
2412
+ "loss": 0.2025,
2413
+ "rewards/accuracies": 0.5625,
2414
+ "rewards/chosen": -1.452115535736084,
2415
+ "rewards/margins": 0.15328224003314972,
2416
+ "rewards/rejected": -1.6053978204727173,
2417
+ "step": 1710
2418
+ },
2419
+ {
2420
+ "epoch": 0.92,
2421
+ "learning_rate": 1.0342500603986421e-07,
2422
+ "logits/chosen": 0.24852243065834045,
2423
+ "logits/rejected": 0.22877268493175507,
2424
+ "logps/chosen": -399.13787841796875,
2425
+ "logps/rejected": -416.26031494140625,
2426
+ "loss": 0.1899,
2427
+ "rewards/accuracies": 0.46875,
2428
+ "rewards/chosen": -1.4324636459350586,
2429
+ "rewards/margins": 0.1060786023736,
2430
+ "rewards/rejected": -1.5385421514511108,
2431
+ "step": 1720
2432
+ },
2433
+ {
2434
+ "epoch": 0.92,
2435
+ "learning_rate": 9.058894027791643e-08,
2436
+ "logits/chosen": 0.08590670675039291,
2437
+ "logits/rejected": 0.2821047306060791,
2438
+ "logps/chosen": -400.29595947265625,
2439
+ "logps/rejected": -401.18670654296875,
2440
+ "loss": 0.2732,
2441
+ "rewards/accuracies": 0.46875,
2442
+ "rewards/chosen": -1.5163061618804932,
2443
+ "rewards/margins": 0.13003432750701904,
2444
+ "rewards/rejected": -1.6463406085968018,
2445
+ "step": 1730
2446
+ },
2447
+ {
2448
+ "epoch": 0.93,
2449
+ "learning_rate": 7.858841583008592e-08,
2450
+ "logits/chosen": 0.22975251078605652,
2451
+ "logits/rejected": 0.22381798923015594,
2452
+ "logps/chosen": -436.4466857910156,
2453
+ "logps/rejected": -396.7818908691406,
2454
+ "loss": 0.1904,
2455
+ "rewards/accuracies": 0.48750001192092896,
2456
+ "rewards/chosen": -1.4835584163665771,
2457
+ "rewards/margins": 0.10315178334712982,
2458
+ "rewards/rejected": -1.5867103338241577,
2459
+ "step": 1740
2460
+ },
2461
+ {
2462
+ "epoch": 0.93,
2463
+ "learning_rate": 6.742759426686313e-08,
2464
+ "logits/chosen": 0.2513951361179352,
2465
+ "logits/rejected": 0.2925378680229187,
2466
+ "logps/chosen": -405.58123779296875,
2467
+ "logps/rejected": -404.1107177734375,
2468
+ "loss": 0.2141,
2469
+ "rewards/accuracies": 0.5,
2470
+ "rewards/chosen": -1.4388840198516846,
2471
+ "rewards/margins": 0.10872511565685272,
2472
+ "rewards/rejected": -1.5476090908050537,
2473
+ "step": 1750
2474
+ },
2475
+ {
2476
+ "epoch": 0.94,
2477
+ "learning_rate": 5.7110345964571104e-08,
2478
+ "logits/chosen": 0.1784476488828659,
2479
+ "logits/rejected": 0.25003939867019653,
2480
+ "logps/chosen": -415.51593017578125,
2481
+ "logps/rejected": -410.5809020996094,
2482
+ "loss": 0.1926,
2483
+ "rewards/accuracies": 0.48750001192092896,
2484
+ "rewards/chosen": -1.519412875175476,
2485
+ "rewards/margins": 0.13301292061805725,
2486
+ "rewards/rejected": -1.65242600440979,
2487
+ "step": 1760
2488
+ },
+ {
+ "epoch": 0.94,
+ "learning_rate": 4.764024876318357e-08,
+ "logits/chosen": 0.07363792508840561,
+ "logits/rejected": 0.30069494247436523,
+ "logps/chosen": -460.7679138183594,
+ "logps/rejected": -434.69757080078125,
+ "loss": 0.2058,
+ "rewards/accuracies": 0.543749988079071,
+ "rewards/chosen": -1.5749928951263428,
+ "rewards/margins": 0.14540357887744904,
+ "rewards/rejected": -1.7203963994979858,
+ "step": 1770
+ },
+ {
+ "epoch": 0.95,
+ "learning_rate": 3.902058672559633e-08,
+ "logits/chosen": 0.14530272781848907,
+ "logits/rejected": 0.1931276023387909,
+ "logps/chosen": -424.07928466796875,
+ "logps/rejected": -428.74932861328125,
+ "loss": 0.1889,
+ "rewards/accuracies": 0.5,
+ "rewards/chosen": -1.6333696842193604,
+ "rewards/margins": 0.10268044471740723,
+ "rewards/rejected": -1.7360502481460571,
+ "step": 1780
+ },
+ {
+ "epoch": 0.95,
+ "learning_rate": 3.125434899876933e-08,
+ "logits/chosen": 0.17779022455215454,
+ "logits/rejected": 0.2716149389743805,
+ "logps/chosen": -426.74224853515625,
+ "logps/rejected": -393.4443664550781,
+ "loss": 0.2089,
+ "rewards/accuracies": 0.512499988079071,
+ "rewards/chosen": -1.4672796726226807,
+ "rewards/margins": 0.13659346103668213,
+ "rewards/rejected": -1.6038730144500732,
+ "step": 1790
+ },
+ {
+ "epoch": 0.96,
+ "learning_rate": 2.4344228777145873e-08,
+ "logits/chosen": 0.11182694137096405,
+ "logits/rejected": 0.20840835571289062,
+ "logps/chosen": -430.88555908203125,
+ "logps/rejected": -420.53338623046875,
+ "loss": 0.2144,
+ "rewards/accuracies": 0.5,
+ "rewards/chosen": -1.655738115310669,
+ "rewards/margins": 0.0909801721572876,
+ "rewards/rejected": -1.7467180490493774,
+ "step": 1800
+ },
+ {
+ "epoch": 0.97,
+ "learning_rate": 1.829262236869772e-08,
+ "logits/chosen": 0.21950674057006836,
+ "logits/rejected": 0.22139541804790497,
+ "logps/chosen": -420.4927673339844,
+ "logps/rejected": -435.2838439941406,
+ "loss": 0.1756,
+ "rewards/accuracies": 0.581250011920929,
+ "rewards/chosen": -1.5889238119125366,
+ "rewards/margins": 0.13235324621200562,
+ "rewards/rejected": -1.7212769985198975,
+ "step": 1810
+ },
+ {
+ "epoch": 0.97,
+ "learning_rate": 1.3101628363929586e-08,
+ "logits/chosen": 0.26217299699783325,
+ "logits/rejected": 0.21528875827789307,
+ "logps/chosen": -405.06927490234375,
+ "logps/rejected": -418.52716064453125,
+ "loss": 0.1739,
+ "rewards/accuracies": 0.48124998807907104,
+ "rewards/chosen": -1.4184885025024414,
+ "rewards/margins": 0.11147931963205338,
+ "rewards/rejected": -1.5299677848815918,
+ "step": 1820
+ },
+ {
+ "epoch": 0.98,
+ "learning_rate": 8.773046908123195e-09,
+ "logits/chosen": 0.2996518015861511,
+ "logits/rejected": 0.27609384059906006,
+ "logps/chosen": -472.917724609375,
+ "logps/rejected": -475.19744873046875,
+ "loss": 0.1954,
+ "rewards/accuracies": 0.5375000238418579,
+ "rewards/chosen": -1.7787902355194092,
+ "rewards/margins": 0.10695469379425049,
+ "rewards/rejected": -1.8857448101043701,
+ "step": 1830
+ },
+ {
+ "epoch": 0.98,
+ "learning_rate": 5.308379077080817e-09,
+ "logits/chosen": 0.19328948855400085,
+ "logits/rejected": 0.27855589985847473,
+ "logps/chosen": -399.41546630859375,
+ "logps/rejected": -393.7513122558594,
+ "loss": 0.1921,
+ "rewards/accuracies": 0.4625000059604645,
+ "rewards/chosen": -1.5310485363006592,
+ "rewards/margins": 0.09220267832279205,
+ "rewards/rejected": -1.6232513189315796,
+ "step": 1840
+ },
+ {
+ "epoch": 0.99,
+ "learning_rate": 2.7088263565760996e-09,
+ "logits/chosen": 0.17895421385765076,
+ "logits/rejected": 0.3152690827846527,
+ "logps/chosen": -428.0615234375,
+ "logps/rejected": -412.861328125,
+ "loss": 0.1983,
+ "rewards/accuracies": 0.5562499761581421,
+ "rewards/chosen": -1.5715734958648682,
+ "rewards/margins": 0.13439974188804626,
+ "rewards/rejected": -1.7059733867645264,
+ "step": 1850
+ },
+ {
+ "epoch": 0.99,
+ "learning_rate": 9.752902257023633e-10,
+ "logits/chosen": 0.16039499640464783,
+ "logits/rejected": 0.25197163224220276,
+ "logps/chosen": -391.1293640136719,
+ "logps/rejected": -386.4063415527344,
+ "loss": 0.2198,
+ "rewards/accuracies": 0.45625001192092896,
+ "rewards/chosen": -1.3501429557800293,
+ "rewards/margins": 0.10042072832584381,
+ "rewards/rejected": -1.450563669204712,
+ "step": 1860
+ },
+ {
+ "epoch": 1.0,
+ "learning_rate": 1.083718442532189e-10,
+ "logits/chosen": 0.18452344834804535,
+ "logits/rejected": 0.27631479501724243,
+ "logps/chosen": -442.81622314453125,
+ "logps/rejected": -411.86212158203125,
+ "loss": 0.157,
+ "rewards/accuracies": 0.574999988079071,
+ "rewards/chosen": -1.6199710369110107,
+ "rewards/margins": 0.15131933987140656,
+ "rewards/rejected": -1.7712904214859009,
+ "step": 1870
+ },
+ {
+ "epoch": 1.0,
+ "step": 1875,
+ "total_flos": 0.0,
+ "train_loss": 0.22404387894471486,
+ "train_runtime": 11451.6187,
+ "train_samples_per_second": 2.62,
+ "train_steps_per_second": 0.164
+ }
+ ],
+ "logging_steps": 10,
+ "max_steps": 1875,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 1,
+ "save_steps": 100,
+ "total_flos": 0.0,
+ "train_batch_size": 4,
+ "trial_name": null,
+ "trial_params": null
+ }