Model save

Files changed (9)

README.md +17 -22
all_results.json +5 -5
model-00001-of-00003.safetensors +1 -1
model-00002-of-00003.safetensors +1 -1
model-00003-of-00003.safetensors +1 -1
runs/May20_00-19-57_n136-100-194/events.out.tfevents.1716136109.n136-100-194.871503.0 +2 -2
train_results.json +5 -5
trainer_state.json +0 -0
training_args.bin +1 -1

README.md CHANGED

@@ -15,15 +15,15 @@ should probably proofread and complete it, then remove this comment. -->
 This model was trained from scratch on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.5366
-- Rewards/chosen: -2.9738
-- Rewards/rejected: -4.4991
-- Rewards/accuracies: 0.7617
-- Rewards/margins: 1.5252
-- Logps/rejected: -767.4317
-- Logps/chosen: -609.1594
-- Logits/rejected: 1.6095
-- Logits/chosen: 0.9559
 ## Model description
@@ -60,19 +60,14 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.5905        | 0.07  | 100  | 0.6429          | -0.1380        | -0.3441          | 0.6719             | 0.2061          | -351.9318      | -325.5744    | -1.7244         | -1.7878       |
-| 0.4495        | 0.15  | 200  | 0.5600          | -0.4940        | -1.0973          | 0.7461             | 0.6032          | -427.2510      | -361.1815    | -1.3665         | -1.4371       |
-| 0.3963        | 0.22  | 300  | 0.5291          | -1.1123        | -2.0359          | 0.7422             | 0.9236          | -521.1155      | -423.0034    | -1.2770         | -1.4609       |
-| 0.4012        | 0.3   | 400  | 0.5315          | -1.0588        | -1.9923          | 0.7734             | 0.9334          | -516.7505      | -417.6586    | -1.1223         | -1.3373       |
-| 0.3559        | 0.37  | 500  | 0.5276          | -1.4423        | -2.5146          | 0.7578             | 1.0723          | -568.9822      | -456.0086    | -0.6834         | -1.0067       |
-| 0.3291        | 0.45  | 600  | 0.5103          | -1.6617        | -2.7811          | 0.7695             | 1.1194          | -595.6332      | -477.9445    | 0.1886          | -0.2334       |
-| 0.2735        | 0.52  | 700  | 0.5289          | -2.2950        | -3.7006          | 0.7617             | 1.4056          | -687.5872      | -541.2795    | 0.6722          | 0.1870        |
-| 0.2752        | 0.59  | 800  | 0.5229          | -2.2134        | -3.5070          | 0.7656             | 1.2935          | -668.2236      | -533.1202    | 0.2752          | -0.1628       |
-| 0.2492        | 0.67  | 900  | 0.5152          | -2.0646        | -3.3529          | 0.7734             | 1.2882          | -652.8116      | -518.2382    | 1.0726          | 0.5184        |
-| 0.262         | 0.74  | 1000 | 0.5241          | -2.4505        | -3.8564          | 0.7617             | 1.4059          | -703.1603      | -556.8265    | 1.3124          | 0.6805        |
-| 0.2299        | 0.82  | 1100 | 0.5313          | -2.7647        | -4.2433          | 0.7578             | 1.4786          | -741.8574      | -588.2495    | 1.4834          | 0.8391        |
-| 0.1974        | 0.89  | 1200 | 0.5367          | -2.9484        | -4.4713          | 0.7617             | 1.5229          | -764.6512      | -606.6174    | 1.5458          | 0.8964        |
-| 0.1842        | 0.97  | 1300 | 0.5366          | -2.9738        | -4.4991          | 0.7617             | 1.5252          | -767.4317      | -609.1594    | 1.6095          | 0.9559        |
 ### Framework versions

 This model was trained from scratch on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.4283
+- Rewards/chosen: -1.4066
+- Rewards/rejected: -2.3431
+- Rewards/accuracies: 0.8594
+- Rewards/margins: 0.9365
+- Logps/rejected: -567.4185
+- Logps/chosen: -476.1736
+- Logits/rejected: 0.8733
+- Logits/chosen: 0.3446
 ## Model description
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.5295        | 0.12  | 100  | 0.6076          | -0.4310        | -0.7253          | 0.6992             | 0.2943          | -405.6328      | -378.6121    | -1.6941         | -1.7860       |
+| 0.436         | 0.23  | 200  | 0.5481          | -1.0281        | -1.5447          | 0.7578             | 0.5166          | -487.5739      | -438.3181    | -0.9160         | -1.0564       |
+| 0.4266        | 0.35  | 300  | 0.4992          | -1.6631        | -2.4082          | 0.8086             | 0.7450          | -573.9229      | -501.8244    | -0.0079         | -0.2921       |
+| 0.3779        | 0.46  | 400  | 0.4760          | -1.5291        | -2.2881          | 0.8164             | 0.7591          | -561.9199      | -488.4168    | 0.0913          | -0.2997       |
+| 0.3713        | 0.58  | 500  | 0.4527          | -1.3552        | -2.2406          | 0.8320             | 0.8854          | -557.1675      | -471.0288    | 0.4886          | 0.0541        |
+| 0.3817        | 0.69  | 600  | 0.4398          | -1.5276        | -2.4352          | 0.8516             | 0.9076          | -576.6248      | -488.2740    | 0.9378          | 0.4596        |
+| 0.3613        | 0.81  | 700  | 0.4308          | -1.4716        | -2.3968          | 0.8711             | 0.9252          | -572.7809      | -482.6695    | 0.9228          | 0.4112        |
+| 0.4032        | 0.92  | 800  | 0.4283          | -1.4066        | -2.3431          | 0.8594             | 0.9365          | -567.4185      | -476.1736    | 0.8733          | 0.3446        |
 ### Framework versions
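
The reported `Rewards/margins` values can be cross-checked against the other two reward columns. The sketch below assumes the margin is the chosen reward minus the rejected reward, which is the usual convention in preference-optimization trainers; the diff itself does not state this. The dictionary and function names are illustrative only, and the numbers are the final evaluation rows from the removed and added sides of the README.md diff.

```python
# Final evaluation metrics copied from the README.md diff.
# "before" is the removed side, "after" is the added side.
FINAL_EVAL = {
    "before": {"chosen": -2.9738, "rejected": -4.4991, "margin": 1.5252},
    "after": {"chosen": -1.4066, "rejected": -2.3431, "margin": 0.9365},
}


def margin_gap(metrics):
    """Absolute difference between the reported margin and chosen - rejected.

    Assumption: margin = chosen reward - rejected reward.
    """
    derived = metrics["chosen"] - metrics["rejected"]
    return abs(derived - metrics["margin"])


for name, metrics in FINAL_EVAL.items():
    # The logged values are rounded to four decimals, so allow a small tolerance.
    consistent = margin_gap(metrics) < 1e-3
    print(name, "margin consistent with chosen - rejected:", consistent)
```

Under that assumption both sides agree to within rounding, so the smaller margin after this commit reflects the logged rewards themselves and not a reporting error.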

all_results.json CHANGED

@@ -1,8 +1,8 @@
 {
     "epoch": 1.0,
-    "train_loss": 0.335402155391883,
-    "train_runtime": 21644.3608,
-    "train_samples": 172268,
-    "train_samples_per_second": 7.959,
-    "train_steps_per_second": 0.062
 }

 {
     "epoch": 1.0,
+    "train_loss": 0.42703192135156026,
+    "train_runtime": 13837.373,
+    "train_samples": 111134,
+    "train_samples_per_second": 8.031,
+    "train_steps_per_second": 0.063
 }
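
As a quick sanity check on these numbers, the throughput can be re-derived from the sample count and runtime. This is a minimal sketch that assumes `train_samples_per_second` is `train_samples` divided by `train_runtime` for a single epoch; the names `RUNS` and `derived_throughput` are illustrative and not part of the repository.

```python
# Values copied from the all_results.json diff.
# "before" is the removed side, "after" is the added side.
RUNS = {
    "before": {
        "train_runtime": 21644.3608,
        "train_samples": 172268,
        "train_samples_per_second": 7.959,
    },
    "after": {
        "train_runtime": 13837.373,
        "train_samples": 111134,
        "train_samples_per_second": 8.031,
    },
}


def derived_throughput(run):
    """Samples per second, assuming one pass over train_samples (epoch == 1.0)."""
    return run["train_samples"] / run["train_runtime"]


for name, run in RUNS.items():
    gap = abs(derived_throughput(run) - run["train_samples_per_second"])
    # The reported figure is rounded to three decimals.
    print(name, "throughput matches reported value:", gap < 1e-3)
```

Both sides match to three decimals, which suggests the shorter runtime in this commit comes from the smaller training set (111134 versus 172268 samples) and not from a change in per-sample speed.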

model-00001-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e9f836055bbd8c90fff82b466785b3c0bb773e39b03a40c15c13a2e943087d51
 size 4943178720

 version https://git-lfs.github.com/spec/v1
+oid sha256:d79be581ca4db04064a86c601510c713774b5d2212defedb547ca5140a1ea938
 size 4943178720

model-00002-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:bb1b6c8b26fd22edeb102b325e8c1fbfb5a31ab5cd157cb8b276c563db6e9c41
 size 4999819336

 version https://git-lfs.github.com/spec/v1
+oid sha256:1cc6378b4ecc04b565b7ba5c0687c590bc3ce5d97bdac059d5b1bd176ba33a5f
 size 4999819336

model-00003-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:959b85a59ee4907acea070d073fd679eee4438a9f246c343a20944bc8cecd8c7
 size 4540532728

 version https://git-lfs.github.com/spec/v1
+oid sha256:d04242828ce6ba6f2d64d16aab8c8df98c4aaff7c486dd771a5797817ba55cb8
 size 4540532728

runs/May20_00-19-57_n136-100-194/events.out.tfevents.1716136109.n136-100-194.871503.0 CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1ddcff838e03a8db37bc2080e67cdd23837c45705161072a7e8d5148878862e6
-size 66367

 version https://git-lfs.github.com/spec/v1
+oid sha256:88c8fb39162a1e0c31e5142469d7bf7b5d760cdb3f5b5438d73da6a3db9e076d
+size 70849

train_results.json CHANGED

@@ -1,8 +1,8 @@
 {
     "epoch": 1.0,
-    "train_loss": 0.335402155391883,
-    "train_runtime": 21644.3608,
-    "train_samples": 172268,
-    "train_samples_per_second": 7.959,
-    "train_steps_per_second": 0.062
 }

 {
     "epoch": 1.0,
+    "train_loss": 0.42703192135156026,
+    "train_runtime": 13837.373,
+    "train_samples": 111134,
+    "train_samples_per_second": 8.031,
+    "train_steps_per_second": 0.063
 }

trainer_state.json CHANGED

The diff for this file is too large to render.

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3077373225b1a919a01dd0dd5247b886bff39d3d7ce4c7bf8dd5920f3b027fe1
 size 6264

 version https://git-lfs.github.com/spec/v1
+oid sha256:3ad993e59763b6deb1a45e1763f37981a00bcfe17b4c33793e0c48bbe37f3f7a
 size 6264