wzhouad committed
Commit 886c1e8
Parent: d4e6c1f

Model save

README.md CHANGED
@@ -13,23 +13,20 @@ model-index:
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/sanqiang/wdpo/runs/2uwctjg2)
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/sanqiang/wdpo/runs/dnn9mazg)
 # zephyr-7b-dpo-full
 
 This model is a fine-tuned version of [HuggingFaceH4/mistral-7b-sft-beta](https://huggingface.co/HuggingFaceH4/mistral-7b-sft-beta) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.0070
-- Rewards/chosen: -1.4073
-- Rewards/rejected: -1.6626
-- Rewards/accuracies: 0.6101
-- Rewards/margins: 0.2554
-- Logps/rejected: -316.9625
-- Logps/chosen: -284.9710
-- Logits/rejected: -2.3666
-- Logits/chosen: -2.3785
-- Debug/policy Weights: 0.0083
-- Debug/losses: 0.0052
-- Debug/raw Losses: 0.6403
+- Loss: 0.5417
+- Rewards/chosen: -2.1562
+- Rewards/rejected: -2.8807
+- Rewards/accuracies: 0.7313
+- Rewards/margins: 0.7245
+- Logps/rejected: -438.7701
+- Logps/chosen: -359.8675
+- Logits/rejected: 0.5902
+- Logits/chosen: 0.3561
 
 ## Model description
 
@@ -64,20 +61,20 @@ The following hyperparameters were used during training:
 
 ### Training results
 
-| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Debug/policy Weights | Debug/losses | Debug/raw Losses |
-|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------------------:|:------------:|:----------------:|
-| 0.0684        | 0.0796 | 100  | 0.0623          | -0.1396        | -0.1630          | 0.6035             | 0.0234          | -167.0039      | -158.2063    | -2.7060         | -2.7139       | 0.0900               | 0.0615       | 0.6827           |
-| 0.0188        | 0.1592 | 200  | 0.0181          | -0.6590        | -0.7926          | 0.6353             | 0.1336          | -229.9578      | -210.1448    | -2.6167         | -2.6260       | 0.0261               | 0.0170       | 0.6478           |
-| 0.0106        | 0.2388 | 300  | 0.0124          | -0.8848        | -1.0088          | 0.6231             | 0.1239          | -251.5774      | -232.7285    | -2.5326         | -2.5410       | 0.0182               | 0.0117       | 0.6504           |
-| 0.0113        | 0.3183 | 400  | 0.0107          | -1.1250        | -1.3221          | 0.6259             | 0.1971          | -282.9042      | -256.7430    | -2.5431         | -2.5541       | 0.0146               | 0.0094       | 0.6486           |
-| 0.0049        | 0.3979 | 500  | 0.0052          | -1.5559        | -1.7544          | 0.5830             | 0.1985          | -326.1408      | -299.8377    | -2.5389         | -2.5502       | 0.0070               | 0.0046       | 0.6677           |
-| 0.0057        | 0.4775 | 600  | 0.0074          | -1.3034        | -1.5082          | 0.6138             | 0.2048          | -301.5209      | -274.5812    | -2.5458         | -2.5559       | 0.0100               | 0.0064       | 0.6465           |
-| 0.0088        | 0.5571 | 700  | 0.0103          | -1.1945        | -1.4133          | 0.6213             | 0.2188          | -292.0290      | -263.6917    | -2.5181         | -2.5285       | 0.0130               | 0.0083       | 0.6415           |
-| 0.0045        | 0.6367 | 800  | 0.0048          | -1.5892        | -1.8227          | 0.6054             | 0.2336          | -332.9696      | -303.1591    | -2.3814         | -2.3916       | 0.0058               | 0.0037       | 0.6507           |
-| 0.0058        | 0.7163 | 900  | 0.0066          | -1.4189        | -1.6455          | 0.6054             | 0.2266          | -315.2442      | -286.1336    | -2.3435         | -2.3544       | 0.0083               | 0.0052       | 0.6436           |
-| 0.006         | 0.7959 | 1000 | 0.0062          | -1.4586        | -1.6997          | 0.6091             | 0.2411          | -320.6679      | -290.1025    | -2.3587         | -2.3701       | 0.0075               | 0.0047       | 0.6449           |
-| 0.0058        | 0.8754 | 1100 | 0.0070          | -1.3982        | -1.6486          | 0.6063             | 0.2504          | -315.5557      | -284.0606    | -2.3679         | -2.3796       | 0.0084               | 0.0052       | 0.6403           |
-| 0.0064        | 0.9550 | 1200 | 0.0070          | -1.4073        | -1.6626          | 0.6101             | 0.2554          | -316.9625      | -284.9710    | -2.3666         | -2.3785       | 0.0083               | 0.0052       | 0.6403           |
+| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.679         | 0.0796 | 100  | 0.6759          | -0.1436        | -0.1818          | 0.5998             | 0.0382          | -168.8750      | -158.6036    | -2.6862         | -2.6943       |
+| 0.5947        | 0.1592 | 200  | 0.6027          | -1.5133        | -2.0123          | 0.6679             | 0.4990          | -351.9330      | -295.5727    | -1.6083         | -1.6620       |
+| 0.578         | 0.2388 | 300  | 0.5751          | -1.2683        | -1.7143          | 0.6894             | 0.4460          | -322.1284      | -271.0768    | -1.3925         | -1.5128       |
+| 0.5575        | 0.3183 | 400  | 0.5613          | -1.7874        | -2.4481          | 0.7052             | 0.6607          | -395.5074      | -322.9848    | -0.2511         | -0.4263       |
+| 0.5311        | 0.3979 | 500  | 0.5601          | -2.0743        | -2.7782          | 0.7248             | 0.7039          | -428.5196      | -351.6741    | 0.1321          | -0.1444       |
+| 0.5658        | 0.4775 | 600  | 0.5562          | -1.9576        | -2.6629          | 0.7192             | 0.7053          | -416.9899      | -340.0069    | 0.9125          | 0.6661        |
+| 0.556         | 0.5571 | 700  | 0.5502          | -2.1146        | -2.7825          | 0.7201             | 0.6678          | -428.9443      | -355.7084    | 0.9969          | 0.7302        |
+| 0.5285        | 0.6367 | 800  | 0.5477          | -2.1980        | -2.9456          | 0.7229             | 0.7476          | -445.2567      | -364.0405    | 0.8564          | 0.6029        |
+| 0.5299        | 0.7163 | 900  | 0.5450          | -2.1121        | -2.8512          | 0.7341             | 0.7391          | -435.8159      | -355.4508    | 0.9832          | 0.7089        |
+| 0.5629        | 0.7959 | 1000 | 0.5440          | -2.1483        | -2.8941          | 0.7323             | 0.7457          | -440.1051      | -359.0749    | 0.7033          | 0.4600        |
+| 0.5351        | 0.8754 | 1100 | 0.5423          | -2.1496        | -2.8571          | 0.7304             | 0.7074          | -436.4062      | -359.2066    | 0.5029          | 0.2753        |
+| 0.5499        | 0.9550 | 1200 | 0.5417          | -2.1562        | -2.8807          | 0.7313             | 0.7245          | -438.7701      | -359.8675    | 0.5902          | 0.3561        |
 
 
 ### Framework versions
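As a quick consistency check on the new evaluation numbers, the reward margin should be the chosen reward minus the rejected reward. A minimal sketch using the final eval row from the updated card (the 1e-4 tolerance is an assumption to absorb rounding in the reported values):

```python
# Values copied from the updated card's final evaluation row.
chosen, rejected, margin = -2.1562, -2.8807, 0.7245

# Rewards/margins is defined as Rewards/chosen - Rewards/rejected.
computed = chosen - rejected
print(round(computed, 4))  # 0.7245

assert abs(computed - margin) < 1e-4
```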
all_results.json CHANGED
@@ -1,9 +1,9 @@
 {
     "epoch": 0.9996020692399522,
     "total_flos": 0.0,
-    "train_loss": 0.016716306088907514,
-    "train_runtime": 10019.6903,
+    "train_loss": 0.56636344817034,
+    "train_runtime": 10031.2749,
     "train_samples": 160800,
-    "train_samples_per_second": 16.048,
+    "train_samples_per_second": 16.03,
     "train_steps_per_second": 0.125
 }
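The updated throughput entry is internally consistent: `train_samples` divided by `train_runtime` rounds to the reported 16.03 samples per second. A small sketch, with values copied from the JSON above:

```python
# Fields copied from the updated all_results.json.
train_samples = 160800
train_runtime = 10031.2749  # seconds

samples_per_second = train_samples / train_runtime
print(round(samples_per_second, 2))  # 16.03
```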
model-00001-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2665dd7ae6b8845472d555a3ac411c024c47379eb8ab0ec3b01ed462262c5cae
+oid sha256:98f00987598688fdebf9936701ec965959200df36d355e58d759228a95bd1106
 size 4943162336
model-00002-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d79d69148dbbfc6544ef08399154439f0d93dbb9c57ce3f0dd468ecee3d39edc
+oid sha256:69882d48888219846a5788fd8b94d1e6391d766b051bcaf33889cd3e7e8ce63f
 size 4999819336
model-00003-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:34f814d8fb6482f27437cc5bb3ed6eb467b95d7d34b0af08c5a5d52feead3565
+oid sha256:88597e6e627267d3070da8b8d6010bbdf8fdee4bfd6d7c44ef9daa98a75f8dc9
 size 4540516344
train_results.json CHANGED
@@ -1,9 +1,9 @@
 {
     "epoch": 0.9996020692399522,
     "total_flos": 0.0,
-    "train_loss": 0.016716306088907514,
-    "train_runtime": 10019.6903,
+    "train_loss": 0.56636344817034,
+    "train_runtime": 10031.2749,
     "train_samples": 160800,
-    "train_samples_per_second": 16.048,
+    "train_samples_per_second": 16.03,
     "train_steps_per_second": 0.125
 }
trainer_state.json CHANGED
(diff too large to render)