Model save


Files changed (8)

README.md +28 -32
all_results.json +4 -4
model-00001-of-00003.safetensors +1 -1
model-00002-of-00003.safetensors +1 -1
model-00003-of-00003.safetensors +1 -1
train_results.json +4 -4
trainer_state.json +0 -0
training_args.bin +1 -1

README.md CHANGED

@@ -2,34 +2,30 @@
 license: mit
 base_model: HuggingFaceH4/mistral-7b-sft-beta
 tags:
-- alignment-handbook
 - generated_from_trainer
-datasets:
-- HuggingFaceH4/hh-rlhf-h4
 model-index:
-- name: baseline2
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# baseline2
-This model is a fine-tuned version of [HuggingFaceH4/mistral-7b-sft-beta](https://huggingface.co/HuggingFaceH4/mistral-7b-sft-beta) on the HuggingFaceH4/hh-rlhf-h4 dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.0260
-- Rewards/chosen: -2.1934
-- Rewards/rejected: -2.7139
-- Rewards/accuracies: 0.7034
-- Rewards/margins: 0.5205
-- Logps/rejected: -422.5251
-- Logps/chosen: -363.9185
-- Logits/rejected: -1.3643
-- Logits/chosen: -1.4823
-- Debug/policy Weights: 0.0429
-- Debug/losses: 0.0248
-- Debug/raw Losses: 0.5706
 ## Model description
@@ -64,20 +60,20 @@ The following hyperparameters were used during training:
 ### Training results
-| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Debug/policy Weights | Debug/losses | Debug/raw Losses |
-|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------------------:|:------------:|:----------------:|
-| 0.1732        | 0.08  | 100  | 0.1665          | -0.1359        | -0.1700          | 0.6082             | 0.0341          | -168.1360      | -158.1616    | -2.7242         | -2.7324       | 0.2430               | 0.1649       | 0.6787           |
-| 0.0483        | 0.16  | 200  | 0.0413          | -1.3001        | -1.4580          | 0.6054             | 0.1579          | -296.9352      | -274.5804    | -2.5074         | -2.5178       | 0.0625               | 0.0392       | 0.6508           |
-| 0.036         | 0.24  | 300  | 0.0367          | -1.5860        | -1.8750          | 0.6465             | 0.2890          | -338.6323      | -303.1715    | -2.3859         | -2.3966       | 0.0580               | 0.0355       | 0.6226           |
-| 0.0407        | 0.32  | 400  | 0.0425          | -1.5769        | -1.9682          | 0.6586             | 0.3914          | -347.9584      | -302.2592    | -2.3752         | -2.3958       | 0.0683               | 0.0413       | 0.6047           |
-| 0.033         | 0.4   | 500  | 0.0298          | -1.9640        | -2.3686          | 0.6632             | 0.4046          | -387.9910      | -340.9718    | -2.3550         | -2.3756       | 0.0479               | 0.0283       | 0.6049           |
-| 0.0236        | 0.48  | 600  | 0.0252          | -2.1962        | -2.6613          | 0.6716             | 0.4651          | -417.2640      | -364.1955    | -2.0262         | -2.0674       | 0.0404               | 0.0239       | 0.5903           |
-| 0.0278        | 0.56  | 700  | 0.0294          | -2.0777        | -2.5844          | 0.6828             | 0.5067          | -409.5729      | -352.3430    | -1.7539         | -1.8296       | 0.0478               | 0.0280       | 0.5799           |
-| 0.0196        | 0.64  | 800  | 0.0217          | -2.4830        | -2.9611          | 0.6791             | 0.4781          | -447.2491      | -392.8779    | -1.3359         | -1.4513       | 0.0350               | 0.0207       | 0.5850           |
-| 0.0277        | 0.72  | 900  | 0.0257          | -2.2966        | -2.8007          | 0.6884             | 0.5041          | -431.2014      | -374.2322    | -1.3720         | -1.4823       | 0.0419               | 0.0244       | 0.5759           |
-| 0.0252        | 0.8   | 1000 | 0.0267          | -2.1757        | -2.6982          | 0.6996             | 0.5225          | -420.9574      | -362.1451    | -1.4056         | -1.5164       | 0.0440               | 0.0255       | 0.5724           |
-| 0.0285        | 0.88  | 1100 | 0.0267          | -2.1709        | -2.6957          | 0.6996             | 0.5248          | -420.7030      | -361.6649    | -1.3619         | -1.4794       | 0.0440               | 0.0254       | 0.5707           |
-| 0.0253        | 0.96  | 1200 | 0.0260          | -2.1934        | -2.7139          | 0.7034             | 0.5205          | -422.5251      | -363.9185    | -1.3643         | -1.4823       | 0.0429               | 0.0248       | 0.5706           |
 ### Framework versions

 license: mit
 base_model: HuggingFaceH4/mistral-7b-sft-beta
 tags:
+- trl
+- dpo
 - generated_from_trainer
 model-index:
+- name: zephyr-7b-dpo-full
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+# zephyr-7b-dpo-full
+This model is a fine-tuned version of [HuggingFaceH4/mistral-7b-sft-beta](https://huggingface.co/HuggingFaceH4/mistral-7b-sft-beta) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.5440
+- Rewards/chosen: -2.2940
+- Rewards/rejected: -3.0054
+- Rewards/accuracies: 0.7090
+- Rewards/margins: 0.7114
+- Logps/rejected: -451.6765
+- Logps/chosen: -373.9785
+- Logits/rejected: 0.3244
+- Logits/chosen: 0.0742
 ## Model description
 ### Training results
+| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.6789        | 0.08  | 100  | 0.6770          | -0.1062        | -0.1422          | 0.5914             | 0.0360          | -165.3552      | -155.1927    | -2.7255         | -2.7337       |
+| 0.6062        | 0.16  | 200  | 0.6079          | -1.0212        | -1.3873          | 0.6670             | 0.3660          | -289.8622      | -246.6971    | -2.3696         | -2.3856       |
+| 0.5965        | 0.24  | 300  | 0.5907          | -1.3779        | -1.8008          | 0.6623             | 0.4229          | -331.2100      | -282.3621    | -2.2450         | -2.2656       |
+| 0.5729        | 0.32  | 400  | 0.5711          | -1.6763        | -2.2404          | 0.6828             | 0.5640          | -375.1720      | -312.2064    | -1.2920         | -1.3760       |
+| 0.5645        | 0.4   | 500  | 0.5639          | -2.0721        | -2.6869          | 0.6987             | 0.6147          | -419.8194      | -351.7883    | -0.6091         | -0.7860       |
+| 0.5513        | 0.48  | 600  | 0.5582          | -2.9237        | -3.5389          | 0.7108             | 0.6152          | -505.0223      | -436.9386    | 0.1224          | -0.1054       |
+| 0.5571        | 0.56  | 700  | 0.5559          | -2.7971        | -3.5456          | 0.7043             | 0.7485          | -505.6961      | -424.2823    | 0.2980          | 0.0356        |
+| 0.5609        | 0.64  | 800  | 0.5469          | -2.4314        | -3.0831          | 0.7108             | 0.6517          | -459.4439      | -387.7092    | 0.1922          | -0.0312       |
+| 0.5514        | 0.72  | 900  | 0.5474          | -2.4774        | -3.2082          | 0.6996             | 0.7308          | -471.9533      | -392.3096    | 0.5382          | 0.2860        |
+| 0.527         | 0.8   | 1000 | 0.5454          | -2.5040        | -3.2071          | 0.7080             | 0.7031          | -471.8454      | -394.9711    | 0.6372          | 0.3871        |
+| 0.5487        | 0.88  | 1100 | 0.5444          | -2.2851        | -2.9963          | 0.7090             | 0.7112          | -450.7599      | -373.0831    | 0.4336          | 0.1858        |
+| 0.5483        | 0.96  | 1200 | 0.5440          | -2.2940        | -3.0054          | 0.7090             | 0.7114          | -451.6765      | -373.9785    | 0.3244          | 0.0742        |
 ### Framework versions
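The new card is tagged `trl` and `dpo`, and its metric columns follow the naming used by TRL's `DPOTrainer`. The sketch below shows how those columns relate, based on our understanding of the default sigmoid DPO loss: an implicit reward is `beta * (policy_logp - reference_logp)` for each completion, and the loss is `-log(sigmoid(chosen_reward - rejected_reward))`. The function name `dpo_metrics` and the default `beta = 0.1` are assumptions for illustration; the commit does not show the hyperparameters.

```python
import math
from statistics import mean
from typing import Dict, Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def dpo_metrics(
    policy_chosen: Sequence[float],
    policy_rejected: Sequence[float],
    ref_chosen: Sequence[float],
    ref_rejected: Sequence[float],
    beta: float = 0.1,  # ASSUMPTION: illustrative value, not read from this commit
) -> Dict[str, float]:
    """Batch-mean DPO loss and reward metrics from summed sequence log-probs (hypothetical helper)."""
    # Implicit rewards: scaled log-ratio of policy vs. frozen reference model.
    chosen_r = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected_r = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen_r, rejected_r)]
    return {
        # Sigmoid DPO loss: -log(sigmoid(margin)) == softplus(-margin).
        "loss": mean(softplus(-m) for m in margins),
        "rewards/chosen": mean(chosen_r),
        "rewards/rejected": mean(rejected_r),
        # Fraction of pairs where the chosen completion earns the higher reward.
        "rewards/accuracies": mean(float(c > r) for c, r in zip(chosen_r, rejected_r)),
        "rewards/margins": mean(margins),
        "logps/chosen": mean(policy_chosen),
        "logps/rejected": mean(policy_rejected),
    }
```

Because the margin is a per-pair difference, its batch mean equals the difference of the means, which matches the card: -2.2940 - (-3.0054) = 0.7114. With all margins at zero (policy identical to the reference) the loss is ln 2 ≈ 0.6931, which is why the first validation loss (0.6770 at step 100) sits just below that value.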

all_results.json CHANGED

@@ -1,8 +1,8 @@
 {
     "epoch": 1.0,
-    "train_loss": 0.049385896711877195,
-    "train_runtime": 11611.5481,
     "train_samples": 160800,
-    "train_samples_per_second": 13.848,
-    "train_steps_per_second": 0.108
 }

 {
     "epoch": 1.0,
+    "train_loss": 0.5712926928784438,
+    "train_runtime": 11451.0976,
     "train_samples": 160800,
+    "train_samples_per_second": 14.042,
+    "train_steps_per_second": 0.11
 }
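The throughput fields in `all_results.json` can be cross-checked against the other fields: dividing `train_samples` by `train_runtime` (seconds) reproduces `train_samples_per_second` for both runs. A minimal sketch using the values copied from the diff; the helper name `samples_per_second` is hypothetical.

```python
# Values copied from the old (-) and new (+) sides of all_results.json.
OLD = {
    "epoch": 1.0,
    "train_loss": 0.049385896711877195,
    "train_runtime": 11611.5481,
    "train_samples": 160800,
    "train_samples_per_second": 13.848,
    "train_steps_per_second": 0.108,
}
NEW = {
    "epoch": 1.0,
    "train_loss": 0.5712926928784438,
    "train_runtime": 11451.0976,
    "train_samples": 160800,
    "train_samples_per_second": 14.042,
    "train_steps_per_second": 0.11,
}


def samples_per_second(results: dict) -> float:
    """Recompute throughput as samples / wall-clock seconds, rounded to 3 places."""
    return round(results["train_samples"] / results["train_runtime"], 3)


for name, r in (("old", OLD), ("new", NEW)):
    print(name, samples_per_second(r), r["train_samples_per_second"])

# Wall-clock difference between the two one-epoch runs, in seconds.
saved = OLD["train_runtime"] - NEW["train_runtime"]
print(f"new run finished {saved:.1f} s sooner")
```

Both recomputed values match the logged ones, so the new run processed the same 160,800 samples roughly 160 seconds faster.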

model-00001-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d91c6951684710ee13cfc6493657d73e5e07ec420fb9e3f03057df8a2410eee4
 size 4943162336

 version https://git-lfs.github.com/spec/v1
+oid sha256:18e1cec63bd40f863dc594533ae9ac02d7bcdd4f57a17c1ef5d63193122a0814
 size 4943162336
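Each safetensors entry in this commit is a Git LFS pointer rather than the weights themselves: `oid` is the SHA-256 of the file contents and `size` is its length in bytes. Only the `oid` changed while `size` stayed identical, which is consistent with new weight values stored in an unchanged tensor layout. A minimal sketch for parsing a pointer and checking a downloaded shard against it; `parse_lfs_pointer`, `matches_pointer`, and the local file path are illustrative assumptions.

```python
import hashlib
from typing import Iterable


def parse_lfs_pointer(text: str) -> dict:
    """Parse Git LFS pointer lines ("key value") into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(chunks: Iterable[bytes], pointer: dict) -> bool:
    """Hash a stream of byte chunks and compare with the pointer's oid and size."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported oid algorithm: {pointer['algo']}")
    h = hashlib.sha256()
    total = 0
    for chunk in chunks:
        h.update(chunk)
        total += len(chunk)
    return total == pointer["size"] and h.hexdigest() == pointer["digest"]


# New-side pointer for the first shard, copied from the diff.
NEW_SHARD_1 = """\
version https://git-lfs.github.com/spec/v1
oid sha256:18e1cec63bd40f863dc594533ae9ac02d7bcdd4f57a17c1ef5d63193122a0814
size 4943162336
"""

ptr = parse_lfs_pointer(NEW_SHARD_1)
# Checking a local copy (path is an assumption), streamed in 1 MiB chunks:
# with open("model-00001-of-00003.safetensors", "rb") as f:
#     ok = matches_pointer(iter(lambda: f.read(1 << 20), b""), ptr)
```

Streaming in chunks keeps memory flat even for the ~5 GB shards.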

model-00002-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:45a598088007ec1a9a7caaca9462759689bb90fd469cdf968e25a0b50190ed6f
 size 4999819336

 version https://git-lfs.github.com/spec/v1
+oid sha256:565d4244afeda54e7f62be9e162a16c6892085c081422f02c7a001ecce587eb6
 size 4999819336

model-00003-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b16b3c8e88783d9d90ab30f2ce3f323612375d8cd3c2436ffa74a856527e56c5
 size 4540516344

 version https://git-lfs.github.com/spec/v1
+oid sha256:0debf1533b3a9f2ffea91ddec7f947ba3d1c43476aedcef3273235a227bb4ce5
 size 4540516344
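The three shard sizes also give a rough parameter count. The sketch below assumes the weights are stored in a 2-byte dtype (bf16 or fp16), which this commit does not state; each safetensors file also carries a small JSON header, so the estimate is a slight overcount.

```python
# Shard sizes (bytes) from the unchanged "size" lines of the three LFS pointers.
SHARD_SIZES = [4943162336, 4999819336, 4540516344]

# ASSUMPTION: 2 bytes per parameter (bf16/fp16). Headers make this a small overestimate.
BYTES_PER_PARAM = 2

total_bytes = sum(SHARD_SIZES)
approx_params = total_bytes // BYTES_PER_PARAM
print(f"{total_bytes / 1e9:.2f} GB on disk, ~{approx_params / 1e9:.2f}B parameters")
# prints "14.48 GB on disk, ~7.24B parameters"
```

Under that assumption the total is in line with a Mistral-7B-sized model.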

train_results.json CHANGED

@@ -1,8 +1,8 @@
 {
     "epoch": 1.0,
-    "train_loss": 0.049385896711877195,
-    "train_runtime": 11611.5481,
     "train_samples": 160800,
-    "train_samples_per_second": 13.848,
-    "train_steps_per_second": 0.108
 }

 {
     "epoch": 1.0,
+    "train_loss": 0.5712926928784438,
+    "train_runtime": 11451.0976,
     "train_samples": 160800,
+    "train_samples_per_second": 14.042,
+    "train_steps_per_second": 0.11
 }

trainer_state.json CHANGED

The diff for this file is too large to render.
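Since this diff is not rendered, the per-step numbers can be inspected locally instead. A `trainer_state.json` written by the Hugging Face `Trainer` typically holds a `log_history` list in which evaluation entries carry `eval_`-prefixed keys alongside `step`. A minimal sketch, assuming that layout; `load_state` and `eval_loss_curve` are hypothetical helpers and the file path is an assumption.

```python
import json
from typing import List, Tuple


def load_state(path: str) -> dict:
    """Read a Trainer state file from disk."""
    with open(path) as f:
        return json.load(f)


def eval_loss_curve(state: dict) -> List[Tuple[int, float]]:
    """Return (step, eval_loss) pairs from the state's log_history."""
    return [
        (entry["step"], entry["eval_loss"])
        for entry in state.get("log_history", [])
        if "eval_loss" in entry  # skip training-loss and summary entries
    ]


# Usage (path is an assumption):
# curve = eval_loss_curve(load_state("trainer_state.json"))
```

For the new run, the resulting pairs should line up with the Step and Validation Loss columns of the README table.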

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7b563d8ba7853a3ff70d20b889cac2dc1b6f510a9e1fcf017bbae2c1ee40b07d
 size 5944

 version https://git-lfs.github.com/spec/v1
+oid sha256:ef3f3bcb1d637ffd73632ad00af47d3006ac1e6c1f0c109c90bd802bdaba6dcd
 size 5944