Model save
- README.md +16 -32
- all_results.json +5 -5
- generation_config.json +2 -5
- train_results.json +5 -5
- trainer_state.json +0 -0
README.md
CHANGED
@@ -1,5 +1,7 @@
 ---
 library_name: transformers
+license: apache-2.0
+base_model: alignment-handbook/zephyr-7b-sft-full
 tags:
 - trl
 - dpo
@@ -14,17 +16,17 @@ should probably proofread and complete it, then remove this comment. -->
 
 # zephyr-7b-dpo-full
 
-This model
+This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.
-- Rewards/chosen: -
-- Rewards/rejected: -
-- Rewards/accuracies: 0.
-- Rewards/margins:
-- Logps/rejected: -2.
-- Logps/chosen: -1.
-- Logits/rejected: -
-- Logits/chosen: -
+- Loss: 0.6994
+- Rewards/chosen: -3.8492
+- Rewards/rejected: -5.0898
+- Rewards/accuracies: 0.7380
+- Rewards/margins: 1.2406
+- Logps/rejected: -2.5449
+- Logps/chosen: -1.9246
+- Logits/rejected: -2.1732
+- Logits/chosen: -2.1686
 
 ## Model description
 
@@ -44,42 +46,24 @@
 
 The following hyperparameters were used during training:
 - learning_rate: 5e-07
-- train_batch_size:
+- train_batch_size: 2
 - eval_batch_size: 4
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 4
 - gradient_accumulation_steps: 16
-- total_train_batch_size:
+- total_train_batch_size: 128
 - total_eval_batch_size: 16
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
-- num_epochs:
+- num_epochs: 1
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.
-| 0.8661 | 0.2094 | 200 | 0.8801 | -0.0678 | -0.1219 | 0.6920 | 0.0542 | -1.3279 | -1.0381 | -0.6586 | -0.7288 |
-| 0.8166 | 0.3141 | 300 | 0.8344 | -0.1473 | -0.2371 | 0.7020 | 0.0898 | -1.5583 | -1.1971 | -0.8560 | -0.9338 |
-| 0.7377 | 0.4187 | 400 | 0.8165 | -0.1814 | -0.2803 | 0.7020 | 0.0989 | -1.6447 | -1.2653 | -0.8999 | -0.9866 |
-| 0.7878 | 0.5234 | 500 | 0.8025 | -0.2222 | -0.3318 | 0.7140 | 0.1096 | -1.7477 | -1.3469 | -0.9167 | -1.0040 |
-| 0.7098 | 0.6281 | 600 | 0.7852 | -0.2537 | -0.3788 | 0.7220 | 0.1251 | -1.8417 | -1.4100 | -0.9250 | -1.0109 |
-| 0.7637 | 0.7328 | 700 | 0.7751 | -0.2541 | -0.3682 | 0.7240 | 0.1141 | -1.8204 | -1.4107 | -0.9256 | -1.0116 |
-| 0.726 | 0.8375 | 800 | 0.7703 | -0.2974 | -0.4243 | 0.7340 | 0.1269 | -1.9327 | -1.4974 | -0.9103 | -0.9965 |
-| 0.7171 | 0.9422 | 900 | 0.7577 | -0.3100 | -0.4453 | 0.7440 | 0.1353 | -1.9746 | -1.5225 | -0.9430 | -1.0241 |
-| 0.6193 | 1.0468 | 1000 | 0.7531 | -0.3079 | -0.4465 | 0.7360 | 0.1386 | -1.9771 | -1.5183 | -0.9013 | -0.9842 |
-| 0.6201 | 1.1515 | 1100 | 0.7530 | -0.3522 | -0.5042 | 0.7460 | 0.1520 | -2.0924 | -1.6069 | -0.8843 | -0.9681 |
-| 0.6105 | 1.2562 | 1200 | 0.7508 | -0.3409 | -0.4807 | 0.7380 | 0.1398 | -2.0455 | -1.5843 | -0.8828 | -0.9657 |
-| 0.6529 | 1.3609 | 1300 | 0.7522 | -0.3580 | -0.4995 | 0.7180 | 0.1415 | -2.0830 | -1.6185 | -0.8704 | -0.9551 |
-| 0.5827 | 1.4656 | 1400 | 0.7494 | -0.3532 | -0.4944 | 0.7280 | 0.1412 | -2.0729 | -1.6089 | -0.8884 | -0.9717 |
-| 0.5978 | 1.5703 | 1500 | 0.7502 | -0.3686 | -0.5144 | 0.7280 | 0.1459 | -2.1129 | -1.6396 | -0.8574 | -0.9408 |
-| 0.6115 | 1.6750 | 1600 | 0.7480 | -0.3421 | -0.4823 | 0.7300 | 0.1402 | -2.0487 | -1.5868 | -0.8814 | -0.9648 |
-| 0.6209 | 1.7796 | 1700 | 0.7480 | -0.3516 | -0.4943 | 0.7360 | 0.1427 | -2.0728 | -1.6058 | -0.8739 | -0.9575 |
-| 0.5811 | 1.8843 | 1800 | 0.7491 | -0.3581 | -0.5018 | 0.7360 | 0.1437 | -2.0877 | -1.6187 | -0.8990 | -0.9798 |
-| 0.6246 | 1.9890 | 1900 | 0.7492 | -0.3582 | -0.5015 | 0.7340 | 0.1433 | -2.0872 | -1.6189 | -0.9145 | -0.9939 |
+| 0.7536 | 0.8375 | 400 | 0.6994 | -3.8492 | -5.0898 | 0.7380 | 1.2406 | -2.5449 | -1.9246 | -2.1732 | -2.1686 |
 
 
 ### Framework versions
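The hyperparameters in the new card multiply out consistently: 2 per-device x 16 accumulation steps x 4 GPUs = 128, the reported total_train_batch_size, and Rewards/margins is exactly Rewards/chosen minus Rewards/rejected (-3.8492 - (-5.0898) = 1.2406). For orientation, here is a minimal sketch of how such a run is typically wired up with TRL's DPOTrainer; the dataset id is a placeholder (the card only says "None dataset") and keyword names vary across TRL versions, so treat this as an assumption-laden illustration, not the exact training script.

```python
# Minimal sketch of the recorded run, assuming TRL's DPOConfig/DPOTrainer API;
# hyperparameters are copied from the model card, everything else is assumed.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder: the card does not name the preference dataset; any dataset with
# prompt/chosen/rejected columns would slot in here.
train_dataset = load_dataset("your-org/your-preference-dataset", split="train")

args = DPOConfig(
    output_dir="zephyr-7b-dpo-full",
    learning_rate=5e-7,              # learning_rate: 5e-07
    per_device_train_batch_size=2,   # train_batch_size: 2
    per_device_eval_batch_size=4,    # eval_batch_size: 4
    gradient_accumulation_steps=16,  # 2 x 16 x 4 GPUs = 128 effective
    num_train_epochs=1,              # num_epochs: 1
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                # lr_scheduler_warmup_ratio: 0.1
    seed=42,
    bf16=True,                       # assumption: typical mixed precision for a 7B DPO run
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # newer TRL releases rename this to processing_class
)
trainer.train()
```

Under this sketch, the distributed_type: multi-GPU and num_devices: 4 entries would come from launching the script with accelerate across 4 GPUs rather than from the trainer arguments themselves.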
all_results.json
CHANGED
@@ -1,9 +1,9 @@
 {
-    "epoch":
+    "epoch": 0.998691442030882,
     "total_flos": 0.0,
-    "train_loss": 0.
-    "train_runtime":
+    "train_loss": 0.7891295181130463,
+    "train_runtime": 22160.7549,
     "train_samples": 61135,
-    "train_samples_per_second":
-    "train_steps_per_second": 0.
+    "train_samples_per_second": 2.759,
+    "train_steps_per_second": 0.022
 }
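The throughput figures in the new file are mutually consistent, which is a quick way to sanity-check a run record. A small check, with the values copied from above:

```python
# Cross-check the reported throughput figures from all_results.json.
train_samples = 61135
train_runtime = 22160.7549      # seconds, about 6.2 hours of wall-clock time
total_train_batch_size = 128    # from the model card

# 61135 / 22160.7549 = 2.759, matching train_samples_per_second
print(round(train_samples / train_runtime, 3))

# ~477 optimizer steps per epoch / runtime = 0.022, matching train_steps_per_second
print(round((train_samples // total_train_batch_size) / train_runtime, 3))
```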
generation_config.json
CHANGED
@@ -1,9 +1,6 @@
 {
     "_from_model_config": true,
-    "bos_token_id":
-    "
-    "eos_token_id": 128001,
-    "temperature": 0.6,
-    "top_p": 0.9,
+    "bos_token_id": 1,
+    "eos_token_id": 2,
     "transformers_version": "4.45.2"
 }
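This change lines up with the model's Mistral-7B lineage: Mistral-based checkpoints such as Zephyr use bos_token_id 1 and eos_token_id 2, while the removed eos_token_id 128001 (together with the temperature/top_p sampling defaults) is characteristic of Llama-3-style generation configs, so the old values appear to have been inherited from the wrong model family. A quick way to confirm what transformers will use at generation time; the repo id below is a placeholder, since the card does not state where the model is published:

```python
# Inspect the generation defaults as transformers will load them;
# "your-org/zephyr-7b-dpo-full" is a hypothetical repo id, not one from the card.
from transformers import GenerationConfig

gen = GenerationConfig.from_pretrained("your-org/zephyr-7b-dpo-full")
print(gen.bos_token_id, gen.eos_token_id)  # expected after this commit: 1 2
```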
train_results.json
CHANGED
@@ -1,9 +1,9 @@
 {
-    "epoch":
+    "epoch": 0.998691442030882,
     "total_flos": 0.0,
-    "train_loss": 0.
-    "train_runtime":
+    "train_loss": 0.7891295181130463,
+    "train_runtime": 22160.7549,
     "train_samples": 61135,
-    "train_samples_per_second":
-    "train_steps_per_second": 0.
+    "train_samples_per_second": 2.759,
+    "train_steps_per_second": 0.022
 }
trainer_state.json
CHANGED
The diff for this file is too large to render. See raw diff.