Commit b910b39
Author: jikaixuan
Parent: be9d7d4

End of training

Files changed (3)
  1. README.md +14 -11
  2. all_results.json +15 -0
  3. eval_results.json +14 -14
README.md CHANGED
@@ -2,10 +2,13 @@
 license: apache-2.0
 library_name: peft
 tags:
+- alignment-handbook
 - trl
 - dpo
 - generated_from_trainer
 base_model: mistralai/Mistral-7B-v0.1
+datasets:
+- HuggingFaceH4/ultrafeedback_binarized
 model-index:
 - name: zephyr-7b
   results: []
@@ -16,19 +19,19 @@ should probably proofread and complete it, then remove this comment. -->
 
 # zephyr-7b
 
-This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the None dataset.
+This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-qlora](https://huggingface.co/alignment-handbook/zephyr-7b-sft-qlora) on the HuggingFaceH4/ultrafeedback_binarized dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.6928
-- Rewards/chosen: -0.0288
-- Rewards/rejected: -0.1012
-- Rewards/accuracies: 0.3492
-- Rewards/margins: 0.0723
-- Logps/rejected: -85.5160
-- Logps/chosen: -71.7842
-- Logits/rejected: -2.1139
-- Logits/chosen: -2.1428
-- Use Label: 13461.3809
-- Pred Label: 5226.6191
+- Rewards/chosen: -0.0289
+- Rewards/rejected: -0.1011
+- Rewards/accuracies: 0.3532
+- Rewards/margins: 0.0722
+- Logps/rejected: -85.5050
+- Logps/chosen: -71.7912
+- Logits/rejected: -2.1148
+- Logits/chosen: -2.1436
+- Use Label: 14417.4287
+- Pred Label: 5654.5713
 
 ## Model description
 
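Since the card declares `library_name: peft` with `base_model: mistralai/Mistral-7B-v0.1`, the repository holds a (Q)LoRA adapter rather than full model weights, so it is normally loaded on top of the base model. A minimal sketch of that pattern, assuming a placeholder adapter repo id `jikaixuan/zephyr-7b` (the actual repo id is not stated in this commit):

```python
# Minimal sketch: load a PEFT/LoRA adapter on top of the Mistral-7B base model.
# "jikaixuan/zephyr-7b" is a placeholder adapter repo id (assumption); substitute the real one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"
adapter_id = "jikaixuan/zephyr-7b"  # placeholder, not confirmed by this commit

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the DPO-trained adapter

prompt = "Explain direct preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

For standalone deployment the adapter can also be folded into the base weights with `model.merge_and_unload()`.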
all_results.json CHANGED
@@ -1,5 +1,20 @@
 {
     "epoch": 1.0,
+    "eval_logits/chosen": -2.1435508728027344,
+    "eval_logits/rejected": -2.114776372909546,
+    "eval_logps/chosen": -71.79116821289062,
+    "eval_logps/rejected": -85.50504302978516,
+    "eval_loss": 0.6928141117095947,
+    "eval_pred_label": 5654.5712890625,
+    "eval_rewards/accuracies": 0.3531745970249176,
+    "eval_rewards/chosen": -0.02890622988343239,
+    "eval_rewards/margins": 0.07216347008943558,
+    "eval_rewards/rejected": -0.10106971114873886,
+    "eval_runtime": 245.7902,
+    "eval_samples": 2000,
+    "eval_samples_per_second": 8.137,
+    "eval_steps_per_second": 0.256,
+    "eval_use_label": 14417.4287109375,
     "train_loss": 0.692275420283772,
     "train_runtime": 20019.5915,
     "train_samples": 61135,
eval_results.json CHANGED
@@ -1,18 +1,18 @@
 {
     "epoch": 1.0,
-    "eval_logits/chosen": -1.9401931762695312,
-    "eval_logits/rejected": -1.9123154878616333,
-    "eval_logps/chosen": -77.5232162475586,
-    "eval_logps/rejected": -95.19373321533203,
-    "eval_loss": 0.6917868852615356,
-    "eval_pred_label": 4738.58740234375,
-    "eval_rewards/accuracies": 0.3591269850730896,
-    "eval_rewards/chosen": -0.0862266793847084,
-    "eval_rewards/margins": 0.11172995716333389,
-    "eval_rewards/rejected": -0.19795666635036469,
-    "eval_runtime": 247.3331,
+    "eval_logits/chosen": -2.1435508728027344,
+    "eval_logits/rejected": -2.114776372909546,
+    "eval_logps/chosen": -71.79116821289062,
+    "eval_logps/rejected": -85.50504302978516,
+    "eval_loss": 0.6928141117095947,
+    "eval_pred_label": 5654.5712890625,
+    "eval_rewards/accuracies": 0.3531745970249176,
+    "eval_rewards/chosen": -0.02890622988343239,
+    "eval_rewards/margins": 0.07216347008943558,
+    "eval_rewards/rejected": -0.10106971114873886,
+    "eval_runtime": 245.7902,
     "eval_samples": 2000,
-    "eval_samples_per_second": 8.086,
-    "eval_steps_per_second": 0.255,
-    "eval_use_label": 15333.4130859375
+    "eval_samples_per_second": 8.137,
+    "eval_steps_per_second": 0.256,
+    "eval_use_label": 14417.4287109375
 }
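For context on the updated eval fields: in DPO-style reporting the margin is the chosen reward minus the rejected reward, and throughput is samples divided by runtime. A quick consistency check against the values above (a sketch; the exact reward definition follows TRL's DPO trainer conventions):

```python
# Consistency checks on the updated eval_results.json values (copied from the diff above).
rewards_chosen = -0.02890622988343239
rewards_rejected = -0.10106971114873886
print(rewards_chosen - rewards_rejected)      # ~0.0722, matches eval_rewards/margins

eval_samples, eval_runtime = 2000, 245.7902
print(round(eval_samples / eval_runtime, 3))  # 8.137, matches eval_samples_per_second
```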