Evan-Lin committed
Commit 5ec1e42
1 Parent(s): 70c7aa7

Evan-Lin/dpo-llama-chat

README.md CHANGED
@@ -17,15 +17,15 @@ should probably proofread and complete it, then remove this comment. -->
 
  This model is a fine-tuned version of [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 2.5599
- - Rewards/chosen: -2.5569
- - Rewards/rejected: -7.3372
- - Rewards/accuracies: 0.9800
- - Rewards/margins: 4.7803
- - Logps/rejected: -162.4099
- - Logps/chosen: -122.3962
- - Logits/rejected: -0.7668
- - Logits/chosen: -0.7657
+ - Loss: 4.9481
+ - Rewards/chosen: 4.6795
+ - Rewards/rejected: 2.8189
+ - Rewards/accuracies: 0.8547
+ - Rewards/margins: 1.8606
+ - Logps/rejected: -60.8495
+ - Logps/chosen: -50.0326
+ - Logits/rejected: -0.2216
+ - Logits/chosen: -0.2323
 
  ## Model description
 
@@ -62,16 +62,16 @@ The following hyperparameters were used during training:
 
  | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
  |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
- | 0.7092 | 0.24 | 100 | 0.7304 | -0.1314 | -0.4498 | 0.6800 | 0.3184 | -93.5358 | -98.1411 | -0.7695 | -0.7645 |
- | 0.8504 | 0.48 | 200 | 1.0072 | -0.5851 | -1.5349 | 0.7960 | 0.9498 | -104.3870 | -102.6778 | -0.7380 | -0.7386 |
- | 0.9124 | 0.72 | 300 | 1.1845 | -0.9331 | -2.6958 | 0.8907 | 1.7627 | -115.9964 | -106.1584 | -0.8360 | -0.8375 |
- | 1.4704 | 0.96 | 400 | 1.3238 | -1.1702 | -3.5609 | 0.9520 | 2.3907 | -124.6469 | -108.5289 | -0.7828 | -0.7839 |
- | 1.7087 | 1.2 | 500 | 1.9982 | -1.8790 | -5.1153 | 0.9573 | 3.2363 | -140.1910 | -115.6172 | -0.7690 | -0.7698 |
- | 1.505 | 1.44 | 600 | 1.6522 | -1.5885 | -5.1419 | 0.9747 | 3.5534 | -140.4576 | -112.7124 | -0.7636 | -0.7657 |
- | 1.9902 | 1.68 | 700 | 2.3375 | -2.3061 | -6.4484 | 0.9733 | 4.1423 | -153.5226 | -119.8879 | -0.7499 | -0.7494 |
- | 2.1236 | 1.92 | 800 | 2.2806 | -2.2515 | -6.7675 | 0.9827 | 4.5160 | -156.7130 | -119.3421 | -0.7892 | -0.7887 |
- | 2.18 | 2.16 | 900 | 2.6104 | -2.5895 | -7.3523 | 0.9773 | 4.7628 | -162.5615 | -122.7226 | -0.7648 | -0.7637 |
- | 2.2955 | 2.4 | 1000 | 2.5599 | -2.5569 | -7.3372 | 0.9800 | 4.7803 | -162.4099 | -122.3962 | -0.7668 | -0.7657 |
+ | 6.3 | 0.24 | 100 | 6.1290 | 3.4767 | 3.2110 | 0.5920 | 0.2657 | -56.9286 | -62.0606 | -0.2723 | -0.2654 |
+ | 5.5843 | 0.48 | 200 | 5.8936 | 3.6904 | 3.2305 | 0.6520 | 0.4599 | -56.7330 | -59.9230 | 0.2517 | 0.2475 |
+ | 5.757 | 0.72 | 300 | 5.6694 | 3.9164 | 3.1893 | 0.7253 | 0.7271 | -57.1450 | -57.6631 | 0.3505 | 0.3418 |
+ | 5.5385 | 0.96 | 400 | 5.4629 | 4.1466 | 3.1351 | 0.7600 | 1.0115 | -57.6871 | -55.3611 | 0.2059 | 0.1970 |
+ | 5.2301 | 1.2 | 500 | 5.2891 | 4.3324 | 3.0305 | 0.7880 | 1.3020 | -58.7338 | -53.5027 | 0.1063 | 0.0968 |
+ | 5.0115 | 1.44 | 600 | 5.1601 | 4.4582 | 2.9458 | 0.8213 | 1.5124 | -59.5800 | -52.2452 | -0.1082 | -0.1154 |
+ | 4.9893 | 1.68 | 700 | 5.0431 | 4.5787 | 2.9142 | 0.8413 | 1.6645 | -59.8968 | -51.0404 | -0.1716 | -0.1829 |
+ | 5.0292 | 1.92 | 800 | 4.9770 | 4.6501 | 2.8827 | 0.8427 | 1.7673 | -60.2111 | -50.3266 | -0.1929 | -0.2042 |
+ | 4.331 | 2.16 | 900 | 4.9577 | 4.6724 | 2.8191 | 0.8480 | 1.8534 | -60.8478 | -50.1027 | -0.2005 | -0.2121 |
+ | 4.5481 | 2.4 | 1000 | 4.9481 | 4.6795 | 2.8189 | 0.8547 | 1.8606 | -60.8495 | -50.0326 | -0.2216 | -0.2323 |
 
 
  ### Framework versions
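As a sanity check on the updated evaluation block, Rewards/margins is consistent with Rewards/chosen minus Rewards/rejected (4.6795 - 2.8189 ≈ 1.8606). For anyone who wants to try the adapter described in this README, a minimal loading sketch follows. It assumes the adapter weights in this repo (Evan-Lin/dpo-llama-chat) apply on top of the base checkpoint named in the card, and that transformers, peft, and accelerate are installed; the dtype and device settings are illustrative, not taken from this commit.

```python
# Minimal sketch: load the base chat model and attach the DPO LoRA adapter.
# Assumptions: access to the gated meta-llama/Llama-2-7b-chat-hf checkpoint
# and the Evan-Lin/dpo-llama-chat adapter repo; transformers + peft installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-chat-hf"   # base checkpoint named in the model card
adapter_id = "Evan-Lin/dpo-llama-chat"      # this repo, assumed to hold the LoRA adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,   # illustrative; pick what your hardware supports
    device_map="auto",           # requires accelerate
)
model = PeftModel.from_pretrained(base, adapter_id)  # attaches the LoRA weights

prompt = "Explain in one sentence what DPO fine-tuning optimizes."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```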
adapter_config.json CHANGED
@@ -21,10 +21,10 @@
  "target_modules": [
  "k_proj",
  "v_proj",
- "out_proj",
- "fc_in",
  "fc_out",
  "wte",
+ "fc_in",
+ "out_proj",
  "q_proj"
  ],
  "task_type": "CAUSAL_LM",
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c2c8fc9b287ab53e6a7e20595606dbbedee53538abff3828ffe08fea27a44f68
+ oid sha256:93108e6c28525ffb8d87b05696eb8b31bd08294e5d33015dc93951b035400c55
  size 25191360
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:91a0d0b1d86b5b623a652306ea09053cd2bb49795bce598806471707c9567f8a
+ oid sha256:630e45561e688294a8f14eb6f70d7fa318ee9bc7202832df13997d33d58e0344
  size 4728
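The adapter_model.safetensors and training_args.bin entries are Git LFS pointer updates: the sha256 oids change while the byte sizes stay the same (25191360 and 4728). A small sketch for checking a downloaded file against the pointer's oid is below; the local path is an assumption, and only the oid comes from this commit.

```python
# Sketch: verify a downloaded file against the sha256 oid in its Git LFS pointer.
# The oid below is the new adapter_model.safetensors pointer from this commit;
# the local path is an assumption about where the file was downloaded.
import hashlib

expected_oid = "93108e6c28525ffb8d87b05696eb8b31bd08294e5d33015dc93951b035400c55"
path = "adapter_model.safetensors"  # assumed local download location

h = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)

assert h.hexdigest() == expected_oid, "file does not match the LFS pointer oid"
print("sha256 matches:", h.hexdigest())
```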