Evan-Lin committed
Commit db55fd0
1 parent: 2f175cc
Evan-Lin/dpo-llama-chat
README.md CHANGED
@@ -17,15 +17,15 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.4341
-- Rewards/chosen: -0.1990
-- Rewards/rejected: -1.2761
-- Rewards/accuracies: 0.8229
-- Rewards/margins: 1.0771
-- Logps/rejected: -102.3795
-- Logps/chosen: -79.1251
-- Logits/rejected: -0.8508
-- Logits/chosen: -0.8524
+- Loss: 0.1928
+- Rewards/chosen: -1.3672
+- Rewards/rejected: -4.3992
+- Rewards/accuracies: 0.9310
+- Rewards/margins: 3.0321
+- Logps/rejected: -133.6114
+- Logps/chosen: -90.8071
+- Logits/rejected: -0.8584
+- Logits/chosen: -0.8277
 
 ## Model description
 
@@ -45,13 +45,13 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 0.0001
-- train_batch_size: 1
+- train_batch_size: 2
 - eval_batch_size: 2
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 2
-- gradient_accumulation_steps: 8
-- total_train_batch_size: 16
+- gradient_accumulation_steps: 16
+- total_train_batch_size: 64
 - total_eval_batch_size: 4
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
@@ -62,16 +62,16 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.6788 | 0.06 | 100 | 0.6555 | 0.1458 | 0.0106 | 0.6134 | 0.1351 | -89.5126 | -75.6777 | -0.6838 | -0.7407 |
-| 0.62 | 0.12 | 200 | 0.6257 | -0.0832 | -0.3345 | 0.6558 | 0.2514 | -92.9643 | -77.9671 | -0.7521 | -0.7833 |
-| 0.5868 | 0.18 | 300 | 0.5646 | 0.0881 | -0.4372 | 0.7261 | 0.5253 | -93.9910 | -76.2543 | -0.7580 | -0.7860 |
-| 0.5267 | 0.24 | 400 | 0.5239 | -0.0974 | -0.7950 | 0.7520 | 0.6976 | -97.5691 | -78.1096 | -0.8008 | -0.8087 |
-| 0.5621 | 0.3 | 500 | 0.5007 | 0.0408 | -0.7836 | 0.7759 | 0.8245 | -97.4551 | -76.7269 | -0.7608 | -0.7779 |
-| 0.4802 | 0.35 | 600 | 0.4733 | -0.1319 | -1.0072 | 0.7898 | 0.8753 | -99.6912 | -78.4548 | -0.7715 | -0.7806 |
-| 0.4614 | 0.41 | 700 | 0.4561 | -0.0747 | -1.0657 | 0.8097 | 0.9910 | -100.2759 | -77.8826 | -0.8304 | -0.8458 |
-| 0.4368 | 0.47 | 800 | 0.4406 | -0.1388 | -1.1688 | 0.8123 | 1.0300 | -101.3069 | -78.5232 | -0.8317 | -0.8367 |
-| 0.4126 | 0.53 | 900 | 0.4327 | -0.2034 | -1.2710 | 0.8170 | 1.0676 | -102.3290 | -79.1693 | -0.8452 | -0.8473 |
-| 0.3931 | 0.59 | 1000 | 0.4341 | -0.1990 | -1.2761 | 0.8229 | 1.0771 | -102.3795 | -79.1251 | -0.8508 | -0.8524 |
+| 0.5985 | 0.24 | 100 | 0.5908 | -0.0098 | -0.3706 | 0.6857 | 0.3608 | -93.3248 | -77.2335 | -0.7818 | -0.8133 |
+| 0.5032 | 0.47 | 200 | 0.4768 | -0.1589 | -0.9349 | 0.8037 | 0.7760 | -98.9677 | -78.7246 | -0.8669 | -0.8774 |
+| 0.4105 | 0.71 | 300 | 0.4056 | -0.3303 | -1.5893 | 0.8316 | 1.2589 | -105.5115 | -80.4384 | -0.8423 | -0.8361 |
+| 0.3707 | 0.94 | 400 | 0.3501 | -0.2376 | -1.6094 | 0.8760 | 1.3718 | -105.7129 | -79.5110 | -0.7540 | -0.7564 |
+| 0.2363 | 1.18 | 500 | 0.2939 | -0.8615 | -2.9614 | 0.8932 | 2.0999 | -119.2329 | -85.7499 | -0.8983 | -0.8797 |
+| 0.1947 | 1.42 | 600 | 0.2463 | -1.0709 | -3.5879 | 0.9085 | 2.5170 | -125.4976 | -87.8440 | -0.8982 | -0.8717 |
+| 0.1823 | 1.65 | 700 | 0.2242 | -1.2056 | -3.7965 | 0.9158 | 2.5909 | -127.5844 | -89.1917 | -0.8272 | -0.8112 |
+| 0.1476 | 1.89 | 800 | 0.2042 | -1.1764 | -3.9644 | 0.9271 | 2.7881 | -129.2632 | -88.8989 | -0.8622 | -0.8415 |
+| 0.112 | 2.13 | 900 | 0.1936 | -1.3373 | -4.3265 | 0.9330 | 2.9891 | -132.8835 | -90.5088 | -0.8608 | -0.8338 |
+| 0.0949 | 2.36 | 1000 | 0.1928 | -1.3672 | -4.3992 | 0.9310 | 3.0321 | -133.6114 | -90.8071 | -0.8584 | -0.8277 |
 
 
 ### Framework versions
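The batch-size fields in the hyperparameter change are internally consistent: the effective batch size is the per-device micro-batch times the gradient-accumulation steps times the device count. A minimal sanity check (the variable names mirror the card's fields but are illustrative, not taken from the training code):

```python
# New configuration from the diff.
train_batch_size = 2               # per-device micro-batch
gradient_accumulation_steps = 16
num_devices = 2

total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 64, matching total_train_batch_size in the card

# The old configuration was consistent as well: 1 * 8 * 2 = 16.
```

Note that quadrupling the effective batch size (16 → 64) also quadruples the epochs covered by the same 1000 optimizer steps, which is why the new table ends at epoch 2.36 rather than 0.59.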
adapter_config.json CHANGED
@@ -19,13 +19,13 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "wte",
-    "k_proj",
-    "v_proj",
-    "fc_out",
     "q_proj",
-    "out_proj",
-    "fc_in"
+    "fc_out",
+    "v_proj",
+    "fc_in",
+    "k_proj",
+    "wte",
+    "out_proj"
   ],
   "task_type": "CAUSAL_LM"
 }
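The `target_modules` change in `adapter_config.json` is a pure reordering; the adapter still wraps the same seven modules. A quick check, using the two lists exactly as they appear in the diff:

```python
# Old and new target_modules from the adapter_config.json diff.
old_modules = ["wte", "k_proj", "v_proj", "fc_out", "q_proj", "out_proj", "fc_in"]
new_modules = ["q_proj", "fc_out", "v_proj", "fc_in", "k_proj", "wte", "out_proj"]

# Same set of modules, different serialization order.
print(set(old_modules) == set(new_modules))  # True
```

PEFT serializes `target_modules` as a set, so the on-disk order can change between saves without changing which layers the LoRA adapter is applied to.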
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e6c38cc77af068cd1a77a08e504cdc51e137de3a163c1a82e1b94e0e89c3ebf1
+oid sha256:fc3e5eb50b8a8a1cc43ddc29766728a77c67150c862524c353594915cdf705c3
 size 25191360
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:60c29a2734012a8770621766212c3c63b94b39b7b0bb2ce10e5ddd42195cbbc2
+oid sha256:97dff2bbaa6fc557cf89b82d24d24aa9d4d70634cde8b9dc8894b5eeee7e0230
 size 4728
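For context on the README metrics: in DPO, `Rewards/chosen` and `Rewards/rejected` are β-scaled log-probability ratios between the policy and the frozen reference model, and the loss is the negative log-sigmoid of their margin. A minimal sketch of that objective (β = 0.1 is an assumed value; the card does not record it, and the log-prob inputs below are made-up numbers shaped like the card's `Logps` columns):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sketch of the DPO objective; beta=0.1 is an assumption."""
    # Implicit rewards: beta-scaled policy/reference log-ratios.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log sigmoid(margin): the loss falls toward 0 as the margin grows.
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return loss, chosen_reward, rejected_reward

# Illustrative inputs only (not taken from the training run).
loss, cr, rr = dpo_loss(-90.0, -133.0, -80.0, -90.0)
```

With a zero margin the loss is log 2 ≈ 0.693, which is why the training curve starts near 0.68 and falls as the margin column rises.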