Evan-Lin committed
Commit 5ec1e42
1 Parent(s): 70c7aa7

Evan-Lin/dpo-llama-chat

README.md CHANGED
@@ -17,15 +17,15 @@ should probably proofread and complete it, then remove this comment. -->
 
  This model is a fine-tuned version of [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 2.5599
- - Rewards/chosen: -2.5569
- - Rewards/rejected: -7.3372
- - Rewards/accuracies: 0.9800
- - Rewards/margins: 4.7803
- - Logps/rejected: -162.4099
- - Logps/chosen: -122.3962
- - Logits/rejected: -0.7668
- - Logits/chosen: -0.7657
+ - Loss: 4.9481
+ - Rewards/chosen: 4.6795
+ - Rewards/rejected: 2.8189
+ - Rewards/accuracies: 0.8547
+ - Rewards/margins: 1.8606
+ - Logps/rejected: -60.8495
+ - Logps/chosen: -50.0326
+ - Logits/rejected: -0.2216
+ - Logits/chosen: -0.2323
 
  ## Model description
 
@@ -62,16 +62,16 @@ The following hyperparameters were used during training:
 
  | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
  |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
- | 0.7092 | 0.24 | 100 | 0.7304 | -0.1314 | -0.4498 | 0.6800 | 0.3184 | -93.5358 | -98.1411 | -0.7695 | -0.7645 |
- | 0.8504 | 0.48 | 200 | 1.0072 | -0.5851 | -1.5349 | 0.7960 | 0.9498 | -104.3870 | -102.6778 | -0.7380 | -0.7386 |
- | 0.9124 | 0.72 | 300 | 1.1845 | -0.9331 | -2.6958 | 0.8907 | 1.7627 | -115.9964 | -106.1584 | -0.8360 | -0.8375 |
- | 1.4704 | 0.96 | 400 | 1.3238 | -1.1702 | -3.5609 | 0.9520 | 2.3907 | -124.6469 | -108.5289 | -0.7828 | -0.7839 |
- | 1.7087 | 1.2 | 500 | 1.9982 | -1.8790 | -5.1153 | 0.9573 | 3.2363 | -140.1910 | -115.6172 | -0.7690 | -0.7698 |
- | 1.505 | 1.44 | 600 | 1.6522 | -1.5885 | -5.1419 | 0.9747 | 3.5534 | -140.4576 | -112.7124 | -0.7636 | -0.7657 |
- | 1.9902 | 1.68 | 700 | 2.3375 | -2.3061 | -6.4484 | 0.9733 | 4.1423 | -153.5226 | -119.8879 | -0.7499 | -0.7494 |
- | 2.1236 | 1.92 | 800 | 2.2806 | -2.2515 | -6.7675 | 0.9827 | 4.5160 | -156.7130 | -119.3421 | -0.7892 | -0.7887 |
- | 2.18 | 2.16 | 900 | 2.6104 | -2.5895 | -7.3523 | 0.9773 | 4.7628 | -162.5615 | -122.7226 | -0.7648 | -0.7637 |
- | 2.2955 | 2.4 | 1000 | 2.5599 | -2.5569 | -7.3372 | 0.9800 | 4.7803 | -162.4099 | -122.3962 | -0.7668 | -0.7657 |
+ | 6.3 | 0.24 | 100 | 6.1290 | 3.4767 | 3.2110 | 0.5920 | 0.2657 | -56.9286 | -62.0606 | -0.2723 | -0.2654 |
+ | 5.5843 | 0.48 | 200 | 5.8936 | 3.6904 | 3.2305 | 0.6520 | 0.4599 | -56.7330 | -59.9230 | 0.2517 | 0.2475 |
+ | 5.757 | 0.72 | 300 | 5.6694 | 3.9164 | 3.1893 | 0.7253 | 0.7271 | -57.1450 | -57.6631 | 0.3505 | 0.3418 |
+ | 5.5385 | 0.96 | 400 | 5.4629 | 4.1466 | 3.1351 | 0.7600 | 1.0115 | -57.6871 | -55.3611 | 0.2059 | 0.1970 |
+ | 5.2301 | 1.2 | 500 | 5.2891 | 4.3324 | 3.0305 | 0.7880 | 1.3020 | -58.7338 | -53.5027 | 0.1063 | 0.0968 |
+ | 5.0115 | 1.44 | 600 | 5.1601 | 4.4582 | 2.9458 | 0.8213 | 1.5124 | -59.5800 | -52.2452 | -0.1082 | -0.1154 |
+ | 4.9893 | 1.68 | 700 | 5.0431 | 4.5787 | 2.9142 | 0.8413 | 1.6645 | -59.8968 | -51.0404 | -0.1716 | -0.1829 |
+ | 5.0292 | 1.92 | 800 | 4.9770 | 4.6501 | 2.8827 | 0.8427 | 1.7673 | -60.2111 | -50.3266 | -0.1929 | -0.2042 |
+ | 4.331 | 2.16 | 900 | 4.9577 | 4.6724 | 2.8191 | 0.8480 | 1.8534 | -60.8478 | -50.1027 | -0.2005 | -0.2121 |
+ | 4.5481 | 2.4 | 1000 | 4.9481 | 4.6795 | 2.8189 | 0.8547 | 1.8606 | -60.8495 | -50.0326 | -0.2216 | -0.2323 |
 
 
  ### Framework versions
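As a sanity check on the updated evaluation block, Rewards/margins is consistent with Rewards/chosen minus Rewards/rejected (4.6795 - 2.8189 ≈ 1.8606). For anyone who wants to try the adapter described in this README, a minimal loading sketch follows. It assumes the adapter weights in this repo (Evan-Lin/dpo-llama-chat) apply on top of the base checkpoint named in the card, and that transformers, peft, and accelerate are installed; the dtype and device settings are illustrative, not taken from this commit.

```python
# Minimal sketch: load the base chat model and attach the DPO LoRA adapter.
# Assumptions: access to the gated meta-llama/Llama-2-7b-chat-hf checkpoint
# and the Evan-Lin/dpo-llama-chat adapter repo; transformers + peft installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-chat-hf"   # base checkpoint named in the model card
adapter_id = "Evan-Lin/dpo-llama-chat"      # this repo, assumed to hold the LoRA adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,   # illustrative; pick what your hardware supports
    device_map="auto",           # requires accelerate
)
model = PeftModel.from_pretrained(base, adapter_id)  # attaches the LoRA weights

prompt = "Explain in one sentence what DPO fine-tuning optimizes."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```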
adapter_config.json CHANGED
@@ -21,10 +21,10 @@
  "target_modules": [
  "k_proj",
  "v_proj",
- "out_proj",
- "fc_in",
  "fc_out",
  "wte",
+ "fc_in",
+ "out_proj",
  "q_proj"
  ],
  "task_type": "CAUSAL_LM",
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c2c8fc9b287ab53e6a7e20595606dbbedee53538abff3828ffe08fea27a44f68
+ oid sha256:93108e6c28525ffb8d87b05696eb8b31bd08294e5d33015dc93951b035400c55
  size 25191360
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:91a0d0b1d86b5b623a652306ea09053cd2bb49795bce598806471707c9567f8a
+ oid sha256:630e45561e688294a8f14eb6f70d7fa318ee9bc7202832df13997d33d58e0344
  size 4728
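The adapter_model.safetensors and training_args.bin entries are Git LFS pointer updates: the sha256 oids change while the byte sizes stay the same (25191360 and 4728). A small sketch for checking a downloaded file against the pointer's oid is below; the local path is an assumption, and only the oid comes from this commit.

```python
# Sketch: verify a downloaded file against the sha256 oid in its Git LFS pointer.
# The oid below is the new adapter_model.safetensors pointer from this commit;
# the local path is an assumption about where the file was downloaded.
import hashlib

expected_oid = "93108e6c28525ffb8d87b05696eb8b31bd08294e5d33015dc93951b035400c55"
path = "adapter_model.safetensors"  # assumed local download location

h = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)

assert h.hexdigest() == expected_oid, "file does not match the LFS pointer oid"
print("sha256 matches:", h.hexdigest())
```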