Evan-Lin/dpo-llama-chat

Files changed (4)

README.md +22 -22
adapter_config.json +6 -6
adapter_model.safetensors +1 -1
training_args.bin +1 -1

README.md CHANGED

@@ -17,15 +17,15 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.4341
-- Rewards/chosen: -0.1990
-- Rewards/rejected: -1.2761
-- Rewards/accuracies: 0.8229
-- Rewards/margins: 1.0771
-- Logps/rejected: -102.3795
-- Logps/chosen: -79.1251
-- Logits/rejected: -0.8508
-- Logits/chosen: -0.8524
 ## Model description
@@ -45,13 +45,13 @@ More information needed
 The following hyperparameters were used during training:
 - learning_rate: 0.0001
-- train_batch_size: 1
 - eval_batch_size: 2
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 2
-- gradient_accumulation_steps: 8
-- total_train_batch_size: 16
 - total_eval_batch_size: 4
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
@@ -62,16 +62,16 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.6788        | 0.06  | 100  | 0.6555          | 0.1458         | 0.0106           | 0.6134             | 0.1351          | -89.5126       | -75.6777     | -0.6838         | -0.7407       |
-| 0.62          | 0.12  | 200  | 0.6257          | -0.0832        | -0.3345          | 0.6558             | 0.2514          | -92.9643       | -77.9671     | -0.7521         | -0.7833       |
-| 0.5868        | 0.18  | 300  | 0.5646          | 0.0881         | -0.4372          | 0.7261             | 0.5253          | -93.9910       | -76.2543     | -0.7580         | -0.7860       |
-| 0.5267        | 0.24  | 400  | 0.5239          | -0.0974        | -0.7950          | 0.7520             | 0.6976          | -97.5691       | -78.1096     | -0.8008         | -0.8087       |
-| 0.5621        | 0.3   | 500  | 0.5007          | 0.0408         | -0.7836          | 0.7759             | 0.8245          | -97.4551       | -76.7269     | -0.7608         | -0.7779       |
-| 0.4802        | 0.35  | 600  | 0.4733          | -0.1319        | -1.0072          | 0.7898             | 0.8753          | -99.6912       | -78.4548     | -0.7715         | -0.7806       |
-| 0.4614        | 0.41  | 700  | 0.4561          | -0.0747        | -1.0657          | 0.8097             | 0.9910          | -100.2759      | -77.8826     | -0.8304         | -0.8458       |
-| 0.4368        | 0.47  | 800  | 0.4406          | -0.1388        | -1.1688          | 0.8123             | 1.0300          | -101.3069      | -78.5232     | -0.8317         | -0.8367       |
-| 0.4126        | 0.53  | 900  | 0.4327          | -0.2034        | -1.2710          | 0.8170             | 1.0676          | -102.3290      | -79.1693     | -0.8452         | -0.8473       |
-| 0.3931        | 0.59  | 1000 | 0.4341          | -0.1990        | -1.2761          | 0.8229             | 1.0771          | -102.3795      | -79.1251     | -0.8508         | -0.8524       |
 ### Framework versions

 This model is a fine-tuned version of [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.1928
+- Rewards/chosen: -1.3672
+- Rewards/rejected: -4.3992
+- Rewards/accuracies: 0.9310
+- Rewards/margins: 3.0321
+- Logps/rejected: -133.6114
+- Logps/chosen: -90.8071
+- Logits/rejected: -0.8584
+- Logits/chosen: -0.8277
 ## Model description
 The following hyperparameters were used during training:
 - learning_rate: 0.0001
+- train_batch_size: 2
 - eval_batch_size: 2
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 2
+- gradient_accumulation_steps: 16
+- total_train_batch_size: 64
 - total_eval_batch_size: 4
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.5985        | 0.24  | 100  | 0.5908          | -0.0098        | -0.3706          | 0.6857             | 0.3608          | -93.3248       | -77.2335     | -0.7818         | -0.8133       |
+| 0.5032        | 0.47  | 200  | 0.4768          | -0.1589        | -0.9349          | 0.8037             | 0.7760          | -98.9677       | -78.7246     | -0.8669         | -0.8774       |
+| 0.4105        | 0.71  | 300  | 0.4056          | -0.3303        | -1.5893          | 0.8316             | 1.2589          | -105.5115      | -80.4384     | -0.8423         | -0.8361       |
+| 0.3707        | 0.94  | 400  | 0.3501          | -0.2376        | -1.6094          | 0.8760             | 1.3718          | -105.7129      | -79.5110     | -0.7540         | -0.7564       |
+| 0.2363        | 1.18  | 500  | 0.2939          | -0.8615        | -2.9614          | 0.8932             | 2.0999          | -119.2329      | -85.7499     | -0.8983         | -0.8797       |
+| 0.1947        | 1.42  | 600  | 0.2463          | -1.0709        | -3.5879          | 0.9085             | 2.5170          | -125.4976      | -87.8440     | -0.8982         | -0.8717       |
+| 0.1823        | 1.65  | 700  | 0.2242          | -1.2056        | -3.7965          | 0.9158             | 2.5909          | -127.5844      | -89.1917     | -0.8272         | -0.8112       |
+| 0.1476        | 1.89  | 800  | 0.2042          | -1.1764        | -3.9644          | 0.9271             | 2.7881          | -129.2632      | -88.8989     | -0.8622         | -0.8415       |
+| 0.112         | 2.13  | 900  | 0.1936          | -1.3373        | -4.3265          | 0.9330             | 2.9891          | -132.8835      | -90.5088     | -0.8608         | -0.8338       |
+| 0.0949        | 2.36  | 1000 | 0.1928          | -1.3672        | -4.3992          | 0.9310             | 3.0321          | -133.6114      | -90.8071     | -0.8584         | -0.8277       |
 ### Framework versions
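
The README hunks above change the batch-size hyperparameters and the evaluation metrics together. The sketch below checks the arithmetic that links them. It assumes, as the reported values suggest, that `total_train_batch_size` is the product of per-device batch size, device count, and gradient accumulation steps, and that `Rewards/margins` is the chosen reward minus the rejected reward. The helper names are illustrative and do not come from the repository.

```python
# Illustrative helpers; the names are hypothetical, the numbers are from the diff.

def total_train_batch_size(per_device: int, num_devices: int, grad_accum: int) -> int:
    """Effective batch size per optimizer step (assumed to be a plain product)."""
    return per_device * num_devices * grad_accum


def implied_samples_per_epoch(step: int, epoch: float, total_batch: int) -> float:
    """Rough dataset size implied by a (step, epoch) pair in the results table."""
    return step * total_batch / epoch


# Old run: train_batch_size=1, num_devices=2, gradient_accumulation_steps=8
old_total = total_train_batch_size(1, 2, 8)
# New run: train_batch_size=2, num_devices=2, gradient_accumulation_steps=16
new_total = total_train_batch_size(2, 2, 16)

# Step 1000 lands at epoch 0.59 in the old table and epoch 2.36 in the new one.
old_samples = implied_samples_per_epoch(1000, 0.59, old_total)
new_samples = implied_samples_per_epoch(1000, 2.36, new_total)

# Assumed definition: margin = reward(chosen) - reward(rejected).
old_margin = -0.1990 - (-1.2761)
new_margin = -1.3672 - (-4.3992)

print(old_total, new_total)
print(round(old_samples), round(new_samples))
print(round(old_margin, 4), round(new_margin, 4))
```

Both runs imply roughly the same number of training pairs per epoch, so under these assumptions the new run covers about four times as much data over the same 1000 steps. The new margin recomputed from the rounded table values differs from the reported 3.0321 only in the last digit, which is consistent with rounding.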

adapter_config.json CHANGED

@@ -19,13 +19,13 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "wte",
-    "k_proj",
-    "v_proj",
-    "fc_out",
     "q_proj",
-    "out_proj",
-    "fc_in"
   ],
   "task_type": "CAUSAL_LM"
 }

   "rank_pattern": {},
   "revision": null,
   "target_modules": [
     "q_proj",
+    "fc_out",
+    "v_proj",
+    "fc_in",
+    "k_proj",
+    "wte",
+    "out_proj"
   ],
   "task_type": "CAUSAL_LM"
 }
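
The `target_modules` hunk removes six entries and adds six, but the two lists appear to contain the same names in a different order. A minimal check, with the lists copied from the old and new sides of the diff:

```python
# Module names copied verbatim from the two sides of the adapter_config.json diff.
old_targets = ["wte", "k_proj", "v_proj", "fc_out", "q_proj", "out_proj", "fc_in"]
new_targets = ["q_proj", "fc_out", "v_proj", "fc_in", "k_proj", "wte", "out_proj"]

# Compare as sets so that ordering is ignored.
same_modules = set(old_targets) == set(new_targets)
only_in_old = sorted(set(old_targets) - set(new_targets))
only_in_new = sorted(set(new_targets) - set(old_targets))

print(same_modules, only_in_old, only_in_new)
```

If the comparison holds, this hunk is a reordering only and does not change which modules the adapter targets. A plausible cause, stated here as an assumption, is that the list is held as an unordered set before being serialized.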

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e6c38cc77af068cd1a77a08e504cdc51e137de3a163c1a82e1b94e0e89c3ebf1
 size 25191360

 version https://git-lfs.github.com/spec/v1
+oid sha256:fc3e5eb50b8a8a1cc43ddc29766728a77c67150c862524c353594915cdf705c3
 size 25191360

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:60c29a2734012a8770621766212c3c63b94b39b7b0bb2ce10e5ddd42195cbbc2
 size 4728

 version https://git-lfs.github.com/spec/v1
+oid sha256:97dff2bbaa6fc557cf89b82d24d24aa9d4d70634cde8b9dc8894b5eeee7e0230
 size 4728
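
For both binary files only the Git LFS pointer changes: the `oid` differs while `size` stays the same. The sketch below parses a pointer and checks a downloaded file against it. It assumes the `oid` is the SHA-256 digest of the file contents, as the `sha256:` prefix indicates. The function names are hypothetical.

```python
import hashlib

# New-side pointer for training_args.bin, copied from the diff.
POINTER_TEXT = """
version https://git-lfs.github.com/spec/v1
oid sha256:97dff2bbaa6fc557cf89b82d24d24aa9d4d70634cde8b9dc8894b5eeee7e0230
size 4728
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split 'key value' lines of a pointer file into a small dict."""
    fields = dict(line.strip().split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def verify_bytes(data: bytes, pointer: dict) -> bool:
    """Check in-memory content against the pointer's size and SHA-256 digest."""
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


def verify_file(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Same check for a file on disk, read in chunks to limit memory use."""
    hasher = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

Calling `verify_file("training_args.bin", pointer)` on a local copy would confirm that the download matches this commit. An unchanged size with a new digest shows that the contents changed but not how.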