thorirhrafn/llama_DPO_model_e2

@@ -18,15 +18,15 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.0572
-- Rewards/chosen: 0.4916
-- Rewards/rejected: -2.5677
 - Rewards/accuracies: 1.0
-- Rewards/margins: 3.0592
-- Logps/rejected: -210.4865
-- Logps/chosen: -155.7183
-- Logits/rejected: -1.0527
-- Logits/chosen: -0.8611
 ## Model description
@@ -45,7 +45,7 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 1e-06
 - train_batch_size: 1
 - eval_batch_size: 1
 - seed: 42
@@ -59,26 +59,26 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.6588        | 0.1   | 25   | 0.6197          | 0.0430         | -0.1117          | 0.9633             | 0.1547          | -185.9265      | -160.2034    | -1.0522         | -0.8546       |
-| 0.5198        | 0.2   | 50   | 0.4923          | 0.1198         | -0.3424          | 0.9933             | 0.4622          | -188.2335      | -159.4357    | -1.0525         | -0.8554       |
-| 0.422         | 0.3   | 75   | 0.3707          | 0.2016         | -0.6277          | 1.0                | 0.8293          | -191.0862      | -158.6175    | -1.0532         | -0.8571       |
-| 0.3133        | 0.4   | 100  | 0.2775          | 0.2622         | -0.9287          | 1.0                | 1.1908          | -194.0961      | -158.0122    | -1.0529         | -0.8575       |
-| 0.2536        | 0.5   | 125  | 0.2077          | 0.3244         | -1.2160          | 1.0                | 1.5403          | -196.9694      | -157.3904    | -1.0527         | -0.8608       |
-| 0.181         | 0.6   | 150  | 0.1559          | 0.3746         | -1.5115          | 1.0                | 1.8860          | -199.9242      | -156.8883    | -1.0534         | -0.8595       |
-| 0.1457        | 0.7   | 175  | 0.1203          | 0.4136         | -1.7795          | 1.0                | 2.1931          | -202.6049      | -156.4983    | -1.0534         | -0.8620       |
-| 0.1072        | 0.79  | 200  | 0.0950          | 0.4439         | -2.0245          | 1.0                | 2.4684          | -205.0550      | -156.1949    | -1.0532         | -0.8613       |
-| 0.0921        | 0.89  | 225  | 0.0792          | 0.4625         | -2.2196          | 1.0                | 2.6821          | -207.0056      | -156.0085    | -1.0535         | -0.8604       |
-| 0.0732        | 0.99  | 250  | 0.0694          | 0.4721         | -2.3665          | 1.0                | 2.8387          | -208.4748      | -155.9124    | -1.0530         | -0.8609       |
-| 0.0703        | 1.09  | 275  | 0.0636          | 0.4762         | -2.4589          | 1.0                | 2.9351          | -209.3987      | -155.8720    | -1.0527         | -0.8600       |
-| 0.0554        | 1.19  | 300  | 0.0606          | 0.4841         | -2.5053          | 1.0                | 2.9894          | -209.8628      | -155.7928    | -1.0528         | -0.8614       |
-| 0.0532        | 1.29  | 325  | 0.0592          | 0.4869         | -2.5331          | 1.0                | 3.0200          | -210.1407      | -155.7649    | -1.0527         | -0.8606       |
-| 0.061         | 1.39  | 350  | 0.0580          | 0.4912         | -2.5550          | 1.0                | 3.0462          | -210.3595      | -155.7218    | -1.0525         | -0.8611       |
-| 0.0612        | 1.49  | 375  | 0.0573          | 0.4930         | -2.5633          | 1.0                | 3.0563          | -210.4424      | -155.7034    | -1.0527         | -0.8613       |
-| 0.0539        | 1.59  | 400  | 0.0576          | 0.4921         | -2.5602          | 1.0                | 3.0523          | -210.4118      | -155.7133    | -1.0529         | -0.8596       |
-| 0.0517        | 1.69  | 425  | 0.0570          | 0.4917         | -2.5691          | 1.0                | 3.0608          | -210.5005      | -155.7172    | -1.0529         | -0.8602       |
-| 0.0627        | 1.79  | 450  | 0.0570          | 0.4938         | -2.5669          | 1.0                | 3.0607          | -210.4783      | -155.6961    | -1.0532         | -0.8608       |
-| 0.0575        | 1.89  | 475  | 0.0574          | 0.4911         | -2.5664          | 1.0                | 3.0574          | -210.4731      | -155.7233    | -1.0528         | -0.8612       |
-| 0.0578        | 1.99  | 500  | 0.0572          | 0.4916         | -2.5677          | 1.0                | 3.0592          | -210.4865      | -155.7183    | -1.0527         | -0.8611       |
 ### Framework versions

 This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.0937
+- Rewards/chosen: 0.4389
+- Rewards/rejected: -2.0384
 - Rewards/accuracies: 1.0
+- Rewards/margins: 2.4774
+- Logps/rejected: -205.1940
+- Logps/chosen: -156.2447
+- Logits/rejected: -1.0509
+- Logits/chosen: -0.8587
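As a quick consistency check on the metrics above, the sketch below assumes they follow the usual DPO logging convention, in which the reward margin is the chosen reward minus the rejected reward. The function name `reward_margin` is illustrative and not part of the model card; the numbers are copied from the updated evaluation results.

```python
# Final evaluation metrics copied from the updated model card.
rewards_chosen = 0.4389
rewards_rejected = -2.0384
reported_margin = 2.4774


def reward_margin(chosen: float, rejected: float) -> float:
    """Margin under the assumed convention: chosen reward minus rejected reward."""
    return chosen - rejected


computed_margin = reward_margin(rewards_chosen, rewards_rejected)

# The card rounds each value to four decimals, so allow a small tolerance.
assert abs(computed_margin - reported_margin) < 2e-4
print(f"computed margin = {computed_margin:.4f}, reported = {reported_margin:.4f}")
```

The same check holds for the previous version of the card (0.4916 and -2.5677 against a reported margin of 3.0592), again up to rounding in the last digit.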
 ## Model description
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- learning_rate: 8e-07
 - train_batch_size: 1
 - eval_batch_size: 1
 - seed: 42
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.673         | 0.1   | 25   | 0.6445          | 0.0273         | -0.0740          | 0.9000             | 0.1013          | -185.5491      | -160.3607    | -1.0521         | -0.8545       |
+| 0.5737        | 0.2   | 50   | 0.5485          | 0.0856         | -0.2335          | 0.9933             | 0.3190          | -187.1442      | -159.7781    | -1.0526         | -0.8551       |
+| 0.4843        | 0.3   | 75   | 0.4496          | 0.1470         | -0.4343          | 1.0                | 0.5814          | -189.1528      | -159.1637    | -1.0527         | -0.8571       |
+| 0.4006        | 0.4   | 100  | 0.3655          | 0.2043         | -0.6419          | 1.0                | 0.8462          | -191.2286      | -158.5909    | -1.0521         | -0.8556       |
+| 0.3417        | 0.5   | 125  | 0.2945          | 0.2551         | -0.8630          | 1.0                | 1.1180          | -193.4393      | -158.0833    | -1.0522         | -0.8562       |
+| 0.2601        | 0.6   | 150  | 0.2353          | 0.3032         | -1.0903          | 1.0                | 1.3935          | -195.7128      | -157.6020    | -1.0520         | -0.8597       |
+| 0.2197        | 0.7   | 175  | 0.1891          | 0.3442         | -1.3124          | 1.0                | 1.6565          | -197.9333      | -157.1923    | -1.0522         | -0.8579       |
+| 0.1675        | 0.79  | 200  | 0.1532          | 0.3815         | -1.5253          | 1.0                | 1.9067          | -200.0621      | -156.8192    | -1.0526         | -0.8582       |
+| 0.1417        | 0.89  | 225  | 0.1289          | 0.4011         | -1.7082          | 1.0                | 2.1094          | -201.8920      | -156.6225    | -1.0525         | -0.8585       |
+| 0.1203        | 0.99  | 250  | 0.1117          | 0.4214         | -1.8534          | 1.0                | 2.2748          | -203.3437      | -156.4196    | -1.0517         | -0.8603       |
+| 0.1156        | 1.09  | 275  | 0.1034          | 0.4296         | -1.9336          | 1.0                | 2.3633          | -204.1459      | -156.3377    | -1.0517         | -0.8590       |
+| 0.0942        | 1.19  | 300  | 0.0990          | 0.4310         | -1.9823          | 1.0                | 2.4133          | -204.6330      | -156.3240    | -1.0514         | -0.8577       |
+| 0.0903        | 1.29  | 325  | 0.0957          | 0.4380         | -2.0137          | 1.0                | 2.4517          | -204.9467      | -156.2539    | -1.0511         | -0.8593       |
+| 0.1023        | 1.39  | 350  | 0.0946          | 0.4384         | -2.0296          | 1.0                | 2.4680          | -205.1059      | -156.2503    | -1.0519         | -0.8587       |
+| 0.0984        | 1.49  | 375  | 0.0945          | 0.4352         | -2.0350          | 1.0                | 2.4702          | -205.1597      | -156.2819    | -1.0510         | -0.8580       |
+| 0.0899        | 1.59  | 400  | 0.0939          | 0.4360         | -2.0393          | 1.0                | 2.4752          | -205.2024      | -156.2742    | -1.0513         | -0.8594       |
+| 0.0883        | 1.69  | 425  | 0.0939          | 0.4374         | -2.0378          | 1.0                | 2.4752          | -205.1877      | -156.2598    | -1.0514         | -0.8590       |
+| 0.1011        | 1.79  | 450  | 0.0939          | 0.4368         | -2.0412          | 1.0                | 2.4781          | -205.2217      | -156.2654    | -1.0513         | -0.8583       |
+| 0.0962        | 1.89  | 475  | 0.0935          | 0.4403         | -2.0395          | 1.0                | 2.4798          | -205.2041      | -156.2308    | -1.0510         | -0.8574       |
+| 0.0971        | 1.99  | 500  | 0.0937          | 0.4389         | -2.0384          | 1.0                | 2.4774          | -205.1940      | -156.2447    | -1.0509         | -0.8587       |
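To make the effect of the learning-rate change easier to read off, the step-500 evaluation rows from the two versions of this table can be compared directly. This is a minimal sketch that hard-codes values from the diff; the dictionary keys and the helper `metric_delta` are illustrative names, not anything defined by the model card.

```python
# Evaluation metrics at step 500, copied from the removed and added table rows.
old_run = {"learning_rate": 1e-06, "loss": 0.0572, "rewards_margin": 3.0592}
new_run = {"learning_rate": 8e-07, "loss": 0.0937, "rewards_margin": 2.4774}


def metric_delta(old: dict, new: dict, key: str) -> float:
    """Return the new value minus the old value for one metric."""
    return new[key] - old[key]


for key in ("loss", "rewards_margin"):
    delta = metric_delta(old_run, new_run, key)
    print(f"{key}: {old_run[key]} -> {new_run[key]} (delta {delta:+.4f})")
```

With the lower learning rate, the validation loss at step 500 is higher and the reward margin is smaller, while the reward accuracy is 1.0 in both versions.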
 ### Framework versions