thorirhrafn/llama_DPO_model_e2

@@ -18,15 +18,15 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.1001
-- Rewards/chosen: 0.4226
-- Rewards/rejected: -1.9804
 - Rewards/accuracies: 1.0
-- Rewards/margins: 2.4030
-- Logps/rejected: -204.6132
-- Logps/chosen: -156.4080
-- Logits/rejected: -1.0519
-- Logits/chosen: -0.8585
 ## Model description
@@ -45,7 +45,7 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 6e-07
 - train_batch_size: 1
 - eval_batch_size: 1
 - seed: 42
@@ -53,42 +53,32 @@ The following hyperparameters were used during training:
 - total_train_batch_size: 8
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
-- num_epochs: 3
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.6757        | 0.1   | 25   | 0.6650          | 0.0149         | -0.0435          | 0.7767             | 0.0584          | -185.2444      | -160.4850    | -1.0519         | -0.8543       |
-| 0.6136        | 0.2   | 50   | 0.5989          | 0.0552         | -0.1462          | 0.9567             | 0.2014          | -186.2718      | -160.0822    | -1.0523         | -0.8553       |
-| 0.5526        | 0.3   | 75   | 0.5225          | 0.1032         | -0.2804          | 1.0                | 0.3837          | -187.6138      | -159.6014    | -1.0520         | -0.8542       |
-| 0.4819        | 0.4   | 100  | 0.4502          | 0.1474         | -0.4325          | 0.9967             | 0.5798          | -189.1341      | -159.1602    | -1.0518         | -0.8548       |
-| 0.4253        | 0.5   | 125  | 0.3835          | 0.1905         | -0.5943          | 1.0                | 0.7848          | -190.7523      | -158.7284    | -1.0527         | -0.8564       |
-| 0.3448        | 0.6   | 150  | 0.3197          | 0.2328         | -0.7813          | 1.0                | 1.0141          | -192.6229      | -158.3063    | -1.0526         | -0.8559       |
-| 0.3007        | 0.7   | 175  | 0.2637          | 0.2788         | -0.9753          | 1.0                | 1.2542          | -194.5630      | -157.8456    | -1.0525         | -0.8586       |
-| 0.2369        | 0.79  | 200  | 0.2192          | 0.3135         | -1.1671          | 1.0                | 1.4807          | -196.4808      | -157.4985    | -1.0519         | -0.8604       |
-| 0.1987        | 0.89  | 225  | 0.1825          | 0.3436         | -1.3550          | 1.0                | 1.6986          | -198.3592      | -157.1976    | -1.0520         | -0.8594       |
-| 0.1616        | 0.99  | 250  | 0.1532          | 0.3687         | -1.5379          | 1.0                | 1.9066          | -200.1886      | -156.9470    | -1.0519         | -0.8604       |
-| 0.1525        | 1.09  | 275  | 0.1346          | 0.3861         | -1.6703          | 1.0                | 2.0564          | -201.5127      | -156.7730    | -1.0511         | -0.8582       |
-| 0.1194        | 1.19  | 300  | 0.1246          | 0.3970         | -1.7483          | 1.0                | 2.1453          | -202.2923      | -156.6637    | -1.0509         | -0.8584       |
-| 0.1128        | 1.29  | 325  | 0.1161          | 0.4062         | -1.8227          | 1.0                | 2.2289          | -203.0370      | -156.5718    | -1.0511         | -0.8577       |
-| 0.1194        | 1.39  | 350  | 0.1108          | 0.4127         | -1.8680          | 1.0                | 2.2807          | -203.4899      | -156.5069    | -1.0514         | -0.8602       |
-| 0.1123        | 1.49  | 375  | 0.1070          | 0.4151         | -1.9092          | 1.0                | 2.3243          | -203.9014      | -156.4828    | -1.0515         | -0.8584       |
-| 0.1008        | 1.59  | 400  | 0.1046          | 0.4209         | -1.9290          | 1.0                | 2.3499          | -204.0999      | -156.4248    | -1.0516         | -0.8618       |
-| 0.0971        | 1.69  | 425  | 0.1033          | 0.4208         | -1.9461          | 1.0                | 2.3669          | -204.2709      | -156.4260    | -1.0510         | -0.8586       |
-| 0.109         | 1.79  | 450  | 0.1019          | 0.4235         | -1.9597          | 1.0                | 2.3832          | -204.4061      | -156.3985    | -1.0510         | -0.8587       |
-| 0.1035        | 1.89  | 475  | 0.1009          | 0.4234         | -1.9700          | 1.0                | 2.3934          | -204.5094      | -156.4001    | -1.0517         | -0.8580       |
-| 0.1046        | 1.99  | 500  | 0.1004          | 0.4210         | -1.9772          | 1.0                | 2.3983          | -204.5820      | -156.4234    | -1.0511         | -0.8603       |
-| 0.0961        | 2.09  | 525  | 0.1002          | 0.4227         | -1.9798          | 1.0                | 2.4025          | -204.6080      | -156.4070    | -1.0518         | -0.8587       |
-| 0.0932        | 2.19  | 550  | 0.1000          | 0.4237         | -1.9796          | 1.0                | 2.4033          | -204.6052      | -156.3964    | -1.0518         | -0.8597       |
-| 0.0901        | 2.29  | 575  | 0.1002          | 0.4231         | -1.9785          | 1.0                | 2.4015          | -204.5942      | -156.4030    | -1.0514         | -0.8594       |
-| 0.1033        | 2.38  | 600  | 0.1003          | 0.4248         | -1.9780          | 1.0                | 2.4028          | -204.5901      | -156.3859    | -1.0517         | -0.8616       |
-| 0.1108        | 2.48  | 625  | 0.0999          | 0.4262         | -1.9796          | 1.0                | 2.4057          | -204.6053      | -156.3723    | -1.0517         | -0.8583       |
-| 0.1026        | 2.58  | 650  | 0.0998          | 0.4208         | -1.9879          | 1.0                | 2.4088          | -204.6889      | -156.4255    | -1.0522         | -0.8594       |
-| 0.0956        | 2.68  | 675  | 0.1001          | 0.4227         | -1.9818          | 1.0                | 2.4045          | -204.6279      | -156.4070    | -1.0517         | -0.8588       |
-| 0.1003        | 2.78  | 700  | 0.0996          | 0.4241         | -1.9817          | 1.0                | 2.4058          | -204.6262      | -156.3926    | -1.0516         | -0.8584       |
-| 0.0874        | 2.88  | 725  | 0.0997          | 0.4228         | -1.9835          | 1.0                | 2.4064          | -204.6450      | -156.4057    | -1.0519         | -0.8609       |
-| 0.1001        | 2.98  | 750  | 0.1001          | 0.4226         | -1.9804          | 1.0                | 2.4030          | -204.6132      | -156.4080    | -1.0519         | -0.8585       |
 ### Framework versions

 This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.0896
+- Rewards/chosen: 0.4401
+- Rewards/rejected: -2.0930
 - Rewards/accuracies: 1.0
+- Rewards/margins: 2.5330
+- Logps/rejected: -205.7391
+- Logps/chosen: -156.2334
+- Logits/rejected: -1.0514
+- Logits/chosen: -0.8587
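
The reward figures in this list can be cross-checked against each other. The sketch below assumes the usual DPO bookkeeping, in which `Rewards/margins` is the chosen reward minus the rejected reward; the card itself does not state this, and the variable names are illustrative only.

```python
# Values copied from the evaluation summary above (new side of the diff).
rewards_chosen = 0.4401
rewards_rejected = -2.0930
reported_margin = 2.5330

# Assumption: margin = chosen reward - rejected reward.
computed_margin = rewards_chosen - rewards_rejected

# The reported figures are rounded to four decimals, so allow a small tolerance.
margin_matches = abs(computed_margin - reported_margin) < 2e-4

print(f"computed={computed_margin:.4f} reported={reported_margin:.4f} match={margin_matches}")
```

The same check applied to the old side of the diff (0.4226 and -1.9804) reproduces the reported margin of 2.4030.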
 ## Model description
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- learning_rate: 8e-07
 - train_batch_size: 1
 - eval_batch_size: 1
 - seed: 42
 - total_train_batch_size: 8
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
+- num_epochs: 2
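
To make these settings concrete, here is a minimal sketch of what they imply. It rests on two assumptions not confirmed by the diff: that the total batch size is the per-device batch size times a combined device and gradient-accumulation factor, and that the linear schedule decays to zero with no warmup. The total of 500 steps is taken from the last logged step in the results table and may differ slightly from the true number of optimizer steps.

```python
# Values listed in the hyperparameter block above.
learning_rate = 8e-07
train_batch_size = 1
total_train_batch_size = 8

# Assumption: total batch = per-device batch * (devices * gradient accumulation steps).
# The diff does not show how this factor splits between devices and accumulation.
combined_factor = total_train_batch_size // train_batch_size


def linear_lr(step: int, total_steps: int, base_lr: float = learning_rate) -> float:
    """Linear decay from base_lr to 0, assuming no warmup phase."""
    remaining = max(0.0, 1.0 - step / total_steps)
    return base_lr * remaining


# Hypothetical step count, read off the last row of the results table.
total_steps = 500
for step in (0, 250, 500):
    print(step, linear_lr(step, total_steps))
```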
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.6699        | 0.1   | 25   | 0.6428          | 0.0307         | -0.0744          | 0.9033             | 0.1051          | -185.5532      | -160.3267    | -1.0520         | -0.8550       |
+| 0.5702        | 0.2   | 50   | 0.5471          | 0.0866         | -0.2359          | 0.9933             | 0.3225          | -187.1690      | -159.7680    | -1.0514         | -0.8544       |
+| 0.488         | 0.3   | 75   | 0.4456          | 0.1502         | -0.4424          | 1.0                | 0.5926          | -189.2334      | -159.1314    | -1.0527         | -0.8555       |
+| 0.3957        | 0.4   | 100  | 0.3600          | 0.2054         | -0.6615          | 1.0                | 0.8669          | -191.4245      | -158.5795    | -1.0530         | -0.8577       |
+| 0.3338        | 0.5   | 125  | 0.2865          | 0.2569         | -0.8933          | 1.0                | 1.1502          | -193.7425      | -158.0646    | -1.0524         | -0.8564       |
+| 0.253         | 0.6   | 150  | 0.2257          | 0.3043         | -1.1373          | 1.0                | 1.4416          | -196.1830      | -157.5914    | -1.0523         | -0.8570       |
+| 0.2134        | 0.7   | 175  | 0.1819          | 0.3496         | -1.3537          | 1.0                | 1.7033          | -198.3466      | -157.1379    | -1.0530         | -0.8584       |
+| 0.1613        | 0.79  | 200  | 0.1473          | 0.3842         | -1.5693          | 1.0                | 1.9535          | -200.5027      | -156.7917    | -1.0525         | -0.8591       |
+| 0.1358        | 0.89  | 225  | 0.1231          | 0.4031         | -1.7582          | 1.0                | 2.1614          | -202.3919      | -156.6024    | -1.0523         | -0.8593       |
+| 0.115         | 0.99  | 250  | 0.1076          | 0.4205         | -1.8980          | 1.0                | 2.3185          | -203.7897      | -156.4292    | -1.0521         | -0.8590       |
+| 0.1111        | 1.09  | 275  | 0.0989          | 0.4291         | -1.9856          | 1.0                | 2.4148          | -204.6660      | -156.3426    | -1.0515         | -0.8591       |
+| 0.0902        | 1.19  | 300  | 0.0949          | 0.4280         | -2.0337          | 1.0                | 2.4617          | -205.1465      | -156.3540    | -1.0507         | -0.8576       |
+| 0.0867        | 1.29  | 325  | 0.0920          | 0.4325         | -2.0705          | 1.0                | 2.5030          | -205.5146      | -156.3087    | -1.0510         | -0.8576       |
+| 0.0973        | 1.39  | 350  | 0.0905          | 0.4357         | -2.0839          | 1.0                | 2.5196          | -205.6485      | -156.2766    | -1.0506         | -0.8576       |
+| 0.0942        | 1.49  | 375  | 0.0897          | 0.4422         | -2.0838          | 1.0                | 2.5260          | -205.6476      | -156.2122    | -1.0515         | -0.8578       |
+| 0.0858        | 1.59  | 400  | 0.0897          | 0.4392         | -2.0903          | 1.0                | 2.5295          | -205.7121      | -156.2415    | -1.0515         | -0.8587       |
+| 0.083         | 1.69  | 425  | 0.0893          | 0.4401         | -2.0972          | 1.0                | 2.5373          | -205.7811      | -156.2327    | -1.0511         | -0.8584       |
+| 0.0964        | 1.79  | 450  | 0.0897          | 0.4368         | -2.0947          | 1.0                | 2.5315          | -205.7564      | -156.2662    | -1.0511         | -0.8577       |
+| 0.0931        | 1.89  | 475  | 0.0890          | 0.4406         | -2.0970          | 1.0                | 2.5376          | -205.7794      | -156.2282    | -1.0512         | -0.8585       |
+| 0.0915        | 1.99  | 500  | 0.0896          | 0.4401         | -2.0930          | 1.0                | 2.5330          | -205.7391      | -156.2334    | -1.0514         | -0.8587       |
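
The table can also be used to locate the best checkpoint and to estimate the size of the training set. The following is a rough sketch: the rows are copied from the end of the table above, and the dataset-size estimate assumes that each optimizer step consumes `total_train_batch_size` = 8 preference pairs.

```python
# (step, epoch, validation_loss) triples copied from the last rows of the table above.
rows = [
    (400, 1.59, 0.0897),
    (425, 1.69, 0.0893),
    (450, 1.79, 0.0897),
    (475, 1.89, 0.0890),
    (500, 1.99, 0.0896),
]

# Row with the lowest validation loss.
best_step, best_epoch, best_loss = min(rows, key=lambda r: r[2])

# Rough steps-per-epoch estimate; the epoch column is rounded to two decimals.
last_step, last_epoch, last_loss = rows[-1]
steps_per_epoch = last_step / last_epoch

# Assumption: 8 preference pairs per optimizer step.
approx_train_pairs = round(steps_per_epoch * 8)

print(best_step, best_loss, round(steps_per_epoch), approx_train_pairs)
```

The lowest validation loss in the table is 0.0890 at step 475, and the final row at step 500 sits within 0.0006 of it, so validation loss has essentially plateaued by the end of the second epoch.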
 ### Framework versions