thorirhrafn/llama_DPO_model_e2

@@ -18,15 +18,15 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.0937
-- Rewards/chosen: 0.4389
-- Rewards/rejected: -2.0384
 - Rewards/accuracies: 1.0
-- Rewards/margins: 2.4774
-- Logps/rejected: -205.1940
-- Logps/chosen: -156.2447
-- Logits/rejected: -1.0509
-- Logits/chosen: -0.8587
 ## Model description
@@ -45,7 +45,7 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 8e-07
 - train_batch_size: 1
 - eval_batch_size: 1
 - seed: 42
@@ -53,32 +53,42 @@ The following hyperparameters were used during training:
 - total_train_batch_size: 8
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
-- num_epochs: 2
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.673         | 0.1   | 25   | 0.6445          | 0.0273         | -0.0740          | 0.9000             | 0.1013          | -185.5491      | -160.3607    | -1.0521         | -0.8545       |
-| 0.5737        | 0.2   | 50   | 0.5485          | 0.0856         | -0.2335          | 0.9933             | 0.3190          | -187.1442      | -159.7781    | -1.0526         | -0.8551       |
-| 0.4843        | 0.3   | 75   | 0.4496          | 0.1470         | -0.4343          | 1.0                | 0.5814          | -189.1528      | -159.1637    | -1.0527         | -0.8571       |
-| 0.4006        | 0.4   | 100  | 0.3655          | 0.2043         | -0.6419          | 1.0                | 0.8462          | -191.2286      | -158.5909    | -1.0521         | -0.8556       |
-| 0.3417        | 0.5   | 125  | 0.2945          | 0.2551         | -0.8630          | 1.0                | 1.1180          | -193.4393      | -158.0833    | -1.0522         | -0.8562       |
-| 0.2601        | 0.6   | 150  | 0.2353          | 0.3032         | -1.0903          | 1.0                | 1.3935          | -195.7128      | -157.6020    | -1.0520         | -0.8597       |
-| 0.2197        | 0.7   | 175  | 0.1891          | 0.3442         | -1.3124          | 1.0                | 1.6565          | -197.9333      | -157.1923    | -1.0522         | -0.8579       |
-| 0.1675        | 0.79  | 200  | 0.1532          | 0.3815         | -1.5253          | 1.0                | 1.9067          | -200.0621      | -156.8192    | -1.0526         | -0.8582       |
-| 0.1417        | 0.89  | 225  | 0.1289          | 0.4011         | -1.7082          | 1.0                | 2.1094          | -201.8920      | -156.6225    | -1.0525         | -0.8585       |
-| 0.1203        | 0.99  | 250  | 0.1117          | 0.4214         | -1.8534          | 1.0                | 2.2748          | -203.3437      | -156.4196    | -1.0517         | -0.8603       |
-| 0.1156        | 1.09  | 275  | 0.1034          | 0.4296         | -1.9336          | 1.0                | 2.3633          | -204.1459      | -156.3377    | -1.0517         | -0.8590       |
-| 0.0942        | 1.19  | 300  | 0.0990          | 0.4310         | -1.9823          | 1.0                | 2.4133          | -204.6330      | -156.3240    | -1.0514         | -0.8577       |
-| 0.0903        | 1.29  | 325  | 0.0957          | 0.4380         | -2.0137          | 1.0                | 2.4517          | -204.9467      | -156.2539    | -1.0511         | -0.8593       |
-| 0.1023        | 1.39  | 350  | 0.0946          | 0.4384         | -2.0296          | 1.0                | 2.4680          | -205.1059      | -156.2503    | -1.0519         | -0.8587       |
-| 0.0984        | 1.49  | 375  | 0.0945          | 0.4352         | -2.0350          | 1.0                | 2.4702          | -205.1597      | -156.2819    | -1.0510         | -0.8580       |
-| 0.0899        | 1.59  | 400  | 0.0939          | 0.4360         | -2.0393          | 1.0                | 2.4752          | -205.2024      | -156.2742    | -1.0513         | -0.8594       |
-| 0.0883        | 1.69  | 425  | 0.0939          | 0.4374         | -2.0378          | 1.0                | 2.4752          | -205.1877      | -156.2598    | -1.0514         | -0.8590       |
-| 0.1011        | 1.79  | 450  | 0.0939          | 0.4368         | -2.0412          | 1.0                | 2.4781          | -205.2217      | -156.2654    | -1.0513         | -0.8583       |
-| 0.0962        | 1.89  | 475  | 0.0935          | 0.4403         | -2.0395          | 1.0                | 2.4798          | -205.2041      | -156.2308    | -1.0510         | -0.8574       |
-| 0.0971        | 1.99  | 500  | 0.0937          | 0.4389         | -2.0384          | 1.0                | 2.4774          | -205.1940      | -156.2447    | -1.0509         | -0.8587       |
 ### Framework versions

 This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.1526
+- Rewards/chosen: 0.3611
+- Rewards/rejected: -1.5450
 - Rewards/accuracies: 1.0
+- Rewards/margins: 1.9061
+- Logps/rejected: -200.2592
+- Logps/chosen: -157.0226
+- Logits/rejected: -1.0513
+- Logits/chosen: -0.8571
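
As a quick consistency check on the metrics above, the reported margin should equal the chosen reward minus the rejected reward. The sketch below also evaluates `-log(sigmoid(margin))` at that mean margin. This assumes the standard sigmoid DPO objective with rewards already scaled by beta, which the card does not state. The helper names `reward_margin` and `dpo_loss_at_margin` are hypothetical and used only for illustration.

```python
import math


def reward_margin(chosen: float, rejected: float) -> float:
    """Margin between the chosen and rejected implicit rewards."""
    return chosen - rejected


def dpo_loss_at_margin(margin: float) -> float:
    """-log(sigmoid(margin)), written as log(1 + exp(-margin)).

    Assumption: sigmoid DPO loss; the card does not name the loss type.
    """
    return math.log1p(math.exp(-margin))


# Final evaluation metrics from the updated card.
chosen, rejected = 0.3611, -1.5450
reported_margin = 1.9061
reported_loss = 0.1526

margin = reward_margin(chosen, rejected)
loss_at_mean_margin = dpo_loss_at_margin(margin)

print(f"margin: {margin:.4f} (reported {reported_margin})")
print(f"loss at mean margin: {loss_at_mean_margin:.4f} (reported mean loss {reported_loss})")
```

Under that assumption, the value computed at the mean margin is only a lower bound on the mean evaluation loss, because `-log(sigmoid(x))` is convex. A reported loss slightly above it is therefore expected rather than a discrepancy.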
 ## Model description
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- learning_rate: 5e-07
 - train_batch_size: 1
 - eval_batch_size: 1
 - seed: 42
 - total_train_batch_size: 8
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
+- num_epochs: 3
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.6819        | 0.1   | 25   | 0.6708          | 0.0151         | -0.0312          | 0.7567             | 0.0463          | -185.1220      | -160.4831    | -1.0517         | -0.8540       |
+| 0.6351        | 0.2   | 50   | 0.6228          | 0.0428         | -0.1054          | 0.9600             | 0.1482          | -185.8636      | -160.2060    | -1.0524         | -0.8552       |
+| 0.5874        | 0.3   | 75   | 0.5655          | 0.0762         | -0.2019          | 0.9967             | 0.2781          | -186.8286      | -159.8719    | -1.0525         | -0.8548       |
+| 0.5179        | 0.4   | 100  | 0.5030          | 0.1133         | -0.3207          | 1.0                | 0.4340          | -188.0166      | -159.5010    | -1.0521         | -0.8545       |
+| 0.479         | 0.5   | 125  | 0.4468          | 0.1501         | -0.4388          | 1.0                | 0.5889          | -189.1974      | -159.1327    | -1.0524         | -0.8554       |
+| 0.406         | 0.6   | 150  | 0.3904          | 0.1842         | -0.5778          | 1.0                | 0.7620          | -190.5874      | -158.7915    | -1.0525         | -0.8576       |
+| 0.3731        | 0.7   | 175  | 0.3377          | 0.2223         | -0.7247          | 1.0                | 0.9470          | -192.0564      | -158.4104    | -1.0521         | -0.8559       |
+| 0.3075        | 0.79  | 200  | 0.2918          | 0.2537         | -0.8769          | 1.0                | 1.1305          | -193.5782      | -158.0974    | -1.0525         | -0.8583       |
+| 0.2621        | 0.89  | 225  | 0.2517          | 0.2822         | -1.0278          | 1.0                | 1.3100          | -195.0876      | -157.8119    | -1.0525         | -0.8573       |
+| 0.2285        | 0.99  | 250  | 0.2180          | 0.3118         | -1.1738          | 1.0                | 1.4855          | -196.5471      | -157.5160    | -1.0517         | -0.8568       |
+| 0.2162        | 1.09  | 275  | 0.1948          | 0.3279         | -1.2897          | 1.0                | 1.6176          | -197.7066      | -157.3551    | -1.0513         | -0.8567       |
+| 0.1752        | 1.19  | 300  | 0.1810          | 0.3383         | -1.3661          | 1.0                | 1.7044          | -198.4706      | -157.2514    | -1.0511         | -0.8576       |
+| 0.1672        | 1.29  | 325  | 0.1714          | 0.3456         | -1.4242          | 1.0                | 1.7698          | -199.0516      | -157.1775    | -1.0509         | -0.8568       |
+| 0.1722        | 1.39  | 350  | 0.1646          | 0.3535         | -1.4653          | 1.0                | 1.8187          | -199.4624      | -157.0993    | -1.0510         | -0.8568       |
+| 0.1649        | 1.49  | 375  | 0.1596          | 0.3586         | -1.4919          | 1.0                | 1.8505          | -199.7286      | -157.0477    | -1.0512         | -0.8569       |
+| 0.1534        | 1.59  | 400  | 0.1580          | 0.3603         | -1.5059          | 1.0                | 1.8663          | -199.8687      | -157.0304    | -1.0507         | -0.8571       |
+| 0.1492        | 1.69  | 425  | 0.1561          | 0.3589         | -1.5194          | 1.0                | 1.8783          | -200.0034      | -157.0448    | -1.0514         | -0.8578       |
+| 0.1625        | 1.79  | 450  | 0.1564          | 0.3586         | -1.5205          | 1.0                | 1.8791          | -200.0150      | -157.0482    | -1.0509         | -0.8570       |
+| 0.1561        | 1.89  | 475  | 0.1535          | 0.3613         | -1.5366          | 1.0                | 1.8979          | -200.1756      | -157.0212    | -1.0510         | -0.8576       |
+| 0.1565        | 1.99  | 500  | 0.1529          | 0.3643         | -1.5393          | 1.0                | 1.9036          | -200.2028      | -156.9913    | -1.0513         | -0.8567       |
+| 0.1476        | 2.09  | 525  | 0.1530          | 0.3640         | -1.5392          | 1.0                | 1.9032          | -200.2021      | -156.9944    | -1.0511         | -0.8569       |
+| 0.1457        | 2.19  | 550  | 0.1530          | 0.3605         | -1.5406          | 1.0                | 1.9011          | -200.2155      | -157.0287    | -1.0507         | -0.8577       |
+| 0.1376        | 2.29  | 575  | 0.1529          | 0.3585         | -1.5466          | 1.0                | 1.9051          | -200.2757      | -157.0492    | -1.0508         | -0.8579       |
+| 0.1574        | 2.38  | 600  | 0.1527          | 0.3634         | -1.5448          | 1.0                | 1.9082          | -200.2574      | -156.9998    | -1.0508         | -0.8566       |
+| 0.1662        | 2.48  | 625  | 0.1518          | 0.3645         | -1.5465          | 1.0                | 1.9109          | -200.2742      | -156.9890    | -1.0509         | -0.8572       |
+| 0.1535        | 2.58  | 650  | 0.1523          | 0.3628         | -1.5458          | 1.0                | 1.9086          | -200.2675      | -157.0059    | -1.0510         | -0.8571       |
+| 0.1488        | 2.68  | 675  | 0.1518          | 0.3658         | -1.5446          | 1.0                | 1.9104          | -200.2561      | -156.9763    | -1.0510         | -0.8572       |
+| 0.1564        | 2.78  | 700  | 0.1526          | 0.3618         | -1.5452          | 1.0                | 1.9071          | -200.2618      | -157.0154    | -1.0512         | -0.8568       |
+| 0.1367        | 2.88  | 725  | 0.1526          | 0.3643         | -1.5426          | 1.0                | 1.9069          | -200.2352      | -156.9905    | -1.0513         | -0.8570       |
+| 0.1543        | 2.98  | 750  | 0.1526          | 0.3611         | -1.5450          | 1.0                | 1.9061          | -200.2592      | -157.0226    | -1.0513         | -0.8571       |
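
The headline evaluation loss (0.1526) is the value at the last logged step, while the table shows a slightly lower validation loss of 0.1518 at steps 625 and 675. A minimal sketch of reading rows like these and comparing the best and final checkpoints follows. Only the first four columns of the last six rows are reproduced, and the `parse_rows` helper is hypothetical.

```python
# First four columns (Training Loss, Epoch, Step, Validation Loss)
# of the last six rows of the updated table.
ROWS = """\
| 0.1662 | 2.48 | 625 | 0.1518 |
| 0.1535 | 2.58 | 650 | 0.1523 |
| 0.1488 | 2.68 | 675 | 0.1518 |
| 0.1564 | 2.78 | 700 | 0.1526 |
| 0.1367 | 2.88 | 725 | 0.1526 |
| 0.1543 | 2.98 | 750 | 0.1526 |
"""


def parse_rows(text: str) -> list[dict]:
    """Parse markdown table rows into dicts keyed by column meaning."""
    records = []
    for line in text.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        train_loss, epoch, step, val_loss = cells[:4]
        records.append(
            {
                "train_loss": float(train_loss),
                "epoch": float(epoch),
                "step": int(step),
                "val_loss": float(val_loss),
            }
        )
    return records


records = parse_rows(ROWS)
best = min(records, key=lambda r: r["val_loss"])  # first row with the lowest loss
final = records[-1]

print(f"best:  step {best['step']}, validation loss {best['val_loss']}")
print(f"final: step {final['step']}, validation loss {final['val_loss']}")
```

The gap between the best and final rows is under 0.001, which suggests the run had plateaued by the third epoch.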
 ### Framework versions