mnoukhov/pythia410m-dpo2-tldr

PEFT

Safetensors

Generated from Trainer

mnoukhov committed on May 17, 2024

Commit 9b839f5 (verified) · 1 Parent(s): bbd6669

mnoukhov/pythia410m-dpo2-tldr

Files changed (1)

README.md +12 -11

@@ -16,13 +16,13 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [mnoukhov/pythia410m-sft-tldr](https://huggingface.co/mnoukhov/pythia410m-sft-tldr) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.6553
-- Rewards/chosen: -0.0115
-- Rewards/rejected: -0.0984
-- Rewards/accuracies: 0.6726
-- Rewards/margins: 0.0869
-- Logps/rejected: -65.8898
-- Logps/chosen: -65.8898
+- Loss: 0.6073
+- Rewards/chosen: -1.2728
+- Rewards/rejected: -1.5670
+- Rewards/accuracies: 0.6761
+- Rewards/margins: 0.2942
+- Logps/rejected: -91.1163
+- Logps/chosen: -91.1163
 - Logps/ref Rejected: -59.5615
 - Logps/ref Chosen: -65.6594
@@ -61,10 +61,11 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logps/ref Rejected | Logps/ref Chosen |
 |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:------------------:|:----------------:|
-| No log        | 0.2016 | 63   | 0.6719          | 0.0165         | -0.0292          | 0.6739             | 0.0457          | -65.3302       | -65.3302     | -59.5615           | -65.6594         |
-| 0.6865        | 0.4032 | 126  | 0.6614          | 0.0030         | -0.0680          | 0.6752             | 0.0710          | -65.5994       | -65.5994     | -59.5615           | -65.6594         |
-| 0.6865        | 0.6048 | 189  | 0.6571          | -0.0046        | -0.0866          | 0.6744             | 0.0820          | -65.7515       | -65.7515     | -59.5615           | -65.6594         |
-| 0.6771        | 0.8064 | 252  | 0.6553          | -0.0115        | -0.0984          | 0.6726             | 0.0869          | -65.8898       | -65.8898     | -59.5615           | -65.6594         |
+| 0.6681        | 0.1999 | 335  | 0.6376          | -0.2343        | -0.3789          | 0.6615             | 0.1446          | -70.3464       | -70.3464     | -59.5615           | -65.6594         |
+| 0.6485        | 0.3999 | 670  | 0.6171          | -0.9421        | -1.1796          | 0.6678             | 0.2375          | -84.5023       | -84.5023     | -59.5615           | -65.6594         |
+| 0.6362        | 0.5998 | 1005 | 0.6095          | -1.1035        | -1.3785          | 0.6743             | 0.2750          | -87.7290       | -87.7290     | -59.5615           | -65.6594         |
+| 0.6342        | 0.7998 | 1340 | 0.6063          | -1.2460        | -1.5415          | 0.6768             | 0.2955          | -90.5797       | -90.5797     | -59.5615           | -65.6594         |
+| 0.6299        | 0.9997 | 1675 | 0.6073          | -1.2728        | -1.5670          | 0.6761             | 0.2942          | -91.1163       | -91.1163     | -59.5615           | -65.6594         |
 ### Framework versions
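
Reviewer note on the changed metrics: the README does not define `Rewards/margins`, but in the usual DPO logging convention it is the chosen reward minus the rejected reward. The sketch below is a sanity check under that assumption, not part of the commit. It confirms that the final evaluation rows before and after this change are internally consistent. The names `REPORTED`, `margin`, and `check_margins` are hypothetical helpers introduced only for illustration.

```python
from math import isclose

# (Rewards/chosen, Rewards/rejected, Rewards/margins) copied from the final
# evaluation row on each side of the diff.
REPORTED = {
    "before (step 252)": (-0.0115, -0.0984, 0.0869),
    "after (step 1675)": (-1.2728, -1.5670, 0.2942),
}


def margin(chosen: float, rejected: float) -> float:
    """Assumed definition: reward margin = chosen reward - rejected reward."""
    return chosen - rejected


def check_margins(rows: dict, tol: float = 1e-4) -> dict:
    """Return, per row, whether the reported margin matches chosen - rejected.

    The tolerance covers the four-decimal rounding used in the README.
    """
    return {
        name: isclose(margin(chosen, rejected), reported, abs_tol=tol)
        for name, (chosen, rejected, reported) in rows.items()
    }


if __name__ == "__main__":
    for name, ok in check_margins(REPORTED).items():
        print(f"{name}: {'consistent' if ok else 'inconsistent'}")
```

The same check can be run over every table row by adding its three reward values to `REPORTED`.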