Commit f0eb78a
Parent: 309e492

Mojiz-DPO

Files changed:
- README.md (+27 -17)
- adapter_model.safetensors (+1 -1)
- training_args.bin (+1 -1)
README.md CHANGED

@@ -17,14 +17,14 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [ahmedabdelwahed/Mojiz-sft](https://huggingface.co/ahmedabdelwahed/Mojiz-sft) on the None dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.0000
-- Rewards/chosen:
-- Rewards/rejected: -
+- Rewards/chosen: 17.3685
+- Rewards/rejected: -9.0879
 - Rewards/accuracies: 1.0
-- Rewards/margins:
-- Logps/rejected: -
-- Logps/chosen: -
-- Logits/rejected: -11.
-- Logits/chosen: -12.
+- Rewards/margins: 26.4564
+- Logps/rejected: -89.2152
+- Logps/chosen: -291.5760
+- Logits/rejected: -11.3625
+- Logits/chosen: -12.3502
 
 ## Model description
 
@@ -50,22 +50,32 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 150
-- training_steps:
+- training_steps: 2000
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
 | 0.0017 | 0.41 | 100 | 0.0000 | 9.9359 | -3.7597 | 1.0 | 13.6956 | -78.5589 | -306.4413 | -11.4127 | -12.4541 |
-| 0.0002 | 0.82 | 200 | 0.0000 | 14.
-| 0.
-| 0.0 | 1.63 | 400 | 0.0000 | 15.
-| 0.0 | 2.04 | 500 | 0.0000 | 15.
-| 0.0 | 2.45 | 600 | 0.0000 | 16.
-| 0.0 | 2.86 | 700 | 0.0000 | 16.
-| 0.0 | 3.27 | 800 | 0.0000 |
-| 0.0 | 3.67 | 900 | 0.0000 |
-| 0.0 | 4.08 | 1000 | 0.0000 |
+| 0.0002 | 0.82 | 200 | 0.0000 | 14.2220 | -5.8111 | 1.0 | 20.0331 | -82.6616 | -297.8690 | -11.3000 | -12.2683 |
+| 0.0036 | 1.22 | 300 | 0.0000 | 14.9129 | -6.8369 | 1.0 | 21.7499 | -84.7133 | -296.4872 | -11.2647 | -12.2239 |
+| 0.0 | 1.63 | 400 | 0.0000 | 15.5788 | -7.9152 | 1.0 | 23.4939 | -86.8698 | -295.1555 | -11.2175 | -12.1437 |
+| 0.0 | 2.04 | 500 | 0.0000 | 15.9537 | -8.1483 | 1.0 | 24.1020 | -87.3360 | -294.4057 | -11.2256 | -12.1568 |
+| 0.0 | 2.45 | 600 | 0.0000 | 16.3724 | -8.1224 | 1.0 | 24.4948 | -87.2842 | -293.5682 | -11.2611 | -12.2092 |
+| 0.0 | 2.86 | 700 | 0.0000 | 16.2464 | -8.7026 | 1.0 | 24.9490 | -88.4446 | -293.8203 | -11.2351 | -12.1506 |
+| 0.0 | 3.27 | 800 | 0.0000 | 17.0144 | -7.9818 | 1.0 | 24.9962 | -87.0030 | -292.2843 | -11.3359 | -12.3118 |
+| 0.0 | 3.67 | 900 | 0.0000 | 17.0212 | -8.0028 | 1.0 | 25.0241 | -87.0451 | -292.2705 | -11.3356 | -12.3114 |
+| 0.0 | 4.08 | 1000 | 0.0000 | 17.1508 | -8.0673 | 1.0 | 25.2182 | -87.1741 | -292.0114 | -11.3467 | -12.3293 |
+| 0.0 | 4.49 | 1100 | 0.0000 | 17.1035 | -8.6586 | 1.0 | 25.7621 | -88.3567 | -292.1060 | -11.3427 | -12.3196 |
+| 0.0 | 4.9 | 1200 | 0.0000 | 17.1191 | -8.6910 | 1.0 | 25.8101 | -88.4214 | -292.0748 | -11.3428 | -12.3192 |
+| 0.0 | 5.31 | 1300 | 0.0000 | 17.2084 | -8.9388 | 1.0 | 26.1472 | -88.9170 | -291.8962 | -11.3530 | -12.3370 |
+| 0.0 | 5.71 | 1400 | 0.0000 | 17.2160 | -9.1194 | 1.0 | 26.3354 | -89.2783 | -291.8811 | -11.3506 | -12.3312 |
+| 0.0 | 6.12 | 1500 | 0.0000 | 17.2326 | -9.1376 | 1.0 | 26.3702 | -89.3146 | -291.8478 | -11.3494 | -12.3291 |
+| 0.0002 | 6.53 | 1600 | 0.0000 | 17.2830 | -9.1192 | 1.0 | 26.4022 | -89.2778 | -291.7470 | -11.3555 | -12.3384 |
+| 0.0 | 6.94 | 1700 | 0.0000 | 17.3062 | -9.1044 | 1.0 | 26.4107 | -89.2482 | -291.7006 | -11.3582 | -12.3429 |
+| 0.0 | 7.35 | 1800 | 0.0000 | 17.3229 | -9.1143 | 1.0 | 26.4372 | -89.2681 | -291.6673 | -11.3586 | -12.3435 |
+| 0.0 | 7.76 | 1900 | 0.0000 | 17.3411 | -9.1107 | 1.0 | 26.4518 | -89.2609 | -291.6309 | -11.3600 | -12.3457 |
+| 0.0 | 8.16 | 2000 | 0.0000 | 17.3685 | -9.0879 | 1.0 | 26.4564 | -89.2152 | -291.5760 | -11.3625 | -12.3502 |
 
 
 ### Framework versions
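As a quick sanity check on the new evaluation metrics: in DPO-style logging, the reward margin is the chosen reward minus the rejected reward, and the final table row is consistent with that. A minimal sketch using the step-2000 values from the diff above:

```python
# Sanity check: rewards/margins should equal rewards/chosen - rewards/rejected.
# Values are the final (step 2000) evaluation metrics from the table above.
rewards_chosen = 17.3685
rewards_rejected = -9.0879
reported_margin = 26.4564

computed_margin = rewards_chosen - rewards_rejected
assert abs(computed_margin - reported_margin) < 1e-6
print(f"margin = {computed_margin:.4f}")  # matches the reported Rewards/margins
```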
adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:cee3e1bfa215df1a3e688744448a3a01bed605da4d393df476ecddf79c902fc5
 size 7098016
training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:fe74c66fc54071506a581583a53f3e4402f016002b22d95b06e97f597f7273e5
 size 4219
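The two binary files above are stored as Git LFS pointers: the repository holds only the `version`/`oid`/`size` triple, and the commit changes the `oid` (the SHA-256 of the new blob). A small sketch of parsing such a pointer and checking a downloaded file against it (the local file path in the comment is hypothetical):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its oid (hex digest) and size fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return {
        "oid": fields["oid"].removeprefix("sha256:"),
        "size": int(fields["size"]),
    }

# Pointer contents for training_args.bin, as committed above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:fe74c66fc54071506a581583a53f3e4402f016002b22d95b06e97f597f7273e5
size 4219
"""
info = parse_lfs_pointer(pointer)
print(info["oid"][:12], info["size"])  # fe74c66fc540 4219

# To verify a downloaded copy (path is hypothetical):
# import hashlib, os
# digest = hashlib.sha256(open("training_args.bin", "rb").read()).hexdigest()
# assert digest == info["oid"] and os.path.getsize("training_args.bin") == info["size"]
```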