aritrasen/mistral-dpo

Files changed (4)

README.md +34 -12
adapter_model.safetensors +1 -1
runs/Dec21_18-30-01_95c7d1f2a3f6/events.out.tfevents.1703183408.95c7d1f2a3f6.26.0 +3 -0
training_args.bin +1 -1

README.md CHANGED

@@ -7,8 +7,6 @@ base_model: TheBloke/OpenHermes-2-Mistral-7B-GPTQ
 model-index:
 - name: mistral-dpo
   results: []
-datasets:
-- Dahoas/full-hh-rlhf
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -18,15 +16,15 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [TheBloke/OpenHermes-2-Mistral-7B-GPTQ](https://huggingface.co/TheBloke/OpenHermes-2-Mistral-7B-GPTQ) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.6930
-- Rewards/chosen: -0.0558
-- Rewards/rejected: -0.0669
 - Rewards/accuracies: 0.5096
-- Rewards/margins: 0.0111
-- Logps/rejected: -200.9969
-- Logps/chosen: -193.8321
-- Logits/rejected: -2.2922
-- Logits/chosen: -2.3906
 ## Model description
@@ -52,14 +50,38 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 2
-- training_steps: 10
 - mixed_precision_training: Native AMP
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.6825        | 0.0   | 10   | 0.6930          | -0.0558        | -0.0669          | 0.5096             | 0.0111          | -200.9969      | -193.8321    | -2.2922         | -2.3906       |
 ### Framework versions

 model-index:
 - name: mistral-dpo
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 This model is a fine-tuned version of [TheBloke/OpenHermes-2-Mistral-7B-GPTQ](https://huggingface.co/TheBloke/OpenHermes-2-Mistral-7B-GPTQ) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.8911
+- Rewards/chosen: 0.5387
+- Rewards/rejected: 0.4878
 - Rewards/accuracies: 0.5096
+- Rewards/margins: 0.0509
+- Logps/rejected: -174.3804
+- Logps/chosen: -178.5185
+- Logits/rejected: -2.5028
+- Logits/chosen: -2.5350
 ## Model description
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 2
+- training_steps: 250
 - mixed_precision_training: Native AMP
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.6703        | 0.0   | 10   | 0.6842          | -0.0001        | -0.0268          | 0.5865             | 0.0267          | -179.5257      | -183.9063    | -2.4290         | -2.4720       |
+| 0.7119        | 0.0   | 20   | 0.6751          | 0.1584         | 0.0990           | 0.5769             | 0.0594          | -178.2678      | -182.3211    | -2.4542         | -2.4988       |
+| 0.647         | 0.0   | 30   | 0.6702          | 0.3569         | 0.2540           | 0.5769             | 0.1029          | -176.7180      | -180.3367    | -2.4886         | -2.5306       |
+| 0.6748        | 0.0   | 40   | 0.6712          | 0.3439         | 0.2229           | 0.5288             | 0.1210          | -177.0292      | -180.4664    | -2.5206         | -2.5581       |
+| 0.6513        | 0.0   | 50   | 0.6707          | 0.4403         | 0.2838           | 0.5577             | 0.1565          | -176.4200      | -179.5021    | -2.5608         | -2.5853       |
+| 0.6103        | 0.0   | 60   | 0.6695          | 0.6831         | 0.4769           | 0.5577             | 0.2063          | -174.4892      | -177.0740    | -2.5719         | -2.5933       |
+| 1.0313        | 0.01  | 70   | 0.6724          | 0.7062         | 0.5084           | 0.5577             | 0.1978          | -174.1739      | -176.8436    | -2.5543         | -2.5843       |
+| 0.6876        | 0.01  | 80   | 0.6804          | 0.6995         | 0.5144           | 0.5385             | 0.1850          | -174.1135      | -176.9104    | -2.5443         | -2.5829       |
+| 0.9661        | 0.01  | 90   | 0.6828          | 0.7118         | 0.5376           | 0.5385             | 0.1742          | -173.8821      | -176.7873    | -2.5479         | -2.5846       |
+| 0.7354        | 0.01  | 100  | 0.6757          | 0.6765         | 0.5039           | 0.5577             | 0.1726          | -174.2186      | -177.1401    | -2.5399         | -2.5758       |
+| 1.0127        | 0.01  | 110  | 0.7129          | 0.6089         | 0.4855           | 0.5288             | 0.1234          | -174.4033      | -177.8165    | -2.5464         | -2.5760       |
+| 1.0366        | 0.01  | 120  | 0.7440          | 0.6068         | 0.4946           | 0.5481             | 0.1122          | -174.3115      | -177.8369    | -2.5516         | -2.5804       |
+| 1.2145        | 0.01  | 130  | 0.7564          | 0.6521         | 0.5396           | 0.5673             | 0.1125          | -173.8620      | -177.3846    | -2.5608         | -2.5878       |
+| 0.8342        | 0.01  | 140  | 0.7649          | 0.6639         | 0.5519           | 0.5385             | 0.1119          | -173.7388      | -177.2668    | -2.5547         | -2.5828       |
+| 0.7402        | 0.01  | 150  | 0.7991          | 0.5831         | 0.4883           | 0.5                | 0.0948          | -174.3747      | -178.0745    | -2.5498         | -2.5775       |
+| 0.7162        | 0.01  | 160  | 0.8396          | 0.6134         | 0.5474           | 0.5096             | 0.0659          | -173.7835      | -177.7718    | -2.5445         | -2.5713       |
+| 0.9396        | 0.01  | 170  | 0.8573          | 0.5700         | 0.5144           | 0.5288             | 0.0556          | -174.1144      | -178.2057    | -2.5326         | -2.5629       |
+| 0.5958        | 0.01  | 180  | 0.8708          | 0.5526         | 0.5017           | 0.5288             | 0.0509          | -174.2406      | -178.3789    | -2.5227         | -2.5540       |
+| 0.7588        | 0.02  | 190  | 0.8865          | 0.5428         | 0.4977           | 0.5288             | 0.0450          | -174.2806      | -178.4775    | -2.5207         | -2.5493       |
+| 0.7811        | 0.02  | 200  | 0.8933          | 0.5797         | 0.5429           | 0.5192             | 0.0368          | -173.8286      | -178.1080    | -2.5171         | -2.5434       |
+| 0.5735        | 0.02  | 210  | 0.8907          | 0.5577         | 0.5174           | 0.5288             | 0.0403          | -174.0838      | -178.3279    | -2.5069         | -2.5366       |
+| 0.7709        | 0.02  | 220  | 0.8886          | 0.5602         | 0.5167           | 0.5192             | 0.0435          | -174.0907      | -178.3035    | -2.5041         | -2.5361       |
+| 0.4914        | 0.02  | 230  | 0.8884          | 0.5237         | 0.4766           | 0.5192             | 0.0471          | -174.4924      | -178.6684    | -2.5050         | -2.5375       |
+| 0.739         | 0.02  | 240  | 0.8910          | 0.5281         | 0.4796           | 0.5192             | 0.0485          | -174.4621      | -178.6240    | -2.5027         | -2.5351       |
+| 0.5743        | 0.02  | 250  | 0.8911          | 0.5387         | 0.4878           | 0.5096             | 0.0509          | -174.3804      | -178.5185    | -2.5028         | -2.5350       |
 ### Framework versions
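
The `Rewards/margins` column in the updated table appears to be the difference between `Rewards/chosen` and `Rewards/rejected`, and the validation loss bottoms out early in the run. The following is a minimal sketch that checks both observations against a subset of rows copied from the table above; the tuple layout and variable names are illustrative assumptions and not part of the repository.

```python
# Subset of rows copied from the updated "Training results" table:
# (step, validation_loss, rewards_chosen, rewards_rejected, rewards_margins)
rows = [
    (10, 0.6842, -0.0001, -0.0268, 0.0267),
    (30, 0.6702, 0.3569, 0.2540, 0.1029),
    (60, 0.6695, 0.6831, 0.4769, 0.2063),
    (100, 0.6757, 0.6765, 0.5039, 0.1726),
    (150, 0.7991, 0.5831, 0.4883, 0.0948),
    (200, 0.8933, 0.5797, 0.5429, 0.0368),
    (250, 0.8911, 0.5387, 0.4878, 0.0509),
]

# The reported values are rounded to four decimals, so allow a small tolerance.
TOLERANCE = 2e-4

# Absolute gap between (chosen - rejected) and the reported margin, per row.
margin_errors = [
    abs((chosen - rejected) - margin)
    for _, _, chosen, rejected, margin in rows
]
margins_consistent = all(err <= TOLERANCE for err in margin_errors)

# Step with the lowest validation loss among the sampled rows.
best_step, best_val_loss = min(
    ((step, val) for step, val, *_ in rows),
    key=lambda pair: pair[1],
)

print(f"margins consistent: {margins_consistent}")
print(f"lowest validation loss in sample: {best_val_loss} at step {best_step}")
```

In this sample the lowest validation loss falls at step 60, while the final row at step 250 reports 0.8911, so an earlier checkpoint may be worth comparing against the final one.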

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3a7670b20147de176a9b407e70da7aa40acf3479f3f93ba737129a62c21535cf
 size 6832600

 version https://git-lfs.github.com/spec/v1
+oid sha256:7ee1512f5d20c94e2d685820ce34546d555489cb396373434b6e54d8ba4df875
 size 6832600
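
Each Git LFS pointer in this diff records the SHA-256 digest (`oid`) and byte size of the real file, so a downloaded artifact can be checked against it. The following is a minimal sketch, assuming the file has already been downloaded locally; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of any library.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid line has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    if path.stat().st_size != pointer["size"]:
        return False
    sha = hashlib.sha256()
    with path.open("rb") as fh:
        # Hash in 1 MiB chunks to avoid loading the whole file into memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            sha.update(chunk)
    return sha.hexdigest() == pointer["digest"]


# Pointer for the updated adapter_model.safetensors, copied from the diff above.
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:7ee1512f5d20c94e2d685820ce34546d555489cb396373434b6e54d8ba4df875
size 6832600
"""
pointer = parse_lfs_pointer(pointer_text)
print(pointer["algo"], pointer["size"])
```

With a local copy in place, `matches_pointer(Path("adapter_model.safetensors"), pointer)` would report whether the download matches the new pointer; the local path is an assumption.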

runs/Dec21_18-30-01_95c7d1f2a3f6/events.out.tfevents.1703183408.95c7d1f2a3f6.26.0 ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f95051c65615cf007739a26f224d4cae371399059016e716fde064d51b009d1f
+size 40047

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d1a5c488c6ab1fd32c0bcdf7d0081ef6565e6436f00739c2c4d5781c702e6ec7
 size 4219

 version https://git-lfs.github.com/spec/v1
+oid sha256:a744ecc30871a71a7fc5ed07942856a6e759e716fa13531c07b20ed7930a0f2e
 size 4219