Weni/ZeroShot-Agents-Llama3-4.0.37-DPO

PEFT

Safetensors

ironrock committed on Jun 27

Commit 76557a4 • 1 parent: c5ea2d8

Model save

Files changed (2)

README.md +99 -0
adapter_model.safetensors +1 -1

README.md ADDED

@@ -0,0 +1,99 @@

+---
+library_name: peft
+tags:
+- trl
+- dpo
+- generated_from_trainer
+base_model: Weni/ZeroShot-Agents-Llama3-4.0.11-SFT-merged
+model-index:
+- name: ZeroShot-Agents-Llama3-4.0.37-DPO
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You should probably proofread and complete it, then remove this comment. -->
+# ZeroShot-Agents-Llama3-4.0.37-DPO
+This model is a fine-tuned version of [Weni/ZeroShot-Agents-Llama3-4.0.11-SFT-merged](https://huggingface.co/Weni/ZeroShot-Agents-Llama3-4.0.11-SFT-merged) on an unknown dataset.
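+It achieves the following results on the evaluation set (these figures match the step-80 row of the training results table, not the last logged evaluation at step 560):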
+- Loss: 0.2625
+- Rewards/chosen: 0.9675
+- Rewards/rejected: -1.4892
+- Rewards/accuracies: 0.9070
+- Rewards/margins: 2.4567
+- Logps/rejected: -54.9700
+- Logps/chosen: -30.7303
+- Logits/rejected: -0.1517
+- Logits/chosen: -0.1554
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 6e-06
+- train_batch_size: 2
+- eval_batch_size: 2
+- seed: 42
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 16
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_ratio: 0.03
+- training_steps: 570
+- mixed_precision_training: Native AMP
+### Training results
+| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.5549        | 0.2092 | 20   | 0.4323          | 1.3557         | 0.4942           | 0.8081             | 0.8615          | -35.1361       | -26.8486     | -0.1975         | -0.1979       |
+| 0.3276        | 0.4183 | 40   | 0.3438          | 1.5786         | -0.8833          | 0.8895             | 2.4619          | -48.9107       | -24.6195     | -0.1595         | -0.1635       |
+| 0.3448        | 0.6275 | 60   | 0.2700          | 1.3059         | -0.8860          | 0.9070             | 2.1918          | -48.9380       | -27.3469     | -0.1764         | -0.1794       |
+| 0.2844        | 0.8366 | 80   | 0.2625          | 0.9675         | -1.4892          | 0.9070             | 2.4567          | -54.9700       | -30.7303     | -0.1517         | -0.1554       |
+| 0.3134        | 1.0458 | 100  | 0.2354          | 1.4560         | -1.6614          | 0.9244             | 3.1174          | -56.6918       | -25.8455     | -0.1389         | -0.1437       |
+| 0.2497        | 1.2549 | 120  | 0.2168          | 1.5733         | -1.4017          | 0.9186             | 2.9750          | -54.0951       | -24.6722     | -0.1432         | -0.1482       |
+| 0.2442        | 1.4641 | 140  | 0.2654          | 2.6265         | 0.2007           | 0.9070             | 2.4258          | -38.0710       | -14.1407     | -0.1262         | -0.1305       |
+| 0.2677        | 1.6732 | 160  | 0.1999          | 1.7590         | -2.0006          | 0.9244             | 3.7596          | -60.0844       | -22.8155     | -0.0957         | -0.1025       |
+| 0.2598        | 1.8824 | 180  | 0.2034          | 2.1611         | -1.1528          | 0.9302             | 3.3139          | -51.6058       | -18.7945     | -0.1060         | -0.1121       |
+| 0.165         | 2.0915 | 200  | 0.1710          | 1.5140         | -1.8672          | 0.9593             | 3.3812          | -58.7503       | -25.2654     | -0.0961         | -0.1023       |
+| 0.266         | 2.3007 | 220  | 0.2139          | 2.4282         | -1.2490          | 0.9302             | 3.6772          | -52.5682       | -16.1238     | -0.0783         | -0.0857       |
+| 0.2234        | 2.5098 | 240  | 0.1854          | 2.2003         | -1.1492          | 0.9419             | 3.3495          | -51.5703       | -18.4021     | -0.0816         | -0.0879       |
+| 0.1878        | 2.7190 | 260  | 0.1542          | 1.1292         | -2.8812          | 0.9535             | 4.0104          | -68.8902       | -29.1139     | -0.0584         | -0.0656       |
+| 0.1515        | 2.9281 | 280  | 0.2068          | 2.5962         | -1.2508          | 0.9419             | 3.8470          | -52.5865       | -14.4433     | -0.0552         | -0.0633       |
+| 0.1259        | 3.1373 | 300  | 0.1597          | 2.6323         | -1.4713          | 0.9477             | 4.1035          | -54.7910       | -14.0830     | -0.0438         | -0.0527       |
+| 0.1342        | 3.3464 | 320  | 0.1569          | 2.4367         | -2.3871          | 0.9477             | 4.8239          | -63.9495       | -16.0381     | -0.0302         | -0.0403       |
+| 0.134         | 3.5556 | 340  | 0.1322          | 2.2686         | -2.7697          | 0.9535             | 5.0382          | -67.7748       | -17.7199     | -0.0218         | -0.0326       |
+| 0.0875        | 3.7647 | 360  | 0.1696          | 2.7864         | -1.9444          | 0.9302             | 4.7308          | -59.5222       | -12.5413     | -0.0084         | -0.0191       |
+| 0.0724        | 3.9739 | 380  | 0.1411          | 2.7248         | -2.4834          | 0.9302             | 5.2081          | -64.9118       | -13.1577     | 0.0165          | 0.0051        |
+| 0.0471        | 4.1830 | 400  | 0.1487          | 2.5177         | -3.5385          | 0.9477             | 6.0562          | -75.4631       | -15.2287     | 0.0398          | 0.0267        |
+| 0.053         | 4.3922 | 420  | 0.1444          | 2.3522         | -3.7636          | 0.9535             | 6.1157          | -77.7140       | -16.8839     | 0.0486          | 0.0352        |
+| 0.0503        | 4.6013 | 440  | 0.1726          | 2.9349         | -2.6371          | 0.9302             | 5.5720          | -66.4492       | -11.0570     | 0.0511          | 0.0379        |
+| 0.0503        | 4.8105 | 460  | 0.1848          | 2.4749         | -3.5917          | 0.9477             | 6.0666          | -75.9953       | -15.6569     | 0.0731          | 0.0596        |
+| 0.0636        | 5.0196 | 480  | 0.1494          | 2.2862         | -4.0054          | 0.9593             | 6.2916          | -80.1320       | -17.5431     | 0.0867          | 0.0727        |
+| 0.0299        | 5.2288 | 500  | 0.1578          | 2.4928         | -3.6944          | 0.9593             | 6.1872          | -77.0218       | -15.4775     | 0.0949          | 0.0808        |
+| 0.0224        | 5.4379 | 520  | 0.1532          | 2.4571         | -3.9264          | 0.9535             | 6.3835          | -79.3418       | -15.8344     | 0.1001          | 0.0855        |
+| 0.0674        | 5.6471 | 540  | 0.1652          | 2.5101         | -3.8634          | 0.9477             | 6.3734          | -78.7120       | -15.3049     | 0.1032          | 0.0885        |
+| 0.0259        | 5.8562 | 560  | 0.1630          | 2.4884         | -3.9396          | 0.9477             | 6.4279          | -79.4737       | -15.5216     | 0.1059          | 0.0910        |
+### Framework versions
+- PEFT 0.11.1
+- Transformers 4.41.2
+- Pytorch 2.3.0+cu121
+- Datasets 2.19.2
+- Tokenizers 0.19.1

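The training hyperparameters in the README imply a few derived quantities. This is a minimal sketch under stated assumptions: a single device (which is what total_train_batch_size = 2 × 8 = 16 suggests), and warmup steps obtained by rounding 0.03 × 570 up, as `TrainingArguments.get_warmup_steps` in transformers does. The schedule function reimplements the shape of transformers' `get_linear_schedule_with_warmup` in plain Python; the helper names are hypothetical.

```python
import math

# Values copied from the "Training hyperparameters" list in the card.
LEARNING_RATE = 6e-06
TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 8
TRAINING_STEPS = 570
WARMUP_RATIO = 0.03
NUM_DEVICES = 1  # ASSUMPTION: implied by total_train_batch_size = 16 = 2 * 8


def effective_batch_size(per_device, accumulation_steps, devices):
    """Examples contributing to each optimizer step."""
    return per_device * accumulation_steps * devices


def warmup_steps(total_steps, ratio):
    """Warmup length, rounded up (ASSUMPTION: mirrors TrainingArguments.get_warmup_steps)."""
    return math.ceil(total_steps * ratio)


def linear_lr(step, base_lr, warmup, total):
    """Learning rate at an optimizer step: linear warmup from 0, then linear decay to 0."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


warmup = warmup_steps(TRAINING_STEPS, WARMUP_RATIO)
print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES))  # 16
print(warmup)  # 18
for step in (0, warmup, 294, TRAINING_STEPS):
    print(step, linear_lr(step, LEARNING_RATE, warmup, TRAINING_STEPS))
```

Under these assumptions the learning rate ramps from 0 to 6e-06 over the first 18 optimizer steps, then decays linearly, reaching half its peak at step 294 and zero at step 570.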
adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a6689373c73893f375f8a07862cbe8a8bbc2a74409d3d4febba1d5da85a7c377
+oid sha256:3acf2ab3ac0196cf8b3871a17f2658ed7ac63e95bb2e92cfba8298fdff1fddb8
 size 2684416208
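The adapter_model.safetensors change is a Git LFS pointer update: the repository tracks only a small text file holding the object's SHA-256 digest and byte size. The sketch below parses the old and new pointers from this commit and checks a local file against a parsed pointer. The function names are hypothetical, and the parser only handles the `version`/`oid`/`size` keys that appear here.

```python
import hashlib

# The two LFS pointer files from the diff (old and new).
OLD_POINTER = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:a6689373c73893f375f8a07862cbe8a8bbc2a74409d3d4febba1d5da85a7c377\n"
    "size 2684416208\n"
)
NEW_POINTER = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:3acf2ab3ac0196cf8b3871a17f2658ed7ac63e95bb2e92cfba8298fdff1fddb8\n"
    "size 2684416208\n"
)


def parse_lfs_pointer(text):
    """Parse a Git LFS v1 pointer into its hash algorithm, digest and size."""
    fields = {}
    for line in text.splitlines():
        if line:
            key, _, value = line.partition(" ")
            fields[key] = value
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("not a Git LFS v1 pointer")
    algorithm, _, digest = fields["oid"].partition(":")
    return {"algorithm": algorithm, "digest": digest, "size": int(fields["size"])}


def file_matches_pointer(path, pointer):
    """Stream a local file and compare its size and SHA-256 with a parsed pointer."""
    if pointer["algorithm"] != "sha256":
        raise ValueError("only sha256 pointers are supported")
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so multi-GB weight files are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and digest.hexdigest() == pointer["digest"]


old = parse_lfs_pointer(OLD_POINTER)
new = parse_lfs_pointer(NEW_POINTER)
print(old["size"] == new["size"], old["digest"] == new["digest"])  # True False
```

Both pointers report the same size (2684416208 bytes) but different digests, so this commit replaced the adapter weights with new contents at an unchanged file size.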