thorirhrafn committed
Commit
ed39636
1 Parent(s): 3f16323

End of training

Files changed (1):
  1. README.md +100 -0
README.md ADDED
---
license: llama2
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
- name: llama_DPO_model_e3
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# llama_DPO_model_e3

This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0722
- Rewards/chosen: 0.4618
- Rewards/rejected: -2.3246
- Rewards/accuracies: 1.0
- Rewards/margins: 2.7864
- Logps/rejected: -208.0558
- Logps/chosen: -156.0157
- Logits/rejected: -1.0512
- Logits/chosen: -0.8590

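The reward figures above are only meaningful relative to how TRL's DPO implementation defines them. Assuming the standard convention (the training script is not included in this card, so this is inferred from the `trl`/`dpo` tags), each reward is the β-scaled log-probability ratio between the fine-tuned policy and the frozen reference model, and the loss is the negative log-sigmoid of the chosen-vs-rejected margin:

$$
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right)
$$

$$
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\!\bigl( r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}}) \bigr)
$$

Under this reading, Rewards/margins is the mean difference between the chosen and rejected rewards, and Rewards/accuracies is the fraction of evaluation pairs whose margin is positive, which is why it saturates at 1.0 once the model consistently prefers the chosen completions.
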
## Model description

More information needed

## Intended uses & limitations

More information needed

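This repository contains a PEFT (LoRA) adapter rather than full model weights, so it has to be applied on top of the Llama-2-7b base model at load time. The sketch below shows one way to do this; the adapter id `thorirhrafn/llama_DPO_model_e3` is inferred from the model name and may differ, and access to the gated `meta-llama/Llama-2-7b-hf` weights is assumed.

```python
# Minimal loading sketch (the adapter repo id below is an assumption; adjust as needed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"
adapter_id = "thorirhrafn/llama_DPO_model_e3"  # hypothetical repo id, inferred from the model name

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,
    device_map="auto",  # requires `accelerate`
)

# Attach the DPO-trained LoRA weights to the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

prompt = "Write two sentences about the northern lights."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

For inference-only use, the adapter can also be merged into the base weights with `model.merge_and_unload()`, at the cost of no longer being able to toggle it off.
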
## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 7e-07
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3

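A hedged reconstruction of how these hyperparameters might map onto TRL's `DPOTrainer` is sketched below. The original training script is not part of this card, so the preference dataset, the DPO `beta`, the LoRA configuration, and the TRL version are all assumptions (the sketch targets the TRL 0.7.x-era API that matches Transformers 4.38.1 and PEFT 0.8.2); only the values copied from the list above are known.

```python
# Hedged reconstruction: the dataset, beta, and LoRA settings below are placeholders;
# only the TrainingArguments mirror the hyperparameter list in this card.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(base_id)

# Hypothetical preference data with "prompt" / "chosen" / "rejected" columns.
pairs = load_dataset("json", data_files="preference_pairs.json", split="train")
splits = pairs.train_test_split(test_size=0.1, seed=42)

peft_config = LoraConfig(  # LoRA settings are not reported in the card; placeholders only
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"
)

args = TrainingArguments(  # values taken from the "Training hyperparameters" list above
    output_dir="llama_DPO_model_e3",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,   # effective (total) train batch size 8
    learning_rate=7e-7,
    lr_scheduler_type="linear",
    num_train_epochs=3,
    seed=42,
    optim="adamw_torch",             # Adam with betas=(0.9, 0.999), epsilon=1e-8
    logging_steps=25,
    evaluation_strategy="steps",
    eval_steps=25,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,            # with a PEFT adapter, TRL reuses the frozen base model as reference
    args=args,
    beta=0.1,                  # assumption: beta is not reported in the card
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```
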
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.675 | 0.1 | 25 | 0.6531 | 0.0248 | -0.0584 | 0.8667 | 0.0832 | -185.3936 | -160.3859 | -1.0523 | -0.8549 |
| 0.5865 | 0.2 | 50 | 0.5720 | 0.0730 | -0.1895 | 0.9933 | 0.2625 | -186.7048 | -159.9039 | -1.0525 | -0.8552 |
| 0.5203 | 0.3 | 75 | 0.4808 | 0.1258 | -0.3673 | 1.0 | 0.4931 | -188.4825 | -159.3763 | -1.0520 | -0.8543 |
| 0.4291 | 0.4 | 100 | 0.3986 | 0.1804 | -0.5547 | 1.0 | 0.7352 | -190.3568 | -158.8295 | -1.0527 | -0.8559 |
| 0.3712 | 0.5 | 125 | 0.3264 | 0.2303 | -0.7594 | 1.0 | 0.9897 | -192.4033 | -158.3308 | -1.0528 | -0.8572 |
| 0.2856 | 0.6 | 150 | 0.2612 | 0.2765 | -0.9893 | 1.0 | 1.2658 | -194.7025 | -157.8685 | -1.0531 | -0.8592 |
| 0.2433 | 0.7 | 175 | 0.2086 | 0.3223 | -1.2201 | 1.0 | 1.5424 | -197.0102 | -157.4110 | -1.0526 | -0.8573 |
| 0.1822 | 0.79 | 200 | 0.1673 | 0.3627 | -1.4385 | 1.0 | 1.8012 | -199.1950 | -157.0071 | -1.0529 | -0.8606 |
| 0.1511 | 0.89 | 225 | 0.1354 | 0.3921 | -1.6585 | 1.0 | 2.0506 | -201.3948 | -156.7133 | -1.0522 | -0.8601 |
| 0.1211 | 0.99 | 250 | 0.1134 | 0.4119 | -1.8492 | 1.0 | 2.2612 | -203.3017 | -156.5144 | -1.0526 | -0.8591 |
| 0.113 | 1.09 | 275 | 0.0999 | 0.4261 | -1.9792 | 1.0 | 2.4054 | -204.6017 | -156.3724 | -1.0511 | -0.8578 |
| 0.087 | 1.19 | 300 | 0.0912 | 0.4374 | -2.0704 | 1.0 | 2.5078 | -205.5134 | -156.2602 | -1.0521 | -0.8612 |
| 0.0808 | 1.29 | 325 | 0.0846 | 0.4439 | -2.1510 | 1.0 | 2.5949 | -206.3199 | -156.1949 | -1.0515 | -0.8600 |
| 0.0875 | 1.39 | 350 | 0.0814 | 0.4537 | -2.1942 | 1.0 | 2.6479 | -206.7517 | -156.0968 | -1.0520 | -0.8589 |
| 0.0826 | 1.49 | 375 | 0.0785 | 0.4559 | -2.2325 | 1.0 | 2.6884 | -207.1346 | -156.0752 | -1.0516 | -0.8585 |
| 0.0717 | 1.59 | 400 | 0.0768 | 0.4564 | -2.2611 | 1.0 | 2.7175 | -207.4205 | -156.0697 | -1.0517 | -0.8595 |
| 0.0694 | 1.69 | 425 | 0.0750 | 0.4602 | -2.2778 | 1.0 | 2.7380 | -207.5878 | -156.0322 | -1.0516 | -0.8590 |
| 0.0809 | 1.79 | 450 | 0.0739 | 0.4647 | -2.2925 | 1.0 | 2.7572 | -207.7341 | -155.9865 | -1.0514 | -0.8586 |
| 0.0747 | 1.89 | 475 | 0.0736 | 0.4595 | -2.3075 | 1.0 | 2.7670 | -207.8848 | -156.0394 | -1.0515 | -0.8584 |
| 0.0751 | 1.99 | 500 | 0.0726 | 0.4643 | -2.3130 | 1.0 | 2.7773 | -207.9396 | -155.9911 | -1.0516 | -0.8589 |
| 0.069 | 2.09 | 525 | 0.0725 | 0.4608 | -2.3223 | 1.0 | 2.7831 | -208.0324 | -156.0257 | -1.0512 | -0.8598 |
| 0.0658 | 2.19 | 550 | 0.0724 | 0.4670 | -2.3178 | 1.0 | 2.7847 | -207.9872 | -155.9642 | -1.0514 | -0.8580 |
| 0.0659 | 2.29 | 575 | 0.0720 | 0.4650 | -2.3217 | 1.0 | 2.7867 | -208.0269 | -155.9841 | -1.0516 | -0.8592 |
| 0.0732 | 2.38 | 600 | 0.0725 | 0.4585 | -2.3236 | 1.0 | 2.7821 | -208.0455 | -156.0485 | -1.0511 | -0.8591 |
| 0.0802 | 2.48 | 625 | 0.0723 | 0.4611 | -2.3249 | 1.0 | 2.7859 | -208.0582 | -156.0233 | -1.0511 | -0.8582 |
| 0.0734 | 2.58 | 650 | 0.0723 | 0.4646 | -2.3213 | 1.0 | 2.7859 | -208.0227 | -155.9879 | -1.0510 | -0.8591 |
| 0.068 | 2.68 | 675 | 0.0723 | 0.4627 | -2.3230 | 1.0 | 2.7857 | -208.0397 | -156.0069 | -1.0512 | -0.8585 |
| 0.0708 | 2.78 | 700 | 0.0720 | 0.4617 | -2.3278 | 1.0 | 2.7895 | -208.0874 | -156.0165 | -1.0508 | -0.8592 |
| 0.0621 | 2.88 | 725 | 0.0719 | 0.4613 | -2.3296 | 1.0 | 2.7909 | -208.1059 | -156.0208 | -1.0511 | -0.8585 |
| 0.0708 | 2.98 | 750 | 0.0722 | 0.4618 | -2.3246 | 1.0 | 2.7864 | -208.0558 | -156.0157 | -1.0512 | -0.8590 |


### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2