+---
+license: llama2
+library_name: peft
+tags:
+- trl
+- dpo
+- generated_from_trainer
+base_model: meta-llama/Llama-2-7b-hf
+model-index:
+- name: llama_SFT_e1_DPO_e2
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# llama_SFT_e1_DPO_e2
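+This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf). The training dataset was not recorded by the Trainer. According to the card metadata, it was trained with DPO using TRL and is published as a PEFT adapter.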
+It achieves the following results on the evaluation set:
+- Loss: 0.1258
+- Rewards/chosen: 0.3605
+- Rewards/rejected: -1.7770
+- Rewards/accuracies: 1.0
+- Rewards/margins: 2.1375
+- Logps/rejected: -203.4181
+- Logps/chosen: -156.2596
+- Logits/rejected: -1.0532
+- Logits/chosen: -0.8665
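The reported numbers can be cross-checked against each other. The sketch below assumes that `Rewards/margins` is the mean of chosen minus rejected rewards, and that training used the standard sigmoid DPO loss without label smoothing (the card does not state the loss type). Under that second assumption, Jensen's inequality makes `-log(sigmoid(mean margin))` a lower bound on the mean loss, not the loss itself. Variable names are illustrative.

```python
import math

# Final evaluation metrics copied from the list above.
rewards_chosen = 0.3605
rewards_rejected = -1.7770
reported_margin = 2.1375
reported_loss = 0.1258

# The margin is the gap between the chosen and rejected rewards.
margin = rewards_chosen - rewards_rejected

# Assumption: sigmoid DPO loss, i.e. per-example loss = -log(sigmoid(margin_i)).
# -log(sigmoid(x)) = log(1 + exp(-x)) is convex, so evaluating it at the mean
# margin gives a lower bound on the mean loss.
loss_lower_bound = math.log1p(math.exp(-margin))

print(f"margin from rewards: {margin:.4f} (reported {reported_margin})")
print(f"loss lower bound:    {loss_lower_bound:.4f} (reported {reported_loss})")
```

A reported loss slightly above this bound is what one would expect when per-example margins vary around their mean.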
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 7e-07
+- train_batch_size: 1
+- eval_batch_size: 1
+- seed: 42
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 8
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- num_epochs: 2
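The effective batch size follows from the per-device batch size and gradient accumulation. A minimal sketch, assuming a single training device (the device count is not listed above; 1 is the value consistent with `total_train_batch_size: 8`):

```python
# Values copied from the hyperparameter list above.
train_batch_size = 1
gradient_accumulation_steps = 8
num_devices = 1  # assumption: not stated in the card

# Number of preference pairs contributing to each optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)
```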
+### Training results
+| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.6825        | 0.1   | 25   | 0.6596          | 0.0243         | -0.0451          | 0.8667             | 0.0694          | -186.0986      | -159.6209    | -1.0534         | -0.8570       |
+| 0.6018        | 0.2   | 50   | 0.5820          | 0.0671         | -0.1728          | 0.9800             | 0.2399          | -187.3757      | -159.1936    | -1.0531         | -0.8568       |
+| 0.5333        | 0.3   | 75   | 0.5021          | 0.1133         | -0.3236          | 1.0                | 0.4369          | -188.8834      | -158.7311    | -1.0544         | -0.8586       |
+| 0.4522        | 0.4   | 100  | 0.4213          | 0.1615         | -0.5029          | 1.0                | 0.6644          | -190.6768      | -158.2497    | -1.0547         | -0.8596       |
+| 0.3962        | 0.5   | 125  | 0.3555          | 0.1988         | -0.6844          | 1.0                | 0.8832          | -192.4913      | -157.8759    | -1.0548         | -0.8608       |
+| 0.3164        | 0.6   | 150  | 0.2920          | 0.2416         | -0.8872          | 1.0                | 1.1288          | -194.5195      | -157.4483    | -1.0550         | -0.8660       |
+| 0.2673        | 0.7   | 175  | 0.2400          | 0.2789         | -1.0936          | 1.0                | 1.3725          | -196.5838      | -157.0758    | -1.0540         | -0.8656       |
+| 0.2170        | 0.79  | 200  | 0.2008          | 0.3028         | -1.2873          | 1.0                | 1.5900          | -198.5201      | -156.8367    | -1.0540         | -0.8668       |
+| 0.1822        | 0.89  | 225  | 0.1694          | 0.3294         | -1.4600          | 1.0                | 1.7894          | -200.2475      | -156.5703    | -1.0541         | -0.8674       |
+| 0.1578        | 0.99  | 250  | 0.1483          | 0.3436         | -1.6056          | 1.0                | 1.9492          | -201.7036      | -156.4280    | -1.0538         | -0.8668       |
+| 0.1509        | 1.09  | 275  | 0.1364          | 0.3512         | -1.6903          | 1.0                | 2.0414          | -202.5503      | -156.3527    | -1.0534         | -0.8666       |
+| 0.1273        | 1.19  | 300  | 0.1322          | 0.3561         | -1.7242          | 1.0                | 2.0804          | -202.8900      | -156.3031    | -1.0532         | -0.8657       |
+| 0.1208        | 1.29  | 325  | 0.1284          | 0.3561         | -1.7546          | 1.0                | 2.1106          | -203.1934      | -156.3038    | -1.0534         | -0.8668       |
+| 0.1325        | 1.39  | 350  | 0.1270          | 0.3598         | -1.7654          | 1.0                | 2.1252          | -203.3020      | -156.2663    | -1.0532         | -0.8665       |
+| 0.1287        | 1.49  | 375  | 0.1263          | 0.3618         | -1.7718          | 1.0                | 2.1336          | -203.3654      | -156.2462    | -1.0534         | -0.8666       |
+| 0.1203        | 1.59  | 400  | 0.1252          | 0.3624         | -1.7783          | 1.0                | 2.1407          | -203.4305      | -156.2402    | -1.0532         | -0.8666       |
+| 0.1188        | 1.69  | 425  | 0.1254          | 0.3610         | -1.7767          | 1.0                | 2.1377          | -203.4145      | -156.2542    | -1.0530         | -0.8664       |
+| 0.1331        | 1.79  | 450  | 0.1253          | 0.3640         | -1.7760          | 1.0                | 2.1400          | -203.4073      | -156.2242    | -1.0531         | -0.8662       |
+| 0.1301        | 1.89  | 475  | 0.1252          | 0.3641         | -1.7772          | 1.0                | 2.1413          | -203.4194      | -156.2230    | -1.0531         | -0.8667       |
+| 0.1289        | 1.99  | 500  | 0.1258          | 0.3605         | -1.7770          | 1.0                | 2.1375          | -203.4181      | -156.2596    | -1.0532         | -0.8665       |
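The Step and Epoch columns also allow a rough estimate of the training-set size. The sketch below assumes each optimizer step consumes 8 preference pairs (the `total_train_batch_size` listed under the hyperparameters). The epoch values are rounded, and the Trainer's handling of partial accumulation windows at epoch boundaries is not modeled, so treat the result as an order-of-magnitude estimate rather than a figure from the training logs.

```python
# Last logged row of the table: optimizer step and (rounded) epoch.
last_step = 500
last_epoch = 1.99
pairs_per_step = 8  # total_train_batch_size from the hyperparameters

# Approximate optimizer steps per epoch, and the implied number of pairs.
steps_per_epoch = last_step / last_epoch
approx_pairs = steps_per_epoch * pairs_per_step

print(f"approx. steps per epoch: {steps_per_epoch:.0f}")
print(f"approx. training pairs:  {approx_pairs:.0f}")
```

Under these assumptions the training set holds on the order of 2,000 preference pairs.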
+### Framework versions
+- PEFT 0.8.2
+- Transformers 4.38.1
+- PyTorch 2.2.0+cu118
+- Datasets 2.17.1
+- Tokenizers 0.15.2