LBK95/Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V2

+---
+base_model: meta-llama/Llama-2-7b-hf
+library_name: peft
+license: llama2
+tags:
+- trl
+- dpo
+- generated_from_trainer
+model-index:
+- name: Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V2
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V2
+This model is a PEFT adapter fine-tuned from [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) with DPO (TRL). The Trainer did not record the training dataset.
+It achieves the following results on the evaluation set (final evaluation, step 693):
+- Loss: 1.2147
+- Rewards/chosen: -2.3589
+- Rewards/rejected: -2.1848
+- Rewards/accuracies: 0.3333
+- Rewards/margins: -0.1740
+- Logps/rejected: -176.9075
+- Logps/chosen: -185.7344
+- Logits/rejected: -0.3397
+- Logits/chosen: -0.3554
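
The reward figures above can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming the common DPO logging convention that `Rewards/margins` is the chosen reward minus the rejected reward, averaged over the evaluation set. The variable names are illustrative and not part of the card.

```python
# Values copied from the evaluation results above.
rewards_chosen = -2.3589
rewards_rejected = -2.1848
reported_margin = -0.1740

# Assumption: margin = chosen reward - rejected reward (averaged over the eval set).
margin = rewards_chosen - rewards_rejected

# The tiny gap to the reported value comes from rounding in the logged metrics.
assert abs(margin - reported_margin) < 1e-3
print(f"recomputed margin: {margin:.4f}")
```

A negative margin means that, on average, the fine-tuned policy assigns a lower implicit reward to the chosen responses than to the rejected ones, which agrees with the reported accuracy of 0.3333.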
+## Model description
+More information needed
+## Intended uses & limitations
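+More information needed. One limitation is visible in the reported metrics: at the final evaluation the reward margin is negative (-0.1740) and the reward accuracy is 0.3333, so the model prefers the chosen response in only about one third of the evaluation pairs.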
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 5e-05
+- train_batch_size: 2
+- eval_batch_size: 2
+- seed: 42
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 4
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_steps: 10
+- num_epochs: 3
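
To make the batch size and schedule settings above concrete, here is a minimal sketch in plain Python. It assumes a single training device (so that 2 × 2 = 4 matches `total_train_batch_size`) and a total of 765 optimizer steps, a figure inferred from the step and epoch columns of the training results table (about 255 steps per epoch) rather than stated in the card. The function names are hypothetical, and the formula mirrors the usual linear-warmup plus half-cosine decay, not necessarily the exact scheduler code used for this run.

```python
import math

# Hyperparameters from the list above.
LEARNING_RATE = 5e-05
WARMUP_STEPS = 10
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 2

# Assumption: one device, so the product below equals total_train_batch_size.
NUM_DEVICES = 1
# Assumption: 765 optimizer steps (about 255 per epoch for 3 epochs),
# inferred from the training results table, not stated in the card.
TOTAL_STEPS = 765


def effective_batch_size() -> int:
    """Number of examples contributing to each optimizer step."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def cosine_lr(step: int) -> float:
    """Linear warmup to LEARNING_RATE, then half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    for step in (0, 5, 10, 231, 693, 765):
        print(step, cosine_lr(step))
```

Under these assumptions the learning rate peaks at step 10 and has decayed most of the way to zero by the last logged evaluation at step 693.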
+### Training results
+| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.7064        | 0.3020 | 77   | 0.7263          | -0.0650        | -0.0237          | 0.5                | -0.0414         | -155.2957      | -162.7962    | 0.2969          | 0.2895        |
+| 0.6816        | 0.6039 | 154  | 0.7127          | -0.1015        | -0.1222          | 0.5                | 0.0207          | -156.2813      | -163.1606    | 0.2989          | 0.2915        |
+| 0.6192        | 0.9059 | 231  | 0.7010          | -0.0808        | -0.1624          | 0.5833             | 0.0816          | -156.6835      | -162.9536    | 0.2774          | 0.2692        |
+| 0.2805        | 1.2078 | 308  | 0.8302          | -0.5931        | -0.6582          | 0.6667             | 0.0651          | -161.6412      | -168.0767    | 0.1922          | 0.1839        |
+| 0.3604        | 1.5098 | 385  | 0.8663          | -0.8552        | -0.8899          | 0.5833             | 0.0347          | -163.9578      | -170.6977    | 0.0866          | 0.0775        |
+| 0.3524        | 1.8118 | 462  | 0.9587          | -1.3495        | -1.3440          | 0.5                | -0.0055         | -168.4993      | -175.6406    | -0.0538         | -0.0645       |
+| 0.2168        | 2.1137 | 539  | 1.0785          | -1.8309        | -1.7601          | 0.5833             | -0.0708         | -172.6597      | -180.4545    | -0.2246         | -0.2382       |
+| 0.0395        | 2.4157 | 616  | 1.2284          | -2.4130        | -2.2406          | 0.3333             | -0.1724         | -177.4654      | -186.2757    | -0.3472         | -0.3633       |
+| 0.2081        | 2.7176 | 693  | 1.2147          | -2.3589        | -2.1848          | 0.3333             | -0.1740         | -176.9075      | -185.7344    | -0.3397         | -0.3554       |
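
The table can be scanned programmatically to see where evaluation quality peaks. The sketch below copies four columns from the table into a hypothetical `ROWS` list and picks the logged step with the lowest validation loss and the one with the highest reward margin. It only inspects logged metrics; whether checkpoints for those steps were saved is not stated in the card.

```python
# (step, training loss, validation loss, rewards/margins), copied from the table above.
ROWS = [
    (77, 0.7064, 0.7263, -0.0414),
    (154, 0.6816, 0.7127, 0.0207),
    (231, 0.6192, 0.7010, 0.0816),
    (308, 0.2805, 0.8302, 0.0651),
    (385, 0.3604, 0.8663, 0.0347),
    (462, 0.3524, 0.9587, -0.0055),
    (539, 0.2168, 1.0785, -0.0708),
    (616, 0.0395, 1.2284, -0.1724),
    (693, 0.2081, 1.2147, -0.1740),
]

# Lowest validation loss and highest reward margin among the logged steps.
best_by_loss = min(ROWS, key=lambda row: row[2])
best_by_margin = max(ROWS, key=lambda row: row[3])

print("best step by validation loss:", best_by_loss[0])
print("best step by reward margin:", best_by_margin[0])
```

Both criteria select step 231, near the end of the first epoch. After that point the training loss keeps falling while the validation loss rises and the margin turns negative, a pattern consistent with overfitting to the training preferences.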
+### Framework versions
+- PEFT 0.12.0
+- Transformers 4.44.0
+- Pytorch 2.4.0+cu121
+- Datasets 3.0.2
+- Tokenizers 0.19.1
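
Since the card lists `peft` as the library, the adapter would typically be loaded on top of the base model. The following is a hedged sketch, not a tested recipe from the author: the helper `load_adapter` is hypothetical, the adapter repository id is assumed from the page header, and the base model is gated, so access to `meta-llama/Llama-2-7b-hf` is required before the download can succeed.

```python
def load_adapter(
    base_id: str = "meta-llama/Llama-2-7b-hf",
    # Assumption: repository id taken from the page header of this card.
    adapter_id: str = "LBK95/Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V2",
):
    """Load the base model and attach the PEFT adapter (downloads weights)."""
    # Imported lazily so that defining the function does not require the libraries.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the fine-tuned adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    return model, tokenizer
```

Calling `load_adapter()` returns the adapted model and the base tokenizer. Matching the framework versions listed above is a reasonable starting point if loading fails with newer releases.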