---
license: llama2
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
- name: llama_DPO_model_e2
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# llama_DPO_model_e2

This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on an unknown dataset.
It achieves the following results on the evaluation set (the reward metrics are explained briefly after the list):
- Loss: 0.1001
- Rewards/chosen: 0.4226
- Rewards/rejected: -1.9804
- Rewards/accuracies: 1.0
- Rewards/margins: 2.4030
- Logps/rejected: -204.6132
- Logps/chosen: -156.4080
- Logits/rejected: -1.0519
- Logits/chosen: -0.8585
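
For context on the metric names: in TRL's DPO implementation, the implicit reward of a completion for a given prompt is the β-scaled log-probability ratio between the trained policy and the frozen reference model,

$$
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right),
$$

so `Rewards/chosen` and `Rewards/rejected` are the mean implicit rewards of the preferred and dispreferred completions, `Rewards/margins` is their difference, and `Rewards/accuracies` is the fraction of pairs in which the chosen reward exceeds the rejected one. `Logps/*` and `Logits/*` are the policy's average sequence log-probabilities and token logits on the respective completions. The DPO temperature β is not reported in this card.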

## Model description

This repository contains a parameter-efficient (PEFT) adapter for [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf), fine-tuned with Direct Preference Optimization (DPO) via the TRL library. The repository is expected to hold only the adapter weights, so the base model must be loaded separately, as in the sketch below. No further details about the model were provided.
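
A minimal loading sketch with `transformers` and `peft`, assuming this repository hosts only the adapter weights; the adapter repo id below is a placeholder, and access to the gated Llama-2 base model is required:

```python
# Minimal sketch: load the Llama-2 base model and apply this DPO adapter on top.
# "your-username/llama_DPO_model_e2" is a placeholder for the actual adapter repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"
adapter_id = "your-username/llama_DPO_model_e2"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "Explain direct preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```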

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged reconstruction of the corresponding trainer setup is sketched after the list):
- learning_rate: 6e-07
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3
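
The sketch below shows how these hyperparameters could map onto a TRL `DPOTrainer` run compatible with the framework versions listed at the end of this card. The preference dataset, LoRA configuration, and DPO β are not reported here, so the corresponding values are placeholders.

```python
# Hypothetical reconstruction of the training setup. The dataset path, LoRA
# hyperparameters, and beta are placeholders; only the TrainingArguments values
# come from the hyperparameter list above (Adam betas/epsilon are the defaults).
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama-2 has no pad token by default

model = AutoModelForCausalLM.from_pretrained(model_id)

# DPOTrainer expects "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("json", data_files="preference_pairs.json", split="train")  # placeholder

peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05)  # assumed values

args = TrainingArguments(
    output_dir="llama_DPO_model_e2",
    learning_rate=6e-7,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,   # effective train batch size of 8
    num_train_epochs=3,
    lr_scheduler_type="linear",
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model=None,        # with a PEFT adapter, TRL derives the reference from the frozen base
    args=args,
    beta=0.1,              # DPO temperature; not reported in this card
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```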

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6757        | 0.1   | 25   | 0.6650          | 0.0149         | -0.0435          | 0.7767             | 0.0584          | -185.2444      | -160.4850    | -1.0519         | -0.8543       |
| 0.6136        | 0.2   | 50   | 0.5989          | 0.0552         | -0.1462          | 0.9567             | 0.2014          | -186.2718      | -160.0822    | -1.0523         | -0.8553       |
| 0.5526        | 0.3   | 75   | 0.5225          | 0.1032         | -0.2804          | 1.0                | 0.3837          | -187.6138      | -159.6014    | -1.0520         | -0.8542       |
| 0.4819        | 0.4   | 100  | 0.4502          | 0.1474         | -0.4325          | 0.9967             | 0.5798          | -189.1341      | -159.1602    | -1.0518         | -0.8548       |
| 0.4253        | 0.5   | 125  | 0.3835          | 0.1905         | -0.5943          | 1.0                | 0.7848          | -190.7523      | -158.7284    | -1.0527         | -0.8564       |
| 0.3448        | 0.6   | 150  | 0.3197          | 0.2328         | -0.7813          | 1.0                | 1.0141          | -192.6229      | -158.3063    | -1.0526         | -0.8559       |
| 0.3007        | 0.7   | 175  | 0.2637          | 0.2788         | -0.9753          | 1.0                | 1.2542          | -194.5630      | -157.8456    | -1.0525         | -0.8586       |
| 0.2369        | 0.79  | 200  | 0.2192          | 0.3135         | -1.1671          | 1.0                | 1.4807          | -196.4808      | -157.4985    | -1.0519         | -0.8604       |
| 0.1987        | 0.89  | 225  | 0.1825          | 0.3436         | -1.3550          | 1.0                | 1.6986          | -198.3592      | -157.1976    | -1.0520         | -0.8594       |
| 0.1616        | 0.99  | 250  | 0.1532          | 0.3687         | -1.5379          | 1.0                | 1.9066          | -200.1886      | -156.9470    | -1.0519         | -0.8604       |
| 0.1525        | 1.09  | 275  | 0.1346          | 0.3861         | -1.6703          | 1.0                | 2.0564          | -201.5127      | -156.7730    | -1.0511         | -0.8582       |
| 0.1194        | 1.19  | 300  | 0.1246          | 0.3970         | -1.7483          | 1.0                | 2.1453          | -202.2923      | -156.6637    | -1.0509         | -0.8584       |
| 0.1128        | 1.29  | 325  | 0.1161          | 0.4062         | -1.8227          | 1.0                | 2.2289          | -203.0370      | -156.5718    | -1.0511         | -0.8577       |
| 0.1194        | 1.39  | 350  | 0.1108          | 0.4127         | -1.8680          | 1.0                | 2.2807          | -203.4899      | -156.5069    | -1.0514         | -0.8602       |
| 0.1123        | 1.49  | 375  | 0.1070          | 0.4151         | -1.9092          | 1.0                | 2.3243          | -203.9014      | -156.4828    | -1.0515         | -0.8584       |
| 0.1008        | 1.59  | 400  | 0.1046          | 0.4209         | -1.9290          | 1.0                | 2.3499          | -204.0999      | -156.4248    | -1.0516         | -0.8618       |
| 0.0971        | 1.69  | 425  | 0.1033          | 0.4208         | -1.9461          | 1.0                | 2.3669          | -204.2709      | -156.4260    | -1.0510         | -0.8586       |
| 0.109         | 1.79  | 450  | 0.1019          | 0.4235         | -1.9597          | 1.0                | 2.3832          | -204.4061      | -156.3985    | -1.0510         | -0.8587       |
| 0.1035        | 1.89  | 475  | 0.1009          | 0.4234         | -1.9700          | 1.0                | 2.3934          | -204.5094      | -156.4001    | -1.0517         | -0.8580       |
| 0.1046        | 1.99  | 500  | 0.1004          | 0.4210         | -1.9772          | 1.0                | 2.3983          | -204.5820      | -156.4234    | -1.0511         | -0.8603       |
| 0.0961        | 2.09  | 525  | 0.1002          | 0.4227         | -1.9798          | 1.0                | 2.4025          | -204.6080      | -156.4070    | -1.0518         | -0.8587       |
| 0.0932        | 2.19  | 550  | 0.1000          | 0.4237         | -1.9796          | 1.0                | 2.4033          | -204.6052      | -156.3964    | -1.0518         | -0.8597       |
| 0.0901        | 2.29  | 575  | 0.1002          | 0.4231         | -1.9785          | 1.0                | 2.4015          | -204.5942      | -156.4030    | -1.0514         | -0.8594       |
| 0.1033        | 2.38  | 600  | 0.1003          | 0.4248         | -1.9780          | 1.0                | 2.4028          | -204.5901      | -156.3859    | -1.0517         | -0.8616       |
| 0.1108        | 2.48  | 625  | 0.0999          | 0.4262         | -1.9796          | 1.0                | 2.4057          | -204.6053      | -156.3723    | -1.0517         | -0.8583       |
| 0.1026        | 2.58  | 650  | 0.0998          | 0.4208         | -1.9879          | 1.0                | 2.4088          | -204.6889      | -156.4255    | -1.0522         | -0.8594       |
| 0.0956        | 2.68  | 675  | 0.1001          | 0.4227         | -1.9818          | 1.0                | 2.4045          | -204.6279      | -156.4070    | -1.0517         | -0.8588       |
| 0.1003        | 2.78  | 700  | 0.0996          | 0.4241         | -1.9817          | 1.0                | 2.4058          | -204.6262      | -156.3926    | -1.0516         | -0.8584       |
| 0.0874        | 2.88  | 725  | 0.0997          | 0.4228         | -1.9835          | 1.0                | 2.4064          | -204.6450      | -156.4057    | -1.0519         | -0.8609       |
| 0.1001        | 2.98  | 750  | 0.1001          | 0.4226         | -1.9804          | 1.0                | 2.4030          | -204.6132      | -156.4080    | -1.0519         | -0.8585       |


### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2