---
license: llama2
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
- name: llama_DPO_model_e2
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# llama_DPO_model_e2

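This model is a PEFT adapter for [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf), fine-tuned with DPO using TRL (as indicated by the card's tags). The training dataset was not recorded.
It achieves the following results on the evaluation set (these match the final evaluation at step 750 in the training results table):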
- Loss: 0.1526
- Rewards/chosen: 0.3611
- Rewards/rejected: -1.5450
- Rewards/accuracies: 1.0
- Rewards/margins: 1.9061
- Logps/rejected: -200.2592
- Logps/chosen: -157.0226
- Logits/rejected: -1.0513
- Logits/chosen: -0.8571
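
The reward metrics above are related to each other: the margin is the chosen reward minus the rejected reward. The sketch below checks that against the listed values. It also computes a lower bound on the loss, which assumes the standard sigmoid DPO loss without label smoothing; the card does not state the loss type, so treat that part as an assumption. Variable names are illustrative.

```python
import math

# Values copied from the evaluation results listed above.
rewards_chosen = 0.3611
rewards_rejected = -1.5450
reported_margin = 1.9061
reported_loss = 0.1526

# The margin is the gap between the chosen and rejected implicit rewards.
margin = rewards_chosen - rewards_rejected

# Assumption: sigmoid DPO loss, i.e. the mean of -log(sigmoid(margin_i)).
# That function is convex, so evaluating it at the mean margin gives only
# a lower bound on the mean loss, not the loss itself.
loss_lower_bound = math.log1p(math.exp(-margin))

print(f"computed margin:  {margin:.4f} (reported {reported_margin})")
print(f"loss lower bound: {loss_lower_bound:.4f} (reported {reported_loss})")
```

The reported loss sits above the bound because it averages per-example losses, so it cannot be recomputed exactly from the mean margin alone.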

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3
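
As a rough illustration of how these settings combine, the sketch below derives the effective batch size and a linear learning-rate schedule. The device count, the absence of warmup, and the total step count are assumptions, since the card does not state them; `linear_lr` is a hypothetical helper, not a library function.

```python
train_batch_size = 1
gradient_accumulation_steps = 8
num_devices = 1  # assumption: single device, not stated in the card

# Effective number of examples per optimizer step.
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)


def linear_lr(step, total_steps, base_lr=5e-07, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical total: the last logged step is 750 at epoch 2.98,
# so the full 3-epoch run is assumed to end slightly after that.
assumed_total_steps = 755

print("effective batch size:", total_train_batch_size)
for step in (0, 250, 500, 750):
    print(step, linear_lr(step, assumed_total_steps))
```

With a single device the product is 8, which agrees with the `total_train_batch_size` listed above.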

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6819        | 0.1   | 25   | 0.6708          | 0.0151         | -0.0312          | 0.7567             | 0.0463          | -185.1220      | -160.4831    | -1.0517         | -0.8540       |
| 0.6351        | 0.2   | 50   | 0.6228          | 0.0428         | -0.1054          | 0.9600             | 0.1482          | -185.8636      | -160.2060    | -1.0524         | -0.8552       |
| 0.5874        | 0.3   | 75   | 0.5655          | 0.0762         | -0.2019          | 0.9967             | 0.2781          | -186.8286      | -159.8719    | -1.0525         | -0.8548       |
| 0.5179        | 0.4   | 100  | 0.5030          | 0.1133         | -0.3207          | 1.0                | 0.4340          | -188.0166      | -159.5010    | -1.0521         | -0.8545       |
| 0.479         | 0.5   | 125  | 0.4468          | 0.1501         | -0.4388          | 1.0                | 0.5889          | -189.1974      | -159.1327    | -1.0524         | -0.8554       |
| 0.406         | 0.6   | 150  | 0.3904          | 0.1842         | -0.5778          | 1.0                | 0.7620          | -190.5874      | -158.7915    | -1.0525         | -0.8576       |
| 0.3731        | 0.7   | 175  | 0.3377          | 0.2223         | -0.7247          | 1.0                | 0.9470          | -192.0564      | -158.4104    | -1.0521         | -0.8559       |
| 0.3075        | 0.79  | 200  | 0.2918          | 0.2537         | -0.8769          | 1.0                | 1.1305          | -193.5782      | -158.0974    | -1.0525         | -0.8583       |
| 0.2621        | 0.89  | 225  | 0.2517          | 0.2822         | -1.0278          | 1.0                | 1.3100          | -195.0876      | -157.8119    | -1.0525         | -0.8573       |
| 0.2285        | 0.99  | 250  | 0.2180          | 0.3118         | -1.1738          | 1.0                | 1.4855          | -196.5471      | -157.5160    | -1.0517         | -0.8568       |
| 0.2162        | 1.09  | 275  | 0.1948          | 0.3279         | -1.2897          | 1.0                | 1.6176          | -197.7066      | -157.3551    | -1.0513         | -0.8567       |
| 0.1752        | 1.19  | 300  | 0.1810          | 0.3383         | -1.3661          | 1.0                | 1.7044          | -198.4706      | -157.2514    | -1.0511         | -0.8576       |
| 0.1672        | 1.29  | 325  | 0.1714          | 0.3456         | -1.4242          | 1.0                | 1.7698          | -199.0516      | -157.1775    | -1.0509         | -0.8568       |
| 0.1722        | 1.39  | 350  | 0.1646          | 0.3535         | -1.4653          | 1.0                | 1.8187          | -199.4624      | -157.0993    | -1.0510         | -0.8568       |
| 0.1649        | 1.49  | 375  | 0.1596          | 0.3586         | -1.4919          | 1.0                | 1.8505          | -199.7286      | -157.0477    | -1.0512         | -0.8569       |
| 0.1534        | 1.59  | 400  | 0.1580          | 0.3603         | -1.5059          | 1.0                | 1.8663          | -199.8687      | -157.0304    | -1.0507         | -0.8571       |
| 0.1492        | 1.69  | 425  | 0.1561          | 0.3589         | -1.5194          | 1.0                | 1.8783          | -200.0034      | -157.0448    | -1.0514         | -0.8578       |
| 0.1625        | 1.79  | 450  | 0.1564          | 0.3586         | -1.5205          | 1.0                | 1.8791          | -200.0150      | -157.0482    | -1.0509         | -0.8570       |
| 0.1561        | 1.89  | 475  | 0.1535          | 0.3613         | -1.5366          | 1.0                | 1.8979          | -200.1756      | -157.0212    | -1.0510         | -0.8576       |
| 0.1565        | 1.99  | 500  | 0.1529          | 0.3643         | -1.5393          | 1.0                | 1.9036          | -200.2028      | -156.9913    | -1.0513         | -0.8567       |
| 0.1476        | 2.09  | 525  | 0.1530          | 0.3640         | -1.5392          | 1.0                | 1.9032          | -200.2021      | -156.9944    | -1.0511         | -0.8569       |
| 0.1457        | 2.19  | 550  | 0.1530          | 0.3605         | -1.5406          | 1.0                | 1.9011          | -200.2155      | -157.0287    | -1.0507         | -0.8577       |
| 0.1376        | 2.29  | 575  | 0.1529          | 0.3585         | -1.5466          | 1.0                | 1.9051          | -200.2757      | -157.0492    | -1.0508         | -0.8579       |
| 0.1574        | 2.38  | 600  | 0.1527          | 0.3634         | -1.5448          | 1.0                | 1.9082          | -200.2574      | -156.9998    | -1.0508         | -0.8566       |
| 0.1662        | 2.48  | 625  | 0.1518          | 0.3645         | -1.5465          | 1.0                | 1.9109          | -200.2742      | -156.9890    | -1.0509         | -0.8572       |
| 0.1535        | 2.58  | 650  | 0.1523          | 0.3628         | -1.5458          | 1.0                | 1.9086          | -200.2675      | -157.0059    | -1.0510         | -0.8571       |
| 0.1488        | 2.68  | 675  | 0.1518          | 0.3658         | -1.5446          | 1.0                | 1.9104          | -200.2561      | -156.9763    | -1.0510         | -0.8572       |
| 0.1564        | 2.78  | 700  | 0.1526          | 0.3618         | -1.5452          | 1.0                | 1.9071          | -200.2618      | -157.0154    | -1.0512         | -0.8568       |
| 0.1367        | 2.88  | 725  | 0.1526          | 0.3643         | -1.5426          | 1.0                | 1.9069          | -200.2352      | -156.9905    | -1.0513         | -0.8570       |
| 0.1543        | 2.98  | 750  | 0.1526          | 0.3611         | -1.5450          | 1.0                | 1.9061          | -200.2592      | -157.0226    | -1.0513         | -0.8571       |
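
The card does not list the DPO `beta`. Under the standard DPO definition, the implicit reward is `beta * (policy_logp - reference_logp)`, and the reference log-probabilities stay fixed on a fixed evaluation set, so the change in reward between two checkpoints divided by the change in log-probability gives an estimate of `beta`. The sketch below applies that to the first and last rows of the table. It is an inference from the logged numbers, not a documented setting, and `implied_beta` is a hypothetical helper.

```python
# First (step 25) and last (step 750) evaluation rows from the table above.
first = {
    "rewards_chosen": 0.0151, "rewards_rejected": -0.0312,
    "logps_chosen": -160.4831, "logps_rejected": -185.1220,
}
last = {
    "rewards_chosen": 0.3611, "rewards_rejected": -1.5450,
    "logps_chosen": -157.0226, "logps_rejected": -200.2592,
}


def implied_beta(a, b, kind):
    """Estimate beta as (change in reward) / (change in log-probability)."""
    d_reward = b[f"rewards_{kind}"] - a[f"rewards_{kind}"]
    d_logp = b[f"logps_{kind}"] - a[f"logps_{kind}"]
    return d_reward / d_logp


beta_chosen = implied_beta(first, last, "chosen")
beta_rejected = implied_beta(first, last, "rejected")

print(f"beta implied by chosen rows:   {beta_chosen:.4f}")
print(f"beta implied by rejected rows: {beta_rejected:.4f}")
```

Both estimates land close to 0.1. The table also shows where the margin comes from: the rejected log-probabilities fall by about 15 nats while the chosen ones rise by about 3.5.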


### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- PyTorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2