---
license: llama2
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
- name: llama_DPO_model_e2
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# llama_DPO_model_e2

This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf), trained with DPO using TRL and PEFT. The training dataset was not recorded.
At the final logged checkpoint (step 500), it achieves the following results on the evaluation set:
- Loss: 0.0937
- Rewards/chosen: 0.4389
- Rewards/rejected: -2.0384
- Rewards/accuracies: 1.0
- Rewards/margins: 2.4774
- Logps/rejected: -205.1940
- Logps/chosen: -156.2447
- Logits/rejected: -1.0509
- Logits/chosen: -0.8587
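
The reward figures above are related to one another. As a minimal sketch, assuming the margin is logged as the mean of (chosen reward minus rejected reward), the reported margin can be checked against the two reward means. The variable names are illustrative and not taken from the training code.

```python
# Values copied from the evaluation results listed above.
rewards_chosen = 0.4389
rewards_rejected = -2.0384
reported_margin = 2.4774

# Assumption: the margin is the mean of (chosen - rejected), so the
# difference of the two means should match it up to rounding.
margin = rewards_chosen - rewards_rejected

print(f"recomputed margin: {margin:.4f}")
print(f"reported margin:   {reported_margin:.4f}")
print(f"difference:        {abs(margin - reported_margin):.4f}")
```

A small difference in the last digit is expected, because each logged value is rounded to four decimal places independently.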

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-07
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
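
The following sketch shows how two of these values fit together: the total train batch size as the product of the per-device batch size and the gradient accumulation steps, and the learning rate under a linear schedule. It assumes a single device, no warmup (none is listed), and a hypothetical total of 500 optimizer steps based on the last logged step; the function names are illustrative, not part of any library.

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device_batch * grad_accum_steps * num_devices


def linear_lr(step, total_steps, base_lr=8e-07, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# train_batch_size=1 and gradient_accumulation_steps=8, as listed above.
print(effective_batch_size(1, 8))

# Hypothetical total of 500 steps, taken from the last logged step.
for step in (0, 250, 500):
    print(step, linear_lr(step, total_steps=500))
```

If training used more than one device or a warmup phase, the corresponding arguments would need to change.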

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.673         | 0.1   | 25   | 0.6445          | 0.0273         | -0.0740          | 0.9000             | 0.1013          | -185.5491      | -160.3607    | -1.0521         | -0.8545       |
| 0.5737        | 0.2   | 50   | 0.5485          | 0.0856         | -0.2335          | 0.9933             | 0.3190          | -187.1442      | -159.7781    | -1.0526         | -0.8551       |
| 0.4843        | 0.3   | 75   | 0.4496          | 0.1470         | -0.4343          | 1.0                | 0.5814          | -189.1528      | -159.1637    | -1.0527         | -0.8571       |
| 0.4006        | 0.4   | 100  | 0.3655          | 0.2043         | -0.6419          | 1.0                | 0.8462          | -191.2286      | -158.5909    | -1.0521         | -0.8556       |
| 0.3417        | 0.5   | 125  | 0.2945          | 0.2551         | -0.8630          | 1.0                | 1.1180          | -193.4393      | -158.0833    | -1.0522         | -0.8562       |
| 0.2601        | 0.6   | 150  | 0.2353          | 0.3032         | -1.0903          | 1.0                | 1.3935          | -195.7128      | -157.6020    | -1.0520         | -0.8597       |
| 0.2197        | 0.7   | 175  | 0.1891          | 0.3442         | -1.3124          | 1.0                | 1.6565          | -197.9333      | -157.1923    | -1.0522         | -0.8579       |
| 0.1675        | 0.79  | 200  | 0.1532          | 0.3815         | -1.5253          | 1.0                | 1.9067          | -200.0621      | -156.8192    | -1.0526         | -0.8582       |
| 0.1417        | 0.89  | 225  | 0.1289          | 0.4011         | -1.7082          | 1.0                | 2.1094          | -201.8920      | -156.6225    | -1.0525         | -0.8585       |
| 0.1203        | 0.99  | 250  | 0.1117          | 0.4214         | -1.8534          | 1.0                | 2.2748          | -203.3437      | -156.4196    | -1.0517         | -0.8603       |
| 0.1156        | 1.09  | 275  | 0.1034          | 0.4296         | -1.9336          | 1.0                | 2.3633          | -204.1459      | -156.3377    | -1.0517         | -0.8590       |
| 0.0942        | 1.19  | 300  | 0.0990          | 0.4310         | -1.9823          | 1.0                | 2.4133          | -204.6330      | -156.3240    | -1.0514         | -0.8577       |
| 0.0903        | 1.29  | 325  | 0.0957          | 0.4380         | -2.0137          | 1.0                | 2.4517          | -204.9467      | -156.2539    | -1.0511         | -0.8593       |
| 0.1023        | 1.39  | 350  | 0.0946          | 0.4384         | -2.0296          | 1.0                | 2.4680          | -205.1059      | -156.2503    | -1.0519         | -0.8587       |
| 0.0984        | 1.49  | 375  | 0.0945          | 0.4352         | -2.0350          | 1.0                | 2.4702          | -205.1597      | -156.2819    | -1.0510         | -0.8580       |
| 0.0899        | 1.59  | 400  | 0.0939          | 0.4360         | -2.0393          | 1.0                | 2.4752          | -205.2024      | -156.2742    | -1.0513         | -0.8594       |
| 0.0883        | 1.69  | 425  | 0.0939          | 0.4374         | -2.0378          | 1.0                | 2.4752          | -205.1877      | -156.2598    | -1.0514         | -0.8590       |
| 0.1011        | 1.79  | 450  | 0.0939          | 0.4368         | -2.0412          | 1.0                | 2.4781          | -205.2217      | -156.2654    | -1.0513         | -0.8583       |
| 0.0962        | 1.89  | 475  | 0.0935          | 0.4403         | -2.0395          | 1.0                | 2.4798          | -205.2041      | -156.2308    | -1.0510         | -0.8574       |
| 0.0971        | 1.99  | 500  | 0.0937          | 0.4389         | -2.0384          | 1.0                | 2.4774          | -205.1940      | -156.2447    | -1.0509         | -0.8587       |


### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- PyTorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2