---
license: apache-2.0
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: alignment-handbook/zephyr-7b-sft-full
model-index:
- name: zephyr-7b-dpo-lora-pubmedqa-mix2
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# zephyr-7b-dpo-lora-pubmedqa-mix2

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on an unspecified dataset (the Trainer did not record a dataset name).
It achieves the following results on the evaluation set:
- Loss: 0.0013
- Rewards/chosen: -1.8126
- Rewards/rejected: -10.9731
- Rewards/accuracies: 1.0
- Rewards/margins: 9.1605
- Logps/rejected: -1144.0397
- Logps/chosen: -242.4412
- Logits/rejected: -1.7638
- Logits/chosen: -2.8841
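
As a quick sanity check on the metrics above, the reported margin should equal the chosen reward minus the rejected reward. The sketch below assumes that definition (the card itself does not state it) and uses only the numbers listed in the evaluation summary:

```python
# Values copied from the evaluation summary in this card.
rewards_chosen = -1.8126
rewards_rejected = -10.9731
reported_margin = 9.1605

# Assumption: margin = chosen reward - rejected reward.
margin = rewards_chosen - rewards_rejected

# The recomputed margin should match the reported one to 4 decimals.
assert abs(margin - reported_margin) < 1e-4
print(f"recomputed margin: {margin:.4f}")
```

The same identity can be checked against any row of the training results table, for example 0.2213 - (-0.6386) = 0.8599 at step 3000.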

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- total_eval_batch_size: 2
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
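
The total batch sizes listed above follow from the per-device sizes, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and also illustrates the general shape of a cosine schedule with a 10% linear warmup. The `lr_at` helper and the `total_steps=1000` value are hypothetical, chosen only for illustration; the card does not state the total number of optimizer steps, and the exact library implementation may round the warmup length differently.

```python
import math

# Hyperparameters copied from the list in this card.
train_batch_size = 1
eval_batch_size = 1
num_devices = 2
gradient_accumulation_steps = 2
learning_rate = 5e-6
warmup_ratio = 0.1

# Effective batch sizes: per-device size x devices (x accumulation for training).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices
assert total_train_batch_size == 4
assert total_eval_batch_size == 2


def lr_at(step, total_steps, base_lr=learning_rate, ratio=warmup_ratio):
    """Hypothetical helper: linear warmup followed by cosine decay to zero."""
    warmup_steps = int(total_steps * ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# total_steps=1000 is an illustrative assumption, not a value from this card.
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at(step, total_steps=1000))
```

Under these assumptions the learning rate peaks at 5e-06 when warmup ends and decays to zero by the final step.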

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.2697        | 0.04  | 3000  | 0.3396          | 0.2213         | -0.6386          | 1.0                | 0.8599          | -110.5876      | -39.0518     | -3.0278         | -3.0862       |
| 0.1599        | 0.07  | 6000  | 0.0750          | -0.5884        | -3.6673          | 1.0                | 3.0789          | -413.4546      | -120.0204    | -2.9055         | -3.0346       |
| 0.0563        | 0.11  | 9000  | 0.0204          | -0.6260        | -5.6712          | 1.0                | 5.0452          | -613.8441      | -123.7819    | -3.0269         | -3.1136       |
| 0.0463        | 0.14  | 12000 | 0.0287          | -0.7209        | -7.9224          | 1.0                | 7.2014          | -838.9609      | -133.2740    | -3.0642         | -3.1628       |
| 0.1206        | 0.18  | 15000 | 0.0030          | -0.9209        | -8.8089          | 1.0                | 7.8880          | -927.6118      | -153.2670    | -3.0802         | -3.1766       |
| 0.0508        | 0.22  | 18000 | 0.4964          | -0.4026        | -8.0330          | 1.0                | 7.6304          | -850.0245      | -101.4397    | -3.1314         | -3.2075       |
| 0.0323        | 0.25  | 21000 | 0.0872          | -1.4713        | -10.3437         | 1.0                | 8.8723          | -1081.0913     | -208.3129    | -2.6496         | -3.1189       |
| 0.4534        | 0.29  | 24000 | 0.0077          | -2.3507        | -12.1827         | 1.0                | 9.8320          | -1264.9957     | -296.2491    | -1.6282         | -2.8665       |
| 0.0013        | 0.32  | 27000 | 0.0019          | -2.1480        | -10.6645         | 1.0                | 8.5166          | -1113.1797     | -275.9768    | -1.7614         | -2.8604       |
| 0.1404        | 0.36  | 30000 | 0.0002          | -2.4964        | -12.4101         | 1.0                | 9.9138          | -1287.7384     | -310.8155    | -1.5907         | -2.8352       |
| 0.0198        | 0.4   | 33000 | 0.0009          | -3.0802        | -13.3347         | 1.0                | 10.2545         | -1380.1964     | -369.1991    | -1.6628         | -2.8372       |
| 0.0041        | 0.43  | 36000 | 0.0004          | -2.7800        | -12.5815         | 1.0                | 9.8014          | -1304.8732     | -339.1852    | -1.6282         | -2.8242       |
| 0.0007        | 0.47  | 39000 | 0.0007          | -2.9921        | -13.2089         | 1.0                | 10.2168         | -1367.6129     | -360.3922    | -1.6672         | -2.8403       |
| 0.0008        | 0.5   | 42000 | 0.0013          | -2.3107        | -11.8754         | 1.0                | 9.5647          | -1234.2609     | -292.2454    | -1.6475         | -2.8400       |
| 0.0024        | 0.54  | 45000 | 0.0010          | -3.3769        | -13.2333         | 1.0                | 9.8564          | -1370.0538     | -398.8731    | -1.6937         | -2.8403       |
| 0.0019        | 0.57  | 48000 | 0.0013          | -2.8151        | -12.4427         | 1.0                | 9.6277          | -1290.9999     | -342.6892    | -1.7047         | -2.8503       |
| 0.2266        | 0.61  | 51000 | 0.0014          | -1.9532        | -11.0212         | 1.0                | 9.0680          | -1148.8468     | -256.4992    | -1.6745         | -2.8650       |
| 0.0016        | 0.65  | 54000 | 0.0014          | -1.8077        | -10.7512         | 1.0                | 8.9435          | -1121.8423     | -241.9466    | -1.8328         | -2.8946       |
| 0.0019        | 0.68  | 57000 | 0.0013          | -1.8159        | -10.8808         | 1.0                | 9.0649          | -1134.8024     | -242.7715    | -1.7644         | -2.8860       |
| 0.0013        | 0.72  | 60000 | 0.0013          | -1.7356        | -10.8007         | 1.0                | 9.0651          | -1126.8002     | -234.7419    | -1.7574         | -2.8871       |
| 0.0014        | 0.75  | 63000 | 0.0013          | -1.8249        | -10.9773         | 1.0                | 9.1524          | -1144.4586     | -243.6743    | -1.7699         | -2.8867       |
| 0.0014        | 0.79  | 66000 | 0.0013          | -1.8308        | -10.9698         | 1.0                | 9.1389          | -1143.7017     | -244.2651    | -1.7597         | -2.8841       |
| 0.0011        | 0.83  | 69000 | 0.0013          | -1.8034        | -10.9390         | 1.0                | 9.1356          | -1140.6276     | -241.5220    | -1.7619         | -2.8858       |
| 0.0016        | 0.86  | 72000 | 0.0013          | -1.7971        | -10.9097         | 1.0                | 9.1126          | -1137.6914     | -240.8868    | -1.7608         | -2.8852       |
| 0.0239        | 0.9   | 75000 | 0.0013          | -1.7976        | -10.9400         | 1.0                | 9.1424          | -1140.7238     | -240.9355    | -1.7773         | -2.8872       |
| 0.0024        | 0.93  | 78000 | 0.0013          | -1.7862        | -10.9196         | 1.0                | 9.1334          | -1138.6901     | -239.8036    | -1.7733         | -2.8861       |
| 0.0018        | 0.97  | 81000 | 0.0013          | -1.8228        | -10.9802         | 1.0                | 9.1574          | -1144.7491     | -243.4639    | -1.7594         | -2.8860       |


### Framework versions

- PEFT 0.7.1
- Transformers 4.36.2
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2