---
license: apache-2.0
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: mistralai/Mistral-7B-v0.1
model-index:
- name: zephyr-7b-dpo-qlora
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# zephyr-7b-dpo-qlora

This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unspecified preference dataset (the dataset name was not recorded by the Trainer).
It achieves the following results on the evaluation set:
- Loss: 0.4894
- Rewards/chosen: -2.7838
- Rewards/rejected: -3.9130
- Rewards/accuracies: 0.7445
- Rewards/margins: 1.1292
- Logps/rejected: -635.9062
- Logps/chosen: -543.0328
- Logits/rejected: -1.2074
- Logits/chosen: -1.3340

## Model description

More information needed

## Intended uses & limitations

More information needed
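
As a minimal inference sketch, assuming this repository hosts a PEFT LoRA adapter for the base model above (the adapter id below is a placeholder, not a confirmed repo name):

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Placeholder adapter id: substitute the actual repo id or local path of this adapter.
adapter_id = "your-username/zephyr-7b-dpo-qlora"

# Loads the base model recorded in the adapter config and applies the LoRA weights on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# The base-model tokenizer is used here; the adapter repo may also ship its own copy.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

prompt = "Explain what DPO fine-tuning does in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```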

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
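
These settings map roughly onto a `trl` DPO training run. The sketch below assumes a QLoRA setup (4-bit NF4 quantization plus a trainable LoRA adapter) and a preference dataset with `prompt`/`chosen`/`rejected` columns; the LoRA shape, DPO `beta`, mixed-precision flag, and dataset name are illustrative placeholders rather than values recorded in this card, and the exact `DPOTrainer` signature varies across `trl` versions:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer

base_model = "mistralai/Mistral-7B-v0.1"

# QLoRA: load the frozen base model in 4-bit NF4 and train only a LoRA adapter on top.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# Placeholder adapter shape; the actual r/alpha/target_modules are not recorded in this card.
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Hyperparameters taken from the list above.
training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,  # assumed; precision is not recorded in the card
)

# Placeholder dataset: any preference dataset with "prompt", "chosen", "rejected" columns.
dataset = load_dataset("your/preference-dataset")

trainer = DPOTrainer(
    model,
    ref_model=None,   # with a PEFT adapter, the frozen base model acts as the implicit reference
    args=training_args,
    beta=0.1,         # DPO temperature; placeholder, not recorded in this card
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```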

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6823        | 0.03  | 100  | 0.6822          | 0.0498         | 0.0268           | 0.6610             | 0.0230          | -241.9323      | -259.6750    | -1.9559         | -2.0953       |
| 0.6492        | 0.05  | 200  | 0.6491          | -0.0479        | -0.1535          | 0.6815             | 0.1056          | -259.9606      | -269.4400    | -1.9338         | -2.0706       |
| 0.6101        | 0.08  | 300  | 0.6217          | -0.3407        | -0.5476          | 0.6770             | 0.2069          | -299.3728      | -298.7252    | -1.8680         | -2.0021       |
| 0.6173        | 0.1   | 400  | 0.5952          | -0.5027        | -0.8331          | 0.6835             | 0.3304          | -327.9222      | -314.9250    | -1.6582         | -1.7878       |
| 0.5435        | 0.13  | 500  | 0.5754          | -1.1151        | -1.6071          | 0.6890             | 0.4920          | -405.3195      | -376.1609    | -1.4273         | -1.5544       |
| 0.5547        | 0.16  | 600  | 0.5695          | -0.7600        | -1.2661          | 0.6985             | 0.5061          | -371.2198      | -340.6527    | -1.4396         | -1.5726       |
| 0.5282        | 0.18  | 700  | 0.5560          | -2.0627        | -2.9172          | 0.7165             | 0.8545          | -536.3329      | -470.9231    | -1.2515         | -1.3804       |
| 0.5205        | 0.21  | 800  | 0.5364          | -1.6968        | -2.4004          | 0.7265             | 0.7036          | -484.6470      | -434.3307    | -1.2756         | -1.4041       |
| 0.4983        | 0.24  | 900  | 0.5329          | -1.6798        | -2.4538          | 0.7205             | 0.7740          | -489.9910      | -432.6339    | -1.0956         | -1.2161       |
| 0.5443        | 0.26  | 1000 | 0.5279          | -1.8981        | -2.7666          | 0.7240             | 0.8684          | -521.2657      | -454.4658    | -1.1264         | -1.2533       |
| 0.565         | 0.29  | 1100 | 0.5207          | -1.5130        | -2.3368          | 0.7290             | 0.8238          | -478.2849      | -415.9483    | -1.1445         | -1.2715       |
| 0.5837        | 0.31  | 1200 | 0.5104          | -1.6729        | -2.5375          | 0.7355             | 0.8645          | -498.3547      | -431.9437    | -1.1065         | -1.2314       |
| 0.5342        | 0.34  | 1300 | 0.5146          | -2.7684        | -3.8446          | 0.7240             | 1.0762          | -629.0701      | -541.4911    | -1.0656         | -1.1852       |
| 0.5287        | 0.37  | 1400 | 0.5197          | -1.9068        | -2.8614          | 0.7235             | 0.9546          | -530.7440      | -455.3286    | -1.1253         | -1.2506       |
| 0.4634        | 0.39  | 1500 | 0.5165          | -2.1400        | -3.2391          | 0.7295             | 1.0991          | -568.5231      | -478.6544    | -1.1408         | -1.2696       |
| 0.5551        | 0.42  | 1600 | 0.5057          | -2.4748        | -3.5466          | 0.7310             | 1.0718          | -599.2672      | -512.1343    | -1.1162         | -1.2402       |
| 0.5183        | 0.44  | 1700 | 0.4993          | -2.7856        | -3.8497          | 0.7390             | 1.0641          | -629.5833      | -543.2154    | -1.1493         | -1.2784       |
| 0.478         | 0.47  | 1800 | 0.5060          | -2.6855        | -3.7424          | 0.7390             | 1.0569          | -618.8510      | -533.2012    | -1.1180         | -1.2419       |
| 0.4325        | 0.5   | 1900 | 0.4996          | -3.0306        | -4.2124          | 0.7370             | 1.1818          | -665.8478      | -567.7128    | -1.1245         | -1.2515       |
| 0.4926        | 0.52  | 2000 | 0.4934          | -2.6648        | -3.6771          | 0.7405             | 1.0123          | -612.3228      | -531.1354    | -1.1607         | -1.2879       |
| 0.5009        | 0.55  | 2100 | 0.4915          | -2.8243        | -3.8594          | 0.7510             | 1.0351          | -630.5530      | -547.0867    | -1.1825         | -1.3099       |
| 0.4777        | 0.58  | 2200 | 0.4914          | -2.3357        | -3.3121          | 0.7475             | 0.9764          | -575.8183      | -498.2264    | -1.2484         | -1.3780       |
| 0.4655        | 0.6   | 2300 | 0.4928          | -3.0709        | -4.2756          | 0.7450             | 1.2047          | -672.1651      | -571.7407    | -1.1628         | -1.2897       |
| 0.47          | 0.63  | 2400 | 0.4909          | -2.9333        | -4.0701          | 0.7410             | 1.1368          | -651.6222      | -557.9854    | -1.1517         | -1.2773       |
| 0.4963        | 0.65  | 2500 | 0.4933          | -2.6058        | -3.7730          | 0.7390             | 1.1672          | -621.9061      | -525.2288    | -1.1945         | -1.3239       |
| 0.4663        | 0.68  | 2600 | 0.4950          | -2.6796        | -3.8395          | 0.7450             | 1.1599          | -628.5566      | -532.6130    | -1.1991         | -1.3264       |
| 0.5286        | 0.71  | 2700 | 0.4961          | -2.6413        | -3.7802          | 0.7380             | 1.1389          | -622.6273      | -528.7829    | -1.2033         | -1.3309       |
| 0.4564        | 0.73  | 2800 | 0.4925          | -2.6808        | -3.8257          | 0.7405             | 1.1448          | -627.1752      | -532.7354    | -1.2038         | -1.3305       |
| 0.5166        | 0.76  | 2900 | 0.4904          | -2.7803        | -3.8999          | 0.7415             | 1.1197          | -634.5994      | -542.6777    | -1.2046         | -1.3310       |
| 0.4653        | 0.79  | 3000 | 0.4896          | -2.7971        | -3.8847          | 0.7425             | 1.0877          | -633.0811      | -544.3574    | -1.2067         | -1.3333       |
| 0.4808        | 0.81  | 3100 | 0.4901          | -2.8200        | -3.9473          | 0.7410             | 1.1273          | -639.3414      | -546.6562    | -1.2009         | -1.3278       |
| 0.4882        | 0.84  | 3200 | 0.4896          | -2.7656        | -3.8890          | 0.7440             | 1.1234          | -633.5068      | -541.2137    | -1.2088         | -1.3355       |
| 0.5123        | 0.86  | 3300 | 0.4895          | -2.7745        | -3.8976          | 0.7435             | 1.1231          | -634.3662      | -542.1025    | -1.2083         | -1.3352       |
| 0.4526        | 0.89  | 3400 | 0.4896          | -2.7856        | -3.9136          | 0.7445             | 1.1280          | -635.9655      | -543.2083    | -1.2051         | -1.3319       |
| 0.5432        | 0.92  | 3500 | 0.4896          | -2.7837        | -3.9130          | 0.7440             | 1.1292          | -635.9039      | -543.0231    | -1.2045         | -1.3314       |
| 0.4617        | 0.94  | 3600 | 0.4895          | -2.7857        | -3.9150          | 0.7435             | 1.1294          | -636.1135      | -543.2186    | -1.2104         | -1.3374       |
| 0.4797        | 0.97  | 3700 | 0.4896          | -2.7842        | -3.9131          | 0.7435             | 1.1289          | -635.9192      | -543.0764    | -1.2075         | -1.3343       |
| 0.5092        | 0.99  | 3800 | 0.4894          | -2.7838        | -3.9130          | 0.7445             | 1.1292          | -635.9062      | -543.0328    | -1.2074         | -1.3340       |
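
For reference, the `Rewards/*` columns above follow the standard DPO implicit-reward bookkeeping. A rough sketch of how such metrics are derived from policy and reference log-probabilities (with `beta` as a placeholder for the DPO temperature actually used):

```python
import torch

def dpo_reward_metrics(policy_chosen_logps, policy_rejected_logps,
                       ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Implicit DPO reward: beta * (policy log-prob - reference log-prob) per completion."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards                 # Rewards/margins
    accuracies = (chosen_rewards > rejected_rewards).float()    # Rewards/accuracies
    return {
        "rewards/chosen": chosen_rewards.mean(),
        "rewards/rejected": rejected_rewards.mean(),
        "rewards/margins": margins.mean(),
        "rewards/accuracies": accuracies.mean(),
    }
```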


### Framework versions

- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2
- Datasets 2.14.6
- Tokenizers 0.15.2