---
license: apache-2.0
base_model: amazingvince/zephyr-220m-sft-full
tags:
- generated_from_trainer
model-index:
- name: zephyr-220m-dpo-full
  results: []
datasets:
- HuggingFaceH4/ultrafeedback_binarized
---


# zephyr-220m-dpo-full

This model is a fine-tuned version of [amazingvince/zephyr-220m-sft-full](https://huggingface.co/amazingvince/zephyr-220m-sft-full) on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5608
- Rewards/chosen: 0.4691
- Rewards/rejected: -0.0455
- Rewards/accuracies: 0.6930
- Rewards/margins: 0.5145
- Logps/rejected: -438.4595
- Logps/chosen: -544.6858
- Logits/rejected: -4.0092
- Logits/chosen: -3.9839
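
For reference, `Rewards/chosen` and `Rewards/rejected` are the DPO implicit rewards, i.e. `beta` times the log-ratio of policy to reference-model likelihood for each response; `Rewards/margins` is their difference and `Rewards/accuracies` the fraction of pairs where the chosen response outscores the rejected one. A minimal PyTorch sketch of the standard DPO objective under those definitions (variable names are illustrative, and `beta` is a placeholder since the card does not record the value used for this run):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss from per-sequence log-probabilities."""
    # Implicit rewards: beta * log(pi_theta / pi_ref) for each response.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    margins = chosen_rewards - rejected_rewards            # Rewards/margins
    loss = -F.logsigmoid(margins).mean()                   # DPO objective
    accuracy = (chosen_rewards > rejected_rewards).float().mean()  # Rewards/accuracies
    return loss, chosen_rewards.mean(), rejected_rewards.mean(), accuracy
```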

## Model description

zephyr-220m-dpo-full is a ~220M-parameter causal language model aligned with Direct Preference Optimization (DPO). Starting from the supervised fine-tuned checkpoint [amazingvince/zephyr-220m-sft-full](https://huggingface.co/amazingvince/zephyr-220m-sft-full), it was further trained on binarized preference pairs so that responses humans preferred are ranked above rejected ones, following the Zephyr-style SFT-then-DPO recipe at a small scale.
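
A minimal inference sketch with `transformers` (the repo id and prompt format are assumptions; check whether the tokenizer actually ships a chat template before relying on it):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amazingvince/zephyr-220m-dpo-full"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# apply_chat_template assumes a chat template is bundled with the tokenizer;
# fall back to plain-text prompting if it is not.
messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```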

## Intended uses & limitations

More information needed

## Training and evaluation data

Training and evaluation used the preference splits of [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized), in which each prompt is paired with a chosen and a rejected response.
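
To inspect the data, something like the following should work (split and column names follow the dataset card for `ultrafeedback_binarized`; verify them against the current dataset revision):

```python
from datasets import load_dataset

# "train_prefs" / "test_prefs" are the documented preference splits;
# each row carries "chosen" and "rejected" message lists.
ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
example = ds[0]
print(example["prompt"])
print(example["chosen"][-1]["content"])    # preferred assistant reply
print(example["rejected"][-1]["content"])  # dispreferred assistant reply
```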

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 16
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
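
These values map naturally onto `transformers.TrainingArguments` (a reconstruction for illustration, not the exact launch script; the DPO-specific pieces such as `beta` and the reference model live in the trainer, e.g. `trl`'s `DPOTrainer`, and are not recorded on this card):

```python
from transformers import TrainingArguments

# Per-device sizes; with num_devices=2 this yields the totals above:
# total_train_batch_size = 8 * 2 = 16, total_eval_batch_size = 4 * 2 = 8.
args = TrainingArguments(
    output_dir="zephyr-220m-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=1,
)
```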

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6906        | 0.03  | 100  | 0.6932          | 0.0008         | 0.0007           | 0.4860             | 0.0002          | -437.9984      | -549.3683    | -4.0893         | -4.0515       |
| 0.6844        | 0.05  | 200  | 0.6855          | 0.0323         | 0.0173           | 0.5640             | 0.0150          | -437.8319      | -549.0540    | -4.0871         | -4.0501       |
| 0.6685        | 0.08  | 300  | 0.6675          | 0.1075         | 0.0537           | 0.6160             | 0.0538          | -437.4682      | -548.3016    | -4.0788         | -4.0432       |
| 0.6579        | 0.1   | 400  | 0.6426          | 0.2153         | 0.0941           | 0.6430             | 0.1212          | -437.0637      | -547.2234    | -4.0645         | -4.0309       |
| 0.6331        | 0.13  | 500  | 0.6241          | 0.2980         | 0.1106           | 0.6430             | 0.1874          | -436.8989      | -546.3970    | -4.0525         | -4.0221       |
| 0.6229        | 0.15  | 600  | 0.6138          | 0.3428         | 0.1103           | 0.6580             | 0.2325          | -436.9023      | -545.9487    | -4.0402         | -4.0116       |
| 0.6008        | 0.18  | 700  | 0.6053          | 0.3822         | 0.0970           | 0.6560             | 0.2852          | -437.0354      | -545.5550    | -4.0301         | -4.0042       |
| 0.5751        | 0.21  | 800  | 0.5998          | 0.4077         | 0.0879           | 0.6540             | 0.3198          | -437.1260      | -545.2994    | -4.0359         | -4.0099       |
| 0.6485        | 0.23  | 900  | 0.5922          | 0.4208         | 0.0655           | 0.6600             | 0.3553          | -437.3501      | -545.1683    | -4.0167         | -3.9936       |
| 0.6164        | 0.26  | 1000 | 0.5880          | 0.4046         | 0.0287           | 0.6620             | 0.3759          | -437.7182      | -545.3309    | -4.0092         | -3.9869       |
| 0.6225        | 0.28  | 1100 | 0.5852          | 0.4058         | 0.0110           | 0.6680             | 0.3948          | -437.8951      | -545.3189    | -4.0240         | -3.9984       |
| 0.6289        | 0.31  | 1200 | 0.5824          | 0.4127         | 0.0078           | 0.6670             | 0.4048          | -437.9265      | -545.2498    | -4.0253         | -3.9994       |
| 0.5818        | 0.34  | 1300 | 0.5818          | 0.4222         | 0.0097           | 0.6680             | 0.4125          | -437.9080      | -545.1544    | -4.0212         | -3.9953       |
| 0.567         | 0.36  | 1400 | 0.5797          | 0.4098         | -0.0141          | 0.6730             | 0.4238          | -438.1456      | -545.2791    | -4.0333         | -4.0062       |
| 0.5659        | 0.39  | 1500 | 0.5790          | 0.4204         | -0.0154          | 0.6780             | 0.4358          | -438.1591      | -545.1725    | -4.0245         | -3.9963       |
| 0.5993        | 0.41  | 1600 | 0.5783          | 0.4161         | -0.0285          | 0.6720             | 0.4446          | -438.2904      | -545.2161    | -4.0185         | -3.9907       |
| 0.5999        | 0.44  | 1700 | 0.5767          | 0.4067         | -0.0468          | 0.6840             | 0.4535          | -438.4729      | -545.3095    | -4.0207         | -3.9935       |
| 0.6004        | 0.46  | 1800 | 0.5731          | 0.4233         | -0.0394          | 0.6830             | 0.4627          | -438.3991      | -545.1437    | -4.0219         | -3.9944       |
| 0.5349        | 0.49  | 1900 | 0.5720          | 0.4285         | -0.0429          | 0.6830             | 0.4714          | -438.4335      | -545.0914    | -4.0295         | -4.0012       |
| 0.5377        | 0.52  | 2000 | 0.5702          | 0.4255         | -0.0540          | 0.6850             | 0.4795          | -438.5449      | -545.1220    | -4.0290         | -4.0009       |
| 0.4988        | 0.54  | 2100 | 0.5713          | 0.4347         | -0.0548          | 0.6840             | 0.4895          | -438.5533      | -545.0299    | -4.0317         | -4.0039       |
| 0.6093        | 0.57  | 2200 | 0.5706          | 0.4464         | -0.0456          | 0.6810             | 0.4920          | -438.4607      | -544.9128    | -4.0288         | -4.0014       |
| 0.5356        | 0.59  | 2300 | 0.5689          | 0.4484         | -0.0486          | 0.6880             | 0.4971          | -438.4912      | -544.8922    | -4.0257         | -3.9986       |
| 0.5753        | 0.62  | 2400 | 0.5681          | 0.4596         | -0.0441          | 0.6850             | 0.5037          | -438.4457      | -544.7802    | -4.0100         | -3.9846       |
| 0.5709        | 0.65  | 2500 | 0.5673          | 0.4693         | -0.0387          | 0.6910             | 0.5081          | -438.3924      | -544.6835    | -4.0100         | -3.9849       |
| 0.5565        | 0.67  | 2600 | 0.5665          | 0.4692         | -0.0401          | 0.6820             | 0.5092          | -438.4054      | -544.6850    | -4.0096         | -3.9843       |
| 0.585         | 0.7   | 2700 | 0.5650          | 0.4780         | -0.0351          | 0.6940             | 0.5131          | -438.3558      | -544.5962    | -4.0074         | -3.9820       |
| 0.5883        | 0.72  | 2800 | 0.5670          | 0.4914         | -0.0151          | 0.6880             | 0.5066          | -438.1562      | -544.4624    | -3.9894         | -3.9669       |
| 0.624         | 0.75  | 2900 | 0.5663          | 0.4877         | -0.0191          | 0.6840             | 0.5068          | -438.1958      | -544.4997    | -3.9935         | -3.9705       |
| 0.5347        | 0.77  | 3000 | 0.5644          | 0.4757         | -0.0335          | 0.6850             | 0.5092          | -438.3401      | -544.6199    | -4.0019         | -3.9777       |
| 0.5837        | 0.8   | 3100 | 0.5637          | 0.4783         | -0.0302          | 0.6830             | 0.5085          | -438.3073      | -544.5936    | -3.9976         | -3.9742       |
| 0.5293        | 0.83  | 3200 | 0.5634          | 0.4715         | -0.0363          | 0.6890             | 0.5078          | -438.3679      | -544.6616    | -4.0023         | -3.9778       |
| 0.5128        | 0.85  | 3300 | 0.5620          | 0.4745         | -0.0387          | 0.6880             | 0.5131          | -438.3917      | -544.6319    | -4.0053         | -3.9804       |
| 0.6204        | 0.88  | 3400 | 0.5625          | 0.4679         | -0.0442          | 0.6860             | 0.5121          | -438.4469      | -544.6978    | -4.0067         | -3.9815       |
| 0.5469        | 0.9   | 3500 | 0.5618          | 0.4612         | -0.0491          | 0.6860             | 0.5102          | -438.4956      | -544.7651    | -4.0098         | -3.9843       |
| 0.5807        | 0.93  | 3600 | 0.5615          | 0.4675         | -0.0454          | 0.6890             | 0.5129          | -438.4584      | -544.7015    | -4.0068         | -3.9818       |
| 0.5265        | 0.96  | 3700 | 0.5620          | 0.4675         | -0.0435          | 0.6880             | 0.5110          | -438.4403      | -544.7019    | -4.0082         | -3.9833       |
| 0.5484        | 0.98  | 3800 | 0.5615          | 0.4685         | -0.0449          | 0.6930             | 0.5133          | -438.4536      | -544.6919    | -4.0103         | -3.9851       |


### Framework versions

- Transformers 4.37.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.15.0
- Tokenizers 0.15.0

Training logs: [Weights & Biases run](https://wandb.ai/amazingvince/huggingface/runs/z71h0hc3?workspace=user-amazingvince)