---
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-7b-dpo-full
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# zephyr-7b-dpo-full

This model was fine-tuned with DPO (via TRL); the base model and training dataset are not recorded in this card's metadata.
It achieves the following results on the evaluation set:
- Loss: 0.5261
- Rewards/chosen: -2.4591
- Rewards/rejected: -3.9221
- Rewards/accuracies: 0.7773
- Rewards/margins: 1.4631
- Logps/rejected: -703.8400
- Logps/chosen: -549.4910
- Logits/rejected: 0.0289
- Logits/chosen: 0.0663
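
A note on reading the reward columns (assuming TRL's standard DPO logging, whose metric names these match): the logged reward for a completion \\(y\\) is the implicit DPO reward

$$
r_\theta(x, y) \;=\; \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\text{Rewards/margins} \;=\; \mathrm{mean}\!\left[\, r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}}) \,\right],
$$

so the headline numbers are self-consistent: −2.4591 − (−3.9221) ≈ 1.4631, matching Rewards/margins up to rounding.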

## Model description

More information needed

## Intended uses & limitations

More information needed
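
Pending a fuller description, here is a minimal text-generation sketch. The repo id is a hypothetical placeholder (the actual hub path is not recorded in this card), and the model is assumed to be a standard causal LM:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "<org>/zephyr-7b-dpo-full"  # hypothetical placeholder path
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

# Generate a short completion for a sample prompt.
prompt = "Explain direct preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```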

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of an equivalent TRL setup follows the list):
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 2
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
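
The sketch below shows how this configuration maps onto TRL's `DPOTrainer` (API as of the TRL releases contemporary with Transformers 4.35.2). The model and dataset names are placeholders, and `beta` is TRL's default of 0.1 since the card does not record it:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_model = "<base-model>"  # hypothetical placeholder; not recorded in this card
model = AutoModelForCausalLM.from_pretrained(base_model)
ref_model = AutoModelForCausalLM.from_pretrained(base_model)  # frozen DPO reference
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Placeholder preference dataset with prompt/chosen/rejected columns.
dataset = load_dataset("<preference-dataset>")

# Hyperparameters from the list above. Launched on 8 GPUs (e.g. via
# `accelerate launch`), the effective batch sizes come out to:
#   train: 4 per device * 8 devices * 4 accumulation steps = 128
#   eval:  8 per device * 8 devices                         = 64
args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",
    learning_rate=1e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=2,
    # Adam betas=(0.9, 0.999) and epsilon=1e-8 are the TrainingArguments defaults.
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.1,  # assumed TRL default; not recorded in the card
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```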

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6201        | 0.21  | 100  | 0.6253          | -0.2753        | -0.6662          | 0.7031             | 0.3909          | -378.2405      | -331.1124    | 0.4172          | 0.3706        |
| 0.5547        | 0.42  | 200  | 0.5549          | -0.6988        | -1.4726          | 0.7656             | 0.7738          | -458.8863      | -373.4661    | 0.4261          | 0.3909        |
| 0.5343        | 0.63  | 300  | 0.5316          | -0.8044        | -1.6474          | 0.7656             | 0.8430          | -476.3628      | -384.0199    | 0.2851          | 0.2449        |
| 0.5323        | 0.84  | 400  | 0.5211          | -0.9068        | -1.8283          | 0.7812             | 0.9216          | -494.4600      | -394.2621    | 0.2834          | 0.2514        |
| 0.352         | 1.05  | 500  | 0.5258          | -1.9533        | -3.4166          | 0.7969             | 1.4634          | -653.2899      | -498.9117    | -0.0846         | -0.0654       |
| 0.3342        | 1.26  | 600  | 0.5268          | -2.3123        | -3.7246          | 0.7930             | 1.4124          | -684.0857      | -534.8101    | 0.1128          | 0.1344        |
| 0.337         | 1.47  | 700  | 0.5290          | -2.3753        | -3.8837          | 0.7773             | 1.5084          | -699.9910      | -541.1116    | 0.0099          | 0.0414        |
| 0.3398        | 1.67  | 800  | 0.5297          | -2.5097        | -4.0133          | 0.7734             | 1.5036          | -712.9506      | -554.5546    | 0.0381          | 0.0750        |
| 0.307         | 1.88  | 900  | 0.5261          | -2.4591        | -3.9221          | 0.7773             | 1.4631          | -703.8400      | -549.4910    | 0.0289          | 0.0663        |


### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1
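
To reproduce this environment, the version pins above translate directly to pip (TRL itself is tagged, but its version is not recorded in the card):

```
pip install transformers==4.35.2 datasets==2.14.6 tokenizers==0.14.1
pip install torch==2.1.2   # the card reports the CUDA 12.1 build (2.1.2+cu121)
pip install trl            # tagged above; version not recorded
```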