---
base_model: PKU-Alignment/alpaca-7b-reproduced
tags:
- alignment-handbook
- generated_from_trainer
- trl
- dpo
datasets:
- PKU-Alignment/PKU-SafeRLHF
model-index:
- name: dpo-selective-alpaca
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# dpo-selective-alpaca

This model is a DPO fine-tuned version of [PKU-Alignment/alpaca-7b-reproduced](https://huggingface.co/PKU-Alignment/alpaca-7b-reproduced), trained on the PKU-Alignment/PKU-SafeRLHF dataset.
It achieves the following results on the evaluation set:
- Loss: 4659.3857
- Rewards/chosen: -0.2274
- Rewards/rejected: -0.2645
- Rewards/accuracies: 0.6342
- Rewards/margins: 0.0372
- Rewards/safe Rewards: -0.2254
- Rewards/unsafe Rewards: -0.2253
- Logps/rejected: -174.8009
- Logps/chosen: -202.5513
- Logits/rejected: -1.7296
- Logits/chosen: -1.5835
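
As a quick consistency check on the numbers above, the reward margin can be recomputed from the chosen and rejected rewards. This is a minimal sketch that assumes the usual DPO logging convention, where `Rewards/margins` is the mean of (chosen reward minus rejected reward); the variable names are illustrative and do not come from the training code.

```python
# Final evaluation values copied from the list above.
rewards_chosen = -0.2274
rewards_rejected = -0.2645
reported_margin = 0.0372

# Assumed convention: margin = chosen reward - rejected reward (averaged).
margin = rewards_chosen - rewards_rejected
print(f"recomputed margin: {margin:.4f}")

# The logged values are rounded to four decimals, so compare with a tolerance.
matches = abs(margin - reported_margin) < 2e-4
print("consistent with reported margin:", matches)
```

The small difference between the recomputed and reported values is within what four-decimal rounding of the inputs would explain.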

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
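
The total batch sizes listed above follow from the per-device sizes, the device count, and gradient accumulation. The sketch below reproduces that arithmetic; it assumes `train_batch_size` and `eval_batch_size` are per-device values, which is how the Trainer reports them in multi-GPU runs.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 4             # per device (assumed)
eval_batch_size = 8              # per device (assumed)
num_devices = 4
gradient_accumulation_steps = 4

# Effective training batch: per-device batch x devices x accumulation steps.
total_train_batch_size = (
    train_batch_size * num_devices * gradient_accumulation_steps
)

# Evaluation does not accumulate gradients, so only devices multiply.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)
```

Both results agree with the `total_train_batch_size` and `total_eval_batch_size` entries in the list.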

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/safe Rewards | Rewards/unsafe Rewards | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------------:|:----------------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 4842.2766     | 0.11  | 500  | 4952.8877       | 0.0166         | 0.0096           | 0.6573             | 0.0070          | 0.0166               | 0.0165                 | -147.3908      | -178.1579    | -1.7834         | -1.6386       |
| 4764.3852     | 0.22  | 1000 | 4865.9209       | -0.0099        | -0.0282          | 0.6644             | 0.0184          | -0.0094              | -0.0098                | -151.1701      | -180.8021    | -1.7281         | -1.5780       |
| 4814.1586     | 0.32  | 1500 | 4783.4697       | -0.1011        | -0.1298          | 0.6566             | 0.0286          | -0.1003              | -0.1009                | -161.3237      | -189.9300    | -1.7085         | -1.5581       |
| 4693.2395     | 0.43  | 2000 | 4735.1978       | -0.1597        | -0.1926          | 0.6480             | 0.0329          | -0.1583              | -0.1588                | -167.6019      | -195.7835    | -1.7080         | -1.5598       |
| 4747.2730     | 0.54  | 2500 | 4701.7651       | -0.1978        | -0.2321          | 0.6416             | 0.0344          | -0.1960              | -0.1962                | -171.5614      | -199.5948    | -1.7166         | -1.5693       |
| 4464.0027     | 0.65  | 3000 | 4681.6167       | -0.2061        | -0.2411          | 0.6356             | 0.0350          | -0.2041              | -0.2043                | -172.4578      | -200.4294    | -1.7240         | -1.5768       |
| 4613.8953     | 0.75  | 3500 | 4667.7300       | -0.2201        | -0.2561          | 0.6333             | 0.0360          | -0.2182              | -0.2182                | -173.9565      | -201.8304    | -1.7289         | -1.5822       |
| 4642.2859     | 0.86  | 4000 | 4661.8745       | -0.2258        | -0.2627          | 0.6336             | 0.0369          | -0.2238              | -0.2238                | -174.6188      | -202.3950    | -1.7298         | -1.5833       |
| 4747.2375     | 0.97  | 4500 | 4659.3687       | -0.2266        | -0.2638          | 0.6363             | 0.0372          | -0.2246              | -0.2245                | -174.7243      | -202.4745    | -1.7302         | -1.5838       |
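
To relate the `Step` column to the amount of data processed, each step can be multiplied by the total training batch size. This is a rough sketch that assumes `Step` counts optimizer updates (one per 64 examples); the helper name `examples_seen` is hypothetical.

```python
TOTAL_TRAIN_BATCH_SIZE = 64  # from the hyperparameter list

def examples_seen(step: int, batch_size: int = TOTAL_TRAIN_BATCH_SIZE) -> int:
    """Number of training examples consumed after `step` optimizer updates."""
    return step * batch_size

# (epoch, step) pairs copied from the table above.
checkpoints = [
    (0.11, 500), (0.22, 1000), (0.32, 1500), (0.43, 2000), (0.54, 2500),
    (0.65, 3000), (0.75, 3500), (0.86, 4000), (0.97, 4500),
]

for epoch, step in checkpoints:
    print(f"epoch {epoch:.2f}  step {step:>4}  examples {examples_seen(step):>7}")
```

Because the `Epoch` values are rounded to two decimals, dividing the example count by the epoch fraction gives only an approximate size for the training set.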


### Framework versions

- Transformers 4.36.2
- PyTorch 2.1.2
- Datasets 2.14.6
- Tokenizers 0.15.0