---
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: meta-llama/Llama-2-7b-chat-hf
model-index:
- name: dpo-llama-chat
  results: []
---

# dpo-llama-chat

This model is a fine-tuned version of [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), trained with DPO on an unspecified preference dataset.
It achieves the following results on the evaluation set:
- Loss: 0.1928
- Rewards/chosen: -1.3672
- Rewards/rejected: -4.3992
- Rewards/accuracies: 0.9310
- Rewards/margins: 3.0321
- Logps/rejected: -133.6114
- Logps/chosen: -90.8071
- Logits/rejected: -0.8584
- Logits/chosen: -0.8277
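
For reference, these columns follow TRL's DPO reporting conventions: `Rewards/chosen` and `Rewards/rejected` are the implicit rewards, i.e. the β-scaled log-probability ratios of the policy against the frozen reference model on the chosen and rejected completions, and training minimizes the standard DPO loss:

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)
$$

`Rewards/margins` is the gap between the two (here −1.3672 − (−4.3992) ≈ 3.03), and `Rewards/accuracies` is the fraction of evaluation pairs for which the chosen completion receives the higher implicit reward.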

## Model description

This is a PEFT adapter for Llama-2-7b-chat, aligned with Direct Preference Optimization (DPO) via TRL, as reflected in the tags above. Further details (adapter architecture, training data, intended behavior) were not provided.

## Intended uses & limitations

More information needed
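
No usage guidance was provided. As a minimal inference sketch, assuming this repository contains only the PEFT adapter weights and that you have access to the base model (the adapter id `your-username/dpo-llama-chat` below is a hypothetical placeholder):

```python
# Minimal inference sketch. Assumptions: this repo holds only PEFT adapter
# weights; "your-username/dpo-llama-chat" is a hypothetical placeholder id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-chat-hf"
adapter_id = "your-username/dpo-llama-chat"  # replace with the actual repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"  # device_map needs accelerate
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the DPO-trained adapter
model.eval()

prompt = "[INST] What is DPO fine-tuning? [/INST]"  # Llama-2 chat format
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```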

## Training and evaluation data

More information needed
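
The training and evaluation data are not documented. For orientation only: TRL's `DPOTrainer` consumes pairwise preference data with `prompt`, `chosen`, and `rejected` text columns, so the data presumably had the shape of the following toy sketch (contents invented):

```python
from datasets import Dataset

# Toy illustration of the column layout TRL's DPOTrainer expects; the actual
# data used for this model is not documented in the card.
preference_pairs = Dataset.from_dict({
    "prompt":   ["[INST] Summarize: the cat sat on the mat. [/INST]"],
    "chosen":   ["A cat sat on a mat."],              # preferred completion
    "rejected": ["Cats are mammals with whiskers."],  # dispreferred completion
})
```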

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch in code follows the list):
- learning_rate: 0.0001
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- training_steps: 1000
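
The card records hyperparameters but not the trainer invocation. As a rough reproduction sketch, assuming a TRL release contemporaneous with the pinned frameworks below (~0.7.x, where `DPOTrainer` takes `beta` and `tokenizer` directly) and assumed values for the DPO `beta`, LoRA configuration, precision, and dataset:

```python
# Reproduction sketch only. beta=0.1, the LoRA settings, bf16, and the dataset
# are assumptions; only the TrainingArguments values come from this card.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_id)

# Placeholder preference pairs; the real data is not documented in the card.
pairs = {
    "prompt":   ["[INST] Say hello. [/INST]"] * 4,
    "chosen":   ["Hello! How can I help you today?"] * 4,
    "rejected": ["No."] * 4,
}
train_dataset, eval_dataset = Dataset.from_dict(pairs), Dataset.from_dict(pairs)

peft_config = LoraConfig(  # assumed LoRA settings, not recorded in the card
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"
)

training_args = TrainingArguments(   # values below come from the card
    output_dir="dpo-llama-chat",
    learning_rate=1e-4,
    per_device_train_batch_size=2,   # x 2 GPUs x 16 accumulation = 64 total
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=16,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    max_steps=1000,
    seed=42,
    evaluation_strategy="steps",
    eval_steps=100,                  # matches the eval cadence in the table below
    logging_steps=100,
    bf16=True,                       # assumed; precision is not recorded in the card
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # with peft_config set, TRL uses the frozen base model as reference
    beta=0.1,        # assumed; the actual DPO beta is not recorded
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```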

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5985        | 0.24  | 100  | 0.5908          | -0.0098        | -0.3706          | 0.6857             | 0.3608          | -93.3248       | -77.2335     | -0.7818         | -0.8133       |
| 0.5032        | 0.47  | 200  | 0.4768          | -0.1589        | -0.9349          | 0.8037             | 0.7760          | -98.9677       | -78.7246     | -0.8669         | -0.8774       |
| 0.4105        | 0.71  | 300  | 0.4056          | -0.3303        | -1.5893          | 0.8316             | 1.2589          | -105.5115      | -80.4384     | -0.8423         | -0.8361       |
| 0.3707        | 0.94  | 400  | 0.3501          | -0.2376        | -1.6094          | 0.8760             | 1.3718          | -105.7129      | -79.5110     | -0.7540         | -0.7564       |
| 0.2363        | 1.18  | 500  | 0.2939          | -0.8615        | -2.9614          | 0.8932             | 2.0999          | -119.2329      | -85.7499     | -0.8983         | -0.8797       |
| 0.1947        | 1.42  | 600  | 0.2463          | -1.0709        | -3.5879          | 0.9085             | 2.5170          | -125.4976      | -87.8440     | -0.8982         | -0.8717       |
| 0.1823        | 1.65  | 700  | 0.2242          | -1.2056        | -3.7965          | 0.9158             | 2.5909          | -127.5844      | -89.1917     | -0.8272         | -0.8112       |
| 0.1476        | 1.89  | 800  | 0.2042          | -1.1764        | -3.9644          | 0.9271             | 2.7881          | -129.2632      | -88.8989     | -0.8622         | -0.8415       |
| 0.112         | 2.13  | 900  | 0.1936          | -1.3373        | -4.3265          | 0.9330             | 2.9891          | -132.8835      | -90.5088     | -0.8608         | -0.8338       |
| 0.0949        | 2.36  | 1000 | 0.1928          | -1.3672        | -4.3992          | 0.9310             | 3.0321          | -133.6114      | -90.8071     | -0.8584         | -0.8277       |


### Framework versions

- PEFT 0.7.1
- Transformers 4.36.2
- Pytorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.1