---
license: apache-2.0
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
model-index:
- name: tinyllama-1.1b-chat-dpo-qlora
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# tinyllama-1.1b-chat-dpo-qlora

This model is a DPO fine-tuned version of [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T); the preference dataset used for training is not recorded in this card.
It achieves the following results on the evaluation set:
- Loss: 0.6085
- Rewards/chosen: -1.0876
- Rewards/rejected: -1.3914
- Rewards/accuracies: 0.6580
- Rewards/margins: 0.3038
- Logps/rejected: -490.8211
- Logps/chosen: -504.9807
- Logits/rejected: -2.6096
- Logits/chosen: -2.6425
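
Because this repository ships a PEFT adapter rather than full model weights, it is loaded on top of the base checkpoint. A minimal inference sketch, assuming the adapter repo id below is a placeholder for wherever this adapter is hosted (the prompt format / chat template used during training is not documented here):

```python
# Minimal inference sketch. Assumptions: the adapter repo id is a placeholder,
# the tokenizer is taken from the base model, and no chat template is applied.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
adapter_id = "your-namespace/tinyllama-1.1b-chat-dpo-qlora"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # attach the DPO-trained LoRA adapter

inputs = tokenizer("Explain DPO training in one sentence.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```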

## Model description

This repository contains a QLoRA (LoRA) adapter for TinyLlama-1.1B, trained with Direct Preference Optimization (DPO) using TRL and PEFT. The adapter is applied on top of the base checkpoint linked above; it is not a standalone set of full model weights.
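
For reference, DPO minimizes the following loss over preference pairs $(x, y_w, y_l)$, where $\pi_\theta$ is the adapter-augmented policy, $\pi_{\mathrm{ref}}$ is the frozen base model, and $\beta$ is the DPO temperature (its value is not recorded in this card):

$$\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]$$

The `Rewards/chosen` and `Rewards/rejected` metrics reported in this card are the implicit rewards $\beta\,(\log \pi_\theta - \log \pi_{\mathrm{ref}})$ on the chosen and rejected completions, `Rewards/margins` is their difference, and `Rewards/accuracies` is the fraction of pairs for which the chosen reward exceeds the rejected one, as computed by TRL's `DPOTrainer`.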

## Intended uses & limitations

More information needed

## Training and evaluation data

The preference dataset used for training and evaluation is not recorded in the Trainer metadata.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
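
For orientation, a rough sketch of how these settings could be expressed with TRL's `DPOTrainer` follows. The DPO `beta`, the LoRA/4-bit configuration, and the dataset are not recorded in this card and appear below only as illustrative assumptions; the sketch is written against a TRL release contemporary with the framework versions listed at the end of this card.

```python
# Illustrative sketch only. beta, the LoRA/4-bit settings, and the toy dataset
# are assumptions, not values recorded in this card.
import torch
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from trl import DPOTrainer

base_id = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token

# QLoRA: load the base model in 4-bit (typical NF4 setup; exact settings unknown).
quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                           bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=quant)

peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=16, lora_dropout=0.05)  # assumed values

args = TrainingArguments(
    output_dir="tinyllama-1.1b-chat-dpo-qlora",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,   # 4 x per-device batch 4 = total train batch size 16
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    seed=42,
)

# Toy preference data with the column names DPOTrainer expects.
train_ds = Dataset.from_dict({
    "prompt":   ["Say hello politely."],
    "chosen":   ["Hello! How can I help you today?"],
    "rejected": ["hi."],
})

trainer = DPOTrainer(
    model,
    ref_model=None,          # with a PEFT adapter, the frozen base model serves as the reference
    args=args,
    beta=0.1,                # assumption; the value used for this run is not reported
    train_dataset=train_ds,
    tokenizer=tokenizer,
    peft_config=peft_config,
    max_length=1024,         # assumption
    max_prompt_length=512,   # assumption
)
trainer.train()
```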

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6921        | 0.03  | 100  | 0.6923          | 0.0160         | 0.0142           | 0.5645             | 0.0018          | -350.2683      | -394.6286    | -2.7841         | -2.8363       |
| 0.6894        | 0.05  | 200  | 0.6894          | 0.0433         | 0.0353           | 0.5920             | 0.0080          | -348.1495      | -391.8949    | -2.7811         | -2.8333       |
| 0.6815        | 0.08  | 300  | 0.6844          | 0.0806         | 0.0609           | 0.6025             | 0.0196          | -345.5898      | -388.1692    | -2.7838         | -2.8349       |
| 0.6869        | 0.1   | 400  | 0.6788          | 0.0607         | 0.0269           | 0.6125             | 0.0339          | -348.9979      | -390.1522    | -2.7931         | -2.8423       |
| 0.6744        | 0.13  | 500  | 0.6724          | 0.0243         | -0.0249          | 0.6210             | 0.0492          | -354.1764      | -393.7983    | -2.7889         | -2.8371       |
| 0.6679        | 0.16  | 600  | 0.6625          | -0.0566        | -0.1346          | 0.6265             | 0.0780          | -365.1402      | -401.8826    | -2.7709         | -2.8179       |
| 0.637         | 0.18  | 700  | 0.6555          | -0.2568        | -0.3654          | 0.6290             | 0.1086          | -388.2211      | -421.9038    | -2.7596         | -2.8051       |
| 0.6166        | 0.21  | 800  | 0.6488          | -0.3935        | -0.5223          | 0.6320             | 0.1288          | -403.9116      | -435.5756    | -2.7523         | -2.7961       |
| 0.6335        | 0.24  | 900  | 0.6458          | -0.4516        | -0.6042          | 0.6380             | 0.1527          | -412.1083      | -441.3798    | -2.7325         | -2.7764       |
| 0.6286        | 0.26  | 1000 | 0.6406          | -0.8692        | -1.0442          | 0.6250             | 0.1750          | -456.1026      | -483.1429    | -2.7123         | -2.7531       |
| 0.669         | 0.29  | 1100 | 0.6406          | -0.3445        | -0.4984          | 0.6365             | 0.1538          | -401.5222      | -430.6789    | -2.6946         | -2.7354       |
| 0.6723        | 0.31  | 1200 | 0.6358          | -0.4619        | -0.6430          | 0.6425             | 0.1811          | -415.9841      | -442.4163    | -2.6701         | -2.7077       |
| 0.605         | 0.34  | 1300 | 0.6297          | -0.6894        | -0.8903          | 0.6435             | 0.2009          | -440.7144      | -465.1627    | -2.6764         | -2.7122       |
| 0.6361        | 0.37  | 1400 | 0.6267          | -0.7144        | -0.9307          | 0.6505             | 0.2163          | -444.7496      | -467.6648    | -2.6711         | -2.7091       |
| 0.6085        | 0.39  | 1500 | 0.6213          | -1.0532        | -1.3084          | 0.6490             | 0.2552          | -482.5256      | -501.5469    | -2.6435         | -2.6797       |
| 0.6317        | 0.42  | 1600 | 0.6197          | -1.1246        | -1.3825          | 0.6490             | 0.2579          | -489.9323      | -508.6858    | -2.6172         | -2.6506       |
| 0.6702        | 0.44  | 1700 | 0.6182          | -1.0036        | -1.2644          | 0.6530             | 0.2609          | -478.1268      | -496.5815    | -2.6407         | -2.6762       |
| 0.5658        | 0.47  | 1800 | 0.6219          | -1.3479        | -1.6348          | 0.6445             | 0.2869          | -515.1606      | -531.0145    | -2.5866         | -2.6182       |
| 0.6039        | 0.5   | 1900 | 0.6154          | -0.9014        | -1.1716          | 0.6630             | 0.2702          | -468.8458      | -486.3656    | -2.6376         | -2.6742       |
| 0.6173        | 0.52  | 2000 | 0.6121          | -1.1535        | -1.4470          | 0.6575             | 0.2934          | -496.3810      | -511.5793    | -2.6232         | -2.6580       |
| 0.62          | 0.55  | 2100 | 0.6116          | -1.1600        | -1.4523          | 0.6650             | 0.2923          | -496.9117      | -512.2247    | -2.6278         | -2.6629       |
| 0.5957        | 0.58  | 2200 | 0.6132          | -0.9592        | -1.2431          | 0.6655             | 0.2839          | -475.9958      | -492.1489    | -2.6317         | -2.6674       |
| 0.6093        | 0.6   | 2300 | 0.6138          | -1.0935        | -1.3811          | 0.6625             | 0.2876          | -489.7906      | -505.5738    | -2.6283         | -2.6619       |
| 0.6009        | 0.63  | 2400 | 0.6108          | -1.0519        | -1.3479          | 0.6610             | 0.2959          | -486.4695      | -501.4175    | -2.6088         | -2.6432       |
| 0.5988        | 0.65  | 2500 | 0.6108          | -1.0427        | -1.3419          | 0.6590             | 0.2992          | -485.8730      | -500.4982    | -2.6143         | -2.6477       |
| 0.606         | 0.68  | 2600 | 0.6112          | -1.0188        | -1.3192          | 0.6545             | 0.3003          | -483.6013      | -498.1078    | -2.5974         | -2.6304       |
| 0.6118        | 0.71  | 2700 | 0.6106          | -1.0808        | -1.3857          | 0.6595             | 0.3049          | -490.2562      | -504.3045    | -2.5945         | -2.6274       |
| 0.6134        | 0.73  | 2800 | 0.6096          | -1.1549        | -1.4635          | 0.6585             | 0.3086          | -498.0366      | -511.7179    | -2.5978         | -2.6303       |
| 0.6159        | 0.76  | 2900 | 0.6097          | -1.0550        | -1.3509          | 0.6585             | 0.2959          | -486.7739      | -501.7256    | -2.6175         | -2.6500       |
| 0.5815        | 0.79  | 3000 | 0.6091          | -1.1025        | -1.4048          | 0.6570             | 0.3023          | -492.1650      | -506.4727    | -2.6089         | -2.6420       |
| 0.5885        | 0.81  | 3100 | 0.6089          | -1.0977        | -1.4006          | 0.6595             | 0.3029          | -491.7444      | -505.9960    | -2.6001         | -2.6337       |
| 0.6074        | 0.84  | 3200 | 0.6086          | -1.0982        | -1.4029          | 0.6605             | 0.3047          | -491.9724      | -506.0455    | -2.6056         | -2.6388       |
| 0.5981        | 0.86  | 3300 | 0.6087          | -1.0853        | -1.3881          | 0.6610             | 0.3028          | -490.4915      | -504.7571    | -2.6117         | -2.6442       |
| 0.5944        | 0.89  | 3400 | 0.6087          | -1.0897        | -1.3931          | 0.6580             | 0.3034          | -490.9887      | -505.1947    | -2.6026         | -2.6360       |
| 0.5979        | 0.92  | 3500 | 0.6085          | -1.0922        | -1.3962          | 0.6595             | 0.3040          | -491.3070      | -505.4438    | -2.6136         | -2.6460       |
| 0.6154        | 0.94  | 3600 | 0.6086          | -1.0905        | -1.3946          | 0.6595             | 0.3040          | -491.1413      | -505.2781    | -2.6066         | -2.6397       |
| 0.6053        | 0.97  | 3700 | 0.6086          | -1.0907        | -1.3946          | 0.6550             | 0.3039          | -491.1405      | -505.2943    | -2.6094         | -2.6423       |
| 0.602         | 0.99  | 3800 | 0.6085          | -1.0876        | -1.3914          | 0.6580             | 0.3038          | -490.8211      | -504.9807    | -2.6096         | -2.6425       |


### Framework versions

- PEFT 0.7.1
- Transformers 4.39.3
- Pytorch 2.1.2
- Datasets 2.18.0
- Tokenizers 0.15.2