---
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: LlamaCorn-1.1B-Chat
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# LlamaCorn-1.1B-Chat

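This model was trained with Direct Preference Optimization (DPO) using TRL, per the tags above. The card generator recorded neither a base model nor a dataset name (it reported the dataset as `None`).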
It achieves the following results on the evaluation set:
- Loss: 0.9305
- Rewards/chosen: -0.2148
- Rewards/rejected: -0.2954
- Rewards/accuracies: 0.5824
- Rewards/margins: 0.0806
- Logps/rejected: -183.8757
- Logps/chosen: -197.7534
- Logits/rejected: -2.6439
- Logits/chosen: -2.6493
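
The reward metrics above are related: the margin is the chosen reward minus the rejected reward, so the reported means should agree up to rounding. The following is a minimal sanity check using values copied from the list; the function and variable names are illustrative, not part of any library.

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -0.2148
rewards_rejected = -0.2954
reported_margin = 0.0806


def reward_margin(chosen: float, rejected: float) -> float:
    """Mean margin implied by the mean chosen and mean rejected rewards."""
    return chosen - rejected


computed = reward_margin(rewards_chosen, rewards_rejected)

# The card rounds metrics to four decimals, so allow a small tolerance.
assert abs(computed - reported_margin) < 1e-3
print(f"margin = {computed:.4f}")
```

A positive margin together with an accuracy of 0.5824 indicates that the chosen response receives the higher reward in somewhat more than half of the evaluation pairs.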

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
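
The two `total_*` values follow from the per-device batch sizes, the device count, and gradient accumulation. The sketch below recomputes them and also illustrates the shape of a cosine schedule with a 10% linear warmup. It is a plain-Python approximation written for this card, not the Transformers implementation, and the total step count is an assumption taken from the last logged step in the results table.

```python
import math


def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Number of examples that contribute to one optimizer (or eval) step."""
    return per_device * num_devices * grad_accum_steps


def lr_at_step(step: int, total_steps: int, peak_lr: float = 5e-07,
               warmup_ratio: float = 0.1) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero (illustrative)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Values from the hyperparameter list above.
train_total = effective_batch_size(per_device=2, num_devices=2, grad_accum_steps=16)
eval_total = effective_batch_size(per_device=4, num_devices=2)
assert train_total == 64  # matches total_train_batch_size
assert eval_total == 8    # matches total_eval_batch_size

# Assumption: roughly 10,200 optimizer steps, the last step logged in the table.
ASSUMED_TOTAL_STEPS = 10_200
for step in (0, 1_020, 5_610, 10_200):
    print(step, lr_at_step(step, ASSUMED_TOTAL_STEPS))
```

Evaluation does not use gradient accumulation, which is why `total_eval_batch_size` is only the per-device size times the number of devices.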

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.9958        | 0.03  | 100   | 1.0003          | -0.0002        | -0.0002          | 0.4930             | -0.0001         | -180.9232      | -195.6078    | -2.6876         | -2.6924       |
| 0.9984        | 0.06  | 200   | 0.9995          | -0.0007        | -0.0013          | 0.4988             | 0.0006          | -180.9347      | -195.6127    | -2.6787         | -2.6838       |
| 0.9982        | 0.09  | 300   | 0.9997          | -0.0008        | -0.0015          | 0.4983             | 0.0007          | -180.9361      | -195.6136    | -2.6848         | -2.6897       |
| 0.9966        | 0.12  | 400   | 0.9999          | -0.0024        | -0.0027          | 0.4995             | 0.0003          | -180.9485      | -195.6291    | -2.6865         | -2.6914       |
| 0.9992        | 0.15  | 500   | 0.9984          | -0.0039        | -0.0054          | 0.5122             | 0.0015          | -180.9753      | -195.6440    | -2.6641         | -2.6694       |
| 0.9983        | 0.18  | 600   | 0.9981          | -0.0054        | -0.0073          | 0.5127             | 0.0020          | -180.9945      | -195.6589    | -2.6862         | -2.6911       |
| 0.9968        | 0.2   | 700   | 0.9972          | -0.0093        | -0.0127          | 0.5241             | 0.0034          | -181.0485      | -195.6985    | -2.6753         | -2.6803       |
| 0.9893        | 0.23  | 800   | 0.9951          | -0.0114        | -0.0164          | 0.5248             | 0.0051          | -181.0858      | -195.7188    | -2.6676         | -2.6728       |
| 0.988         | 0.26  | 900   | 0.9924          | -0.0169        | -0.0245          | 0.5421             | 0.0076          | -181.1663      | -195.7744    | -2.6763         | -2.6814       |
| 0.9879        | 0.29  | 1000  | 0.9907          | -0.0220        | -0.0318          | 0.5476             | 0.0098          | -181.2388      | -195.8248    | -2.6746         | -2.6796       |
| 0.9882        | 0.32  | 1100  | 0.9869          | -0.0261        | -0.0399          | 0.5598             | 0.0138          | -181.3200      | -195.8661    | -2.6647         | -2.6699       |
| 0.979         | 0.35  | 1200  | 0.9851          | -0.0364        | -0.0521          | 0.5563             | 0.0157          | -181.4419      | -195.9693    | -2.6684         | -2.6735       |
| 0.985         | 0.38  | 1300  | 0.9818          | -0.0385        | -0.0576          | 0.5608             | 0.0192          | -181.4978      | -195.9900    | -2.6874         | -2.6921       |
| 0.9821        | 0.41  | 1400  | 0.9805          | -0.0462        | -0.0668          | 0.5590             | 0.0206          | -181.5891      | -196.0672    | -2.6761         | -2.6810       |
| 0.9822        | 0.44  | 1500  | 0.9779          | -0.0550        | -0.0777          | 0.5632             | 0.0227          | -181.6983      | -196.1554    | -2.6764         | -2.6813       |
| 0.9755        | 0.47  | 1600  | 0.9756          | -0.0600        | -0.0855          | 0.5656             | 0.0255          | -181.7764      | -196.2058    | -2.6502         | -2.6557       |
| 0.9697        | 0.5   | 1700  | 0.9731          | -0.0652        | -0.0931          | 0.5651             | 0.0280          | -181.8526      | -196.2569    | -2.6752         | -2.6801       |
| 0.969         | 0.53  | 1800  | 0.9698          | -0.0701        | -0.1017          | 0.5687             | 0.0315          | -181.9380      | -196.3067    | -2.6635         | -2.6687       |
| 0.9643        | 0.55  | 1900  | 0.9685          | -0.0762        | -0.1092          | 0.5676             | 0.0331          | -182.0137      | -196.3669    | -2.6590         | -2.6642       |
| 0.9655        | 0.58  | 2000  | 0.9663          | -0.0821        | -0.1180          | 0.5756             | 0.0359          | -182.1012      | -196.4265    | -2.6802         | -2.6850       |
| 0.9719        | 0.61  | 2100  | 0.9645          | -0.0908        | -0.1281          | 0.5676             | 0.0373          | -182.2023      | -196.5133    | -2.6677         | -2.6727       |
| 0.9576        | 0.64  | 2200  | 0.9625          | -0.0953        | -0.1350          | 0.5729             | 0.0396          | -182.2709      | -196.5585    | -2.6679         | -2.6730       |
| 0.9619        | 0.67  | 2300  | 0.9603          | -0.1012        | -0.1436          | 0.5783             | 0.0424          | -182.3572      | -196.6170    | -2.6527         | -2.6580       |
| 0.9511        | 0.7   | 2400  | 0.9601          | -0.1105        | -0.1540          | 0.5722             | 0.0434          | -182.4612      | -196.7107    | -2.6565         | -2.6617       |
| 0.9516        | 0.73  | 2500  | 0.9570          | -0.1158        | -0.1618          | 0.5715             | 0.0460          | -182.5389      | -196.7630    | -2.6613         | -2.6664       |
| 0.9577        | 0.76  | 2600  | 0.9554          | -0.1236        | -0.1717          | 0.5719             | 0.0481          | -182.6387      | -196.8413    | -2.6595         | -2.6646       |
| 0.9471        | 0.79  | 2700  | 0.9541          | -0.1268        | -0.1763          | 0.5736             | 0.0495          | -182.6840      | -196.8731    | -2.6621         | -2.6672       |
| 0.9519        | 0.82  | 2800  | 0.9524          | -0.1336        | -0.1849          | 0.5738             | 0.0513          | -182.7705      | -196.9414    | -2.6762         | -2.6810       |
| 0.9522        | 0.85  | 2900  | 0.9515          | -0.1364        | -0.1896          | 0.5724             | 0.0531          | -182.8170      | -196.9696    | -2.6604         | -2.6655       |
| 0.9414        | 0.88  | 3000  | 0.9491          | -0.1395        | -0.1949          | 0.5744             | 0.0555          | -182.8706      | -197.0000    | -2.6706         | -2.6755       |
| 0.9509        | 0.9   | 3100  | 0.9483          | -0.1450        | -0.2020          | 0.5799             | 0.0570          | -182.9411      | -197.0551    | -2.6574         | -2.6625       |
| 0.9453        | 0.93  | 3200  | 0.9472          | -0.1472        | -0.2061          | 0.5834             | 0.0589          | -182.9822      | -197.0772    | -2.6424         | -2.6478       |
| 0.9577        | 0.96  | 3300  | 0.9461          | -0.1490        | -0.2081          | 0.5794             | 0.0590          | -183.0018      | -197.0956    | -2.6570         | -2.6622       |
| 0.9374        | 0.99  | 3400  | 0.9452          | -0.1532        | -0.2145          | 0.5770             | 0.0613          | -183.0663      | -197.1376    | -2.6499         | -2.6552       |
| 0.9299        | 1.02  | 3500  | 0.9439          | -0.1570        | -0.2195          | 0.5770             | 0.0625          | -183.1160      | -197.1755    | -2.6612         | -2.6663       |
| 0.936         | 1.05  | 3600  | 0.9438          | -0.1628        | -0.2265          | 0.5789             | 0.0637          | -183.1864      | -197.2330    | -2.6532         | -2.6584       |
| 0.9435        | 1.08  | 3700  | 0.9420          | -0.1655        | -0.2305          | 0.5807             | 0.0650          | -183.2263      | -197.2607    | -2.6673         | -2.6723       |
| 0.9341        | 1.11  | 3800  | 0.9422          | -0.1698        | -0.2351          | 0.5812             | 0.0653          | -183.2721      | -197.3029    | -2.6585         | -2.6636       |
| 0.9296        | 1.14  | 3900  | 0.9405          | -0.1736        | -0.2401          | 0.5714             | 0.0665          | -183.3225      | -197.3411    | -2.6382         | -2.6437       |
| 0.9338        | 1.17  | 4000  | 0.9402          | -0.1747        | -0.2426          | 0.5772             | 0.0680          | -183.3476      | -197.3519    | -2.6428         | -2.6483       |
| 0.9257        | 1.2   | 4100  | 0.9395          | -0.1780        | -0.2462          | 0.5766             | 0.0682          | -183.3829      | -197.3849    | -2.6411         | -2.6465       |
| 0.9368        | 1.23  | 4200  | 0.9386          | -0.1786        | -0.2485          | 0.5833             | 0.0699          | -183.4063      | -197.3914    | -2.6495         | -2.6548       |
| 0.916         | 1.25  | 4300  | 0.9385          | -0.1812        | -0.2513          | 0.5763             | 0.0702          | -183.4345      | -197.4169    | -2.6390         | -2.6445       |
| 0.9093        | 1.28  | 4400  | 0.9375          | -0.1864        | -0.2576          | 0.5831             | 0.0712          | -183.4972      | -197.4688    | -2.6448         | -2.6502       |
| 0.9408        | 1.31  | 4500  | 0.9368          | -0.1896        | -0.2615          | 0.5797             | 0.0719          | -183.5364      | -197.5016    | -2.6422         | -2.6476       |
| 0.9245        | 1.34  | 4600  | 0.9363          | -0.1926        | -0.2660          | 0.5787             | 0.0734          | -183.5815      | -197.5314    | -2.6563         | -2.6614       |
| 0.9469        | 1.37  | 4700  | 0.9364          | -0.1944        | -0.2666          | 0.5775             | 0.0722          | -183.5875      | -197.5493    | -2.6581         | -2.6632       |
| 0.9421        | 1.4   | 4800  | 0.9358          | -0.1946        | -0.2683          | 0.5819             | 0.0736          | -183.6040      | -197.5517    | -2.6640         | -2.6691       |
| 0.9076        | 1.43  | 4900  | 0.9356          | -0.1963        | -0.2704          | 0.5799             | 0.0741          | -183.6253      | -197.5680    | -2.6626         | -2.6676       |
| 0.94          | 1.46  | 5000  | 0.9353          | -0.1996        | -0.2738          | 0.5800             | 0.0742          | -183.6591      | -197.6010    | -2.6438         | -2.6492       |
| 0.9288        | 1.49  | 5100  | 0.9351          | -0.1999        | -0.2741          | 0.5809             | 0.0742          | -183.6625      | -197.6045    | -2.6433         | -2.6487       |
| 0.927         | 1.52  | 5200  | 0.9343          | -0.2009        | -0.2767          | 0.5821             | 0.0758          | -183.6883      | -197.6144    | -2.6499         | -2.6552       |
| 0.9171        | 1.55  | 5300  | 0.9339          | -0.2024        | -0.2784          | 0.5823             | 0.0760          | -183.7055      | -197.6292    | -2.6421         | -2.6476       |
| 0.9337        | 1.58  | 5400  | 0.9344          | -0.2040        | -0.2799          | 0.5787             | 0.0760          | -183.7208      | -197.6453    | -2.6459         | -2.6513       |
| 0.919         | 1.6   | 5500  | 0.9334          | -0.2058        | -0.2825          | 0.5811             | 0.0767          | -183.7465      | -197.6637    | -2.6390         | -2.6445       |
| 0.9297        | 1.63  | 5600  | 0.9341          | -0.2053        | -0.2822          | 0.5794             | 0.0770          | -183.7437      | -197.6582    | -2.6418         | -2.6472       |
| 0.9174        | 1.66  | 5700  | 0.9333          | -0.2067        | -0.2834          | 0.5800             | 0.0767          | -183.7554      | -197.6726    | -2.6492         | -2.6545       |
| 0.9275        | 1.69  | 5800  | 0.9332          | -0.2059        | -0.2826          | 0.5760             | 0.0767          | -183.7476      | -197.6642    | -2.6471         | -2.6524       |
| 0.9164        | 1.72  | 5900  | 0.9321          | -0.2079        | -0.2867          | 0.5809             | 0.0787          | -183.7881      | -197.6847    | -2.6387         | -2.6442       |
| 0.9218        | 1.75  | 6000  | 0.9322          | -0.2095        | -0.2872          | 0.5787             | 0.0777          | -183.7935      | -197.7004    | -2.6377         | -2.6432       |
| 0.944         | 1.78  | 6100  | 0.9319          | -0.2106        | -0.2895          | 0.5823             | 0.0789          | -183.8163      | -197.7118    | -2.6555         | -2.6607       |
| 0.9037        | 1.81  | 6200  | 0.9323          | -0.2105        | -0.2892          | 0.5780             | 0.0787          | -183.8135      | -197.7102    | -2.6459         | -2.6513       |
| 0.929         | 1.84  | 6300  | 0.9321          | -0.2114        | -0.2905          | 0.5773             | 0.0791          | -183.8265      | -197.7195    | -2.6446         | -2.6500       |
| 0.9091        | 1.87  | 6400  | 0.9324          | -0.2111        | -0.2904          | 0.5760             | 0.0793          | -183.8252      | -197.7167    | -2.6534         | -2.6586       |
| 0.9094        | 1.9   | 6500  | 0.9321          | -0.2123        | -0.2903          | 0.5770             | 0.0780          | -183.8242      | -197.7287    | -2.6477         | -2.6530       |
| 0.9449        | 1.93  | 6600  | 0.9320          | -0.2120        | -0.2903          | 0.5795             | 0.0784          | -183.8246      | -197.7251    | -2.6302         | -2.6358       |
| 0.9404        | 1.95  | 6700  | 0.9319          | -0.2115        | -0.2909          | 0.5802             | 0.0794          | -183.8302      | -197.7204    | -2.6443         | -2.6497       |
| 0.9155        | 1.98  | 6800  | 0.9314          | -0.2124        | -0.2919          | 0.5826             | 0.0795          | -183.8406      | -197.7291    | -2.6306         | -2.6362       |
| 0.9328        | 2.01  | 6900  | 0.9313          | -0.2127        | -0.2924          | 0.5884             | 0.0798          | -183.8456      | -197.7321    | -2.6296         | -2.6352       |
| 0.9012        | 2.04  | 7000  | 0.9321          | -0.2146        | -0.2932          | 0.5766             | 0.0785          | -183.8530      | -197.7515    | -2.6361         | -2.6416       |
| 0.9296        | 2.07  | 7100  | 0.9314          | -0.2127        | -0.2929          | 0.5780             | 0.0802          | -183.8507      | -197.7323    | -2.6460         | -2.6513       |
| 0.9076        | 2.1   | 7200  | 0.9315          | -0.2145        | -0.2945          | 0.5797             | 0.0799          | -183.8660      | -197.7507    | -2.6501         | -2.6554       |
| 0.922         | 2.13  | 7300  | 0.9315          | -0.2147        | -0.2935          | 0.5792             | 0.0788          | -183.8565      | -197.7523    | -2.6510         | -2.6562       |
| 0.9136        | 2.16  | 7400  | 0.9313          | -0.2146        | -0.2941          | 0.5819             | 0.0795          | -183.8625      | -197.7515    | -2.6410         | -2.6464       |
| 0.9401        | 2.19  | 7500  | 0.9314          | -0.2140        | -0.2937          | 0.5799             | 0.0797          | -183.8583      | -197.7451    | -2.6490         | -2.6543       |
| 0.9295        | 2.22  | 7600  | 0.9313          | -0.2153        | -0.2953          | 0.5812             | 0.0800          | -183.8747      | -197.7585    | -2.6569         | -2.6620       |
| 0.9128        | 2.25  | 7700  | 0.9309          | -0.2154        | -0.2960          | 0.5817             | 0.0806          | -183.8814      | -197.7590    | -2.6500         | -2.6553       |
| 0.9074        | 2.28  | 7800  | 0.9312          | -0.2159        | -0.2964          | 0.5836             | 0.0804          | -183.8851      | -197.7648    | -2.6505         | -2.6557       |
| 0.9114        | 2.3   | 7900  | 0.9310          | -0.2149        | -0.2949          | 0.5836             | 0.0800          | -183.8703      | -197.7544    | -2.6425         | -2.6479       |
| 0.9181        | 2.33  | 8000  | 0.9318          | -0.2145        | -0.2937          | 0.5772             | 0.0792          | -183.8585      | -197.7501    | -2.6611         | -2.6661       |
| 0.9009        | 2.36  | 8100  | 0.9311          | -0.2149        | -0.2952          | 0.5799             | 0.0803          | -183.8736      | -197.7543    | -2.6581         | -2.6632       |
| 0.9091        | 2.39  | 8200  | 0.9311          | -0.2165        | -0.2960          | 0.5829             | 0.0795          | -183.8816      | -197.7702    | -2.6378         | -2.6433       |
| 0.9091        | 2.42  | 8300  | 0.9312          | -0.2146        | -0.2950          | 0.5833             | 0.0805          | -183.8717      | -197.7510    | -2.6475         | -2.6528       |
| 0.9419        | 2.45  | 8400  | 0.9307          | -0.2138        | -0.2946          | 0.5777             | 0.0808          | -183.8678      | -197.7433    | -2.6364         | -2.6419       |
| 0.9203        | 2.48  | 8500  | 0.9313          | -0.2148        | -0.2948          | 0.5834             | 0.0800          | -183.8688      | -197.7529    | -2.6474         | -2.6527       |
| 0.9102        | 2.51  | 8600  | 0.9315          | -0.2158        | -0.2958          | 0.5821             | 0.0800          | -183.8791      | -197.7635    | -2.6436         | -2.6489       |
| 0.9327        | 2.54  | 8700  | 0.9316          | -0.2146        | -0.2946          | 0.5824             | 0.0800          | -183.8669      | -197.7511    | -2.6505         | -2.6558       |
| 0.9221        | 2.57  | 8800  | 0.9305          | -0.2149        | -0.2953          | 0.5828             | 0.0804          | -183.8742      | -197.7540    | -2.6659         | -2.6709       |
| 0.8851        | 2.6   | 8900  | 0.9315          | -0.2146        | -0.2949          | 0.5816             | 0.0803          | -183.8702      | -197.7508    | -2.6571         | -2.6622       |
| 0.924         | 2.63  | 9000  | 0.9304          | -0.2144        | -0.2951          | 0.5804             | 0.0807          | -183.8718      | -197.7492    | -2.6449         | -2.6503       |
| 0.9025        | 2.65  | 9100  | 0.9315          | -0.2150        | -0.2950          | 0.5790             | 0.0800          | -183.8715      | -197.7551    | -2.6410         | -2.6464       |
| 0.9348        | 2.68  | 9200  | 0.9308          | -0.2144        | -0.2946          | 0.5802             | 0.0802          | -183.8669      | -197.7491    | -2.6349         | -2.6405       |
| 0.9067        | 2.71  | 9300  | 0.9312          | -0.2155        | -0.2959          | 0.5857             | 0.0805          | -183.8805      | -197.7599    | -2.6410         | -2.6465       |
| 0.9263        | 2.74  | 9400  | 0.9307          | -0.2148        | -0.2957          | 0.5829             | 0.0809          | -183.8785      | -197.7536    | -2.6432         | -2.6486       |
| 0.912         | 2.77  | 9500  | 0.9306          | -0.2153        | -0.2957          | 0.5823             | 0.0805          | -183.8788      | -197.7581    | -2.6441         | -2.6495       |
| 0.9157        | 2.8   | 9600  | 0.9314          | -0.2169        | -0.2965          | 0.5785             | 0.0795          | -183.8859      | -197.7745    | -2.6439         | -2.6493       |
| 0.9094        | 2.83  | 9700  | 0.9309          | -0.2157        | -0.2961          | 0.5831             | 0.0804          | -183.8826      | -197.7625    | -2.6441         | -2.6494       |
| 0.9256        | 2.86  | 9800  | 0.9304          | -0.2160        | -0.2965          | 0.5838             | 0.0805          | -183.8867      | -197.7653    | -2.6439         | -2.6493       |
| 0.9287        | 2.89  | 9900  | 0.9305          | -0.2149        | -0.2955          | 0.5833             | 0.0806          | -183.8762      | -197.7545    | -2.6440         | -2.6494       |
| 0.9296        | 2.92  | 10000 | 0.9310          | -0.2157        | -0.2953          | 0.5795             | 0.0796          | -183.8741      | -197.7621    | -2.6439         | -2.6493       |
| 0.9335        | 2.95  | 10100 | 0.9311          | -0.2153        | -0.2953          | 0.5812             | 0.0800          | -183.8739      | -197.7578    | -2.6439         | -2.6493       |
| 0.9321        | 2.98  | 10200 | 0.9305          | -0.2149        | -0.2955          | 0.5824             | 0.0805          | -183.8759      | -197.7545    | -2.6439         | -2.6493       |
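
To compare checkpoints programmatically, the table rows can be parsed into dictionaries. This is a small sketch that uses only three rows copied from the table; the column keys are names chosen here for illustration. Over the full table, the same approach would locate the lowest validation loss across all logged steps.

```python
# Column keys chosen for this sketch, in the same order as the table header.
COLUMNS = [
    "training_loss", "epoch", "step", "validation_loss",
    "rewards_chosen", "rewards_rejected", "rewards_accuracies",
    "rewards_margins", "logps_rejected", "logps_chosen",
    "logits_rejected", "logits_chosen",
]

# Three rows copied verbatim from the table above (steps 8800, 9000, 10200).
SAMPLE_ROWS = """\
| 0.9221        | 2.57  | 8800  | 0.9305          | -0.2149        | -0.2953          | 0.5828             | 0.0804          | -183.8742      | -197.7540    | -2.6659         | -2.6709       |
| 0.924         | 2.63  | 9000  | 0.9304          | -0.2144        | -0.2951          | 0.5804             | 0.0807          | -183.8718      | -197.7492    | -2.6449         | -2.6503       |
| 0.9321        | 2.98  | 10200 | 0.9305          | -0.2149        | -0.2955          | 0.5824             | 0.0805          | -183.8759      | -197.7545    | -2.6439         | -2.6493       |
"""


def parse_row(line: str) -> dict:
    """Turn one Markdown table row into a dict keyed by COLUMNS."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    row = dict(zip(COLUMNS, (float(cell) for cell in cells)))
    row["step"] = int(row["step"])
    return row


rows = [parse_row(line) for line in SAMPLE_ROWS.splitlines() if line.strip()]

# Pick the row with the lowest validation loss among the sampled rows.
best = min(rows, key=lambda r: r["validation_loss"])
print(best["step"], best["validation_loss"])
```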


### Framework versions

- Transformers 4.36.2
- PyTorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.0
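
When reproducing the run, it can help to compare the local environment against the versions listed above. The following sketch uses the standard-library `importlib.metadata`; the distribution names (for example `torch` for PyTorch) and the helper names are assumptions made for this example, and local build suffixes such as `+cu121` are ignored in the comparison.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the list above, keyed by assumed pip distribution name.
PINNED = {
    "transformers": "4.36.2",
    "torch": "2.1.2",
    "datasets": "2.14.6",
    "tokenizers": "0.15.0",
}


def installed_version(dist: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def version_mismatches(pinned: dict, lookup=installed_version) -> dict:
    """Map each mismatching distribution to a (wanted, found) pair."""
    mismatches = {}
    for dist, wanted in pinned.items():
        found = lookup(dist)
        # Drop a local build suffix such as "+cu121" before comparing.
        base = found.split("+", 1)[0] if found else None
        if base != wanted:
            mismatches[dist] = (wanted, found)
    return mismatches


for dist, (wanted, found) in version_mismatches(PINNED).items():
    print(f"{dist}: card lists {wanted}, found {found}")
```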