---
base_model: lvwerra/gpt2-imdb
tags:
- generated_from_trainer
model-index:
- name: gpt-imdb-kto-beta_0.1
  results: []
---

# gpt-imdb-kto-beta_0.1

This model is a fine-tuned version of [lvwerra/gpt2-imdb](https://huggingface.co/lvwerra/gpt2-imdb) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.2062
- Rewards/chosen: 2.5179
- Rewards/rejected: -0.0433
- Rewards/accuracies: 0.8250
- Rewards/margins: 2.5611
- Logps/rejected: -264.1180
- Logps/chosen: -210.0866
- Logits/rejected: -30.4371
- Logits/chosen: -31.3849
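
A quick-start sketch for loading the checkpoint (the repository id below is a placeholder; substitute the actual Hub path of this model):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id for this checkpoint (not confirmed by the card).
model_id = "your-username/gpt-imdb-kto-beta_0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The base model was tuned on IMDB text, so a movie-review prompt is a natural fit.
inputs = tokenizer("This movie was", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token by default
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```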

## Model description

From the name and base checkpoint, this appears to be GPT-2 (starting from [lvwerra/gpt2-imdb](https://huggingface.co/lvwerra/gpt2-imdb)) aligned with the KTO (Kahneman-Tversky Optimization) objective at beta = 0.1. More information needed.

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 24
- eval_batch_size: 24
- seed: 42
- optimizer: Adam with betas=(0.9,0.99) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 150
- training_steps: 7197
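
The training script itself is not included in this card. As a sketch only: under TRL versions contemporary with these framework versions, KTO-style training on paired preference data was typically run through TRL's `DPOTrainer` with `loss_type="kto_pair"`. The mapping below is an assumption, with the dataset path and output directory as placeholders; the hyperparameters and the beta value (from the model name) are taken from this card.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "lvwerra/gpt2-imdb"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token

model = AutoModelForCausalLM.from_pretrained(base)      # policy being trained
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference copy

# Placeholder: the card does not name the dataset. A paired-preference dataset
# with "prompt", "chosen", and "rejected" columns is assumed.
dataset = load_dataset("path/to/imdb-preference-pairs")

# Direct mapping of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="gpt-imdb-kto-beta_0.1",
    learning_rate=1e-5,
    per_device_train_batch_size=24,
    per_device_eval_batch_size=24,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.99,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=150,
    max_steps=7197,
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    beta=0.1,              # the "beta_0.1" in the model name
    loss_type="kto_pair",  # KTO loss on paired data (assumed, not confirmed)
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```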

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.2522        | 0.21  | 500  | 0.2884          | 1.2801         | -0.2634          | 0.7875             | 1.5434          | -266.3188      | -222.4644    | -37.5496        | -38.4713      |
| 0.3335        | 0.42  | 1000 | 0.2696          | 1.5869         | -0.1616          | 0.7917             | 1.7485          | -265.3008      | -219.3961    | -37.8624        | -38.7817      |
| 0.2435        | 0.63  | 1500 | 0.2472          | 1.8228         | -0.2033          | 0.7896             | 2.0260          | -265.7180      | -217.0376    | -33.3680        | -34.1467      |
| 0.3162        | 0.83  | 2000 | 0.2497          | 2.2013         | 0.3606           | 0.7729             | 1.8407          | -260.0789      | -213.2520    | -33.0705        | -33.7146      |
| 0.1409        | 1.04  | 2500 | 0.2301          | 2.0789         | -0.0950          | 0.8042             | 2.1738          | -264.6351      | -214.4766    | -34.2110        | -35.1256      |
| 0.2415        | 1.25  | 3000 | 0.2221          | 2.1406         | -0.2423          | 0.8042             | 2.3829          | -266.1087      | -213.8594    | -35.0880        | -35.8295      |
| 0.1549        | 1.46  | 3500 | 0.2173          | 2.2945         | -0.0445          | 0.7979             | 2.3390          | -264.1307      | -212.3203    | -31.2025        | -32.0702      |
| 0.1764        | 1.67  | 4000 | 0.2117          | 2.3347         | -0.2551          | 0.8250             | 2.5898          | -266.2365      | -211.9187    | -31.0530        | -31.9754      |
| 0.1310        | 1.88  | 4500 | 0.2101          | 2.3080         | -0.3171          | 0.8062             | 2.6251          | -266.8560      | -212.1852    | -30.9535        | -31.9058      |
| 0.2463        | 2.08  | 5000 | 0.2131          | 2.5808         | 0.2215           | 0.8167             | 2.3593          | -261.4699      | -209.4572    | -31.7099        | -32.5262      |
| 0.1536        | 2.29  | 5500 | 0.2084          | 2.5201         | -0.0034          | 0.8125             | 2.5236          | -263.7196      | -210.0640    | -30.3275        | -31.2806      |
| 0.2473        | 2.5   | 6000 | 0.2057          | 2.4813         | -0.1087          | 0.8188             | 2.5899          | -264.7721      | -210.4527    | -30.2259        | -31.1935      |
| 0.2168        | 2.71  | 6500 | 0.2060          | 2.5255         | -0.0304          | 0.8146             | 2.5559          | -263.9893      | -210.0102    | -30.4678        | -31.4146      |
| 0.1669        | 2.92  | 7000 | 0.2062          | 2.5179         | -0.0433          | 0.8250             | 2.5611          | -264.1180      | -210.0866    | -30.4371        | -31.3849      |
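
Reading the table: Rewards/margins is Rewards/chosen minus Rewards/rejected (final row: 2.5179 - (-0.0433) = 2.5612, matching the reported 2.5611 up to rounding), and the step-7000 row is the same evaluation reported in the summary at the top of this card.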


### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.1
- Datasets 2.15.0
- Tokenizers 0.15.0