---
license: llama3
library_name: peft
tags:
- trl
- orpo
- generated_from_trainer
base_model: meta-llama/Meta-Llama-3-8B
model-index:
- name: results
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# results

This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on an unspecified preference dataset.
It achieves the following results on the evaluation set:
- Loss: 2.2477
- Rewards/chosen: -0.2025
- Rewards/rejected: -0.2831
- Rewards/accuracies: 0.8875
- Rewards/margins: 0.0806
- Logps/rejected: -2.8313
- Logps/chosen: -2.0249
- Logits/rejected: -2.1125
- Logits/chosen: -1.7341
- Nll Loss: 2.2267
- Log Odds Ratio: -0.3842
- Log Odds Chosen: 0.8874
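
For reference, these metrics follow TRL's implementation of ORPO (Hong et al., 2024), which augments the standard NLL loss on the chosen responses with an odds-ratio penalty. In that formulation (β is TRL's weighting coefficient, 0.1 by default; the value used for this run is not recorded on this card):

$$
\mathcal{L}_{\text{ORPO}} = \mathcal{L}_{\text{NLL}} - \beta\,\mathbb{E}\left[\log \sigma\!\left(\log \frac{\operatorname{odds}_\theta(y_w \mid x)}{\operatorname{odds}_\theta(y_l \mid x)}\right)\right],
\qquad
\operatorname{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
$$

`Nll Loss` and `Log Odds Ratio` above report the two terms separately, and `Log Odds Chosen` is the mean log odds ratio between chosen and rejected completions, so its positive final value (0.8874) indicates the model assigns noticeably higher odds to the chosen responses.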

## Model description

More information needed

## Intended uses & limitations

More information needed
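
Note that this repository holds a PEFT (LoRA) adapter rather than full model weights, so inference requires loading the base model first and attaching the adapter. A minimal sketch (the adapter repository id below is a placeholder):

```python
# Minimal inference sketch; "your-username/results" is a placeholder
# for wherever this adapter is actually hosted.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

# Attach the trained LoRA adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(base_model, "your-username/results")

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```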

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- num_epochs: 10
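
A minimal sketch of how these settings map onto TRL's `ORPOConfig`/`ORPOTrainer` with a PEFT LoRA adapter. The dataset id and LoRA hyperparameters are illustrative assumptions; only the values listed above are taken from this card:

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "meta-llama/Meta-Llama-3-8B"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama 3 ships without a pad token

# ORPO expects a preference dataset with "prompt", "chosen", "rejected" columns.
dataset = load_dataset("your-username/preference-data")  # placeholder

peft_config = LoraConfig(  # adapter settings are assumptions, not from this card
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"
)

# Hyperparameters from the list above; Adam betas/epsilon are the defaults.
training_args = ORPOConfig(
    output_dir="results",
    learning_rate=1e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=50,
    num_train_epochs=10,
    eval_strategy="steps",
    eval_steps=50,  # matches the 50-step evaluation cadence in the table below
)

trainer = ORPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```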

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:--------------:|:---------------:|
| 5.5008        | 0.2907 | 50   | 5.6262          | -0.5231        | -0.6023          | 0.8250             | 0.0792          | -6.0233        | -5.2314      | -2.0311         | -1.8904       | 5.5816   | -0.4363        | 0.7951          |
| 4.9200        | 0.5814 | 100  | 5.1023          | -0.4828        | -0.5584          | 0.8250             | 0.0756          | -5.5836        | -4.8278      | -2.1181         | -2.0055       | 5.0596   | -0.4441        | 0.7604          |
| 4.6969        | 0.8721 | 150  | 4.6774          | -0.4489        | -0.5171          | 0.8500             | 0.0682          | -5.1705        | -4.4885      | -2.1660         | -2.0410       | 4.6355   | -0.4630        | 0.6879          |
| 3.9492        | 1.1628 | 200  | 3.8213          | -0.3674        | -0.4438          | 0.8750             | 0.0765          | -4.4384        | -3.6736      | -2.2855         | -1.9961       | 3.8167   | -0.4302        | 0.7799          |
| 3.45          | 1.4535 | 250  | 3.4864          | -0.3342        | -0.4227          | 0.9125             | 0.0885          | -4.2266        | -3.3420      | -2.2557         | -1.8804       | 3.4837   | -0.3910        | 0.9067          |
| 3.2561        | 1.7442 | 300  | 3.2679          | -0.3119        | -0.3956          | 0.9000             | 0.0837          | -3.9559        | -3.1191      | -2.2849         | -1.9045       | 3.2595   | -0.4022        | 0.8630          |
| 3.0471        | 2.0349 | 350  | 3.1300          | -0.3005        | -0.3768          | 0.9000             | 0.0763          | -3.7679        | -3.0046      | -2.2584         | -1.8626       | 3.1220   | -0.4214        | 0.7911          |
| 2.9312        | 2.3256 | 400  | 2.9729          | -0.2816        | -0.3469          | 0.8750             | 0.0653          | -3.4686        | -2.8161      | -2.2750         | -1.8891       | 2.9539   | -0.4551        | 0.6823          |
| 2.6856        | 2.6163 | 450  | 2.8281          | -0.2630        | -0.3133          | 0.8375             | 0.0503          | -3.1333        | -2.6298      | -2.2692         | -1.8896       | 2.8010   | -0.5058        | 0.5330          |
| 2.7304        | 2.9070 | 500  | 2.7191          | -0.2493        | -0.2893          | 0.7875             | 0.0400          | -2.8928        | -2.4927      | -2.2573         | -1.8775       | 2.6907   | -0.5448        | 0.4286          |
| 2.6224        | 3.1977 | 550  | 2.6362          | -0.2406        | -0.2809          | 0.7750             | 0.0403          | -2.8089        | -2.4062      | -2.2342         | -1.8500       | 2.6066   | -0.5412        | 0.4341          |
| 2.5026        | 3.4884 | 600  | 2.5858          | -0.2354        | -0.2761          | 0.7750             | 0.0407          | -2.7606        | -2.3537      | -2.2217         | -1.8389       | 2.5555   | -0.5383        | 0.4406          |
| 2.6062        | 3.7791 | 650  | 2.5413          | -0.2315        | -0.2783          | 0.7875             | 0.0468          | -2.7833        | -2.3151      | -2.2000         | -1.8150       | 2.5111   | -0.5115        | 0.5079          |
| 2.3809        | 4.0698 | 700  | 2.4987          | -0.2264        | -0.2712          | 0.8000             | 0.0448          | -2.7123        | -2.2642      | -2.1931         | -1.8048       | 2.4689   | -0.5187        | 0.4884          |
| 2.4307        | 4.3605 | 750  | 2.4637          | -0.2232        | -0.2721          | 0.8000             | 0.0489          | -2.7213        | -2.2323      | -2.1814         | -1.7947       | 2.4350   | -0.5014        | 0.5339          |
| 2.4116        | 4.6512 | 800  | 2.4364          | -0.2203        | -0.2709          | 0.8000             | 0.0506          | -2.7095        | -2.2034      | -2.1728         | -1.7871       | 2.4081   | -0.4942        | 0.5536          |
| 2.3713        | 4.9419 | 850  | 2.4145          | -0.2180        | -0.2716          | 0.8125             | 0.0535          | -2.7157        | -2.1803      | -2.1681         | -1.7788       | 2.3873   | -0.4823        | 0.5863          |
| 2.3885        | 5.2326 | 900  | 2.3904          | -0.2160        | -0.2735          | 0.8250             | 0.0575          | -2.7352        | -2.1603      | -2.1621         | -1.7749       | 2.3630   | -0.4664        | 0.6301          |
| 2.3782        | 5.5233 | 950  | 2.3710          | -0.2141        | -0.2735          | 0.8250             | 0.0595          | -2.7355        | -2.1408      | -2.1522         | -1.7627       | 2.3448   | -0.4588        | 0.6524          |
| 2.2396        | 5.8140 | 1000 | 2.3565          | -0.2130        | -0.2767          | 0.8500             | 0.0637          | -2.7666        | -2.1295      | -2.1432         | -1.7523       | 2.3312   | -0.4429        | 0.6988          |
| 2.2947        | 6.1047 | 1050 | 2.3363          | -0.2109        | -0.2761          | 0.8625             | 0.0652          | -2.7607        | -2.1086      | -2.1430         | -1.7592       | 2.3118   | -0.4374        | 0.7162          |
| 2.2506        | 6.3953 | 1100 | 2.3212          | -0.2094        | -0.2765          | 0.8625             | 0.0671          | -2.7653        | -2.0941      | -2.1394         | -1.7585       | 2.2969   | -0.4304        | 0.7376          |
| 2.2421        | 6.6860 | 1150 | 2.3090          | -0.2084        | -0.2781          | 0.8625             | 0.0697          | -2.7808        | -2.0840      | -2.1324         | -1.7495       | 2.2853   | -0.4213        | 0.7657          |
| 2.2733        | 6.9767 | 1200 | 2.2972          | -0.2072        | -0.2788          | 0.8750             | 0.0715          | -2.7878        | -2.0724      | -2.1276         | -1.7452       | 2.2739   | -0.4147        | 0.7865          |
| 2.2690        | 7.2674 | 1250 | 2.2879          | -0.2064        | -0.2803          | 0.8750             | 0.0738          | -2.8025        | -2.0641      | -2.1251         | -1.7449       | 2.2651   | -0.4067        | 0.8118          |
| 2.1922        | 7.5581 | 1300 | 2.2843          | -0.2056        | -0.2779          | 0.8750             | 0.0723          | -2.7791        | -2.0565      | -2.1274         | -1.7480       | 2.2614   | -0.4121        | 0.7953          |
| 2.1969        | 7.8488 | 1350 | 2.2745          | -0.2050        | -0.2797          | 0.8750             | 0.0748          | -2.7975        | -2.0497      | -2.1249         | -1.7453       | 2.2520   | -0.4034        | 0.8228          |
| 2.1968        | 8.1395 | 1400 | 2.2674          | -0.2043        | -0.2805          | 0.8750             | 0.0762          | -2.8054        | -2.0433      | -2.1219         | -1.7424       | 2.2452   | -0.3987        | 0.8385          |
| 2.2984        | 8.4302 | 1450 | 2.2618          | -0.2038        | -0.2810          | 0.8875             | 0.0772          | -2.8104        | -2.0379      | -2.1210         | -1.7416       | 2.2398   | -0.3952        | 0.8501          |
| 2.2809        | 8.7209 | 1500 | 2.2636          | -0.2041        | -0.2852          | 0.9125             | 0.0811          | -2.8523        | -2.0408      | -2.1185         | -1.7341       | 2.2419   | -0.3823        | 0.8918          |
| 2.2605        | 9.0116 | 1550 | 2.2537          | -0.2032        | -0.2833          | 0.9000             | 0.0801          | -2.8331        | -2.0316      | -2.1153         | -1.7363       | 2.2324   | -0.3857        | 0.8816          |
| 2.1305        | 9.3023 | 1600 | 2.2505          | -0.2028        | -0.2832          | 0.9000             | 0.0804          | -2.8322        | -2.0279      | -2.1129         | -1.7336       | 2.2294   | -0.3849        | 0.8848          |
| 2.1614        | 9.5930 | 1650 | 2.2487          | -0.2026        | -0.2833          | 0.9000             | 0.0807          | -2.8330        | -2.0261      | -2.1129         | -1.7343       | 2.2276   | -0.3841        | 0.8878          |
| 2.1278        | 9.8837 | 1700 | 2.2478          | -0.2025        | -0.2832          | 0.8875             | 0.0807          | -2.8322        | -2.0250      | -2.1129         | -1.7345       | 2.2268   | -0.3839        | 0.8882          |


### Framework versions

- PEFT 0.11.1
- Transformers 4.41.2
- Pytorch 2.3.0+cu121
- Datasets 2.19.2
- Tokenizers 0.19.1