---
base_model: mistralai/Mistral-Nemo-Instruct-2407
library_name: peft
license: other
tags:
- llama-factory
- lora
- generated_from_trainer
model-index:
- name: sft_dpo_fs
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# sft_dpo_fs

This model is a fine-tuned version of [mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) on the heat_transfer_dpo dataset.
It achieves the following results on the evaluation set:
- Loss: 0.1535
- Rewards/chosen: 17.2823
- Rewards/rejected: 11.3004
- Rewards/accuracies: 0.9610
- Rewards/margins: 5.9819
- Logps/chosen: -2.2063
- Logps/rejected: -60.6033
- Logits/chosen: 0.0035
- Logits/rejected: -0.0076
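
The reported reward margin can be checked against the chosen and rejected rewards listed above. The following is a minimal sketch that assumes the margin is logged as the mean difference between the two rewards, as is common in DPO trainers; the variable names are illustrative only.

```python
# Evaluation-set values copied from the list above.
rewards_chosen = 17.2823
rewards_rejected = 11.3004
reported_margin = 5.9819

# Assumption: margin = mean(chosen reward - rejected reward).
margin = rewards_chosen - rewards_rejected
print(f"computed margin: {margin:.4f}, reported margin: {reported_margin:.4f}")

# Allow for rounding in the logged values.
assert abs(margin - reported_margin) < 1e-3
```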

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 8
- total_eval_batch_size: 8
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
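
The batch-size and schedule settings above fit together as sketched below. This is an illustration and not the trainer's own code: `grad_accum_steps = 1` is an assumption (the card does not list it, but 4 × 2 = 8 implies it), the total step count is estimated from the results table (step 1080 at epoch 0.96), and `lr_at` is a hypothetical helper that reproduces the usual linear-warmup plus cosine-decay shape.

```python
import math

per_device_batch = 4   # train_batch_size
num_devices = 2        # num_devices
grad_accum_steps = 1   # assumption: not listed in the card

# Effective batch size per optimizer step.
total_batch = per_device_batch * num_devices * grad_accum_steps

# Estimate of the total optimizer steps in one epoch, from the results table.
total_steps = round(1080 / 0.96)
# Warmup length for lr_scheduler_warmup_ratio = 0.1 (rounded up).
warmup_steps = math.ceil(0.1 * total_steps)


def lr_at(step, peak_lr=5e-06):
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_batch, total_steps, warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, lr_at(step))
```

Under these assumptions the learning rate peaks at 5e-06 once warmup ends and decays to roughly zero by the final step.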

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:------------:|:--------------:|:-------------:|:---------------:|
| 0.3835        | 0.0533 | 60   | 0.3287          | 17.1482        | 15.7095          | 0.9280             | 1.4386          | -3.5472      | -16.5118       | -0.5827       | -0.5914         |
| 0.2552        | 0.1067 | 120  | 0.1900          | 17.1335        | 13.7535          | 0.9320             | 3.3799          | -3.6944      | -36.0722       | -0.2065       | -0.2218         |
| 0.2362        | 0.16   | 180  | 0.2024          | 17.0614        | 11.9722          | 0.9510             | 5.0892          | -4.4150      | -53.8850       | -0.1087       | -0.1222         |
| 0.1781        | 0.2133 | 240  | 0.1546          | 17.0620        | 12.2862          | 0.9500             | 4.7758          | -4.4089      | -50.7448       | -0.1243       | -0.1381         |
| 0.265         | 0.2667 | 300  | 0.1536          | 17.2493        | 12.6444          | 0.9440             | 4.6050          | -2.5355      | -47.1637       | -0.1744       | -0.1856         |
| 0.1605        | 0.32   | 360  | 0.3194          | 17.3612        | 12.2655          | 0.9210             | 5.0958          | -1.4165      | -50.9525       | -0.1062       | -0.1173         |
| 0.2894        | 0.3733 | 420  | 0.1679          | 17.3116        | 12.2496          | 0.9450             | 5.0620          | -1.9131      | -51.1113       | -0.0905       | -0.1026         |
| 0.1149        | 0.4267 | 480  | 0.2951          | 17.0540        | 11.9844          | 0.9230             | 5.0696          | -4.4890      | -53.7628       | -0.0770       | -0.0883         |
| 0.0384        | 0.48   | 540  | 0.1739          | 17.2042        | 12.1334          | 0.9490             | 5.0708          | -2.9873      | -52.2731       | -0.0512       | -0.0612         |
| 0.4008        | 0.5333 | 600  | 0.1706          | 17.2853        | 11.6981          | 0.9470             | 5.5872          | -2.1760      | -56.6266       | -0.0358       | -0.0469         |
| 0.1678        | 0.5867 | 660  | 0.2050          | 17.2021        | 11.5656          | 0.9450             | 5.6365          | -3.0082      | -57.9516       | -0.0160       | -0.0270         |
| 0.2272        | 0.64   | 720  | 0.1402          | 17.3928        | 11.7696          | 0.9520             | 5.6233          | -1.1005      | -55.9117       | -0.0229       | -0.0322         |
| 0.1915        | 0.6933 | 780  | 0.2441          | 17.3947        | 11.7656          | 0.9320             | 5.6290          | -1.0823      | -55.9507       | -0.0166       | -0.0266         |
| 0.0635        | 0.7467 | 840  | 0.1689          | 17.3812        | 11.5343          | 0.9450             | 5.8469          | -1.2169      | -58.2643       | -0.0111       | -0.0217         |
| 0.1703        | 0.8    | 900  | 0.1400          | 17.3271        | 11.3817          | 0.9610             | 5.9455          | -1.7577      | -59.7906       | 0.0002        | -0.0105         |
| 0.1138        | 0.8533 | 960  | 0.1441          | 17.3149        | 11.3432          | 0.9630             | 5.9718          | -1.8795      | -60.1756       | 0.0015        | -0.0094         |
| 0.0513        | 0.9067 | 1020 | 0.1412          | 17.3211        | 11.3263          | 0.9610             | 5.9948          | -1.8178      | -60.3445       | 0.0045        | -0.0065         |
| 0.1189        | 0.96   | 1080 | 0.1508          | 17.2887        | 11.3001          | 0.9610             | 5.9886          | -2.1420      | -60.6061       | 0.0074        | -0.0036         |
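
To compare checkpoints, the logged evaluations can be scanned for the lowest validation loss and the highest reward accuracy. The sketch below copies the step, validation loss, and reward accuracy columns from the table above; the variable names are illustrative.

```python
# (step, validation loss, rewards/accuracies) copied from the results table.
rows = [
    (60, 0.3287, 0.9280), (120, 0.1900, 0.9320), (180, 0.2024, 0.9510),
    (240, 0.1546, 0.9500), (300, 0.1536, 0.9440), (360, 0.3194, 0.9210),
    (420, 0.1679, 0.9450), (480, 0.2951, 0.9230), (540, 0.1739, 0.9490),
    (600, 0.1706, 0.9470), (660, 0.2050, 0.9450), (720, 0.1402, 0.9520),
    (780, 0.2441, 0.9320), (840, 0.1689, 0.9450), (900, 0.1400, 0.9610),
    (960, 0.1441, 0.9630), (1020, 0.1412, 0.9610), (1080, 0.1508, 0.9610),
]

# Checkpoint with the lowest validation loss.
best_loss = min(rows, key=lambda r: r[1])
# Checkpoint with the highest reward accuracy.
best_acc = max(rows, key=lambda r: r[2])

print(f"lowest validation loss: {best_loss[1]} at step {best_loss[0]}")
print(f"highest reward accuracy: {best_acc[2]} at step {best_acc[0]}")
```

In this log the lowest validation loss falls at step 900 and the highest reward accuracy at step 960, both slightly better than the final evaluation reported at the top of the card.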


### Framework versions

- PEFT 0.12.0
- Transformers 4.46.0
- PyTorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.20.1
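
With the library versions above installed, a LoRA adapter such as this one would typically be attached to the base model through PEFT. The following is a hedged sketch, not a tested recipe for this checkpoint: `load_adapter` is a hypothetical helper, `adapter_path` must be replaced with the actual adapter location (which this card does not state), and the `bfloat16` dtype is an assumption.

```python
BASE_MODEL = "mistralai/Mistral-Nemo-Instruct-2407"  # base_model from the card metadata


def load_adapter(adapter_path, base_model=BASE_MODEL):
    """Load the base model and attach a LoRA adapter from `adapter_path`.

    `adapter_path` is a placeholder for a local directory or Hub repo id.
    """
    # Imported lazily so that defining this helper does not require the
    # heavy dependencies to be installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForCausalLM.from_pretrained(
        base_model,
        torch_dtype=torch.bfloat16,  # assumption: bf16 inference
    )
    # Wrap the base model with the adapter weights.
    model = PeftModel.from_pretrained(model, adapter_path)
    model.eval()
    return tokenizer, model
```

Calling `load_adapter("path/to/adapter")` downloads the base model weights, so it needs substantial disk space and memory.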