---
license: llama3
base_model: tsavage68/UTI_L3_1000steps_1e5rate_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: UTI2_L3_300steps_1e7rate_01beta_CSFTDPO
  results: []
---


# UTI2_L3_300steps_1e7rate_01beta_CSFTDPO

This model is a version of [tsavage68/UTI_L3_1000steps_1e5rate_SFT](https://huggingface.co/tsavage68/UTI_L3_1000steps_1e5rate_SFT) fine-tuned with Direct Preference Optimization (DPO) using TRL, on an unknown dataset.
It achieves the following results on the evaluation set (final checkpoint, step 300):
- Loss: 0.5211
- Rewards/chosen: 0.1947
- Rewards/rejected: -0.2183
- Rewards/accuracies: 0.6500
- Rewards/margins: 0.4131
- Logps/rejected: -30.6679
- Logps/chosen: -17.1558
- Logits/rejected: -1.1555
- Logits/chosen: -1.1504
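These reward metrics appear to follow the DPO definitions used by TRL's `DPOTrainer` with its default sigmoid loss. Each reward is β times the gap between the policy's and the frozen reference model's log-probability of a response. The margin is the chosen reward minus the rejected reward, and the accuracy is the fraction of pairs whose chosen reward exceeds the rejected one. The sketch below assumes β = 0.1. This value is inferred from "01beta" in the model name and is not listed in the hyperparameters. Under that assumption, back-solving the reference log-probabilities from the step-25 and step-300 rows gives nearly identical values, which is what a frozen reference model should produce. The helper names are illustrative only.

```python
import math

# Assumption: beta = 0.1, inferred from "01beta" in the model name
# (it is not among the hyperparameters listed on this card).
BETA = 0.1


def implied_ref_logp(policy_logp: float, reward: float, beta: float = BETA) -> float:
    """Invert reward = beta * (policy_logp - ref_logp) to recover ref_logp.

    Rewards and log-probs are logged as means over eval pairs; the relation is
    linear, so it still holds for the means.
    """
    return policy_logp - reward / beta


def dpo_sigmoid_loss(margin: float) -> float:
    """Per-pair sigmoid DPO loss, -log(sigmoid(margin)), computed as log(1 + e^-margin)."""
    return math.log1p(math.exp(-margin))


# Final evaluation metrics from this card (step 300).
chosen_reward, rejected_reward = 0.1947, -0.2183
logps_chosen, logps_rejected = -17.1558, -30.6679

margin = chosen_reward - rejected_reward
ref_chosen = implied_ref_logp(logps_chosen, chosen_reward)
ref_rejected = implied_ref_logp(logps_rejected, rejected_reward)

# Same back-solve at step 25, taken from the training-results table.
ref_chosen_25 = implied_ref_logp(-19.0947, 0.0009)
ref_rejected_25 = implied_ref_logp(-28.4922, -0.0007)

print(f"margin = {margin:.4f}")
print(f"ref logp (chosen):   step 25 -> {ref_chosen_25:.4f}, step 300 -> {ref_chosen:.4f}")
print(f"ref logp (rejected): step 25 -> {ref_rejected_25:.4f}, step 300 -> {ref_rejected:.4f}")
print(f"-log sigmoid(mean margin) = {dpo_sigmoid_loss(margin):.4f}")
```

In this sketch, −log σ(mean margin) ≈ 0.508 is slightly below the reported loss of 0.5211. This is consistent with the definitions: the loss is convex in the margin and is averaged per pair, so the mean loss cannot be smaller than the loss at the mean margin.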

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-07
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 300
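The scheduler settings above combine a linear warmup with a half-cosine decay. The sketch below mirrors the formula of `transformers`' `get_cosine_schedule_with_warmup` at its default `num_cycles=0.5`. It also shows how `total_train_batch_size` is derived. A single training device is assumed, which is consistent with 2 × 2 = 4.

```python
import math

# Values from the hyperparameter list above.
BASE_LR = 1e-7
WARMUP_STEPS = 100
TRAINING_STEPS = 300
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 2
NUM_DEVICES = 1  # assumption: single device, consistent with total_train_batch_size = 4


def cosine_with_warmup_lr(step: int) -> float:
    """Learning rate at optimizer step `step`: linear warmup, then half-cosine decay to 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TRAINING_STEPS - WARMUP_STEPS)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Samples per optimizer step = per-device batch x accumulation steps x devices.
effective_batch = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES

for step in (0, 50, 100, 200, 300):
    print(step, f"{cosine_with_warmup_lr(step):.3e}")
print("effective batch size:", effective_batch)
```

Under this schedule, the learning rate peaks at 1e-07 at step 100, is back to half its peak at step 200, and reaches 0 at step 300. One third of the run is therefore spent in warmup.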

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6928        | 0.3333 | 25   | 0.6924          | 0.0009         | -0.0007          | 0.3600             | 0.0016          | -28.4922       | -19.0947     | -1.1524         | -1.1488       |
| 0.6893        | 0.6667 | 50   | 0.6863          | 0.0103         | -0.0035          | 0.6100             | 0.0138          | -28.5194       | -19.0000     | -1.1524         | -1.1488       |
| 0.6736        | 1.0    | 75   | 0.6701          | 0.0321         | -0.0151          | 0.6300             | 0.0471          | -28.6352       | -18.7825     | -1.1527         | -1.1490       |
| 0.622         | 1.3333 | 100  | 0.6366          | 0.0753         | -0.0439          | 0.6400             | 0.1192          | -28.9234       | -18.3503     | -1.1534         | -1.1493       |
| 0.5799        | 1.6667 | 125  | 0.5944          | 0.1218         | -0.0954          | 0.6400             | 0.2172          | -29.4390       | -17.8854     | -1.1535         | -1.1491       |
| 0.5812        | 2.0    | 150  | 0.5630          | 0.1556         | -0.1409          | 0.6500             | 0.2965          | -29.8935       | -17.5476     | -1.1544         | -1.1497       |
| 0.5284        | 2.3333 | 175  | 0.5418          | 0.1752         | -0.1786          | 0.6500             | 0.3538          | -30.2706       | -17.3511     | -1.1548         | -1.1499       |
| 0.4992        | 2.6667 | 200  | 0.5285          | 0.1875         | -0.2039          | 0.6500             | 0.3913          | -30.5232       | -17.2286     | -1.1552         | -1.1502       |
| 0.4892        | 3.0    | 225  | 0.5235          | 0.1916         | -0.2145          | 0.6500             | 0.4061          | -30.6293       | -17.1869     | -1.1554         | -1.1503       |
| 0.4895        | 3.3333 | 250  | 0.5212          | 0.1956         | -0.2171          | 0.6500             | 0.4127          | -30.6554       | -17.1470     | -1.1554         | -1.1503       |
| 0.4676        | 3.6667 | 275  | 0.5216          | 0.1945         | -0.2170          | 0.6500             | 0.4115          | -30.6547       | -17.1581     | -1.1553         | -1.1502       |
| 0.5106        | 4.0    | 300  | 0.5211          | 0.1947         | -0.2183          | 0.6500             | 0.4131          | -30.6679       | -17.1558     | -1.1555         | -1.1504       |
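The Epoch column implies 75 optimizer steps per epoch, so the 300 training steps make 4 passes over the training data. The sketch below recovers that figure from the table and gives a rough estimate of the training-set size. It also checks that validation loss has flattened over the last three checkpoints. The estimate of roughly 300 preference pairs assumes a single device and ignores a possibly partial final batch in each epoch.

```python
# (step, epoch, validation loss), copied from the table above.
rows = [
    (25, 0.3333, 0.6924), (50, 0.6667, 0.6863), (75, 1.0, 0.6701),
    (100, 1.3333, 0.6366), (125, 1.6667, 0.5944), (150, 2.0, 0.5630),
    (175, 2.3333, 0.5418), (200, 2.6667, 0.5285), (225, 3.0, 0.5235),
    (250, 3.3333, 0.5212), (275, 3.6667, 0.5216), (300, 4.0, 0.5211),
]
TOTAL_TRAIN_BATCH_SIZE = 4  # from the hyperparameters

# 25 steps correspond to 1/3 epoch, so one epoch is about 75 optimizer steps.
steps_per_epoch = round(rows[0][0] / rows[0][1])
# Approximate: assumes every step in an epoch sees a full batch of 4 pairs.
approx_pairs_per_epoch = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

# Checkpoint with the lowest validation loss.
best_step, _, best_loss = min(rows, key=lambda r: r[2])

# Spread of validation loss over the last three evaluations.
last_three = [loss for _, _, loss in rows[-3:]]
plateau_spread = max(last_three) - min(last_three)

print("steps per epoch:", steps_per_epoch)
print("approx. training pairs:", approx_pairs_per_epoch)
print("best step:", best_step, "val loss:", best_loss)
print(f"spread over last 3 evals: {plateau_spread:.4f}")
```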


### Framework versions

- Transformers 4.41.2
- Pytorch 2.0.0+cu117
- Datasets 2.19.2
- Tokenizers 0.19.1
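To reproduce or evaluate the model under the same stack, you may want to compare a local environment against the versions above. The following is a minimal sketch that uses only the standard library's `importlib.metadata`. It reports differences and does not install anything. The dictionary keys are the PyPI distribution names, and `compare_versions` is a hypothetical helper written for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions", keyed by PyPI distribution name.
EXPECTED = {
    "transformers": "4.41.2",
    "torch": "2.0.0+cu117",
    "datasets": "2.19.2",
    "tokenizers": "0.19.1",
}


def compare_versions(expected: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {package: (expected, installed)} for each package that differs or is missing."""
    mismatches = {}
    for pkg, want in expected.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            have = "not installed"
        if have != want:
            mismatches[pkg] = (want, have)
    return mismatches


if __name__ == "__main__":
    for pkg, (want, have) in compare_versions(EXPECTED).items():
        print(f"{pkg}: card lists {want}, found {have}")
```

The PyTorch string includes the `+cu117` local version tag. A CPU-only or differently built 2.0.0 wheel will therefore be reported as a mismatch even though the base version is the same.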