---
license: apache-2.0
base_model: tsavage68/mistralit2_1000_STEPS_5e7_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: Mistral2_1000_STEPS_05beta_CDPOSFT
  results: []
---

# Mistral2_1000_STEPS_05beta_CDPOSFT

This model is a fine-tuned version of [tsavage68/mistralit2_1000_STEPS_5e7_SFT](https://huggingface.co/tsavage68/mistralit2_1000_STEPS_5e7_SFT) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.5804
- Rewards/chosen: 1.1615
- Rewards/rejected: 0.9028
- Rewards/accuracies: 0.4286
- Rewards/margins: 0.2587
- Logps/rejected: -75.7158
- Logps/chosen: -73.1790
- Logits/rejected: -1.8951
- Logits/chosen: -1.8951

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 2.0435 | 0.0977 | 50 | 1.6546 | -0.2111 | -0.0540 | 0.3868 | -0.1572 | -77.6294 | -75.9242 | -1.6195 | -1.6195 |
| 3.1098 | 0.1953 | 100 | 1.7215 | -0.1670 | -0.5209 | 0.4286 | 0.3539 | -78.5632 | -75.8359 | -0.7416 | -0.7416 |
| 1.8949 | 0.2930 | 150 | 1.6841 | 2.1671 | 1.9786 | 0.4154 | 0.1886 | -73.5644 | -71.1677 | -1.7619 | -1.7619 |
| 1.4406 | 0.3906 | 200 | 1.6936 | 2.3177 | 2.1054 | 0.4264 | 0.2124 | -73.3108 | -70.8665 | -2.2879 | -2.2879 |
| 1.5623 | 0.4883 | 250 | 1.5911 | 0.8418 | 0.4811 | 0.4396 | 0.3607 | -76.5593 | -73.8184 | -1.5834 | -1.5834 |
| 1.8884 | 0.5859 | 300 | 1.5747 | 1.4552 | 1.2105 | 0.4418 | 0.2447 | -75.1005 | -72.5916 | -1.6640 | -1.6640 |
| 1.4373 | 0.6836 | 350 | 1.5569 | 1.3020 | 1.0909 | 0.4198 | 0.2111 | -75.3397 | -72.8979 | -1.9137 | -1.9136 |
| 1.4732 | 0.7812 | 400 | 1.5216 | 1.0023 | 0.6676 | 0.4571 | 0.3347 | -76.1863 | -73.4973 | -1.9794 | -1.9794 |
| 1.9109 | 0.8789 | 450 | 1.5502 | 1.3520 | 0.9986 | 0.4505 | 0.3534 | -75.5243 | -72.7979 | -1.8076 | -1.8076 |
| 1.4744 | 0.9766 | 500 | 1.5531 | 1.3605 | 1.1014 | 0.4264 | 0.2591 | -75.3186 | -72.7809 | -1.9385 | -1.9385 |
| 1.2615 | 1.0742 | 550 | 1.6623 | 0.6530 | 0.4114 | 0.4242 | 0.2415 | -76.6986 | -74.1960 | -2.3949 | -2.3949 |
| 1.8019 | 1.1719 | 600 | 1.6240 | 0.8707 | 0.6200 | 0.4308 | 0.2507 | -76.2815 | -73.7606 | -1.6149 | -1.6149 |
| 1.2202 | 1.2695 | 650 | 1.5993 | 1.1246 | 0.9014 | 0.4330 | 0.2233 | -75.7188 | -73.2527 | -1.8964 | -1.8964 |
| 1.0924 | 1.3672 | 700 | 1.5922 | 1.3888 | 1.1674 | 0.4242 | 0.2214 | -75.1866 | -72.7243 | -1.8455 | -1.8455 |
| 0.8059 | 1.4648 | 750 | 1.6004 | 1.1205 | 0.8834 | 0.4396 | 0.2371 | -75.7547 | -73.2610 | -1.9415 | -1.9415 |
| 0.9489 | 1.5625 | 800 | 1.5917 | 1.2725 | 1.0232 | 0.4264 | 0.2493 | -75.4751 | -72.9570 | -1.9293 | -1.9293 |
| 1.2564 | 1.6602 | 850 | 1.5797 | 1.1856 | 0.9286 | 0.4264 | 0.2570 | -75.6643 | -73.1308 | -1.8894 | -1.8894 |
| 1.2613 | 1.7578 | 900 | 1.5806 | 1.1682 | 0.9110 | 0.4308 | 0.2572 | -75.6995 | -73.1655 | -1.8963 | -1.8963 |
| 1.1197 | 1.8555 | 950 | 1.5804 | 1.1615 | 0.9030 | 0.4286 | 0.2585 | -75.7156 | -73.1791 | -1.8955 | -1.8955 |
| 0.7665 | 1.9531 | 1000 | 1.5804 | 1.1615 | 0.9028 | 0.4286 | 0.2587 | -75.7158 | -73.1790 | -1.8951 | -1.8951 |

### Framework versions

- Transformers 4.40.1
- Pytorch 2.0.0+cu117
- Datasets 2.19.0
- Tokenizers 0.19.1
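### Training setup sketch

The training script is not included in this card. The sketch below shows how a comparable run could be configured with `trl`'s `DPOTrainer` (0.8-era API, matching Transformers 4.40.1) using the hyperparameters listed above. It is a sketch under stated assumptions, not the author's script: `beta=0.5` is inferred from the `05beta` part of the model name, the cDPO-style `label_smoothing` from the `CDPO` suffix (exact value unknown), and the preference dataset is a placeholder since the card does not name it.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "tsavage68/mistralit2_1000_STEPS_5e7_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder: the card does not identify the preference dataset.
# It must provide "prompt", "chosen", and "rejected" columns.
train_dataset = load_dataset("my_preference_dataset", split="train")

args = TrainingArguments(
    output_dir="Mistral2_1000_STEPS_05beta_CDPOSFT",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size 8
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the defaults.
)

trainer = DPOTrainer(
    model,
    ref_model=None,       # trl snapshots a frozen reference copy of the policy
    args=args,
    beta=0.5,             # assumption: inferred from "05beta" in the model name
    label_smoothing=0.1,  # assumption: cDPO smoothing implied by "CDPO"; true value unknown
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```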
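## How to use

A minimal inference sketch with `transformers`. The repository id below is an assumption (the model name under the base model's namespace), and the chat template is the Mistral-Instruct format inherited from the base model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id: model name under the base model's namespace.
model_id = "tsavage68/Mistral2_1000_STEPS_05beta_CDPOSFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"  # device_map needs accelerate
)

messages = [{"role": "user", "content": "Explain what DPO fine-tuning changes about a model."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```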