---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO
  results: []
---

# mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.4935
- Rewards/chosen: -6.2215
- Rewards/rejected: -5.6448
- Rewards/accuracies: 0.3626
- Rewards/margins: -0.5767
- Logps/rejected: -85.0207
- Logps/chosen: -85.6008
- Logits/rejected: -5.8605
- Logits/chosen: -5.8604

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
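For reference, a minimal sketch of how a run with these hyperparameters could be set up using TRL's `DPOTrainer` (TRL 0.7.x-era API, matching the Transformers 4.38 framework version below). The training dataset is not documented here, so `"your/preference-dataset"` is a placeholder; the `beta=0.1` value is inferred from the `0.1_beta` in the model name.

```python
# Sketch only: the dataset and exact training script for this card are unknown.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Placeholder: a preference dataset with "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("your/preference-dataset")

training_args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO",
    per_device_train_batch_size=4,   # train_batch_size: 4
    per_device_eval_batch_size=1,    # eval_batch_size: 1
    gradient_accumulation_steps=2,   # total_train_batch_size: 8
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    evaluation_strategy="steps",
    eval_steps=50,                   # the results table logs evals every 50 steps
)

trainer = DPOTrainer(
    model,
    ref_model=None,      # with no ref_model, TRL uses a frozen copy of `model`
    args=training_args,
    beta=0.1,            # inferred from "0.1_beta" in the model name
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```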
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.1357        | 0.1   | 50   | 1.1734          | -1.9602        | -1.6509          | 0.3582             | -0.3094         | -45.0812       | -42.9883     | -3.1053         | -3.1052       |
| 1.7275        | 0.2   | 100  | 1.5539          | -4.8260        | -4.4502          | 0.3978             | -0.3758         | -73.0739       | -71.6456     | -2.7839         | -2.7839       |
| 1.6716        | 0.29  | 150  | 1.4805          | -4.2682        | -3.8441          | 0.3890             | -0.4241         | -67.0136       | -66.0676     | -3.8634         | -3.8634       |
| 1.9883        | 0.39  | 200  | 1.4624          | -4.1549        | -3.7121          | 0.3648             | -0.4429         | -65.6932       | -64.9352     | -4.6023         | -4.6023       |
| 1.2968        | 0.49  | 250  | 1.4720          | -4.1636        | -3.7323          | 0.3802             | -0.4312         | -65.8957       | -65.0215     | -4.0699         | -4.0699       |
| 1.5145        | 0.59  | 300  | 1.4656          | -4.1401        | -3.6836          | 0.3626             | -0.4564         | -65.4088       | -64.7864     | -4.8231         | -4.8231       |
| 1.7123        | 0.68  | 350  | 1.4617          | -4.1237        | -3.6671          | 0.3670             | -0.4567         | -65.2432       | -64.6233     | -4.7696         | -4.7696       |
| 1.295         | 0.78  | 400  | 1.4632          | -4.1764        | -3.7222          | 0.3714             | -0.4543         | -65.7941       | -65.1502     | -4.9799         | -4.9799       |
| 1.405         | 0.88  | 450  | 1.4666          | -4.1922        | -3.7464          | 0.3714             | -0.4458         | -66.0363       | -65.3076     | -5.0856         | -5.0856       |
| 1.9129        | 0.98  | 500  | 1.4701          | -4.2370        | -3.7742          | 0.3648             | -0.4628         | -66.3146       | -65.7560     | -5.1195         | -5.1195       |
| 1.2959        | 1.07  | 550  | 1.4889          | -4.3597        | -3.8796          | 0.3692             | -0.4802         | -67.3681       | -66.9833     | -5.1899         | -5.1899       |
| 1.2707        | 1.17  | 600  | 1.5193          | -4.6364        | -4.1231          | 0.3582             | -0.5133         | -69.8035       | -69.7498     | -5.9136         | -5.9136       |
| 1.3242        | 1.27  | 650  | 1.5168          | -4.6159        | -4.1101          | 0.3538             | -0.5057         | -69.6739       | -69.5444     | -5.3603         | -5.3603       |
| 1.397         | 1.37  | 700  | 2.1272          | -6.5216        | -6.2977          | 0.4022             | -0.2239         | -91.5493       | -88.6020     | -3.4923         | -3.4922       |
| 1.3107        | 1.46  | 750  | 1.4798          | -4.5654        | -4.0673          | 0.3626             | -0.4981         | -69.2450       | -69.0399     | -5.4624         | -5.4624       |
| 1.2491        | 1.56  | 800  | 1.4610          | -4.8769        | -4.3575          | 0.3648             | -0.5193         | -72.1476       | -72.1544     | -5.2893         | -5.2893       |
| 1.3924        | 1.66  | 850  | 1.4805          | -5.8437        | -5.2709          | 0.3473             | -0.5728         | -81.2817       | -81.8233     | -5.6057         | -5.6058       |
| 1.1725        | 1.76  | 900  | 1.4957          | -6.2498        | -5.6711          | 0.3626             | -0.5787         | -85.2834       | -85.8838     | -5.8532         | -5.8531       |
| 1.2113        | 1.86  | 950  | 1.4937          | -6.2249        | -5.6485          | 0.3626             | -0.5763         | -85.0578       | -85.6343     | -5.8631         | -5.8630       |
| 1.5057        | 1.95  | 1000 | 1.4935          | -6.2215        | -5.6448          | 0.3626             | -0.5767         | -85.0207       | -85.6008     | -5.8605         | -5.8604       |

### Framework versions

- Transformers 4.38.2
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2
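## Inference example

A minimal inference sketch. The repo id below is a placeholder (the Hub path for this checkpoint is not stated in the card); since the base model is Mistral-7B-Instruct-v0.2, the fine-tune is assumed to inherit its chat template. `device_map="auto"` requires `accelerate` to be installed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: replace with the actual Hub repo id of this checkpoint.
model_id = "mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Format the prompt with the instruct chat template inherited from the base model.
messages = [{"role": "user", "content": "Summarize what DPO training does."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```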