---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO
  results: []
---

# mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9939
- Rewards/chosen: -3.9532
- Rewards/rejected: -5.6547
- Rewards/accuracies: 0.6000
- Rewards/margins: 1.7015
- Logps/rejected: -85.1197
- Logps/chosen: -62.9180
- Logits/rejected: -2.0229
- Logits/chosen: -2.0243

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
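The original training script is not included in this card. The following is a minimal sketch of how the hyperparameters above map onto trl's `DPOTrainer` (using the trl API contemporary with Transformers 4.38.2). The dataset path is a placeholder, and `beta=0.1` is an assumption inferred from the `0.1_beta` in the model name rather than documented here.

```python
# Sketch only: the card does not ship the actual training script.
# Assumptions are marked below; the preference dataset is a placeholder.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Mistral ships no pad token by default

# Placeholder: the training data is not documented in this card. Any dataset
# with "prompt", "chosen", and "rejected" string columns fits DPOTrainer.
train_dataset = load_dataset("your/preference-dataset", split="train")
eval_dataset = load_dataset("your/preference-dataset", split="test")

args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO",
    learning_rate=1e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size 8
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    evaluation_strategy="steps",
    eval_steps=50,                  # matches the 50-step cadence in the results table below
    remove_unused_columns=False,    # required by trl's DPO data collator
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # trl makes a frozen copy of `model` as the DPO reference
    beta=0.1,        # assumption: inferred from "0.1_beta" in the model name
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```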
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6073        | 0.1   | 50   | 0.6623          | -1.2716        | -1.5743          | 0.5736             | 0.3026          | -44.3150       | -36.1020     | -2.8014         | -2.8019       |
| 0.7223        | 0.2   | 100  | 0.7934          | -3.0203        | -3.2538          | 0.5077             | 0.2336          | -61.1108       | -53.5883     | -2.4237         | -2.4243       |
| 0.8563        | 0.29  | 150  | 0.7580          | -1.8675        | -2.3470          | 0.5604             | 0.4795          | -52.0427       | -42.0607     | -2.5521         | -2.5529       |
| 0.7701        | 0.39  | 200  | 0.7631          | -1.8702        | -2.1583          | 0.5231             | 0.2882          | -50.1556       | -42.0875     | -2.7052         | -2.7056       |
| 0.8749        | 0.49  | 250  | 0.7941          | -2.4787        | -2.6066          | 0.4879             | 0.1279          | -54.6385       | -48.1731     | -2.8184         | -2.8189       |
| 0.6954        | 0.59  | 300  | 0.8039          | -1.5721        | -1.9872          | 0.5473             | 0.4151          | -48.4439       | -39.1064     | -2.8263         | -2.8268       |
| 0.733         | 0.68  | 350  | 0.7751          | -0.5753        | -1.0891          | 0.5253             | 0.5138          | -39.4632       | -29.1387     | -2.7587         | -2.7591       |
| 0.8256        | 0.78  | 400  | 0.7376          | -1.2950        | -1.7911          | 0.5516             | 0.4962          | -46.4838       | -36.3354     | -2.9702         | -2.9707       |
| 0.6485        | 0.88  | 450  | 0.7344          | -1.7798        | -2.3960          | 0.5692             | 0.6162          | -52.5322       | -41.1838     | -2.7167         | -2.7174       |
| 0.612         | 0.98  | 500  | 0.7051          | -1.3500        | -2.0968          | 0.5978             | 0.7467          | -49.5400       | -36.8863     | -2.5131         | -2.5138       |
| 0.2108        | 1.07  | 550  | 0.7799          | -2.0131        | -3.4580          | 0.6418             | 1.4449          | -63.1524       | -43.5171     | -2.2469         | -2.2482       |
| 0.1378        | 1.17  | 600  | 0.9314          | -3.4717        | -5.1214          | 0.6198             | 1.6497          | -79.7863       | -58.1027     | -1.9917         | -1.9933       |
| 0.188         | 1.27  | 650  | 0.9857          | -3.6647        | -5.3449          | 0.6198             | 1.6803          | -82.0219       | -60.0328     | -1.9585         | -1.9601       |
| 0.3739        | 1.37  | 700  | 1.0046          | -3.6506        | -5.3352          | 0.6176             | 1.6846          | -81.9245       | -59.8915     | -2.0334         | -2.0349       |
| 0.0428        | 1.46  | 750  | 0.9881          | -3.8094        | -5.4955          | 0.6088             | 1.6861          | -83.5278       | -61.4803     | -2.0272         | -2.0287       |
| 0.131         | 1.56  | 800  | 0.9900          | -3.9653        | -5.6306          | 0.6022             | 1.6653          | -84.8782       | -63.0390     | -2.0228         | -2.0242       |
| 0.1558        | 1.66  | 850  | 0.9943          | -3.9735        | -5.6628          | 0.6000             | 1.6893          | -85.2000       | -63.1207     | -2.0177         | -2.0191       |
| 0.1876        | 1.76  | 900  | 0.9939          | -3.9576        | -5.6566          | 0.6000             | 1.6989          | -85.1381       | -62.9622     | -2.0227         | -2.0241       |
| 0.1415        | 1.86  | 950  | 0.9945          | -3.9552        | -5.6536          | 0.6022             | 1.6984          | -85.1084       | -62.9377     | -2.0232         | -2.0246       |
| 0.1163        | 1.95  | 1000 | 0.9939          | -3.9532        | -5.6547          | 0.6000             | 1.7015          | -85.1197       | -62.9180     | -2.0229         | -2.0243       |

### Framework versions

- Transformers 4.38.2
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2
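## How to use

A minimal inference sketch follows. The repo id below is a placeholder: substitute the actual Hub path (or local directory) where this checkpoint is hosted.

```python
# Minimal inference sketch. The repo id is a placeholder, not the real
# location of this checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-username/mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.float16, device_map="auto"
)

# Mistral-Instruct expects [INST] ... [/INST] formatting; the tokenizer's
# chat template applies it for us.
messages = [{"role": "user", "content": "Summarize DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```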