---
license: mit
base_model: HuggingFaceH4/mistral-7b-sft-beta
tags:
- generated_from_trainer
model-index:
- name: zephyr-7b-dpo-full-debug-regression
  results: []
---


# zephyr-7b-dpo-full-debug-regression

This model is a fine-tuned version of [HuggingFaceH4/mistral-7b-sft-beta](https://huggingface.co/HuggingFaceH4/mistral-7b-sft-beta). The training dataset was not recorded when this card was generated.
It achieves the following results on the evaluation set:
- Loss: 0.7240
- Rewards/chosen: -4.3843
- Rewards/rejected: -7.9101
- Rewards/accuracies: 0.7640
- Rewards/margins: 3.5258
- Logps/rejected: -311.4621
- Logps/chosen: -319.5667
- Logits/rejected: -2.4790
- Logits/chosen: -2.5088
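
The reward figures above can be cross-checked against each other. The sketch below assumes the usual preference-tuning logging convention, where `Rewards/margins` is the mean chosen reward minus the mean rejected reward; the card itself does not state this, so treat it as an assumption rather than a description of the exact training code.

```python
# Values copied from the evaluation summary above.
rewards_chosen = -4.3843
rewards_rejected = -7.9101
reported_margin = 3.5258

# Assumption: margin = chosen reward - rejected reward (averaged over the eval set).
margin = rewards_chosen - rewards_rejected

print(f"recomputed margin: {margin:.4f}")
print(f"reported margin:   {reported_margin:.4f}")
```

Under that assumption the recomputed margin agrees with the reported one to four decimal places.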

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
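
As a rough illustration of how these values fit together, the sketch below recomputes the total batch sizes and models a linear schedule with warmup. The per-device-times-devices product matches the listed totals, which is consistent with no gradient accumulation, although the card does not say so explicitly. The function `linear_lr` and the `total_steps` value are hypothetical and only mirror the general shape of a linear warmup and decay schedule; they are not taken from the training code.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 5e-07
train_batch_size = 8
eval_batch_size = 4
num_devices = 4
warmup_ratio = 0.1

# Per-device batch size times device count reproduces the listed totals.
total_train_batch_size = train_batch_size * num_devices
total_eval_batch_size = eval_batch_size * num_devices


def linear_lr(step, total_steps, peak_lr=learning_rate, ratio=warmup_ratio):
    """Hypothetical helper: linear warmup to peak_lr, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# total_steps is an illustrative placeholder, not the real step count of this run.
total_steps = 1000
for step in (0, 50, 100, 550, 1000):
    print(step, linear_lr(step, total_steps))
```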

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.533         | 0.26  | 500  | 0.5084          | -0.1902        | -1.3680          | 0.7780             | 1.1778          | -246.0413      | -277.6251    | -2.9319         | -2.9487       |
| 0.4907        | 0.52  | 1000 | 0.5234          | -0.3346        | -1.8153          | 0.7620             | 1.4807          | -250.5139      | -279.0693    | -2.8401         | -2.8442       |
| 0.4388        | 0.77  | 1500 | 0.5202          | -0.7856        | -2.2720          | 0.7920             | 1.4864          | -255.0812      | -283.5798    | -2.7420         | -2.7444       |
| 0.0651        | 1.03  | 2000 | 0.5049          | -1.0044        | -2.8702          | 0.7860             | 1.8658          | -261.0635      | -285.7675    | -2.7335         | -2.7412       |
| 0.0887        | 1.29  | 2500 | 0.5946          | -1.9888        | -3.9256          | 0.7480             | 1.9368          | -271.6175      | -295.6113    | -2.5940         | -2.6173       |
| 0.0747        | 1.55  | 3000 | 0.5748          | -1.9590        | -4.0271          | 0.7560             | 2.0681          | -272.6327      | -295.3135    | -2.4969         | -2.5205       |
| 0.101         | 1.81  | 3500 | 0.5783          | -1.9521        | -4.1853          | 0.7680             | 2.2332          | -274.2144      | -295.2442    | -2.5069         | -2.5278       |
| 0.0195        | 2.07  | 4000 | 0.6253          | -2.9322        | -5.7633          | 0.7600             | 2.8310          | -289.9938      | -305.0455    | -2.4935         | -2.5158       |
| 0.0191        | 2.32  | 4500 | 0.7215          | -4.2183        | -7.6216          | 0.7620             | 3.4034          | -308.5774      | -317.9060    | -2.4756         | -2.5036       |
| 0.0105        | 2.58  | 5000 | 0.7341          | -4.2607        | -7.7440          | 0.7600             | 3.4833          | -309.8016      | -318.3306    | -2.5156         | -2.5437       |
| 0.0092        | 2.84  | 5500 | 0.7330          | -4.3756        | -7.9435          | 0.7600             | 3.5679          | -311.7966      | -319.4794    | -2.4856         | -2.5149       |
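
In the table, training loss falls sharply after the first epoch while validation loss rises from 0.5049 at step 2000 to 0.7330 at step 5500. One possible reading is that later checkpoints overfit the training preferences, although the reward margin keeps growing over the same span. The sketch below is a small helper for inspecting the logged rows, not part of the original training pipeline; it picks the steps with the lowest validation loss and the highest reward accuracy from values copied out of the table.

```python
# (step, training_loss, validation_loss, rewards_accuracy), copied from the table above.
rows = [
    (500, 0.533, 0.5084, 0.7780),
    (1000, 0.4907, 0.5234, 0.7620),
    (1500, 0.4388, 0.5202, 0.7920),
    (2000, 0.0651, 0.5049, 0.7860),
    (2500, 0.0887, 0.5946, 0.7480),
    (3000, 0.0747, 0.5748, 0.7560),
    (3500, 0.101, 0.5783, 0.7680),
    (4000, 0.0195, 0.6253, 0.7600),
    (4500, 0.0191, 0.7215, 0.7620),
    (5000, 0.0105, 0.7341, 0.7600),
    (5500, 0.0092, 0.7330, 0.7600),
]

# Row with the lowest validation loss and row with the highest reward accuracy.
best_by_loss = min(rows, key=lambda r: r[2])
best_by_accuracy = max(rows, key=lambda r: r[3])

print("lowest validation loss at step", best_by_loss[0], "->", best_by_loss[2])
print("highest reward accuracy at step", best_by_accuracy[0], "->", best_by_accuracy[3])
```

Which checkpoint is preferable depends on the downstream criterion; the card does not say which one was kept.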


### Framework versions

- Transformers 4.35.0
- Pytorch 2.1.0+cu118
- Datasets 2.14.6
- Tokenizers 0.14.1