---
license: llama2
base_model: elichen3051/llama2-7b-sft-chat-no-template
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
- HuggingFaceH4/orca_dpo_pairs
- HuggingFaceH4/cai-conversation-harmless
model-index:
- name: Llama2-7b-sft-chat-custom-template-dpo
  results: []
---


[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/eli3051/huggingface/runs/6n0utdab)
# Llama2-7b-sft-chat-custom-template-dpo

This model is a DPO fine-tuned version of [elichen3051/llama2-7b-sft-chat-no-template](https://huggingface.co/elichen3051/llama2-7b-sft-chat-no-template), trained on the HuggingFaceH4/ultrafeedback_binarized, HuggingFaceH4/orca_dpo_pairs, and HuggingFaceH4/cai-conversation-harmless datasets.
It achieves the following results on the evaluation set:
- Loss: 0.4717
- Rewards/chosen: -1.6807
- Rewards/rejected: -3.1957
- Rewards/accuracies: 0.6345
- Rewards/margins: 1.5150
- Logps/rejected: -519.5196
- Logps/chosen: -379.2986
- Logits/rejected: -2.7275
- Logits/chosen: -2.7213
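
As a quick sanity check on the numbers above, the reported reward margin should equal the chosen reward minus the rejected reward. The sketch below only re-derives that value from the listed metrics; the variable names are illustrative and not taken from the training code.

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -1.6807
rewards_rejected = -3.1957
reported_margin = 1.5150

# The margin is the gap between the chosen and rejected rewards.
margin = rewards_chosen - rewards_rejected

# Allow for the 4-decimal rounding used in the report.
assert abs(margin - reported_margin) < 1e-3
print(f"margin = {margin:.4f}")
```

Both rewards are negative, so the positive margin comes from the rejected reward falling further than the chosen one.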

## Model description

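A 7B Llama 2 chat model. It starts from the supervised fine-tuned checkpoint [elichen3051/llama2-7b-sft-chat-no-template](https://huggingface.co/elichen3051/llama2-7b-sft-chat-no-template) and is further trained with Direct Preference Optimization (DPO) using TRL. More information needed.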

## Intended uses & limitations

More information needed

## Training and evaluation data

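Training used the following datasets:

- HuggingFaceH4/ultrafeedback_binarized
- HuggingFaceH4/orca_dpo_pairs
- HuggingFaceH4/cai-conversation-harmless

More information needed on the splits and mixing proportions.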

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 7
- gradient_accumulation_steps: 8
- total_train_batch_size: 448
- total_eval_batch_size: 56
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 2
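
The total train batch size follows from the per-device batch size, device count, and gradient accumulation steps listed above. The sketch below checks that arithmetic and illustrates a linear-warmup-then-cosine learning-rate curve with the listed `learning_rate` and `lr_scheduler_warmup_ratio`. The helper `cosine_lr` and the value `total_steps = 424` are assumptions made for illustration; the card does not state the exact step count or scheduler implementation.

```python
import math

# Values copied from the hyperparameter list above.
per_device_batch = 8
num_devices = 7
grad_accum = 8

# Effective batch size per optimizer step.
total_train_batch = per_device_batch * num_devices * grad_accum
print(total_train_batch)  # 448, matching total_train_batch_size


def cosine_lr(step, total_steps, base_lr=5e-7, warmup_ratio=0.03):
    """Hypothetical helper: linear warmup followed by cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumed total number of optimizer steps (not stated in the card).
total_steps = 424
for step in (0, 13, 212, 424):
    print(step, cosine_lr(step, total_steps))
```

Under these assumptions the learning rate peaks at 5e-07 once warmup ends and decays toward zero by the final step.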

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6727        | 0.2032 | 43   | 0.6714          | -0.0530        | -0.0999          | 0.5871             | 0.0470          | -209.9431      | -216.5270    | -2.2167         | -2.2006       |
| 0.6056        | 0.4064 | 86   | 0.6041          | -0.5876        | -0.8878          | 0.6023             | 0.3002          | -288.7347      | -269.9940    | -3.0277         | -3.0177       |
| 0.5730        | 0.6096 | 129  | 0.5451          | -0.9286        | -1.6015          | 0.6174             | 0.6729          | -360.0960      | -304.0913    | -2.9301         | -2.9238       |
| 0.5239        | 0.8128 | 172  | 0.5123          | -1.2863        | -2.2358          | 0.6288             | 0.9495          | -423.5324      | -339.8588    | -2.9884         | -2.9803       |
| 0.4668        | 1.0159 | 215  | 0.4945          | -1.4994        | -2.6377          | 0.6439             | 1.1383          | -463.7195      | -361.1752    | -2.5910         | -2.5843       |
| 0.4607        | 1.2191 | 258  | 0.4816          | -1.5810        | -2.8887          | 0.6402             | 1.3077          | -488.8177      | -369.3280    | -2.8026         | -2.7951       |
| 0.5068        | 1.4223 | 301  | 0.4764          | -1.5805        | -3.0061          | 0.6402             | 1.4256          | -500.5590      | -369.2790    | -2.7586         | -2.7513       |
| 0.4724        | 1.6255 | 344  | 0.4730          | -1.6832        | -3.1741          | 0.6383             | 1.4909          | -517.3631      | -379.5493    | -2.6296         | -2.6237       |
| 0.4836        | 1.8287 | 387  | 0.4718          | -1.6795        | -3.1900          | 0.6420             | 1.5105          | -518.9514      | -379.1832    | -2.6434         | -2.6374       |
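
The table can be summarized with a few lines of Python. The sketch below copies the step, epoch, and validation-loss columns, picks the evaluation point with the lowest validation loss, and estimates the number of optimizer steps per epoch from the step-to-epoch ratio. The estimate is derived from the rounded epoch values, so treat it as approximate.

```python
# (step, epoch, validation_loss) copied from the table above.
rows = [
    (43, 0.2032, 0.6714),
    (86, 0.4064, 0.6041),
    (129, 0.6096, 0.5451),
    (172, 0.8128, 0.5123),
    (215, 1.0159, 0.4945),
    (258, 1.2191, 0.4816),
    (301, 1.4223, 0.4764),
    (344, 1.6255, 0.4730),
    (387, 1.8287, 0.4718),
]

# Evaluation point with the lowest validation loss.
best_step, best_epoch, best_loss = min(rows, key=lambda r: r[2])

# Approximate optimizer steps per epoch, averaged over all rows.
steps_per_epoch = sum(step / epoch for step, epoch, _ in rows) / len(rows)

print(f"best step: {best_step} (epoch {best_epoch}, val loss {best_loss})")
print(f"approx. steps per epoch: {steps_per_epoch:.1f}")
```

Validation loss decreases at every logged evaluation, so the lowest value in the table is the last row.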


### Framework versions

- Transformers 4.42.0.dev0
- PyTorch 2.3.1
- Datasets 2.19.2
- Tokenizers 0.19.1