metadata

license: llama2
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
  - name: llama_DPO_model_e1
    results: []

llama_DPO_model_e1

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf, trained with DPO (via TRL) as a PEFT adapter. The training dataset is not documented. On the evaluation set, the final checkpoint (step 250) achieves the following results:

Loss: 0.1779
Rewards/chosen: 0.3527
Rewards/rejected: -1.3764
Rewards/accuracies: 1.0
Rewards/margins: 1.7292
Logps/rejected: -198.5740
Logps/chosen: -157.1067
Logits/rejected: -1.0528
Logits/chosen: -0.8587
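
The reward metrics above are related by simple arithmetic, which the following minimal sketch checks. It assumes the standard sigmoid DPO loss, where each example's loss is -log(sigmoid(margin)) and the margin is the chosen reward minus the rejected reward. The card does not state the loss variant, so treat that as an assumption. Because that function is convex, the mean loss cannot be smaller than the loss evaluated at the mean margin, so the reported loss should sit at or above the bound computed here.

```python
import math

# Final evaluation metrics copied from the list above.
reward_chosen = 0.3527
reward_rejected = -1.3764
reported_margin = 1.7292
reported_loss = 0.1779

# The margin is the gap between the chosen and rejected rewards.
margin = reward_chosen - reward_rejected


def dpo_sigmoid_loss(m: float) -> float:
    """Return -log(sigmoid(m)), written as log(1 + exp(-m))."""
    return math.log1p(math.exp(-m))


# Assumption: the standard sigmoid DPO loss was used.
# By Jensen's inequality, the mean loss is >= the loss at the mean margin.
loss_lower_bound = dpo_sigmoid_loss(reported_margin)

print(f"margin = {margin:.4f} (reported {reported_margin})")
print(f"loss at mean margin = {loss_lower_bound:.4f} (reported mean loss {reported_loss})")
```

The small difference between the recomputed and reported margin comes from rounding in the logged values.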

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-06
train_batch_size: 1
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 8
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 1
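
The batch-size and scheduler settings above can be sketched numerically. The effective batch size is the per-device batch size times the gradient accumulation steps times the number of devices; a single device is assumed here, which is consistent with the listed total of 8. The `linear_lr` helper is hypothetical and assumes no warmup and roughly 252 optimizer steps in the epoch (an estimate from the results table, where step 250 corresponds to epoch 0.99). Neither value is stated in the card.

```python
# Values copied from the hyperparameter list above.
learning_rate = 1e-6
train_batch_size = 1
gradient_accumulation_steps = 8
num_devices = 1  # assumption: single device, not stated in the card

# Number of examples contributing to each optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Assumption: ~252 optimizer steps per epoch, estimated from step 250 ~ epoch 0.99.
estimated_total_steps = 252


def linear_lr(step: int, total_steps: int = estimated_total_steps,
              base_lr: float = learning_rate) -> float:
    """Hypothetical helper: linear decay from base_lr to 0 with no warmup."""
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps


print("effective batch size:", total_train_batch_size)
for step in (0, 125, 250):
    print(f"step {step}: lr ~ {linear_lr(step):.3e}")
```

Under these assumptions the learning rate is close to zero by the last logged step, which may partly explain why the validation loss flattens out near the end of the epoch.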

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6603	0.1	25	0.6253	0.0416	-0.1007	0.9633	0.1423	-185.8169	-160.2181	-1.0525	-0.8550
0.5342	0.2	50	0.5074	0.1130	-0.3090	1.0	0.4220	-187.8993	-159.5039	-1.0525	-0.8569
0.4382	0.3	75	0.4022	0.1798	-0.5442	1.0	0.7241	-190.2517	-158.8354	-1.0530	-0.8563
0.3592	0.4	100	0.3212	0.2338	-0.7752	1.0	1.0090	-192.5613	-158.2961	-1.0531	-0.8579
0.3035	0.5	125	0.2590	0.2824	-0.9912	1.0	1.2736	-194.7217	-157.8096	-1.0528	-0.8583
0.2374	0.6	150	0.2125	0.3190	-1.1966	1.0	1.5157	-196.7760	-157.4438	-1.0528	-0.8575
0.2094	0.7	175	0.1868	0.3455	-1.3260	1.0	1.6714	-198.0693	-157.1793	-1.0528	-0.8598
0.1886	0.79	200	0.1796	0.3491	-1.3639	1.0	1.7130	-198.4486	-157.1428	-1.0532	-0.8617
0.1805	0.89	225	0.1785	0.3523	-1.3731	1.0	1.7254	-198.5406	-157.1107	-1.0530	-0.8593
0.1821	0.99	250	0.1779	0.3527	-1.3764	1.0	1.7292	-198.5740	-157.1067	-1.0528	-0.8587
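
As a rough sanity check on the table above, the sketch below confirms that each reported margin matches chosen minus rejected up to rounding, and compares the log-probabilities at the first and last evaluation. The rows are copied from the table; the variable names are illustrative only. The shifts are measured between step 25 and step 250, not against the reference model, whose log-probabilities the card does not report.

```python
# (step, rewards/chosen, rewards/rejected, rewards/margins, logps/rejected, logps/chosen)
# copied from the evaluation columns of the table above.
rows = [
    (25, 0.0416, -0.1007, 0.1423, -185.8169, -160.2181),
    (50, 0.1130, -0.3090, 0.4220, -187.8993, -159.5039),
    (75, 0.1798, -0.5442, 0.7241, -190.2517, -158.8354),
    (100, 0.2338, -0.7752, 1.0090, -192.5613, -158.2961),
    (125, 0.2824, -0.9912, 1.2736, -194.7217, -157.8096),
    (150, 0.3190, -1.1966, 1.5157, -196.7760, -157.4438),
    (175, 0.3455, -1.3260, 1.6714, -198.0693, -157.1793),
    (200, 0.3491, -1.3639, 1.7130, -198.4486, -157.1428),
    (225, 0.3523, -1.3731, 1.7254, -198.5406, -157.1107),
    (250, 0.3527, -1.3764, 1.7292, -198.5740, -157.1067),
]

# Largest disagreement between a reported margin and chosen - rejected.
max_gap = max(
    abs((chosen - rejected) - margin)
    for _, chosen, rejected, margin, _, _ in rows
)

# Change in average log-probability between the first and last evaluation.
first, last = rows[0], rows[-1]
delta_logp_rejected = last[4] - first[4]
delta_logp_chosen = last[5] - first[5]

print(f"max margin gap: {max_gap:.4f}")
print(f"logps/rejected change: {delta_logp_rejected:.4f}")
print(f"logps/chosen change: {delta_logp_chosen:.4f}")
```

Over this window the rejected log-probability moves considerably more than the chosen one, so most of the growth in the margin comes from the rejected responses becoming less likely.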

Framework versions

PEFT 0.8.2
Transformers 4.38.1
PyTorch 2.2.0+cu118
Datasets 2.17.1
Tokenizers 0.15.2