metadata

license: llama2
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
  - name: Llama-2-7b-hf-DPO-PartialEval_ET0.1_MT1.2_V.1.0
    results: []

Llama-2-7b-hf-DPO-PartialEval_ET0.1_MT1.2_V.1.0

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf; the training dataset is not specified. It achieves the following results on the evaluation set:

Loss: 0.7063
Rewards/chosen: -1.8087
Rewards/rejected: -2.4327
Rewards/accuracies: 0.7000
Rewards/margins: 0.6240
Logps/rejected: -101.7455
Logps/chosen: -109.7301
Logits/rejected: -1.0472
Logits/chosen: -1.0455
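
As a quick consistency check on the metrics above, the reported reward margin should equal the chosen reward minus the rejected reward. The sketch below verifies this with the values from this card. The variable names are illustrative, and the comparison against the loss assumes the standard sigmoid DPO loss, which this card does not confirm.

```python
import math

# Final evaluation metrics copied from this card.
rewards_chosen = -1.8087
rewards_rejected = -2.4327
reported_margin = 0.6240
reported_loss = 0.7063

# The margin is the gap between the chosen and rejected rewards.
margin = rewards_chosen - rewards_rejected
assert abs(margin - reported_margin) < 1e-4

# Assumption: the standard sigmoid DPO loss, -log(sigmoid(margin)) per example.
# That function is convex in the margin, so its value at the mean margin is a
# lower bound on the mean loss; the reported loss should not fall below it.
loss_at_mean_margin = math.log1p(math.exp(-margin))
assert reported_loss >= loss_at_mean_margin

print(f"margin = {margin:.4f}, loss at mean margin = {loss_at_mean_margin:.4f}")
```

Under that assumption, a reported loss well above the bound would suggest that per-example margins vary widely across the evaluation set, with some pairs ranked confidently in the wrong order.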

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 4
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 10
num_epochs: 3
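
The hyperparameters above can be related to each other with a little arithmetic. The sketch below reproduces the total batch size and illustrates the shape of a cosine schedule with linear warmup. It is a plain-Python illustration rather than the trainer's own code, and it relies on two assumptions that the card does not state: a single training device, and 226 optimizer steps per epoch (inferred from the results table, where step 68 is logged at epoch 0.3009).

```python
import math

# Hyperparameters from this card.
learning_rate = 5e-05
train_batch_size = 2
gradient_accumulation_steps = 2
warmup_steps = 10
num_epochs = 3

# Assumption: a single device, which is consistent with the reported total of 4.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Assumption: 226 optimizer steps per epoch, inferred from the results table
# (step 68 is logged at epoch 0.3009); the card does not state this directly.
steps_per_epoch = 226
total_steps = steps_per_epoch * num_epochs


def lr_at(step: int) -> float:
    """Linear warmup to the peak rate, then cosine decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print("total_train_batch_size:", total_train_batch_size)
print("estimated total steps:", total_steps)
for step in (0, 5, 10, 340, 612, total_steps):
    print(step, f"{lr_at(step):.2e}")
```

With only 10 warmup steps, the learning rate reaches its peak almost immediately and spends nearly the whole run on the cosine decay.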

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6461	0.3009	68	0.6839	0.0330	0.0082	0.6000	0.0248	-77.3366	-91.3130	-0.2945	-0.2863
0.9723	0.6018	136	0.6779	0.0339	-0.0486	0.7000	0.0824	-77.9042	-91.3046	-0.3345	-0.3256
0.6461	0.9027	204	0.6352	-0.0081	-0.2128	0.8000	0.2047	-79.5466	-91.7240	-0.3939	-0.3854
0.2832	1.2035	272	0.5825	-0.8764	-1.2440	0.7000	0.3676	-89.8586	-100.4076	-0.6262	-0.6198
0.1923	1.5044	340	0.5559	-1.1573	-1.6161	0.7000	0.4587	-93.5792	-103.2166	-0.6844	-0.6797
0.3898	1.8053	408	0.6173	-1.3556	-1.8473	0.7000	0.4918	-95.8919	-105.1990	-0.8939	-0.8905
0.3404	2.1062	476	0.6381	-1.3063	-1.8875	0.7000	0.5812	-96.2932	-104.7061	-0.9068	-0.9042
0.4954	2.4071	544	0.6915	-1.7445	-2.3721	0.7000	0.6276	-101.1399	-109.0883	-1.0304	-1.0288
0.3914	2.7080	612	0.7063	-1.8087	-2.4327	0.7000	0.6240	-101.7455	-109.7301	-1.0472	-1.0455
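
Reading the table above, validation loss is not lowest at the end of training. The sketch below picks out the evaluation with the lowest validation loss and compares it with the final one. The rows are transcribed by hand from the table, and the variable names are chosen for illustration only.

```python
# (step, epoch, validation_loss, rewards_margin) copied from the table above.
eval_log = [
    (68, 0.3009, 0.6839, 0.0248),
    (136, 0.6018, 0.6779, 0.0824),
    (204, 0.9027, 0.6352, 0.2047),
    (272, 1.2035, 0.5825, 0.3676),
    (340, 1.5044, 0.5559, 0.4587),
    (408, 1.8053, 0.6173, 0.4918),
    (476, 2.1062, 0.6381, 0.5812),
    (544, 2.4071, 0.6915, 0.6276),
    (612, 2.7080, 0.7063, 0.6240),
]

# Evaluation with the lowest validation loss.
best_step, best_epoch, best_loss, best_margin = min(eval_log, key=lambda row: row[2])

# Last logged evaluation, which matches the summary metrics of this card.
final_step, final_epoch, final_loss, final_margin = eval_log[-1]

print(f"best:  step {best_step} (epoch {best_epoch}), val loss {best_loss}")
print(f"final: step {final_step} (epoch {final_epoch}), val loss {final_loss}")
```

On these numbers, the step-340 evaluation has the lowest validation loss (0.5559), while the final evaluation at step 612 reports 0.7063 even though the reward margin is larger. Whether a checkpoint from the earlier step was saved is not stated in this card.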

Framework versions

PEFT 0.10.0
Transformers 4.40.2
Pytorch 2.3.0+cu121
Datasets 2.19.1
Tokenizers 0.19.1