---
license: llama2
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: meta-llama/Llama-2-7b-chat-hf
model-index:
  - name: model_shp1_dpo1
    results: []
---

model_shp1_dpo1

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf, trained with DPO as a PEFT adapter on an unknown dataset. At the final checkpoint (step 1000), it achieves the following results on the evaluation set:

Loss: 2.0112
Rewards/chosen: -9.7625
Rewards/rejected: -9.2926
Rewards/accuracies: 0.4700
Rewards/margins: -0.4699
Logps/rejected: -307.4124
Logps/chosen: -345.0927
Logits/rejected: -1.0692
Logits/chosen: -1.0975
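These metric names appear to match those logged by TRL's DPO trainer. The sketch below recomputes them for one preference pair to show how to read them. It assumes the standard sigmoid DPO loss:

- Each implicit reward is `beta` times the log-probability ratio between the fine-tuned policy and the frozen reference model.
- The margin is the chosen reward minus the rejected reward.
- Accuracy is the fraction of pairs with a positive margin.

In TRL, Logps/chosen and Logps/rejected are, as far as we know, the policy's summed token log-probabilities for each response. This card does not report `beta`, so the value 0.1 below is an assumed placeholder (TRL's default). The helper names `softplus`, `dpo_pair_metrics` and `mean_metrics` are hypothetical, as are the example log-probabilities.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_pair_metrics(policy_chosen_logp: float, policy_rejected_logp: float,
                     ref_chosen_logp: float, ref_rejected_logp: float,
                     beta: float = 0.1) -> dict:
    """Sigmoid DPO loss and reward statistics for one preference pair.

    beta=0.1 is an assumed placeholder; this card does not report beta.
    Log-probabilities are assumed to be summed over the response tokens.
    """
    # Implicit rewards: beta * (log pi_policy - log pi_reference) per response.
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    return {
        "rewards/chosen": reward_chosen,
        "rewards/rejected": reward_rejected,
        "rewards/margins": margin,
        # 1.0 when the chosen response gets the strictly higher reward;
        # averaging over pairs gives rewards/accuracies.
        "rewards/accuracies": 1.0 if margin > 0 else 0.0,
        # Sigmoid DPO loss: -log(sigmoid(margin)) == softplus(-margin).
        "loss": softplus(-margin),
    }


def mean_metrics(pairs: list) -> dict:
    """Average per-pair metrics, as they would be reported over an eval set."""
    keys = pairs[0].keys()
    return {k: sum(p[k] for p in pairs) / len(pairs) for k in keys}


# Hypothetical log-probabilities for a single pair (not taken from this model).
m = dpo_pair_metrics(-10.0, -12.0, -11.0, -11.0)
print(m)
```

The margin is a difference of rewards, so its average equals the average chosen reward minus the average rejected reward. Here that is -9.7625 - (-9.2926) = -0.4699, which matches the reported Rewards/margins.

The two signals together describe the final checkpoint on this evaluation set:

- The negative mean margin means it assigns, on average, higher implicit reward to the rejected responses.
- The Rewards/accuracies of 0.47 means it prefers the chosen response in only 47% of pairs.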

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0005
train_batch_size: 4
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
training_steps: 1000
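The batch-size and schedule settings above interact as follows.

`total_train_batch_size` is `train_batch_size` × `gradient_accumulation_steps` × number of devices. The value 4 × 4 = 16 is therefore consistent with training on a single device (an inference, not stated in the card).

For the learning rate, the sketch below is a plain-Python re-implementation of the formula that Transformers' `get_cosine_schedule_with_warmup` uses for `lr_scheduler_type: cosine`: a linear warmup, then a single half-cosine decaying to zero at the last step. It is an illustration, not the library code, and the constant and function names are hypothetical.

```python
import math

# Values from the hyperparameter list above.
PEAK_LR = 0.0005
WARMUP_STEPS = 100
TRAINING_STEPS = 1000
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 4
NUM_DEVICES = 1  # assumption: inferred from 4 * 4 = 16


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Number of examples contributing to each optimizer step."""
    return per_device * accum * devices


def cosine_lr_with_warmup(step: int, peak_lr: float = PEAK_LR,
                          warmup: int = WARMUP_STEPS,
                          total: int = TRAINING_STEPS) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to 0 at `total`."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))  # 16
for step in (0, 50, 100, 550, 1000):
    print(step, cosine_lr_with_warmup(step))
```

Under this schedule the learning rate peaks at 0.0005 at step 100. It falls to half that (0.00025) at step 550, midway through the decay, and reaches zero at step 1000.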

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.0741	2.67	100	1.1825	-3.1899	-3.0393	0.4800	-0.1506	-244.8796	-279.3668	-1.2034	-1.2620
0.0016	5.33	200	2.1179	-8.9597	-8.2224	0.4100	-0.7372	-296.7111	-337.0645	-1.1154	-1.1503
0.0001	8.0	300	1.9624	-9.5308	-9.0562	0.4500	-0.4746	-305.0487	-342.7763	-1.0878	-1.1168
0.0001	10.67	400	1.9799	-9.6041	-9.1296	0.4500	-0.4745	-305.7831	-343.5089	-1.0797	-1.1079
0.0001	13.33	500	1.9938	-9.6787	-9.2063	0.4500	-0.4724	-306.5495	-344.2545	-1.0746	-1.1031
0.0001	16.0	600	2.0046	-9.7222	-9.2446	0.4600	-0.4776	-306.9330	-344.6898	-1.0722	-1.0999
0.0001	18.67	700	2.0079	-9.7525	-9.2749	0.4500	-0.4776	-307.2361	-344.9933	-1.0706	-1.0984
0.0001	21.33	800	2.0091	-9.7588	-9.2867	0.4600	-0.4721	-307.3541	-345.0561	-1.0699	-1.0978
0.0001	24.0	900	2.0158	-9.7704	-9.2915	0.4500	-0.4789	-307.4015	-345.1719	-1.0694	-1.0975
0.0001	26.67	1000	2.0112	-9.7625	-9.2926	0.4700	-0.4699	-307.4124	-345.0927	-1.0692	-1.0975
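The table shows three trends:

- Training loss reaches 0.0001 by step 300 (epoch 8.0).
- Validation loss is lowest at step 100 (1.1825), rises to 2.1179 at step 200, and stays near 2.0 afterwards.
- Rewards/margins is negative at every evaluation.

The sketch below uses a subset of columns copied from the table to do two things. It checks that Rewards/margins equals Rewards/chosen minus Rewards/rejected (up to rounding in the last logged digit). It also picks the logged evaluation step with the lowest validation loss. The names `ROWS`, `margins_consistent` and `best_step_by_val_loss` are hypothetical. Whether an intermediate checkpoint was kept is not stated in this card.

```python
# (step, validation loss, rewards/chosen, rewards/rejected, rewards/margins),
# copied from the training results table above.
ROWS = [
    (100, 1.1825, -3.1899, -3.0393, -0.1506),
    (200, 2.1179, -8.9597, -8.2224, -0.7372),
    (300, 1.9624, -9.5308, -9.0562, -0.4746),
    (400, 1.9799, -9.6041, -9.1296, -0.4745),
    (500, 1.9938, -9.6787, -9.2063, -0.4724),
    (600, 2.0046, -9.7222, -9.2446, -0.4776),
    (700, 2.0079, -9.7525, -9.2749, -0.4776),
    (800, 2.0091, -9.7588, -9.2867, -0.4721),
    (900, 2.0158, -9.7704, -9.2915, -0.4789),
    (1000, 2.0112, -9.7625, -9.2926, -0.4699),
]

# Logged values are rounded to 4 decimals, so allow about one unit of rounding.
TOLERANCE = 1.5e-4


def margins_consistent(rows) -> bool:
    """True if every logged margin equals chosen - rejected within TOLERANCE."""
    return all(abs((chosen - rejected) - margin) <= TOLERANCE
               for _, _, chosen, rejected, margin in rows)


def best_step_by_val_loss(rows) -> int:
    """Step with the lowest validation loss among the logged evaluations."""
    return min(rows, key=lambda row: row[1])[0]


print(margins_consistent(ROWS))     # True
print(best_step_by_val_loss(ROWS))  # 100
```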

Framework versions

PEFT 0.10.0
Transformers 4.39.1
PyTorch 2.2.1+cu121
Datasets 2.18.0
Tokenizers 0.15.2