phi-2-sft-alpaca_gpt4_en-ep1-dpo-comparison_gpt4_en-ep1-lora

This model is a LoRA adapter trained with DPO on the comparison_gpt4_en dataset, on top of Yhyu13/phi-2-sft-alpaca_gpt4_en-ep1. It achieves the following results on the evaluation set:

Loss: 0.0168
Rewards/chosen: -1.5750
Rewards/rejected: -11.4002
Rewards/accuracies: 0.9956
Rewards/margins: 9.8253
Logps/rejected: -142.2352
Logps/chosen: -139.5300
Logits/rejected: 0.6066
Logits/chosen: 0.9744
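The reward metrics above come from the DPO objective. The sketch below assumes the standard sigmoid DPO loss:

- Each response's implicit reward is beta times the gap between the policy's and the frozen reference model's summed log-probability of that response.
- Rewards/margins is the mean of the chosen reward minus the rejected reward.
- Rewards/accuracies is the fraction of pairs where the chosen reward is higher.

As a check on the numbers above, -1.5750 - (-11.4002) = 9.8252, which matches the reported 9.8253 up to rounding.

The code computes these quantities from per-pair log-probabilities. Some details are assumptions:

- Beta is not reported in this card, so `BETA = 0.1` is a placeholder.
- Treating Logps/* as the policy's own log-probabilities follows common DPO logging and is not confirmed here.
- `dpo_metrics` and `log_sigmoid` are illustrative names, not a trainer's API.

```python
import math

# Placeholder: the DPO beta used for this model is not reported in the card.
BETA = 0.1


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=BETA):
    """Average DPO metrics over a batch of preference pairs.

    Each argument is a list of summed response log-probabilities, one entry per
    pair, under the trained policy or the frozen reference model.
    """
    # Implicit rewards: beta * (policy log-prob - reference log-prob).
    chosen_rewards = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected_rewards = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    # Sigmoid DPO loss per pair: -log sigmoid(margin); beta is already folded into the rewards.
    losses = [-log_sigmoid(m) for m in margins]
    n = len(margins)
    return {
        "loss": sum(losses) / n,
        "rewards/chosen": sum(chosen_rewards) / n,
        "rewards/rejected": sum(rejected_rewards) / n,
        "rewards/accuracies": sum(c > r for c, r in zip(chosen_rewards, rejected_rewards)) / n,
        "rewards/margins": sum(margins) / n,
        # Assumed: Logps/* report the policy's own summed log-probabilities.
        "logps/chosen": sum(policy_chosen) / n,
        "logps/rejected": sum(policy_rejected) / n,
    }


if __name__ == "__main__":
    # Toy log-probabilities for two preference pairs, made up for illustration.
    metrics = dpo_metrics(
        policy_chosen=[-10.0, -12.0],
        policy_rejected=[-20.0, -15.0],
        ref_chosen=[-11.0, -12.0],
        ref_rejected=[-18.0, -16.0],
    )
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

Because the margin is averaged per pair, its mean always equals the mean chosen reward minus the mean rejected reward. A very negative Rewards/rejected alongside a small Rewards/chosen therefore shows up directly as the large margin reported above.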

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 1
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
num_devices: 2
gradient_accumulation_steps: 4
total_train_batch_size: 8
total_eval_batch_size: 2
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
num_epochs: 1.0
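The two totals follow from the per-device settings. The training total is train_batch_size × num_devices × gradient_accumulation_steps = 1 × 2 × 4 = 8. The evaluation total is eval_batch_size × num_devices = 1 × 2 = 2, since gradient accumulation does not apply to evaluation.

The sketch below reproduces that arithmetic and a half-cosine decay from the listed learning rate. Some details are assumptions:

- No warmup is listed, so `warmup_steps` defaults to 0.
- The total number of optimizer steps is not stated in the card, so `total_steps` must be supplied by the caller.
- The function names are illustrative.
- The shape is meant to resemble the usual `cosine` scheduler, not to replicate the exact one used for this run.

```python
import math


def effective_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    """Number of samples contributing to a single optimizer update."""
    return per_device * num_devices * grad_accum


def cosine_lr(step: int, total_steps: int, base_lr: float = 5e-5, warmup_steps: int = 0) -> float:
    """Half-cosine decay from base_lr to 0, with optional linear warmup.

    total_steps is a caller-supplied assumption; the card does not report it.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("total_train_batch_size:", effective_batch_size(1, 2, 4))
    print("total_eval_batch_size:", effective_batch_size(1, 2))
    # Hypothetical schedule length, for illustration only.
    for step in (0, 1000, 2000, 3000, 4000):
        print(step, f"{cosine_lr(step, total_steps=4000):.2e}")
```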

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.0534	0.24	1000	0.0217	-1.6714	-10.2359	0.9945	8.5645	-130.5921	-140.4941	0.3064	0.5735
0.0182	0.49	2000	0.0175	-1.5469	-10.9602	0.9951	9.4133	-137.8349	-139.2487	0.6230	1.0709
0.0162	0.73	3000	0.0171	-1.5517	-11.4444	0.9962	9.8927	-142.6772	-139.2976	0.6325	1.0048
0.0154	0.98	4000	0.0168	-1.5741	-11.4004	0.9956	9.8262	-142.2364	-139.5214	0.6051	0.9729
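As a consistency check on the table, Rewards/margins should equal Rewards/chosen minus Rewards/rejected up to display rounding, because the mean of per-pair differences equals the difference of the means. The sketch below copies the relevant columns verbatim from the table and performs two checks:

- It compares the logged margin with chosen minus rejected for each row.
- It confirms that validation loss falls at every logged step.

The tuple layout and helper names are illustrative.

```python
# (step, validation loss, rewards/chosen, rewards/rejected, rewards/margins),
# copied from the training-results table above.
ROWS = [
    (1000, 0.0217, -1.6714, -10.2359, 8.5645),
    (2000, 0.0175, -1.5469, -10.9602, 9.4133),
    (3000, 0.0171, -1.5517, -11.4444, 9.8927),
    (4000, 0.0168, -1.5741, -11.4004, 9.8262),
]


def margin_gaps(rows):
    """Logged margin minus (chosen - rejected), per row; should be ~0 up to rounding."""
    return [margin - (chosen - rejected) for _, _, chosen, rejected, margin in rows]


def val_loss_decreasing(rows):
    """True if validation loss strictly decreases from one logged step to the next."""
    losses = [val for _, val, _, _, _ in rows]
    return all(a > b for a, b in zip(losses, losses[1:]))


if __name__ == "__main__":
    for (step, *_), gap in zip(ROWS, margin_gaps(ROWS)):
        print(f"step {step}: margin - (chosen - rejected) = {gap:+.4f}")
    print("validation loss strictly decreasing:", val_loss_decreasing(ROWS))
```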

Framework versions

PEFT 0.7.1
Transformers 4.36.2
Pytorch 2.0.1+cu117
Datasets 2.14.5
Tokenizers 0.15.0

Repository: Yhyu13/phi-2-sft-dpo-gpt4_en-ep1-lora