zephyr-7b-dpo-selfgen

This model is a fine-tuned version of EllieS/zephyr-7b-sft-qlora on the EllieS/pubmedqa_dpo_selfgen_data dataset. It achieves the following results on the evaluation set:

Loss: 0.0000
Rewards/chosen: -6.6466
Rewards/rejected: -19.5106
Rewards/accuracies: 1.0
Rewards/margins: 12.8639
Logps/rejected: -1996.6047
Logps/chosen: -731.7379
Logits/rejected: -2.0588
Logits/chosen: -2.4883
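
As a rough sanity check on the numbers above, the reward margin should equal the chosen reward minus the rejected reward, and a margin this large should drive a sigmoid-style preference loss to roughly zero. The sketch below assumes the standard DPO sigmoid loss, -log(sigmoid(margin)), which this card does not state explicitly. It also evaluates the loss at the mean margin, which is a lower bound on the mean per-example loss and not the same quantity. The function name dpo_sigmoid_loss is illustrative only.

```python
import math

# Evaluation-set values copied from the list above.
rewards_chosen = -6.6466
rewards_rejected = -19.5106
reported_margin = 12.8639

# Difference of the mean rewards. The card reports the mean of per-example
# margins, so the two may differ in the last decimal place due to rounding.
margin = rewards_chosen - rewards_rejected


def dpo_sigmoid_loss(m: float) -> float:
    """Return -log(sigmoid(m)), computed as log1p(exp(-m)).

    Assumption: the standard DPO sigmoid loss, where m is the
    (already beta-scaled) reward margin. The card does not confirm
    which loss variant was used.
    """
    return math.log1p(math.exp(-m))


# Loss evaluated at the mean margin (not the mean of per-example losses).
loss_at_mean_margin = dpo_sigmoid_loss(margin)

print(f"margin (chosen - rejected): {margin:.4f}")
print(f"reported margin:            {reported_margin:.4f}")
print(f"loss at mean margin:        {loss_at_mean_margin:.2e}")
```

Under these assumptions the loss at the mean margin is on the order of 1e-6, which is consistent with the reported loss of 0.0000 at four decimal places.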

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-06
train_batch_size: 2
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 2
gradient_accumulation_steps: 2
total_train_batch_size: 8
total_eval_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3.0
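
The total batch sizes follow from the per-device sizes, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and illustrates a cosine schedule with linear warmup in plain Python. The cosine_lr function is a hypothetical helper written for this example, not the trainer's own scheduler, and TOTAL_STEPS is an assumed round figure (the results table suggests roughly 50,000 steps over 3 epochs, but the exact count is not given here).

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 5e-06
train_batch_size = 2            # per device
eval_batch_size = 8             # per device
num_devices = 2
gradient_accumulation_steps = 2
warmup_ratio = 0.1

# Effective batch sizes: accumulation applies to training only.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def cosine_lr(step: int, total_steps: int,
              peak_lr: float = learning_rate,
              warmup: float = warmup_ratio) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero.

    Hypothetical helper for illustration; it mirrors the usual
    warmup-plus-cosine shape but is not the library implementation.
    """
    warmup_steps = int(total_steps * warmup)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumption: an approximate total step count, not stated in the card.
TOTAL_STEPS = 50_000

print("total_train_batch_size:", total_train_batch_size)
print("total_eval_batch_size:", total_eval_batch_size)
for step in (0, 2_500, 5_000, 27_500, 50_000):
    print(f"step {step:>6}: lr = {cosine_lr(step, TOTAL_STEPS):.2e}")
```

With these values the computed totals match the listed total_train_batch_size of 8 and total_eval_batch_size of 16.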

Training results

Training Loss	Epoch	Step	Logits/chosen	Logits/rejected	Logps/chosen	Logps/rejected	Validation Loss	Rewards/accuracies	Rewards/chosen	Rewards/margins	Rewards/rejected
0.0241	0.42	7000	-2.8328	-2.8312	-143.5124	-856.1008	0.0101	1.0	-0.7644	7.3411	-8.1055
0.0001	0.83	14000	-2.3450	-1.9435	-714.5292	-1741.5647	0.0002	1.0	-6.4745	10.4856	-16.9602
0.0003	1.25	21000	-2.4293	-2.0264	-695.5377	-1973.5151	0.0001	1.0	-6.2846	12.9950	-19.2797
0.0	1.67	28000	-2.5393	-2.1793	-619.2334	-1821.8682	0.0001	1.0	-5.5216	12.2416	-17.7632
0.0001	2.09	35000	-2.4633	-1.9800	-817.4478	-2071.8862	0.0000	1.0	-7.5037	12.7596	-20.2634
0.0	2.5	42000	-2.4883	-2.0593	-730.7642	-2000.8484	0.0000	1.0	-6.6369	12.9161	-19.5530
0.0001	2.92	49000	-2.4895	-2.0591	-732.9475	-1999.9326	0.0000	1.0	-6.6587	12.8851	-19.5438

Framework versions

PEFT 0.7.1
Transformers 4.36.2
PyTorch 2.1.2+cu121
Datasets 2.14.6
Tokenizers 0.15.2
