zephyr-7b-gemma-dpo

This model is a fine-tuned version of HuggingFaceH4/zephyr-7b-gemma-sft-v0.1 on the RedaAlami/PKU-SafeRLHF-Processed dataset. It achieves the following results on the evaluation set:

Loss: 97.2382
Rewards/chosen: 0.0424
Rewards/rejected: 0.0341
Rewards/accuracies: 0.6062
Rewards/margins: 0.0083
Logps/rejected: -2.3880
Logps/chosen: -2.3290
Logits/rejected: 384.5392
Logits/chosen: 412.5483
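
The reward figures above are related by simple arithmetic. As a minimal sketch, assuming the usual convention in preference-tuning logs (the margin is the mean chosen reward minus the mean rejected reward, and the accuracy is the fraction of pairs whose chosen reward is higher), the reported margin can be re-derived as shown below. The per-pair values in `pairs` are hypothetical and are not taken from the evaluation set.

```python
# Reported evaluation metrics, copied from the list above.
rewards_chosen = 0.0424
rewards_rejected = 0.0341

# Assumed convention: margin = chosen reward - rejected reward.
margin = rewards_chosen - rewards_rejected
print(f"margin = {margin:.4f}")


def reward_accuracy(pairs):
    """Fraction of (chosen, rejected) reward pairs where chosen > rejected."""
    return sum(c > r for c, r in pairs) / len(pairs)


# Hypothetical per-pair rewards, for illustration only.
pairs = [(0.05, 0.03), (0.02, 0.04), (0.06, 0.01), (0.03, 0.03)]
print(f"accuracy = {reward_accuracy(pairs):.2f}")
```

Read this way, a Rewards/accuracies value of 0.6062 would mean the chosen response was scored above the rejected one in roughly 61% of evaluation pairs.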

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 2
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 8
total_train_batch_size: 128
total_eval_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 2
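
The batch-size and scheduler settings above can be cross-checked with a short calculation. The sketch below recomputes the effective training batch size and evaluates a linear-warmup, cosine-decay learning-rate curve. The total step count is not stated in this card; it is an estimate derived from the results table (step 100 corresponds to epoch 0.3017), and the rounding of the warmup length is likewise an assumption.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 5e-07
train_batch_size = 2
num_devices = 8
gradient_accumulation_steps = 8
warmup_ratio = 0.1
num_epochs = 2

# Effective number of examples per optimizer step.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# ASSUMPTION: total optimizer steps, estimated from the results table
# (step 100 is logged at epoch 0.3017).
total_steps = round(num_epochs * 100 / 0.3017)
# ASSUMPTION: the warmup length is rounded up to a whole step.
warmup_steps = math.ceil(warmup_ratio * total_steps)


def lr_at(step):
    """Linear warmup from 0 to the peak rate, then cosine decay to 0."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", total_train_batch_size)
print("estimated total steps:", total_steps, "warmup steps:", warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.3e}")
```

The product 2 x 8 x 8 reproduces the listed total_train_batch_size of 128. The learning-rate values are only as accurate as the estimated step count.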

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
99.2543	0.3017	100	98.5109	0.0407	0.0354	0.5822	0.0053	-2.3624	-2.3625	390.8526	418.0560
98.8709	0.6033	200	98.0235	0.0431	0.0367	0.5788	0.0063	-2.3359	-2.3153	388.3781	415.9555
97.9389	0.9050	300	97.6159	0.0460	0.0381	0.5959	0.0078	-2.3082	-2.2581	386.4085	414.2633
96.4776	1.2066	400	97.3138	0.0431	0.0347	0.5908	0.0083	-2.3763	-2.3158	385.0537	413.0242
97.3613	1.5083	500	97.2518	0.0430	0.0346	0.5908	0.0083	-2.3781	-2.3180	384.5959	412.6117
97.5077	1.8100	600	97.2543	0.0424	0.0341	0.5976	0.0083	-2.3888	-2.3300	384.5274	412.5387
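
The table can also be checked programmatically. The sketch below copies a subset of the columns, finds the logged checkpoint with the lowest validation loss, and verifies that each reported margin matches the chosen reward minus the rejected reward up to rounding. The tuple layout and variable names are illustrative choices, not part of the training code.

```python
# Rows copied from the table above:
# (step, validation_loss, rewards_chosen, rewards_rejected, rewards_margins)
rows = [
    (100, 98.5109, 0.0407, 0.0354, 0.0053),
    (200, 98.0235, 0.0431, 0.0367, 0.0063),
    (300, 97.6159, 0.0460, 0.0381, 0.0078),
    (400, 97.3138, 0.0431, 0.0347, 0.0083),
    (500, 97.2518, 0.0430, 0.0346, 0.0083),
    (600, 97.2543, 0.0424, 0.0341, 0.0083),
]

# Logged checkpoint with the lowest validation loss.
best_step, best_loss = min(((r[0], r[1]) for r in rows), key=lambda x: x[1])
print("lowest logged validation loss:", best_loss, "at step", best_step)

# Largest disagreement between the reported margin and chosen - rejected.
max_gap = max(abs((chosen - rejected) - reported)
              for _, _, chosen, rejected, reported in rows)
print(f"largest margin discrepancy: {max_gap:.4f}")
```

The validation loss flattens after step 400, and the margin stays at 0.0083 from that point on. The small discrepancies in the margin column are consistent with each value having been rounded to four decimals independently.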

Framework versions

PEFT 0.12.0
Transformers 4.43.3
PyTorch 2.4.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1
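
To compare a local environment against the versions listed above, a small standard-library check can help. This is a sketch: the mapping from the display names to PyPI distribution names (for example `torch` for PyTorch) is an assumption, and `version_mismatches` is a hypothetical helper rather than part of any of these libraries.

```python
from importlib import metadata

# Versions copied from the list above, keyed by the assumed PyPI
# distribution name of each framework.
EXPECTED = {
    "peft": "0.12.0",
    "transformers": "4.43.3",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def version_mismatches(expected):
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in version_mismatches(EXPECTED).items():
    print(f"{name}: expected {wanted}, found {installed}")
```

An exact string comparison is deliberately strict; a different CUDA build suffix on PyTorch, for instance, is reported as a mismatch even though it may work fine.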
