zephyr-7b-gemma-dpo

This model is a fine-tuned version of HuggingFaceH4/zephyr-7b-gemma-sft-v0.1 on the RedaAlami/PKU-SafeRLHF-Processed dataset. It achieves the following results on the evaluation set:

  • Loss: 97.2382
  • Rewards/chosen: 0.0424
  • Rewards/rejected: 0.0341
  • Rewards/accuracies: 0.6062
  • Rewards/margins: 0.0083
  • Logps/rejected: -2.3880
  • Logps/chosen: -2.3290
  • Logits/rejected: 384.5392
  • Logits/chosen: 412.5483

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Logps/rejected Logps/chosen Logits/rejected Logits/chosen
99.2543 0.3017 100 98.5109 0.0407 0.0354 0.5822 0.0053 -2.3624 -2.3625 390.8526 418.0560
98.8709 0.6033 200 98.0235 0.0431 0.0367 0.5788 0.0063 -2.3359 -2.3153 388.3781 415.9555
97.9389 0.9050 300 97.6159 0.0460 0.0381 0.5959 0.0078 -2.3082 -2.2581 386.4085 414.2633
96.4776 1.2066 400 97.3138 0.0431 0.0347 0.5908 0.0083 -2.3763 -2.3158 385.0537 413.0242
97.3613 1.5083 500 97.2518 0.0430 0.0346 0.5908 0.0083 -2.3781 -2.3180 384.5959 412.6117
97.5077 1.8100 600 97.2543 0.0424 0.0341 0.5976 0.0083 -2.3888 -2.3300 384.5274 412.5387

Framework versions

  • PEFT 0.12.0
  • Transformers 4.43.3
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Downloads last month
3
Inference API
Unable to determine this model’s pipeline type. Check the docs .

Model tree for RedaAlami/zephyr-7b-gemma-dpo

Base model

google/gemma-7b
Adapter
(1)
this model

Dataset used to train RedaAlami/zephyr-7b-gemma-dpo