---
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: zephyr-7b-dpo-full
    results: []
---

# zephyr-7b-dpo-full

This model was trained from scratch on an unspecified dataset. It achieves the following results on the evaluation set:

- Loss: 1.3947
- Rewards/chosen: -2.4314
- Rewards/rejected: -2.0023
- Rewards/accuracies: 0.3867
- Rewards/margins: -0.4292
- Logps/rejected: -517.7516
- Logps/chosen: -554.9180
- Logits/rejected: -1.0823
- Logits/chosen: -1.1239
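In DPO, the reported reward margin is simply the chosen reward minus the rejected reward, so the evaluation metrics can be cross-checked against each other (the small discrepancy is rounding, since the logged values are truncated to four decimals):

```python
# Sanity check: Rewards/margins = Rewards/chosen - Rewards/rejected,
# using the evaluation-set values reported above.
rewards_chosen = -2.4314
rewards_rejected = -2.0023
reported_margin = -0.4292

computed_margin = rewards_chosen - rewards_rejected
assert abs(computed_margin - reported_margin) < 1e-3  # matches up to rounding
print(round(computed_margin, 4))  # -0.4291
```

The negative margin (and accuracy below 0.5) indicates the policy assigns higher implicit reward to rejected responses than chosen ones on the eval set.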

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
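The effective batch sizes above follow directly from the per-device settings; a quick check of the arithmetic:

```python
# Effective batch sizes implied by the hyperparameters above.
per_device_train_batch_size = 8
per_device_eval_batch_size = 8
num_devices = 8
gradient_accumulation_steps = 2

# Gradient accumulation multiplies the effective training batch only;
# evaluation runs without accumulation.
total_train_batch_size = per_device_train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = per_device_eval_batch_size * num_devices

assert total_train_batch_size == 128
assert total_eval_batch_size == 64
print(total_train_batch_size, total_eval_batch_size)  # 128 64
```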

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.3047        | 0.1   | 100  | 0.8551          | -0.4930        | -0.2025          | 0.3203             | -0.2905         | -337.7748      | -361.0801    | -2.3863         | -2.4405       |
| 0.1861        | 0.21  | 200  | 1.0495          | -1.3850        | -1.0357          | 0.3867             | -0.3493         | -421.0934      | -450.2716    | -1.5107         | -1.5418       |
| 0.1608        | 0.31  | 300  | 1.0910          | -1.4317        | -1.0544          | 0.3945             | -0.3772         | -422.9684      | -454.9446    | -1.4022         | -1.4367       |
| 0.1368        | 0.42  | 400  | 1.3010          | -2.0839        | -1.6212          | 0.4102             | -0.4627         | -479.6456      | -520.1699    | -1.0131         | -1.0538       |
| 0.1364        | 0.52  | 500  | 1.1773          | -1.5832        | -1.1334          | 0.3711             | -0.4498         | -430.8614      | -470.0934    | -1.6090         | -1.6466       |
| 0.1223        | 0.63  | 600  | 1.3206          | -2.2971        | -1.8297          | 0.4141             | -0.4674         | -500.4930      | -541.4883    | -1.1541         | -1.1880       |
| 0.0971        | 0.73  | 700  | 1.4638          | -2.6554        | -2.1594          | 0.3906             | -0.4959         | -533.4667      | -577.3128    | -0.9392         | -0.9712       |
| 0.1035        | 0.84  | 800  | 1.4475          | -2.5761        | -2.1538          | 0.3945             | -0.4222         | -532.9068      | -569.3817    | -0.8902         | -0.9232       |
| 0.088         | 0.94  | 900  | 1.3947          | -2.4314        | -2.0023          | 0.3867             | -0.4292         | -517.7516      | -554.9180    | -1.0823         | -1.1239       |
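The reward columns in the table are DPO's implicit rewards: beta-scaled log-probability ratios of the policy against the frozen reference model, and the training loss is the negative log-sigmoid of the reward margin. A minimal sketch of that objective, assuming TRL's default `beta=0.1` (the actual beta used for this run is not recorded in this card), with illustrative log-probabilities:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO objective: -log sigmoid(beta * (chosen - rejected logratio)).

    beta=0.1 is TRL's default; the value used for this run is not listed here.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log sigmoid(margin) == log(1 + exp(-margin))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A zero margin gives loss log(2) ~= 0.6931; a positive margin drives it toward 0.
print(round(dpo_loss(-10.0, -12.0, -11.0, -11.0), 4))  # 0.5981
```

Under this objective a validation loss well above log(2), as in the later rows of the table, corresponds to the negative margins reported alongside it.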

### Framework versions

- Transformers 4.38.2
- Pytorch 2.1.2+cu118
- Datasets 2.16.1
- Tokenizers 0.15.2