---
license: llama2
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: meta-llama/Llama-2-7b-chat-hf
model-index:
  - name: model_hh_usp4_400
    results: []
---

# model_hh_usp4_400

This model is a PEFT adapter fine-tuned from [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) with DPO (Direct Preference Optimization, via TRL) on an unknown dataset. It achieves the following results on the evaluation set (a loading sketch follows the list):

- Loss: 4.8447
- Rewards/chosen: -12.8728
- Rewards/rejected: -15.9738
- Rewards/accuracies: 0.6200
- Rewards/margins: 3.1010
- Logps/rejected: -130.8977
- Logps/chosen: -125.8617
- Logits/rejected: -0.9554
- Logits/chosen: -0.9016
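
Because the repository contains only adapter weights, inference requires loading the base model first and attaching the adapter. A minimal sketch, assuming the adapter is published at `guoyu-zhang/model_hh_usp4_400` (an inferred, unverified repo ID) and that you have access to the gated Llama-2 base weights:

```python
# Minimal inference sketch. The adapter repo ID is an assumption; substitute
# the actual Hub path or a local checkpoint directory.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "meta-llama/Llama-2-7b-chat-hf"
ADAPTER_ID = "guoyu-zhang/model_hh_usp4_400"  # assumed repo ID

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER_ID)  # attach the DPO-trained adapter
model.eval()

# Llama-2-chat expects the [INST] ... [/INST] prompt format.
prompt = "[INST] How do I make a good cup of coffee? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```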

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

- learning_rate: 0.0005
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
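
As a rough illustration, these numbers could map onto a TRL `DPOTrainer` run via `transformers.TrainingArguments` as below. Only the numeric values come from this card; the dataset, the DPO `beta`, and the PEFT/LoRA configuration are unknown and left as placeholders:

```python
# Hypothetical reconstruction of the training configuration; only the numbers
# below are taken from the card, everything else is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="model_hh_usp4_400",
    learning_rate=5e-4,              # 0.0005
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,   # effective train batch size: 4 * 4 = 16
    max_steps=1000,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,
    optim="adamw_torch",             # Adam(W), betas=(0.9, 0.999), eps=1e-8 (defaults)
)

# The card's trl/dpo tags suggest these args were consumed by trl's DPOTrainer,
# roughly as follows (the exact signature depends on the trl version, which the
# card does not record):
# trainer = DPOTrainer(model=model, ref_model=None, args=training_args,
#                      train_dataset=..., tokenizer=tokenizer)
# trainer.train()
```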

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.0115        | 4.0   | 100  | 2.1095          | -12.3873       | -13.8364         | 0.6200             | 1.4491          | -128.5228      | -125.3223    | -0.2640         | -0.2414       |
| 0.0271        | 8.0   | 200  | 3.0011          | -0.7751        | -1.2662          | 0.5200             | 0.4911          | -114.5559      | -112.4198    | -0.1794         | -0.1752       |
| 0.0029        | 12.0  | 300  | 5.8848          | -30.6815       | -32.8502         | 0.5300             | 2.1687          | -149.6492      | -145.6491    | -0.9305         | -0.8856       |
| 0.0           | 16.0  | 400  | 4.8706          | -12.8599       | -15.9629         | 0.6200             | 3.1030          | -130.8855      | -125.8473    | -0.9564         | -0.9029       |
| 0.0           | 20.0  | 500  | 4.8654          | -12.8852       | -15.9625         | 0.6200             | 3.0773          | -130.8851      | -125.8754    | -0.9556         | -0.9016       |
| 0.0           | 24.0  | 600  | 4.8663          | -12.8617       | -15.9688         | 0.6200             | 3.1071          | -130.8921      | -125.8493    | -0.9551         | -0.9013       |
| 0.0           | 28.0  | 700  | 4.8599          | -12.8977       | -15.9664         | 0.6200             | 3.0686          | -130.8894      | -125.8893    | -0.9553         | -0.9015       |
| 0.0           | 32.0  | 800  | 4.8410          | -12.8636       | -15.9914         | 0.6200             | 3.1277          | -130.9172      | -125.8515    | -0.9553         | -0.9014       |
| 0.0           | 36.0  | 900  | 4.8425          | -12.8856       | -15.9582         | 0.6200             | 3.0726          | -130.8803      | -125.8759    | -0.9551         | -0.9014       |
| 0.0           | 40.0  | 1000 | 4.8447          | -12.8728       | -15.9738         | 0.6200             | 3.1010          | -130.8977      | -125.8617    | -0.9554         | -0.9016       |

### Framework versions

- PEFT 0.10.0
- Transformers 4.39.3
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2
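
To reproduce this environment, the versions above can be pinned directly, e.g. in a `requirements.txt`. Note that `2.2.2+cu121` is a CUDA-specific PyTorch build, normally installed from the PyTorch wheel index rather than PyPI:

```text
peft==0.10.0
transformers==4.39.3
torch==2.2.2
datasets==2.18.0
tokenizers==0.15.2
```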