---
license: llama2
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: meta-llama/Llama-2-7b-chat-hf
model-index:
  - name: model_hh_usp4_400
    results: []
---

# model_hh_usp4_400

This model is a fine-tuned version of [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), trained with DPO on an unspecified dataset. It achieves the following results on the evaluation set:

- Loss: 4.4266
- Rewards/chosen: -7.2918
- Rewards/rejected: -9.3870
- Rewards/accuracies: 0.5500
- Rewards/margins: 2.0952
- Logps/rejected: -122.5611
- Logps/chosen: -121.2051
- Logits/rejected: -0.2787
- Logits/chosen: -0.2572
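
Because this is a PEFT adapter on top of Llama-2-7b-chat, it can be loaded with `peft`. Below is a minimal inference sketch; the repo id `guoyu-zhang/model_hh_usp4_400` is inferred from this card and may need adjusting, and the prompt is only an illustration of the Llama-2 chat format.

```python
# Minimal inference sketch for this adapter. The repo id is inferred from
# the card and is an assumption; adjust it to the actual adapter location.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "guoyu-zhang/model_hh_usp4_400"  # assumed repo id

# Loads the base model (meta-llama/Llama-2-7b-chat-hf) and applies the adapter.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

prompt = "[INST] How do I brew good coffee? [/INST]"  # Llama-2 chat format
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```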

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0005
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
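
For reference, these hyperparameters map onto a TRL `DPOTrainer` setup roughly as sketched below (TRL 0.8-era API). This is an illustrative reconstruction, not the original training script: the preference dataset, the LoRA settings, the DPO `beta`, and the sequence lengths are all assumptions not recorded in this card.

```python
# Illustrative reconstruction of the training setup from the hyperparameters
# above. Dataset, LoRA config, beta, and max lengths are assumptions.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token

# The actual preference data is not recorded in this card; a toy
# prompt/chosen/rejected dataset stands in so the sketch is self-contained.
train_dataset = Dataset.from_dict({
    "prompt": ["[INST] How do I stay focused while studying? [/INST]"],
    "chosen": [" Work in short sessions and remove distractions."],
    "rejected": [" No idea."],
})

args = TrainingArguments(
    output_dir="model_hh_usp4_400",
    learning_rate=5e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,  # effective train batch size 16
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32)  # assumed

trainer = DPOTrainer(
    model,
    ref_model=None,          # with a PEFT config, the frozen base model
                             # serves as the implicit reference model
    args=args,
    beta=0.1,                # assumed; not recorded in this card
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    max_length=512,          # assumed
    max_prompt_length=128,   # assumed
)
trainer.train()
```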

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.098         | 4.0   | 100  | 2.3307          | -3.2289        | -4.9359          | 0.5700             | 1.7070          | -117.6155      | -116.6908    | -0.5136         | -0.5011       |
| 0.2615        | 8.0   | 200  | 3.5637          | -3.5399        | -4.5546          | 0.5700             | 1.0147          | -117.1918      | -117.0363    | -0.4837         | -0.4844       |
| 0.0137        | 12.0  | 300  | 4.2146          | -3.4955        | -5.8321          | 0.5600             | 2.3366          | -118.6113      | -116.9870    | -0.3503         | -0.3327       |
| 0.0           | 16.0  | 400  | 4.4247          | -7.2840        | -9.3968          | 0.5500             | 2.1128          | -122.5721      | -121.1964    | -0.2788         | -0.2574       |
| 0.0           | 20.0  | 500  | 4.4045          | -7.2800        | -9.4193          | 0.5600             | 2.1393          | -122.5971      | -121.1920    | -0.2793         | -0.2578       |
| 0.0           | 24.0  | 600  | 4.4242          | -7.2774        | -9.3711          | 0.5600             | 2.0936          | -122.5435      | -121.1891    | -0.2789         | -0.2573       |
| 0.0           | 28.0  | 700  | 4.4048          | -7.2951        | -9.4062          | 0.5600             | 2.1110          | -122.5825      | -121.2088    | -0.2785         | -0.2570       |
| 0.0           | 32.0  | 800  | 4.4098          | -7.2804        | -9.3847          | 0.5500             | 2.1043          | -122.5586      | -121.1924    | -0.2783         | -0.2569       |
| 0.0           | 36.0  | 900  | 4.4251          | -7.2849        | -9.3768          | 0.5500             | 2.0918          | -122.5498      | -121.1974    | -0.2792         | -0.2575       |
| 0.0           | 40.0  | 1000 | 4.4266          | -7.2918        | -9.3870          | 0.5500             | 2.0952          | -122.5611      | -121.2051    | -0.2787         | -0.2572       |

### Framework versions

- PEFT 0.10.0
- Transformers 4.39.3
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2