
eurus-dpop-qlora-uf-ours-5e-6

This model is a fine-tuned version of openbmb/Eurus-7b-sft on the generation/UF dataset. It achieves the following results on the evaluation set:

  • Loss: 5.2156
  • Positive Losses: 44.7294
  • Dpo Losses: 0.6420
  • Rewards/chosen: -0.4379
  • Rewards/rejected: -0.6399
  • Rewards/accuracies: 0.6280
  • Rewards/margins: 0.2020
  • Rewards/margins Max: 1.1678
  • Rewards/margins Min: -0.5905
  • Rewards/margins Std: 0.5855
  • Logps/rejected: -321.5092
  • Logps/chosen: -318.6669
  • Logits/rejected: -2.0623
  • Logits/chosen: -2.1787
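
The adapter's name and the separately reported "Positive Losses" column are consistent with a DPO-Positive (DPOP) style objective: a standard DPO term plus a penalty that discourages the policy's log-probability of the chosen response from dropping below the reference model's. The sketch below shows how metrics of this kind are commonly computed; the function, the additive combination, and the `beta`/`lam` values are assumptions for illustration, not code from this repository.

```python
import torch
import torch.nn.functional as F

def dpop_metrics(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps,
                 beta=0.1, lam=0.1):
    """Illustrative DPO-Positive loss; beta and lam are assumed values."""
    # Implicit DPO rewards: beta * log(pi(y|x) / pi_ref(y|x)).
    # These correspond to "Rewards/chosen" and "Rewards/rejected" above.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    margins = chosen_rewards - rejected_rewards    # "Rewards/margins"
    accuracies = (margins > 0).float().mean()      # "Rewards/accuracies"

    # Standard DPO term ("Dpo Losses"): negative log-sigmoid of the margin.
    dpo_losses = -F.logsigmoid(margins)

    # DPOP penalty ("Positive Losses"): positive whenever the policy assigns
    # the chosen response *less* probability than the reference model does.
    positive_losses = torch.clamp(ref_chosen_logps - policy_chosen_logps, min=0)

    # Assumed additive combination into the reported overall "Loss".
    loss = (dpo_losses + lam * positive_losses).mean()
    return loss, dpo_losses.mean(), positive_losses.mean(), accuracies
```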

Model description

More information needed

Intended uses & limitations

More information needed
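
Until the card is filled in, one concrete use is loading the adapter for inference. Because this repository contains a QLoRA adapter rather than full model weights, it must be applied on top of the openbmb/Eurus-7b-sft base model. A minimal sketch with peft follows; the bfloat16 dtype and the Mistral-style `[INST]` prompt template are assumptions based on the base model, so check the base model card before relying on them.

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "just1nseo/eurus-dpop-qlora-uf-ours-5e-6"

# AutoPeftModelForCausalLM reads the adapter config and pulls in the
# openbmb/Eurus-7b-sft base weights automatically.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("openbmb/Eurus-7b-sft")

# Assumed prompt format, following the base model's instruction template.
prompt = "[INST] Explain QLoRA in one paragraph. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```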

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
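
In Hugging Face Trainer terms, the list above maps onto a TrainingArguments configuration roughly as follows. This is a reconstruction, not the repository's actual training script; `output_dir` and the `bf16` flag are illustrative assumptions.

```python
from transformers import TrainingArguments

# Per-device batch of 4 on 2 GPUs with 2 gradient-accumulation steps
# gives the effective train batch size of 4 * 2 * 2 = 16 listed above.
training_args = TrainingArguments(
    output_dir="eurus-dpop-qlora-uf-ours-5e-6",  # assumed
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,  # assumption: common for QLoRA fine-tuning
)
```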

Training results

| Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6736 | 0.28 | 100 | 1.3303 | 6.3472 | 0.6796 | -0.0381 | -0.0727 | 0.6030 | 0.0346 | 0.3012 | -0.1958 | 0.1617 | -264.7934 | -278.6886 | -2.1474 | -2.2655 |
| 0.5967 | 0.56 | 200 | 1.9249 | 12.1132 | 0.6721 | -0.0924 | -0.1544 | 0.5930 | 0.0619 | 0.4845 | -0.3189 | 0.2624 | -272.9586 | -284.1257 | -2.2051 | -2.3263 |
| 0.5403 | 0.85 | 300 | 2.2645 | 15.4958 | 0.6655 | -0.1316 | -0.2109 | 0.6030 | 0.0792 | 0.5268 | -0.3293 | 0.2845 | -278.6066 | -288.0423 | -2.1931 | -2.3125 |
| 0.5489 | 1.13 | 400 | 2.7577 | 20.2944 | 0.6603 | -0.1822 | -0.2848 | 0.6170 | 0.1026 | 0.6736 | -0.3927 | 0.3533 | -285.9984 | -293.0988 | -2.1500 | -2.2685 |
| 0.4521 | 1.41 | 500 | 3.3498 | 26.1254 | 0.6549 | -0.2464 | -0.3696 | 0.6080 | 0.1232 | 0.7653 | -0.4233 | 0.3948 | -294.4765 | -299.5168 | -2.1093 | -2.2289 |
| 0.4973 | 1.69 | 600 | 3.2114 | 24.9181 | 0.6525 | -0.2330 | -0.3588 | 0.6280 | 0.1258 | 0.7463 | -0.4100 | 0.3853 | -293.4038 | -298.1804 | -2.0925 | -2.2110 |
| 0.4859 | 1.97 | 700 | 3.9841 | 32.5303 | 0.6484 | -0.3118 | -0.4659 | 0.6230 | 0.1542 | 0.9148 | -0.4919 | 0.4674 | -304.1142 | -306.0565 | -2.0901 | -2.2081 |
| 0.3213 | 2.25 | 800 | 5.6914 | 49.4901 | 0.6455 | -0.4866 | -0.6893 | 0.6210 | 0.2027 | 1.2066 | -0.6341 | 0.6132 | -326.4517 | -323.5386 | -2.0652 | -2.1817 |
| 0.4163 | 2.54 | 900 | 5.0729 | 43.3077 | 0.6426 | -0.4232 | -0.6206 | 0.6270 | 0.1975 | 1.1450 | -0.5818 | 0.5750 | -319.5832 | -317.1975 | -2.0654 | -2.1825 |
| 0.3992 | 2.82 | 1000 | 5.1952 | 44.5160 | 0.6420 | -0.4357 | -0.6373 | 0.6300 | 0.2016 | 1.1648 | -0.5900 | 0.5841 | -321.2483 | -318.4470 | -2.0618 | -2.1784 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2