IE_L3_1000steps_1e5rate_03beta_SFT

This model is a fine-tuned version of tsavage68/IE_L3_1000steps_1e6rate_SFT on an unknown dataset. At the final training step (step 1000), it achieves the following results on the evaluation set:

  • Loss: 0.1802
  • Rewards/chosen: -1.8241
  • Rewards/rejected: -17.1487
  • Rewards/accuracies: 0.7400
  • Rewards/margins: 15.3246
  • Logps/rejected: -132.7896
  • Logps/chosen: -88.8782
  • Logits/rejected: -0.8401
  • Logits/chosen: -0.7195
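
The reported reward margin is the chosen reward minus the rejected reward, which can be checked directly from the figures above. The second check in this sketch is more speculative. The metric names resemble those logged by preference-optimization (DPO-style) trainers, and under that assumption a sigmoid preference loss contributes about ln 2 for a pair with zero margin and about 0 for a pair with a large positive margin. The source does not state which loss was used, so this is a consistency check and not an explanation of how the model was trained.

```python
import math

# Final evaluation metrics copied from the list above.
rewards_chosen = -1.8241
rewards_rejected = -17.1487
reported_margin = 15.3246
reported_loss = 0.1802
reported_accuracy = 0.7400

# Margin = chosen reward minus rejected reward.
margin = rewards_chosen - rewards_rejected
print(f"margin: {margin:.4f}")

# ASSUMPTION (not stated in the source): a sigmoid preference loss,
# -log(sigmoid(margin)), where pairs counted as "accurate" have a margin
# large enough to contribute ~0 loss and the remaining pairs have a
# margin of ~0, contributing ln(2) each.
non_accurate_fraction = 1.0 - reported_accuracy
loss_under_assumption = non_accurate_fraction * math.log(2)
print(f"loss under assumption: {loss_under_assumption:.4f}")
```

Both printed values agree with the reported figures to four decimal places. The match on the loss is suggestive only; other explanations are possible.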

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
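
As a rough illustration of how these values fit together, the sketch below recomputes the total train batch size and evaluates a cosine learning-rate schedule with warmup at a few steps. It assumes a single training device and a linear warmup followed by a half-cosine decay to zero, which is the usual shape of such a schedule. The exact scheduler implementation is not given in the source, and the function name `lr_at` is hypothetical.

```python
import math

# Hyperparameters copied from the list above.
learning_rate = 1e-05
train_batch_size = 2
gradient_accumulation_steps = 2
warmup_steps = 100
training_steps = 1000

# ASSUMPTION: one training device (the source does not say).
num_devices = 1

# Effective batch size per optimizer step.
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)
print("total_train_batch_size:", total_train_batch_size)


def lr_at(step: int) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to 0."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, training_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the rate climbs linearly to 1e-05 at step 100, falls to half of that at step 550, and reaches roughly zero at step 1000.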

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.1906 | 0.4 | 50 | 0.1802 | -1.7929 | -17.0298 | 0.7400 | 15.2369 | -132.3931 | -88.7740 | -0.8398 | -0.7194 |
| 0.1386 | 0.8 | 100 | 0.1802 | -1.7771 | -17.0460 | 0.7400 | 15.2689 | -132.4471 | -88.7214 | -0.8398 | -0.7192 |
| 0.1386 | 1.2 | 150 | 0.1802 | -1.7982 | -17.0858 | 0.7400 | 15.2876 | -132.5800 | -88.7917 | -0.8401 | -0.7193 |
| 0.1733 | 1.6 | 200 | 0.1802 | -1.7978 | -17.0381 | 0.7400 | 15.2403 | -132.4209 | -88.7903 | -0.8396 | -0.7190 |
| 0.2253 | 2.0 | 250 | 0.1802 | -1.7877 | -17.0275 | 0.7400 | 15.2398 | -132.3854 | -88.7567 | -0.8395 | -0.7189 |
| 0.1386 | 2.4 | 300 | 0.1802 | -1.8012 | -17.0499 | 0.7400 | 15.2487 | -132.4602 | -88.8018 | -0.8399 | -0.7195 |
| 0.1213 | 2.8 | 350 | 0.1802 | -1.7983 | -17.0687 | 0.7400 | 15.2705 | -132.5230 | -88.7921 | -0.8395 | -0.7189 |
| 0.1906 | 3.2 | 400 | 0.1802 | -1.7995 | -17.0794 | 0.7400 | 15.2799 | -132.5586 | -88.7960 | -0.8403 | -0.7193 |
| 0.1906 | 3.6 | 450 | 0.1802 | -1.8034 | -17.0941 | 0.7400 | 15.2908 | -132.6077 | -88.8090 | -0.8399 | -0.7193 |
| 0.2079 | 4.0 | 500 | 0.1802 | -1.8158 | -17.1281 | 0.7400 | 15.3123 | -132.7209 | -88.8505 | -0.8397 | -0.7185 |
| 0.156 | 4.4 | 550 | 0.1802 | -1.8012 | -17.1383 | 0.7400 | 15.3371 | -132.7549 | -88.8016 | -0.8406 | -0.7196 |
| 0.1213 | 4.8 | 600 | 0.1802 | -1.7944 | -17.0830 | 0.7400 | 15.2886 | -132.5706 | -88.7792 | -0.8403 | -0.7195 |
| 0.1906 | 5.2 | 650 | 0.1802 | -1.7935 | -17.1490 | 0.7400 | 15.3555 | -132.7905 | -88.7761 | -0.8407 | -0.7197 |
| 0.2426 | 5.6 | 700 | 0.1802 | -1.7991 | -17.1635 | 0.7400 | 15.3644 | -132.8388 | -88.7946 | -0.8399 | -0.7188 |
| 0.2599 | 6.0 | 750 | 0.1802 | -1.7918 | -17.1508 | 0.7400 | 15.3590 | -132.7967 | -88.7704 | -0.8392 | -0.7182 |
| 0.1213 | 6.4 | 800 | 0.1802 | -1.8045 | -17.1834 | 0.7400 | 15.3789 | -132.9053 | -88.8128 | -0.8395 | -0.7183 |
| 0.2426 | 6.8 | 850 | 0.1802 | -1.8050 | -17.1755 | 0.7400 | 15.3706 | -132.8791 | -88.8143 | -0.8416 | -0.7202 |
| 0.1733 | 7.2 | 900 | 0.1802 | -1.7886 | -17.1414 | 0.7400 | 15.3528 | -132.7653 | -88.7597 | -0.8403 | -0.7193 |
| 0.1386 | 7.6 | 950 | 0.1802 | -1.8171 | -17.1472 | 0.7400 | 15.3300 | -132.7844 | -88.8548 | -0.8401 | -0.7195 |
| 0.156 | 8.0 | 1000 | 0.1802 | -1.8241 | -17.1487 | 0.7400 | 15.3246 | -132.7896 | -88.8782 | -0.8401 | -0.7195 |
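
The Epoch and Step columns also give a rough idea of the training set size, which the card does not state. The sketch below derives the number of optimizer steps per epoch from the first row and multiplies by the total train batch size of 4. It assumes a single device and no partial batches, so the result is an estimate only, and the Epoch values are rounded to one decimal place in the source.

```python
# (epoch, step) pairs copied from the first and last rows of the results.
first_epoch, first_step = 0.4, 50
last_epoch, last_step = 8.0, 1000

# total_train_batch_size as listed in the training hyperparameters.
total_train_batch_size = 4

# Optimizer steps needed to complete one epoch.
steps_per_epoch = round(first_step / first_epoch)
print("steps per epoch:", steps_per_epoch)

# ASSUMPTION: one device and no dropped or partial batches, so each
# optimizer step consumes exactly total_train_batch_size examples.
approx_examples_per_epoch = steps_per_epoch * total_train_batch_size
print("approx. training examples:", approx_examples_per_epoch)

# Cross-check against the last row: 1000 steps should be about 8 epochs.
print("epochs at last step:", round(last_step / steps_per_epoch, 1))
```

Under these assumptions the training set holds roughly 500 examples, and the 1000 training steps correspond to about 8 passes over it.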

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.0.0+cu117
  • Datasets 3.0.0
  • Tokenizers 0.19.1

Safetensors

  • Model size: 8.03B params
  • Tensor type: FP16