---
license: llama3
base_model: tsavage68/UTI_L3_1000steps_1e5rate_SFT
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: UTI2_L3_1000steps_1e6rate_01beta_CSFTDPO
    results: []
---

UTI2_L3_1000steps_1e6rate_01beta_CSFTDPO

This model is a fine-tuned version of tsavage68/UTI_L3_1000steps_1e5rate_SFT, trained with DPO on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2741
  • Rewards/chosen: -0.0170
  • Rewards/rejected: -6.7809
  • Rewards/accuracies: 0.6400
  • Rewards/margins: 6.7639
  • Logps/rejected: -96.2941
  • Logps/chosen: -19.2736
  • Logits/rejected: -1.2664
  • Logits/chosen: -1.2475
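
For context, the Rewards/* values above are the implicit DPO rewards that TRL logs: β times the log-probability ratio between the trained policy and the frozen SFT reference. β = 0.1 is inferred here from the "01beta" suffix in the model name rather than stated in the card. A sketch of how these metrics relate:

```latex
% Implicit DPO reward for a completion y given prompt x, measured against the
% frozen SFT reference model (beta = 0.1 assumed from the "01beta" name suffix):
\[
  r(x, y) \;=\; \beta \left[ \log \pi_\theta(y \mid x) \;-\; \log \pi_{\mathrm{ref}}(y \mid x) \right]
\]
% Rewards/margins is r(x, y_chosen) - r(x, y_rejected); the DPO objective
% minimizes the negative log-sigmoid of that margin:
\[
  \mathcal{L}_{\mathrm{DPO}} \;=\; -\log \sigma\!\left( r(x, y_{\mathrm{chosen}}) - r(x, y_{\mathrm{rejected}}) \right)
\]
```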

Model description

More information needed

Intended uses & limitations

More information needed
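
Usage is not documented, so the following is only a minimal, untested inference sketch with 🤗 Transformers. The example prompt is hypothetical, and it assumes the checkpoint loads with `AutoModelForCausalLM` and inherits a Llama 3 chat template from the SFT base.

```python
# Minimal inference sketch. Assumptions: the checkpoint ships a Llama 3 chat
# template inherited from the SFT base; the prompt below is purely illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/UTI2_L3_1000steps_1e6rate_01beta_CSFTDPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # reasonable default for a Llama-3-class model
    device_map="auto",
)

# Hypothetical prompt; replace with the format the model was actually trained on.
messages = [{"role": "user", "content": "A patient reports dysuria and urinary frequency. What is the likely diagnosis?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```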

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
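
The training script itself is not included, but the hyperparameters above map directly onto TRL's DPOTrainer. Below is a rough, untested reconstruction under that assumption: the dataset file and its preference columns are placeholders, and β = 0.1 is again inferred from the model name rather than stated in the card.

```python
# Sketch of the training setup with TRL's DPOTrainer (not the author's actual script).
# Dataset path and column names are placeholders; beta=0.1 is assumed from the model name.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/UTI_L3_1000steps_1e5rate_SFT"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)       # policy to be optimized
ref_model = AutoModelForCausalLM.from_pretrained(base)   # frozen reference

# Hypothetical preference dataset with "prompt", "chosen", "rejected" columns.
dataset = load_dataset("json", data_files="uti_preferences.json")["train"]

config = DPOConfig(
    output_dir="UTI2_L3_1000steps_1e6rate_01beta_CSFTDPO",
    beta=0.1,                       # assumed from the "01beta" name suffix
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # effective train batch size of 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```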

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 0.67 | 0.3333 | 25 | 0.6075 | 0.1072 | -0.0786 | 0.6400 | 0.1858 | -29.2710 | -18.0315 | -1.1541 | -1.1497 |
| 0.3388 | 0.6667 | 50 | 0.3079 | 0.3701 | -1.1689 | 0.6500 | 1.5390 | -40.1739 | -15.4027 | -1.1704 | -1.1602 |
| 0.1782 | 1.0 | 75 | 0.2489 | 0.3405 | -3.3088 | 0.6500 | 3.6493 | -61.5725 | -15.6982 | -1.2173 | -1.2009 |
| 0.1047 | 1.3333 | 100 | 0.2514 | 0.3299 | -4.1473 | 0.6500 | 4.4772 | -69.9577 | -15.8048 | -1.2277 | -1.2096 |
| 0.1909 | 1.6667 | 125 | 0.2649 | 0.2370 | -4.5013 | 0.6400 | 4.7383 | -73.4979 | -16.7332 | -1.2311 | -1.2144 |
| 0.364 | 2.0 | 150 | 0.2617 | 0.2324 | -4.8873 | 0.6400 | 5.1197 | -77.3577 | -16.7794 | -1.2337 | -1.2169 |
| 0.26 | 2.3333 | 175 | 0.2628 | 0.1974 | -5.1469 | 0.6400 | 5.3443 | -79.9539 | -17.1290 | -1.2363 | -1.2194 |
| 0.2253 | 2.6667 | 200 | 0.2643 | 0.1698 | -5.3745 | 0.6400 | 5.5443 | -82.2301 | -17.4054 | -1.2386 | -1.2217 |
| 0.208 | 3.0 | 225 | 0.2660 | 0.1513 | -5.5214 | 0.6400 | 5.6727 | -83.6984 | -17.5904 | -1.2407 | -1.2238 |
| 0.2253 | 3.3333 | 250 | 0.2667 | 0.1290 | -5.6833 | 0.6400 | 5.8124 | -85.3180 | -17.8128 | -1.2430 | -1.2261 |
| 0.1733 | 3.6667 | 275 | 0.2681 | 0.1116 | -5.8186 | 0.6400 | 5.9301 | -86.6704 | -17.9877 | -1.2452 | -1.2281 |
| 0.2773 | 4.0 | 300 | 0.2686 | 0.1005 | -5.9317 | 0.6400 | 6.0322 | -87.8013 | -18.0979 | -1.2472 | -1.2299 |
| 0.2426 | 4.3333 | 325 | 0.2690 | 0.0844 | -6.0431 | 0.6400 | 6.1276 | -88.9161 | -18.2589 | -1.2493 | -1.2319 |
| 0.156 | 4.6667 | 350 | 0.2692 | 0.0741 | -6.1302 | 0.6400 | 6.2043 | -89.7871 | -18.3627 | -1.2509 | -1.2333 |
| 0.2253 | 5.0 | 375 | 0.2715 | 0.0625 | -6.2127 | 0.6400 | 6.2752 | -90.6117 | -18.4779 | -1.2530 | -1.2353 |
| 0.2253 | 5.3333 | 400 | 0.2713 | 0.0535 | -6.2910 | 0.6400 | 6.3446 | -91.3949 | -18.5679 | -1.2545 | -1.2367 |
| 0.2253 | 5.6667 | 425 | 0.2724 | 0.0411 | -6.3668 | 0.6400 | 6.4079 | -92.1528 | -18.6919 | -1.2563 | -1.2383 |
| 0.208 | 6.0 | 450 | 0.2729 | 0.0353 | -6.4187 | 0.6400 | 6.4541 | -92.6719 | -18.7501 | -1.2573 | -1.2392 |
| 0.2773 | 6.3333 | 475 | 0.2736 | 0.0283 | -6.4704 | 0.6400 | 6.4987 | -93.1886 | -18.8205 | -1.2582 | -1.2400 |
| 0.3119 | 6.6667 | 500 | 0.2725 | 0.0224 | -6.5105 | 0.6400 | 6.5329 | -93.5893 | -18.8791 | -1.2592 | -1.2409 |
| 0.208 | 7.0 | 525 | 0.2719 | 0.0140 | -6.5739 | 0.6400 | 6.5880 | -94.2240 | -18.9630 | -1.2606 | -1.2422 |
| 0.1733 | 7.3333 | 550 | 0.2740 | 0.0094 | -6.6118 | 0.6400 | 6.6212 | -94.6024 | -19.0092 | -1.2618 | -1.2433 |
| 0.2599 | 7.6667 | 575 | 0.2728 | 0.0021 | -6.6411 | 0.6400 | 6.6432 | -94.8961 | -19.0825 | -1.2622 | -1.2436 |
| 0.2599 | 8.0 | 600 | 0.2736 | -0.0003 | -6.6671 | 0.6400 | 6.6668 | -95.1557 | -19.1060 | -1.2631 | -1.2444 |
| 0.2253 | 8.3333 | 625 | 0.2728 | -0.0010 | -6.6895 | 0.6400 | 6.6884 | -95.3796 | -19.1137 | -1.2634 | -1.2447 |
| 0.104 | 8.6667 | 650 | 0.2735 | -0.0019 | -6.7075 | 0.6400 | 6.7056 | -95.5598 | -19.1222 | -1.2641 | -1.2453 |
| 0.2253 | 9.0 | 675 | 0.2726 | -0.0051 | -6.7243 | 0.6400 | 6.7192 | -95.7281 | -19.1544 | -1.2648 | -1.2460 |
| 0.2253 | 9.3333 | 700 | 0.2736 | -0.0097 | -6.7446 | 0.6400 | 6.7348 | -95.9304 | -19.2006 | -1.2653 | -1.2465 |
| 0.2253 | 9.6667 | 725 | 0.2740 | -0.0130 | -6.7590 | 0.6400 | 6.7460 | -96.0751 | -19.2334 | -1.2655 | -1.2466 |
| 0.3119 | 10.0 | 750 | 0.2742 | -0.0140 | -6.7661 | 0.6400 | 6.7520 | -96.1452 | -19.2434 | -1.2656 | -1.2466 |
| 0.208 | 10.3333 | 775 | 0.2741 | -0.0154 | -6.7688 | 0.6400 | 6.7534 | -96.1727 | -19.2569 | -1.2660 | -1.2470 |
| 0.2253 | 10.6667 | 800 | 0.2728 | -0.0133 | -6.7751 | 0.6400 | 6.7618 | -96.2353 | -19.2360 | -1.2661 | -1.2471 |
| 0.2426 | 11.0 | 825 | 0.2734 | -0.0133 | -6.7787 | 0.6400 | 6.7654 | -96.2719 | -19.2365 | -1.2662 | -1.2473 |
| 0.2946 | 11.3333 | 850 | 0.2743 | -0.0138 | -6.7737 | 0.6400 | 6.7599 | -96.2217 | -19.2417 | -1.2663 | -1.2474 |
| 0.1733 | 11.6667 | 875 | 0.2739 | -0.0147 | -6.7807 | 0.6400 | 6.7660 | -96.2913 | -19.2500 | -1.2662 | -1.2472 |
| 0.156 | 12.0 | 900 | 0.2751 | -0.0158 | -6.7820 | 0.6400 | 6.7661 | -96.3044 | -19.2615 | -1.2664 | -1.2475 |
| 0.1906 | 12.3333 | 925 | 0.2747 | -0.0152 | -6.7835 | 0.6400 | 6.7682 | -96.3194 | -19.2557 | -1.2663 | -1.2474 |
| 0.2426 | 12.6667 | 950 | 0.2741 | -0.0190 | -6.7817 | 0.6400 | 6.7627 | -96.3018 | -19.2932 | -1.2665 | -1.2475 |
| 0.208 | 13.0 | 975 | 0.2741 | -0.0170 | -6.7809 | 0.6400 | 6.7639 | -96.2941 | -19.2736 | -1.2664 | -1.2475 |
| 0.3119 | 13.3333 | 1000 | 0.2741 | -0.0170 | -6.7809 | 0.6400 | 6.7639 | -96.2941 | -19.2736 | -1.2664 | -1.2475 |

Framework versions

  • Transformers 4.41.2
  • PyTorch 2.0.0+cu117
  • Datasets 2.19.2
  • Tokenizers 0.19.1