PE-12b-pythia

This model is a fine-tuned version of EleutherAI/pythia-12b-deduped on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1421
  • Rewards/chosen: 3.5045
  • Rewards/rejected: -2.3171
  • Rewards/accuracies: 0.9441
  • Rewards/margins: 5.8216
  • Logps/rejected: -95.5639
  • Logps/chosen: -116.1507
  • Logits/rejected: -0.4604
  • Logits/chosen: -0.4355

Model description

More information needed

Intended uses & limitations

More information needed
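
While fuller usage guidance is pending, the snippet below is a minimal loading sketch. The repository id is taken from the card title and is an assumption; replace it with the actual Hub path. Loading in bfloat16 matches the BF16 tensor type noted at the end of this card, and `device_map="auto"` requires the `accelerate` package.

```python
# Minimal loading sketch; the repo id below is assumed from the card title.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PE-12b-pythia"  # hypothetical path; replace with the real repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 tensor type listed on this card
    device_map="auto",           # needs `accelerate`; spreads the ~12B params across devices
)

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```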

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 3e-07
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
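
The reward and log-probability metrics reported in this card match the logging keys of a DPO-style preference trainer, though the card does not name the training framework or objective. Under that caveat, here is a non-authoritative sketch of how the hyperparameters above would map onto Hugging Face `TrainingArguments`; values not listed in the card, such as the output path and the bf16 flag, are assumptions.

```python
# Sketch of a TrainingArguments configuration mirroring the listed values.
# The Adam betas (0.9, 0.999) and epsilon 1e-08 are the optimizer defaults.
# Multi-GPU over 8 devices is handled at launch time (e.g. torchrun or accelerate),
# giving a total train batch of 1 per device x 8 devices x 8 accumulation steps = 64.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="pe-12b-pythia",   # hypothetical output path
    learning_rate=3e-7,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                    # assumption; training precision is not stated here
)
```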

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Logps/rejected Logps/chosen Logits/rejected Logits/chosen
0.8825 0.05 100 0.8872 0.1884 0.1204 0.5056 0.0680 -90.6889 -122.7830 -0.5017 -0.4522
0.9136 0.09 200 0.8325 0.3253 0.0714 0.5894 0.2540 -90.7870 -122.5091 -0.4960 -0.4447
0.7507 0.14 300 0.7816 0.5741 0.2797 0.5670 0.2944 -90.3703 -122.0116 -0.4909 -0.4426
0.6142 0.18 400 0.6435 1.0753 0.4404 0.6369 0.6348 -90.0489 -121.0092 -0.4793 -0.4322
0.519 0.23 500 0.5196 1.7213 0.5624 0.7430 1.1590 -89.8050 -119.7171 -0.4559 -0.4084
0.4858 0.27 600 0.4351 2.2085 0.5923 0.7877 1.6162 -89.7450 -118.7428 -0.4592 -0.4138
0.4048 0.32 700 0.3878 2.6105 0.5736 0.8324 2.0369 -89.7825 -117.9388 -0.4398 -0.3953
0.3623 0.37 800 0.3383 2.7055 0.4610 0.8520 2.2446 -90.0078 -117.7487 -0.4492 -0.4046
0.308 0.41 900 0.3145 2.9742 0.3506 0.8520 2.6236 -90.2285 -117.2114 -0.4381 -0.3971
0.3092 0.46 1000 0.3125 3.1541 0.2687 0.8352 2.8854 -90.3922 -116.8515 -0.4276 -0.3926
0.2765 0.5 1100 0.2939 3.1208 0.1475 0.8603 2.9733 -90.6347 -116.9181 -0.4615 -0.4216
0.3058 0.55 1200 0.2772 2.9861 -0.1371 0.8771 3.1232 -91.2038 -117.1875 -0.4249 -0.3887
0.2702 0.59 1300 0.2592 3.3217 -0.0639 0.8715 3.3856 -91.0574 -116.5163 -0.4497 -0.4113
0.2316 0.64 1400 0.2491 3.3560 -0.2934 0.8855 3.6494 -91.5165 -116.4477 -0.4234 -0.3869
0.2344 0.68 1500 0.2506 3.2223 -0.2242 0.8687 3.4464 -91.3780 -116.7152 -0.4515 -0.4151
0.2332 0.73 1600 0.2350 3.2137 -0.4070 0.8855 3.6207 -91.7436 -116.7324 -0.4299 -0.3936
0.2258 0.78 1700 0.2477 3.0894 -0.5590 0.8939 3.6484 -92.0476 -116.9809 -0.4316 -0.3960
0.2526 0.82 1800 0.2277 3.2845 -0.5527 0.8771 3.8373 -92.0351 -116.5907 -0.4420 -0.4076
0.2025 0.87 1900 0.2182 3.2061 -0.8100 0.9022 4.0160 -92.5496 -116.7476 -0.4319 -0.3974
0.2253 0.91 2000 0.2149 3.2765 -0.9756 0.9078 4.2521 -92.8809 -116.6067 -0.4391 -0.4023
0.2084 0.96 2100 0.2223 3.1160 -1.0659 0.8939 4.1820 -93.0615 -116.9277 -0.4283 -0.3954
0.1896 1.0 2200 0.2100 3.1835 -1.0131 0.8911 4.1966 -92.9559 -116.7927 -0.4517 -0.4154
0.2294 1.05 2300 0.2070 3.1205 -1.0873 0.8939 4.2078 -93.1043 -116.9187 -0.4412 -0.4051
0.1897 1.1 2400 0.2011 3.1553 -1.0875 0.9050 4.2428 -93.1047 -116.8492 -0.4483 -0.4136
0.1943 1.14 2500 0.1953 3.3317 -1.2261 0.9022 4.5578 -93.3819 -116.4964 -0.4488 -0.4137
0.1749 1.19 2600 0.1975 3.2186 -1.3232 0.8911 4.5419 -93.5761 -116.7225 -0.4500 -0.4160
0.1881 1.23 2700 0.1838 3.3207 -1.3323 0.9274 4.6530 -93.5944 -116.5184 -0.4262 -0.3962
0.1611 1.28 2800 0.1833 3.2881 -1.3588 0.9106 4.6469 -93.6472 -116.5835 -0.4404 -0.4091
0.1653 1.32 2900 0.1959 3.2545 -1.6143 0.9190 4.8688 -94.1584 -116.6508 -0.4252 -0.3996
0.1613 1.37 3000 0.1779 3.3926 -1.5190 0.9218 4.9117 -93.9678 -116.3744 -0.4374 -0.4071
0.1785 1.42 3100 0.1840 3.4053 -1.6286 0.9246 5.0339 -94.1868 -116.3491 -0.4280 -0.3987
0.1544 1.46 3200 0.1686 3.5029 -1.6389 0.9218 5.1418 -94.2075 -116.1539 -0.4624 -0.4309
0.1492 1.51 3300 0.1706 3.2854 -1.8094 0.9330 5.0948 -94.5485 -116.5889 -0.4148 -0.3943
0.1719 1.55 3400 0.1691 3.5148 -1.7457 0.9274 5.2605 -94.4210 -116.1301 -0.4542 -0.4253
0.1905 1.6 3500 0.1719 3.4941 -1.7454 0.9246 5.2395 -94.4204 -116.1715 -0.4479 -0.4189
0.1354 1.64 3600 0.1749 3.5351 -1.7024 0.9106 5.2375 -94.3345 -116.0895 -0.4608 -0.4303
0.1644 1.69 3700 0.1597 3.5736 -1.6580 0.9246 5.2316 -94.2457 -116.0126 -0.4469 -0.4192
0.1598 1.73 3800 0.1613 3.6646 -1.7035 0.9078 5.3681 -94.3367 -115.8306 -0.4631 -0.4349
0.1337 1.78 3900 0.1583 3.5502 -1.8444 0.9134 5.3946 -94.6184 -116.0593 -0.4658 -0.4368
0.1534 1.83 4000 0.1572 3.5076 -1.9137 0.9190 5.4213 -94.7571 -116.1446 -0.4610 -0.4328
0.1327 1.87 4100 0.1607 3.5711 -1.9143 0.9218 5.4854 -94.7583 -116.0175 -0.4404 -0.4153
0.162 1.92 4200 0.1565 3.4852 -2.0136 0.9330 5.4988 -94.9568 -116.1893 -0.4641 -0.4373
0.1471 1.96 4300 0.1524 3.5639 -1.9766 0.9246 5.5406 -94.8830 -116.0319 -0.4627 -0.4338
0.1333 2.01 4400 0.1418 3.6173 -1.9710 0.9162 5.5883 -94.8717 -115.9251 -0.4608 -0.4328
0.13 2.05 4500 0.1485 3.6275 -1.9865 0.9358 5.6140 -94.9027 -115.9047 -0.4604 -0.4319
0.1311 2.1 4600 0.1503 3.4735 -2.1194 0.9134 5.5928 -95.1684 -116.2128 -0.4405 -0.4123
0.1329 2.15 4700 0.1431 3.5793 -2.1059 0.9218 5.6852 -95.1415 -116.0012 -0.4519 -0.4229
0.1346 2.19 4800 0.1494 3.6059 -2.0642 0.9274 5.6701 -95.0581 -115.9479 -0.4639 -0.4332
0.1462 2.24 4900 0.1455 3.4721 -2.1648 0.9218 5.6369 -95.2593 -116.2156 -0.4553 -0.4258
0.1221 2.28 5000 0.1538 3.6293 -2.1472 0.9385 5.7764 -95.2240 -115.9012 -0.4525 -0.4268
0.1329 2.33 5100 0.1486 3.4734 -2.1778 0.9358 5.6512 -95.2853 -116.2130 -0.4578 -0.4301
0.1284 2.37 5200 0.1527 3.4805 -2.1670 0.9078 5.6474 -95.2636 -116.1988 -0.4611 -0.4329
0.1238 2.42 5300 0.1433 3.4570 -2.1768 0.9274 5.6338 -95.2832 -116.2457 -0.4451 -0.4191
0.1317 2.46 5400 0.1421 3.5647 -2.2232 0.9330 5.7880 -95.3761 -116.0303 -0.4565 -0.4342
0.131 2.51 5500 0.1478 3.4211 -2.2681 0.9190 5.6892 -95.4659 -116.3175 -0.4444 -0.4147
0.1235 2.56 5600 0.1428 3.5292 -2.2798 0.9413 5.8089 -95.4892 -116.1014 -0.4485 -0.4234
0.1122 2.6 5700 0.1445 3.6102 -2.2363 0.9330 5.8465 -95.4023 -115.9393 -0.4473 -0.4233
0.1172 2.65 5800 0.1415 3.5813 -2.1899 0.9246 5.7712 -95.3095 -115.9972 -0.4648 -0.4357
0.1257 2.69 5900 0.1428 3.4075 -2.3047 0.9218 5.7122 -95.5390 -116.3447 -0.4553 -0.4269
0.1441 2.74 6000 0.1426 3.4287 -2.3210 0.9190 5.7497 -95.5717 -116.3024 -0.4673 -0.4401
0.1359 2.78 6100 0.1479 3.4833 -2.2993 0.9358 5.7826 -95.5282 -116.1931 -0.4409 -0.4173
0.1332 2.83 6200 0.1442 3.4741 -2.2726 0.9330 5.7466 -95.4748 -116.2116 -0.4512 -0.4262
0.1454 2.88 6300 0.1397 3.4410 -2.2911 0.9358 5.7320 -95.5118 -116.2778 -0.4604 -0.4355
0.1355 2.92 6400 0.1471 3.3740 -2.3739 0.9330 5.7479 -95.6775 -116.4117 -0.4473 -0.4225
0.1114 2.97 6500 0.1397 3.4854 -2.3222 0.9302 5.8076 -95.5740 -116.1889 -0.4595 -0.4345
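
The column names above (rewards over chosen/rejected completions, margins, accuracies, and per-sequence log-probabilities) are the metrics typically logged by a DPO-style preference-optimization trainer. Assuming that setup, the reward columns relate to policy and reference log-probabilities roughly as sketched below; `beta` is a hypothetical scaling hyperparameter that is not listed in this card.

```python
# Sketch, assuming a DPO-style objective: the implicit reward is the beta-scaled
# log-probability ratio between the trained policy and a frozen reference model.
# Inputs are 1-D torch tensors of per-sequence log-probabilities.
import torch


def dpo_reward_metrics(policy_chosen_logps: torch.Tensor,
                       policy_rejected_logps: torch.Tensor,
                       ref_chosen_logps: torch.Tensor,
                       ref_rejected_logps: torch.Tensor,
                       beta: float = 0.1):  # beta value is a placeholder, not from the card
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)        # rewards/chosen
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)  # rewards/rejected
    margins = rewards_chosen - rewards_rejected                             # rewards/margins
    accuracy = (margins > 0).float().mean()                                 # rewards/accuracies
    return rewards_chosen.mean(), rewards_rejected.mean(), margins.mean(), accuracy
```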

Framework versions

  • Transformers 4.35.0
  • Pytorch 2.1.1+cu121
  • Datasets 2.14.6
  • Tokenizers 0.14.1
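
A quick way to check that a local environment matches these versions before loading or fine-tuning:

```python
# Print installed versions to compare against those used for training.
import transformers, torch, datasets, tokenizers

print("transformers", transformers.__version__)  # card: 4.35.0
print("torch", torch.__version__)                # card: 2.1.1+cu121
print("datasets", datasets.__version__)          # card: 2.14.6
print("tokenizers", tokenizers.__version__)      # card: 0.14.1
```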

Model size: 11.8B params (Safetensors, BF16)