---
base_model: allenai/tulu-2-7b
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: tulu-2-7b-full-UF-5e-7
  results: []
---

# tulu-2-7b-full-UF-5e-7

This model is a fine-tuned version of [allenai/tulu-2-7b](https://huggingface.co/allenai/tulu-2-7b) on an unspecified dataset (the dataset name was recorded as `None`).
It achieves the following results on the evaluation set:
- Loss: 0.9017
- Rewards/chosen: -4.8659
- Rewards/rejected: -5.8048
- Rewards/accuracies: 0.6230
- Rewards/margins: 0.9389
- Rewards/margins Max: 5.6516
- Rewards/margins Min: -2.8163
- Rewards/margins Std: 2.7854
- Logps/rejected: -916.6636
- Logps/chosen: -832.4283
- Logits/rejected: 0.4957
- Logits/chosen: 0.2899

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 2
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 16
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:-------------------:|:-------------------:|:-------------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6816        | 0.07  | 100  | 0.6919          | 0.0000         | -0.0020          | 0.5417             | 0.0021          | 0.0277              | -0.0245             | 0.0175              | -336.3843      | -345.8331    | -1.1956         | -1.1695       |
| 0.5468        | 0.15  | 200  | 0.6793          | -0.1136        | -0.1432          | 0.5794             | 0.0296          | 0.2495              | -0.1965             | 0.1511              | -350.5013      | -357.1989    | -1.1509         | -1.1466       |
| 0.3597        | 0.22  | 300  | 0.6788          | -0.9347        | -1.0641          | 0.5714             | 0.1294          | 1.0084              | -0.7320             | 0.5779              | -442.5906      | -439.3020    | -1.0512         | -1.0629       |
| 0.2059        | 0.29  | 400  | 0.7172          | -1.9680        | -2.3061          | 0.5972             | 0.3381          | 2.3443              | -1.3886             | 1.2205              | -566.7862      | -542.6320    | -0.8695         | -0.8807       |
| 0.1354        | 0.37  | 500  | 0.8082          | -3.1553        | -3.7843          | 0.6190             | 0.6290          | 4.0818              | -2.2017             | 2.0321              | -714.6080      | -661.3674    | -0.1617         | -0.2554       |
| 0.1327        | 0.44  | 600  | 0.8436          | -3.8517        | -4.6192          | 0.6190             | 0.7675          | 4.8313              | -2.4317             | 2.3526              | -798.1056      | -731.0093    | 0.1600          | 0.0173        |
| 0.0777        | 0.52  | 700  | 0.9893          | -4.9432        | -5.9282          | 0.6190             | 0.9850          | 6.3532              | -3.2959             | 3.1250              | -929.0052      | -840.1605    | 0.6301          | 0.4163        |
| 0.0638        | 0.59  | 800  | 0.8086          | -3.8655        | -4.6357          | 0.6190             | 0.7702          | 4.5021              | -2.2919             | 2.2427              | -799.7516      | -732.3853    | 0.2889          | 0.1244        |
| 0.0997        | 0.66  | 900  | 0.8639          | -4.4406        | -5.3058          | 0.6270             | 0.8652          | 5.1592              | -2.6378             | 2.5658              | -866.7603      | -789.8954    | 0.3918          | 0.2055        |
| 0.0708        | 0.74  | 1000 | 0.8618          | -4.4546        | -5.2895          | 0.6230             | 0.8349          | 5.0604              | -2.6224             | 2.5213              | -865.1302      | -791.2946    | 0.4063          | 0.2199        |
| 0.141         | 0.81  | 1100 | 0.9049          | -4.8648        | -5.7977          | 0.6190             | 0.9330          | 5.6327              | -2.8439             | 2.7856              | -915.9548      | -832.3105    | 0.5083          | 0.3017        |
| 0.0775        | 0.88  | 1200 | 0.9049          | -4.9040        | -5.8585          | 0.6210             | 0.9546          | 5.7130              | -2.8316             | 2.8132              | -922.0319      | -836.2313    | 0.5172          | 0.3074        |
| 0.0464        | 0.96  | 1300 | 0.9017          | -4.8659        | -5.8048          | 0.6230             | 0.9389          | 5.6516              | -2.8163             | 2.7854              | -916.6636      | -832.4283    | 0.4957          | 0.2899        |

### Framework versions

- Transformers 4.39.0.dev0
- Pytorch 2.1.0+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2
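
The numbers reported in this card can be sanity-checked with a little arithmetic. The sketch below is not part of the training code. It assumes that `Rewards/margins` is the mean chosen reward minus the mean rejected reward, and that gradient accumulation was 1 (the card does not list it, but 2 × 8 = 16 matches `total_train_batch_size`). The helper names `margin` and `effective_batch_size` are hypothetical, and only three rows of the results table are copied in for illustration.

```python
# Rows copied from the training results table:
# (step, rewards/chosen, rewards/rejected, reported rewards/margins)
EVAL_ROWS = [
    (100, 0.0000, -0.0020, 0.0021),
    (700, -4.9432, -5.9282, 0.9850),
    (1300, -4.8659, -5.8048, 0.9389),
]


def margin(chosen: float, rejected: float) -> float:
    """Reward margin, assumed here to be chosen minus rejected."""
    return chosen - rejected


def effective_batch_size(per_device: int, num_devices: int,
                         grad_accum_steps: int = 1) -> int:
    """Global batch size; grad_accum_steps=1 is an assumption, not from the card."""
    return per_device * num_devices * grad_accum_steps


if __name__ == "__main__":
    for step, chosen, rejected, reported in EVAL_ROWS:
        computed = margin(chosen, rejected)
        # The card rounds to four decimals, so allow a small tolerance.
        status = "ok" if abs(computed - reported) < 1e-3 else "mismatch"
        print(f"step {step}: computed {computed:.4f}, reported {reported:.4f} ({status})")

    print("train batch:", effective_batch_size(per_device=2, num_devices=8))
    print("eval batch:", effective_batch_size(per_device=8, num_devices=8))
```

The tolerance is there because the card reports each metric rounded independently, so the recomputed margin can differ from the reported one in the last digit.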