---
license: mit
library_name: peft
tags:
- alignment-handbook
- generated_from_trainer
- trl
- dpo
base_model: microsoft/phi-2
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: phi-2-gpo-renew2-b0.001-i1
  results: []
---

# phi-2-gpo-renew2-b0.001-i1

This model is a fine-tuned version of [DUAL-GPO/phi-2-gpo-renew2-b0.001-i0](https://huggingface.co/DUAL-GPO/phi-2-gpo-renew2-b0.001-i0) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0538
- Rewards/chosen: 0.0010
- Rewards/rejected: 0.0012
- Rewards/accuracies: 0.4290
- Rewards/margins: -0.0002
- Logps/rejected: -366.0280
- Logps/chosen: -395.2844
- Logits/rejected: -0.7463
- Logits/chosen: -0.8436

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
|:-------------:|:-----:|:----:|:-------------:|:---------------:|:------------:|:--------------:|:---------------:|:------------------:|:--------------:|:---------------:|:----------------:|
| 0.1204        | 0.32  | 100  | -0.8372       | -0.7448         | -396.1279    | -367.0282      | 0.0537          | 0.4495             | 0.0002         | -0.0000         | 0.0002           |
| 0.1673        | 0.64  | 200  | -0.8518       | -0.7569         | -395.0410    | -365.7495      | 0.0538          | 0.4305             | 0.0013         | -0.0002         | 0.0015           |
| 0.1395        | 0.96  | 300  | -0.8541       | -0.7587         | -395.3006    | -365.9886      | 0.0538          | 0.4395             | 0.0010         | -0.0002         | 0.0012           |

### Framework versions

- PEFT 0.7.1
- Transformers 4.36.2
- Pytorch 2.1.2
- Datasets 2.14.6
- Tokenizers 0.15.2
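
The reward columns in the training results can be cross-checked against each other. Assuming the metrics follow the usual TRL DPO logging conventions (an assumption, since this card does not describe the loss), `Rewards/margins` is the mean of `Rewards/chosen` minus `Rewards/rejected`, so each reported margin should equal the difference of the two reported rewards up to rounding. The sketch below recomputes the margins from the three evaluation rows; the helper names `margin_from_rewards` and `inconsistent_steps` are hypothetical and not part of any library.

```python
# Evaluation rows copied from the "Training results" table in this card.
ROWS = [
    {"step": 100, "chosen": 0.0002, "rejected": 0.0002, "margin": -0.0000, "accuracy": 0.4495},
    {"step": 200, "chosen": 0.0013, "rejected": 0.0015, "margin": -0.0002, "accuracy": 0.4305},
    {"step": 300, "chosen": 0.0010, "rejected": 0.0012, "margin": -0.0002, "accuracy": 0.4395},
]


def margin_from_rewards(chosen: float, rejected: float) -> float:
    """Reward margin, assumed here to be (chosen reward) - (rejected reward)."""
    return chosen - rejected


def inconsistent_steps(rows, tol: float = 1.5e-4):
    """Return the steps whose reported margin disagrees with chosen - rejected.

    The card rounds every value to four decimals, so a difference of up to
    about 1.5e-4 can come from rounding alone; `tol` allows for that.
    """
    bad = []
    for row in rows:
        recomputed = margin_from_rewards(row["chosen"], row["rejected"])
        if abs(recomputed - row["margin"]) > tol:
            bad.append(row["step"])
    return bad


if __name__ == "__main__":
    for row in ROWS:
        recomputed = margin_from_rewards(row["chosen"], row["rejected"])
        print(f"step {row['step']}: reported {row['margin']:+.4f}, recomputed {recomputed:+.4f}")
    print("inconsistent steps:", inconsistent_steps(ROWS))
```

For the three rows above, `inconsistent_steps(ROWS)` returns an empty list. The margins are zero or slightly negative and the reported accuracies are below 0.5, so on this evaluation set the chosen responses do not receive higher rewards than the rejected ones on average.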