Safetensors
gemma

This model was trained with the OpenRLHF codebase, using the average loss with the Regularized-Preference-Optimization method. The SFT loss coefficient is 0. The relevant paper is https://arxiv.org/abs/2405.16436.
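
To make the role of the SFT loss coefficient concrete, here is a minimal sketch of a regularized preference loss: a DPO-style preference term plus a coefficient times an SFT term on the chosen response. This is not the OpenRLHF implementation. The function names (`avg_logp`, `rpo_loss`), the `beta` value, and the reading of "average loss" as a mean over per-token log-probabilities are all assumptions made for illustration.

```python
import math


def avg_logp(token_logps):
    """Mean per-token log-probability of one response (assumed meaning of 'average')."""
    return sum(token_logps) / len(token_logps)


def rpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected,
             beta=0.1, sft_coef=0.0):
    """Hypothetical sketch: preference loss + sft_coef * SFT loss.

    Each argument is a list of per-token log-probabilities for one response,
    scored by the policy or by the frozen reference model.
    """
    pc, pr = avg_logp(policy_chosen), avg_logp(policy_rejected)
    rc, rr = avg_logp(ref_chosen), avg_logp(ref_rejected)

    # Scaled difference of log-ratios between chosen and rejected responses.
    margin = beta * ((pc - rc) - (pr - rr))

    # -log(sigmoid(margin)), written as a numerically stable softplus(-margin).
    preference_loss = math.log1p(math.exp(-abs(margin))) + max(-margin, 0.0)

    # SFT term: negative average log-likelihood of the chosen response.
    sft_loss = -pc

    return preference_loss + sft_coef * sft_loss
```

With `sft_coef=0.0`, as stated for this model, the SFT term drops out and only the preference term remains. A nonzero coefficient would add the negative log-likelihood of the chosen response as a regularizer.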

Model size: 8.54B params
Tensor type: BF16

Dataset used to train ZHLiu627/zephyr-7b-gemma-dpo-avg