---
datasets:
- weqweasdas/ultra_train
base_model:
- OpenRLHF/Llama-3-8b-sft-mixture
---

Base Model: [OpenRLHF/Llama-3-8b-sft-mixture](https://huggingface.co/OpenRLHF/Llama-3-8b-sft-mixture)

DPO model: [RTO-RL/Llama3-8B-DPO](https://huggingface.co/RTO-RL/Llama3-8B-DPO)

Reward model: [RTO-RL/Llama3.2-1B-RewardModel](https://huggingface.co/RTO-RL/Llama3.2-1B-RewardModel)

Prompt dataset: [weqweasdas/ultra_train](https://huggingface.co/datasets/weqweasdas/ultra_train)
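
A minimal sketch of how the linked checkpoints and prompt dataset could be loaded with the `transformers` and `datasets` libraries. This card does not specify the reward model's head type or the dataset split names, so the sequence-classification head and `split="train"` below are assumptions; adjust them to match the actual repository configs.

```python
# Sketch: load the listed artifacts from the Hugging Face Hub.
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)
from datasets import load_dataset

# DPO policy model (standard causal LM checkpoint).
dpo_tokenizer = AutoTokenizer.from_pretrained("RTO-RL/Llama3-8B-DPO")
dpo_model = AutoModelForCausalLM.from_pretrained("RTO-RL/Llama3-8B-DPO")

# Reward model (assumed here to expose a sequence-classification head
# producing a scalar score; check the repo config to confirm).
rm_tokenizer = AutoTokenizer.from_pretrained("RTO-RL/Llama3.2-1B-RewardModel")
reward_model = AutoModelForSequenceClassification.from_pretrained(
    "RTO-RL/Llama3.2-1B-RewardModel"
)

# Prompt dataset used for training (split name assumed).
prompts = load_dataset("weqweasdas/ultra_train", split="train")
```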