---
library_name: peft
base_model: EleutherAI/pythia-410m-deduped
license: apache-2.0
datasets:
- argilla/dpo-mix-7k
tags:
- RLHF
- RLAIF
- PPO
- RM
- reward-model
- reward_model
---

# sapphia-410m-RM

Super duper ultra highly experimental LoRA finetune of EleutherAI/pythia-410m-deduped on argilla/dpo-mix-7k, trained to be a reward model.

## why?

Nexusflow achieved good results with traditional reward-model finetuning! why not meeeeeee :3
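The "traditional reward-model finetuning" referenced above typically optimizes a Bradley–Terry pairwise objective over the chosen/rejected pairs in a DPO-style dataset like dpo-mix-7k: the model assigns each completion a scalar reward, and the loss pushes the chosen completion's reward above the rejected one's. A minimal sketch of that objective (illustrative only, not the actual training code for this model):

```python
import math

def pairwise_rm_loss(chosen_rewards, rejected_rewards):
    """Bradley-Terry pairwise loss, averaged over a batch.

    Each term is -log(sigmoid(r_chosen - r_rejected)), written as
    log1p(exp(-(margin))) for numerical stability at moderate margins.
    Minimizing it drives chosen rewards above rejected ones.
    """
    losses = [
        math.log1p(math.exp(-(c - r)))
        for c, r in zip(chosen_rewards, rejected_rewards)
    ]
    return sum(losses) / len(losses)

# When the chosen completion already outscores the rejected one,
# the loss is small; at zero margin it equals log(2).
loss = pairwise_rm_loss([1.2], [0.3])
print(loss)  # ≈ 0.341
```

In practice this loss is applied to the scalar head of the finetuned model (e.g. via a sequence-classification head with one label); the helper above only shows the objective itself.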