Amazing model.

by rjmehta - opened

How does this model differ from berkeley-nest/Starling-LM-7B-alpha?

Nexusflow org

Thank you! The model is trained with a similar pipeline, except that we use the stronger reward model Starling-RM-34B instead of Starling-RM-7B. For reference, the reward model leaderboard is here: https://huggingface.co/spaces/allenai/reward-bench
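For context, here is a minimal sketch of how a reward model like this scores a candidate response during the RLAIF pipeline. It assumes Starling-RM-34B can be loaded as a standard sequence classifier with `trust_remote_code=True` and that it expects an OpenChat-style conversation string; the example conversation is hypothetical, and the model card's own loading code is authoritative.

```python
# Minimal sketch: scoring a candidate response with a reward model.
# Assumption: the model loads as a sequence classifier whose single logit
# is the reward; the actual Starling-RM-34B card may use custom code.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "Nexusflow/Starling-RM-34B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
reward_model = AutoModelForSequenceClassification.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
)

# Hypothetical conversation in the OpenChat-style format the Starling models use.
conversation = (
    "GPT4 Correct User: What is RLAIF?<|end_of_turn|>"
    "GPT4 Correct Assistant: RLAIF trains a policy against AI-generated preference labels."
)
inputs = tokenizer(conversation, return_tensors="pt")
with torch.no_grad():
    score = reward_model(**inputs).logits[0].item()  # higher score = preferred response
print(f"reward score: {score:.3f}")
```

During PPO, scores like this one serve as the reward signal that the policy model is optimized against.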

I would suggest adding this as a note in the readme.

Nexusflow org

Thank you! I believe it's already in the readme:

> We introduce Starling-LM-7B-beta, an open large language model (LLM) trained by Reinforcement Learning from AI Feedback (RLAIF). Starling-LM-7B-beta is trained from Openchat-3.5-0106 with our new reward model Nexusflow/Starling-RM-34B and policy optimization method Fine-Tuning Language Models from Human Preferences (PPO). Harnessing the power of the ranking dataset, berkeley-nest/Nectar, the upgraded reward model, Starling-RM-34B, and the new reward training and policy tuning pipeline, Starling-LM-7B-beta scores an improved 8.12 in MT Bench with GPT-4 as a judge.
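As a usage note, here is a minimal sketch of chatting with Starling-LM-7B-beta via `transformers`. The hardcoded OpenChat-style prompt is the convention the Starling models inherit from Openchat-3.5; the tokenizer's built-in chat template remains the authoritative source for the exact format.

```python
# Minimal sketch: generating a reply with Starling-LM-7B-beta.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Nexusflow/Starling-LM-7B-beta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# OpenChat-style single-turn prompt; prefer tokenizer.apply_chat_template in practice.
prompt = (
    "GPT4 Correct User: How does RLAIF differ from RLHF?<|end_of_turn|>"
    "GPT4 Correct Assistant:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```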
