Amazing model.

by rjmehta - opened

How does this model differ from berkeley-nest/Starling-LM-7B-alpha?

Nexusflow org

Thank you! The model is trained with a similar pipeline, except that we use the stronger reward model Starling-RM-34B instead of Starling-RM-7B. For reference, the reward model leaderboard is here: https://huggingface.co/spaces/allenai/reward-bench
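For context, here is a minimal sketch of how a reward model like this scores a candidate response during the RLAIF pipeline. It assumes Starling-RM-34B can be loaded as a standard sequence classifier with `trust_remote_code=True` and that it expects an OpenChat-style conversation string; the example conversation is hypothetical, and the model card's own loading code is authoritative.

```python
# Minimal sketch: scoring a candidate response with a reward model.
# Assumption: the model loads as a sequence classifier whose single logit
# is the reward; the actual Starling-RM-34B card may use custom code.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "Nexusflow/Starling-RM-34B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
reward_model = AutoModelForSequenceClassification.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
)

# Hypothetical conversation in the OpenChat-style format the Starling models use.
conversation = (
    "GPT4 Correct User: What is RLAIF?<|end_of_turn|>"
    "GPT4 Correct Assistant: RLAIF trains a policy against AI-generated preference labels."
)
inputs = tokenizer(conversation, return_tensors="pt")
with torch.no_grad():
    score = reward_model(**inputs).logits[0].item()  # higher score = preferred response
print(f"reward score: {score:.3f}")
```

During PPO, scores like this one serve as the reward signal that the policy model is optimized against.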

I would suggest adding this as a note in the readme.

Nexusflow org

Thank you! I believe it's already in the readme:

> We introduce Starling-LM-7B-beta, an open large language model (LLM) trained by Reinforcement Learning from AI Feedback (RLAIF). Starling-LM-7B-beta is trained from Openchat-3.5-0106 with our new reward model Nexusflow/Starling-RM-34B and policy optimization method Fine-Tuning Language Models from Human Preferences (PPO). Harnessing the power of the ranking dataset, berkeley-nest/Nectar, the upgraded reward model, Starling-RM-34B, and the new reward training and policy tuning pipeline, Starling-LM-7B-beta scores an improved 8.12 in MT Bench with GPT-4 as a judge.
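As a usage note, here is a minimal sketch of chatting with Starling-LM-7B-beta via `transformers`. The hardcoded OpenChat-style prompt is the convention the Starling models inherit from Openchat-3.5; the tokenizer's built-in chat template remains the authoritative source for the exact format.

```python
# Minimal sketch: generating a reply with Starling-LM-7B-beta.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Nexusflow/Starling-LM-7B-beta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# OpenChat-style single-turn prompt; prefer tokenizer.apply_chat_template in practice.
prompt = (
    "GPT4 Correct User: How does RLAIF differ from RLHF?<|end_of_turn|>"
    "GPT4 Correct Assistant:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```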
