
Fine-Tune Request: Bagel-dpo using AetherResearch/Cerebrum-1.0-7b

#3
by Joseph717171 - opened

If you find it useful, please consider fine-tuning AetherResearch/Cerebrum-1.0-7b for Bagel. I think it would enhance Bagel's reasoning abilities even further, but check out their repo yourself to make that determination. Repo card below:

Introduction

Cerebrum 7b is a large language model (LLM) created specifically for reasoning tasks. It is based on the Mistral 7b model, fine-tuned on a small custom dataset of native chain of thought data and further improved with targeted RLHF (tRLHF), a novel technique for sample-efficient LLM alignment. Unlike numerous other recent fine-tuning approaches, our training pipeline includes under 5000 training prompts and even fewer labeled datapoints for tRLHF.

The native chain of thought approach means that Cerebrum is trained to devise a tactical plan before tackling problems that require thinking. For brainstorming, knowledge-intensive, and creative tasks, Cerebrum will typically omit unnecessarily verbose considerations.
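To make the "native chain of thought" idea concrete, here is a minimal sketch of what a plan-then-answer training example could look like. The actual Cerebrum dataset schema is not public in this post, so the field names (`prompt`, `completion`) and the `Plan:`/`Answer:` layout are assumptions for illustration, not AetherResearch's real format:

```python
def make_native_cot_example(question: str, plan: str, answer: str) -> dict:
    """Bundle a question with an explicit plan-then-answer target, the
    general shape that 'native chain of thought' training implies.
    Hypothetical format; not the actual Cerebrum training schema."""
    target = f"Plan: {plan}\nAnswer: {answer}"
    return {"prompt": question, "completion": target}

# Made-up example datapoint for illustration.
example = make_native_cot_example(
    question="A train travels 60 km in 1.5 hours. What is its average speed?",
    plan="Divide distance by time to get speed in km/h.",
    answer="60 / 1.5 = 40 km/h",
)
print(example["completion"])
```

The point is that the plan is part of the supervised target itself, so the model learns to produce a brief tactical plan before the answer rather than being coaxed into it at inference time.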

Zero-shot prompted Cerebrum significantly outperforms few-shot prompted Mistral 7b, as well as much larger models (such as Llama 2 70b), on a range of tasks that require reasoning, including ARC Challenge, GSM8k, and MATH.
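For context, the difference between the two prompting regimes compared above comes down to prompt construction; a minimal sketch (the exemplar below is made up for illustration, and this is not the authors' evaluation harness):

```python
def zero_shot_prompt(question: str) -> str:
    # Zero-shot: the model sees only the question itself.
    return f"Question: {question}\nAnswer:"

def few_shot_prompt(question: str, exemplars: list[tuple[str, str]]) -> str:
    # Few-shot: solved examples are prepended so a base model such as
    # Mistral 7b can infer the task format from them.
    demos = "".join(f"Question: {q}\nAnswer: {a}\n\n" for q, a in exemplars)
    return demos + f"Question: {question}\nAnswer:"

exemplars = [("What is 2 + 2?", "4")]  # hypothetical exemplar
print(zero_shot_prompt("What is 7 * 6?"))
print(few_shot_prompt("What is 7 * 6?", exemplars))
```

Beating a few-shot baseline while prompted zero-shot is the stronger result, since the baseline gets worked examples and Cerebrum gets none.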

[Benchmark figures from the repo card: benchmarking.png (results chart) and benchmarking_table.png (results table).]
