π Introducing RefuelLLM-2 and RefuelLLM-2-small, the next version of our large language models purpose built for data labeling, enrichment and cleaning.
RefuelLLM-2 (83.82%) outperforms all state-of-the-art LLMs, including GPT-4-Turbo (80.88%), Claude-3-Opus (79.19%) and Gemini-1.5-Pro (74.59%), across a benchmark of ~30 data labeling tasks.
RefuelLLM-2-small (79.67%), aka Llama-3-Refueled, outperforms all comparable LLMs including Claude3-Sonnet (70.99%), Haiku (69.23%) and GPT-3.5-Turbo (68.13%).