But the clear lesson I learned from building these demos is: the more powerful the underlying base model, the closer you get to o1. CoT does nothing more than induce the latent reasoning capability already present in the model.
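To make that concrete, here is a minimal sketch of inducing chain-of-thought purely through prompting. The exact wording of the wrapper is an assumption for illustration, not the prompt used in the demos above:

```python
def with_cot(question: str) -> str:
    """Wrap a question in a simple chain-of-thought prompt.

    Hypothetical prompt wording: the point is only that CoT is elicited
    by the prompt, with no change to the underlying model.
    """
    return (
        "Think step by step and show your reasoning before giving the final answer.\n\n"
        f"Question: {question}\nReasoning:"
    )

prompt = with_cot("What is 17 * 24?")
```

A stronger base model answers this prompt with better intermediate reasoning; the prompt itself stays the same.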
We've open-sourced an app, powered by SambaNova Cloud and Llama 405B, that intelligently detects when a web search is needed, then answers directly or with RAG.
🔥 A hidden Easter egg is that Auto Search detection is already trained into the Llama 3.1 checkpoints. Simply use the tool-usage system prompt below, and the model will either respond with a web search query, if it deems one necessary, or answer the query directly. 🔥
```
Environment: IPython
Tools: Brave Search
Knowledge Cutoff Date: December 2023
Today's Date: September 2024

You are a helpful assistant.
Reminder: Search function calls MUST follow the specified format: "brave_search.call(query)"
```
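Routing on the model's reply is then a matter of checking whether it emitted a `brave_search.call(...)` or answered directly. Here is a hypothetical parser for that; the exact regex is an assumption about the output shape (a quoted query, with or without the `query=` keyword), not guaranteed behavior of the checkpoints:

```python
import re

# Assumed output shapes: brave_search.call(query="...") or brave_search.call("...")
SEARCH_CALL = re.compile(r'brave_search\.call\(\s*(?:query\s*=\s*)?"([^"]*)"\s*\)')

def parse_reply(reply: str):
    """Return ("search", query) if the model asked for a web search,
    else ("answer", reply) for a direct answer."""
    m = SEARCH_CALL.search(reply)
    if m:
        return ("search", m.group(1))
    return ("answer", reply.strip())
```

Usage: `parse_reply('brave_search.call(query="latest AI news")')` routes to the search branch, while a plain-text reply falls through to the direct-answer branch.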
Fast inference is no longer a nice-to-have for demos; it will be the driving force behind future frontier models. Time to switch over to custom AI hardware and short Nvidia.
I have been able to generate some high-quality synthetic data and run an LLM-as-a-judge on it, instead of relying on slower and more expensive alternatives like OpenAI or Anthropic.
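The judge loop itself is simple: build a grading prompt, then pull a numeric score out of the judge's free-text reply. The rubric wording and the "Score: N" convention below are assumptions for illustration, not the exact prompt used here:

```python
import re

def judge_prompt(question: str, answer: str) -> str:
    """Hypothetical grading prompt for a locally served judge model."""
    return (
        "Rate the answer to the question on a 1-5 scale for correctness "
        "and completeness. End your reply with a line 'Score: N'.\n\n"
        f"Question: {question}\nAnswer: {answer}"
    )

def parse_score(judge_reply: str):
    """Extract the 1-5 score from the judge's reply, or None if absent."""
    m = re.search(r"Score:\s*([1-5])", judge_reply)
    return int(m.group(1)) if m else None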