The Rise of Specialized LLMs for Enterprise - https://mltblog.com/3QXXE4I
In this article, I discuss the main problems of standard LLMs (OpenAI and the like), and how the new generation of LLMs addresses these issues. The focus is on Enterprise LLMs.
LLMs with Billions of Parameters: Most LLMs still fall into this category. The first ones (ChatGPT) appeared around 2022, though BERT is an early precursor. Most recent books discussing LLMs still define them as transformer architectures with deep neural networks (DNNs), costly training, and reliance on GPUs. Training is optimized to predict the next token or missing tokens. However, that task is only remotely relevant to what modern LLMs now deliver to the user (see here), yet it requires time and intensive computing resources: this type of architecture works best when trained on billions or trillions of tokens. In the end, most of these tokens are noise, requiring smart distillation to improve performance.
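To make that training objective concrete, here is a minimal sketch of next-token prediction in PyTorch. The embedding-plus-linear "model", the sizes, and the random token batch are toy placeholders standing in for a full transformer stack and a real corpus; nothing here reflects any specific vendor's implementation.

```python
# Minimal sketch of the next-token prediction objective that standard
# LLM pre-training optimizes. All sizes and data are toy placeholders.
import torch
import torch.nn as nn

vocab_size, d_model, seq_len, batch = 100, 32, 16, 4

# Toy "transformer": an embedding plus a linear head stands in for the
# full stack of attention layers that real models put in between.
embed = nn.Embedding(vocab_size, d_model)
head = nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (batch, seq_len))  # fake corpus batch
hidden = embed(tokens[:, :-1])          # read positions 0..n-2
logits = head(hidden)                   # predict positions 1..n-1
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size),     # (batch*(seq_len-1), vocab)
    tokens[:, 1:].reshape(-1),          # the "next tokens" being predicted
)
loss.backward()  # gradients for the (toy) parameters
print(f"next-token cross-entropy: {loss.item():.3f}")
```

Real pre-training runs this same loss over trillions of tokens on large GPU clusters, which is where the performance and cost issues listed below come from.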
The main issues are:
➡️ Performance: Requires GPUs and large corpora as input data. Re-training is expensive. Hallucinations remain a problem. Fine-tuning is delicate, since the model is a black box, and you need prompt engineering to get the best results. Mixture-of-experts architectures (multiple sub-LLMs, as in DeepSeek) are one step toward improving accuracy; see the routing sketch after this list.
➡️ Cost: Besides the GPU costs, the pricing model charges by the token, incentivizing vendors to favor large models that consume and produce as many tokens as possible; a back-of-the-envelope estimate follows below.
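As a rough illustration of per-token pricing, the calculation below shows how token volume drives the bill. The per-million-token rates and the traffic numbers are hypothetical placeholders, not any vendor's actual figures.

```python
# Back-of-the-envelope token cost. The prices below are hypothetical
# placeholders, not any vendor's actual rates.
PRICE_IN = 3.00    # $ per 1M input tokens  (assumed)
PRICE_OUT = 15.00  # $ per 1M output tokens (assumed)

def monthly_cost(requests, in_tokens, out_tokens):
    """Cost of `requests` calls, each with the given token counts."""
    return requests * (in_tokens * PRICE_IN + out_tokens * PRICE_OUT) / 1e6

# e.g. 100k enterprise queries/month, 2k-token prompts, 500-token answers
print(f"${monthly_cost(100_000, 2_000, 500):,.2f} per month")
```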
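And here is the routing sketch referenced above: a minimal mixture-of-experts layer in which a learned gate sends each token to its top-2 experts, so only a fraction of the parameters is active per token. This is a generic illustration of the idea behind models such as DeepSeek's, not their actual implementation; all sizes are illustrative.

```python
# Minimal sketch of mixture-of-experts routing: a gate sends each token
# to a few small expert networks instead of one monolithic model.
import torch
import torch.nn as nn

d_model, n_experts, top_k = 32, 4, 2

gate = nn.Linear(d_model, n_experts)  # router
experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_experts)])

x = torch.randn(8, d_model)                      # 8 token embeddings
weights = torch.softmax(gate(x), dim=-1)         # routing probabilities
top_w, top_idx = weights.topk(top_k, dim=-1)     # keep top-2 experts/token
top_w = top_w / top_w.sum(dim=-1, keepdim=True)  # renormalize kept weights

out = torch.zeros_like(x)
for slot in range(top_k):
    for e in range(n_experts):
        mask = top_idx[:, slot] == e             # tokens routed to expert e
        if mask.any():
            out[mask] += top_w[mask, slot].unsqueeze(1) * experts[e](x[mask])
print(out.shape)  # torch.Size([8, 32]) -- same shape, sparser compute
```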
In my recent article “Piercing the Deepest Mathematical Mystery”, I paved the way to proving a famous centuries-old conjecture: are the digits of major mathematical constants such as π, e, log 2, or √2 evenly distributed? No one has ever managed to prove even the most basic facts, such as whether the proportion of ‘0’s or ‘1’s in the binary expansion of any of these constants converges to a limit, or oscillates indefinitely between 0% and 100%.
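To see what the conjecture asks, the sketch below generates the first 100,000 binary digits of √2 exactly and prints the observed share of ‘1’s. I use √2 here rather than e only because its digits come straight from stdlib integer arithmetic; this is an empirical illustration of the open question, not evidence toward a proof.

```python
# Empirical look at the conjecture for sqrt(2): generate its first n binary
# digits exactly with integer arithmetic and measure the share of 1s.
from math import isqrt

def sqrt2_bits(n):
    """First n binary digits of sqrt(2) after the binary point."""
    # floor(sqrt(2) * 2^n) == isqrt(2 * 4^n); bin() yields '0b1' + n bits,
    # so slicing off '0b1' drops the integer part and keeps the fraction.
    return bin(isqrt(2 << (2 * n)))[3:]

bits = sqrt2_bits(100_000)
ones = bits.count("1")
print(f"share of 1s in first {len(bits)} binary digits: {ones / len(bits):.5f}")
```

Empirically the share hovers near 50%, but no proof exists that it converges at all, which is exactly the gap the conjecture addresses.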
Here I provide an overview of the new framework built to uncover deep results about the digit distribution of Euler’s number e, discuss the latest developments, share a 10x faster version of the code, and highlight potential new research areas in LLMs, AI, quantum dynamics, high-performance computing, cryptography, dynamical systems, number theory, and more, arising from my discovery. Perhaps the most interesting part is testing LLMs and other AI tools to assess their reasoning capabilities on a fascinating math problem whose solution is not posted anywhere.