IMO the world needs a better vanilla LLM, e.g. a DeepSeek v4 or v3.5, that we will use in daily life. That's the direction Gemini Flash took, which I praised.
Highly recommend the latest Gemini Flash. My favorite Google I/O gift. It ranks behind reasoning models but runs a lot faster than them. It beats DeepSeek v3.
The central argument here is that test-driven development is a natural fit for LLMs, which scale better than humans do. I bet the future will see thousands of such leaderboards (and many more proprietary ones), each dominated by a specialized model.
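To make that concrete, here is a minimal sketch of the loop I have in mind, where human-written tests are the only gatekeeper. `llm_generate`, the `slugify` task, and the toy tests are hypothetical placeholders; plug in whatever model or API you actually use.

```python
# Minimal TDD-with-an-LLM sketch. llm_generate is a hypothetical stub; the tests
# are the part a human writes once, and they scale as the acceptance gate.

def llm_generate(prompt: str) -> str:
    """Hypothetical stub: swap in a real client (OpenAI, Gemini, a local HF model, ...)."""
    raise NotImplementedError

TESTS = [                      # toy spec for an assumed slugify() task
    ("slugify('Hello, World!')", "hello-world"),
    ("slugify('  a  b ')", "a-b"),
]

def passes(code: str) -> bool:
    """Run a candidate implementation against the tests in an isolated namespace."""
    ns: dict = {}
    try:
        exec(code, ns)                                   # defines slugify()
        return all(eval(expr, ns) == want for expr, want in TESTS)
    except Exception:
        return False

def tdd_loop(task: str, attempts: int = 5) -> str | None:
    feedback = ""
    for _ in range(attempts):
        candidate = llm_generate(f"{task}\nTests: {TESTS}\n{feedback}")
        if passes(candidate):
            return candidate                             # tests, not humans, approve it
        feedback = "The previous attempt failed the tests; fix it."
    return None
```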
If you also tuned into Altman's second Congress hearing (his first was in 2023), this time alongside other AI executives, my takeaway is two words: New Deal (FDR's, from almost a century ago).
The causal link is quite fascinating and worth a few blog posts or deep research queries, but I won't have more time for this (I really wish I did), so here goes.
* AI workloads love GPUs because they devote more transistors to compute than CPUs do, and pair them with high-bandwidth memory
* More compute in a small physical space -> more power draw and more heat to dissipate
* More heat to dissipate -> liquid cooling
* New cooling plus heavier power draw -> bigger racks (heavier and taller)
* Bigger racks -> (re)building data centers
* New data centers with higher power demand (peak and stability) -> grid upgrades and nuclear power
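A rough back-of-envelope for the power step, using illustrative numbers that are my assumptions rather than vendor specs:

```python
# Illustrative only: the kW figures are assumptions, not vendor specs.
gpu_server_kw = 10.0     # assumed draw of one dense 8-GPU server
servers_per_rack = 4     # assumed servers packed into one GPU rack
legacy_rack_kw = 7.0     # assumed budget for an entire legacy enterprise rack

gpu_rack_kw = gpu_server_kw * servers_per_rack
print(f"GPU rack ~{gpu_rack_kw:.0f} kW vs legacy rack ~{legacy_rack_kw:.0f} kW "
      f"-> ~{gpu_rack_kw / legacy_rack_kw:.1f}x the power and heat in the same footprint")
```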
This time Gemini was quick with API support for its 2.5 Pro May release. The performance is impressive too; it is now among top contenders like o4, R1, and Claude.
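For reference, a minimal call with the google-genai SDK looks roughly like this; the preview model ID below is my assumption and rotates between releases, so check the current model list.

```python
# Minimal sketch with the google-genai SDK (pip install google-genai).
# The preview model ID is assumed and changes between releases.
from google import genai

client = genai.Client()  # picks up the API key from the environment
resp = client.models.generate_content(
    model="gemini-2.5-pro-preview-05-06",   # assumed May preview ID
    contents="In one sentence, what is a mixture-of-experts model?",
)
print(resp.text)
```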
I hadn't noticed that Gemini 2.5 (Pro and Flash) had been silently launched for API preview. Their performance is solid, but below QwQ 32B and the latest DeepSeek v3.
The Qwen3 235B (MoE) is awfully slow.
I heard it can switch between reasoning and non-reasoning, but for my question it always goes straight into reasoning mode, with no override switch exposed. I tried Fireworks, DeepInfra, and OpenRouter, and the behavior is the same on all of them.
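For anyone self-hosting instead, the Qwen3 chat template does document a hard switch; here is a minimal sketch with transformers, assuming the `enable_thinking` flag described on the Qwen3 model cards.

```python
# Minimal sketch: Qwen3's chat template documents an enable_thinking flag;
# hosted APIs may or may not pass it through.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-235B-A22B")
messages = [{"role": "user", "content": "One-line answer: capital of France?"}]

prompt = tok.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,   # hard switch; /no_think in the prompt is the soft switch
)
print(prompt)  # the rendered prompt pre-fills an empty think block, so generation skips reasoning
```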
I recently attended a panel on AI applications. The panelists were managers/directors at Fortune 500 companies. These people make things happen and own results, so their stories and pain points are fresh.
(1) Models are used EVERYWHERE: customer-facing, internal support, etc.
(2) A successful application must improve one of the following: revenue ($$), cost ($$), CSAT (still $$)
(3) They proactively search 🤗HF🤗 for models and use them. Open-source models (especially small ones) fit flexibly into their existing workflows/infra, which lets them deliver, and fast.
(4) The main barrier to adoption is the license. A director told me they picked a model and finetuned it, then learned they would have to share their enhancements. As a result they dropped that model, and the million-dollar impact went to another one.
So, to fellow model builders: (1) celebrate that our work is useful and generates a lot of value; (2) make your license permissive if you want maximum impact.
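On the adopter side, you can even shortlist by license before investing in finetuning; a small sketch with huggingface_hub, assuming the Hub's `license:apache-2.0` tag convention.

```python
# Sketch: shortlist permissively licensed models up front.
# Assumes the Hub's license tag convention; adjust to your own license policy.
from huggingface_hub import HfApi

api = HfApi()
for m in api.list_models(filter="license:apache-2.0", search="instruct",
                         sort="downloads", direction=-1, limit=5):
    print(m.id, m.downloads)
```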
Two takeaways for me. (1) Deep neural networks are the backbone that unifies everything. RLHF will stand the test of time because it brings two distinct fields (NLP and RL) onto the same model weights. (2) Language models will continue to play a central role in the era of agents. They probably won't be the endgame for AGI, but they are definitely not an offramp either.
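To make (1) concrete, the place where the two fields literally meet on the same weights is the standard RLHF objective (InstructGPT-style): an RL reward is maximized over the language model's own policy, with a KL penalty back to the supervised reference model.

$$\max_{\theta}\;\mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot\mid x)}\big[r_\phi(x, y)\big]\;-\;\beta\,\mathrm{KL}\big(\pi_\theta(\cdot\mid x)\,\|\,\pi_{\mathrm{ref}}(\cdot\mid x)\big)$$

Here $\pi_\theta$ is the same pretrained LM being updated, $r_\phi$ is the learned reward model, and $\pi_{\mathrm{ref}}$ is the frozen SFT model.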