Release the 124B parent weights... We know you have it.

#31
by Dureka - opened

Google’s latest Gemma 4 release came with an odd twist. Jeff Dean, Google’s top AI leader, casually referenced a “124 billion parameter, Mixture of Experts” model, and then that model seemed to disappear almost immediately. Google’s PR side quickly framed the reference as a typo, saying the largest model actually released was the 26B version. That explanation does not sit right.

For one, it is hard to believe someone would accidentally type “124” when they meant “26.” On top of that, the released 26B model’s own files, including the README.md and model.safetensors.index.json for gemma-4-26B-A4B-it, show an architecture with about 26.5 billion parameters spread across 128 experts. Dividing only about 26 billion parameters across 128 experts leaves each expert unusually small and inefficient, which makes little sense unless this model is a heavily trimmed or distilled version of a much larger parent system. If you scale those 128 experts up to more typical functional sizes, you land at roughly 124B parameters.

Add to that the fact that Hugging Face representatives are already steering developers who want larger models toward the paid Gemini API, and it starts to feel like there is more to this story than a simple typo. So the real question is: where did the 124B model go?
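For anyone who wants to check the arithmetic, here is a rough back-of-envelope sketch. It assumes, purely for illustration, that every parameter sits inside the experts, which is not true of a real MoE (attention, embeddings, and routers are shared, and experts are usually replicated per layer). The constants are the figures quoted in this post, and `params_per_expert` is a hypothetical helper, not anything from the released files.

```python
# Back-of-envelope only: assumes ALL parameters live in the experts,
# ignoring attention, embeddings, routers, and any other shared weights.
NUM_EXPERTS = 128         # expert count quoted in this post
RELEASED_TOTAL = 26.5e9   # approx. total for gemma-4-26B-A4B-it, per this post
CLAIMED_TOTAL = 124e9     # figure attributed to the X post


def params_per_expert(total_params: float, num_experts: int) -> float:
    """Average parameters per expert under the all-in-experts assumption."""
    return total_params / num_experts


released = params_per_expert(RELEASED_TOTAL, NUM_EXPERTS)
claimed = params_per_expert(CLAIMED_TOTAL, NUM_EXPERTS)
scale = CLAIMED_TOTAL / RELEASED_TOTAL

print(f"released: {released / 1e6:.0f}M per expert")
print(f"claimed:  {claimed / 1e6:.0f}M per expert")
print(f"ratio:    {scale:.2f}x")
```

Under that simplification the released model averages about 207M parameters per expert, a 124B model with the same expert count would average about 969M, and the gap between the two totals is about 4.68x. Whether ~969M counts as a "typical" expert size is a judgment call, so treat this as a sanity check on the numbers, not proof of anything.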

Jeff Dean's X post: https://archive.li/5vxUY
