This is a passthrough merge experiment at roughly 158B parameters (nominally 160B). We merged all 64 layers from each source model: no layer picking, full overlap. It's rough, unfiltered, and definitely experimental. This version exists to test the concept.
Goal? MoE-level performance without being a MoE.
Does it work? 🤷‍♂️ We're finding out.
Try it. Break it. Let us know.
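If you're curious what this looks like mechanically, below is a minimal sketch of the passthrough idea in plain `transformers` code. The model names are hypothetical placeholders and it assumes Llama-style checkpoints; in practice this kind of merge is usually done with dedicated tooling (mergekit's `passthrough` method is the usual route), not a script like this.

```python
# Sketch only: stack every decoder layer from two 64-layer source models
# into one ~128-layer model. Model names are hypothetical placeholders and
# this assumes Llama-style checkpoints where layers live at model.model.layers.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM

model_a = AutoModelForCausalLM.from_pretrained(
    "org/source-model-a", torch_dtype=torch.bfloat16  # hypothetical name
)
model_b = AutoModelForCausalLM.from_pretrained(
    "org/source-model-b", torch_dtype=torch.bfloat16  # hypothetical name
)

# Passthrough: keep every weight untouched, just concatenate the layer stacks.
merged = nn.ModuleList(list(model_a.model.layers) + list(model_b.model.layers))
model_a.model.layers = merged
model_a.config.num_hidden_layers = len(merged)
# In practice per-layer bookkeeping (e.g. self_attn.layer_idx) would also
# need renumbering before the merged model can run inference cleanly.

model_a.save_pretrained("passthrough-merge")
```

Because passthrough doesn't average or interpolate anything, the merged model lands near the combined parameter count of its parents, which is roughly where the ~158B figure comes from.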
We don't recommend using this model. It's huge and needs more serious hardware than we can run ourselves. If you must try it, do it in the cloud.