This is a passthrough merge experiment with ~158B parameters (branded 160B). We stacked all 64 layers from each source model: no layer picking, full overlap. It's rough, unfiltered, and definitely experimental; this version exists to test the concept.
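For reference, passthrough merges like this are typically driven by a mergekit config. Below is a minimal sketch of what that setup roughly looks like, assuming two 64-layer source models; the model names are placeholders, not the actual sources, and the exact config isn't published here.

```python
# Hypothetical reconstruction of a passthrough merge setup (mergekit YAML).
# "source-model-a" and "source-model-b" are placeholders for the real sources.
import pathlib

MERGE_CONFIG = """\
merge_method: passthrough
dtype: bfloat16
slices:
  - sources:
      - model: source-model-a   # placeholder: first 64-layer source model
        layer_range: [0, 64]
  - sources:
      - model: source-model-b   # placeholder: second 64-layer source model
        layer_range: [0, 64]
"""

# Write the config, then run the merge with mergekit's CLI, e.g.:
#   mergekit-yaml passthrough-160b.yml ./160B-NotQuiteAMoE
pathlib.Path("passthrough-160b.yml").write_text(MERGE_CONFIG)
```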

Goal? MoE-level performance without being a MoE.

Does it work? 🤷‍♂️ We're finding out.

Try it. Break it. Let us know.

We don't recommend using this model. It's huge and needs serious hardware, more than we can run ourselves. If you must try it, run it in the cloud.
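If you do spin it up on a cloud box, a plain transformers load should work. This is a sketch, not something we've tested, assuming a multi-GPU instance with roughly 320+ GB of combined GPU memory for the BF16 weights and accelerate installed:

```python
# Minimal loading sketch for a multi-GPU cloud instance.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "marcuscedricridia/160B-NotQuiteAMoE"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",           # shard layers across available GPUs
)

inputs = tokenizer("Hello, world.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```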

Weights: Safetensors, 158B params, BF16.