This is a passthrough merge experiment with ~158B parameters (branded 160B). We stacked all 64 layers from each source model: no layer picking, full overlap. It's rough, unfiltered, and definitely experimental; this version exists to test the concept.
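For reference, passthrough merges like this are typically driven by a mergekit config. Below is a minimal sketch of what that setup roughly looks like, assuming two 64-layer source models; the model names are placeholders, not the actual sources, and the exact config isn't published here.

```python
# Hypothetical reconstruction of a passthrough merge setup (mergekit YAML).
# "source-model-a" and "source-model-b" are placeholders for the real sources.
import pathlib

MERGE_CONFIG = """\
merge_method: passthrough
dtype: bfloat16
slices:
  - sources:
      - model: source-model-a   # placeholder: first 64-layer source model
        layer_range: [0, 64]
  - sources:
      - model: source-model-b   # placeholder: second 64-layer source model
        layer_range: [0, 64]
"""

# Write the config, then run the merge with mergekit's CLI, e.g.:
#   mergekit-yaml passthrough-160b.yml ./160B-NotQuiteAMoE
pathlib.Path("passthrough-160b.yml").write_text(MERGE_CONFIG)
```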

Goal? MoE-level performance without being a MoE.

Does it work? 🤷‍♂️ We're finding out.

Try it. Break it. Let us know.

We don't recommend using this model. It's huge and needs serious hardware, more than we can run ourselves. If you must try it, run it in the cloud.
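If you do spin it up on a cloud box, a plain transformers load should work. This is a sketch, not something we've tested, assuming a multi-GPU instance with roughly 320+ GB of combined GPU memory for the BF16 weights and accelerate installed:

```python
# Minimal loading sketch for a multi-GPU cloud instance.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "marcuscedricridia/160B-NotQuiteAMoE"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",           # shard layers across available GPUs
)

inputs = tokenizer("Hello, world.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```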

Weights: Safetensors, 158B params, BF16.