Questions

#1
by sl33pyC01E - opened

Which Gemma model size did you use?
Have you thought of quantizing the model before incorporating it into the MoE framework?

What did you fine-tune the eight Gemmas on?

I'm in a similar development space: I want to build a heavily quantized self-MoE and compare it against the plain dense, unquantized variant. The goal is to fit the quantized self-MoE into the same memory footprint as the dense model, the hope being that the duplicated attention heads increase recall without hurting quality.
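To make that comparison concrete, here is roughly the back-of-the-envelope arithmetic I have in mind, as a minimal Python sketch. The parameter split, expert count, and bit widths below are placeholder assumptions, not measured numbers.

```python
# Rough weight-memory comparison: unquantized dense model vs. a quantized
# self-MoE built from N copies of the same expert weights.
# All parameter counts and bit widths are illustrative assumptions.

GIB = 2**30

def weight_gib(params_billions: float, bits_per_param: float) -> float:
    """Approximate weight memory in GiB for a given size and precision."""
    return params_billions * 1e9 * bits_per_param / 8 / GIB

# Assumed split of a ~8.5B-parameter Gemma-7B-class model (placeholder numbers):
shared_b = 2.5      # weights kept as a single shared copy
expert_b = 6.0      # weights duplicated per expert
num_experts = 8

dense_fp16 = weight_gib(shared_b + expert_b, 16)

for bits in (8, 4, 2):
    moe = weight_gib(shared_b, 16) + num_experts * weight_gib(expert_b, bits)
    fits = "fits" if moe <= dense_fp16 else "does not fit"
    print(f"{num_experts} experts @ {bits}-bit: {moe:5.1f} GiB "
          f"vs dense fp16 {dense_fp16:.1f} GiB -> {fits}")
```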

I'm still fairly new to the space, but I have trained Stable Diffusion models and written RAG frameworks for llama.cpp.

Would love some pointers on how to start!

All of the individual models are available in the GemMoE collection on my page. They are all fine-tunes of Gemma-7B base. I'll answer the rest here in a bit.
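If it helps as a starting point, loading any of those fine-tunes is the standard transformers flow; here is a minimal sketch with 4-bit quantization via bitsandbytes. The repo id is a placeholder, check the collection for the actual names.

```python
# Minimal sketch: load one Gemma-7B fine-tune in 4-bit and run a quick generation.
# The repo id below is a placeholder, not a real model name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

repo_id = "your-username/gemma-7b-finetune"  # placeholder

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    quantization_config=bnb_config,
    device_map="auto",
)

prompt = "Explain mixture-of-experts routing in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```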
