Questions

#1
by sl33pyC01E - opened

Which Gemma model size did you use?
Have you thought of quantizing the model before incorporating it into the MoE framework?

What did you fine-tune the eight Gemmas on?

I'm in a similar development space: I want to build a heavily quantized self-MoE and compare it against the plain dense, unquantized variant. The goal is to fit the quantized self-MoE into the same memory footprint as the dense model, the hope being that the duplicated attention heads increase recall without hurting quality.
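To make that comparison concrete, here is roughly the back-of-the-envelope arithmetic I have in mind, as a minimal Python sketch. The parameter split, expert count, and bit widths below are placeholder assumptions, not measured numbers.

```python
# Rough weight-memory comparison: unquantized dense model vs. a quantized
# self-MoE built from N copies of the same expert weights.
# All parameter counts and bit widths are illustrative assumptions.

GIB = 2**30

def weight_gib(params_billions: float, bits_per_param: float) -> float:
    """Approximate weight memory in GiB for a given size and precision."""
    return params_billions * 1e9 * bits_per_param / 8 / GIB

# Assumed split of a ~8.5B-parameter Gemma-7B-class model (placeholder numbers):
shared_b = 2.5      # weights kept as a single shared copy
expert_b = 6.0      # weights duplicated per expert
num_experts = 8

dense_fp16 = weight_gib(shared_b + expert_b, 16)

for bits in (8, 4, 2):
    moe = weight_gib(shared_b, 16) + num_experts * weight_gib(expert_b, bits)
    fits = "fits" if moe <= dense_fp16 else "does not fit"
    print(f"{num_experts} experts @ {bits}-bit: {moe:5.1f} GiB "
          f"vs dense fp16 {dense_fp16:.1f} GiB -> {fits}")
```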

I'm still fairly new to the space, but I have trained Stable Diffusion models and written RAG frameworks for llama.cpp.

Would love some pointers on how to start!

All of the individual models are available in the GemMoE collection on my page. They are all fine-tunes of Gemma-7B base. I'll answer the rest here in a bit.
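If it helps as a starting point, loading any of those fine-tunes is the standard transformers flow; here is a minimal sketch with 4-bit quantization via bitsandbytes. The repo id is a placeholder, check the collection for the actual names.

```python
# Minimal sketch: load one Gemma-7B fine-tune in 4-bit and run a quick generation.
# The repo id below is a placeholder, not a real model name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

repo_id = "your-username/gemma-7b-finetune"  # placeholder

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    quantization_config=bnb_config,
    device_map="auto",
)

prompt = "Explain mixture-of-experts routing in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```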
