Your model

Apr 10

Title.

Apr 10

I could probably get around to training a 100M-parameter Eagle 2 model in a few weeks. The problem is that I need a few hundred thousand sequences from the 31B model, which I can only run with offloading to system memory, so that might take a while. The E4B model should have out-of-the-box support as a draft model, though; it's just bigger than Eagle for the same performance, and you need enough memory to run an 8B model and a 31B model at the same time.
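For a rough sense of that memory requirement, here is a back-of-the-envelope sketch. It counts weights only and assumes both models are quantized to about 4 bits per weight; the quantization level and the function name are illustrative assumptions, not figures from this thread, and KV cache and runtime overhead come on top.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Assumed 4-bit quantization for both models (an illustration, not a measurement).
draft_gb = weight_memory_gb(8, 4)    # the 8B draft model mentioned above
target_gb = weight_memory_gb(31, 4)  # the 31B target model
total_gb = draft_gb + target_gb

print(f"draft  ~{draft_gb:.1f} GB")
print(f"target ~{target_gb:.1f} GB")
print(f"total  ~{total_gb:.1f} GB, before KV cache and runtime overhead")
```

Changing `bits_per_weight` to 8 or 16 shows how quickly the combined footprint grows at higher precision.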

Apr 10

Ok, there is no support for drafting between those two, but an Eagle draft model has already been trained by someone else: https://huggingface.co/thoughtworks/Gemma-4-31B-Eagle3

TeichAI org Apr 10

I don't think the tokenizer changed; any gemma4 model should be supported as a drafting model.

Apr 10

edited Apr 10

I was about to edit my reply but thought to refresh just in case, bruh

Yall were fast!

Thanks for the response.

Also the thinking blocks and use of markdown might be broken 👀

Basically, I haven't even used it for tool use yet, but I predict it won't work well with anything that uses tools.

TeichAI org Apr 10

Also the thinking blocks and use of markdown might be broken 👀

Could you be a bit more specific here? Perhaps a screenshot of your broken output, as well as some info on how you're running your inference, would be helpful.

TeichAI org Apr 10

"Yall were fast!"
When you get bored and keep refreshing a page, you catch things pretty quickly 🤣

TeichAI org Apr 10

Wait for v2. I was using it in Cline/Continue flawlessly. It built me a web app, set up a local Supabase, and wired everything together, frontend and backend :)

Apr 10

Also the thinking blocks and use of markdown might be broken 👀

Could you be a bit more specific here? Perhaps a screenshot of your broken output, as well as some info on how you're running your inference, would be helpful.

Apr 10

The model also breaks by starting to repeat itself mid-generation once given a long enough task, and this turned out to be because flash attention was enabled. (I am unsure whether this is normal for local models, where some can use it and some can't, but this model cannot use flash attention, or at least not when some parts of the model are offloaded.)
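One way to spot this kind of looping in saved output is to check whether the text ends in the same block of words repeated several times. This is a minimal sketch only; the function name and thresholds are hypothetical, and it only catches exact word-for-word loops at the very end of the text.

```python
def has_trailing_loop(text: str, min_period: int = 4,
                      max_period: int = 50, min_repeats: int = 3) -> bool:
    """Return True if the text ends with one block of words repeated
    back-to-back at least `min_repeats` times.

    `min_period` and `max_period` bound the block length in words;
    all defaults are arbitrary illustrative values.
    """
    words = text.split()
    for period in range(min_period, max_period + 1):
        span = period * min_repeats
        if span > len(words):
            break  # longer periods cannot fit either
        tail = words[-span:]
        block = tail[:period]
        # The tail is a loop if every word matches the block at the same offset.
        if all(tail[i] == block[i % period] for i in range(span)):
            return True
    return False


looping = "Here is the plan. " + "First we set up the project and install it. " * 3
normal = " ".join(str(i) for i in range(60))
print(has_trailing_loop(looping), has_trailing_loop(normal))
```

Running the same prompt with flash attention on and off and comparing the results of a check like this would give a more concrete report than a screenshot alone.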

TeichAI org Apr 10

Seems like your agent runner (looks like LMStudio) doesn't support the Gemma4 thinking format? Could you provide a side-by-side with the Teich model & a regular Gemma 4 model?

TeichAI org Apr 10

Oh, that's because it's not trained to emit a newline after the closing channel tag. So if you don't have reasoning parsing set up properly, markdown renderers won't know to start after the end of the <channel|> tag.

TeichAI org Apr 10

The model also breaks by starting to repeat itself mid-generation once given a long enough task, and this turned out to be because flash attention was enabled. (I am unsure whether this is normal for local models, where some can use it and some can't, but this model cannot use flash attention, or at least not when some parts of the model are offloaded.)

I did see this flash attention issue as well, though. The v2 was working better with FA on but still trips up occasionally.

Apr 10

edited Apr 10

Seems like your agent runner (looks like LMStudio) doesn't support the Gemma4 thinking format? Could you provide a side-by-side with the Teich model & a regular Gemma 4 model?

Your model

Original Gemma 4 Model

Note: I have not been able to get Gemma 4-31B it to think in LM Studio. I've seen online that it has a reasoning capability, but I have yet to see the variant I downloaded think. The original variant shown here is the LM Studio Community edition one.

TeichAI org Apr 10

Oh, that's because it's not trained to emit a newline after the closing channel tag. So if you don't have reasoning parsing set up properly, markdown renderers won't know to start after the end of the <channel|> tag.

I think @armand0e got this right.

Apr 10

The model also breaks by starting to repeat itself mid-generation once given a long enough task, and this turned out to be because flash attention was enabled. (I am unsure whether this is normal for local models, where some can use it and some can't, but this model cannot use flash attention, or at least not when some parts of the model are offloaded.)

I did see this flash attention issue as well, though. The v2 was working better with FA on but still trips up occasionally.

Mind you, I am offloading because I have a 5090 and 128GB of RAM.

TeichAI org Apr 10

I don't see how offloading could cause this. You could try only using the CPU. But other than that I would recommend waiting for v2

Apr 10

I don't see how offloading could cause this. You could try only using the CPU. But other than that I would recommend waiting for v2

V2 it is then. i'll keep an eye out 👍

TeichAI org Apr 10

Seems resolved enough to close.

CompactAI changed discussion status to closed Apr 10

TeichAI org Apr 10

Only took 2 hours to solve this. That might be a record. 🤣

TeichAI org Apr 10

So you are testing in LM Studio then, correct? If so, here is your fix:

1. Head to the models tab
2. Click the gear icon next to our model
3. Select the inference tab all the way to the right and expand the Reasoning Parsing section
4. Change the Start String to <|channel>thought and the End String to <channel|>
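To illustrate what those two strings are for, here is a minimal sketch of splitting raw output into reasoning and answer. It is not LM Studio's actual implementation; the function name and the sample text are made up, and only the two marker strings come from the steps above.

```python
from typing import Tuple

START = "<|channel>thought"  # Start String from the steps above
END = "<channel|>"           # End String from the steps above


def split_reasoning(text: str, start: str = START, end: str = END) -> Tuple[str, str]:
    """Return (reasoning, answer). If either marker is missing,
    the reasoning is empty and the text is returned unchanged."""
    s = text.find(start)
    if s == -1:
        return "", text
    e = text.find(end, s + len(start))
    if e == -1:
        return "", text
    reasoning = text[s + len(start):e].strip()
    # Drop the whole reasoning block so the answer starts on its own line,
    # even when no newline follows the closing tag.
    answer = (text[:s] + text[e + len(end):]).lstrip()
    return reasoning, answer


# Made-up output with no newline after the closing tag.
raw = "<|channel>thought\nUser wants a heading.<channel|># Title\nBody text."
reasoning, answer = split_reasoning(raw)
print(repr(reasoning))
print(repr(answer))
```

Without this split, the `# Title` heading sits mid-line after the closing tag, which is why a markdown renderer would not treat it as a heading.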

armand0e changed discussion status to open Apr 10

TeichAI org Apr 10

edited Apr 10

Let me know if it works. Personally, I think you may get past that first hurdle and just be met with other issues. I will be updating these GGUFs again momentarily with the latest llama.cpp Gemma 4 fixes.

Apr 10

TeichAI org Apr 10

guessing it's the early-stopping/truncation issue with the old ggufs

updates going live now

TeichAI org Apr 11

please try again with the latest ggufs, they are up and tested. Confirmed working on my end (via llama.cpp chat ui)

Apr 11

I don't think the tokenizer changed; any gemma4 model should be supported as a drafting model.

Well, the E models have per-layer embeddings, which the 31B does not have. They have only 131k context while the 31B has 256k, and their tokenizer works with video and audio, whereas the 31B model only works with vision.