Tsukasa burns down a handicapped hospital.

#2 opened by Ayalexf

It's criminal you didn't release an L3 8B or Qwen 2 7B tune alongside it. I mean, sure, they're a pain in the arse to train, but you're smart folks. You'd figure it out! Good learning rate, multiple epochs... yeah, science!

[GIF: Kudamaki Tsukasa]

Working on it already. Should have Qwen-2 7B, Qwen-2 47B, and Qwen-1.5 32B done by the end of the day, if they pass internal tests.

> Working on it already. Should have Qwen-2 7B, Qwen-2 47B, and Qwen-1.5 32B done by the end of the day, if they pass internal tests.

Yup, another AlpinDale classic. Really makes me want to use Aphrodite. God bless.

I think you mean Qwen 2 57B. I'm really interested in that MoE model and what it could do.

How do you test your models internally? I'm a novice and trying to build RP models :) @alpindale

Really curious what Qwen2-57B-A14B can do when finetuned. It's the exact same size as Mixtral 8x7B, right? 8 7B experts with 2 active ones.

> Working on it already. Should have Qwen-2 7B, Qwen-2 47B, and Qwen-1.5 32B done by the end of the day, if they pass internal tests.

Will they be available over on the pygsite for testing?

> Really curious what Qwen2-57B-A14B can do when finetuned. It's the exact same size as Mixtral 8x7B, right? 8 7B experts with 2 active ones.

Qwen 57B would be bigger by a bit: 56.3B vs 47B total parameters (I'm guessing they use a different MoE type? I can't find any papers stating what kind is used). Speed-wise it would be almost identical, since Mixtral has about 13B active parameters vs 14B for Qwen.
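
If anyone wants to sanity-check those numbers, here's a rough back-of-envelope sketch using the published Mixtral-8x7B config (hidden size 4096, 32 layers, GQA with 8 KV heads, 8 SwiGLU experts with 14336 intermediate size, top-2 routing). It ignores norms and biases, and it doesn't cover Qwen2-57B-A14B, since that one reportedly uses a different, finer-grained expert layout:

```python
# Back-of-envelope parameter count for a Mixtral-style MoE (top-2 of 8 experts).
vocab      = 32_000
hidden     = 4_096
n_layers   = 32
n_heads    = 32
n_kv_heads = 8            # grouped-query attention
head_dim   = hidden // n_heads
ffn_hidden = 14_336       # per-expert SwiGLU intermediate size
n_experts  = 8
active_exp = 2            # top-2 routing

# Attention: Q/O are hidden x hidden, K/V are shrunk by GQA.
attn   = 2 * hidden * n_heads * head_dim + 2 * hidden * n_kv_heads * head_dim
expert = 3 * hidden * ffn_hidden          # gate, up, down projections
router = hidden * n_experts

per_layer_total  = attn + router + n_experts * expert
per_layer_active = attn + router + active_exp * expert
embeddings       = 2 * vocab * hidden     # input embeddings + untied LM head

print(f"total  ~ {(embeddings + n_layers * per_layer_total)  / 1e9:.1f}B")  # ~46.7B
print(f"active ~ {(embeddings + n_layers * per_layer_active) / 1e9:.1f}B")  # ~12.9B
```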

Qwen2-57B: [VRAM calculator screenshot]
Mixtral-8x7B: [VRAM calculator screenshot]
^ NyxKrage/LLM-Model-VRAM-Calculator, which now supports IQ quants :3
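
For anyone without the calculator open, here's a minimal sketch of the same kind of estimate (not the calculator's actual code): weights at a chosen bit-width plus an fp16 KV cache. The Mixtral config values are the published ones; the bit-width, overhead factor, and context length are assumptions to tweak.

```python
# Crude VRAM estimate: quantized weights + fp16 KV cache + a fudge factor.
def estimate_vram_gib(n_params, bits_per_weight, n_layers, n_kv_heads,
                      head_dim, context_len, kv_bytes=2, overhead=1.1):
    weights  = n_params * bits_per_weight / 8          # bytes for the weights
    # KV cache: one K and one V vector per layer per token.
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * kv_bytes * context_len
    return (weights + kv_cache) * overhead / 1024**3

# Mixtral-8x7B at ~4.25 bits/weight (roughly Q4_K_M) with 8k context.
print(round(estimate_vram_gib(n_params=46.7e9, bits_per_weight=4.25,
                              n_layers=32, n_kv_heads=8, head_dim=128,
                              context_len=8192), 1), "GiB")  # ~26.5 GiB
```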
