Tsukasa burns down a handicapped hospital.

#2 opened by Ayalexf

It's criminal you didn't release an L3 8B or Qwen 2 7B tune alongside it. I mean, sure, they're a pain in the arse to train, but you're smart folks. You'd figure it out! Good learning rate, multiple epochs... yeah, science!

[GIF: Kudamaki Tsukasa]

Working on it already. Should have Qwen-2 7B, Qwen-2 47B, and Qwen-1.5 32B done by the end of the day, if they pass internal tests.

> Working on it already. Should have Qwen-2 7B, Qwen-2 47B, and Qwen-1.5 32B done by the end of the day, if they pass internal tests.

Yup, another AlpinDale classic. Really makes me want to use Aphrodite. God bless.

I think you mean Qwen 2 57B. I'm really interested in that MoE model and what it could do.

How do you test your models internally? I'm a novice and trying to build RP models :) @alpindale

Really curious what Qwen2-57B-A14B can do when finetuned. It's the exact same size as Mixtral 8x7B, right? 8 7B experts with 2 active ones.

> Working on it already. Should have Qwen-2 7B, Qwen-2 47B, and Qwen-1.5 32B done by the end of the day, if they pass internal tests.

Will they be available over on the pygsite for testing?

> Really curious what Qwen2-57B-A14B can do when finetuned. It's the exact same size as Mixtral 8x7B, right? 8 7B experts with 2 active ones.

Qwen 57B would be bigger by a bit: 56.3B vs 47B total parameters (I'm guessing they use a different MoE type? I can't find any papers stating what kind is used). Speed-wise it would be almost identical, since Mixtral has about 13B active parameters vs 14B for Qwen.
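
If anyone wants to sanity-check those numbers, here's a rough back-of-envelope sketch using the published Mixtral-8x7B config (hidden size 4096, 32 layers, GQA with 8 KV heads, 8 SwiGLU experts with 14336 intermediate size, top-2 routing). It ignores norms and biases, and it doesn't cover Qwen2-57B-A14B, since that one reportedly uses a different, finer-grained expert layout:

```python
# Back-of-envelope parameter count for a Mixtral-style MoE (top-2 of 8 experts).
vocab      = 32_000
hidden     = 4_096
n_layers   = 32
n_heads    = 32
n_kv_heads = 8            # grouped-query attention
head_dim   = hidden // n_heads
ffn_hidden = 14_336       # per-expert SwiGLU intermediate size
n_experts  = 8
active_exp = 2            # top-2 routing

# Attention: Q/O are hidden x hidden, K/V are shrunk by GQA.
attn   = 2 * hidden * n_heads * head_dim + 2 * hidden * n_kv_heads * head_dim
expert = 3 * hidden * ffn_hidden          # gate, up, down projections
router = hidden * n_experts

per_layer_total  = attn + router + n_experts * expert
per_layer_active = attn + router + active_exp * expert
embeddings       = 2 * vocab * hidden     # input embeddings + untied LM head

print(f"total  ~ {(embeddings + n_layers * per_layer_total)  / 1e9:.1f}B")  # ~46.7B
print(f"active ~ {(embeddings + n_layers * per_layer_active) / 1e9:.1f}B")  # ~12.9B
```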

Qwen2-57B: [VRAM calculator screenshot]
Mixtral-8x7B: [VRAM calculator screenshot]
^ NyxKrage/LLM-Model-VRAM-Calculator, which now supports IQ quants :3
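
For anyone without the calculator open, here's a minimal sketch of the same kind of estimate (not the calculator's actual code): weights at a chosen bit-width plus an fp16 KV cache. The Mixtral config values are the published ones; the bit-width, overhead factor, and context length are assumptions to tweak.

```python
# Crude VRAM estimate: quantized weights + fp16 KV cache + a fudge factor.
def estimate_vram_gib(n_params, bits_per_weight, n_layers, n_kv_heads,
                      head_dim, context_len, kv_bytes=2, overhead=1.1):
    weights  = n_params * bits_per_weight / 8          # bytes for the weights
    # KV cache: one K and one V vector per layer per token.
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * kv_bytes * context_len
    return (weights + kv_cache) * overhead / 1024**3

# Mixtral-8x7B at ~4.25 bits/weight (roughly Q4_K_M) with 8k context.
print(round(estimate_vram_gib(n_params=46.7e9, bits_per_weight=4.25,
                              n_layers=32, n_kv_heads=8, head_dim=128,
                              context_len=8192), 1), "GiB")  # ~26.5 GiB
```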
