Model

#5
by mrfakename - opened

Hi,
Thank you for releasing this model! Would you mind sharing some details on how it was trained, and what the training data is?
Thanks!

(gif: "how do we tell him" Mr. Krabs meme)

Is this a frankenmerge?

mrfakename, this model is most likely a leak of mistral medium.

Interesting! According to nisten it's a frankenmerge, do you know if that's accurate?

I had not seen this, thanks for the info

He initially claimed it was a MoE, so I'd take this with a grain of salt. It outperforms Mistral 7B by a mile in my testing, though.

Frankenmerges can be MoEs, right?

Correct

Does anybody know, at the same quant level, whether this model or this model is better?

Nisten made all kinds of claims, some rather insane ones in the beginning... yet I tested the model and it's relatively good. If it's a merge, then a merge of what? Who else that put out a model recently uses Mistral's prompt format? I suggest people just try it if they have the memory, at least at Q4.

It chats well and it's not dumb; that's all that matters. Going by benchmarks, I've downloaded tons of disappointments off the leaderboard.

Yes! So many models are disappointing when evaluated with real world usage.

this looks like a 7x11 MoE fine-tuned on mistral-medium synthetic data. it does mimic mistral's style very closely.

@aigeek0x0 do you see an MoE router in the GGUF? It's not an MoE.
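
You can check the metadata yourself. Here's a rough sketch using the `gguf` Python package (untested, and the filename is a placeholder):

```python
# Rough sketch: look for MoE metadata in a GGUF file (filename is a placeholder).
# Mixtral-style MoE files carry an expert-count key; dense Llama models don't.
from gguf import GGUFReader  # pip install gguf

reader = GGUFReader("miqu-1-70b.q4_k_m.gguf")  # placeholder path
moe_keys = [name for name in reader.fields if "expert" in name]
print(moe_keys or "no expert metadata -> dense model, no MoE router")
```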

https://twitter.com/teortaxesTex/status/1752459593416847570

Interesting. Not sure if true but seems possible.

Excellent model. Reminds me of Claude. It's willing to consider alternative solutions, takes advice, and will mold its answers to insights introduced in the prompt. Tested it with the difficult Aunt Agatha riddle and it handled it well.

Apparently someone succeeded in dequantizing it to fp16, with 70+ MMLU scores:
https://huggingface.co/152334H/miqu-1-70b-sf

Hmm. That makes no sense. How can you "add" precision to it? That would be like taking a blurry picture and making it clear again with all the detail.

the model appears to be legit, resembling "mistral-medium" as mentioned on https://twitter.com/teortaxesTex/status/1752459593416847570.

it (mistral-70b-instruct-alpha01) was likely trained on the Llama architecture, possibly for a quick presentation to investors.

this model is fine-tuned and adept at following instructions. based on my experiments, i can confirm that it is also aligned for safety.

The 5-bit EXL2 performs OK; it gets a perplexity of 11 on PTB_NEW. I still have to check it against the Q4_K_M I have. So the re-compression wasn't the end of the world.
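
For anyone wanting to reproduce numbers like that: perplexity is just the exponentiated mean negative log-likelihood of the eval text. A minimal sketch (the eval file is a placeholder, and real runs stride a window over much longer texts):

```python
# Minimal perplexity sketch: ppl = exp(mean token negative log-likelihood).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("152334H/miqu-1-70b-sf")
model = AutoModelForCausalLM.from_pretrained(
    "152334H/miqu-1-70b-sf", torch_dtype=torch.float16, device_map="auto"
)

text = open("ptb_new.txt").read()  # placeholder eval file
ids = tok(text, return_tensors="pt").input_ids[:, :4096].to(model.device)
with torch.no_grad():
    loss = model(ids, labels=ids).loss  # mean cross-entropy over shifted tokens
print("perplexity:", torch.exp(loss).item())
```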

@jeffwadsworth

It doesn't add any precision, but the fp16 PyTorch file format is much more universal and easier to work with if you want to do finetuning. It's the same blurry image, but now you have it in digital form and can do things to it in Photoshop, instead of being limited to what you can do to a physical photo with scissors, markers, and other physical tools.
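
To make that concrete, here's a toy round trip (plain absmax int4, a stand-in for the fancier GGUF quant schemes): quantization snaps the weights to a coarse grid, and dequantizing back to fp16 only re-expresses those same grid values at higher precision; the lost detail never comes back.

```python
# Toy illustration: 4-bit absmax quantize/dequantize round trip in numpy.
# Dequantization changes the dtype back to fp16 but recovers no lost detail.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=8).astype(np.float16)                # original fp16 weights

scale = np.abs(w).max() / 7                              # absmax scale, int4 range [-7, 7]
q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)  # quantized weights
w_dq = (q * scale).astype(np.float16)                    # "dequantized" back to fp16

print("original   :", w)
print("dequantized:", w_dq)                              # same dtype, coarser values
print("max error  :", np.abs(w - w_dq).max())
```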

Wow. Crazy.

Well, I guess it is a leak.

mrfakename changed discussion status to closed

It's pretty obvious it was some sort of leak, considering the lack of information about its creation process!
