Benchmarks?
Hey Sneed, thanks for testing it out. The differences were imperceptible to me at first, so it's nice to see them quantified. 15% does seem brutal, and I hope my Endurance finetune fixed that. But either way, if this opens the door to those who could never run 123B, then I think that's an overall win. I haven't run any quantitative tests on it myself, but I'm hoping someone can do us a solid and find out! (And on the Endurance tune as well.)
15% does seem brutal
Well, you've cut away 20% of the model...
15% does seem brutal
Well, you've cut away 20% of the model...
I was told these layers were useless!
@ChuckMcSneed out of curiosity, which GGUF of Mistral Large did you use for testing?
If you're willing, can you try my Q6_K quant? I would be surprised if there was a big difference from imatrix, but I'm extremely curious: https://huggingface.co/bartowski/Lazarus-2407-100B-GGUF
@ChuckMcSneed out of curiosity, which GGUF of mistral large did you use for testing?
Good question! Instead of downloading the model, I copied over the mergekit YAML and made it locally from Largestral 2407 weights on my drive. I then converted it with llama.cpp version 4261 (2759916d) with `--outtype bf16`. To keep as much of the performance as possible, I added the `--output-tensor-type BF16 --token-embedding-type BF16` arguments to llama-quantize.
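For reference, the whole pipeline looks roughly like this (paths, file names, and the final quant type below are placeholders, not the exact ones I used):

```bash
# Rebuild the merge locally from the Largestral 2407 weights on disk
mergekit-yaml lazarus-2407-100b.yaml ./Lazarus-2407-100B

# Convert the merged HF checkpoint to GGUF in bf16 (llama.cpp b4261)
python convert_hf_to_gguf.py ./Lazarus-2407-100B \
    --outtype bf16 --outfile Lazarus-2407-100B-BF16.gguf

# Quantize, keeping the output and embedding tensors in BF16
# (the Q6_K target here is just an example)
./llama-quantize --output-tensor-type BF16 --token-embedding-type BF16 \
    Lazarus-2407-100B-BF16.gguf Lazarus-2407-100B-Q6_K.gguf Q6_K
```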
If you're willing, can you try my Q6_K quant?
Why? Is there something special about them? Are you trying to give me pickled GGUF?
Is there something special about them?
Imatrix.
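In other words, the quants are made with an importance matrix computed over a calibration set, so the quantization error is weighted by how much each weight actually matters on that data. A rough sketch of the extra steps with llama.cpp (the calibration file and model names below are just placeholders):

```bash
# Compute an importance matrix from a calibration text
./llama-imatrix -m Lazarus-2407-100B-BF16.gguf \
    -f calibration.txt -o lazarus.imatrix

# Pass it to llama-quantize so the quantizer prioritizes the weights
# that were most influential on the calibration data
./llama-quantize --imatrix lazarus.imatrix \
    Lazarus-2407-100B-BF16.gguf Lazarus-2407-100B-Q6_K.gguf Q6_K
```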
The results are in! And on average they are the same.
@bartowski's quant failed the same number of tasks, but at different parts of the benchmark. So there is a difference in outputs, but it isn't significant enough to shift the average up or down.
Endurance-100B took a nosedive on UGI too.