Would love to try a quantized version!

#2
by dillfrescott - opened

If you, or anyone you can reach out to, could quantize this model to GGUF, I'd be very happy!

Oh wait, I found this. Is that the same as this one?

It's not the same model, but it could be good! There have been a lot of issues regarding GGUF and Llama 3, but I'll look into it if the model has OK scores on the Open LLM Leaderboard.
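For reference, the usual llama.cpp route is a two-step conversion. Here is a minimal sketch, assuming a local llama.cpp checkout; the model directory, output filenames, and tool names are placeholders from the standard tooling, not anything verified against this model:

```python
# Rough sketch of the llama.cpp GGUF conversion flow (paths are assumptions).
import subprocess

MODEL_DIR = "Meta-Llama-3-120B-Instruct"              # local HF snapshot (placeholder)
F16_GGUF = "meta-llama-3-120b-instruct-f16.gguf"
Q4_GGUF = "meta-llama-3-120b-instruct-Q4_K_M.gguf"

# 1. Convert the safetensors checkpoint to an f16 GGUF.
subprocess.run(
    ["python", "convert-hf-to-gguf.py", MODEL_DIR,
     "--outfile", F16_GGUF, "--outtype", "f16"],
    check=True,
)

# 2. Quantize the f16 GGUF down to Q4_K_M with llama.cpp's quantize tool
#    (named `llama-quantize` in newer builds).
subprocess.run(["./quantize", F16_GGUF, Q4_GGUF, "Q4_K_M"], check=True)
```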

Ok interesting. Thanks!

I second the need for GGUF quants.

I have taken an interest in this.

Damn, well I guess that means it's going to have a GGUF soon, Eric doesn't mess around! Thank you!!! Can't wait to try it!

image.png

this thing is smarter than Opus

working on uploading it, it'll be tomorrow.

Amazing, thanks @ehartford :)

what's the difference between this model and imi2's model? Merge config is the same.

Hope to see exl2 and eventually a 103b. I actually liked the 103s more than the 120s.

> this thing is smarter than Opus

This is hype, I am itching to try this!

> what's the difference between this model and imi2's model? Merge config is the same.

I believe it's probably inspired by Goliath, so imi2 probably used the same method as this repo.
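If it really is the Goliath-style approach, the merge would be a mergekit passthrough self-merge over interleaved layer ranges of the 70B. A minimal sketch under that assumption; the layer ranges and output paths here are illustrative guesses, not the actual config from either repo:

```python
# Illustrative Goliath-style passthrough self-merge config for mergekit.
# Layer ranges are assumptions for illustration; check the model card for the real ones.
import subprocess
import yaml

BASE = "meta-llama/Meta-Llama-3-70B-Instruct"  # 80-layer base model

# Overlapping layer slices stacked into one deeper model.
ranges = [(0, 20), (10, 30), (20, 40), (30, 50), (40, 60), (50, 70), (60, 80)]

config = {
    "slices": [{"sources": [{"model": BASE, "layer_range": list(r)}]} for r in ranges],
    "merge_method": "passthrough",
    "dtype": "float16",
}

with open("merge-config.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)

# mergekit's CLI reads the YAML and writes the merged checkpoint.
subprocess.run(["mergekit-yaml", "merge-config.yaml", "./merged-120b"], check=True)
```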

Took a shot at GGUF. QuantFactory/Meta-Llama-3-120B-Instruct-GGUF
Let me know if this works as expected

Okay I can try it later today hopefully! Thanks!

Screenshot_20240505_151655.png

OOF. Not impressed with the Q2, at least...

Probably related to how terribly Llama 3 handles being quantized.

I tried a lot of these merged models, and below Q4 they were not better. At least 3.75 bpw, or it's not worth it.
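For a rough sense of why anything below ~4 bits is tempting at this size, here is some back-of-the-envelope arithmetic; the parameter count and bits-per-weight figures are approximations, not exact k-quant numbers:

```python
# Approximate GGUF file sizes for a ~120B-parameter model at common bit-widths.
params = 122e9  # rough parameter count of the 120B merge (assumption)

for name, bpw in [("Q2_K", 2.6), ("Q3_K_M", 3.9), ("Q4_K_M", 4.8), ("Q5_K_M", 5.7), ("f16", 16.0)]:
    gib = params * bpw / 8 / 1024**3
    print(f"{name:>7}: ~{gib:.0f} GiB")
```

So Q2 roughly halves the download versus Q4_K_M, which is why it's attractive, even if the quality drop at this bit-width apparently isn't worth it.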

Cool! Thanks Eric!

Time to rent a runpod machine again... LOL

I think I can officially close this, since multiple people have posted quants, thus solving the issue.

dillfrescott changed discussion status to closed

> Probably related to how terribly Llama 3 handles being quantized.

Do you have any resources on hand relating to this? I've noticed this too and would like to dig deeper into why it's happening.

> Do you have any resources on hand relating to this? I've noticed this too and would like to dig deeper into why it's happening.

I have seen multiple Reddit threads talking about it on LocalLLaMA. I believe it was also mentioned here on HF. As for the details, I do not know. Sorry!

GGUF still has issues; they keep cropping up. Hence I am wary of downloading 60 GB+ of it. EXL2 didn't appear to suffer from this problem. Every time there are "quantization" issues, the user is always running llama.cpp.

Interesting, perhaps the issue lies with llama.cpp itself and not the model.

QuantFactory/Meta-Llama-3-225B-Instruct-GGUF: 225B, if anyone can use them :)
