Would love to try a quantized version!

#2
by dillfrescott - opened

If you, or anyone you can reach out to, could quantize this model to GGUF, I'd be very happy!

Oh wait, I found this. Is that the same as this one?

It's not the same model, but it could be good! There have been a lot of issues regarding GGUF and Llama 3, but I'll look into it if the model has OK scores on the Open LLM Leaderboard.
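For reference, the usual llama.cpp route is a two-step conversion. Here is a minimal sketch, assuming a local llama.cpp checkout; the model directory, output filenames, and tool names are placeholders from the standard tooling, not anything verified against this model:

```python
# Rough sketch of the llama.cpp GGUF conversion flow (paths are assumptions).
import subprocess

MODEL_DIR = "Meta-Llama-3-120B-Instruct"              # local HF snapshot (placeholder)
F16_GGUF = "meta-llama-3-120b-instruct-f16.gguf"
Q4_GGUF = "meta-llama-3-120b-instruct-Q4_K_M.gguf"

# 1. Convert the safetensors checkpoint to an f16 GGUF.
subprocess.run(
    ["python", "convert-hf-to-gguf.py", MODEL_DIR,
     "--outfile", F16_GGUF, "--outtype", "f16"],
    check=True,
)

# 2. Quantize the f16 GGUF down to Q4_K_M with llama.cpp's quantize tool
#    (named `llama-quantize` in newer builds).
subprocess.run(["./quantize", F16_GGUF, Q4_GGUF, "Q4_K_M"], check=True)
```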

Ok interesting. Thanks!

I second the need for GGUF quants.

I have taken an interest in this.

Damn, well I guess that means it's going to have a GGUF soon, Eric doesn't mess around! Thank you!!! Can't wait to try it!

image.png

this thing is smarter than Opus

working on uploading it, it'll be tomorrow.

Amazing, thanks @ehartford :)

what's the difference between this model and imi2's model? Merge config is the same.

Hope to see exl2 and eventually a 103b. I actually liked the 103s more than the 120s.

> this thing is smarter than Opus

This is hype, I am itching to try this!

> what's the difference between this model and imi2's model? Merge config is the same.

I believe it's probably inspired by Goliath, so imi2 probably used the same method as this repo.
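If it really is the Goliath-style approach, the merge would be a mergekit passthrough self-merge over interleaved layer ranges of the 70B. A minimal sketch under that assumption; the layer ranges and output paths here are illustrative guesses, not the actual config from either repo:

```python
# Illustrative Goliath-style passthrough self-merge config for mergekit.
# Layer ranges are assumptions for illustration; check the model card for the real ones.
import subprocess
import yaml

BASE = "meta-llama/Meta-Llama-3-70B-Instruct"  # 80-layer base model

# Overlapping layer slices stacked into one deeper model.
ranges = [(0, 20), (10, 30), (20, 40), (30, 50), (40, 60), (50, 70), (60, 80)]

config = {
    "slices": [{"sources": [{"model": BASE, "layer_range": list(r)}]} for r in ranges],
    "merge_method": "passthrough",
    "dtype": "float16",
}

with open("merge-config.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)

# mergekit's CLI reads the YAML and writes the merged checkpoint.
subprocess.run(["mergekit-yaml", "merge-config.yaml", "./merged-120b"], check=True)
```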

Took a shot at GGUF. QuantFactory/Meta-Llama-3-120B-Instruct-GGUF
Let me know if this works as expected

Okay I can try it later today hopefully! Thanks!

Screenshot_20240505_151655.png

OOF. Not impressed with the Q2, at least...

Probably related to how terribly Llama 3 handles being quantized.

I tried a lot of these merged models, and below Q4 they were not better. At least 3.75 bpw, or it's not worth it.
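For a rough sense of why anything below ~4 bits is tempting at this size, here is some back-of-the-envelope arithmetic; the parameter count and bits-per-weight figures are approximations, not exact k-quant numbers:

```python
# Approximate GGUF file sizes for a ~120B-parameter model at common bit-widths.
params = 122e9  # rough parameter count of the 120B merge (assumption)

for name, bpw in [("Q2_K", 2.6), ("Q3_K_M", 3.9), ("Q4_K_M", 4.8), ("Q5_K_M", 5.7), ("f16", 16.0)]:
    gib = params * bpw / 8 / 1024**3
    print(f"{name:>7}: ~{gib:.0f} GiB")
```

So Q2 roughly halves the download versus Q4_K_M, which is why it's attractive, even if the quality drop at this bit-width apparently isn't worth it.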

Cool! Thanks Eric!

Time to rent a runpod machine again... LOL

I think I can officially close this, since multiple people have posted quants, thus solving the issue.

dillfrescott changed discussion status to closed

> Probably related to how terribly Llama 3 handles being quantized.

Do you have any resources on hand relating to this? I've noticed this too and would like to dig deeper into why it's happening.

> Do you have any resources on hand relating to this? I've noticed this too and would like to dig deeper into why it's happening.

I have seen multiple Reddit threads talking about it on LocalLLaMA. I believe it was also mentioned here on HF. As for the details, I do not know. Sorry!

GGUF still has issues; they keep cropping up. Hence I am wary of downloading 60 GB+ of it. EXL2 didn't appear to suffer from this problem. Every time there are "quantization" issues, the user is always running llama.cpp.

Interesting, perhaps the issue lies with llama.cpp itself and not the model.

QuantFactory/Meta-Llama-3-225B-Instruct-GGUF: 225B, if anyone can use them :)
