Fast work by the people on the llama.cpp team

#8
by qaraleza - opened

Thanks!

Big thanks to the team for their contributions solving multiple problems.
It seems someone is still reporting the broken-output problem on the Metal backend?
I ran into the same phenomenon with an IQ3 variant.
I hope this fix resolves it.

Yes, a lot of the code had to be updated to int64 because the tensor sizes of this model exceed the maximum int32 value, causing an overflow. As far as I know, this currently affects the Metal build (and possibly other backends) as well as the perplexity tool. I tested the CUDA backend successfully with all the weights from this HF repo.
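For anyone curious what this failure class looks like, here is a minimal, hypothetical sketch (not llama.cpp's actual code, and the dimensions are made up): an element count that exceeds INT32_MAX is silently corrupted the moment it passes through a 32-bit integer, and every size or offset derived from it downstream goes wrong.

```cpp
// Hypothetical sketch of the int32 overflow class described above.
#include <cstdint>
#include <cstdio>

int main() {
    // Made-up dimensions of a single very large tensor.
    const int64_t ne0 = 65536;  // rows
    const int64_t ne1 = 65536;  // columns -> 2^32 elements total

    const int64_t n_elements = ne0 * ne1;           // correct 64-bit count
    const int32_t n_narrow = (int32_t) n_elements;  // truncated to 32 bits

    // 4294967296 elements truncate to 0: buffer lengths, loop bounds,
    // and offsets computed from the narrow value are all wrong, which
    // is how you end up with garbage output.
    printf("64-bit count: %lld\n", (long long) n_elements);
    printf("32-bit count: %d\n", n_narrow);
    return 0;
}
```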

I'm not sure how often the tensor size itself is referenced in the code, but I suspect it needs a thorough revision.
So I'll wait patiently.
