Fast work by the people on the llama.cpp team

#8
by qaraleza - opened

Thanks!

Big thanks to the team for their contributions solving multiple problems.
It seems someone is still reporting the broken-output problem on the Metal backend?
I ran into the same phenomenon with an IQ3 variant.
I hope this fix resolves it.

Yes, a lot of the code had to be updated to int64 because the tensor sizes of this model exceed the maximum int32 value, causing an overflow. As far as I know, this currently affects the Metal build (and possibly other backends) as well as the perplexity tool. I tested the CUDA backend successfully with all the weights from this HF repo.
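For anyone curious what this failure class looks like, here is a minimal, hypothetical sketch (not llama.cpp's actual code, and the dimensions are made up): an element count that exceeds INT32_MAX is silently corrupted the moment it passes through a 32-bit integer, and every size or offset derived from it downstream goes wrong.

```cpp
// Hypothetical sketch of the int32 overflow class described above.
#include <cstdint>
#include <cstdio>

int main() {
    // Made-up dimensions of a single very large tensor.
    const int64_t ne0 = 65536;  // rows
    const int64_t ne1 = 65536;  // columns -> 2^32 elements total

    const int64_t n_elements = ne0 * ne1;           // correct 64-bit count
    const int32_t n_narrow = (int32_t) n_elements;  // truncated to 32 bits

    // 4294967296 elements truncate to 0: buffer lengths, loop bounds,
    // and offsets computed from the narrow value are all wrong, which
    // is how you end up with garbage output.
    printf("64-bit count: %lld\n", (long long) n_elements);
    printf("32-bit count: %d\n", n_narrow);
    return 0;
}
```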

I'm not sure how often the tensor size itself is referenced in the code, but I suspect it needs a thorough revision.
So I'll wait patiently.
