fp16 version

#2 opened by vmajor

Could you upload the fp16, non-quantized version? I could then make the q6_K version myself, or even try to load the 'native' fp16. I am interested in understanding how quantization affects the quality of responses in my use cases, and unless @ehartford gets the hardware to make 65B Wizards or @allenai gives us access to 65B Tulu, Guanaco is currently still the most consistently performant model. Thus, I want to get the most out of it on my hardware. I cannot run it with transformers due to its dependence on bitsandbytes and a GPU.

I already have, here: https://huggingface.co/TheBloke/guanaco-65B-HF

I skipped q6_K out of laziness, because it's too large to upload as a single file and I'd have to ZIP it in two parts. I guess I could revisit that and do it. But you can make your own from the link above.

Ha, that's great, thank you! Downloading it now. I'll make my own 6_K.
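
For anyone following along, the usual route at this point is llama.cpp's conversion and quantization tools. Below is a minimal sketch of that, wrapped in Python; the local paths and output filenames are my own placeholders, and it assumes you're running from a built llama.cpp checkout:

```python
# Sketch only: convert the fp16 HF weights to a GGML f16 file with llama.cpp's
# convert.py, then quantize that to q6_K with the quantize binary built from
# the same repo. All paths and filenames below are placeholders.
import subprocess

HF_DIR = "guanaco-65B-HF"                # local clone of TheBloke/guanaco-65B-HF
F16_BIN = "guanaco-65B.ggmlv3.f16.bin"   # intermediate GGML f16 (~130 GB for 65B)
Q6K_BIN = "guanaco-65B.ggmlv3.q6_K.bin"  # final q6_K output

# Step 1: HF fp16 -> GGML f16
subprocess.run(
    ["python", "convert.py", HF_DIR, "--outtype", "f16", "--outfile", F16_BIN],
    check=True,
)

# Step 2: GGML f16 -> q6_K
subprocess.run(["./quantize", F16_BIN, Q6K_BIN, "q6_K"], check=True)
```

At roughly 6.56 bits per weight, a 65B q6_K lands around 53 GB, which is exactly what runs into the 50 GB per-file upload limit mentioned below.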

...and I feel your pain when it comes to files over 50 GB... I finally got them split, renamed, and the repository configured. It took me two days, because I can't just sit here and watch it all happen, or fail. So tedious. Anyway, there will be fp16 and q6_K versions in my repository "soon".
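
Since the splitting itself is scriptable, here's a minimal sketch of the idea; the chunk size, buffer size, and `.partNN` naming are arbitrary illustrative choices, not what was actually used for this repo:

```python
# Sketch: split a large model file into fixed-size binary parts that fit under
# the Hub's 50 GB per-file cap, streaming in small buffers to keep memory low.
CHUNK = 40 * 1024**3  # 40 GiB per part (arbitrary, comfortably under 50 GB)
BUF = 64 * 1024**2    # 64 MiB copy buffer

def split_file(path: str, chunk_size: int = CHUNK, buf_size: int = BUF) -> None:
    part = 0
    with open(path, "rb") as src:
        buf = src.read(min(buf_size, chunk_size))
        while buf:
            with open(f"{path}.part{part:02d}", "wb") as dst:
                written = 0
                while buf:
                    dst.write(buf)
                    written += len(buf)
                    if written >= chunk_size:
                        break
                    buf = src.read(min(buf_size, chunk_size - written))
            part += 1
            buf = src.read(min(buf_size, chunk_size))

split_file("guanaco-65B.ggmlv3.f16.bin")  # hypothetical filename
```

Parts made this way rejoin with a plain byte-for-byte concatenation before loading.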

vmajor changed discussion status to closed

Thanks for reminding me - I've just uploaded the q6_K to this repo as a multi-part ZIP.
