fp16 version

#2 opened by vmajor

Could you upload the fp16, non-quantized version? I could then make the q6_K version myself, or even try to load the 'native' fp16. I am interested in understanding how quantization affects the quality of responses in my use cases, and unless @ehartford gets the hardware to make 65B Wizards or @allenai gives us access to 65B Tulu, Guanaco is currently still the most consistently performant model. Thus, I want to get the most out of it on my hardware. I cannot run it with transformers due to its dependence on bitsandbytes and a GPU.

I already have, here: https://huggingface.co/TheBloke/guanaco-65B-HF

I skipped q6_K out of laziness, because it's too large to upload as a single file and I'd have to ZIP it in two parts. I guess I could revisit that and do it. But you can make your own from the link above.

Ha, that's great, thank you! Downloading it now. I'll make my own 6_K.
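
For anyone following along, the usual route at this point is llama.cpp's conversion and quantization tools. Below is a minimal sketch of that, wrapped in Python; the local paths and output filenames are my own placeholders, and it assumes you're running from a built llama.cpp checkout:

```python
# Sketch only: convert the fp16 HF weights to a GGML f16 file with llama.cpp's
# convert.py, then quantize that to q6_K with the quantize binary built from
# the same repo. All paths and filenames below are placeholders.
import subprocess

HF_DIR = "guanaco-65B-HF"                # local clone of TheBloke/guanaco-65B-HF
F16_BIN = "guanaco-65B.ggmlv3.f16.bin"   # intermediate GGML f16 (~130 GB for 65B)
Q6K_BIN = "guanaco-65B.ggmlv3.q6_K.bin"  # final q6_K output

# Step 1: HF fp16 -> GGML f16
subprocess.run(
    ["python", "convert.py", HF_DIR, "--outtype", "f16", "--outfile", F16_BIN],
    check=True,
)

# Step 2: GGML f16 -> q6_K
subprocess.run(["./quantize", F16_BIN, Q6K_BIN, "q6_K"], check=True)
```

At roughly 6.56 bits per weight, a 65B q6_K lands around 53 GB, which is exactly what runs into the 50 GB per-file upload limit mentioned below.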

...and I feel your pain when it comes to files over 50 GB... I finally got them split, renamed, and the repository configured. It took me two days, because I can't just sit here and watch it all happen, or fail. So tedious. Anyway, there will be fp16 and q6_K versions in my repository "soon".
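
Since the splitting itself is scriptable, here's a minimal sketch of the idea; the chunk size, buffer size, and `.partNN` naming are arbitrary illustrative choices, not what was actually used for this repo:

```python
# Sketch: split a large model file into fixed-size binary parts that fit under
# the Hub's 50 GB per-file cap, streaming in small buffers to keep memory low.
CHUNK = 40 * 1024**3  # 40 GiB per part (arbitrary, comfortably under 50 GB)
BUF = 64 * 1024**2    # 64 MiB copy buffer

def split_file(path: str, chunk_size: int = CHUNK, buf_size: int = BUF) -> None:
    part = 0
    with open(path, "rb") as src:
        buf = src.read(min(buf_size, chunk_size))
        while buf:
            with open(f"{path}.part{part:02d}", "wb") as dst:
                written = 0
                while buf:
                    dst.write(buf)
                    written += len(buf)
                    if written >= chunk_size:
                        break
                    buf = src.read(min(buf_size, chunk_size - written))
            part += 1
            buf = src.read(min(buf_size, chunk_size))

split_file("guanaco-65B.ggmlv3.f16.bin")  # hypothetical filename
```

Parts made this way rejoin with a plain byte-for-byte concatenation before loading.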

vmajor changed discussion status to closed

Thanks for reminding me - I've just uploaded the q6_K to this repo as a multi-part ZIP.
