Add quantized GGUFs?

#2
by MoonRide - opened

Could you please add GGUFs for the F16 and Q8_0, Q6_K, Q5_K_M, Q4_K_M quants? That way people won't have to download this F32 monster GGUF and quantize it themselves.

UPDATE: Okay, I've found some quants here: https://huggingface.co/ggml-org - it would still be nice to have them from the original source, though (and also to have them tested to confirm the model works as intended after conversion - I've encountered multiple GGUF models with tokenizer issues caused by a missing added_tokens.json during conversion, for example).
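For anyone quantizing locally in the meantime, here is a rough sketch using llama.cpp's conversion script and quantize tool (paths, model directory, and a local llama.cpp build are all assumptions, not something from this repo):

```shell
# Assumes llama.cpp is cloned and built locally, and the HF checkpoint
# is downloaded to /path/to/model (hypothetical path).

# Convert the HF checkpoint straight to an F16 GGUF (much smaller than F32):
python convert_hf_to_gguf.py /path/to/model --outtype f16 --outfile model-f16.gguf

# Quantize the F16 GGUF to the requested formats:
./llama-quantize model-f16.gguf model-Q8_0.gguf   Q8_0
./llama-quantize model-f16.gguf model-Q6_K.gguf   Q6_K
./llama-quantize model-f16.gguf model-Q5_K_M.gguf Q5_K_M
./llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```

After quantizing, it's worth sanity-checking tokenization (e.g. a quick generation with llama-cli) to catch the kind of tokenizer issues mentioned above.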

Google org
edited Apr 7

You can also download the float16 weights from the correct revision (and then convert them) if you don't want to download the big one for now. Will make sure to upload smaller ones next time!
added_tokens.json is deprecated and only kept for legacy.

MoonRide changed discussion status to closed
