GGUF Version

#1
by avilum - opened

Hey Yam :)

Thanks for the amazing work. I really want to run this model on my MacBook (32GB) using INT4/5/6 quantization and llama.cpp.

I was wondering whether the 11B model can be successfully converted to GGUF format with existing tools (or others?).
Google released GGUF checkpoints of the original model in FP32, followed by many quantized versions.

I think it should be possible to quantize it and run it on commodity hardware such as a MacBook with an M3 and 36GB of RAM (for offline work without GPUs).

Have you tried converting it to a GGUF checkpoint? Anything I should keep in mind before I start?
Or is it better to just use bitsandbytes with your existing model?
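
For context, the bitsandbytes route I mean is roughly the following. This is just a sketch, not something I've run against this repo, and it assumes a CUDA GPU rather than the Mac (bitsandbytes doesn't target Apple Silicon):

```python
# Sketch only: 4-bit load of the existing HF checkpoint via bitsandbytes.
# Assumes a CUDA GPU is available; bitsandbytes does not support Apple Silicon.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "yam-peleg/Hebrew-Gemma-11B-Instruct"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```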

Thanks,
Avi

@yam-peleg
Update: I successfully converted the models to GGUF format using the main branch of llama.cpp (which has GemmaForCausalLM support).

Do you think we should upload them here or create a new model repo for them?
Let me know what you think. Here are the files (a rough sketch of the conversion flow is below the listing):

-rw-r--r--  1 avi  staff   8.0G Apr  9 15:19 models/yam-peleg--Hebrew-Gemma-11B-Instruct-Q6_0.gguf
-rw-r--r--  1 avi  staff    39G Apr  9 15:10 models/yam-peleg--Hebrew-Gemma-11B-Instruct-f16.gguf
-rw-r--r--  1 avi  staff   8.0G Apr  9 13:47 models/yam-peleg--Hebrew-Gemma-11B-V2-Q6_0.gguf
-rw-r--r--  1 avi  staff    20G Apr  9 13:41 models/yam-peleg--Hebrew-Gemma-11B-V2-f16.gguf
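
For anyone who wants to reproduce this, the flow was essentially the standard llama.cpp conversion. A hedged sketch follows: the paths are illustrative, and I use the standard Q6_K quant name from llama.cpp rather than the filenames above:

```python
# Hedged sketch of the conversion flow (paths are illustrative).
# convert-hf-to-gguf.py and the quantize binary come from the llama.cpp
# main branch (the binary is called llama-quantize in newer versions).
import subprocess

hf_dir = "models/Hebrew-Gemma-11B-Instruct"          # local HF checkpoint dir
f16_gguf = "models/Hebrew-Gemma-11B-Instruct-f16.gguf"
q6_gguf = "models/Hebrew-Gemma-11B-Instruct-Q6.gguf"

# 1) HF checkpoint -> GGUF at f16
subprocess.run(
    ["python", "convert-hf-to-gguf.py", hf_dir,
     "--outfile", f16_gguf, "--outtype", "f16"],
    check=True,
)

# 2) f16 GGUF -> 6-bit K-quant for running on the MacBook
subprocess.run(["./quantize", f16_gguf, q6_gguf, "Q6_K"], check=True)
```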
