
# Gemma 2B Instruct GGUF

Contains Q4, Q8, and F16 quantized GGUF builds of google/gemma-2b-it.

## Performance

| Variant | Device | Throughput |
|---------|--------|------------|
| Q4 | M1 Pro 10-core GPU | 90 tok/s |
| Q4 | Snapdragon 778G CPU | 10 tok/s |
| Q4 | RTX 2070S | 40 tok/s |
| Q8 | M1 Pro 10-core GPU | 54 tok/s |
| Q8 | Snapdragon 778G CPU | 6 tok/s |
| Q8 | RTX 2070S | 25 tok/s |
| F16 | M1 Pro 10-core GPU | 30 tok/s |
| F16 | Snapdragon 778G CPU | <1 tok/s |
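As a quick reading of the numbers above, the Q4 build is consistently about 1.6–1.7x faster than Q8 on every device measured. A minimal sketch of that comparison (the figures are copied from the table; the dictionary layout is just for illustration):

```python
# Throughput figures (tok/s) from the performance table above.
perf = {
    "M1 Pro 10-core GPU": {"Q4": 90, "Q8": 54},
    "Snapdragon 778G CPU": {"Q4": 10, "Q8": 6},
    "RTX 2070S": {"Q4": 40, "Q8": 25},
}

# Q4 vs Q8 speedup per device.
for device, rates in perf.items():
    speedup = rates["Q4"] / rates["Q8"]
    print(f"{device}: Q4 is {speedup:.2f}x faster than Q8")
```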
## Model details

- Model size: 2.51B params
- Architecture: gemma
- Quantizations: 4-bit (Q4), 8-bit (Q8), 16-bit (F16)

