--- license: other license_name: gemma-terms-of-use license_link: https://ai.google.dev/gemma/terms tags: - gemma - gguf --- # Gemma 7B Instruct GGUF Contains Q4 & Q8 quantized GGUFs for [google/gemma](https://huggingface.co/collections/google/gemma-release-65d5efbccdbb8c4202ec078b) ## Perf | Variant | Device | Perf | | - | - | - | | Q4 | RTX 2070S | 22 tok/s | | | M1 Pro 10-core GPU | 28 tok/s | | Q8 | RTX 2070S | 7 tok/s (could only offload 23/29 layers to GPU) | | | M1 Pro 10-core GPU | 17 tok/s |