This repository contains a 4-bit quantized version of the Gemma-2-9B model, designed to minimize memory consumption and speed up inference.
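As a rough illustration of why 4-bit weights cut memory use (a back-of-envelope sketch, not a measurement from this repository; the real footprint also includes activations, the KV cache, and per-group scale metadata):

```python
# Illustrative estimate of weight-memory savings from 4-bit quantization.
params = 9e9                  # ~9B parameters in Gemma-2-9B

bf16_gb = params * 2 / 1e9    # 16-bit weights: 2 bytes per parameter
int4_gb = params * 0.5 / 1e9  # 4-bit weights: 0.5 bytes per parameter

print(f"bf16 weights: ~{bf16_gb:.1f} GB")  # ~18.0 GB
print(f"int4 weights: ~{int4_gb:.1f} GB")  # ~4.5 GB
```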
| Gemma-2-9b | Wiki | C4 | PIQA | ARC-E | ARC-C | HellaSwag | Wino | Avg. |
|---|---|---|---|---|---|---|---|---|
| | 0-shot | 0-shot | 0-shot | 0-shot | 25-shot | 0-shot | 0-shot | |
| Unquantized | 6.88 | 10.12 | 81.39 | 87.25 | 64.33 | 61.27 | 74.11 | 73.67 |
| Int4 | 7.27 | 11.34 | 80.47 | 85.86 | 63.23 | 59.55 | 74.27 | 72.67 |
Benchmark scores are computed with lm-evaluation-harness.
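The card does not state which quantization algorithm produced the int4 weights. As a minimal sketch of what 4-bit weight quantization means, here is symmetric round-to-nearest quantization with a single per-tensor scale (an illustrative assumption, not this repository's method; production schemes typically use per-group scales and calibration):

```python
import numpy as np

def quantize_int4(w: np.ndarray):
    """Symmetric round-to-nearest 4-bit quantization with a per-tensor scale.

    Signed 4-bit integers cover [-8, 7]; the largest weight magnitude is
    mapped to 7 and every weight is rounded to the nearest quantized step.
    """
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover a float approximation of the original weights."""
    return q.astype(np.float32) * scale

# Round-trip a random weight tensor and inspect the reconstruction error.
rng = np.random.default_rng(0)
w = rng.normal(size=1024).astype(np.float32)
q, scale = quantize_int4(w)
w_hat = dequantize_int4(q, scale)
print("max abs error:", np.abs(w - w_hat).max())  # bounded by scale / 2
```

The rounding error per weight is at most half a quantization step, which is what drives the small score drops in the table above.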
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("StoyanGanchev/gemma-2-9b-int4")
model = AutoModelForCausalLM.from_pretrained("StoyanGanchev/gemma-2-9b-int4")

# Run a short generation to check the model works
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
All files can be accessed in this repository.