
Gemma-2-9B quantized in Int4

Description

This repository contains a 4-bit quantized version of the Gemma-2-9B model, designed to reduce memory consumption and speed up inference.
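As a rough illustration of the memory savings (a back-of-the-envelope estimate that ignores activation memory and quantization metadata such as per-group scales):

# Approximate weight-storage footprint of a 9B-parameter model
params = 9e9  # Gemma-2-9B parameter count (approximate)

fp16_gb = params * 2 / 1e9    # 2 bytes per weight   -> ~18 GB
int4_gb = params * 0.5 / 1e9  # 0.5 bytes per weight -> ~4.5 GB
print(f"fp16: ~{fp16_gb:.0f} GB, int4: ~{int4_gb:.1f} GB")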

Benchmark results

| Gemma-2-9b  | Wiki (0-shot) | C4 (0-shot) | PIQA (0-shot) | ARC-E (0-shot) | ARC-C (25-shot) | HellaSwag (0-shot) | Wino (0-shot) | Avg.  |
|-------------|---------------|-------------|---------------|----------------|-----------------|--------------------|---------------|-------|
| Unquantized | 6.88          | 10.12       | 81.39         | 87.25          | 64.33           | 61.27              | 74.11         | 73.67 |
| Int4        | 7.27          | 11.34       | 80.47         | 85.86          | 63.23           | 59.55              | 74.27         | 72.67 |

Benchmark scores are computed with lm-evaluation-harness. Wiki and C4 report perplexity (lower is better); the remaining columns report accuracy (higher is better), and Avg. is the mean of the five accuracy benchmarks.
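Something along these lines can reproduce the accuracy numbers; note that the task names and the simple_evaluate Python API below are assumptions based on recent lm-evaluation-harness releases, not the exact invocation used for the table:

import lm_eval

# Evaluate the Int4 checkpoint on the accuracy benchmarks from the table.
# Task names follow lm-evaluation-harness conventions; ARC-C would need
# num_fewshot=25 in a separate run to match the 25-shot setting above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=StoyanGanchev/gemma-2-9b-int4",
    tasks=["piqa", "arc_easy", "hellaswag", "winogrande"],
    num_fewshot=0,
)
print(results["results"])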

How to use

# Load the quantized model directly from the Hub
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("StoyanGanchev/gemma-2-9b-int4")
model = AutoModelForCausalLM.from_pretrained(
    "StoyanGanchev/gemma-2-9b-int4",
    device_map="auto",  # optional; requires `accelerate` and places weights on the available device(s)
)
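A minimal generation example (the prompt and sampling settings are illustrative):

# Tokenize a prompt, generate a short completion, and decode it
inputs = tokenizer("Explain quantization in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))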

All files can be accessed in this repository.
