This repository contains a 4-bit quantized version of the Gemma-2-9B model, designed to minimize memory consumption and speed up inference.
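As a rough illustration of why 4-bit weights cut memory use (a back-of-envelope sketch, not a measurement from this repository; the real footprint also includes activations, the KV cache, and per-group scale metadata):

```python
# Illustrative estimate of weight-memory savings from 4-bit quantization.
params = 9e9                  # ~9B parameters in Gemma-2-9B

bf16_gb = params * 2 / 1e9    # 16-bit weights: 2 bytes per parameter
int4_gb = params * 0.5 / 1e9  # 4-bit weights: 0.5 bytes per parameter

print(f"bf16 weights: ~{bf16_gb:.1f} GB")  # ~18.0 GB
print(f"int4 weights: ~{int4_gb:.1f} GB")  # ~4.5 GB
```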
| Gemma-2-9b | Wiki | C4 | PIQA | ARC-E | ARC-C | HellaSwag | Wino | Avg. |
|---|---|---|---|---|---|---|---|---|
| | 0-shot | 0-shot | 0-shot | 0-shot | 25-shot | 0-shot | 0-shot | |
| Unquantized | 6.88 | 10.12 | 81.39 | 87.25 | 64.33 | 61.27 | 74.11 | 73.67 |
| Int4 | 7.27 | 11.34 | 80.47 | 85.86 | 63.23 | 59.55 | 74.27 | 72.67 |
Benchmark scores are computed with lm-evaluation-harness.
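The card does not state which quantization algorithm produced the int4 weights. As a minimal sketch of what 4-bit weight quantization means, here is symmetric round-to-nearest quantization with a single per-tensor scale (an illustrative assumption, not this repository's method; production schemes typically use per-group scales and calibration):

```python
import numpy as np

def quantize_int4(w: np.ndarray):
    """Symmetric round-to-nearest 4-bit quantization with a per-tensor scale.

    Signed 4-bit integers cover [-8, 7]; the largest weight magnitude is
    mapped to 7 and every weight is rounded to the nearest quantized step.
    """
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover a float approximation of the original weights."""
    return q.astype(np.float32) * scale

# Round-trip a random weight tensor and inspect the reconstruction error.
rng = np.random.default_rng(0)
w = rng.normal(size=1024).astype(np.float32)
q, scale = quantize_int4(w)
w_hat = dequantize_int4(q, scale)
print("max abs error:", np.abs(w - w_hat).max())  # bounded by scale / 2
```

The rounding error per weight is at most half a quantization step, which is what drives the small score drops in the table above.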
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("StoyanGanchev/gemma-2-9b-int4")
model = AutoModelForCausalLM.from_pretrained("StoyanGanchev/gemma-2-9b-int4")

# Run a short generation to check the model works
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
All files can be accessed in this repository.