
# Gemma 2B Instruct GGUF

Contains Q4, Q8, and F16 quantized GGUF builds of google/gemma-2b-it.

## Performance

| Variant | Device | Throughput |
|---------|--------|------------|
| Q4 | M1 Pro 10-core GPU | 90 tok/s |
| Q4 | Snapdragon 778G CPU | 10 tok/s |
| Q4 | RTX 2070S | 40 tok/s |
| Q8 | M1 Pro 10-core GPU | 54 tok/s |
| Q8 | Snapdragon 778G CPU | 6 tok/s |
| Q8 | RTX 2070S | 25 tok/s |
| F16 | M1 Pro 10-core GPU | 30 tok/s |
| F16 | Snapdragon 778G CPU | <1 tok/s |
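As a quick reading of the numbers above, the Q4 build is consistently about 1.6–1.7x faster than Q8 on every device measured. A minimal sketch of that comparison (the figures are copied from the table; the dictionary layout is just for illustration):

```python
# Throughput figures (tok/s) from the performance table above.
perf = {
    "M1 Pro 10-core GPU": {"Q4": 90, "Q8": 54},
    "Snapdragon 778G CPU": {"Q4": 10, "Q8": 6},
    "RTX 2070S": {"Q4": 40, "Q8": 25},
}

# Q4 vs Q8 speedup per device.
for device, rates in perf.items():
    speedup = rates["Q4"] / rates["Q8"]
    print(f"{device}: Q4 is {speedup:.2f}x faster than Q8")
```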
## Model details

- Model size: 2.51B params
- Architecture: gemma
- Quantizations: 4-bit (Q4), 8-bit (Q8), 16-bit (F16)

