Google Gemma 7B

Description

This repo contains GGUF-format model files for Google's Gemma 7B.

Original model

Description

Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources, such as a laptop, a desktop, or your own cloud infrastructure, democratizing access to state-of-the-art AI models and helping foster innovation for everyone.

Quantization types

| quantization method | bits | size | description | recommended |
|---|---|---|---|---|
| Q2_K | 2 | 3.09 GB | very small, very high quality loss | ❌ |
| Q3_K_S | 3 | 3.68 GB | very small, high quality loss | ❌ |
| Q3_K_L | 3 | 4.4 GB | small, substantial quality loss | ❌ |
| Q4_0 | 4 | 4.81 GB | legacy; small, very high quality loss | ❌ |
| Q4_K_S | 4 | 4.84 GB | medium, balanced quality | ✅ |
| Q4_K_M | 4 | 5.13 GB | medium, balanced quality | ✅ |
| Q5_0 | 5 | 5.88 GB | legacy; medium, balanced quality | ❌ |
| Q5_K_S | 5 | 5.88 GB | large, low quality loss | ✅ |
| Q5_K_M | 5 | 6.04 GB | large, very low quality loss | ✅ |
| Q6_K | 6 | 7.01 GB | very large, extremely low quality loss | ❌ |
| Q8_0 | 8 | 9.08 GB | very large, extremely low quality loss | ❌ |
| FP16 | 16 | 17.1 GB | enormous, negligible quality loss | ❌ |
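
As a rough aid for choosing a file, the sketch below picks the largest quantization from the table that fits a given memory budget. The sizes and recommendation flags are copied from the table; the function name `pick_quant` and the 1 GB `overhead_gb` allowance for context and runtime buffers are assumptions made for illustration, not measured values.

```python
# (name, file size in GB, recommended) copied from the quantization table.
QUANTS = [
    ("Q2_K", 3.09, False),
    ("Q3_K_S", 3.68, False),
    ("Q3_K_L", 4.4, False),
    ("Q4_0", 4.81, False),
    ("Q4_K_S", 4.84, True),
    ("Q4_K_M", 5.13, True),
    ("Q5_0", 5.88, False),
    ("Q5_K_S", 5.88, True),
    ("Q5_K_M", 6.04, True),
    ("Q6_K", 7.01, False),
    ("Q8_0", 9.08, False),
    ("FP16", 17.1, False),
]


def pick_quant(budget_gb, overhead_gb=1.0, recommended_only=True):
    """Return the name of the largest file that fits, or None if none does.

    overhead_gb is a guessed allowance for context and runtime buffers
    (an assumption, not a measured figure).
    """
    limit = budget_gb - overhead_gb
    candidates = [
        (size, name)
        for name, size, recommended in QUANTS
        if size <= limit and (recommended or not recommended_only)
    ]
    if not candidates:
        return None
    # Largest file size wins; larger files lose less quality per the table.
    return max(candidates, key=lambda c: c[0])[1]


if __name__ == "__main__":
    for budget in (6.0, 8.0, 16.0):
        print(budget, "GB ->", pick_quant(budget))
```

Passing `recommended_only=False` also considers the files not marked as recommended, which may suit machines with plenty of memory.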

Usage

You can use this model with the latest builds of LM Studio and llama.cpp.
If you're new to the world of large language models, I recommend starting with LM Studio.

Format: GGUF
Model size: 8.54B params
Architecture: gemma
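
The file sizes in the quantization table can be related to the 8.54B parameter count with a little arithmetic. The sketch below estimates the effective bits stored per weight for a few of the files. It assumes 1 GB means 10^9 bytes and ignores file metadata, so the results are approximate; the helper name `bits_per_weight` is illustrative.

```python
PARAMS = 8.54e9  # parameter count listed for this model


def bits_per_weight(size_gb, params=PARAMS):
    """Approximate bits stored per weight for a file of size_gb gigabytes.

    Assumes 1 GB = 1e9 bytes and ignores metadata (both are assumptions).
    """
    return size_gb * 1e9 * 8 / params


if __name__ == "__main__":
    # Sizes taken from the quantization table.
    for name, size_gb in [("Q4_K_M", 5.13), ("Q8_0", 9.08), ("FP16", 17.1)]:
        print(f"{name}: ~{bits_per_weight(size_gb):.2f} bits per weight")
```

Under these assumptions the FP16 file works out to roughly 16 bits per weight, as expected, while the quantized files come out somewhat above their nominal bit width.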