Gemma 2 9b Instruction Tuned - GGUF

These are GGUF quants of google/gemma-2-9b-it.

Details about the model can be found at the above model page.

llama.cpp Version

These quants were made with llama.cpp tag b3408.

If you have problems loading these models, please update your software to use the latest llama.cpp release.
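A quick way to diagnose a load failure is to inspect the GGUF header directly: the format starts with a 4-byte `GGUF` magic followed by a little-endian uint32 format version, and older llama.cpp builds reject files written with a newer version. A minimal sketch (the file path is a hypothetical example):

```python
import struct

def gguf_version(path):
    """Read the GGUF magic and format version from a file header.

    A load failure in an older llama.cpp build is often just a
    header-version mismatch rather than a corrupt download.
    """
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError("not a GGUF file")
        (version,) = struct.unpack("<I", f.read(4))
    return version

# Hypothetical local path to one of the quants:
# print(gguf_version("gemma-2-9b-it-GGUF/gemma-2-9b-it-Q4_K_M.gguf"))
```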

Perplexity Scoring

Below are the perplexity scores for the GGUF models. A lower score is better.

Quant Level   Perplexity Score   Standard Deviation
F32           8.7849             0.06498
BF16          8.7849             0.06498
Q8_0          8.7869             0.06500
Q6_K          8.7972             0.06510
Q5_K_M        8.7791             0.06489
Q5_K_S        8.7899             0.06503
Q4_K_M        8.8745             0.06575
Q4_K_S        8.9293             0.06636
Q3_K_L        9.0210             0.06693
Q3_K_M        9.1213             0.06784
Q3_K_S        9.1857             0.06726
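For context, perplexity is the exponential of the mean negative log-likelihood over the evaluated tokens, so even the small gaps in the table above reflect a real difference in how well each quant predicts the test text. A minimal sketch (the per-token values are invented for illustration):

```python
import math

def perplexity(nlls):
    """Perplexity = exp(mean per-token negative log-likelihood)."""
    return math.exp(sum(nlls) / len(nlls))

# Hypothetical per-token negative log-likelihoods from an eval run
nlls = [2.1, 2.3, 1.9, 2.2]
print(f"{perplexity(nlls):.4f}")
```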

Quant Details

This is the script used for quantization.

#!/bin/bash

# Name of the model to quantize
MODEL_NAME="gemma-2-9b-it"

# Define the output directory
outputDir="${MODEL_NAME}-GGUF"

# Create the output directory if it doesn't exist
mkdir -p "${outputDir}"

# Make the F32 quant
f32file="${outputDir}/${MODEL_NAME}-F32.gguf"
if [ -f "${f32file}" ]; then
    echo "Skipping f32 as ${f32file} already exists."
else
    python convert_hf_to_gguf.py "${HOME}/src/models/${MODEL_NAME}" --outfile "${f32file}" --outtype "f32"
fi

# Abort if the F32 conversion failed
if [ ! -f "${f32file}" ]; then
    echo "No ${f32file} found."
    exit 1
fi

# Define the array of quantization strings
quants=("Q8_0" "Q6_K" "Q5_K_M" "Q5_K_S" "Q4_K_M" "Q4_K_S" "Q3_K_L" "Q3_K_M" "Q3_K_S")


# Loop through the quants array
for quant in "${quants[@]}"; do
    outfile="${outputDir}/${MODEL_NAME}-${quant}.gguf"
    
    # Check if the outfile already exists
    if [ -f "${outfile}" ]; then
        echo "Skipping ${quant} as ${outfile} already exists."
    else
        # Run the command with the current quant string
        ./llama-quantize "${f32file}" "${outfile}" "${quant}"
        
        echo "Processed ${quant} and generated ${outfile}"
    fi
done
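As a rough sanity check on the script's output, a quant's file size is approximately parameters × bits-per-weight ÷ 8. The sketch below uses approximate bits-per-weight figures, not exact llama.cpp numbers, and ignores metadata and mixed-precision layers:

```python
def approx_size_gb(n_params, bits_per_weight):
    """Ballpark GGUF file size: parameters * bits-per-weight / 8, in GB."""
    return n_params * bits_per_weight / 8 / 1e9

# gemma-2-9b-it has roughly 9.24B parameters; bpw values are approximate
for quant, bpw in [("F32", 32), ("Q8_0", 8.5), ("Q4_K_M", 4.8)]:
    print(f"{quant}: ~{approx_size_gb(9.24e9, bpw):.1f} GB")
```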