About

GGUF imatrix quants of the AlexBefest/WoonaV1.2-9b model. All quants except Q6_K and Q8_0 were made with the imatrix quantization method.

Prompt template: Gemma (recommended temperature: 0.3-0.5)

<start_of_turn>user\n {prompt}<end_of_turn>
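The template can be applied programmatically. The following is a minimal sketch, not the author's own tooling: `format_gemma_prompt` is a hypothetical helper name, and the trailing `<start_of_turn>model` line is an assumption based on the usual Gemma chat format (the template line above shows only the user turn).

```python
def format_gemma_prompt(prompt: str) -> str:
    """Wrap a single user message in Gemma turn markers.

    Assumption: the model turn is opened after the user turn so that
    generation starts in the model's voice, as in the usual Gemma format.
    """
    return (
        "<start_of_turn>user\n"
        f"{prompt}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )


if __name__ == "__main__":
    print(format_gemma_prompt("Hello!"))
```

Most front ends apply the template themselves when a Gemma preset is selected, so a helper like this is mainly useful when sending raw prompts.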

Provided files

| Name | Quant method | Bits | Size | Min RAM required | Use case |
| --- | --- | --- | --- | --- | --- |
| WoonaV1.2-9b-imat-Q2_K.gguf | Q2_K [imatrix] | 2 | 3.5 GB | 5.1 GB | small, very high quality loss - not recommended, but usable (probably faster than IQ3_XXS, but worse) |
| WoonaV1.2-9b-imat-IQ3_XXS.gguf | IQ3_XXS [imatrix] | 3 | 3.5 GB | 5.1 GB | small, high quality loss |
| WoonaV1.2-9b-imat-IQ3_M.gguf | IQ3_M [imatrix] | 3 | 4.2 GB | 5.7 GB | small, high quality loss |
| WoonaV1.2-9b-imat-IQ4_XS.gguf | IQ4_XS [imatrix] | 4 | 4.8 GB | 6.3 GB | medium, slightly worse than Q4_K_M |
| WoonaV1.2-9b-imat-Q4_K_S.gguf | Q4_K_S [imatrix] | 4 | 5.1 GB | 6.7 GB | medium, balanced quality loss |
| WoonaV1.2-9b-imat-Q4_K_M.gguf | Q4_K_M [imatrix] | 4 | 5.4 GB | 6.9 GB | medium, balanced quality - recommended |
| WoonaV1.2-9b-imat-Q5_K_S.gguf | Q5_K_S [imatrix] | 5 | 6 GB | 7.6 GB | large, low quality loss - recommended |
| WoonaV1.2-9b-imat-Q5_K_M.gguf | Q5_K_M [imatrix] | 5 | 6.2 GB | 7.8 GB | large, very low quality loss - recommended |
| WoonaV1.2-9b-Q6_K.gguf | Q6_K [static] | 6 | 7.1 GB | 8.7 GB | very large, near perfect quality - recommended |
| WoonaV1.2-9b-Q8_0.gguf | Q8_0 [static] | 8 | 9.2 GB | 10.8 GB | very large, extremely low quality loss |
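To pick a file for a given machine, the "Min RAM required" column can be used as a simple filter. The sketch below copies those values from the list of provided files. The helper name `largest_quant_that_fits` is hypothetical, and treating "the largest quant that fits" as the best choice is an assumption, since context length and other running programs also consume memory.

```python
# (quant name, minimum RAM in GB), copied from the provided files list,
# ordered from smallest to largest.
QUANTS = [
    ("Q2_K", 5.1),
    ("IQ3_XXS", 5.1),
    ("IQ3_M", 5.7),
    ("IQ4_XS", 6.3),
    ("Q4_K_S", 6.7),
    ("Q4_K_M", 6.9),
    ("Q5_K_S", 7.6),
    ("Q5_K_M", 7.8),
    ("Q6_K", 8.7),
    ("Q8_0", 10.8),
]


def largest_quant_that_fits(available_ram_gb: float):
    """Return the last (largest) quant whose minimum RAM fits, or None."""
    fitting = [name for name, min_ram in QUANTS if min_ram <= available_ram_gb]
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    print(largest_quant_that_fits(8.0))
```

Leaving some headroom above the listed minimum is a reasonable precaution when other applications share the same memory.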

How to Use

  • llama.cpp: The open-source framework for running GGUF LLM models, on which the other interfaces listed here are built.
  • koboldcpp: An easy option for inference on Windows. A lightweight open-source fork of llama.cpp with a simple graphical interface and many additional features.
  • LM Studio: A free, proprietary application built on llama.cpp, with a graphical interface.
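
For command-line use with llama.cpp, the following sketch assembles an invocation without running it. It assumes the llama.cpp binary is named `llama-cli` (older builds call it `main`) and that the Q4_K_M file sits in the current directory. `build_llama_cli_command` is a hypothetical helper, and the default temperature of 0.4 is simply a value inside the recommended 0.3-0.5 range.

```python
import shlex


def build_llama_cli_command(model_path, prompt, temperature=0.4, max_tokens=256):
    """Build an argument list for llama.cpp's CLI (binary name is an assumption)."""
    return [
        "llama-cli",
        "-m", model_path,            # path to the GGUF file
        "-p", prompt,                # prompt text, already in Gemma format
        "--temp", str(temperature),  # sampling temperature
        "-n", str(max_tokens),       # maximum number of tokens to generate
    ]


cmd = build_llama_cli_command(
    "WoonaV1.2-9b-imat-Q4_K_M.gguf",
    "<start_of_turn>user\nHello!<end_of_turn>\n<start_of_turn>model\n",
)

if __name__ == "__main__":
    # Print a shell-quoted version that can be pasted into a terminal.
    print(shlex.join(cmd))
```

The list form can also be passed to `subprocess.run(cmd)` once the binary and model file are in place.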

Model size: 9.24B params
Architecture: gemma2

Model tree for secretmoon/WoonaV1.2-9b-GGUF-Imatrix

Base model: google/gemma-2-9b