---
license: gemma
library_name: transformers
tags:
- unsloth
- sft
- pony
- MyLittlePony
- Russian
- Lora
base_model: AlexBefest/WoonaV1.2-9b
language:
- ru
pipeline_tag: text-generation
---
## About

GGUF imatrix quants of the AlexBefest/WoonaV1.2-9b model. All quants except Q6_K and Q8_0 were made with the imatrix quantization method.
## Prompt template: Gemma (recommended temperature: 0.3-0.5)

<start_of_turn>user
{prompt}<end_of_turn>
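For completeness, the stock Gemma chat format also opens a model turn after the user turn; a full single exchange looks like this (this is the standard Gemma template, not something specific to this model):

```
<start_of_turn>user
{prompt}<end_of_turn>
<start_of_turn>model
```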
## Provided files

| Name | Quant method | Bits | Size | Min RAM required | Use case |
|---|---|---|---|---|---|
| WoonaV1.2-9b-imat-Q2_K.gguf | Q2_K [imatrix] | 2 | 3.5 GB | 5.1 GB | smallest, significant quality loss - not recommended, but usable (faster) |
| WoonaV1.2-9b-imat-IQ3_XXS.gguf | IQ3_XXS [imatrix] | 3 | 3.5 GB | 5.1 GB | small, high quality loss |
| WoonaV1.2-9b-imat-IQ3_M.gguf | IQ3_M [imatrix] | 3 | 4.2 GB | 5.7 GB | small, high quality loss |
| WoonaV1.2-9b-imat-IQ4_XS.gguf | IQ4_XS [imatrix] | 4 | 4.8 GB | 6.3 GB | medium, substantial quality loss |
| WoonaV1.2-9b-imat-Q4_K_S.gguf | Q4_K_S [imatrix] | 4 | 5.1 GB | 6.7 GB | medium, balanced quality loss |
| WoonaV1.2-9b-imat-Q4_K_M.gguf | Q4_K_M [imatrix] | 4 | 5.4 GB | 6.9 GB | medium, balanced quality - recommended |
| WoonaV1.2-9b-imat-Q5_K_S.gguf | Q5_K_S [imatrix] | 5 | 6.0 GB | 7.6 GB | large, low quality loss - recommended |
| WoonaV1.2-9b-imat-Q5_K_M.gguf | Q5_K_M [imatrix] | 5 | 6.2 GB | 7.8 GB | large, very low quality loss - recommended |
| WoonaV1.2-9b-Q6_K.gguf | Q6_K [static] | 6 | 7.1 GB | 8.7 GB | very large, near-perfect quality - recommended |
| WoonaV1.2-9b-Q8_0.gguf | Q8_0 [static] | 8 | 9.2 GB | 10.8 GB | very large, extremely low quality loss |
## How to Use

- llama.cpp - the open-source framework for running GGUF models, on which all of the interfaces below are built (a minimal Python example is sketched after this list).
- koboldcpp - an easy option for inference on Windows; a lightweight open-source fork of llama.cpp with a simple graphical interface and many additional features.
- LM Studio - a free proprietary application with a graphical interface, built on top of llama.cpp.
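As a minimal sketch (not an official recipe), the model can also be loaded through the llama-cpp-python bindings (`pip install llama-cpp-python`); the file name below assumes the Q4_K_M quant from the table has been downloaded into the working directory:

```python
# Minimal sketch using the llama-cpp-python bindings.
from llama_cpp import Llama

llm = Llama(
    model_path="WoonaV1.2-9b-imat-Q4_K_M.gguf",  # any quant from the table above
    n_ctx=4096,        # context window; lower it to reduce RAM usage
    n_gpu_layers=-1,   # offload all layers to the GPU if one is available
)

# Gemma prompt format, as recommended above.
prompt = "<start_of_turn>user\nПривет, Луна! Как дела?<end_of_turn>\n<start_of_turn>model\n"

out = llm(
    prompt,
    max_tokens=256,
    temperature=0.4,         # within the recommended 0.3-0.5 range
    stop=["<end_of_turn>"],  # stop at the end of the model's turn
)
print(out["choices"][0]["text"])
```

The same GGUF files work unchanged in llama.cpp, koboldcpp, and LM Studio; only the loading interface differs.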