---
license: gemma
library_name: transformers
tags:
- unsloth
- sft
- pony
- MyLittlePony
- Russian
- Lora
base_model: AlexBefest/WoonaV1.2-9b
language:
- ru
pipeline_tag: text-generation
---
## About

GGUF imatrix quants of the AlexBefest/WoonaV1.2-9b model. All quants except Q6_K and Q8_0 were made with the imatrix quantization method.
## Prompt template: Gemma (recommended temperature: 0.3-0.5)

<start_of_turn>user
{prompt}<end_of_turn>
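For completeness, the stock Gemma chat format also opens a model turn after the user turn; a full single exchange looks like this (this is the standard Gemma template, not something specific to this model):

```
<start_of_turn>user
{prompt}<end_of_turn>
<start_of_turn>model
```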
## Provided files

| Name | Quant method | Bits | Size | Min RAM required | Use case |
|---|---|---|---|---|---|
| WoonaV1.2-9b-imat-Q2_K.gguf | Q2_K [imatrix] | 2 | 3.5 GB | 5.1 GB | smallest, significant quality loss - not recommended, but usable (faster) |
| WoonaV1.2-9b-imat-IQ3_XXS.gguf | IQ3_XXS [imatrix] | 3 | 3.5 GB | 5.1 GB | small, high quality loss |
| WoonaV1.2-9b-imat-IQ3_M.gguf | IQ3_M [imatrix] | 3 | 4.2 GB | 5.7 GB | small, high quality loss |
| WoonaV1.2-9b-imat-IQ4_XS.gguf | IQ4_XS [imatrix] | 4 | 4.8 GB | 6.3 GB | medium, substantial quality loss |
| WoonaV1.2-9b-imat-Q4_K_S.gguf | Q4_K_S [imatrix] | 4 | 5.1 GB | 6.7 GB | medium, balanced quality loss |
| WoonaV1.2-9b-imat-Q4_K_M.gguf | Q4_K_M [imatrix] | 4 | 5.4 GB | 6.9 GB | medium, balanced quality - recommended |
| WoonaV1.2-9b-imat-Q5_K_S.gguf | Q5_K_S [imatrix] | 5 | 6.0 GB | 7.6 GB | large, low quality loss - recommended |
| WoonaV1.2-9b-imat-Q5_K_M.gguf | Q5_K_M [imatrix] | 5 | 6.2 GB | 7.8 GB | large, very low quality loss - recommended |
| WoonaV1.2-9b-Q6_K.gguf | Q6_K [static] | 6 | 7.1 GB | 8.7 GB | very large, near-perfect quality - recommended |
| WoonaV1.2-9b-Q8_0.gguf | Q8_0 [static] | 8 | 9.2 GB | 10.8 GB | very large, extremely low quality loss |
## How to Use

- llama.cpp - the open-source framework for running GGUF models, on which all of the interfaces below are built (a minimal Python example is sketched after this list).
- koboldcpp - an easy option for inference on Windows; a lightweight open-source fork of llama.cpp with a simple graphical interface and many additional features.
- LM Studio - a free proprietary application with a graphical interface, built on top of llama.cpp.
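As a minimal sketch (not an official recipe), the model can also be loaded through the llama-cpp-python bindings (`pip install llama-cpp-python`); the file name below assumes the Q4_K_M quant from the table has been downloaded into the working directory:

```python
# Minimal sketch using the llama-cpp-python bindings.
from llama_cpp import Llama

llm = Llama(
    model_path="WoonaV1.2-9b-imat-Q4_K_M.gguf",  # any quant from the table above
    n_ctx=4096,        # context window; lower it to reduce RAM usage
    n_gpu_layers=-1,   # offload all layers to the GPU if one is available
)

# Gemma prompt format, as recommended above.
prompt = "<start_of_turn>user\nПривет, Луна! Как дела?<end_of_turn>\n<start_of_turn>model\n"

out = llm(
    prompt,
    max_tokens=256,
    temperature=0.4,         # within the recommended 0.3-0.5 range
    stop=["<end_of_turn>"],  # stop at the end of the model's turn
)
print(out["choices"][0]["text"])
```

The same GGUF files work unchanged in llama.cpp, koboldcpp, and LM Studio; only the loading interface differs.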