Pollux-4B-Judge GGUF

This repository contains GGUF versions of ai-forever/Pollux-4B-Judge for local inference with llama.cpp, LM Studio, and other GGUF-compatible runtimes.

Pollux-4B-Judge is a Russian-oriented LLM-as-a-judge model based on Qwen3-4B. It is intended for evaluating model answers against a specific criterion and scoring rubric.

Files

File Type Quantized Notes
Pollux-4B-Judge.BF16.gguf BF16 GGUF conversion No High-precision reference version
Pollux-4B-Judge.Q8_0.gguf Q8_0 GGUF quantization Yes High-quality quantized version

Which file should I use?

Use Pollux-4B-Judge.BF16.gguf if you want the highest-quality reference version.

Use Pollux-4B-Judge.Q8_0.gguf if you want a practical local version with lower memory usage and minimal expected quality loss.

Recommended inference settings

For judge-style usage, the original model card uses:

Setting Value
Temperature 0.0
Max tokens 512

For local GGUF inference, choose a context length large enough to fit the full evaluation prompt: instruction, reference answer, evaluated answer, criterion, and rubric. A practical starting point is 8192, but this is a local runtime recommendation rather than an official value from the original model card.

The model is intended to evaluate one criterion per request.

Prompt format

Recommended prompt structure:

### ะ—ะฐะดะฐะฝะธะต ะดะปั ะพั†ะตะฝะบะธ:
{instruction}

### ะญั‚ะฐะปะพะฝะฝั‹ะน ะพั‚ะฒะตั‚:
{reference_answer}

### ะžั‚ะฒะตั‚ ะดะปั ะพั†ะตะฝะบะธ:
{answer}

### ะšั€ะธั‚ะตั€ะธะน ะพั†ะตะฝะบะธ:
{criterion}

### ะจะบะฐะปะฐ ะพั†ะตะฝะธะฒะฐะฝะธั ะฟะพ ะบั€ะธั‚ะตั€ะธัŽ:
{rubric}

Use with llama.cpp

BF16:

llama-server -hf ledgergap/Pollux-4B-Judge-GGUF:BF16 -c 8192 -ngl 99

Q8_0:

llama-server -hf ledgergap/Pollux-4B-Judge-GGUF:Q8_0 -c 8192 -ngl 99

Use with LM Studio

Open LM Studio and paste this repository URL into the model search/download field:

https://huggingface.co/ledgergap/Pollux-4B-Judge-GGUF

Then select either the BF16 or Q8_0 GGUF file.

Original model

Original model: ai-forever/Pollux-4B-Judge

Downloads last month
258
GGUF
Model size
4B params
Architecture
qwen3
Hardware compatibility
Log In to add your hardware

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for ledgergap/Pollux-4B-Judge-GGUF

Finetuned
Qwen/Qwen3-4B
Quantized
(1)
this model