constructai/VibeThinker-3B-GGUF

This is a quantized version of the original VibeThinker-3B , converted to the GGUF format for efficient CPU/GPU inference with llama.cpp, Ollama, or any GGUF‑compatible runner.


Original Model


Available Quantizations

Choose the quantization that fits your needs:

Quantization File Size
UD-IQ1_S 791 MB
UD-IQ1_M 850 MB
UD-IQ2_XXS 948 MB
Q2_K 1.27 GB
UD-IQ2_M 1.14 GB
UD-Q2_K_XL 1.27 GB
UD-IQ3_XXS 1.28 GB
Q3_K_S 1.45 GB
UD-IQ3_S 1.46 GB
Q3_K_M 1.59 GB
UD-Q3_K_M 1.59 GB
UD-Q3_K_XL 1.71 GB
UD-IQ4_XS 1.74 GB
Q4_K_S 1.83 GB
UD-IQ4_NL 1.83 GB
Q4_K_M 1.93 GB
UD-Q4_K_XL 1.93 GB
Q5_K_S 2.17 GB
UD-Q5_K_S 2.17 GB
Q5_K_M 2.22 GB
UD-Q5_K_M 2.22 GB
UD-Q5_K_XL 2.22 GB
Q6_K 2.54 GB
UD-Q6_K 2.54 GB
UD-Q6_K_XL 2.54 GB
Q8_0 3.29 GB
UD-Q8_K_XL 3.29 GB
F16 6.18 GB

For a 3B‑parameter model, even the larger files are quite manageable. Here’s what I recommend: F16 (6.18 GB) or Q8_0 (3.29 GB).

The other quants are also usable!


Usage

With ollama

ollama run hf.co/constructai/VibeThinker-3B-GGUF:F16

With llama.cpp

llama-server -hf constructai/VibeThinker-3B-GGUF:VibeThinker-3B-GGUF-F16.gguf

or

llama-cli -hf constructai/VibeThinker-3B-GGUF:VibeThinker-3B-GGUF-F16.gguf

With LM Studio

lms get constructai/VibeThinker-3B-GGUF@F16

Downloads last month
1,666
GGUF
Model size
3B params
Architecture
qwen2
Hardware compatibility
Log In to add your hardware

1-bit

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for constructai/VibeThinker-3B-GGUF

Base model

Qwen/Qwen2.5-3B
Quantized
(46)
this model