Qwen2.5 · 0.5B · Instruct

_{EXL3 · 4.5 bpw · 0.6 GB · Dense · 24 layers}

An ExLlamaV3 build of Qwen/Qwen2.5-0.5B-Instruct at 4.5 bits per weight: the quality-leaning sweet spot: comfortable on a single 24 GB consumer GPU, effectively indistinguishable from FP16 on most reasoning tasks. See Quants for sibling repos at other bit‑widths or browse the collection.

Quants

BPW	Head bits	Calibration rows	Size	Status
3.0	8	250	~15 GB	_queued
4.0	8	250	~19 GB	_queued
4.5	8	250	0.6 GB	`this repo`
5.0	8	250	0.6 GB	link
6.0	8	250	0.7 GB	link

Inference

Loader	Use it for
TabbyAPI	OpenAI‑compatible HTTP server. Drop‑in for OpenAI clients.
text‑generation‑webui	Local chat UI. Pick the ExLlamaV3 loader from the model dropdown.
ExLlamaV3	Direct Python API for embedding the model in your own code or pipeline.

VRAM at 4.5 bpw: weights on disk + ~2 GB context overhead. Comfortable on a single 24 GB card with room for ~16k tokens of context; fits a 16 GB card with a reduced context window.

Download

pip install -U huggingface_hub

hf download \
  blockblockblock/Qwen2.5-0.5B-Instruct-exl3-4.5bpw \
  --local-dir ./Qwen2.5-0.5B-Instruct-exl3-4.5bpw

Quantization recipe _{(advanced, embedded in quantization_config.json)}

Setting	Value
Format	`EXL3`
Bits per weight	`4.5`
Head bits	`8`
Calibration rows	`250`
Codebook	`MCG`
Out‑scales	`always`
Parallel mode	`enabled`

Loaded automatically by every ExLlamaV3 loader; reproduced here for searchability.

License & use

Use and license follow the base model. Quantization adds no additional restrictions. Refer to the upstream repository for terms, citation, and safety documentation.

_{Quantized with BlockQuant · convention {org}/{model}-exl3-{bpw}bpw}

Downloads last month: 118

Safetensors

Model size

0.3B params

Tensor type

BF16

F16

I16

Model tree for blockblockblock/Qwen2.5-0.5B-Instruct-exl3-4.5bpw

Base model

Qwen/Qwen2.5-0.5B

Finetuned

Qwen/Qwen2.5-0.5B-Instruct

Quantized

(220)

this model

Collection including blockblockblock/Qwen2.5-0.5B-Instruct-exl3-4.5bpw

Qwen2.5-0.5B-Instruct EXL3

Collection

EXL3 quants of Qwen2.5-0.5B-Instruct, produced by BlockQuant. • 3 items • Updated 9 days ago